# Python Pandas For Your Grandpa - 2.6 Series Vectorization

In this section, we’ll learn about vectorization and why using built-in Series methods is usually better than rolling your own.

Suppose you have a series `x` with 1M random uniform values between 1 and 2, and you want to calculate its mean.

```python
import numpy as np
import pandas as pd

x = pd.Series(np.random.uniform(low=1, high=2, size=10**6))
```

You’re a confident, competent coder, so forget about using somebody else’s code - you’re gonna write your own function to calculate the average. So you come up with something like this.

```python
def average(x):
    avg = 0.0
    for i in range(x.size):
        avg += x.iloc[i]/x.size

    return avg
```

Now you give it a whirl, only to find out... it’s slow as hell!

```python
average(x)
## 1.4999500244410642
```

By contrast, take a look at the built-in Series method, `mean()`.

```python
x.mean()
## 1.4999500244410409
```

It’s blazing fast. So, why is the built-in method so much faster than your custom function? The answer is vectorization.
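To put rough numbers on that difference, here's a minimal timing sketch. It assumes a smaller series of 100,000 values so the loop finishes quickly, and it uses `time.perf_counter()` from the standard library. The helper name `time_call` is made up for this example, and the exact timings will vary from machine to machine.

```python
import time

import numpy as np
import pandas as pd


def average(x):
    """Hand-rolled mean: one Python-level iteration per element."""
    avg = 0.0
    for i in range(x.size):
        avg += x.iloc[i] / x.size
    return avg


def time_call(func, *args):
    """Hypothetical helper: return (result, elapsed seconds) for one call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Smaller than the 1M-element series above so the loop finishes quickly.
x = pd.Series(np.random.uniform(low=1, high=2, size=10**5))

loop_result, loop_seconds = time_call(average, x)
vec_result, vec_seconds = time_call(x.mean)

print(f"loop:   {loop_result:.6f} in {loop_seconds:.4f} s")
print(f"mean(): {vec_result:.6f} in {vec_seconds:.6f} s")
```

The two results should match up to floating-point rounding, while the elapsed times should differ by orders of magnitude.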

It’s important to understand that Python loops are slow. With each iteration of a Python loop, the interpreter has to work out afresh what you’re asking for: it looks up `x.iloc[i]`, wraps the value in a new Python object, checks the types involved, and only then does the division and addition. So in this case it’s like you’re giving your computer 1M separate sets of instructions.
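One way to see that per-iteration overhead is to strip it away in stages. The sketch below is an illustration rather than a benchmark: it assumes a series of 100,000 values and computes the mean three ways, each doing less work in Python than the one before. The function names are hypothetical.

```python
import numpy as np
import pandas as pd

x = pd.Series(np.random.uniform(low=1, high=2, size=10**5))


def mean_iloc_loop(s):
    # Every s.iloc[i] is a separate Python call that looks up one position
    # and wraps the value in a new NumPy scalar object.
    total = 0.0
    for i in range(s.size):
        total += s.iloc[i]
    return total / s.size


def mean_list_loop(s):
    # Converting once to a list of plain floats removes the indexing cost,
    # but the interpreter still runs the loop body once per element.
    total = 0.0
    for value in s.tolist():
        total += value
    return total / s.size


def mean_vectorized(s):
    # A single call: the loop over the elements runs in compiled code.
    return s.sum() / s.size


print(type(x.iloc[0]))  # the per-element object created on each lookup

iloc_mean = mean_iloc_loop(x)
list_mean = mean_list_loop(x)
vectorized_mean = mean_vectorized(x)
print(iloc_mean, list_mean, vectorized_mean)
```

All three return the same value up to floating-point rounding. What changes is how many times the Python interpreter gets involved.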

By contrast, the Series `mean()` method is a thin wrapper around vectorized NumPy routines, with some extra handling on top such as skipping `NaN` values by default. In this case, the entire array of data gets handed off to a lower level function in C with a single set of instructions for what to do. C then executes those instructions on the entire array, calculating the mean, before handing the result back to Python.
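A quick way to check how closely the two are related is to run both on the same data. This is a sketch with made-up values: on clean data the Series method and `np.mean()` agree, while on data with a missing value they differ, because the Series method skips `NaN` by default (`skipna=True`).

```python
import numpy as np
import pandas as pd

# Clean data: both give the same answer.
clean = pd.Series([1.0, 2.0, 3.0, 4.0])
print(clean.mean())               # 2.5
print(np.mean(clean.to_numpy()))  # 2.5

# Data with a missing value: the defaults differ.
gappy = pd.Series([1.0, np.nan, 3.0])
print(gappy.mean())               # 2.0, the NaN is skipped
print(np.mean(gappy.to_numpy()))  # nan
print(gappy.mean(skipna=False))   # nan, matching NumPy's behavior
```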

To recap, looping over the elements of a Series in Python is slow, and you should avoid it except in the rare circumstances where no vectorized alternative exists. Instead, opt for a vectorized solution using one of the many Pandas and NumPy methods that implement fast algorithms using C.
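As a sketch of what a vectorized solution can look like for something slightly less standard than the mean, suppose you wanted the average squared distance from the mean (the population variance). The function names here are hypothetical, and the series is kept small so the loop version finishes quickly.

```python
import numpy as np
import pandas as pd

x = pd.Series(np.random.uniform(low=1, high=2, size=10**4))


def variance_loop(s):
    # Loop version: one Python-level iteration per element.
    center = s.mean()
    total = 0.0
    for i in range(s.size):
        total += (s.iloc[i] - center) ** 2
    return total / s.size


def variance_vectorized(s):
    # Vectorized version: each operation acts on the whole Series at once.
    return ((s - s.mean()) ** 2).mean()


loop_var = variance_loop(x)
vec_var = variance_vectorized(x)
builtin_var = x.var(ddof=0)  # built-in method; ddof=0 gives the population variance

print(loop_var, vec_var, builtin_var)
```

All three agree up to floating-point rounding. When a built-in method such as `var()` already exists, that's usually the simplest choice. The arithmetic version shows how to stay vectorized when it doesn't.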