Python Pandas For Your Grandpa - 2.3 Series Basic Operations

In this video, we’ll see how you can add two Series together. And even though we’re only talking about addition, everything we discuss applies to other basic math operations like subtraction, multiplication, and division.

Suppose you have the following Series of integers, 1, 2, 3, 4, and you add a scalar to it.

import numpy as np
import pandas as pd

x = pd.Series([1, 2, 3, 4])
x + 1
## 0    2
## 1    3
## 2    4
## 3    5
## dtype: int64

Just like in NumPy, Pandas uses broadcasting to add the scalar to each element of the Series. But watch what happens when we wrap that scalar into a Series before adding it.

x + pd.Series(1)
## 0    2.0
## 1    NaN
## 2    NaN
## 3    NaN
## dtype: float64

Strange... The result is a Series of length 4 where the first element is clearly the sum of the first elements of the inputs, but the remaining three elements are NaN. It turns out that Series arithmetic is fundamentally different from NumPy arithmetic. The difference has to do with the Series index. When you add two Series x and y, Pandas only combines elements with the same index label. This process is called index alignment. A label that appears in only one of the two Series produces NaN in the result, and because NaN is a floating-point value, the result’s dtype changes from int64 to float64.

Looking back at our example, the temporary Series we created, pd.Series(1), automatically gets an index label of 0, so it only gets added to the elements of x that also have an index label of 0.

pd.Series(1)
## 0    1
## dtype: int64
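
To make index alignment easier to see, here is a small sketch that uses string labels instead of the default integer ones. The Series s1 and s2 and their labels are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Hypothetical Series with string labels, used only for illustration
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Only "b" and "c" exist in both Series, so only those labels get a sum.
# "a" and "d" each appear in just one Series and come back as NaN.
total = s1 + s2
print(total)
```

The result carries every label found in either input, which is why a label missing from one side shows up as NaN rather than being dropped.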

This behavior causes a lot of confusion because when you create a Series from scratch, Pandas automatically gives the Series a sequential index starting from 0. So if you make two corresponding Series, A and B, and you add them together, it looks like the addition is happening element-wise by position.

A = pd.Series([10, 20, 30, 40, 50])
B = pd.Series([1, 2, 3, 4, 5])
A + B
## 0    11
## 1    22
## 2    33
## 3    44
## 4    55
## dtype: int64

But that’s actually not what’s happening. It’s easy to see if you reverse A’s index before adding it to B.

A.index = np.array([4, 3, 2, 1, 0])
A + B
## 0    51
## 1    42
## 2    33
## 3    24
## 4    15
## dtype: int64

The same behavior holds true for subtraction, multiplication, division, and other basic math operations.
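
As a quick check of that claim, the sketch below rebuilds A with its reversed index and applies the other operators. The variable names diff, prod, and quot are just illustrative.

```python
import pandas as pd

# A with the reversed index from the example above, B with the default index
A = pd.Series([10, 20, 30, 40, 50], index=[4, 3, 2, 1, 0])
B = pd.Series([1, 2, 3, 4, 5])

# Each operator pairs values by index label, not by position.
# For label 0, that means A's 50 is paired with B's 1.
diff = A - B
prod = A * B
quot = A / B

print(diff.loc[0])  # 50 - 1
print(prod.loc[0])  # 50 * 1
print(quot.loc[0])  # 50 / 1
```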

So, what if you didn’t want this behavior? What if you just wanna add A and B element-wise by position? The answer is a recurring theme in this course: drop down to the NumPy level and do it there.

For example, if we do A.to_numpy() + B.to_numpy(), we’ll simply be adding two NumPy arrays, and addition between NumPy arrays happens element-wise by position. Of course, the result will also be a NumPy array so you’ll have to convert it back to a Series if that’s what you want.

pd.Series(A.to_numpy() + B.to_numpy())
## 0    11
## 1    22
## 2    33
## 3    44
## 4    55
## dtype: int64

Alternatively, if you add a Series to a NumPy array, the addition happens element-wise by position and the result will be a Series with the same index labels as the original Series. So, if we do A + B.to_numpy(), it works just like the last example except A’s index values are retained in the result.

A + B.to_numpy()
## 4    11
## 3    22
## 2    33
## 1    44
## 0    55
## dtype: int64
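
The two position-based approaches can be tied together in one sketch. Passing index=A.index to the Series constructor is an extra step that the walkthrough above does not use. It is shown here only to confirm that both routes give the same result.

```python
import pandas as pd

A = pd.Series([10, 20, 30, 40, 50], index=[4, 3, 2, 1, 0])
B = pd.Series([1, 2, 3, 4, 5])

# Route 1: Series + NumPy array adds by position and keeps A's index
by_position = A + B.to_numpy()

# Route 2: add two NumPy arrays, then reattach A's index by hand
rebuilt = pd.Series(A.to_numpy() + B.to_numpy(), index=A.index)

# Both routes produce the same values with the same index labels
print(by_position.equals(rebuilt))
```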

Alright, so what if we have Series x like before and another Series y, shown below, and we want to add y to x based on matching index labels, but we don’t want the result to include NaN values where x has an index label that doesn’t appear in y?

x = pd.Series([1, 2, 3, 4])
print(x)
## 0    1
## 1    2
## 2    3
## 3    4
## dtype: int64
y = pd.Series([10, 20], index=[2, 1])
print(y)
## 2    10
## 1    20
## dtype: int64

In this case, we can use x.loc[y.index] to pick out the elements of x we want to operate on.

x.loc[y.index]
## 2    3
## 1    2
## dtype: int64

Then we use += y to add y to those values, which updates x in place.

x.loc[y.index] += y
print(x)
## 0     1
## 1    22
## 2    13
## 3     4
## dtype: int64
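
As an alternative that the walkthrough above does not cover, the Series.add method accepts a fill_value argument that stands in for labels missing from one side. The sketch below assumes you want a new Series rather than an in-place update of x.

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4])
y = pd.Series([10, 20], index=[2, 1])

# Labels missing from one Series are treated as 0 instead of producing NaN
result = x.add(y, fill_value=0)
print(result)
```

Two differences from the .loc approach are worth keeping in mind: x itself is left unchanged, and any label that exists only in y would also appear in the result. The result may also come back as float64 rather than int64.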