
Python Pandas For Your Grandpa | Section 2.2 | Series Basic Operations
March 18, 2020

Table Of Contents

  1. Introduction
  2. Series
    2.1 Series Creation
    2.2 Series Basic Operations
    2.3 Series Basic Indexing
    2.4 Series Overwriting Data
    2.5 Series Apply
    2.6 Series Concatenation
    2.7 Series Boolean Indexing
    2.8 Series View Vs Copy
    2.9 Series Missing Values
    2.10 Series Challenges

import numpy as np
import pandas as pd

Suppose you have the following Series of integers, [1,2,3,4], and you add a scalar to it.

x = pd.Series([1, 2, 3, 4])
x + 1
## 0    2
## 1    3
## 2    4
## 3    5
## dtype: int64

Just like NumPy, pandas uses broadcasting to add the scalar to each element of the Series. But watch what happens when we wrap that scalar into a Series before adding it.

x + pd.Series(1)
## 0    2.0
## 1    NaN
## 2    NaN
## 3    NaN
## dtype: float64

Strange... the result is a Series of length 4 where the first element is clearly the sum of the first elements of the inputs, but the remaining three elements are NaN. It turns out that Series arithmetic is fundamentally different from NumPy arithmetic. The difference has to do with that pesky index thing we've been avoiding.
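For contrast, here is a minimal sketch of the same additions in plain NumPy next to the Series version. The names arr and s are just for illustration. NumPy broadcasts the length-1 array across every position, while pandas lines values up by index label:

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3, 4])
print((arr + 1).tolist())              # [2, 3, 4, 5]
# NumPy stretches the length-1 array across all four positions
print((arr + np.array([1])).tolist())  # [2, 3, 4, 5]

# pandas aligns on index labels instead, so only label 0 finds a partner
s = pd.Series([1, 2, 3, 4])
print((s + pd.Series(1)).tolist())     # [2.0, nan, nan, nan]
```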

When you add two Series a and b, pandas only combines elements with the same index label. Looking back at our example, the temporary Series we created, pd.Series(1), automatically gets an index label of 0, so it only gets added to elements of x which also have an index label of 0.
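A quick way to see this is to print the index labels directly. The sketch below also suggests why the result has length 4: the output is indexed by the union of both inputs' labels, and any label that exists on only one side comes out as NaN. The extra Series z is illustrative and not part of the example above.

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4])
y = pd.Series(1)

# Each Series carries its own index labels
print(x.index.tolist())  # [0, 1, 2, 3]
print(y.index.tolist())  # [0]

# The result is indexed by the union of both label sets;
# a label present in only one input produces NaN
result = x + y
print(result.index.tolist())   # [0, 1, 2, 3]
print(result.isna().tolist())  # [False, True, True, True]

# With no labels in common, every element of the result is NaN
z = pd.Series(1, index=[10])
print((x + z).index.tolist())  # [0, 1, 2, 3, 10]
print((x + z).isna().all())    # True
```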

y = pd.Series(1)
x + y
## 0    2.0
## 1    NaN
## 2    NaN
## 3    NaN
## dtype: float64

This behavior causes a lot of confusion because when you create a Series from scratch, pandas automatically gives it a sequential index starting from 0. So if you make two Series of the same length, a and b, and add them together, it looks like the addition is happening element-wise by position.

a = pd.Series([10, 20, 30, 40, 50])
b = pd.Series([1, 2, 3, 4, 5])
a + b
## 0    11
## 1    22
## 2    33
## 3    44
## 4    55
## dtype: int64

But that isn't actually what happens. It's easy to see if you reverse the index values of one of the Series before adding them together.

a.index = np.array([4, 3, 2, 1, 0])
a + b
## 0    51
## 1    42
## 2    33
## 3    24
## 4    15
## dtype: int64
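Label alignment is not limited to `+` or to integer labels. Here is a small sketch reusing the reversed index from above, plus two illustrative Series with string labels (s1 and s2 are made-up names). `sort_index()` is only there to list the results in label order.

```python
import pandas as pd

a = pd.Series([10, 20, 30, 40, 50], index=[4, 3, 2, 1, 0])
b = pd.Series([1, 2, 3, 4, 5])

# Every operator pairs values by label, e.g. label 0 is 50 in a and 1 in b
diff = a - b
prod = a * b
quot = a / b
print(diff.sort_index().tolist())  # [49, 38, 27, 16, 5]
print(prod.sort_index().tolist())  # [50, 80, 90, 80, 50]
print(quot.sort_index().tolist())  # [50.0, 20.0, 10.0, 5.0, 2.0]

# Labels do not have to be integers
s1 = pd.Series([1, 2], index=["x", "y"])
s2 = pd.Series([10, 20], index=["y", "x"])
print((s1 + s2).sort_index().tolist())  # [21, 12] for labels x, y
```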

The same behavior holds true for subtraction, multiplication, division, and other basic math operations. We’ll take a closer look at indexes and why they’re useful shortly, but what if you didn’t want this behavior? In other words, what if you just wanted to add Series a and b element-wise by position? The answer is a recurring theme in this course - drop down to the NumPy level and do it there.

For example, if we do a.to_numpy() + b.to_numpy(), we’ll simply be adding two NumPy arrays, and addition between NumPy arrays happens element-wise by position, as you’d expect. Of course, the result will also be a NumPy array so you’ll have to convert it back to a Series if that’s what you want.

a.to_numpy() + b.to_numpy()
## array([11, 22, 33, 44, 55])
pd.Series(a.to_numpy() + b.to_numpy())
## 0    11
## 1    22
## 2    33
## 3    44
## 4    55
## dtype: int64
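If you do this often, you could wrap it in a small helper. `add_by_position` below is a hypothetical function, not part of pandas. It is a sketch that assumes both inputs have the same length and simply discards their labels:

```python
import pandas as pd

def add_by_position(a: pd.Series, b: pd.Series) -> pd.Series:
    """Hypothetical helper: add two Series element-wise by position, ignoring labels."""
    if len(a) != len(b):
        # Explicit check, so a length-1 input is not silently broadcast by NumPy
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    # NumPy arrays carry no labels, so no alignment happens here;
    # the result gets a fresh default index 0..n-1
    return pd.Series(a.to_numpy() + b.to_numpy())

a = pd.Series([10, 20, 30, 40, 50], index=[4, 3, 2, 1, 0])
b = pd.Series([1, 2, 3, 4, 5])
total = add_by_position(a, b)
print(total.tolist())        # [11, 22, 33, 44, 55]
print(total.index.tolist())  # [0, 1, 2, 3, 4]
```

The length check is a design choice: without it, NumPy would happily broadcast a length-1 Series across the other one, which is probably not what "by position" should mean.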

Alternatively, if you add a Series to a NumPy array, the addition happens element-wise by position and the result will be a Series with the same index labels as the original Series. So, here we do a + b.to_numpy() and it works just like the last example, except a’s index values are retained in the result.

a + b.to_numpy()
## 4    11
## 3    22
## 2    33
## 1    44
## 0    55
## dtype: int64
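Which operand supplies the labels depends on which side is still a Series. The sketch below (left and right are illustrative names) shows that swapping which operand is converted with `to_numpy()` swaps which index the result keeps:

```python
import pandas as pd

a = pd.Series([10, 20, 30, 40, 50], index=[4, 3, 2, 1, 0])
b = pd.Series([1, 2, 3, 4, 5])

left = a + b.to_numpy()   # the Series is a, so labels come from a
right = a.to_numpy() + b  # the Series is b, so labels come from b

print(left.index.tolist(), left.tolist())    # [4, 3, 2, 1, 0] [11, 22, 33, 44, 55]
print(right.index.tolist(), right.tolist())  # [0, 1, 2, 3, 4] [11, 22, 33, 44, 55]
```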

Also worth discussing: when we added pd.Series([1, 2, 3, 4]) + pd.Series(1) above, you may have noticed the result was a Series of floats, even though both inputs were Series of ints. This is because the result contained NaN values, and NumPy only represents NaN as a floating-point value (integer arrays have no way to store it), so pandas was forced to convert the result from ints to floats. That behavior may change in the future, since pandas recently released its own nullable integer data type, Int64, which can hold missing values without converting to floats.
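Here is a minimal sketch of the difference, assuming a pandas version that ships the nullable `"Int64"` dtype (note the capital I). Plain integer Series still default to `int64`, so you have to ask for the nullable type explicitly. Missing values then show up as `pd.NA` instead of `NaN`, and the result keeps an integer dtype:

```python
import pandas as pd

# Default int64 Series: alignment introduces NaN, so the result is float64
default = pd.Series([1, 2, 3, 4]) + pd.Series(1)
print(default.dtype)  # float64

# Nullable "Int64" Series: missing values become pd.NA and the dtype stays Int64
nullable = pd.Series([1, 2, 3, 4], dtype="Int64") + pd.Series(1, dtype="Int64")
print(nullable.dtype)            # Int64
print(nullable.isna().tolist())  # [False, True, True, True]
```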

