Python Pandas For Your Grandpa | Section 2.7 | Series Boolean Indexing

import numpy as np
import pandas as pd

Just as with NumPy arrays, you can subset a pandas Series using a boolean index. For example, if you have the Series

foo = pd.Series([20, 50, 11, 45, 17, 31])

and you check foo < 20, you’ll get back a corresponding Series of boolean values.

lt20 = foo < 20
lt20
## 0    False
## 1    False
## 2     True
## 3    False
## 4     True
## 5    False
## dtype: bool

You can then use this boolean Series to subset foo via foo.loc[lt20], which returns the elements of foo that are less than 20.

foo.loc[lt20]
## 2    11
## 4    17
## dtype: int64

Or, if you want to skip the intermediate step, you can write a one-liner like

foo.loc[foo < 20]
## 2    11
## 4    17
## dtype: int64
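
As a small aside, here is a sketch of two things that follow from the mask being an ordinary Series: it can be summarized like any other Series, and plain square brackets accept it just as .loc does. The variable names (mask, n_matches, and so on) are only illustrative.

```python
import pandas as pd

foo = pd.Series([20, 50, 11, 45, 17, 31])
mask = foo < 20

# True counts as 1 when summed, so this is the number of matching elements.
n_matches = int(mask.sum())
print(n_matches)

# Plain square brackets accept the same boolean mask as .loc.
subset_loc = foo.loc[mask]
subset_brackets = foo[mask]
print(subset_loc.equals(subset_brackets))
```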

Now, you might be thinking that this operation works something like this:

For each element in lt20, if the value is True, keep the corresponding element of foo; otherwise, exclude it.

And you’d kind of be right. But watch what happens if we swap index labels 4 and 5 in foo and then we do the same exact boolean subset using lt20.

foo.index = [0, 1, 2, 3, 5, 4]
foo.loc[lt20]
## 2    11
## 4    31
## dtype: int64

This time, the result includes 31 and excludes 17. That’s because foo.loc matches on index labels, not positions: it selects the elements of foo whose labels are the labels at which lt20 is True. lt20 is True at labels 2 and 4, and after the swap, label 4 in foo holds 31.
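
One way to picture the label matching is to reorder the mask so its labels follow foo’s current label order. The sketch below does that with Series.reindex. It is only meant as an illustration of the idea, not a description of what pandas does internally, and the name aligned is made up for this example.

```python
import pandas as pd

foo = pd.Series([20, 50, 11, 45, 17, 31])
lt20 = foo < 20                      # mask built while the labels are 0..5 in order
foo.index = [0, 1, 2, 3, 5, 4]       # swap labels 4 and 5, as in the text

# Reorder the mask so its labels line up with foo's current label order.
aligned = lt20.reindex(foo.index)
print(aligned.tolist())              # [False, False, True, False, False, True]

# The True values now sit at positions 2 and 5, which hold 11 and 31.
by_label = foo.loc[lt20]
print(by_label.tolist())             # [11, 31]
```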

Usually this behavior is fine, but sometimes it isn’t what you want. If you’d rather include or exclude values of foo by the position of the True and False elements in lt20, subset foo with the underlying NumPy array instead, since a NumPy array doesn’t have index labels.

foo.loc[lt20.to_numpy()]
## 2    11
## 5    17
## dtype: int64
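
If the intent is purely positional, .iloc with the same boolean array may make that more explicit to a reader. This is a sketch of that alternative, not something the original example requires. The mask is built by hand here so that the block runs on its own.

```python
import pandas as pd

foo = pd.Series([20, 50, 11, 45, 17, 31], index=[0, 1, 2, 3, 5, 4])
lt20 = pd.Series([False, False, True, False, True, False])  # default labels 0..5

# Drop the labels by converting the mask to a plain NumPy array.
mask_values = lt20.to_numpy()

# .iloc accepts a boolean array and selects strictly by position.
by_position = foo.iloc[mask_values]
print(by_position.tolist())          # [11, 17]
print(by_position.index.tolist())    # [2, 5]
```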

Combining Boolean Series

If you want to combine boolean Series, you can do that too, using an ampersand (&) for ‘and’ and a pipe (|) for ‘or’. For example, you can build complex logical subsets like

foo.loc[(foo < 40) & (foo > 20)]
## 4    31
## dtype: int64

and

foo.loc[(foo > 40) | (foo < 20)]
## 1    50
## 2    11
## 3    45
## 5    17
## dtype: int64

You can also negate a boolean Series with a tilde (~), like

foo.loc[~(foo % 10 == 0)]
## 2    11
## 3    45
## 5    17
## 4    31
## dtype: int64
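
When conditions get long, it can help to give each mask a name before combining them. The sketch below uses hypothetical names (is_small, is_large) and also checks that negating an ‘or’ selects the same elements as ‘and’-ing the two negations.

```python
import pandas as pd

foo = pd.Series([20, 50, 11, 45, 17, 31])

# Name each condition once, then combine the named masks.
is_small = foo < 20
is_large = foo > 40

extremes = foo.loc[is_small | is_large]
print(extremes.tolist())             # [50, 11, 45, 17]

# Two equivalent ways to select everything that is not an extreme.
middle = foo.loc[~(is_small | is_large)]
middle_alt = foo.loc[~is_small & ~is_large]
print(middle.tolist())               # [20, 31]
print(middle.equals(middle_alt))     # True
```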

Just make sure you wrap each condition in parentheses. In Python, & and | bind more tightly than comparison operators like > and <, so without parentheses the expression below is read as foo > (40 | foo) < 20, which raises a ValueError.

foo.loc[foo > 40 | foo < 20]  # raises ValueError: the truth value of a Series is ambiguous
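
To see the failure without stopping a script, the unparenthesized expression can be wrapped in try/except. This is a minimal sketch, and the variable outcome exists only to record what happened.

```python
import pandas as pd

foo = pd.Series([20, 50, 11, 45, 17, 31])

try:
    # Parsed as the chained comparison foo > (40 | foo) < 20, which
    # forces pandas to evaluate a whole Series as a single True/False.
    foo.loc[foo > 40 | foo < 20]
    outcome = "no error"
except ValueError as exc:
    outcome = "ValueError"
    print(exc)

# With parentheses, each comparison is evaluated before the | is applied.
fixed = foo.loc[(foo > 40) | (foo < 20)]
print(fixed.tolist())                # [50, 11, 45, 17]
```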