Python Pandas For Your Grandpa - 2.4 Series Boolean Indexing

In this section, we’ll see how to use boolean indexing to select values from a Series based on logical conditions. Just like NumPy arrays, you can subset a Pandas Series using a boolean index. For example, suppose you have a Series of integers called foo.

import numpy as np
import pandas as pd

foo = pd.Series([20, 50, 11, 45, 17, 31])
print(foo)
## 0    20
## 1    50
## 2    11
## 3    45
## 4    17
## 5    31
## dtype: int64

If you check foo < 20, you’ll get back a corresponding Series of boolean values.

foo < 20
## 0    False
## 1    False
## 2     True
## 3    False
## 4     True
## 5    False
## dtype: bool

If you assign that Series to a variable called mask, you can use it to subset foo, picking out the values less than 20.

mask = foo < 20
foo.loc[mask]
## 2    11
## 4    17
## dtype: int64

Or if you wanted to avoid the intermediate step, you can do a one-liner like

foo.loc[foo < 20]
## 2    11
## 4    17
## dtype: int64

Now, you might think that the ith value in foo gets returned if the ith value in mask is True. And you’d kind of be right, but watch what happens if we swap index labels 4 and 5 in foo and then do the exact same boolean subset using mask.

foo.index = [0, 1, 2, 3, 5, 4]
foo.loc[mask]
## 2    11
## 4    31
## dtype: int64

This time, the result includes 31 instead of 17. That’s because foo.loc[mask] picks out the elements of foo whose index labels match the labels where mask is True. Here mask is True at labels 2 and 4, and label 4 in foo now points at 31. Usually this is what you want. If you’d rather include or exclude values of foo by position, so that the ith value of foo is kept when the ith value of mask is True, just use mask's underlying NumPy array to subset foo, like

foo.loc[mask.to_numpy()]
## 2    11
## 5    17
## dtype: int64

In this case the third and fifth values of mask are True, so we get back the third and fifth values of foo.
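To make the label matching easier to see, here’s a small sketch. It rebuilds foo with the swapped index and reorders mask to follow foo's labels using reindex(), which is one way to picture what .loc does with a boolean Series (an illustration, not a claim about how Pandas implements it). The names aligned, by_label, and by_position are made up for this example.

```python
import pandas as pd

# foo with index labels 4 and 5 swapped, as in the example above
foo = pd.Series([20, 50, 11, 45, 17, 31], index=[0, 1, 2, 3, 5, 4])

# mask keeps the default index 0..5; True at labels 2 and 4
mask = pd.Series([False, False, True, False, True, False])

# Reorder mask so its values line up with foo's labels
aligned = mask.reindex(foo.index)

by_label = foo.loc[mask]                 # matched by index label
by_position = foo.loc[mask.to_numpy()]   # matched by position

print(aligned.tolist())
print(by_label.tolist())
print(by_position.tolist())
```

Subsetting foo with aligned.to_numpy() should give the same result as foo.loc[mask], since both pick the rows labeled 2 and 4.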

If you want to combine boolean Series together, you can do that too, using & for "and" and | for "or". Note that when you combine two boolean Series, Pandas matches and combines boolean values based on their index labels, not their positions.

For example, suppose we have a Series called ages with the age of five people,

ages = pd.Series(
    data = [42, 43, 14, 18, 1],
    index = ['peter', 'lois', 'chris', 'meg', 'stewie']
)
print(ages)
## peter     42
## lois      43
## chris     14
## meg       18
## stewie     1
## dtype: int64

and a corresponding Series called genders with their gender.

genders = pd.Series(
    data = ['female', 'female', 'male', 'male', 'male'],
    index = ['lois', 'meg', 'chris', 'peter', 'stewie'],
    dtype = 'string'
)
print(genders)
## lois      female
## meg       female
## chris       male
## peter       male
## stewie      male
## dtype: string

Even though their indexes are in a different order, we can still answer questions like,

Who’s a male younger than 18?

mask = (genders == 'male') & (ages < 18)
mask.loc[mask]
## chris     True
## stewie    True
## dtype: bool

In this case, we make a Series to identify whether each person is a male, and a second Series to identify whether each person is younger than 18. Then we combine them with an ampersand (the elementwise "and" operator) to identify whether each person is a male and younger than 18. Then if we assign that to a variable called mask, we can index it with itself to get the names of males younger than 18. In this case the names will be in the index.
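The | operator works the same way. Here’s a minimal sketch that asks “Who’s a female or younger than 18?” using the same ages and genders data. The names female_or_minor and names are made up for this example, and the result is sorted so it doesn’t depend on the order of the aligned index.

```python
import pandas as pd

ages = pd.Series(
    data=[42, 43, 14, 18, 1],
    index=['peter', 'lois', 'chris', 'meg', 'stewie']
)
genders = pd.Series(
    data=['female', 'female', 'male', 'male', 'male'],
    index=['lois', 'meg', 'chris', 'peter', 'stewie'],
    dtype='string'
)

# Elementwise "or"; values are matched by index label, not by position
female_or_minor = (genders == 'female') | (ages < 18)

# Keep only the True entries and read the names off the index
names = sorted(female_or_minor.loc[female_or_minor].index)
print(names)
```

Everyone except peter should show up, since he’s the only one who is neither female nor younger than 18.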

We can also use the ~ operator to negate a boolean Series. For example, ~mask answers “Who’s not a male younger than 18?” In other words, “Who is a female or is at least 18?”

~mask
## chris     False
## lois       True
## meg        True
## peter      True
## stewie    False
## dtype: bool
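If you want to convince yourself that the two phrasings of the question really are the same, here’s a quick sketch that builds the second one by hand and compares it to ~mask. The names negated, rewritten, and same are made up for this example.

```python
import pandas as pd

ages = pd.Series(
    data=[42, 43, 14, 18, 1],
    index=['peter', 'lois', 'chris', 'meg', 'stewie']
)
genders = pd.Series(
    data=['female', 'female', 'male', 'male', 'male'],
    index=['lois', 'meg', 'chris', 'peter', 'stewie'],
    dtype='string'
)

mask = (genders == 'male') & (ages < 18)

# "not (male and under 18)" ...
negated = ~mask

# ... rewritten as "(not male) or (at least 18)"
rewritten = (genders != 'male') | (ages >= 18)

# Same labels in the same order, and the same True/False values
same = negated.index.equals(rewritten.index) and negated.tolist() == rewritten.tolist()
print(same)
```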

When you combine boolean Series, make sure you wrap each condition in parentheses. In Python, & and | bind more tightly than comparison operators like >= and <=, so without parentheses the interpreter evaluates 18 & ages first and you’ll get an error. For example, if we try to select people between 18 and 42 like this, we’ll get an error.

ages.loc[ages >= 18 & ages <= 42]  # ERROR (ValueError: the truth value of a Series is ambiguous)

The solution here is just to wrap the conditions in parentheses like this.

ages.loc[(ages >= 18) & (ages <= 42)]
## peter    42
## meg      18
## dtype: int64
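Here’s a short sketch that runs both versions side by side, catching the error from the unparenthesized one. It also shows Series.between(), which is another way to write this particular range check (it includes both endpoints by default). The names raised, fixed, and also are made up for this example.

```python
import pandas as pd

ages = pd.Series(
    data=[42, 43, 14, 18, 1],
    index=['peter', 'lois', 'chris', 'meg', 'stewie']
)

# Without parentheses, Python evaluates `18 & ages` first and then tries
# to chain the two comparisons, which needs a single True/False value.
try:
    ages.loc[ages >= 18 & ages <= 42]
    raised = False
except ValueError as err:
    raised = True
    print(err)

# With parentheses, each comparison is evaluated before the &
fixed = ages.loc[(ages >= 18) & (ages <= 42)]

# Alternative for range checks: between() is inclusive on both ends by default
also = ages.loc[ages.between(18, 42)]

print(fixed)
print(also)
```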
