Python Pandas For Your Grandpa - 3.5 DataFrame apply()

Just as Series has an apply() method for applying some function to each element in a Series, DataFrame has an apply() method that lets you apply a function to each row or column in a DataFrame. In this video, we’ll see how and when to use it.

We’ll start by making a very simple three-row, two-column DataFrame called df.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [5.2, 1.7, 9.4],
    'B': [3.9, 4.0, 7.8]
})
print(df)
##      A    B
## 0  5.2  3.9
## 1  1.7  4.0
## 2  9.4  7.8

DataFrame’s apply() method has two primary arguments, func and axis. func tells apply() what function to use, and axis tells apply() which axis to apply it along. With axis=0 (the default), the function runs down the row axis, so it’s called once per column. With axis=1, the function runs across the column axis, so it’s called once per row. For example, if we call df.apply(func=np.sum, axis=0),

df.apply(func=np.sum, axis=0)
## A    16.3
## B    15.7
## dtype: float64

we get back a 2-element Series that’s the result of calling np.sum() on each column of df. If we do the same thing with axis=1, we get back a 3-element Series that’s the result of calling np.sum on each row of df.

df.apply(func=np.sum, axis=1)
## 0     9.1
## 1     5.7
## 2    17.2
## dtype: float64
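apply() isn’t limited to NumPy functions. As a minimal sketch, here’s the same pattern with a small function of our own. The name value_range is made up for this illustration; it just returns the spread between the largest and smallest value in whatever Series it receives.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [5.2, 1.7, 9.4],
    'B': [3.9, 4.0, 7.8]
})

def value_range(s):
    """Hypothetical helper: spread between the max and min of a Series."""
    return s.max() - s.min()

# axis=0: value_range receives each column as a Series (one result per column)
col_ranges = df.apply(value_range, axis=0)

# axis=1: value_range receives each row as a Series (one result per row)
row_ranges = df.apply(value_range, axis=1)

print(col_ranges)
print(row_ranges)
```

Either way, the function receives a whole Series at a time. The axis argument only decides whether that Series is a column or a row.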

Now suppose we have a DataFrame called kids like this one with mixed column types.

kids = pd.DataFrame({
    'name': pd.Series(['alice', 'mary', 'jimmy', 'johnny', 'susan'], dtype="string"),
    'age': [9, 13, 11, 15, 8],
    'with_adult': [True, False, False, True, True]
})

Our goal is to determine whether each child should be allowed in a haunted house. The rules for getting in the house are: you have to be at least 12 or you need to have adult supervision.

For tasks like these, apply() works great. In this case, we’ll start by making a function called is_allowed() that takes age (a number) and with_adult (a boolean), and returns a boolean indicating whether the kid is allowed to enter the haunted house.

def is_allowed(age, with_adult):
    return age >= 12 or with_adult

Now we’ll call kids.apply() passing in our function, is_allowed, and axis=1 because we want the function to be applied on a row-by-row basis.

kids.apply(is_allowed, axis=1)  # ERROR: is_allowed() missing 1 required positional argument: 'with_adult'

Of course, this won’t work because we haven’t told Pandas which values of each row to use for the arguments of our function. In fact, it’s not even clear what’s being passed into our function.
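If you’d like to see the failure without crashing your script, here’s a small sketch that wraps the call in try/except and keeps the message. The variable name error_message is just for this illustration.

```python
import pandas as pd

kids = pd.DataFrame({
    'name': pd.Series(['alice', 'mary', 'jimmy', 'johnny', 'susan'], dtype="string"),
    'age': [9, 13, 11, 15, 8],
    'with_adult': [True, False, False, True, True]
})

def is_allowed(age, with_adult):
    return age >= 12 or with_adult

# apply() calls is_allowed() with a single argument per row,
# so the second parameter, with_adult, is never filled in.
try:
    kids.apply(is_allowed, axis=1)
    error_message = None
except TypeError as err:
    error_message = str(err)

print(error_message)
```

The message complains about a missing with_adult argument, which tells us apply() is handing our function exactly one thing per row.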

To understand apply() with axis = 1, let’s pick out the first row of our DataFrame.

row_0 = kids.iloc[0]
print(row_0)
## name          alice
## age               9
## with_adult     True
## Name: 0, dtype: object

What we get back is a Series. Now, you might remember me saying you can’t have a Series of mixed types, so how do we have a Series with a string, an int, and a boolean? The answer is we don’t - we actually have a Series of pointers, i.e. references to objects stored elsewhere in memory. That’s why the dtype is reported as ‘object’: every element in the Series is the same kind of thing (a pointer), but what each one points to could be any type of object in memory.
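To make that concrete, here’s a short sketch that walks over the first row and prints the type of the object each element points to. The exact type names you see (plain Python types versus NumPy scalar types) may vary with your Pandas version, so no expected output is shown here.

```python
import pandas as pd

kids = pd.DataFrame({
    'name': pd.Series(['alice', 'mary', 'jimmy', 'johnny', 'susan'], dtype="string"),
    'age': [9, 13, 11, 15, 8],
    'with_adult': [True, False, False, True, True]
})

row_0 = kids.iloc[0]

# The row itself has a single dtype, object ...
print(row_0.dtype)

# ... but each element refers to a different kind of value.
for label, value in row_0.items():
    print(label, type(value).__name__)
```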

In any case, this Series is exactly what the apply() method uses for the function input. So, let’s modify our function to expect and operate on this type of input.

def is_allowed(kid_series):
    return kid_series.loc['age'] >= 12 or kid_series.loc['with_adult']

And now the same kids.apply() call we made earlier works.

kids.apply(is_allowed, axis=1)
## 0     True
## 1     True
## 2    False
## 3     True
## 4     True
## dtype: bool

But let’s not do that because it ruins a perfectly clean and generic is_allowed() function. Instead, let’s use a lambda function as a bridge between the Series input and our original is_allowed() function.

We’ll start by reverting our is_allowed() function back to what it was.

def is_allowed(age, with_adult):
    return age >= 12 or with_adult

And then we’ll call kids.apply(), passing in lambda row: is_allowed(row.loc['age'], row.loc['with_adult']) along with axis=1.

kids.apply(lambda row: is_allowed(row.loc['age'], row.loc['with_adult']), axis=1)
## 0     True
## 1     True
## 2    False
## 3     True
## 4     True
## dtype: bool

In this case we’re using lambda as a wrapper for our is_allowed() function.

Tacking that onto our kids DataFrame, we can see exactly who’s allowed in our haunted house and who’s not.

kids['allowed'] = kids.apply(lambda row: is_allowed(row.loc['age'], row.loc['with_adult']), axis=1)
print(kids)
##      name  age  with_adult  allowed
## 0   alice    9        True     True
## 1    mary   13       False     True
## 2   jimmy   11       False    False
## 3  johnny   15        True     True
## 4   susan    8        True     True

Sorry Jimmy!
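As a side note, a rule this simple can also be written without apply(), using the vectorized comparison and boolean operators on whole columns. This is just an alternative sketch, not a replacement for the approach above; the name allowed_vectorized is made up for this illustration.

```python
import pandas as pd

kids = pd.DataFrame({
    'name': pd.Series(['alice', 'mary', 'jimmy', 'johnny', 'susan'], dtype="string"),
    'age': [9, 13, 11, 15, 8],
    'with_adult': [True, False, False, True, True]
})

# Compare the whole age column at once, then combine with with_adult using |
# (element-wise "or"). The parentheses matter because | binds tighter than >=.
allowed_vectorized = (kids['age'] >= 12) | kids['with_adult']

print(allowed_vectorized)
```

apply() with axis=1 calls a Python function once per row, so the vectorized form tends to be faster on big DataFrames. apply() earns its keep when the row logic is too involved to express with column operations.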

