Python Pandas For Your Grandpa | Section 2.10 | Series Challenges

Course Contents

  1. Introduction
  2. Series
    2.1 Series Creation
    2.2 Series Basic Operations
    2.3 Series Basic Indexing
    2.4 Series Overwriting Data
    2.5 Series Apply
    2.6 Series Concatenation
    2.7 Series Boolean Indexing
    2.8 Series View Vs Copy
    2.9 Series Missing Values
    2.10 Series Challenges

import numpy as np
import pandas as pd

Strings

Given a Series of strings, count how many times these words occur: {football, shoe, dog}

# Setup
strings = pd.Series(['cat', 'shoe', 'car', 'dog', np.nan, 'football', 'shoe', 'football', 'cat'])

Solution

strings.value_counts().loc[['football', 'shoe', 'dog']]
## football    2
## shoe        2
## dog         1
## dtype: int64

value_counts() is a really handy method that counts the occurrence of each unique element in a Series. It spits back a Series of sorted frequencies, indexed by the unique elements in the original Series.


Double A

Given two Series of the same length, A and B, if the ith element of B is NaN, double the ith value of A.

# Setup
A = pd.Series([5, 20, 10, 3, 100, 2], index = [0, 2, 3, 1, 4, 5])
B = pd.Series([True, np.nan, False, np.nan, np.nan, False], index = [5, 2, 4, 1, 0, 3])

Solution

A.loc[pd.isna(B).to_numpy()] *= 2
A
## 0      5
## 2     40
## 3     10
## 1      6
## 4    200
## 5      2
## dtype: int64

Step 1: pd.isna(B) returns a boolean Series indicating which elements of B are nan
Step 2: to_numpy() converts that Series to a NumPy array
Step 3: Wrapping that with A.loc[] selects elements of A where corresponding array values are True
Step 4: *= 2 updates each element e equal to e * 2

The tricky part of this solution is knowing to use the to_numpy() method. Without to_numpy(), you’ll end up selecting elements of A whose index matches the index of True values in B, instead of elements of A whose position matches the position of True values of B.

# Selects element of A whose index matches a B index with a True value
A.loc[pd.isna(B)]
## 0     5
## 2    40
## 1     6
## dtype: int64
# Selects element of A whose position matches a B position with a True value
A.loc[pd.isna(B).to_numpy()]
## 2     40
## 1      6
## 4    200
## dtype: int64

Cars

Given a Series of car asking prices and another Series of car fair values, determine which cars for sale are a good buy. The result should be a list of integer indices corresponding to the good buys in asking_prices.

# Setup
asking_prices = pd.Series(
  data=[5000, 7500, 9000, 8500, 7000], 
  index=['escape', 'escape', 'ranger', 'mustang', 'mustang']
)
asking_prices
## escape     5000
## escape     7500
## ranger     9000
## mustang    8500
## mustang    7000
## dtype: int64
fair_prices = pd.Series(
  data=[5500, 7500, 7500], 
  index=['escape', 'mustang', 'ranger']
)
fair_prices
## escape     5500
## mustang    7500
## ranger     7500
## dtype: int64

Solution

Here’s one of many solutions.

Step 1: Create a new Series, fair_prices_tall, with the same index as asking_prices but whose values correspond to fair_prices.

fair_prices_tall = fair_prices.loc[asking_prices.index]
fair_prices_tall
## escape     5500
## escape     5500
## ranger     7500
## mustang    7500
## mustang    7500
## dtype: int64

Step 2: Calculate asking_prices - fair_prices_tall and store that into a variable called off_market_prices.

off_market_prices = asking_prices - fair_prices_tall
off_market_prices
## escape     -500
## escape     2000
## ranger     1500
## mustang    1000
## mustang    -500
## dtype: int64

Step 3: Convert that to a boolean Series that says which off_market_prices are below 0.

below_fair_price = (off_market_prices < 0)
below_fair_price
## escape      True
## escape     False
## ranger     False
## mustang    False
## mustang     True
## dtype: bool

Step 4: At this point, we can see that the 1st and 5th asking prices would be good buys, so we know our result should be the list of integer indices [0,4]. There are a few different ways we could make that list. Here, we start by resetting the index of below_fair_price so that it becomes a range index starting from 0. We set drop = True to tell pandas to drop the existing index.

below_fair_price.reset_index(inplace=True, drop=True)
below_fair_price
## 0     True
## 1    False
## 2    False
## 3    False
## 4     True
## dtype: bool

Step 5: Then we make a new Series of good_buys using conditional indexing on below_fair_price.

good_buys = below_fair_price.loc[below_fair_price]
good_buys
## 0    True
## 4    True
## dtype: bool

Step 6: Lastly we convert the index to a list using good_buys.index.to_list().

good_buys.index.to_list()
## [0, 4]

Stocks

Given a Series of stock prices at times 1,2,3… determine which time had the greatest price increase from the day before.

# Setup
np.random.seed(123)
stock_prices = pd.Series(
  data=np.random.uniform(low=100, high=200, size=10),
  index=np.random.choice(np.arange(10), size=10, replace=False)
)
stock_prices
## 2    169.646919
## 1    128.613933
## 8    122.685145
## 7    155.131477
## 9    171.946897
## 4    142.310646
## 5    198.076420
## 6    168.482974
## 3    148.093190
## 0    139.211752
## dtype: float64

Solution

Step 1: Sort stock_prices by its index using sort_index().

stock_prices.sort_index(inplace=True)
stock_prices
## 0    139.211752
## 1    128.613933
## 2    169.646919
## 3    148.093190
## 4    142.310646
## 5    198.076420
## 6    168.482974
## 7    155.131477
## 8    122.685145
## 9    171.946897
## dtype: float64

Step 2: Calculate the previous day’s price using shift().

stock_prices_prev = stock_prices.shift(periods=1)
stock_prices_prev
## 0           NaN
## 1    139.211752
## 2    128.613933
## 3    169.646919
## 4    148.093190
## 5    142.310646
## 6    198.076420
## 7    168.482974
## 8    155.131477
## 9    122.685145
## dtype: float64

Step 3: Calculate the day-to-day change.

daily_changes = stock_prices - stock_prices_prev
daily_changes
## 0          NaN
## 1   -10.597818
## 2    41.032985
## 3   -21.553728
## 4    -5.782544
## 5    55.765774
## 6   -29.593446
## 7   -13.351497
## 8   -32.446332
## 9    49.261752
## dtype: float64

Step 4: Identify the day with the biggest change using idxmax().

daily_changes.idxmax()
## 5

Note If multiple values equal the maximum, the first row label with that value is returned.