Share on:

Python Pandas For Your Grandpa | Section 2.10 | Series Challenges
March 18, 2020

Strings

Given a Series of strings, count how many times these words occur: {football, shoe, dog}

# Setup
strings = pd.Series(['cat', 'shoe', 'car', 'dog', np.nan, 'football', 'shoe', 'football', 'cat'])

Solution

strings.value_counts().loc[['football', 'shoe', 'dog']]
## football    2
## shoe        2
## dog         1
## dtype: int64

value_counts() is a really handy method that counts the occurrence of each unique element in a Series. It spits back a Series of sorted frequencies, indexed by the unique elements in the original Series.


Double A

Given two Series of the same length, A and B, if the ith element of B is NaN, double the ith value of A.

# Setup
A = pd.Series([5, 20, 10, 3, 100, 2], index = [0, 2, 3, 1, 4, 5])
B = pd.Series([True, np.nan, False, np.nan, np.nan, False], index = [5, 2, 4, 1, 0, 3])

Solution

A.loc[pd.isna(B).to_numpy()] *= 2
A
## 0      5
## 2     40
## 3     10
## 1      6
## 4    200
## 5      2
## dtype: int64

Step 1: pd.isna(B) returns a boolean Series indicating which elements of B are nan
Step 2: to_numpy() converts that Series to a NumPy array
Step 3: Wrapping that with A.loc[] selects elements of A where corresponding array values are True
Step 4: *= 2 updates each element e equal to e * 2

The tricky part of this solution is knowing to use the to_numpy() method. Without to_numpy(), you’ll end up selecting elements of A whose index matches the index of True values in B, instead of elements of A whose position matches the position of True values of B.

# Selects element of A whose index matches a B index with a True value
A.loc[pd.isna(B)]
## 0     5
## 2    40
## 1     6
## dtype: int64
# Selects element of A whose position matches a B position with a True value
A.loc[pd.isna(B).to_numpy()]
## 2     40
## 1      6
## 4    200
## dtype: int64

Cars

Given a Series of car asking prices and another Series of car fair values, determine which cars for sale are a good buy. The result should be a list of integer indices corresponding to the good buys in asking_prices.

# Setup
asking_prices = pd.Series(
  data=[5000, 7500, 9000, 8500, 7000], 
  index=['escape', 'escape', 'ranger', 'mustang', 'mustang']
)
asking_prices
## escape     5000
## escape     7500
## ranger     9000
## mustang    8500
## mustang    7000
## dtype: int64
fair_prices = pd.Series(
  data=[5500, 7500, 7500], 
  index=['escape', 'mustang', 'ranger']
)
fair_prices
## escape     5500
## mustang    7500
## ranger     7500
## dtype: int64

Solution

Here’s one of many solutions.

Step 1: Create a new Series, fair_prices_tall, with the same index as asking_prices but whose values correspond to fair_prices.

fair_prices_tall = fair_prices.loc[asking_prices.index]
fair_prices_tall
## escape     5500
## escape     5500
## ranger     7500
## mustang    7500
## mustang    7500
## dtype: int64

Step 2: Calculate asking_prices - fair_prices_tall and store that into a variable called off_market_prices.

off_market_prices = asking_prices - fair_prices_tall
off_market_prices
## escape     -500
## escape     2000
## ranger     1500
## mustang    1000
## mustang    -500
## dtype: int64

Step 3: Convert that to a boolean Series that says which off_market_prices are below 0.

below_fair_price = (off_market_prices < 0)
below_fair_price
## escape      True
## escape     False
## ranger     False
## mustang    False
## mustang     True
## dtype: bool

Step 4: At this point, we can see that the 1st and 5th asking prices would be good buys, so we know our result should be the list of integer indices [0,4]. There are a few different ways we could make that list. Here, we start by resetting the index of below_fair_price so that it becomes a range index starting from 0. We set drop = True to tell pandas to drop the existing index.

below_fair_price.reset_index(inplace=True, drop=True)
below_fair_price
## 0     True
## 1    False
## 2    False
## 3    False
## 4     True
## dtype: bool

Step 5: Then we make a new Series of good_buys using conditional indexing on below_fair_price.

good_buys = below_fair_price.loc[below_fair_price]
good_buys
## 0    True
## 4    True
## dtype: bool

Step 6: Lastly we convert the index to a list using good_buys.index.to_list().

good_buys.index.to_list()
## [0, 4]

Stocks

Given a Series of stock prices at times 1,2,3… determine which time had the greatest price increase from the day before.

# Setup
np.random.seed(123)
stock_prices = pd.Series(
  data=np.random.uniform(low=100, high=200, size=10),
  index=np.random.choice(np.arange(10), size=10, replace=False)
)
stock_prices
## 2    169.646919
## 1    128.613933
## 8    122.685145
## 7    155.131477
## 9    171.946897
## 4    142.310646
## 5    198.076420
## 6    168.482974
## 3    148.093190
## 0    139.211752
## dtype: float64

Solution

Step 1: Sort stock_prices by its index using sort_index().

stock_prices.sort_index(inplace=True)
stock_prices
## 0    139.211752
## 1    128.613933
## 2    169.646919
## 3    148.093190
## 4    142.310646
## 5    198.076420
## 6    168.482974
## 7    155.131477
## 8    122.685145
## 9    171.946897
## dtype: float64

Step 2: Calculate the previous day’s price using shift().

stock_prices_prev = stock_prices.shift(periods=1)
stock_prices_prev
## 0           NaN
## 1    139.211752
## 2    128.613933
## 3    169.646919
## 4    148.093190
## 5    142.310646
## 6    198.076420
## 7    168.482974
## 8    155.131477
## 9    122.685145
## dtype: float64

Step 3: Calculate the day-to-day change.

daily_changes = stock_prices - stock_prices_prev
daily_changes
## 0          NaN
## 1   -10.597818
## 2    41.032985
## 3   -21.553728
## 4    -5.782544
## 5    55.765774
## 6   -29.593446
## 7   -13.351497
## 8   -32.446332
## 9    49.261752
## dtype: float64

Step 4: Identify the day with the biggest change using idxmax().

daily_changes.idxmax()
## 5

Note If multiple values equal the maximum, the first row label with that value is returned.


comments powered by Disqus