# Course Contents

1. Introduction
2. Series
2.1 Series Creation
2.2 Series Basic Operations
2.3 Series Basic Indexing
2.4 Series Overwriting Data
2.5 Series Apply
2.6 Series Concatenation
2.7 Series Boolean Indexing
2.8 Series View Vs Copy
2.9 Series Missing Values
2.10 Series Challenges

```python
import numpy as np
import pandas as pd
```

## Strings

Given a Series of strings, count how many times these words occur: {football, shoe, dog}

```python
# Setup
strings = pd.Series(['cat', 'shoe', 'car', 'dog', np.nan, 'football', 'shoe', 'football', 'cat'])
```

### Solution

```python
strings.value_counts().loc[['football', 'shoe', 'dog']]
## football    2
## shoe        2
## dog         1
## dtype: int64
```

`value_counts()` is a really handy method that counts the occurrences of each unique element in a Series. It returns a new Series of frequencies sorted in descending order, indexed by the unique elements of the original Series.
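For instance, here's what the full, unsliced result looks like on a small hypothetical Series. Note that NaN values are excluded by default (`dropna=True`):

```python
import numpy as np
import pandas as pd

# Hypothetical data: value_counts() sorts frequencies in descending
# order and excludes NaN by default
animals = pd.Series(['cat', 'dog', 'cat', np.nan, 'cat', 'dog'])
animals.value_counts()
## cat    3
## dog    2
```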

## Double A

Given two Series of the same length, `A` and `B`, if the ith element of `B` is NaN, double the ith value of `A`.

```python
# Setup
A = pd.Series([5, 20, 10, 3, 100, 2], index=[0, 2, 3, 1, 4, 5])
B = pd.Series([True, np.nan, False, np.nan, np.nan, False], index=[5, 2, 4, 1, 0, 3])
```

### Solution

```python
A.loc[pd.isna(B).to_numpy()] *= 2
A
## 0      5
## 2     40
## 3     10
## 1      6
## 4    200
## 5      2
## dtype: int64
```

Step 1: `pd.isna(B)` returns a boolean Series indicating which elements of `B` are NaN.

Step 2: `.to_numpy()` converts that Series to a NumPy array, discarding the index.

Step 3: Wrapping that with `A.loc[]` selects the elements of `A` at positions where the array is True.

Step 4: `*= 2` doubles each selected element in place.
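To make the positional logic concrete, here's the intermediate mask from the setup above, before and after `to_numpy()`:

```python
import numpy as np
import pandas as pd

B = pd.Series([True, np.nan, False, np.nan, np.nan, False], index=[5, 2, 4, 1, 0, 3])

# The boolean Series keeps B's shuffled index labels
mask = pd.isna(B)
mask
## 5    False
## 2     True
## 4    False
## 1     True
## 0     True
## 3    False
## dtype: bool

# to_numpy() discards the labels, leaving only positions
mask.to_numpy()
## array([False,  True, False,  True,  True, False])
```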

The tricky part of this solution is knowing to use the `to_numpy()` method. Without it, `.loc` aligns on index labels: you'd end up selecting elements of `A` whose index label matches a label where `pd.isna(B)` is True, instead of elements of `A` whose position matches a True position in the mask.

```python
# Selects elements of A whose index label matches a B label where the mask is True
A.loc[pd.isna(B)]
## 0     5
## 2    40
## 1     6
## dtype: int64
```
```python
# Selects elements of A whose position matches a True position in the mask
A.loc[pd.isna(B).to_numpy()]
## 2     40
## 1      6
## 4    200
## dtype: int64
```

## Cars

Given a Series of car asking prices and another Series of car fair values, determine which cars for sale are a good buy. The result should be a list of integer indices corresponding to the good buys in `asking_prices`.

``````# Setup
data=[5000, 7500, 9000, 8500, 7000],
index=['escape', 'escape', 'ranger', 'mustang', 'mustang']
)
## escape     5000
## escape     7500
## ranger     9000
## mustang    8500
## mustang    7000
## dtype: int64
``````
```python
fair_prices = pd.Series(
    data=[5500, 7500, 7500],
    index=['escape', 'mustang', 'ranger']
)
fair_prices
## escape     5500
## mustang    7500
## ranger     7500
## dtype: int64
```

### Solution

Here’s one of many solutions.

Step 1: Create a new Series, `fair_prices_tall`, with the same index as `asking_prices` but whose values correspond to `fair_prices`.

```python
fair_prices_tall = fair_prices.loc[asking_prices.index]
fair_prices_tall
## escape     5500
## escape     5500
## ranger     7500
## mustang    7500
## mustang    7500
## dtype: int64
```

Step 2: Calculate `asking_prices - fair_prices_tall` and store that into a variable called `off_market_prices`.

```python
off_market_prices = asking_prices - fair_prices_tall
off_market_prices
## escape     -500
## escape     2000
## ranger     1500
## mustang    1000
## mustang    -500
## dtype: int64
```

Step 3: Convert that to a boolean Series that says which `off_market_prices` are below 0.

```python
below_fair_price = (off_market_prices < 0)
below_fair_price
## escape      True
## escape     False
## ranger     False
## mustang    False
## mustang     True
## dtype: bool
```

Step 4: At this point, we can see that the 1st and 5th asking prices would be good buys, so we know our result should be the list of integer indices `[0, 4]`. There are a few different ways we could make that list. Here, we start by resetting the index of `below_fair_price` so that it becomes a range index starting from 0, passing `drop=True` to tell pandas to discard the existing index rather than keep it as a column.

```python
below_fair_price.reset_index(inplace=True, drop=True)
below_fair_price
## 0     True
## 1    False
## 2    False
## 3    False
## 4     True
## dtype: bool
```

Step 5: Then we make a new Series of `good_buys` using conditional indexing on `below_fair_price`.

```python
good_buys = below_fair_price.loc[below_fair_price]
good_buys
## 0    True
## 4    True
## dtype: bool
```

Step 6: Lastly we convert the index to a list using `good_buys.index.to_list()`.

```python
good_buys.index.to_list()
## [0, 4]
```
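As an aside, a more compact route (a sketch, reusing the setup above) is to drop down to NumPy arrays so the comparison is purely positional, then use `np.flatnonzero()` to pull out the integer indices of the True values:

```python
import numpy as np
import pandas as pd

asking_prices = pd.Series(
    data=[5000, 7500, 9000, 8500, 7000],
    index=['escape', 'escape', 'ranger', 'mustang', 'mustang']
)
fair_prices = pd.Series(
    data=[5500, 7500, 7500],
    index=['escape', 'mustang', 'ranger']
)

# Align fair prices to the asking-price index, then compare as plain
# arrays so no index alignment happens during the comparison
good_buy_mask = asking_prices.to_numpy() < fair_prices.loc[asking_prices.index].to_numpy()
np.flatnonzero(good_buy_mask).tolist()
## [0, 4]
```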

## Stocks

Given a Series of stock prices at times 0, 1, 2, …, determine which time had the greatest price increase from the day before.

```python
# Setup
np.random.seed(123)
stock_prices = pd.Series(
    data=np.random.uniform(low=100, high=200, size=10),
    index=np.random.choice(np.arange(10), size=10, replace=False)
)
stock_prices
## 2    169.646919
## 1    128.613933
## 8    122.685145
## 7    155.131477
## 9    171.946897
## 4    142.310646
## 5    198.076420
## 6    168.482974
## 3    148.093190
## 0    139.211752
## dtype: float64
```

### Solution

Step 1: Sort `stock_prices` by its index using `sort_index()`.

```python
stock_prices.sort_index(inplace=True)
stock_prices
## 0    139.211752
## 1    128.613933
## 2    169.646919
## 3    148.093190
## 4    142.310646
## 5    198.076420
## 6    168.482974
## 7    155.131477
## 8    122.685145
## 9    171.946897
## dtype: float64
```

Step 2: Calculate the previous day’s price using `shift()`.

```python
stock_prices_prev = stock_prices.shift(periods=1)
stock_prices_prev
## 0           NaN
## 1    139.211752
## 2    128.613933
## 3    169.646919
## 4    148.093190
## 5    142.310646
## 6    198.076420
## 7    168.482974
## 8    155.131477
## 9    122.685145
## dtype: float64
```

Step 3: Calculate the day-to-day change.

```python
daily_changes = stock_prices - stock_prices_prev
daily_changes
## 0          NaN
## 1   -10.597818
## 2    41.032985
## 3   -21.553728
## 4    -5.782544
## 5    55.765774
## 6   -29.593446
## 7   -13.351497
## 8   -32.446332
## 9    49.261752
## dtype: float64
```

Step 4: Identify the day with the biggest change using `idxmax()`.

```python
daily_changes.idxmax()
## 5
```

Note: if multiple values equal the maximum, `idxmax()` returns the first row label with that value.
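Incidentally, steps 2 and 3 can be collapsed with `diff()`, which computes `stock_prices - stock_prices.shift(1)` directly (a sketch, reusing the setup above):

```python
import numpy as np
import pandas as pd

np.random.seed(123)
stock_prices = pd.Series(
    data=np.random.uniform(low=100, high=200, size=10),
    index=np.random.choice(np.arange(10), size=10, replace=False)
)

# diff() subtracts each element's predecessor, so sorting by index
# first reproduces the day-to-day changes in one step
stock_prices.sort_index().diff().idxmax()
## 5
```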