Python Pandas For Your Grandpa - 4.10 Challenge: OB-GYM
Contents
Setup
You own a gym for pregnant women called “OB-GYM” and you recently opened a second location. You’d like to analyze its performance, but your reporting software has given you the sales data in an awkward format. Reshape it into a DataFrame like this
# sales_1 sales_2 members_1 members_2
# date
# 2020-01-01 737.54 629.00 22 20
# 2020-01-02 750.75 699.01 23 22
# 2020-01-03 750.60 640.20 20 24
# 2020-01-04 752.65 695.64 21 22
# 2020-01-05 747.02 632.40 20 22
import numpy as np
import pandas as pd
generator = np.random.default_rng(314)
sales = pd.DataFrame({
'date':pd.date_range(start = '2020-01-01', periods=5).repeat(2),
'store_id':np.tile([1,2], 5),
'sales1':np.round(generator.normal(loc=750, scale=20, size=10), 2),
'sales2':np.round(generator.normal(loc=650, scale=40, size=10), 2),
'members':generator.integers(low=20, high=25, size=10)
})
sales.loc[sales.store_id == 2, 'sales1'] = np.nan
sales.loc[sales.store_id == 1, 'sales2'] = np.nan
print(sales)
## date store_id sales1 sales2 members
## 0 2020-01-01 1 737.54 NaN 22
## 1 2020-01-01 2 NaN 629.00 20
## 2 2020-01-02 1 750.75 NaN 23
## 3 2020-01-02 2 NaN 699.01 22
## 4 2020-01-03 1 750.60 NaN 20
## 5 2020-01-03 2 NaN 640.20 24
## 6 2020-01-04 1 752.65 NaN 21
## 7 2020-01-04 2 NaN 695.64 22
## 8 2020-01-05 1 747.02 NaN 20
## 9 2020-01-05 2 NaN 632.40 22
Solution 1
sol1 = sales.pivot(index='date', columns='store_id').dropna(axis=1)
sol1.columns = [f'{x[:-1]}_{y}' if x[-1] in ('1', '2') else f'{x}_{y}' for x,y in sol1.columns]
print(sol1)
## sales_1 sales_2 members_1 members_2
## date
## 2020-01-01 737.54 629.00 22 20
## 2020-01-02 750.75 699.01 23 22
## 2020-01-03 750.60 640.20 20 24
## 2020-01-04 752.65 695.64 21 22
## 2020-01-05 747.02 632.40 20 22
Solution 2
sol2 = sales.set_index(['date','store_id']).unstack().dropna(axis=1)
sol2.columns = [f'{x[:-1]}_{y}' if x[-1] in ('1', '2') else f'{x}_{y}' for x,y in sol2.columns]
print(sol2)
## sales_1 sales_2 members_1 members_2
## date
## 2020-01-01 737.54 629.00 22 20
## 2020-01-02 750.75 699.01 23 22
## 2020-01-03 750.60 640.20 20 24
## 2020-01-04 752.65 695.64 21 22
## 2020-01-05 747.02 632.40 20 22
Course Curriculum
- Introduction
1.1 Introduction - Series
2.1 Series Creation
2.2 Series Basic Indexing
2.3 Series Basic Operations
2.4 Series Boolean Indexing
2.5 Series Missing Values
2.6 Series Vectorization
2.7 Seriesapply()
2.8 Series View vs Copy
2.9 Challenge: Baby Names
2.10 Challenge: Bees Knees
2.11 Challenge: Car Shopping
2.12 Challenge: Price Gouging
2.13 Challenge: Fair Teams - DataFrame
3.1 DataFrame Creation
3.2 DataFrame To And From CSV
3.3 DataFrame Basic Indexing
3.4 DataFrame Basic Operations
3.5 DataFrameapply()
3.6 DataFrame View vs Copy
3.7 DataFramemerge()
3.8 DataFrame Aggregation
3.9 DataFramegroupby()
3.10 Challenge: Hobbies
3.11 Challenge: Party Time
3.12 Challenge: Vending Machines
3.13 Challenge: Cradle Robbers
3.14 Challenge: Pot Holes - Advanced
4.1 Strings
4.2 Dates And Times
4.3 Categoricals
4.4 MultiIndex
4.5 DataFrame Reshaping
4.6 Challenge: Class Transitions
4.7 Challenge: Rose Thorn
4.8 Challenge: Product Volumes
4.9 Challenge: Session Groups
4.10 Challenge: OB-GYM - Final Boss
5.1 Challenge: COVID Tracing
5.2 Challenge: Pickle
5.3 Challenge: TV Commercials
5.4 Challenge: Family IQ
5.5 Challenge: Concerts