Contents

Python Pandas For Your Grandpa - 4.10 Challenge: OB-GYM

Setup

You own a gym for pregnant women called “OB-GYM” and you recently opened a second location. You’d like to analyze its performance, but your reporting software has given you the sales data in an awkward format. Reshape it into a DataFrame like this

#               sales_1  sales_2  members_1  members_2
# date
# 2020-01-01    737.54    629.00         22         20
# 2020-01-02    750.75    699.01         23         22
# 2020-01-03    750.60    640.20         20         24
# 2020-01-04    752.65    695.64         21         22
# 2020-01-05    747.02    632.40         20         22
import numpy as np
import pandas as pd

generator = np.random.default_rng(314)

sales = pd.DataFrame({
    'date':pd.date_range(start = '2020-01-01', periods=5).repeat(2),
    'store_id':np.tile([1,2], 5),
    'sales1':np.round(generator.normal(loc=750, scale=20, size=10), 2),
    'sales2':np.round(generator.normal(loc=650, scale=40, size=10), 2),
    'members':generator.integers(low=20, high=25, size=10)
})
sales.loc[sales.store_id == 2, 'sales1'] = np.nan
sales.loc[sales.store_id == 1, 'sales2'] = np.nan
print(sales)
##         date  store_id  sales1  sales2  members
## 0 2020-01-01         1  737.54     NaN       22
## 1 2020-01-01         2     NaN  629.00       20
## 2 2020-01-02         1  750.75     NaN       23
## 3 2020-01-02         2     NaN  699.01       22
## 4 2020-01-03         1  750.60     NaN       20
## 5 2020-01-03         2     NaN  640.20       24
## 6 2020-01-04         1  752.65     NaN       21
## 7 2020-01-04         2     NaN  695.64       22
## 8 2020-01-05         1  747.02     NaN       20
## 9 2020-01-05         2     NaN  632.40       22

Solution 1

sol1 = sales.pivot(index='date', columns='store_id').dropna(axis=1)
sol1.columns = [f'{x[:-1]}_{y}' if x[-1] in ('1', '2') else f'{x}_{y}' for x,y in sol1.columns]
print(sol1)
##             sales_1  sales_2  members_1  members_2
## date                                              
## 2020-01-01   737.54   629.00         22         20
## 2020-01-02   750.75   699.01         23         22
## 2020-01-03   750.60   640.20         20         24
## 2020-01-04   752.65   695.64         21         22
## 2020-01-05   747.02   632.40         20         22

Solution 2

sol2 = sales.set_index(['date','store_id']).unstack().dropna(axis=1)
sol2.columns = [f'{x[:-1]}_{y}' if x[-1] in ('1', '2') else f'{x}_{y}' for x,y in sol2.columns]
print(sol2)
##             sales_1  sales_2  members_1  members_2
## date                                              
## 2020-01-01   737.54   629.00         22         20
## 2020-01-02   750.75   699.01         23         22
## 2020-01-03   750.60   640.20         20         24
## 2020-01-04   752.65   695.64         21         22
## 2020-01-05   747.02   632.40         20         22

Course Curriculum

  1. Introduction
    1.1 Introduction
  2. Series
    2.1 Series Creation
    2.2 Series Basic Indexing
    2.3 Series Basic Operations
    2.4 Series Boolean Indexing
    2.5 Series Missing Values
    2.6 Series Vectorization
    2.7 Series apply()
    2.8 Series View vs Copy
    2.9 Challenge: Baby Names
    2.10 Challenge: Bees Knees
    2.11 Challenge: Car Shopping
    2.12 Challenge: Price Gouging
    2.13 Challenge: Fair Teams
  3. DataFrame
    3.1 DataFrame Creation
    3.2 DataFrame To And From CSV
    3.3 DataFrame Basic Indexing
    3.4 DataFrame Basic Operations
    3.5 DataFrame apply()
    3.6 DataFrame View vs Copy
    3.7 DataFrame merge()
    3.8 DataFrame Aggregation
    3.9 DataFrame groupby()
    3.10 Challenge: Hobbies
    3.11 Challenge: Party Time
    3.12 Challenge: Vending Machines
    3.13 Challenge: Cradle Robbers
    3.14 Challenge: Pot Holes
  4. Advanced
    4.1 Strings
    4.2 Dates And Times
    4.3 Categoricals
    4.4 MultiIndex
    4.5 DataFrame Reshaping
    4.6 Challenge: Class Transitions
    4.7 Challenge: Rose Thorn
    4.8 Challenge: Product Volumes
    4.9 Challenge: Session Groups
    4.10 Challenge: OB-GYM
  5. Final Boss
    5.1 Challenge: COVID Tracing
    5.2 Challenge: Pickle
    5.3 Challenge: TV Commercials
    5.4 Challenge: Family IQ
    5.5 Challenge: Concerts