Contents

Python Pandas For Your Grandpa - 5.5 Challenge: Concerts

Setup

Given a dataset of concerts, count the number of concerts per (artist, venue) pair, per year-month. Make the resulting table be a wide table - one row per year-month with a column for each unique (artist, venue) pair. Use the cross product of the artists and venues Series to determine which (artist, venue) pairs to include in the result.

import numpy as np
import pandas as pd

pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

generator = np.random.default_rng(1357)
artists = ['Mouse Rat', 'Binary Brothers', 'Lady Gigabyte', 'The Rolling Sums']
venues = ['Super Dome', 'STAPLES Center', 'Madison Square Garden']

concerts = pd.DataFrame({
    'artist':generator.choice(['Mouse Rat', 'Binary Brothers', 'Lady Gigabyte', 'Just-in-Timeberlake'], size=200, replace=True),
    'venue':generator.choice(venues + ['Red Rocks'], size=200, replace=True),
    'date':pd.to_datetime('2020-01-01') + pd.to_timedelta(generator.choice(365, size=200, replace=True), unit='D'),
}).drop_duplicates().convert_dtypes()
print(concerts)
##                   artist                  venue       date
## 0        Binary Brothers         STAPLES Center 2020-04-10
## 1        Binary Brothers         STAPLES Center 2020-07-31
## 2              Mouse Rat  Madison Square Garden 2020-06-21
## 3              Mouse Rat              Red Rocks 2020-04-17
## 4        Binary Brothers              Red Rocks 2020-10-05
## ..                   ...                    ...        ...
## 195        Lady Gigabyte              Red Rocks 2020-08-10
## 196            Mouse Rat  Madison Square Garden 2020-06-30
## 197      Binary Brothers             Super Dome 2020-05-02
## 198  Just-in-Timeberlake  Madison Square Garden 2020-06-21
## 199      Binary Brothers  Madison Square Garden 2020-04-14
## 
## [200 rows x 3 columns]

Solution 1

artist_venues = pd.MultiIndex.from_product([artists, venues], names = ['artist', 'venue'])
concerts2 = pd.merge(
    left=pd.DataFrame(index=artist_venues),
    right=concerts.set_index(['artist', 'venue']),
    on=['artist', 'venue'],
    how='left'
)
concerts2['yearmonth'] = concerts2.date.dt.to_period('M')
concerts2.pivot_table(
    index='yearmonth',
    values='date',
    columns=['artist', 'venue'],
    aggfunc='count',
    fill_value=0,
    dropna=False
)
## artist          Binary Brothers                                   Lady Gigabyte                                       Mouse Rat                                The Rolling Sums                          
## venue     Madison Square Garden STAPLES Center Super Dome Madison Square Garden STAPLES Center Super Dome Madison Square Garden STAPLES Center Super Dome Madison Square Garden STAPLES Center Super Dome
## yearmonth                                                                                                                                                                                                
## 2020-01                       0              1          2                     0              0          3                     1              0          4                     0              0          0
## 2020-02                       2              0          0                     0              0          0                     1              1          0                     0              0          0
## 2020-03                       0              1          2                     1              1          1                     0              1          0                     0              0          0
## 2020-04                       2              2          1                     1              0          0                     2              2          1                     0              0          0
## 2020-05                       2              1          2                     0              2          2                     1              2          3                     0              0          0
## 2020-06                       0              0          1                     0              0          1                     4              0          1                     0              0          0
## 2020-07                       1              3          0                     0              0          0                     1              0          2                     0              0          0
## 2020-08                       0              1          1                     0              1          0                     1              1          2                     0              0          0
## 2020-09                       3              1          2                     0              1          0                     1              0          0                     0              0          0
## 2020-10                       0              2          0                     0              1          3                     0              2          3                     0              0          0
## 2020-11                       0              2          0                     0              2          1                     1              1          1                     0              0          0
## 2020-12                       0              1          2                     0              1          3                     1              1          0                     0              0          0

Solution 2

concerts['artist'] = pd.Categorical(concerts.artist, categories=artists)
concerts['venue'] = pd.Categorical(concerts.venue, categories=venues)
concerts2 = concerts.dropna()
concerts2['yearmonth'] = concerts2.date.dt.to_period('M')
## <string>:1: SettingWithCopyWarning: 
## A value is trying to be set on a copy of a slice from a DataFrame.
## Try using .loc[row_indexer,col_indexer] = value instead
## 
## See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
concerts2.pivot_table(
    index='yearmonth',
    values='date',
    columns=['artist', 'venue'],
    aggfunc='count',
    fill_value=0,
    dropna=False
)
## artist     Mouse Rat                                      Binary Brothers                                      Lady Gigabyte                                      The Rolling Sums                                     
## venue     Super Dome STAPLES Center Madison Square Garden      Super Dome STAPLES Center Madison Square Garden    Super Dome STAPLES Center Madison Square Garden       Super Dome STAPLES Center Madison Square Garden
## yearmonth                                                                                                                                                                                                              
## 2020-01            4              0                     1               2              1                     0             3              0                     0                0              0                     0
## 2020-02            0              1                     1               0              0                     2             0              0                     0                0              0                     0
## 2020-03            0              1                     0               2              1                     0             1              1                     1                0              0                     0
## 2020-04            1              2                     2               1              2                     2             0              0                     1                0              0                     0
## 2020-05            3              2                     1               2              1                     2             2              2                     0                0              0                     0
## 2020-06            1              0                     4               1              0                     0             1              0                     0                0              0                     0
## 2020-07            2              0                     1               0              3                     1             0              0                     0                0              0                     0
## 2020-08            2              1                     1               1              1                     0             0              1                     0                0              0                     0
## 2020-09            0              0                     1               2              1                     3             0              1                     0                0              0                     0
## 2020-10            3              2                     0               0              2                     0             3              1                     0                0              0                     0
## 2020-11            1              1                     1               0              2                     0             1              2                     0                0              0                     0
## 2020-12            0              1                     1               2              1                     0             3              1                     0                0              0                     0

Course Curriculum

  1. Introduction
    1.1 Introduction
  2. Series
    2.1 Series Creation
    2.2 Series Basic Indexing
    2.3 Series Basic Operations
    2.4 Series Boolean Indexing
    2.5 Series Missing Values
    2.6 Series Vectorization
    2.7 Series apply()
    2.8 Series View vs Copy
    2.9 Challenge: Baby Names
    2.10 Challenge: Bees Knees
    2.11 Challenge: Car Shopping
    2.12 Challenge: Price Gouging
    2.13 Challenge: Fair Teams
  3. DataFrame
    3.1 DataFrame Creation
    3.2 DataFrame To And From CSV
    3.3 DataFrame Basic Indexing
    3.4 DataFrame Basic Operations
    3.5 DataFrame apply()
    3.6 DataFrame View vs Copy
    3.7 DataFrame merge()
    3.8 DataFrame Aggregation
    3.9 DataFrame groupby()
    3.10 Challenge: Hobbies
    3.11 Challenge: Party Time
    3.12 Challenge: Vending Machines
    3.13 Challenge: Cradle Robbers
    3.14 Challenge: Pot Holes
  4. Advanced
    4.1 Strings
    4.2 Dates And Times
    4.3 Categoricals
    4.4 MultiIndex
    4.5 DataFrame Reshaping
    4.6 Challenge: Class Transitions
    4.7 Challenge: Rose Thorn
    4.8 Challenge: Product Volumes
    4.9 Challenge: Session Groups
    4.10 Challenge: OB-GYM
  5. Final Boss
    5.1 Challenge: COVID Tracing
    5.2 Challenge: Pickle
    5.3 Challenge: TV Commercials
    5.4 Challenge: Family IQ
    5.5 Challenge: Concerts