Python Pandas For Your Grandpa - 5.5 Challenge: Concerts
Contents
Setup
Given a dataset of concerts
, count the number of concerts per (artist, venue) pair, per year-month. Make the resulting table be a wide table - one row per year-month with a column for each unique (artist, venue) pair. Use the cross product of the artists
and venues
Series to determine which (artist, venue) pairs to include in the result.
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
generator = np.random.default_rng(1357)
artists = ['Mouse Rat', 'Binary Brothers', 'Lady Gigabyte', 'The Rolling Sums']
venues = ['Super Dome', 'STAPLES Center', 'Madison Square Garden']
concerts = pd.DataFrame({
'artist':generator.choice(['Mouse Rat', 'Binary Brothers', 'Lady Gigabyte', 'Just-in-Timeberlake'], size=200, replace=True),
'venue':generator.choice(venues + ['Red Rocks'], size=200, replace=True),
'date':pd.to_datetime('2020-01-01') + pd.to_timedelta(generator.choice(365, size=200, replace=True), unit='D'),
}).drop_duplicates().convert_dtypes()
print(concerts)
## artist venue date
## 0 Binary Brothers STAPLES Center 2020-04-10
## 1 Binary Brothers STAPLES Center 2020-07-31
## 2 Mouse Rat Madison Square Garden 2020-06-21
## 3 Mouse Rat Red Rocks 2020-04-17
## 4 Binary Brothers Red Rocks 2020-10-05
## .. ... ... ...
## 195 Lady Gigabyte Red Rocks 2020-08-10
## 196 Mouse Rat Madison Square Garden 2020-06-30
## 197 Binary Brothers Super Dome 2020-05-02
## 198 Just-in-Timeberlake Madison Square Garden 2020-06-21
## 199 Binary Brothers Madison Square Garden 2020-04-14
##
## [200 rows x 3 columns]
Solution 1
artist_venues = pd.MultiIndex.from_product([artists, venues], names = ['artist', 'venue'])
concerts2 = pd.merge(
left=pd.DataFrame(index=artist_venues),
right=concerts.set_index(['artist', 'venue']),
on=['artist', 'venue'],
how='left'
)
concerts2['yearmonth'] = concerts2.date.dt.to_period('M')
concerts2.pivot_table(
index='yearmonth',
values='date',
columns=['artist', 'venue'],
aggfunc='count',
fill_value=0,
dropna=False
)
## artist Binary Brothers Lady Gigabyte Mouse Rat The Rolling Sums
## venue Madison Square Garden STAPLES Center Super Dome Madison Square Garden STAPLES Center Super Dome Madison Square Garden STAPLES Center Super Dome Madison Square Garden STAPLES Center Super Dome
## yearmonth
## 2020-01 0 1 2 0 0 3 1 0 4 0 0 0
## 2020-02 2 0 0 0 0 0 1 1 0 0 0 0
## 2020-03 0 1 2 1 1 1 0 1 0 0 0 0
## 2020-04 2 2 1 1 0 0 2 2 1 0 0 0
## 2020-05 2 1 2 0 2 2 1 2 3 0 0 0
## 2020-06 0 0 1 0 0 1 4 0 1 0 0 0
## 2020-07 1 3 0 0 0 0 1 0 2 0 0 0
## 2020-08 0 1 1 0 1 0 1 1 2 0 0 0
## 2020-09 3 1 2 0 1 0 1 0 0 0 0 0
## 2020-10 0 2 0 0 1 3 0 2 3 0 0 0
## 2020-11 0 2 0 0 2 1 1 1 1 0 0 0
## 2020-12 0 1 2 0 1 3 1 1 0 0 0 0
Solution 2
concerts['artist'] = pd.Categorical(concerts.artist, categories=artists)
concerts['venue'] = pd.Categorical(concerts.venue, categories=venues)
concerts2 = concerts.dropna()
concerts2['yearmonth'] = concerts2.date.dt.to_period('M')
## <string>:1: SettingWithCopyWarning:
## A value is trying to be set on a copy of a slice from a DataFrame.
## Try using .loc[row_indexer,col_indexer] = value instead
##
## See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
concerts2.pivot_table(
index='yearmonth',
values='date',
columns=['artist', 'venue'],
aggfunc='count',
fill_value=0,
dropna=False
)
## artist Mouse Rat Binary Brothers Lady Gigabyte The Rolling Sums
## venue Super Dome STAPLES Center Madison Square Garden Super Dome STAPLES Center Madison Square Garden Super Dome STAPLES Center Madison Square Garden Super Dome STAPLES Center Madison Square Garden
## yearmonth
## 2020-01 4 0 1 2 1 0 3 0 0 0 0 0
## 2020-02 0 1 1 0 0 2 0 0 0 0 0 0
## 2020-03 0 1 0 2 1 0 1 1 1 0 0 0
## 2020-04 1 2 2 1 2 2 0 0 1 0 0 0
## 2020-05 3 2 1 2 1 2 2 2 0 0 0 0
## 2020-06 1 0 4 1 0 0 1 0 0 0 0 0
## 2020-07 2 0 1 0 3 1 0 0 0 0 0 0
## 2020-08 2 1 1 1 1 0 0 1 0 0 0 0
## 2020-09 0 0 1 2 1 3 0 1 0 0 0 0
## 2020-10 3 2 0 0 2 0 3 1 0 0 0 0
## 2020-11 1 1 1 0 2 0 1 2 0 0 0 0
## 2020-12 0 1 1 2 1 0 3 1 0 0 0 0
Course Curriculum
- Introduction
1.1 Introduction - Series
2.1 Series Creation
2.2 Series Basic Indexing
2.3 Series Basic Operations
2.4 Series Boolean Indexing
2.5 Series Missing Values
2.6 Series Vectorization
2.7 Seriesapply()
2.8 Series View vs Copy
2.9 Challenge: Baby Names
2.10 Challenge: Bees Knees
2.11 Challenge: Car Shopping
2.12 Challenge: Price Gouging
2.13 Challenge: Fair Teams - DataFrame
3.1 DataFrame Creation
3.2 DataFrame To And From CSV
3.3 DataFrame Basic Indexing
3.4 DataFrame Basic Operations
3.5 DataFrameapply()
3.6 DataFrame View vs Copy
3.7 DataFramemerge()
3.8 DataFrame Aggregation
3.9 DataFramegroupby()
3.10 Challenge: Hobbies
3.11 Challenge: Party Time
3.12 Challenge: Vending Machines
3.13 Challenge: Cradle Robbers
3.14 Challenge: Pot Holes - Advanced
4.1 Strings
4.2 Dates And Times
4.3 Categoricals
4.4 MultiIndex
4.5 DataFrame Reshaping
4.6 Challenge: Class Transitions
4.7 Challenge: Rose Thorn
4.8 Challenge: Product Volumes
4.9 Challenge: Session Groups
4.10 Challenge: OB-GYM - Final Boss
5.1 Challenge: COVID Tracing
5.2 Challenge: Pickle
5.3 Challenge: TV Commercials
5.4 Challenge: Family IQ
5.5 Challenge: Concerts