Contents

Python Pandas For Your Grandpa - 5.3 Challenge: TV Commercials

Setup

You own a national restaurant chain called Applewasps. To increase sales, you decide to launch a multi-regional television marketing campaign.

At the end of the campaign you have a table of commercials indicating when and where each commercial aired, and a table of sales indicating when and where customers generated sales.

In order to analyze the performance of each commercial, map each sale to the commercial that aired prior to the sale, in the same region.

import numpy as np
import pandas as pd

generator = np.random.default_rng(5555)
regions = ['north', 'south', 'east', 'west']

commercials = pd.DataFrame({
    'commercial_id': range(10),
    'region': generator.choice(regions, size=10),
    'date_time': pd.to_datetime('2020-01-01') + pd.to_timedelta(generator.integers(240, size=10), unit='h')
})
print(commercials)
##    commercial_id region           date_time
## 0              0   west 2020-01-10 12:00:00
## 1              1  north 2020-01-10 16:00:00
## 2              2  south 2020-01-09 01:00:00
## 3              3   east 2020-01-10 19:00:00
## 4              4  south 2020-01-08 22:00:00
## 5              5   east 2020-01-03 02:00:00
## 6              6  south 2020-01-07 15:00:00
## 7              7   west 2020-01-05 22:00:00
## 8              8   east 2020-01-03 04:00:00
## 9              9   west 2020-01-05 04:00:00
sales = pd.DataFrame({
    'sale_id': range(10),
    'region': generator.choice(regions, size=10),
    'date_time': pd.to_datetime('2020-01-01') + pd.to_timedelta(generator.integers(240, size=10), unit='h'),
    'revenue': np.round(generator.normal(loc=20, scale=5, size=10), 2)
})
print(sales)
##    sale_id region           date_time  revenue
## 0        0   west 2020-01-05 08:00:00    20.14
## 1        1   east 2020-01-08 22:00:00    22.98
## 2        2  south 2020-01-07 21:00:00    22.98
## 3        3   west 2020-01-05 17:00:00    16.82
## 4        4   west 2020-01-02 12:00:00    20.47
## 5        5   east 2020-01-10 09:00:00    26.93
## 6        6  north 2020-01-08 19:00:00    20.25
## 7        7  south 2020-01-01 08:00:00    23.38
## 8        8  south 2020-01-01 17:00:00    25.74
## 9        9  south 2020-01-10 22:00:00    22.28

Solution

commercials.sort_values('date_time', inplace=True)
sales.sort_values('date_time', inplace=True)
pd.merge_asof(left=sales, right=commercials, on='date_time', by='region', direction='backward')
##    sale_id region           date_time  revenue  commercial_id
## 0        7  south 2020-01-01 08:00:00    23.38            NaN
## 1        8  south 2020-01-01 17:00:00    25.74            NaN
## 2        4   west 2020-01-02 12:00:00    20.47            NaN
## 3        0   west 2020-01-05 08:00:00    20.14            9.0
## 4        3   west 2020-01-05 17:00:00    16.82            9.0
## 5        2  south 2020-01-07 21:00:00    22.98            6.0
## 6        6  north 2020-01-08 19:00:00    20.25            NaN
## 7        1   east 2020-01-08 22:00:00    22.98            8.0
## 8        5   east 2020-01-10 09:00:00    26.93            8.0
## 9        9  south 2020-01-10 22:00:00    22.28            2.0

Course Curriculum

  1. Introduction
    1.1 Introduction
  2. Series
    2.1 Series Creation
    2.2 Series Basic Indexing
    2.3 Series Basic Operations
    2.4 Series Boolean Indexing
    2.5 Series Missing Values
    2.6 Series Vectorization
    2.7 Series apply()
    2.8 Series View vs Copy
    2.9 Challenge: Baby Names
    2.10 Challenge: Bees Knees
    2.11 Challenge: Car Shopping
    2.12 Challenge: Price Gouging
    2.13 Challenge: Fair Teams
  3. DataFrame
    3.1 DataFrame Creation
    3.2 DataFrame To And From CSV
    3.3 DataFrame Basic Indexing
    3.4 DataFrame Basic Operations
    3.5 DataFrame apply()
    3.6 DataFrame View vs Copy
    3.7 DataFrame merge()
    3.8 DataFrame Aggregation
    3.9 DataFrame groupby()
    3.10 Challenge: Hobbies
    3.11 Challenge: Party Time
    3.12 Challenge: Vending Machines
    3.13 Challenge: Cradle Robbers
    3.14 Challenge: Pot Holes
  4. Advanced
    4.1 Strings
    4.2 Dates And Times
    4.3 Categoricals
    4.4 MultiIndex
    4.5 DataFrame Reshaping
    4.6 Challenge: Class Transitions
    4.7 Challenge: Rose Thorn
    4.8 Challenge: Product Volumes
    4.9 Challenge: Session Groups
    4.10 Challenge: OB-GYM
  5. Final Boss
    5.1 Challenge: COVID Tracing
    5.2 Challenge: Pickle
    5.3 Challenge: TV Commercials
    5.4 Challenge: Family IQ
    5.5 Challenge: Concerts