Python Pandas For Your Grandpa - 5.1 Challenge: COVID Tracing

You track the whereabouts of 10 individuals in a DataFrame called whereabouts. Each person has a list of place ids for the places they've visited in the past week. You also track which places have been exposed to COVID-19 in a list called exposed. For each person, identify the places they visited that have been exposed, and store the result as a new list-column in whereabouts called exposures.

import numpy as np
import pandas as pd

# exposed places
exposed = [0,5,9]

# whereabouts of each person
generator = np.random.default_rng(2468)
Nplaces = 10
Npersons = 10
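place_ids = np.arange(Nplaces)
# 30 visits drawn with replacement from the 10 place ids
visits = generator.choice(place_ids, size=3*Nplaces, replace=True)
# Npersons - 1 sorted cut points split the visits into one chunk per person
# (a repeated cut point gives that person an empty list)
split_idxs = np.sort(generator.choice(len(visits), size=Npersons - 1, replace=True))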
whereabouts = pd.DataFrame({
    'person_id': range(Npersons),
    'places': [np.unique(x).tolist() for x in np.array_split(visits, split_idxs)]
})
print(whereabouts)
##    person_id              places
## 0          0        [3, 4, 5, 6]
## 1          1                  []
## 2          2                 [3]
## 3          3           [6, 8, 9]
## 4          4                 [3]
## 5          5  [0, 2, 5, 6, 7, 8]
## 6          6              [2, 7]
## 7          7        [0, 5, 8, 9]
## 8          8           [2, 7, 9]
## 9          9        [0, 5, 8, 9]
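
The empty list for person 1 follows from how the data is built: np.array_split cuts visits at the sorted positions in split_idxs, and a repeated position yields an empty chunk. A minimal sketch of that behavior, using made-up values (values and cuts below are illustrative, not the data generated above):

```python
import numpy as np

# Illustrative values only; not the data generated above
values = np.arange(6)   # [0, 1, 2, 3, 4, 5]
cuts = [2, 2, 5]        # the repeated 2 produces an empty chunk

chunks = [c.tolist() for c in np.array_split(values, cuts)]
print(chunks)
## [[0, 1], [], [2, 3, 4], [5]]
```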

Solution 1

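# Intersect each person's places with the exposed places.
# Sets are unordered, so the resulting lists are not guaranteed to be sorted.
whereabouts['exposures'] = whereabouts['places'].apply(lambda x: list(set(x) & set(exposed)))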
print(whereabouts)
##    person_id              places  exposures
## 0          0        [3, 4, 5, 6]        [5]
## 1          1                  []         []
## 2          2                 [3]         []
## 3          3           [6, 8, 9]        [9]
## 4          4                 [3]         []
## 5          5  [0, 2, 5, 6, 7, 8]     [0, 5]
## 6          6              [2, 7]         []
## 7          7        [0, 5, 8, 9]  [0, 9, 5]
## 8          8           [2, 7, 9]        [9]
## 9          9        [0, 5, 8, 9]  [0, 9, 5]
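
Note the [0, 9, 5] ordering for persons 7 and 9: a set intersection does not preserve the order of places. If order matters, one possible variant is to filter each list with a comprehension instead. The sketch below assumes a small hand-written frame named demo (a hypothetical name, with three rows copied from the table above) rather than the full whereabouts data:

```python
import pandas as pd

exposed = [0, 5, 9]
exposed_set = set(exposed)  # a set makes each membership check fast

# Hypothetical demo frame: three rows copied from the printed table
demo = pd.DataFrame({
    'person_id': [0, 1, 7],
    'places': [[3, 4, 5, 6], [], [0, 5, 8, 9]],
})

# Keep each person's places in their original order, dropping unexposed ones
demo['exposures'] = demo['places'].apply(lambda x: [p for p in x if p in exposed_set])
print(demo['exposures'].tolist())
## [[5], [], [0, 5, 9]]
```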

Solution 2

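# Start from whereabouts without an 'exposures' column, or the merge below creates suffixed duplicates
# One row per (person, place); an empty list becomes a single NaN row
expanded = whereabouts.explode(column='places')
# Keep only visits to exposed places
filtered = expanded.loc[expanded.places.isin(exposed)]
# Collect each person's exposed places back into a list
aggregated = filtered.groupby('person_id')[['places']].agg(list)
aggregated.rename(columns={'places':'exposures'}, inplace=True)
# Left merge keeps people with no exposures; their NaN is then replaced by an empty list
whereabouts = pd.merge(left=whereabouts, right=aggregated, how='left', on='person_id')
whereabouts['exposures'] = whereabouts.exposures.apply(lambda x: x if isinstance(x, list) else [])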
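
Solution 2 is shown without its output. As a quick check, the sketch below runs the same steps on a small hand-written frame named demo (a hypothetical name, with three rows copied from the table printed earlier), so the result can be inspected without regenerating the full data:

```python
import pandas as pd

exposed = [0, 5, 9]

# Hypothetical demo frame: three rows copied from the printed table
demo = pd.DataFrame({
    'person_id': [0, 1, 7],
    'places': [[3, 4, 5, 6], [], [0, 5, 8, 9]],
})

# Same steps as Solution 2, applied to demo
expanded = demo.explode(column='places')
filtered = expanded.loc[expanded.places.isin(exposed)]
aggregated = filtered.groupby('person_id')[['places']].agg(list)
aggregated.rename(columns={'places': 'exposures'}, inplace=True)
demo = pd.merge(left=demo, right=aggregated, how='left', on='person_id')
demo['exposures'] = demo.exposures.apply(lambda x: x if isinstance(x, list) else [])
print(demo['exposures'].tolist())
## [[5], [], [0, 5, 9]]
```

Because explode() and groupby() keep rows in their original order within each person, the exposures here come out in the same order as the places.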
