Python Pandas For Your Grandpa - 4.6 Challenge: Class Transitions

Setup

You have a DataFrame called schedules that represents the daily schedule of each student in a school. For example, if Ryan attends four classes (math, english, history, and chemistry), your schedules DataFrame will have four rows for Ryan, in the order he attends each class.

You have this idea that the sequence of class-to-class transitions affects students' grades. For instance, you suspect Ryan would do better in his chemistry class if it immediately followed his math class instead of his history class.

Determine the average and median chemistry grade for groups of students based on the class they have immediately prior to chemistry. Also report how many students fall into each group.

import numpy as np
import pandas as pd

generator = np.random.default_rng(seed=1234)
classes = ['english', 'math', 'history', 'chemistry', 'gym', 'civics', 'writing', 'engineering']

schedules = pd.DataFrame({
    'student_id':np.repeat(np.arange(100), 4),
    'class':generator.choice(classes, size=400, replace=True)
}).drop_duplicates()
schedules['grade'] = generator.integers(101, size=schedules.shape[0])
print(schedules)
##      student_id        class  grade
## 0             0  engineering     86
## 3             0    chemistry     75
## 4             1         math     85
## 5             1  engineering      0
## 6             1      english     73
## ..          ...          ...    ...
## 394          98      writing     16
## 395          98       civics     89
## 396          99  engineering     90
## 398          99         math     55
## 399          99      history     31
## 
## [339 rows x 3 columns]

Solution

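# Within each student, fetch the class on the preceding row (NaN for the first class of the day)
schedules['prev_class'] = schedules.groupby('student_id')['class'].shift()

# Summarize grades for every (previous class, current class) pair
class_pairs = schedules.groupby(['prev_class', 'class']).agg(
    students = ('student_id', 'count'),
    avg_grade = ('grade', 'mean'),
    med_grade = ('grade', 'median')
)

# Keep only the pairs that end in chemistry, sorted by median grade
class_pairs.xs(key='chemistry', axis=0, level=1, drop_level=False).sort_values('med_grade')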
##                        students  avg_grade  med_grade
## prev_class  class                                    
## math        chemistry         3  26.000000       23.0
## history     chemistry         3  35.333333       27.0
## english     chemistry         6  32.666667       31.5
## engineering chemistry         6  43.333333       43.5
## writing     chemistry         3  45.333333       46.0
## civics      chemistry         2  55.000000       55.0
## gym         chemistry         2  68.500000       68.5
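
To see why shifting within each student produces the "previous class" column, it can help to run the same idea on a tiny hand-made table. The sketch below uses hypothetical toy data (the toy DataFrame and its values are made up for illustration, not part of the challenge) and filters to chemistry rows before aggregating, which is one alternative to selecting from a MultiIndex afterwards.

```python
import pandas as pd

# Hypothetical toy data: two students, rows already in attendance order
toy = pd.DataFrame({
    'student_id': [0, 0, 0, 1, 1],
    'class': ['math', 'chemistry', 'gym', 'history', 'chemistry'],
    'grade': [90, 80, 70, 60, 50],
})

# Shift 'class' down one row within each student, so each row sees the class before it.
# The first class of each student has no predecessor and gets a missing value.
toy['prev_class'] = toy.groupby('student_id')['class'].shift()

# Keep only chemistry rows that actually have a preceding class
chem = toy[(toy['class'] == 'chemistry') & toy['prev_class'].notna()]

# One row per preceding class: student count, mean grade, median grade
summary = chem.groupby('prev_class').agg(
    students=('student_id', 'count'),
    avg_grade=('grade', 'mean'),
    med_grade=('grade', 'median'),
)

print(toy)
print(summary)
```

Here student 0 reaches chemistry from math and student 1 reaches it from history, so the summary has one student in each group. Counting rows per group equals counting students only because each student appears at most once per class, which the drop_duplicates() call in the setup guarantees for schedules.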
