Python Pandas For Your Grandpa - 4.6 Challenge: Class Transitions
Setup
You have a DataFrame called schedules that represents the daily schedule of each student in a school. For example, if Ryan attends four classes (math, english, history, and chemistry), your schedules DataFrame will have four rows for Ryan, in the order he attends each class.
You have a hunch that the sequence of class-to-class transitions affects students' grades. For instance, you suspect Ryan would do better in his chemistry class if it immediately followed his math class instead of his history class.
Determine the average and median Chemistry grade for groups of students based on the class they have immediately prior to Chemistry. Also report how many students fall into each group.
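To make "the class immediately prior" concrete, here is a minimal sketch on a tiny hand-built schedule (the `toy` DataFrame and its values are hypothetical, not part of the challenge). Within each student, `groupby(...).shift()` moves the `class` column down one row, so every row sees the class that student attended just before it. A student's first class has no predecessor and gets `NaN`.

```python
import pandas as pd

# Hypothetical toy schedule: student 0 takes math -> chemistry -> gym,
# student 1 takes chemistry -> history
toy = pd.DataFrame({
    'student_id': [0, 0, 0, 1, 1],
    'class': ['math', 'chemistry', 'gym', 'chemistry', 'history'],
    'grade': [80, 70, 90, 60, 50],
})

# Shift within each student only, so one student's last class never
# leaks into the next student's first row
toy['prev_class'] = toy.groupby('student_id')['class'].shift()
print(toy)
```

Here student 0's chemistry row gets `prev_class == 'math'`, while student 1's chemistry row is their first class and gets `NaN`.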
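import numpy as np
import pandas as pd
# Seeded generator so the random schedules are reproducible
generator = np.random.default_rng(seed=1234)
classes = ['english', 'math', 'history', 'chemistry', 'gym', 'civics', 'writing', 'engineering']
# 100 students x 4 randomly chosen classes each; drop_duplicates() removes repeated
# (student_id, class) pairs, so some students end up with fewer than 4 classes
schedules = pd.DataFrame({
    'student_id': np.repeat(np.arange(100), 4),
    'class': generator.choice(classes, size=400, replace=True)
}).drop_duplicates()
# Random integer grade between 0 and 100 for each remaining row
schedules['grade'] = generator.integers(101, size=schedules.shape[0])
print(schedules)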
## student_id class grade
## 0 0 engineering 86
## 3 0 chemistry 75
## 4 1 math 85
## 5 1 engineering 0
## 6 1 english 73
## .. ... ... ...
## 394 98 writing 16
## 395 98 civics 89
## 396 99 engineering 90
## 398 99 math 55
## 399 99 history 31
##
## [339 rows x 3 columns]
Solution
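# For each student, the class taken immediately before the current one (NaN for the first class)
schedules['prev_class'] = schedules.groupby('student_id')['class'].transform(pd.Series.shift)
# Count, mean grade, and median grade for every (prev_class, class) pair;
# rows with a missing prev_class are dropped by groupby
class_pairs = schedules.groupby(['prev_class', 'class']).agg(
    students=('student_id', 'count'),
    avg_grade=('grade', 'mean'),
    med_grade=('grade', 'median')
)
# Keep only the pairs whose second class is chemistry, sorted by median grade
class_pairs.xs(key='chemistry', axis=0, level=1, drop_level=False).sort_values('med_grade')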
## students avg_grade med_grade
## prev_class class
## math chemistry 3 26.000000 23.0
## history chemistry 3 35.333333 27.0
## english chemistry 6 32.666667 31.5
## engineering chemistry 6 43.333333 43.5
## writing chemistry 3 45.333333 46.0
## civics chemistry 2 55.000000 55.0
## gym chemistry 2 68.500000 68.5
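One detail of the solution above: students whose first class is chemistry have a missing `prev_class`, and `groupby` drops missing keys by default, so those students don't appear in the table. A minimal alternative sketch, assuming pandas 1.1 or later (where `groupby` accepts `dropna=False`), filters to the chemistry rows first and then groups by `prev_class`, keeping those students as their own `NaN` group. It rebuilds `schedules` from the Setup code so it runs on its own; `chem` and `chem_stats` are illustrative names, not part of the original solution.

```python
import numpy as np
import pandas as pd

# Rebuild the challenge's schedules DataFrame (same code as the Setup)
generator = np.random.default_rng(seed=1234)
classes = ['english', 'math', 'history', 'chemistry', 'gym', 'civics', 'writing', 'engineering']
schedules = pd.DataFrame({
    'student_id': np.repeat(np.arange(100), 4),
    'class': generator.choice(classes, size=400, replace=True)
}).drop_duplicates()
schedules['grade'] = generator.integers(101, size=schedules.shape[0])
schedules['prev_class'] = schedules.groupby('student_id')['class'].shift()

# Filter to chemistry rows first, so only one grouping key is needed
chem = schedules.loc[schedules['class'] == 'chemistry']

# dropna=False keeps students whose first class is chemistry (prev_class is NaN)
chem_stats = chem.groupby('prev_class', dropna=False).agg(
    students=('student_id', 'count'),
    avg_grade=('grade', 'mean'),
    med_grade=('grade', 'median'),
).sort_values('med_grade')
print(chem_stats)
```

Because drop_duplicates() leaves each student with at most one chemistry row, the `students` column counts distinct students. Without `dropna=False`, this should produce the same groups as the `xs` version.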