# Python Pandas For Your Grandpa - 3.14 Challenge: Pot Holes

## Setup

Fed up with your city’s roads, you go around collecting data on potholes in your area. Due to an unfortunate coffee spill, you lost bits and pieces of your data. Given your DataFrame of pothole measurements, `potholes`, discard rows where more than half the values are `NaN`. In the rows you keep, impute each `NaN` with the column mean if the column is numeric, or with the column mode if it is non-numeric.

``````
import numpy as np
import pandas as pd

potholes = pd.DataFrame({
    'length': [5.1, np.nan, 6.2, 4.3, 6.0, 5.1, 6.5, 4.3, np.nan, np.nan],
    'width': [2.8, 5.8, 6.5, 6.1, 5.8, np.nan, 6.3, 6.1, 5.4, 5.0],
    'depth': [2.6, np.nan, 4.2, 0.8, 2.6, np.nan, 3.9, 4.8, 4.0, np.nan],
    'location': pd.Series(['center', 'north edge', np.nan, 'center', 'north edge', 'center', 'west edge',
                           'west edge', np.nan, np.nan], dtype='string')
})
``````

## Solution

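``````
# Flag rows where more than half of the values are missing
drop_rows = potholes.isna().sum(axis=1) > potholes.shape[1] / 2
# Fill numeric columns with their means (computed over all rows, including flagged ones)
potholes = potholes.fillna(potholes.mean(numeric_only=True))
# Fill the non-numeric column with its most frequent value
potholes['location'] = potholes['location'].fillna(potholes['location'].mode().iat[0])
# Drop the flagged rows
potholes = potholes.loc[~drop_rows]
print(potholes)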
##      length     width     depth    location
## 0  5.100000  2.800000  2.600000      center
## 1  5.357143  5.800000  3.271429  north edge
## 2  6.200000  6.500000  4.200000      center
## 3  4.300000  6.100000  0.800000      center
## 4  6.000000  5.800000  2.600000  north edge
## 5  5.100000  5.533333  3.271429      center
## 6  6.500000  6.300000  3.900000   west edge
## 7  4.300000  6.100000  4.800000   west edge
## 8  5.357143  5.400000  4.000000      center
``````
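
The solution computes the fill values before dropping any rows, so the dropped row still contributes to the column means. As a rough sketch of how the same steps could be wrapped into a reusable function, the block below picks the fill value per column from its dtype and lets you choose whether the fill values come from all rows or only from the rows that are kept. The names `clean_missing` and `impute_before_drop` are hypothetical, made up for this illustration; they are not part of pandas or of the original solution.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def clean_missing(df, impute_before_drop=True):
    """Drop rows that are more than half NaN, then fill the rest by column type.

    Hypothetical helper for illustration only.
    """
    # True for rows where more than half of the values are missing
    mostly_missing = df.isna().sum(axis=1) > df.shape[1] / 2

    # Rows used to compute fill values: every row (as in the solution above)
    # or only the rows that survive the drop
    source = df if impute_before_drop else df.loc[~mostly_missing]

    fill_values = {}
    for col in df.columns:
        if is_numeric_dtype(source[col]):
            mean = source[col].mean()
            if pd.notna(mean):  # skip columns with no observed values
                fill_values[col] = mean
        else:
            modes = source[col].mode()
            if not modes.empty:  # skip columns with no observed values
                fill_values[col] = modes.iat[0]

    # fillna with a dict fills each column with its own value
    return df.loc[~mostly_missing].fillna(fill_values)


potholes = pd.DataFrame({
    'length': [5.1, np.nan, 6.2, 4.3, 6.0, 5.1, 6.5, 4.3, np.nan, np.nan],
    'width': [2.8, 5.8, 6.5, 6.1, 5.8, np.nan, 6.3, 6.1, 5.4, 5.0],
    'depth': [2.6, np.nan, 4.2, 0.8, 2.6, np.nan, 3.9, 4.8, 4.0, np.nan],
    'location': pd.Series(['center', 'north edge', np.nan, 'center', 'north edge',
                           'center', 'west edge', 'west edge', np.nan, np.nan],
                          dtype='string')
})

same_as_solution = clean_missing(potholes)
drop_first = clean_missing(potholes, impute_before_drop=False)

# Row 5 has a missing width, so the two orderings fill it differently
print(round(same_as_solution.loc[5, 'width'], 4))  # 5.5333 (mean over all 9 observed widths)
print(round(drop_first.loc[5, 'width'], 4))        # 5.6 (row 9's width of 5.0 is excluded)
```

Which ordering is appropriate depends on how you read the challenge statement; the printed output above corresponds to filling first and dropping afterwards.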