Python Pandas For Your Grandpa - 3.1 DataFrame Creation

In this section, we’ll look at different ways to create a DataFrame from scratch.

Perhaps the easiest way to make a DataFrame from scratch is to use the DataFrame() constructor, passing in a dictionary of ‘column-name: column-values’ pairs. For example, here we build a DataFrame with two columns, ‘name’ and ‘age’, and for each column we pass in a three-element list of values.

import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['Bob', 'Sue', 'Mary'], 'age': [39, 57, 28]})
print(df)
##    name  age
## 0   Bob   39
## 1   Sue   57
## 2  Mary   28

Let’s pause for a second to talk about what exactly a DataFrame is. In short, a DataFrame is just a table of data with a row index. In this case, the row index is that unlabeled column of values on the far left. To be a little more pedantic, a DataFrame is a collection of identically-sized Series, all of which share the same index. Additionally, DataFrames have a column index for selecting and subsetting columns. We’ll touch on that more later.
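To make the "collection of Series sharing one index" idea concrete, here is a minimal sketch. It assumes the same df as above; the variable name age is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Bob', 'Sue', 'Mary'], 'age': [39, 57, 28]})

# Selecting a single column gives back a Series
age = df['age']
print(isinstance(age, pd.Series))  # True

# Each column carries the same row index as the DataFrame itself
print(age.index.equals(df.index))                 # True
print(df['name'].index.equals(df['age'].index))   # True

# The column index holds the column labels
print(list(df.columns))  # ['name', 'age']
```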

Another way you can build a DataFrame is from a list of lists. In this case each inner list represents a row, and the column names are passed separately via the columns argument, so you could build the same DataFrame as before using

df = pd.DataFrame([
    ['Bob', 39],
    ['Sue', 57],
    ['Mary', 28]
], columns=['name', 'age'])
print(df)
##    name  age
## 0   Bob   39
## 1   Sue   57
## 2  Mary   28
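If you want to convince yourself that the two approaches really produce the same table, a quick check along these lines should do it. The names df_from_dict and df_from_rows are just for illustration.

```python
import pandas as pd

# Built column by column, from a dictionary
df_from_dict = pd.DataFrame({'name': ['Bob', 'Sue', 'Mary'], 'age': [39, 57, 28]})

# Built row by row, from a list of lists
df_from_rows = pd.DataFrame([
    ['Bob', 39],
    ['Sue', 57],
    ['Mary', 28]
], columns=['name', 'age'])

# .equals() compares values, column types, and index labels
print(df_from_dict.equals(df_from_rows))  # True
```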

Before we move on, let’s touch on a few important tools for inspecting DataFrames. df.info() is a great tool that reports most of what you’d want to know about a DataFrame, including its number of rows and columns, index type, column types, non-null counts, and memory usage.

df.info()
## <class 'pandas.core.frame.DataFrame'>
## RangeIndex: 3 entries, 0 to 2
## Data columns (total 2 columns):
##  #   Column  Non-Null Count  Dtype 
## ---  ------  --------------  ----- 
##  0   name    3 non-null      object
##  1   age     3 non-null      int64 
## dtypes: int64(1), object(1)
## memory usage: 176.0+ bytes

df.shape tells you how many rows and columns df has as a (rows, columns) tuple, just like the shape of a 2-D NumPy array.

df.shape
## (3, 2)

df.axes returns a list containing the row index and the column index.

df.axes
## [RangeIndex(start=0, stop=3, step=1), Index(['name', 'age'], dtype='object')]

And df.size tells you the total number of elements in the DataFrame (rows times columns).

df.size
## 6
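These attributes are closely related, which the following small sketch tries to show. It rebuilds the same df so it can run on its own; the names n_rows, n_cols, row_index, and col_index are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Bob', 'Sue', 'Mary'], 'age': [39, 57, 28]})

# shape is a (rows, columns) tuple, so it unpacks nicely
n_rows, n_cols = df.shape
print(n_rows, n_cols)  # 3 2

# len() of a DataFrame is its number of rows
print(len(df))  # 3

# size is simply rows times columns
print(df.size == n_rows * n_cols)  # True

# axes bundles the row index and the column index into one list
row_index, col_index = df.axes
print(row_index.equals(df.index))    # True
print(col_index.equals(df.columns))  # True
```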

And because this question comes up so frequently, I’ll deal with it here. To change the column names of a DataFrame, you can use the .rename() method and pass its columns argument a dictionary of ‘old-name:new-name’ pairs. By default, .rename() leaves the DataFrame you’re working with untouched and returns a new, modified copy, so you’ll want to either set inplace=True or assign the result back to df.

So in this case, if we want to change the column-name ‘age’ to ‘years’, we would do

df.rename(columns={'age':'years'}, inplace=True)
print(df)
##    name  years
## 0   Bob     39
## 1   Sue     57
## 2  Mary     28
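For comparison, here is a small sketch of what happens without inplace=True. It starts from a fresh df with the original column names; the name renamed is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Bob', 'Sue', 'Mary'], 'age': [39, 57, 28]})

# Without inplace=True, rename() hands back a new DataFrame
renamed = df.rename(columns={'age': 'years'})

print(list(df.columns))       # ['name', 'age']   (original is unchanged)
print(list(renamed.columns))  # ['name', 'years']

# Assigning the result back to df has the same effect as inplace=True
df = df.rename(columns={'age': 'years'})
print(list(df.columns))       # ['name', 'years']
```

Which style you use is mostly a matter of taste. Assigning the result back makes it easy to chain several method calls together.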
