Python Pandas For Your Grandpa - 3.1 DataFrame Creation
In this section, we’ll look at different ways to create a DataFrame from scratch.
Perhaps the easiest way to make a DataFrame from scratch is to use the DataFrame() constructor, passing in a dictionary of ‘column-name: column-values’ pairs. For example, here we build a DataFrame with two columns, ‘name’ and ‘age’, and for each column we pass in a corresponding three-element list of values.
import numpy as np
import pandas as pd
df = pd.DataFrame({'name': ['Bob', 'Sue', 'Mary'], 'age': [39, 57, 28]})
print(df)
##    name  age
## 0   Bob   39
## 1   Sue   57
## 2  Mary   28
Let’s pause for a second to talk about what exactly a DataFrame is. In short, a DataFrame is just a table of data with a row index. In this case, the row index is that unlabeled column of values on the far left. To be a little more pedantic, a DataFrame is a collection of identically-sized Series, all of which share the same index. Additionally, DataFrames have a column index for selecting and subsetting columns. We’ll touch on that more later.
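To make the "collection of Series sharing one index" idea concrete, here is a minimal sketch that reuses the same three-row DataFrame from above. The variable name `ages` is just an illustrative choice, not something from the original example.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Bob', 'Sue', 'Mary'], 'age': [39, 57, 28]})

# Selecting a single column gives back a Series
ages = df['age']
print(type(ages).__name__)

# Each column carries the same row index as the DataFrame itself
print(ages.index.equals(df.index))
print(df['name'].index.equals(df.index))

# The column index holds the column labels
print(list(df.columns))
```

Each of the index checks should print True, since every column is a Series built on the DataFrame's own row index.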
Another way you can build a DataFrame is from a list of lists. In this case, each inner list represents a row, and the column names go in the columns parameter, so you could build the same DataFrame as before using
df = pd.DataFrame([
['Bob', 39],
['Sue', 57],
['Mary', 28]
], columns=['name', 'age'])
print(df)
##    name  age
## 0   Bob   39
## 1   Sue   57
## 2  Mary   28
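As a quick sanity check, the sketch below builds the DataFrame both ways and compares the results. The names `from_dict`, `from_rows`, and `unlabeled` are illustrative only. It also shows what happens if you leave out the columns argument when building from rows.

```python
import pandas as pd

rows = [['Bob', 39], ['Sue', 57], ['Mary', 28]]

# Column-oriented: dictionary of column-name -> values
from_dict = pd.DataFrame({'name': ['Bob', 'Sue', 'Mary'], 'age': [39, 57, 28]})

# Row-oriented: list of rows plus explicit column names
from_rows = pd.DataFrame(rows, columns=['name', 'age'])

# .equals() compares values, dtypes, and labels
same = from_dict.equals(from_rows)
print(same)

# Without columns=..., pandas falls back to integer column labels
unlabeled = pd.DataFrame(rows)
print(list(unlabeled.columns))
```

The dictionary form tends to read better when you already have your data organized by column, while the list-of-lists form is handy when the data arrives one record at a time.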
Before we move on, let’s touch on a few important tools for inspecting DataFrames. df.info() is a great tool that reports most of what you’d want to know about a DataFrame at a glance, including its size, index type, column types, non-null counts, and memory usage.
df.info()
## <class 'pandas.core.frame.DataFrame'>
## RangeIndex: 3 entries, 0 to 2
## Data columns (total 2 columns):
##  #   Column  Non-Null Count  Dtype
## ---  ------  --------------  -----
##  0   name    3 non-null      object
##  1   age     3 non-null      int64
## dtypes: int64(1), object(1)
## memory usage: 176.0+ bytes
df.shape tells you how many rows and columns df has, as a (rows, columns) tuple, just like the shape of a 2-D NumPy array.
df.shape
## (3, 2)
df.axes returns a list containing the row index and the column index.
df.axes
## [RangeIndex(start=0, stop=3, step=1), Index(['name', 'age'], dtype='object')]
And df.size tells you the total number of elements in the DataFrame, that is, rows times columns.
df.size
## 6
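The three attributes above fit together in a simple way, which the following sketch tries to show using the same DataFrame. Converting with `to_numpy()` is only there to illustrate the NumPy comparison, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Bob', 'Sue', 'Mary'], 'age': [39, 57, 28]})

# shape is a (rows, columns) tuple, so it can be unpacked
n_rows, n_cols = df.shape

# size is simply rows times columns
print(df.size == n_rows * n_cols)

# The equivalent 2-D NumPy array has the same shape
arr = df.to_numpy()
print(arr.shape == df.shape)

# axes is a two-element list: [row index, column index]
row_index, col_index = df.axes
print(row_index.equals(df.index), col_index.equals(df.columns))
```

All of these comparisons should come out True for any DataFrame, not just this one.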
And because this question comes up so frequently, I’ll deal with it here. To change the column names inside a DataFrame, you can use the .rename() method and pass in a dictionary of ‘old-name: new-name’ pairs via the columns parameter. You probably want to set inplace=True. Otherwise, instead of modifying the DataFrame you’re working with, .rename() leaves it untouched and hands you back a new, modified copy, which you would then need to assign to a variable.
So in this case, if we want to change the column-name ‘age’ to ‘years’, we would do
df.rename(columns={'age':'years'}, inplace=True)
print(df)
##    name  years
## 0   Bob     39
## 1   Sue     57
## 2  Mary     28
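If you’d rather not use inplace=True, a common alternative is to assign the result of .rename() back to a variable. The sketch below starts from a fresh copy of the original DataFrame so that it runs on its own. The name `renamed` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Bob', 'Sue', 'Mary'], 'age': [39, 57, 28]})

# Without inplace=True, rename() returns a new DataFrame...
renamed = df.rename(columns={'age': 'years'})

# ...and the original keeps its old column names
print(list(df.columns))
print(list(renamed.columns))

# Reassigning gives the same end result as inplace=True
df = df.rename(columns={'age': 'years'})
print(list(df.columns))
```

Either style works. Reassignment just makes it a bit more obvious at a glance where df changes.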