Python Pandas For Your Grandpa - 3.6 DataFrame View vs Copy
As with Series, when you work with DataFrames, it’s important to be aware of when you’re copying data and when you’re referencing data. In this section, we’ll see examples of both.
To start, we’ll make a simple DataFrame called df.
import numpy as np
import pandas as pd
df = pd.DataFrame({
'x': [1, 2, 3],
'y': [10, 20, 30]
})
print(df)
##    x   y
## 0  1  10
## 1  2  20
## 2  3  30
Now let’s create a variable v1 and set it equal to the Series extracted from column x of df.
v1 = df['x']
print(v1)
## 0    1
## 1    2
## 2    3
## Name: x, dtype: int64
If we modify v1, notice that df also gets modified.
v1.iloc[0] = 999
print(df)
##      x   y
## 0  999  10
## 1    2  20
## 2    3  30
However, suppose we create another variable v2 equal to column y plus 1.
v2 = df['y'] + 1
print(v2)
## 0    11
## 1    21
## 2    31
## Name: y, dtype: int64
Then, if we modify v2, df doesn’t change this time, because the addition produced a new Series with its own data.
v2.iloc[0] = 0
print(df)
##      x   y
## 0  999  10
## 1    2  20
## 2    3  30
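If you'd like to check this kind of thing rather than take it on faith, one option is to ask NumPy whether two arrays overlap in memory. The sketch below is an illustration, not part of the walkthrough above: it builds a fresh DataFrame (demo and shifted are hypothetical names) and uses np.shares_memory(), assuming the columns are plain int64 so that to_numpy() doesn't have to convert the data.

```python
import numpy as np
import pandas as pd

# Fresh DataFrame so the check does not depend on earlier modifications
demo = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})

# Arithmetic allocates a new array for its result
shifted = demo['y'] + 1

# Do the original column and the result overlap in memory?
arith_shares = np.shares_memory(demo['y'].to_numpy(), shifted.to_numpy())
print(arith_shares)         # False

# Modifying the result therefore leaves demo untouched
shifted.iloc[0] = 0
print(demo['y'].tolist())   # [10, 20, 30]
```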
Now let’s set v1 equal to the first two rows of df using iloc with a list of indices.
v1 = df.iloc[[0, 1]]
print(v1)
##      x   y
## 0  999  10
## 1    2  20
And let’s set v2 equal to the first two rows of df using iloc with slicing.
v2 = df.iloc[:2]
print(v2)
##      x   y
## 0  999  10
## 1    2  20
If we modify v1, Pandas issues a warning telling us that we’re modifying a copy of a slice of df. And indeed, when we inspect df after modifying v1, we see that it’s unchanged because v1 really was a copy.
v1.iloc[1] = 111
## /Users/bgorman/opt/anaconda3/envs/py39/lib/python3.9/site-packages/pandas/core/indexing.py:1636: SettingWithCopyWarning:
## A value is trying to be set on a copy of a slice from a DataFrame
##
## See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
## self._setitem_single_block(indexer, value, name)
## /Users/bgorman/opt/anaconda3/envs/py39/lib/python3.9/site-packages/pandas/core/indexing.py:691: SettingWithCopyWarning:
## A value is trying to be set on a copy of a slice from a DataFrame
##
## See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
## iloc._setitem_with_indexer(indexer, value, self.name)
print(df)
##      x   y
## 0  999  10
## 1    2  20
## 2    3  30
On the other hand, if we modify v2, df also changes, which means v2 is just a view of df, that is, a reference to it. In other words, v2 is literally just pointing to df's data in memory.
v2.iloc[1] = 222
## /Users/bgorman/opt/anaconda3/envs/py39/lib/python3.9/site-packages/pandas/core/indexing.py:1636: SettingWithCopyWarning:
## A value is trying to be set on a copy of a slice from a DataFrame
##
## See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
## self._setitem_single_block(indexer, value, name)
## /Users/bgorman/opt/anaconda3/envs/py39/lib/python3.9/site-packages/pandas/core/indexing.py:691: SettingWithCopyWarning:
## A value is trying to be set on a copy of a slice from a DataFrame
##
## See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
## iloc._setitem_with_indexer(indexer, value, self.name)
print(df)
##      x    y
## 0  999   10
## 1  222  222
## 2    3   30
To avoid this behavior, use DataFrame’s .copy() method to explicitly copy the underlying data. For example,
df.iloc[:2].copy()
##      x    y
## 0  999   10
## 1  222  222
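To confirm that an explicit copy really is independent, here's a minimal sketch that modifies the copy and then inspects the original. It uses a fresh DataFrame, and the names demo and safe are hypothetical, chosen only for this illustration.

```python
import pandas as pd

demo = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})

# Slice the first two rows, then explicitly copy the underlying data
safe = demo.iloc[:2].copy()

# Overwrite every value in row 1 of the copy
safe.iloc[1] = 222

print(safe)   # row 1 is now 222 in both columns
print(demo)   # demo still holds its original values
```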
The v2 modification above is particularly interesting because we get the same warning saying that we’re modifying a copy of a slice from a DataFrame, but that’s actually incorrect. In this case, v2 is a view, not a copy. Annoyingly, this warning is known for producing false positives.
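Since the warning can't be relied on to tell views from copies, a rough way to check is to test whether the two objects overlap in memory. The sketch below assumes int64 columns and uses a hypothetical helper, shares_x(), written just for this illustration. The result for the slice is deliberately left unstated because it can depend on your Pandas version and settings. Also note that with Copy-on-Write enabled (opt-in in Pandas 2.x and, per the Pandas documentation, planned as the default in 3.0), shared memory does not mean a write will reach the original, because Pandas copies the data at write time.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})

by_list = demo.iloc[[0, 1]]   # rows selected with a list of positions
by_slice = demo.iloc[:2]      # rows selected with a slice


def shares_x(a, b):
    """Hypothetical helper: True if column 'x' of a and b overlap in memory."""
    return bool(np.shares_memory(a['x'].to_numpy(), b['x'].to_numpy()))


list_shares = shares_x(demo, by_list)
slice_shares = shares_x(demo, by_slice)

print(list_shares)    # False: selecting with a list allocates new data
print(slice_shares)   # version and settings dependent
```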
Now, this behavior of assigning a view of df that has direct access to df's underlying data is actually pretty cool, and can be handy in some scenarios. However, if it’s not what you intended, it can cause all sorts of trouble, since you may modify a view without realizing that you’re also modifying the original DataFrame.
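One way to sidestep the ambiguity, offered here as a suggestion rather than something from the walkthrough above, is to be explicit about intent: when the original DataFrame should change, write through its own indexer in a single step instead of going through an intermediate variable. The sketch below uses a fresh DataFrame with the hypothetical name demo.

```python
import pandas as pd

demo = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})

# Write by position: row position 1, column position 0 (column 'x')
demo.iloc[1, 0] = 222

# Write by label with a boolean mask: set 'y' to 0 wherever 'x' exceeds 100
demo.loc[demo['x'] > 100, 'y'] = 0

print(demo['x'].tolist())   # [1, 222, 3]
print(demo['y'].tolist())   # [10, 0, 30]
```

Because each assignment indexes demo directly, there's no intermediate object that might be a view or a copy.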