Python Pandas For Your Grandpa - 3.6 DataFrame View vs Copy

As with Series, when you work with DataFrames, it’s important to be aware of when you’re copying data and when you’re referencing data. In this section, we’ll see examples of both.

To start, we’ll make a simple DataFrame called df.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3],
    'y': [10, 20, 30]
})
print(df)
##    x   y
## 0  1  10
## 1  2  20
## 2  3  30

Now let’s create a variable v1 and set it equal to the Series extracted from column x of df.

v1 = df['x']
print(v1)
## 0    1
## 1    2
## 2    3
## Name: x, dtype: int64

If we modify v1, notice that df also gets modified.

v1.iloc[0] = 999
print(df)
##      x   y
## 0  999  10
## 1    2  20
## 2    3  30

However, if we create another variable v2 equal to y + 1,

v2 = df['y'] + 1
print(v2)
## 0    11
## 1    21
## 2    31
## Name: y, dtype: int64

Then if we modify v2 this time, df doesn’t change, because df['y'] + 1 computes a new Series with its own data.

v2.iloc[0] = 0
print(df)
##      x   y
## 0  999  10
## 1    2  20
## 2    3  30
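
The same point can be checked in isolation. The sketch below rebuilds a fresh df (so it runs on its own, without the 999 written earlier) and uses a hypothetical variable name, shifted, that is not part of the lesson.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})

# Arithmetic builds a brand-new Series, so `shifted` owns its data.
shifted = df['y'] + 1
shifted.iloc[0] = 0

print(shifted.tolist())  # [0, 21, 31]
print(df['y'].tolist())  # [10, 20, 30]
```

Only shifted changes; column y of df keeps its original values.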

Now let’s set v1 equal to the first two rows of df using iloc with a list of indices.

v1 = df.iloc[[0, 1]]
print(v1)
##      x   y
## 0  999  10
## 1    2  20

And let’s set v2 equal to the first two rows of df using iloc with slicing.

v2 = df.iloc[:2]
print(v2)
##      x   y
## 0  999  10
## 1    2  20

If we modify v1, pandas emits a SettingWithCopyWarning telling us that we’re setting a value on a copy of a slice of df. And indeed, when we inspect df after modifying v1, we see that it’s unchanged, because v1 really was a copy.

v1.iloc[1] = 111
## /Users/bgorman/opt/anaconda3/envs/py39/lib/python3.9/site-packages/pandas/core/indexing.py:1637: SettingWithCopyWarning: 
## A value is trying to be set on a copy of a slice from a DataFrame
## 
## See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
##   self._setitem_single_block(indexer, value, name)
## /Users/bgorman/opt/anaconda3/envs/py39/lib/python3.9/site-packages/pandas/core/indexing.py:692: SettingWithCopyWarning: 
## A value is trying to be set on a copy of a slice from a DataFrame
## 
## See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
##   iloc._setitem_with_indexer(indexer, value, self.name)
print(df)
##      x   y
## 0  999  10
## 1    2  20
## 2    3  30

On the other hand, if we modify v2, df also changes, which means v2 is just a view of (a reference to) df. In other words, v2 is pointing directly at df’s data in memory.

v2.iloc[1] = 222
## /Users/bgorman/opt/anaconda3/envs/py39/lib/python3.9/site-packages/pandas/core/indexing.py:1637: SettingWithCopyWarning: 
## A value is trying to be set on a copy of a slice from a DataFrame
## 
## See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
##   self._setitem_single_block(indexer, value, name)
## /Users/bgorman/opt/anaconda3/envs/py39/lib/python3.9/site-packages/pandas/core/indexing.py:692: SettingWithCopyWarning: 
## A value is trying to be set on a copy of a slice from a DataFrame
## 
## See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
##   iloc._setitem_with_indexer(indexer, value, self.name)
print(df)
##      x    y
## 0  999   10
## 1  222  222
## 2    3   30

To avoid this behavior, use DataFrame’s .copy() method to explicitly copy the underlying data. For example,

df.iloc[:2].copy()
##      x    y
## 0  999   10
## 1  222  222
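
To see that an explicit copy really is independent, here is a minimal sketch. It rebuilds a fresh df so it runs on its own, and the name v3 is a hypothetical variable introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})

# .copy() gives v3 its own data, independent of df.
v3 = df.iloc[:2].copy()
v3.iloc[1] = 222  # overwrite every value in the second row of v3

print(v3)
print(df)
```

After this runs, the second row of v3 holds 222 in both columns while df still holds its original values, and no SettingWithCopyWarning is raised because v3 was created with an explicit copy.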

The v2 modification above is particularly interesting because it triggers the same warning, saying that we’re setting a value on a copy of a slice from a DataFrame, but that’s actually incorrect. In this case, v2 is a view, not a copy. Kind of annoyingly, this warning is known for throwing up false positives.
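
Since the warning can’t be relied on to tell views from copies, one option is to ask NumPy whether two objects share a memory buffer. The sketch below is only an illustration: shares_data is a hypothetical helper (not a pandas function), and it rebuilds a fresh df so it runs on its own. The result for the slice may depend on your pandas version and settings, so the code prints it rather than assuming it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})


def shares_data(a, b):
    """Hypothetical helper: True if any column of `a` shares a
    memory buffer with the same-named column of `b`."""
    return any(
        np.shares_memory(a[col].to_numpy(), b[col].to_numpy())
        for col in a.columns
    )


by_list = df.iloc[[0, 1]]  # indexing with a list of positions
by_slice = df.iloc[:2]     # indexing with a slice

print(shares_data(by_list, df))   # False: list indexing copies the data
print(shares_data(by_slice, df))  # may vary by pandas version and settings
```

Keep in mind that shared memory alone doesn’t guarantee a write will propagate. With pandas’ Copy-on-Write mode enabled, an object can share memory with df until it is first modified, at which point pandas copies the data. When in doubt, an explicit .copy() is the safest choice.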

Now, getting a view of df that has direct access to df’s underlying data is actually pretty cool, and can be handy in some scenarios. However, if it’s not what you intended, it can cause all sorts of trouble, since you may modify a view without realizing that you’re also modifying the original DataFrame.

