
Python Pandas For Your Grandpa | Section 2.8 | Series View Vs Copy
March 18, 2020

Table Of Contents

  1. Introduction
  2. Series
    2.1 Series Creation
    2.2 Series Basic Operations
    2.3 Series Basic Indexing
    2.4 Series Overwriting Data
    2.5 Series Apply
    2.6 Series Concatenation
    2.7 Series Boolean Indexing
    2.8 Series View Vs Copy
    2.9 Series Missing Values
    2.10 Series Challenges

import numpy as np
import pandas as pd

Let’s suppose you have this Series, x:

x = pd.Series(
    data=[2, 3, 5, 7, 11, 13],
    index=[2, 11, 12, 30, 30, 51]
)

and then you set a new variable, y, equal to x

y = x

Then you modify the first element of y to be 999.

y.iloc[0] = 999

Obviously this modifies y, but does it also modify x?

x
## 2     999
## 11      3
## 12      5
## 30      7
## 30     11
## 51     13
## dtype: int64

You might be surprised to see that even though we clearly changed y, x also got modified. This happens because setting y equal to x doesn’t copy anything. In Python, assignment simply binds the name y to the same Series object that x refers to, so both names point to the same data. This is often described as assignment by reference, and some people would loosely call y a “view” of x, although strictly speaking x and y are the very same object.
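If you ever want to confirm this, Python’s is operator tells you whether two names refer to the same object. Here is a minimal sketch; the names a, b, and c are made up for illustration and are not part of the example above.

```python
import pandas as pd

a = pd.Series([2, 3, 5])
b = a          # no copy: b is just another name for the same Series
c = a.copy()   # an independent Series with its own data

print(b is a)  # True
print(c is a)  # False
```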

To avoid this behavior, explicitly set y equal to a copy of x when you create it:

# Avoid assignment by reference
y = x.copy()

Now if you change y, x is unchanged because y’s data is stored completely separately from x’s data.

y.iloc[1] = -333

# x is unchanged
print(x)
## 2     999
## 11      3
## 12      5
## 30      7
## 30     11
## 51     13
## dtype: int64
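Another way to see the difference is to ask NumPy whether two Series are backed by the same memory. The sketch below uses np.shares_memory() together with Series.to_numpy(); the names s, alias, and duplicate are hypothetical, and it assumes a plain integer Series like the one above.

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 3, 5, 7, 11, 13])
alias = s                # same object as s
duplicate = s.copy()     # separate object with separate data

# Do the underlying arrays overlap in memory?
print(np.shares_memory(s.to_numpy(), alias.to_numpy()))      # True
print(np.shares_memory(s.to_numpy(), duplicate.to_numpy()))  # False
```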

One of the reasons this is so confusing is that pandas hands back shared data only under some circumstances, and those circumstances aren’t always obvious. For example, if we have the Series

foo = pd.Series(['a', 'b', 'c', 'd'])

and we set bar as

bar = foo.loc[foo <= 'b']

and we modify bar

bar.iloc[0] = 'z'

foo doesn’t get changed, which means that under the hood, pandas copied data from foo to create bar.

foo
## 0    a
## 1    b
## 2    c
## 3    d
## dtype: object

Now, if we set baz = foo.iloc[:2], which selects the exact same subset as bar,

baz = foo.iloc[:2]

and we modify baz

baz.iloc[0] = 'z'

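this time, foo gets changed. (This is the behavior of pandas versions before 3.0. With Copy-on-Write, which is the default starting in pandas 3.0, foo would stay unchanged.)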

foo
## 0    z
## 1    b
## 2    c
## 3    d
## dtype: object

As far as I can tell, when it comes to Series, selecting with a boolean mask (or a list of labels or positions) returns a copy, while selecting with a slice returns a view, regardless of whether you use .loc or .iloc. But the rules change when we start using DataFrames, and they can change between pandas versions. So I don’t recommend memorizing any hard and fast rules. Instead, experiment when you’re unsure, use .copy() to be safe, and be aware that this quirky behavior exists. I know it sounds weird, but this is the kind of thing you get a feel for over time.
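Here is a minimal sketch of the “use .copy() to be safe” advice. The names letters and subset are hypothetical. Because the subset is copied explicitly, the original is left alone no matter how pandas would have treated the slice on its own.

```python
import pandas as pd

letters = pd.Series(['a', 'b', 'c', 'd'])

# Take the slice, then copy it so it owns its data
subset = letters.iloc[:2].copy()
subset.iloc[0] = 'z'

print(letters.tolist())  # ['a', 'b', 'c', 'd']
print(subset.tolist())   # ['z', 'b']
```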

Another situation where it’s important to understand whether pandas is copying data is with pretty much any pandas function that modifies a Series’ data. For example, every Series has a method called replace() which lets you replace values with other values. In the case of foo, we can replace every ‘a’ with ‘q’ and every ‘d’ with ‘p’:

foo.replace({'a':'q', 'd':'p'})
## 0    z
## 1    b
## 2    c
## 3    p
## dtype: object

The result of this method is a copy of foo with the replaced values. So we’re not actually modifying foo, we’re just building a brand new Series from it.
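To make that concrete, here is a small sketch on a fresh Series (the names letters and replaced are made up for illustration). It checks that replace() hands back a different object and leaves the original values alone.

```python
import pandas as pd

letters = pd.Series(['a', 'b', 'c', 'd'])
replaced = letters.replace({'a': 'q', 'd': 'p'})

print(replaced is letters)  # False
print(letters.tolist())     # ['a', 'b', 'c', 'd']
print(replaced.tolist())    # ['q', 'b', 'c', 'p']
```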

Of course, if you wanted to update foo with replacements like these, you could just set

foo = foo.replace({'a':'q', 'd':'q'})
foo
## 0    z
## 1    b
## 2    c
## 3    q
## dtype: object

This works, but internally pandas creates a whole new Series, rebinds foo to it, and then discards the old Series. As an alternative, lots of pandas functions have a parameter called inplace which, when True, tells pandas to modify the object you’re operating on rather than return a modified copy of it. So, rather than do foo = foo.replace({'a':'q', 'd':'q'}), you can just call foo.replace({'a':'q', 'd':'q'}, inplace=True). Be aware that inplace=True doesn’t necessarily avoid a copy under the hood, so don’t count on it for a speedup.

foo.replace({'a':'q', 'd':'q'}, inplace=True)
foo
## 0    z
## 1    b
## 2    c
## 3    q
## dtype: object
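The practical difference between the two approaches shows up when another variable refers to the same Series. The sketch below uses hypothetical names (s, s_alias, t, t_alias): reassignment rebinds only one name, while inplace=True changes the object that both names share.

```python
import pandas as pd

# Reassignment: s now points to a new Series, s_alias still sees the old one
s = pd.Series(['a', 'b', 'c', 'd'])
s_alias = s
s = s.replace({'a': 'q'})
print(s.tolist())        # ['q', 'b', 'c', 'd']
print(s_alias.tolist())  # ['a', 'b', 'c', 'd']

# inplace=True: the shared object itself is modified
t = pd.Series(['a', 'b', 'c', 'd'])
t_alias = t
t.replace({'a': 'q'}, inplace=True)
print(t_alias.tolist())  # ['q', 'b', 'c', 'd']
```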

