Python NumPy For Your Grandma - 5.2 View vs Copy

In this section, we’ll shed some light on when array indexing produces a view and when it produces a copy.

Say we have this 2d array, squid.

import numpy as np

squid = np.arange(12).reshape(3,-1)
print(squid)
## [[ 0  1  2  3]
##  [ 4  5  6  7]
##  [ 8  9 10 11]]

And we index the array like this, where we use slices to pick out every row and the first two columns.

ward = squid[:, :2]
print(ward)
## [[0 1]
##  [4 5]
##  [8 9]]

Instead of using a slice for the column subset, we could accomplish the same thing with an index array, like this.

sponge = squid[:, [0,1]]
print(sponge)
## [[0 1]
##  [4 5]
##  [8 9]]

On the surface, these techniques seem to produce the same result. But there’s a subtle and really important difference. ward is a view of the squid array, whereas sponge is a copy of the portion of squid we subsetted.

You can see this a couple of different ways. If I set the first element of sponge to 100, sponge gets modified but squid and ward stay the same, as you’d expect.

sponge[0,0] = 100
print(sponge)
## [[100   1]
##  [  4   5]
##  [  8   9]]
print(squid)
## [[ 0  1  2  3]
##  [ 4  5  6  7]
##  [ 8  9 10 11]]
print(ward)
## [[0 1]
##  [4 5]
##  [8 9]]

But if I set the first element of ward to 100, not only does it modify ward, but it also changes squid.

ward[0,0] = 100
print(ward)
## [[100   1]
##  [  4   5]
##  [  8   9]]
print(squid)
## [[100   1   2   3]
##  [  4   5   6   7]
##  [  8   9  10  11]]

This can lead to some nasty, undetected bugs if you’re not careful. Another way to see the difference is to look up the memory address of the beginning of each array, which is the first item in the 'data' entry below.

print(squid.__array_interface__)
## {'data': (140702101464656, False), 'strides': None, 'descr': [('', '<i8')], 'typestr': '<i8', 'shape': (3, 4), 'version': 3}
print(ward.__array_interface__)
## {'data': (140702101464656, False), 'strides': (32, 8), 'descr': [('', '<i8')], 'typestr': '<i8', 'shape': (3, 2), 'version': 3}
print(sponge.__array_interface__)
## {'data': (140702103415088, False), 'strides': (8, 24), 'descr': [('', '<i8')], 'typestr': '<i8', 'shape': (3, 2), 'version': 3}

You can see that squid and ward have the same memory address, unlike sponge. But that address points to the beginning of each array, so this method only identifies a view if the view starts at the first element of the original array.

A better technique is to use NumPy’s shares_memory() function, which is specifically designed to tell if two arrays have overlapping memory. Here you can see it in action.
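To make that limitation concrete, here’s a small sketch. The name tail is just made up for this example. It’s a view of squid that skips the first row, so its starting address is different from squid’s even though it still points into the same buffer.

```python
import numpy as np

squid = np.arange(12).reshape(3, -1)

# tail (hypothetical name) is a slice-only subset, so it's a view,
# but it starts on the second row of squid
tail = squid[1:, :2]

# The first item of the 'data' entry is the starting memory address
squid_addr = squid.__array_interface__['data'][0]
tail_addr = tail.__array_interface__['data'][0]

print(squid_addr == tail_addr)
## False
print(tail_addr - squid_addr == squid.strides[0])
## True

# Writing through tail still changes squid, so tail really is a view
tail[0, 0] = -1
print(squid[1, 0])
## -1
```

The addresses differ by exactly one row’s worth of bytes, so comparing starting addresses alone would wrongly suggest that tail is a copy.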

print(np.shares_memory(squid, ward))
## True
print(np.shares_memory(squid, sponge))
## False
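Here’s a sketch of a couple of cases where comparing starting addresses would let you down. The names lower, evens, and odds are made up for this example, and np.may_share_memory() isn’t covered in this section. As far as I know, it’s a cheaper check that only compares memory bounds, so it can report True for arrays that never actually touch the same element.

```python
import numpy as np

squid = np.arange(12).reshape(3, -1)

# lower (hypothetical name) is a view that starts on the second row,
# so its starting address differs from squid's
lower = squid[1:, :2]
print(np.shares_memory(squid, lower))
## True

# Two views of squid whose memory ranges interleave
# but which have no element in common
evens = squid[:, ::2]   # columns 0 and 2
odds = squid[:, 1::2]   # columns 1 and 3
print(np.shares_memory(evens, odds))
## False
print(np.may_share_memory(evens, odds))
## True
```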

In general, when you subset an array using nothing but slices, you’re gonna get back a view of the original array. You can force NumPy to copy the data by appending .copy() to the end of your statement, like this.

squid[:, :2].copy()
## array([[100,   1],
##        [  4,   5],
##        [  8,   9]])
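As a quick sanity check, here’s a sketch that rebuilds squid from scratch (so the 100 from earlier is gone) and stores the copy in a made-up variable, krabs. Changing krabs should leave squid alone.

```python
import numpy as np

squid = np.arange(12).reshape(3, -1)

# krabs (hypothetical name) is an explicit copy of a slice-only subset
krabs = squid[:, :2].copy()
krabs[0, 0] = -1

print(krabs[0, 0])
## -1
print(squid[0, 0])
## 0
print(np.shares_memory(squid, krabs))
## False
```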

And when you subset an array using at least one index array, NumPy automatically copies the data, so changes you make to the result don’t touch the original array.
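One caveat worth sketching: the copy only happens when you read with an index array. If the index array shows up on the left side of an assignment, NumPy writes straight into the original array. This example rebuilds squid from scratch, and the variable unchanged is just a helper made up for the demo.

```python
import numpy as np

squid = np.arange(12).reshape(3, -1)

# Reading with an index array gives a copy, so zeroing it out leaves squid alone
sponge = squid[:, [0, 1]]
sponge[:] = 0
unchanged = np.array_equal(squid, np.arange(12).reshape(3, -1))
print(unchanged)
## True

# Assigning through an index array writes into squid itself
squid[:, [0, 1]] = 0
print(squid)
## [[ 0  0  2  3]
##  [ 0  0  6  7]
##  [ 0  0 10 11]]
```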

