# Python NumPy For Your Grandma - 5.2 View vs Copy

In this section, we’ll shed some light on when array indexing produces a view and when it produces a copy.

Say we have this 2d array, `squid`.

``````
import numpy as np

squid = np.arange(12).reshape(3,-1)
print(squid)
## [[ 0  1  2  3]
##  [ 4  5  6  7]
##  [ 8  9 10 11]]
``````

Suppose we index the array with slices to pick out every row and the first two columns.

``````
ward = squid[:, :2]
print(ward)
## [[0 1]
##  [4 5]
##  [8 9]]
``````

Instead of using a slice for the column subset, we could accomplish the same thing with an index array, like this.

``````
sponge = squid[:, [0,1]]
print(sponge)
## [[0 1]
##  [4 5]
##  [8 9]]
``````

On the surface, these techniques seem to produce the same result, but there’s a subtle and really important difference. `ward` is a view of the `squid` array, meaning it shares the same underlying data, whereas `sponge` is a copy of the portion of `squid` we subsetted, stored in its own memory.

You can see this a couple of different ways. If I set the first element of `sponge` to 100, `sponge` gets modified, but `squid` and `ward` stay the same, as you’d expect.

``````
sponge[0,0] = 100
print(sponge)
## [[100   1]
##  [  4   5]
##  [  8   9]]
print(squid)
## [[ 0  1  2  3]
##  [ 4  5  6  7]
##  [ 8  9 10 11]]
print(ward)
## [[0 1]
##  [4 5]
##  [8 9]]
``````

But if I set the first element of `ward` to 100, not only does it modify `ward`, but it also changes `squid`.

``````
ward[0,0] = 100
print(ward)
## [[100   1]
##  [  4   5]
##  [  8   9]]
print(squid)
## [[100   1   2   3]
##  [  4   5   6   7]
##  [  8   9  10  11]]
``````

This can lead to some nasty, hard-to-spot bugs if you’re not careful. Another way to see the difference is to look up the memory address where each array’s data begins.

``````
print(squid.__array_interface__)
## {'data': (140212626556672, False), 'strides': None, 'descr': [('', '<i8')], 'typestr': '<i8', 'shape': (3, 4), 'version': 3}
print(ward.__array_interface__)
## {'data': (140212626556672, False), 'strides': (32, 8), 'descr': [('', '<i8')], 'typestr': '<i8', 'shape': (3, 2), 'version': 3}
print(sponge.__array_interface__)
## {'data': (140212894476336, False), 'strides': (8, 24), 'descr': [('', '<i8')], 'typestr': '<i8', 'shape': (3, 2), 'version': 3}
``````

You can see that `squid` and `ward` have the same memory address, while `sponge` has a different one. But that address points to the first element of each array, so this method only identifies a view if the view starts at the first element of the original array.
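To make that limitation concrete, here’s a minimal sketch. The name `tail` is made up for illustration; it’s a slice-only view that skips the first element of `squid`, so its start address no longer matches.

```python
import numpy as np

squid = np.arange(12).reshape(3, -1)

# `tail` is a hypothetical name: a slice-only view starting at row 1, column 2
tail = squid[1:, 2:]

# First entry of the 'data' tuple is the address of the array's first element
squid_addr = squid.__array_interface__['data'][0]
tail_addr = tail.__array_interface__['data'][0]

# The start addresses differ even though tail is a view of squid
print(squid_addr == tail_addr)
## False

# The gap is exactly one row plus two columns, measured in bytes
offset = tail_addr - squid_addr
print(offset == squid.strides[0] + 2 * squid.strides[1])
## True
```

So comparing start addresses alone would wrongly suggest that `tail` is unrelated to `squid`.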

A better technique is to use NumPy’s `shares_memory()` function, which is specifically designed to tell whether two arrays have overlapping memory. Here it is in action.

``````
print(np.shares_memory(squid, ward))
## True
print(np.shares_memory(squid, sponge))
## False
``````
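One thing worth knowing is that `shares_memory()` reports actual overlap, not whether two arrays came from the same parent. Here’s a small sketch; the names `left` and `right` are made up for illustration.

```python
import numpy as np

squid = np.arange(12).reshape(3, -1)

# Hypothetical names: two slice-only views covering different columns of squid
left = squid[:, :2]
right = squid[:, 2:]

# Each view overlaps with squid...
print(np.shares_memory(squid, left))
## True
print(np.shares_memory(squid, right))
## True

# ...but the two views have no elements in common with each other
print(np.shares_memory(left, right))
## False
```

In other words, two views of the same array can safely be modified independently as long as they don’t cover the same elements.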

In general, when you subset an array using nothing but slices, you get back a view of the original array. You can force NumPy to copy the data by appending `.copy()` to the indexing expression, like this.

``````
squid[:, :2].copy()
## array([[100,   1],
##        [  4,   5],
##        [  8,   9]])
``````
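Here’s a quick sketch of what `.copy()` buys you. The names `every_other` and `detached` are made up for illustration, and the slices use a step just to show that stepped slices behave the same way.

```python
import numpy as np

squid = np.arange(12).reshape(3, -1)

# Hypothetical names: the same slice-only selection, once as a view, once copied
every_other = squid[::2, 1:]
detached = squid[::2, 1:].copy()

print(np.shares_memory(squid, every_other))
## True
print(np.shares_memory(squid, detached))
## False

# Writing to the copy leaves squid alone
detached[0, 0] = -1
print(squid[0, 1])
## 1

# Writing to the view changes squid
every_other[0, 0] = -1
print(squid[0, 1])
## -1
```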

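And when you subset an array using at least one index array, NumPy automatically copies the data, so changes to the result won’t touch the original array. Keep in mind this applies to the array you get back: assigning directly through an index array, as in `squid[:, [0,1]] = 0`, still writes into `squid`.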
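Here’s a minimal sketch of the difference between reading with an index array and assigning through one. The name `picked` is made up for illustration.

```python
import numpy as np

squid = np.arange(12).reshape(3, -1)

# Reading with an index array gives back a copy (`picked` is a hypothetical name)
picked = squid[:, [0, 1]]
picked[0, 0] = 100
print(squid[0, 0])
## 0

# Assigning through an index array writes straight into squid
squid[:, [0, 1]] = -1
print(squid)
## [[-1 -1  2  3]
##  [-1 -1  6  7]
##  [-1 -1 10 11]]
```

So the copy protects `squid` only when you modify the result of the indexing, not when the indexing expression sits on the left side of an assignment.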