Python NumPy For Your Grandma - 5.1 Advanced Array Indexing
Earlier in the course we discussed array indexing techniques, but the truth is I glossed over a lot of gritty details and complex scenarios. In this section, we’ll take a deeper dive into how array indexing works.
Let’s start by setting up a 3x2x4 array of integers called foo.
import numpy as np
foo = np.arange(3*2*4).reshape((3,2,4))
print(foo)
## [[[ 0  1  2  3]
##   [ 4  5  6  7]]
##
##  [[ 8  9 10 11]
##   [12 13 14 15]]
##
##  [[16 17 18 19]
##   [20 21 22 23]]]
What do you think foo[:,:,0] will return? Recall the mental model for interpreting an N-dimensional array:
- If you have a 1d array, think of a single row of values.
- If you have a 2d array, think of a matrix.
- If you have a 3d array, think of a row of matrices.
- If you have a 4d array, think of a matrix of matrices, and so on.
In this case, we can think of
foo as a row of matrices where element (i,j,k) corresponds to the ith matrix, the jth row, and the kth column. So when we do
foo[:,:,0], we’re requesting “every matrix, every row, column 0”, and so we get back this sub-array.
foo[:,:,0]
## array([[ 0,  4],
##        [ 8, 12],
##        [16, 20]])
The thing that might be surprising in this case is that we started with a 3-dimensional array and we got back a 2-dimensional array. It kind of makes sense because we only picked out one column from
foo so there’s really no need for NumPy to retain that third dimension. But as soon as we select more than one column, for example like this
foo[:,:,[0,1]]
## array([[[ 0,  1],
##         [ 4,  5]],
##
##        [[ 8,  9],
##         [12, 13]],
##
##        [[16, 17],
##         [20, 21]]])
we’ll get back an array with three dimensions. I’ll explain why that is in a minute, but let’s look at a few more examples first.
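If you want to check this for yourself, comparing shapes directly is the quickest way. Here's a small sketch that rebuilds foo and looks at both cases; the length-one slice `0:1` is my own addition for contrast, not one of the examples above.

```python
import numpy as np

foo = np.arange(3*2*4).reshape((3,2,4))

scalar_pick = foo[:, :, 0]       # scalar index: that axis disappears
list_pick = foo[:, :, [0, 1]]    # list index: the axis stays, with length 2
slice_pick = foo[:, :, 0:1]      # length-one slice: the axis stays, with length 1

print(scalar_pick.shape)  ## (3, 2)
print(list_pick.shape)    ## (3, 2, 2)
print(slice_pick.shape)   ## (3, 2, 1)
```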
What do you think we’ll get back if we do a subset like
foo[[[0,2], [2,0], [1,1]], [[0,0], [0,0], [1,1]], [[0,1], [0,2], [0,3]]]? Did you come up with this output array?
foo[[[0,2], [2,0], [1,1]], [[0,0], [0,0], [1,1]], [[0,1], [0,2], [0,3]]]
## array([[ 0, 17],
##        [16,  2],
##        [12, 15]])
This is a really critical thing to understand. When every dimension is indexed with an array, and each of those arrays is the same shape, the output array will be the same shape as the index arrays.
In this case each of our index arrays is a 3x2 matrix, so we know our result will also be a 3x2 matrix. To understand the values in the result, you could imagine zipping our three index matrices into a matrix of 3-element tuples where each tuple gives the location of the corresponding output element.
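To make the "zipping" picture concrete, here's a rough sketch that builds the result one element at a time and compares it with what NumPy returns. The names i, j, k and manual are just labels I picked for illustration.

```python
import numpy as np

foo = np.arange(3*2*4).reshape((3,2,4))

# The three 3x2 index matrices from the example above
i = np.array([[0,2], [2,0], [1,1]])
j = np.array([[0,0], [0,0], [1,1]])
k = np.array([[0,1], [0,2], [0,3]])

result = foo[i, j, k]

# Build the same thing by hand: each output position takes its
# (i, j, k) location from the matching position in the index matrices
manual = np.empty(i.shape, dtype=foo.dtype)
for pos in np.ndindex(i.shape):
    manual[pos] = foo[i[pos], j[pos], k[pos]]

print(result)
## [[ 0 17]
##  [16  2]
##  [12 15]]
print(np.array_equal(result, manual))  ## True
```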
Just to drive this point home, take a look at this example, where we use 4-dimensional index arrays to extract values from our 3-dimensional array, foo, and we get back a 4-dimensional array.
idx = np.zeros(shape=(2,2,2,2), dtype='int64')
result = foo[idx, idx, idx]
result.shape
## (2, 2, 2, 2)
Again, the shape of the output is dependent on the shape of the index arrays, not the shape of the array you’re indexing into.
Okay, so we’ve considered the case where every dimension has an index array and each of the index arrays is the same shape. What if every dimension has an index array, but the index arrays have different shapes? For example, what do you think the output of foo[[0,1], [0,1], [[0],[1],[2]]] will be?
Did you get this?
foo[[0,1], [0,1], [[0],[1],[2]]]
## array([[ 0, 12],
##        [ 1, 13],
##        [ 2, 14]])
In cases like this, NumPy broadcasts the index arrays, so that in essence, we get back to the previous case where each index array is the same shape. In this case, we have a (2,) array, another (2,) array and a (3,1) array so those are going to broadcast into (3,2) arrays.
Indexing foo with equivalently shaped index arrays is straightforward based on what we already covered.
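You can ask NumPy to do that broadcasting explicitly with np.broadcast_arrays and confirm that nothing changes. This is a sketch, assuming the third index is the (3,1) column [[0],[1],[2]], which is what the shapes described above imply.

```python
import numpy as np

foo = np.arange(3*2*4).reshape((3,2,4))

# A (2,) array, another (2,) array and a (3,1) array
i, j, k = np.broadcast_arrays([0,1], [0,1], [[0],[1],[2]])

print(i.shape, j.shape, k.shape)  ## (3, 2) (3, 2) (3, 2)
print(k)
## [[0 0]
##  [1 1]
##  [2 2]]

# Indexing with the pre-broadcast arrays matches the mixed-shape version
broadcasted = foo[i, j, k]
direct = foo[[0,1], [0,1], [[0],[1],[2]]]
print(np.array_equal(broadcasted, direct))  ## True
```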
Lastly, we consider the case where our index contains slice indexers, i.e. colons. For example, what do you think the result of foo[[0,0,2,2],:,[[0],[1],[2]]] will be?
Did you come up with this 3x4x2 array?
foo[[0,0,2,2],:,[[0],[1],[2]]]
## array([[[ 0,  4],
##         [ 0,  4],
##         [16, 20],
##         [16, 20]],
##
##        [[ 1,  5],
##         [ 1,  5],
##         [17, 21],
##         [17, 21]],
##
##        [[ 2,  6],
##         [ 2,  6],
##         [18, 22],
##         [18, 22]]])
What’s going on here? The mental trick to figuring these things out is to build fully expanded, same-shaped index arrays for each dimension. We do that as follows:
1. broadcast all index arrays
2. for each slicer:
   2.1 copy each index array N times along a new last axis, where N equals the size of the current slicer’s dimension
   2.2 represent the current slicer with an index array
Alright, let’s walk through that last example. We start by broadcasting our indexing arrays. In this case we have a (4,) array in the i index and a (3,1) array in the k index. Those broadcast to the following (3,4) arrays
# i
# [[0, 0, 2, 2],
#  [0, 0, 2, 2],
#  [0, 0, 2, 2]]
# k
# [[0, 0, 0, 0],
#  [1, 1, 1, 1],
#  [2, 2, 2, 2]]
Now let’s make up an index array for j, the same shape as these guys and fill it with 0s.
# j
# [[0, 0, 0, 0],
#  [0, 0, 0, 0],
#  [0, 0, 0, 0]]
These index arrays almost give us what we want. The problem is that j is always picking out the first row of
foo, but we want it to span every row of
foo. In other words, given some (i,k) pair, we want to retrieve every possible j. And since
foo has 2 rows, you can imagine that we replace each 0 in j with an array of every possible row index. So in this case, we replace each 0 with the array [0,1]. And then we’ll need to expand i and k accordingly by copying each of their elements twice. So our expanded index arrays all have the shape (3,4,2) and indeed that’s the same shape as our output.
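Here's one way you might build those expanded index arrays in code and check them against the real thing. It's only a sketch of the mental model, not how NumPy does it internally, and i_full, j_full and k_full are names I made up.

```python
import numpy as np

foo = np.arange(3*2*4).reshape((3,2,4))

# Step 1: broadcast the index arrays for i and k to (3,4)
i, k = np.broadcast_arrays([0,0,2,2], [[0],[1],[2]])

# Step 2.1: copy each element along a new last axis, once per row of foo
n_rows = foo.shape[1]
i_full = np.repeat(i[..., np.newaxis], n_rows, axis=-1)
k_full = np.repeat(k[..., np.newaxis], n_rows, axis=-1)

# Step 2.2: represent the slicer with an index array holding every row index
j_full = np.broadcast_to(np.arange(n_rows), i_full.shape)

print(i_full.shape, j_full.shape, k_full.shape)  ## (3, 4, 2) (3, 4, 2) (3, 4, 2)

expanded = foo[i_full, j_full, k_full]
direct = foo[[0,0,2,2], :, [[0],[1],[2]]]
print(np.array_equal(expanded, direct))  ## True

# Caveat: with index arrays that sit next to each other, the sliced
# dimension stays in front instead of landing on a new last axis
print(foo[:, [0,1], [0,1]].shape)  ## (3, 2)
```

One thing to watch for: the "new last axis" picture matches NumPy in this example because the two index arrays are separated by the slice. When the index arrays are adjacent, as in the last line, the recipe still tells you which values you get, but it's worth checking the axis order with .shape.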
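Note that the slicer (i.e. the colon) we used in this example is a full slice, but we could use fancier slices like ::2, which gets every second element, or 2::-1, which gets the first three elements in backwards order. (Careful with :3:-1, which starts at the last element and walks backwards, stopping before index 3.) With these fancier slices, I hope it’s clear how you’d tweak our methodology to determine the resulting array: the index array that represents the slicer just holds the positions the slice visits.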
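If you're ever unsure which positions a slice picks out, you can try it on a small 1-d array, or turn the slice into an index array with the built-in slice.indices method. Below is a sketch of that idea; slice_to_index is a hypothetical helper I wrote for illustration, not a NumPy function.

```python
import numpy as np

foo = np.arange(3*2*4).reshape((3,2,4))
x = np.arange(10)

print(x[::2])    ## [0 2 4 6 8]
print(x[2::-1])  ## [2 1 0]
print(x[:3:-1])  ## [9 8 7 6 5 4]

def slice_to_index(s, axis_len):
    """Hypothetical helper: list the positions a slice visits on an axis."""
    return np.arange(*s.indices(axis_len))

# Represent the slicer ::2 on foo's last axis (length 4) as an index array
cols = slice_to_index(slice(None, None, 2), foo.shape[2])
print(cols)  ## [0 2]

same = np.array_equal(foo[:, :, ::2], foo[:, :, cols])
print(same)  ## True
```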
Another thing to note is that we can leave out trailing indexes in which case NumPy assumes we want all the values from those excluded dimensions. For example, if we do
foo[[0,1]] it’s like saying “give me the complete first and second matrices in
foo”, and it’d be equivalent to doing
foo[[0,1], :, :].
foo[[0,1]]
## array([[[ 0,  1,  2,  3],
##         [ 4,  5,  6,  7]],
##
##        [[ 8,  9, 10, 11],
##         [12, 13, 14, 15]]])
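A quick sanity check of that equivalence, as a sketch. The Ellipsis form at the end is an extra I'm adding for comparison; it isn't used elsewhere in this section.

```python
import numpy as np

foo = np.arange(3*2*4).reshape((3,2,4))

short = foo[[0,1]]          # trailing indexes left out
full = foo[[0,1], :, :]     # trailing indexes spelled out
dots = foo[[0,1], ...]      # "..." stands in for the remaining dimensions

print(short.shape)                  ## (2, 2, 4)
print(np.array_equal(short, full))  ## True
print(np.array_equal(short, dots))  ## True
```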
Now let’s circle back to an earlier question. When we do foo[:,:,0], why do we get back a 2-dimensional array when we started with a 3-dimensional array? Well, let’s follow our algorithm for determining the output shape. We start by broadcasting all the index arrays. In this case there’s only one, so we don’t have to do any broadcasting, but what’s the shape of a scalar like 0? It’s actually empty, i.e. 0-dimensional. You can see this by calling
np.array(0).shape
## ()
So we have a 0-dimensional index, which contributes no dimensions to the result. Then we concatenate the size of each dimension with a slice, so we end up with a 3x2 result. If we had wrapped that 0 in square brackets, it’d be a 1-dimensional array, and so our result would be 3-dimensional.
foo[:,:,[0]]
## array([[[ 0],
##         [ 4]],
##
##        [[ 8],
##         [12]],
##
##        [[16],
##         [20]]])
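To see the pattern, you can count dimensions as the index gets more nested. This is a small sketch; the doubly nested [[0]] case is my own extension of the example and the names are made up.

```python
import numpy as np

foo = np.arange(3*2*4).reshape((3,2,4))

# Each index below selects column 0; only its number of dimensions changes
index_ndims = [np.ndim(0), np.ndim([0]), np.ndim([[0]])]
print(index_ndims)  ## [0, 1, 2]

shapes = [foo[:, :, 0].shape, foo[:, :, [0]].shape, foo[:, :, [[0]]].shape]
print(shapes)  ## [(3, 2), (3, 2, 1), (3, 2, 1, 1)]

# The two sliced dimensions are always there; the index adds its own on top
result_ndims = [len(s) for s in shapes]
print(result_ndims)  ## [2, 3, 4]
```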