Python NumPy For Your Grandma - 4.6 Sorting

In this section, we’ll see how you can use NumPy’s sort() function to sort the elements of an array.

sort() takes three primary parameters:

  1. the array you want to sort
  2. the axis along which to sort - the default, -1, sorts along the last axis
  3. the kind of sort algorithm you want NumPy to use - the default is 'quicksort'

For example, here we make a 1d array, foo, and then sort it in ascending order.

import numpy as np

foo = np.array([1, 7, 3, 9, 0, 9, 1])
np.sort(foo)
## array([0, 1, 1, 3, 7, 9, 9])

Note that the original array remains unchanged.

print(foo)
## [1 7 3 9 0 9 1]

If you want to sort the values of foo “in place”, use the .sort() method of the array object. Calling foo.sort() rearranges foo itself, so afterwards foo holds its values in sorted order.

foo.sort()
print(foo)
## [0 1 1 3 7 9 9]
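
One thing that can trip people up here: because the .sort() method works in place, it doesn’t hand back a sorted array. The short sketch below (the variable name result is just for illustration) shows that the method returns None while foo itself gets sorted.

```python
import numpy as np

foo = np.array([1, 7, 3, 9, 0, 9, 1])

# The method sorts foo in place; its return value is None
result = foo.sort()
print(result)
## None
print(foo)
## [0 1 1 3 7 9 9]
```

So writing something like foo = foo.sort() would replace your array with None, which is probably not what you want.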

If you have an array with nan values like this one, sort() pushes them to the end of the array.

bar = np.array([5, np.nan, 3, 11])
np.sort(bar)
## array([ 3.,  5., 11., nan])

Unfortunately, np.sort() doesn’t have a parameter for sorting in descending order. However, with a bit of thought, we can cook something up. Two methods really stand out.

The first is to sort the array in ascending order and then reverse the result.

np.sort(bar)[::-1]
## array([nan, 11.,  5.,  3.])

The second is to negate the array’s values, sort those in ascending order, and then negate that result.

-np.sort(-bar)
## array([11.,  5.,  3., nan])

The main difference between these techniques is that the first method pushes nan values to the front of the array and the second method pushes nans to the back. Also, the second method won’t work on strings since you can’t negate a string.
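
To make the string case concrete, here’s a minimal sketch of the first method (sort, then reverse) applied to an array of strings. The array fruits is a made-up example, not something from earlier in this section.

```python
import numpy as np

# fruits is a hypothetical example array used only for illustration
fruits = np.array(["pear", "apple", "fig"])

# Sort ascending (alphabetically), then reverse to get descending order
fruits_desc = np.sort(fruits)[::-1]
print(fruits_desc)
## ['pear' 'fig' 'apple']
```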

What if you wanted to sort a multidimensional array like this?

boo = np.array([
    [55, 10, 12],
    [20, 0, 33],
    [55, 92, 3]
])

In this case, you can use the axis parameter of the sort() function to specify which axis to sort along. For example, if we do np.sort(boo, axis = 0), it sorts boo along axis 0. In other words, it sorts each column of boo.

np.sort(boo, axis = 0)
## array([[20,  0,  3],
##        [55, 10, 12],
##        [55, 92, 33]])

And if we do np.sort(boo, axis = 1), it sorts boo along axis 1. In other words, it sorts each row of boo.

np.sort(boo, axis = 1)
## array([[10, 12, 55],
##        [ 0, 20, 33],
##        [ 3, 55, 92]])

We can also set axis = -1 to tell NumPy to sort along the last axis of an array. Since boo is 2d, this is the same as sorting boo along axis 1.

np.sort(a = boo, axis = -1)
## array([[10, 12, 55],
##        [ 0, 20, 33],
##        [ 3, 55, 92]])

Cool, but what if we wanted to sort the rows of boo according to, say, the values in the first column? If we had something to give us the array [1, 0, 2], we could pop that into the row index and get back our desired sorted array.

boo[[1, 0, 2]]
## array([[20,  0, 33],
##        [55, 10, 12],
##        [55, 92,  3]])

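The tool we’re looking for is argsort(). argsort() works just like sort(), except instead of returning the sorted values, it returns an array of indices that would sort the array. In other words, the first index tells you where the smallest value lives in the original array, the second index tells you where the second smallest value lives, and so on.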

For example, if you had the array [3, 0, 10, 5] and you called argsort() on it, you’d get back the array [1, 0, 3, 2], because the smallest element is at index 1, the second smallest element is at index 0, and so on. If you used this array to index the original array, you’d get back a sorted array just as if you called np.sort().

goo = np.array([3, 0, 10, 5])  # [3, 0, 10, 5]
np.argsort(goo)                # [1, 0,  3,  2]
## array([1, 0, 3, 2])
goo[np.argsort(goo)]           # [0, 3,  5, 10]
## array([ 0,  3,  5, 10])

So, if you wanted to sort a 2d array’s rows based on a certain column, you just have to call argsort() on that column’s values and use the result to select rows from the original array.

Looking back at our 2d array, boo, we could sort its rows by the first column in ascending order like this.

boo[np.argsort(boo[:, 0])]
## array([[20,  0, 33],
##        [55, 10, 12],
##        [55, 92,  3]])
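
The same pattern combines with the reversing trick from earlier if you want the rows in descending order. As a rough sketch, here we sort the rows of boo by the second column (index 1), largest first. The variable name order is just for illustration.

```python
import numpy as np

boo = np.array([
    [55, 10, 12],
    [20, 0, 33],
    [55, 92, 3]
])

# Row indices that sort column 1 ascending, reversed to get descending order
order = np.argsort(boo[:, 1])[::-1]
print(order)
## [2 0 1]
print(boo[order])
## [[55 92  3]
##  [55 10 12]
##  [20  0 33]]
```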

This brings up an important question. If our array has repeated values, like the 55 in this case, how do we guarantee that sorting them won’t alter the order they appear in the original array? For example, these two matrices are both valid sorts of boo along its first column, but only the first matrix doesn’t mix up the order of the 55s.

print(boo[[1, 0, 2]])
## [[20  0 33]
##  [55 10 12]
##  [55 92  3]]
print(boo[[1, 2, 0]])
## [[20  0 33]
##  [55 92  3]
##  [55 10 12]]

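A sorting algorithm that keeps equal values in the same order they appear in the original array is known as a stable sorting algorithm. By default, np.sort() and np.argsort() aren’t guaranteed to use a stable sort algorithm, but you can force them to by setting kind='stable'.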
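
As a small sketch of how that looks in practice, here we sort the rows of boo by the first column again, this time asking for a stable sort so the two 55 rows keep their original relative order. The variable name order is just for illustration.

```python
import numpy as np

boo = np.array([
    [55, 10, 12],
    [20, 0, 33],
    [55, 92, 3]
])

# kind='stable' guarantees that tied values (the two 55s) keep their original order
order = np.argsort(boo[:, 0], kind='stable')
print(order)
## [1 0 2]
print(boo[order])
## [[20  0 33]
##  [55 10 12]
##  [55 92  3]]
```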

