Python NumPy For Your Grandma | Section 4.6 | Sorting

Course Contents

  1. Introduction
  2. NumPy Arrays
    2.1 What’s A NumPy Array
    2.2 Creating NumPy Arrays
    2.3 Indexing And Modifying 1-D Arrays
    2.4 Indexing And Modifying Multidimensional Arrays
    2.5 Basic Math
  3. Intermediate Array Stuff
    3.1 Broadcasting
    3.2 newaxis
    3.3 reshape
    3.4 boolean indexing
    3.5 nan
    3.6 infinity
    3.7 random
  4. Common Operations
    4.1 where
    4.2 Math Funcs
    4.3 all and any
    4.4 concatenate
    4.5 Stacking
    4.6 Sorting
    4.7 unique
  5. Challenges

This video covers how to sort a NumPy array using the sort() function, and how to use argsort() to sort one array by the values of another.

Code

import numpy as np


# sort a 1d array in ascending order
foo = np.array([1, 7, 3, 9, 0, 9, 1])
np.sort(foo)  # [0, 1, 1, 3, 7, 9, 9]

# If you have an array with nan values, sort pushes them to the end of the array
bar = np.array([5, np.nan, 3, 11])
np.sort(bar)  # [ 3.,  5., 11., nan]

# sort an array in descending order
np.sort(bar)[::-1]  # reverse the sorted array:                       [nan, 11.,  5.,  3.]
-np.sort(-bar)      # negate the sorted values of the negated array: [11.,  5.,  3., nan]

# If you need a stable sorting algorithm, set kind = 'stable'
np.sort(np.array([2, 1, 3, 2]), kind='stable')

# sort on a 2d array
boo = np.array([
    [10, 55, 12],
    [0, 81, 33],
    [92, 11, 3]
])
np.sort(a=boo, axis=0)   # sorts along axis 0 (the row axis), so each column ends up sorted
np.sort(a=boo, axis=1)   # sorts along axis 1 (the column axis), so each row ends up sorted
np.sort(a=boo, axis=-1)  # (default) sorts along the last axis (in this case, axis 1)

# sort the rows of boo based on the values in the 1st column,
# using a hand-written row order (argsort, below, computes this for us)
boo[np.array([1, 0, 2])]

# argsort() returns the indices that would sort the array
goo = np.array([3, 0, 10, 5])  # [3, 0, 10, 5]
np.argsort(goo)                # [1, 0,  3,  2]
np.sort(goo)                   # [0, 3,  5, 10]

# sort a 2d array's rows based on a certain column
boo[np.argsort(boo[:, 1])]   # sort by column 1 ascending
boo[np.argsort(-boo[:, -1])]  # sort by last column, descending

Transcript

You can use numpy’s sort() function to return a sorted copy of an array.
sort() takes three primary parameters:

  • the array you want to sort
  • the axis along which to sort. The default, -1, sorts along the last axis
  • and the kind of sort you want numpy to use. By default, numpy uses quicksort

For example, here we sort a 1d array in ascending order. (Note that the original array remains unchanged.) If you have an array with nan values, sort pushes them to the end of the array.
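As a quick check of the points above, here is a minimal sketch using the same foo and bar arrays as the Code section. The names sorted_foo and sorted_bar are only illustrative.

```python
import numpy as np

foo = np.array([1, 7, 3, 9, 0, 9, 1])
sorted_foo = np.sort(foo)  # returns a new, sorted array

print(sorted_foo.tolist())  # [0, 1, 1, 3, 7, 9, 9]
print(foo.tolist())         # [1, 7, 3, 9, 0, 9, 1]  (the original is unchanged)

# nan values are placed at the end of the sorted result
bar = np.array([5, np.nan, 3, 11])
sorted_bar = np.sort(bar)   # 3, 5, 11, nan
```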

Unfortunately numpy doesn’t have an easy, direct way of sorting arrays in descending order. However, with a bit of thought, we can cook something up. Two methods really stand out.

  1. The first is to sort the array in ascending order and then reverse the result.
  2. The second is to negate the array’s values, sort those in ascending order, and then negate the result.

The main difference between these techniques is that the 1st method pushes nan values to the front of the array and the 2nd method pushes nans to the back. Also, the second method won’t work on strings since you can’t negate a string.
Let’s see an example.
Following the 1st technique, we sort bar in ascending order, and then use slicing to reverse the output. Following the 2nd technique, we negate bar, sort the negated array, and then negate the result.
Notice how the resulting arrays are almost, but not quite, the same: they differ only in where the nan ends up.
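The two descending techniques can be sketched side by side as follows. The names desc_reversed, desc_negated and the names array are made up for illustration; they are not part of the video.

```python
import numpy as np

bar = np.array([5, np.nan, 3, 11])

# Technique 1: sort ascending, then reverse. nan lands at the front.
desc_reversed = np.sort(bar)[::-1]  # nan, 11, 5, 3

# Technique 2: negate, sort ascending, negate again. nan lands at the back.
desc_negated = -np.sort(-bar)       # 11, 5, 3, nan

# Strings can't be negated, so for them only technique 1 applies.
names = np.array(["pear", "apple", "fig"])  # hypothetical example data
desc_names = np.sort(names)[::-1]           # 'pear', 'fig', 'apple'
```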

It’s really important to understand that numpy’s default sorting algorithm, quicksort, is unstable. When you sort an array with repeated values, they’ll always end up next to each other but they might be in a different order than the original array.
If you need a stable sorting algorithm, set kind='stable'. Depending on the data type, numpy will use either timsort or radix sort, but the important thing is that your data will be stably sorted.
Let’s see an example.
If you run np.sort() on this 4-element array, you’ll get back a sorted copy of it. But, it’s possible that the 1st 2 in the input is not the 1st 2 in the output.
If you run np.sort() with kind='stable', you’re guaranteed to get back a sorted array where the order of repeated elements matches the input.
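With plain integers you can't tell two equal 2s apart in the sorted output, so stability is easier to see through argsort() (introduced later in this section), which reports the original index of each sorted element. A small sketch, where arr and stable_order are illustrative names:

```python
import numpy as np

arr = np.array([2, 1, 3, 2])

# Indices of the elements in sorted order, using a stable algorithm.
stable_order = np.argsort(arr, kind="stable")

print(stable_order.tolist())       # [1, 0, 3, 2]
print(arr[stable_order].tolist())  # [1, 2, 2, 3]
```

The 2 at index 0 is listed before the 2 at index 3, matching their order in the input. A stable sort guarantees this; the default sort does not.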

What if you wanted to sort a multidimensional array? In such cases, you can use the axis parameter of the sort() function to specify which axis to sort along. For example, if we have this 2d array called boo, we can

  • sort it along axis 0, which sorts the values within each column of boo
  • or sort it along axis 1, which sorts the values within each row of boo

You can also set axis = -1 to sort the last axis of an array. In this case, it’d be like sorting boo along axis 1.
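Here is what the axis argument does to boo, as a short sketch. The names col_sorted and row_sorted are only illustrative.

```python
import numpy as np

boo = np.array([
    [10, 55, 12],
    [0, 81, 33],
    [92, 11, 3]
])

# axis=0: each column is sorted independently, top to bottom
col_sorted = np.sort(boo, axis=0)
# [[ 0, 11,  3],
#  [10, 55, 12],
#  [92, 81, 33]]

# axis=1: each row is sorted independently, left to right
row_sorted = np.sort(boo, axis=1)
# [[10, 12, 55],
#  [ 0, 33, 81],
#  [ 3, 11, 92]]
```

Note that sorting along an axis rearranges values within each column or row separately, so rows are not kept together.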

Cool, but what if we wanted to sort the rows of boo according to, say, the values in the 1st column? If we had something to give us the array [1, 0, 2], we could pop that into the row index and get back our desired sorted array.

The tool we’re looking for is argsort(). argsort() works just like sort(), except it returns the indices that would sort the array: the entry at position i of the result is the index, in the original array, of the element that belongs at position i in the sorted array.
For example, if you had the array [3, 0, 10, 5] and you called argsort on it, you’d get back the array [1, 0, 3, 2]. These indices tell you which element of the original array to take for each position in the sorted case: index 1 (the value 0) first, then index 0 (the value 3), then index 3 (the value 5), then index 2 (the value 10). If you used this array as indices to the original array, you’d get back a sorted array just as if you called np.sort().
So, if you wanted to sort a 2d array’s rows based on a certain column, you just have to call argsort() on that column’s values and use the result to select rows from the original array.
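For the 1-D case described above, a minimal sketch of how argsort() relates to sort(). The name order is just an illustrative variable.

```python
import numpy as np

goo = np.array([3, 0, 10, 5])

order = np.argsort(goo)      # indices that would sort goo
print(order.tolist())        # [1, 0, 3, 2]

# order[0] is the index of the smallest value, order[-1] of the largest
print(goo[order[0]], goo[order[-1]])  # 0 10

# Indexing with order gives the same result as np.sort(goo)
print(goo[order].tolist())   # [0, 3, 5, 10]
```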
Looking back at our 2d array, boo, we can do things like

  • sort its rows by the 2nd column, ascending
  • or sort its rows by the last column, descending
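Both row sorts from the list above can be sketched like this. The names by_col1 and by_last_desc are illustrative only.

```python
import numpy as np

boo = np.array([
    [10, 55, 12],
    [0, 81, 33],
    [92, 11, 3]
])

# Sort rows by the 2nd column (index 1), ascending.
# boo[:, 1] is [55, 81, 11], so argsort gives [2, 0, 1].
by_col1 = boo[np.argsort(boo[:, 1])]
# [[92, 11,  3],
#  [10, 55, 12],
#  [ 0, 81, 33]]

# Sort rows by the last column, descending, by negating the column first.
# -boo[:, -1] is [-12, -33, -3], so argsort gives [1, 0, 2].
by_last_desc = boo[np.argsort(-boo[:, -1])]
# [[ 0, 81, 33],
#  [10, 55, 12],
#  [92, 11,  3]]
```

Unlike np.sort() with an axis argument, this keeps each row intact and only changes the order of the rows.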