Python NumPy For Your Grandma - 3.7 random

In this section, we’ll see how you can use NumPy’s random module to shuffle arrays, sample values from arrays, and draw values from a host of probability distributions. Then we’ll see why everything I’m about to show you is considered legacy, and how to update it to modern standards.

Let’s see an example of how you might simulate rolling a 6-sided die 3 times. In other words, we want to draw three integers from the range 1 to 6, with replacement. For this we can use the randint() function from NumPy’s random module.

import numpy as np

np.random.randint(low=1, high=7, size=3)
## array([6, 3, 1])

If you try running this on your machine, you’ll probably get something different. However, we can get reproducible results by setting a random number seed immediately before we generate random numbers. To set a seed, use the seed() function with your favorite value passed in. Try this example on your machine and you should get the same result.

np.random.seed(123)
np.random.randint(low=1, high=7, size=3)
## array([6, 3, 5])
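
To convince yourself that the seed really pins down the sequence, here is a small sketch that seeds twice with the same value and compares the draws. The variable names (first, second, third) are just illustrative.

```python
import numpy as np

# Re-seeding with the same value restarts the sequence, so both draws match.
np.random.seed(123)
first = np.random.randint(low=1, high=7, size=3)

np.random.seed(123)
second = np.random.randint(low=1, high=7, size=3)

print(np.array_equal(first, second))
## True

# Without re-seeding, the generator simply continues through its sequence,
# so this draw is not tied to the two above.
third = np.random.randint(low=1, high=7, size=3)
```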

Now, what if we wanted to draw three values between 1 and 6 without replacement? For this we can use the choice() function, giving it

  • a 1d array of values to choose from
  • the number of samples we want to draw
  • whether values should be replaced, which is True by default
  • and a 1d array of probabilities corresponding to our 1d array of options, which by default gives equal probability to each option.

choice() is like a generalized version of randint(). Let’s see some examples.

First we’ll draw 3 ints between 1 and 6 without replacement.

np.random.seed(2357)
np.random.choice(
    a = np.arange(1, 7),
    size = 3,
    replace = False,
    p = None
)
## array([6, 5, 1])

Next we’ll do the same thing, but we’ll give a probability to each element.

np.random.choice(
    a = np.arange(1, 7),
    size = 3,
    replace = False,
    p = np.array([0.1, 0.1, 0.1, 0.1, 0.3, 0.3])
)
## array([5, 2, 6])

Lastly, we’ll draw 3 elements from an array of strings.

np.random.choice(
    a = np.array(['you', 'can', 'use', 'strings', 'too']),
    size = 3,
    replace = False,
    p = None
)
## array(['use', 'you', 'can'], dtype='<U7')
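
One detail worth checking for yourself is how the replace argument behaves. The sketch below (variable names are my own) shows that choice() samples with replacement unless you say otherwise, and that sampling without replacement can't return more values than you gave it.

```python
import numpy as np

options = np.arange(1, 7)

# With the default replace=True, repeats are allowed,
# so size can be larger than the number of options.
with_repeats = np.random.choice(a=options, size=10)

# With replace=False, every drawn value is distinct.
no_repeats = np.random.choice(a=options, size=6, replace=False)
print(len(np.unique(no_repeats)))
## 6

# Asking for 7 distinct values from 6 options raises a ValueError.
try:
    np.random.choice(a=options, size=7, replace=False)
    too_many_failed = False
except ValueError:
    too_many_failed = True
print(too_many_failed)
## True
```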

Now, what if you wanted to sample the rows from a 5x2 array like this one?

foo = np.array([
    [1, 2],
    [3, 4],
    [5, 6],
    [7, 8],
    [9, 10]
])

You can use randint() and choice() for that too. The trick is to generate a random 1d array of row indices and use that result to select rows from the array you wanted to sample.

For example, we can use randint() to sample three rows from foo with replacement.

np.random.seed(1234)
rand_rows = np.random.randint(
    low=0,
    high=foo.shape[0],
    size=3
)
print(rand_rows)
## [3 4 4]
foo[rand_rows]
## array([[ 7,  8],
##        [ 9, 10],
##        [ 9, 10]])

And we can use choice() to sample three rows from foo without replacement.

np.random.seed(1234)
rand_rows = np.random.choice(
    a=np.arange(start=0, stop=foo.shape[0]),
    replace=False,
    size=3
)
print(rand_rows)
## [4 0 1]
foo[rand_rows]
## array([[ 9, 10],
##        [ 1,  2],
##        [ 3,  4]])

You can also use this technique to shuffle an array, but NumPy makes this even easier with a function called permutation(). For example, if we call np.random.permutation(foo), it’ll randomly shuffle the rows of foo.

np.random.permutation(foo)
## array([[ 7,  8],
##        [ 5,  6],
##        [ 3,  4],
##        [ 1,  2],
##        [ 9, 10]])

Unfortunately though, permutation() only shuffles the data along its first axis, so calling it directly on foo reorders the rows, never the columns.
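
If you do need shuffled columns with the legacy functions, one workaround is to reuse the index trick from earlier: shuffle the column indices, then select columns in that order. This is just a sketch of that idea, not something permutation() does for you.

```python
import numpy as np

foo = np.array([
    [1, 2],
    [3, 4],
    [5, 6],
    [7, 8],
    [9, 10]
])

# Shuffle the column indices, then use them to reorder the columns.
col_order = np.random.permutation(foo.shape[1])
shuffled_cols = foo[:, col_order]

# Alternative: shuffle the rows of the transpose, then transpose back.
shuffled_cols_t = np.random.permutation(foo.T).T
```

Either way, every row keeps its own values; only the column order changes.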

We can also sample values from a variety of probability distributions. For example, if we wanted to sample 4 values from the uniform distribution between 1 and 2 to populate a 2x2 array, we can do that with

np.random.uniform(low = 1.0, high = 2.0, size = (2, 2))
## array([[1.86066977, 1.15063697],
##        [1.19851876, 1.81516293]])

Or we could sample two values from a standard normal distribution.

np.random.normal(loc = 0.0, scale = 1.0, size = 2)
## array([-0.00867858, -0.32106129])

Or we could build a 3x2 array with random binomial values.

np.random.binomial(n = 10, p = 0.25, size = (3, 2))
## array([[2, 4],
##        [1, 0],
##        [2, 0]])

There’s a whole bunch of other distributions supported by NumPy, so you can sample whatever your heart desires.
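
As a quick taste, here are two more distributions from the same module, Poisson and exponential. The parameter values are arbitrary picks for illustration.

```python
import numpy as np

# Event counts per interval, averaging 3 events per interval.
poisson_draws = np.random.poisson(lam=3.0, size=5)

# Waiting times with a mean of 2.0, arranged in a 2x2 array.
exponential_draws = np.random.exponential(scale=2.0, size=(2, 2))
```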

Okay, so now let’s see why everything I just showed you is considered legacy…

Let’s suppose we’re using NumPy version 1.1 and the current random number generator is ABC1. So, when you do something like

np.random.seed(123)
np.random.randint(3, size=3)

under the hood, the random number generator ABC1 is responsible for making sure you get back a statistically valid sequence of random integers and anyone else using NumPy who writes the same exact code gets back the same exact sequence.

Then somebody discovers a new random number generator, DEF2, actually does a better job of creating statistically valid random numbers. So, NumPy decides to replace the ABC1 generator with the DEF2 generator. So when you upgrade to NumPy version 1.2 and run the same exact code as before, you get a different sequence of random numbers.

This is an issue because one - some people have code or documentation that might break if the random numbers they were generating suddenly change, and two - if people can’t share a reproducible example because they’re on different versions of NumPy, that’s really inconvenient. I mean, think of all the old examples on Stack Overflow that would instantly become non-reproducible because NumPy updated their random number generator.

Another possibility is that someone comes along and creates a new random number generator, GHI3, that’s way faster than DEF2 but slightly less statistically valid. Now NumPy has this problem of deciding whether to use the fast generator or the more accurate generator.

The solution to this was to create a generic Generator class that lets you pick the random number generator you want to use. For the sake of simplicity, I’m just going to use NumPy’s default_rng() function which, at the moment, selects the PCG64 random number generator. In fact, when you look at the documentation for functions like randint(), that’s what NumPy suggests.

So, all we have to do is say generator = np.random.default_rng(), and we have the option to pass in a seed like 123.

generator = np.random.default_rng(seed=123)
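
If you’d rather pick the underlying random number generator yourself, you can wrap one of NumPy’s bit generators in a Generator directly. Here’s a minimal sketch using PCG64 and MT19937; the variable names are my own, and I’m not printing any draws because they depend on which bit generator you choose.

```python
import numpy as np

# Pick the bit generator explicitly and wrap it in a Generator.
pcg_gen = np.random.Generator(np.random.PCG64(123))
mt_gen = np.random.Generator(np.random.MT19937(123))

# Both objects expose the same methods; only the underlying algorithm differs.
pcg_rolls = pcg_gen.integers(low=1, high=7, size=3)
mt_rolls = mt_gen.integers(low=1, high=7, size=3)

# default_rng() returns the same kind of object, built around its default.
default_gen = np.random.default_rng(123)
print(type(default_gen.bit_generator).__name__)
```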

Now let’s see how we can reconstruct some of the examples from earlier. So, if I want to sample three random integers between 1 and 6, instead of using np.random.randint() I can use generator.integers(). Just like randint(), the high value is exclusive.

generator.integers(low=1, high=7, size=3)
## array([1, 5, 4])
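
One nicety of integers() is the endpoint argument, which lets you make the upper bound inclusive. Here’s a small sketch that checks the range both ways; the sample size of 1000 is an arbitrary choice.

```python
import numpy as np

generator = np.random.default_rng(seed=123)

# high is exclusive by default, so high=7 yields values 1 through 6.
rolls = generator.integers(low=1, high=7, size=1000)
print(rolls.min() >= 1, rolls.max() <= 6)
## True True

# endpoint=True makes high inclusive, so high=6 describes the same die.
rolls_inclusive = generator.integers(low=1, high=6, size=1000, endpoint=True)
```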

If I want to choose three values between 0 and 9 with replacement, instead of using np.random.choice() I can use generator.choice().

generator.choice(a=10, size=3, replace=True)
## array([0, 9, 2])

If I want to permute the rows of foo, instead of doing np.random.permutation(foo) I can do generator.permutation(foo), and this time I actually get an axis argument, so if I wanted to, I could permute the columns of foo by setting axis=1.

generator.permutation(foo, axis=1)
## array([[ 1,  2],
##        [ 3,  4],
##        [ 5,  6],
##        [ 7,  8],
##        [ 9, 10]])

With only two columns there are just two possible orderings, so the shuffle leaves foo unchanged about half the time, which is what happened here.

And all the distribution sampling methods we covered like uniform, normal, and binomial are implemented as generator methods as well.

generator.uniform(low = 1.0, high = 2.0, size = (2, 2))
## array([[1.1759059 , 1.81209451],
##        [1.923345  , 1.2765744 ]])
generator.normal(loc = 0.0, scale = 1.0, size = 2)
## array([-0.31659545, -0.32238912])
generator.binomial(n = 10, p = 0.25, size = (3, 2))
## array([[2, 2],
##        [4, 1],
##        [3, 3]])
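
The row-sampling trick from earlier gets simpler too, because generator.choice() accepts a multidimensional array along with an axis argument. A minimal sketch, assuming the same foo as before:

```python
import numpy as np

foo = np.array([
    [1, 2],
    [3, 4],
    [5, 6],
    [7, 8],
    [9, 10]
])
generator = np.random.default_rng(seed=123)

# Sample three distinct rows directly; axis=0 means "choose along the rows".
sampled_rows = generator.choice(a=foo, size=3, replace=False, axis=0)
print(sampled_rows.shape)
## (3, 2)
```

No intermediate array of row indices is needed here, though the index approach still works if you want to keep track of which rows were picked.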
