# Python NumPy For Your Grandma - 3.7 random

In this section, we’ll see how you can use NumPy’s random module to shuffle arrays, sample values from arrays, and draw values from a host of probability distributions. Then we’ll see why all of those functions are deprecated, and how to update your code to modern standards.

Let’s see an example of how you might simulate rolling a 6-sided die 3 times. In other words, we want to draw three integers from the range 1 to 6, with replacement. For this we can use the `randint()` function from NumPy’s random module.

``````python
import numpy as np

np.random.randint(low=1, high=7, size=3)
## array([6, 3, 1])
``````

If you try running this on your machine, you’ll probably get something different. However, we can get reproducible results by setting a random number seed immediately before we generate random numbers. To set a seed, use the `seed()` function with your favorite value passed in. Try this example on your machine and you should get the same result.

``````python
np.random.seed(123)
np.random.randint(low=1, high=7, size=3)
## array([6, 3, 5])
``````
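To make the effect of the seed concrete, here’s a minimal sketch that seeds twice with the same value and checks that both draws match. The variable names are just for illustration.

```python
import numpy as np

# Re-seeding with the same value restarts the sequence, so both draws match.
np.random.seed(123)
first_rolls = np.random.randint(low=1, high=7, size=3)

np.random.seed(123)
second_rolls = np.random.randint(low=1, high=7, size=3)

print(np.array_equal(first_rolls, second_rolls))
## True

# Without re-seeding, the next call simply continues the sequence,
# so it isn't guaranteed to repeat the earlier rolls.
third_rolls = np.random.randint(low=1, high=7, size=3)
```

This is why the seed should be set immediately before the code whose output you want to reproduce.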

Now, what if we wanted to draw three values between 1 and 6 without replacement? For this we can use the `choice()` function, giving it

• a 1d array of values to choose from
• the number of samples we want to draw
• whether values should be replaced, which is `True` by default (so we pass `replace=False` to sample without replacement)
• and a 1d array of probabilities corresponding to our 1d array of options, which by default gives equal probability to each option.

`choice()` is like a generalized version of `randint()`. Let’s see some examples.

First we’ll draw 3 ints between 1 and 6 without replacement.

``````python
np.random.seed(2357)
np.random.choice(
    a=np.arange(1, 7),
    size=3,
    replace=False,
    p=None
)
## array([6, 5, 1])
``````

Next we’ll do the same thing, but we’ll give a probability to each element.

``````python
np.random.choice(
    a=np.arange(1, 7),
    size=3,
    replace=False,
    p=np.array([0.1, 0.1, 0.1, 0.1, 0.3, 0.3])
)
## array([5, 2, 6])
``````
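To see what the `p` argument is doing, here’s a minimal sketch that draws many values with replacement and compares the observed frequencies to the probabilities we passed in. The sample size of 100,000 and the seed are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(2357)

options = np.arange(1, 7)
probs = np.array([0.1, 0.1, 0.1, 0.1, 0.3, 0.3])

# Draw with replacement so every draw follows `probs` independently.
draws = np.random.choice(a=options, size=100_000, replace=True, p=probs)

# Count how often each value 1..6 appeared, then convert counts to frequencies.
counts = np.bincount(draws, minlength=7)[1:]
observed = counts / draws.size

print(np.round(observed, 2))
```

The printed frequencies should land close to `probs`, with 5 and 6 showing up about three times as often as the other values.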

Lastly, we’ll draw 3 elements from an array of strings.

``````python
np.random.choice(
    a=np.array(['you', 'can', 'use', 'strings', 'too']),
    size=3,
    replace=False,
    p=None
)
## array(['use', 'you', 'can'], dtype='<U7')
``````
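One detail worth checking for yourself is the `replace` argument. In `np.random.choice()` it defaults to `True`, which is why the examples here pass `replace=False` explicitly. A small sketch of the difference, with illustrative variable names:

```python
import numpy as np

np.random.seed(2357)
die_faces = np.arange(1, 7)

# replace defaults to True, so we can draw more values than there are options.
ten_rolls = np.random.choice(a=die_faces, size=10)
print(ten_rolls.shape)
## (10,)

# With replace=False, drawing 10 values from only 6 options is impossible,
# so NumPy raises a ValueError.
try:
    np.random.choice(a=die_faces, size=10, replace=False)
    oversample_failed = False
except ValueError:
    oversample_failed = True

print(oversample_failed)
## True
```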

Now, what if you wanted to sample the rows from a 5x2 array like this one?

``````python
foo = np.array([
    [1, 2],
    [3, 4],
    [5, 6],
    [7, 8],
    [9, 10]
])
``````

You can use `randint()` and `choice()` for that too. The trick is to generate a random 1d array of row indices and use that result to select rows from the array you wanted to sample.

For example, we can use `randint()` to sample three rows from `foo` with replacement.

``````python
np.random.seed(1234)
rand_rows = np.random.randint(
    low=0,
    high=foo.shape[0],  # number of rows in foo
    size=3
)
print(rand_rows)
## [3 4 4]
foo[rand_rows]
## array([[ 7,  8],
##        [ 9, 10],
##        [ 9, 10]])
``````

And we can use `choice()` to sample three rows from `foo` without replacement.

``````python
np.random.seed(1234)
rand_rows = np.random.choice(
    a=np.arange(start=0, stop=foo.shape[0]),  # row indices 0..4
    replace=False,
    size=3
)
print(rand_rows)
## [4 0 1]
foo[rand_rows]
## array([[ 9, 10],
##        [ 1,  2],
##        [ 3,  4]])
``````

You can also use this technique to shuffle an array, but NumPy makes this even easier with a function called `permutation()`. For example, if we call `np.random.permutation(foo)`, it’ll randomly shuffle the rows of `foo`.

``````python
np.random.permutation(foo)
## array([[ 7,  8],
##        [ 5,  6],
##        [ 3,  4],
##        [ 1,  2],
##        [ 9, 10]])
``````

Unfortunately though, `permutation()` only shuffles the data along its first axis, so we can’t shuffle the columns of `foo` - only the rows.
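That said, the index trick from earlier offers a workaround. Here’s a minimal sketch, assuming you’re fine with building a shuffled array of column indices yourself (`col_order` and `foo_shuffled_cols` are just illustrative names).

```python
import numpy as np

foo = np.array([
    [1, 2],
    [3, 4],
    [5, 6],
    [7, 8],
    [9, 10]
])

np.random.seed(1234)

# Passing an integer n to permutation() returns a shuffled copy of np.arange(n).
col_order = np.random.permutation(foo.shape[1])

# Use the shuffled indices to reorder the columns; every row gets the same order.
foo_shuffled_cols = foo[:, col_order]
print(foo_shuffled_cols.shape)
## (5, 2)
```

Since `foo` only has two columns, the result is either the original column order or the two columns swapped.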

We can also sample values from a variety of probability distributions. For example, if we wanted to sample 4 values from the uniform distribution between 1 and 2 to populate a 2x2 array, we can do that with

``````python
np.random.uniform(low = 1.0, high = 2.0, size = (2, 2))
## array([[1.86066977, 1.15063697],
##        [1.19851876, 1.81516293]])
``````

Or we could sample two values from a standard normal distribution.

``````python
np.random.normal(loc = 0.0, scale = 1.0, size = 2)
## array([-0.00867858, -0.32106129])
``````

Or we could build a 3x2 array with random binomial values.

``````python
np.random.binomial(n = 10, p = 0.25, size = (3, 2))
## array([[2, 4],
##        [1, 0],
##        [2, 0]])
``````

There’s a whole bunch of other distributions supported by NumPy, so you can sample whatever your heart desires.
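For instance, here’s a small sketch using two other distributions, Poisson and exponential. The parameter values are arbitrary and chosen only for illustration.

```python
import numpy as np

np.random.seed(123)

# Five draws from a Poisson distribution with mean 3
poisson_draws = np.random.poisson(lam=3.0, size=5)

# A 2x3 array of draws from an exponential distribution with mean 2
exponential_draws = np.random.exponential(scale=2.0, size=(2, 3))

print(poisson_draws.shape, exponential_draws.shape)
## (5,) (2, 3)
```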

Okay, so now let’s see why everything I just showed you is deprecated…

Let’s suppose we’re using NumPy version 1.1 and the current random number generator is ABC1. So, when you do something like

``````python
np.random.seed(123)
np.random.randint(3, size=3)
``````

under the hood, the random number generator ABC1 is responsible for making sure you get back a statistically valid sequence of random integers and anyone else using NumPy who writes the same exact code gets back the same exact sequence.

Then somebody discovers that a new random number generator, DEF2, actually does a better job of creating statistically valid random numbers. So, NumPy decides to replace the ABC1 generator with the DEF2 generator. Now when you upgrade to NumPy version 1.2 and run the same exact code as before, you get a different sequence of random numbers.

This is an issue for two reasons. First, some people have code or documentation that might break if the random numbers they generate suddenly change. Second, if people can’t share a reproducible example because they’re on different versions of NumPy, that’s really inconvenient. I mean, think of all the old examples on Stack Overflow that would instantly become non-reproducible because NumPy updated their random number generator.

Another possibility is that someone comes along and creates a new random number generator, GHI3, that’s way faster than DEF2 but slightly less statistically valid. Now NumPy has this problem of deciding whether to use the fast generator or the more accurate generator.

The solution to this was to create a generic `Generator` class that lets you pick the random number generator you want to use. For the sake of simplicity, I’m just going to use NumPy’s `default_rng()` function which, at the moment, selects the PCG64 random number generator. And in fact, when you look at the documentation for functions like `randint()`, that’s what NumPy suggests.

So, all we have to do is say `generator = np.random.default_rng()`, and we have the option to pass in a seed like 123.

``````python
generator = np.random.default_rng(seed=123)
``````
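If you’d rather pick the underlying generator yourself, you can wrap a bit generator in `Generator` directly. The sketch below assumes the `PCG64` and `MT19937` bit generators from `numpy.random`; the seed and variable names are arbitrary.

```python
import numpy as np
from numpy.random import Generator, PCG64, MT19937

# Two generators with the same seed but different underlying algorithms.
pcg_generator = Generator(PCG64(seed=123))
mt_generator = Generator(MT19937(seed=123))

pcg_rolls = pcg_generator.integers(low=1, high=7, size=3)
mt_rolls = mt_generator.integers(low=1, high=7, size=3)

# Same bit generator + same seed gives the same sequence again.
pcg_again = Generator(PCG64(seed=123)).integers(low=1, high=7, size=3)
print(np.array_equal(pcg_rolls, pcg_again))
## True
```

Because the algorithm is chosen explicitly, code written this way keeps working on the generator you picked, whatever `default_rng()` selects.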

Now let’s see how we can reconstruct some of the examples from earlier. So, if I want to sample three random integers between 1 and 6, instead of using `np.random.randint()` I can use `generator.integers()`. Just like `randint()`, the `high` value is excluded by default.

``````python
generator.integers(low=1, high=7, size=3)
## array([1, 5, 4])
``````

If I want to choose three values between 0 and 9 with replacement, instead of using `np.random.choice()` I can use `generator.choice()`. Passing an integer like `a=10` is shorthand for choosing from `np.arange(10)`.

``````python
generator.choice(a=10, size=3, replace=True)
## array([0, 9, 2])
``````

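If I want to permute the rows of `foo`, instead of doing `np.random.permutation(foo)` I can do `generator.permutation(foo)`, and this time I actually get an `axis` argument, so if I wanted to, I could permute the columns of `foo` by setting `axis=1`. Keep in mind that `foo` only has two columns, so there’s a 50% chance the columns come back in their original order, which is what happened in the output below.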

``````python
generator.permutation(foo, axis=1)
## array([[ 1,  2],
##        [ 3,  4],
##        [ 5,  6],
##        [ 7,  8],
##        [ 9, 10]])
``````
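With only two columns it’s hard to see the shuffle, so here’s a sketch on a wider array. The array `wide` is a made-up example, not part of the walkthrough above.

```python
import numpy as np

generator = np.random.default_rng(seed=123)

# A 3x4 array, so there are 24 possible column orders instead of 2.
wide = np.arange(12).reshape(3, 4)

# axis=1 shuffles whole columns: every row gets the same new column order.
shuffled = generator.permutation(wide, axis=1)
print(shuffled.shape)
## (3, 4)
```

Each column of `shuffled` is still one of the original columns of `wide`; only their order changes.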

And all the distribution sampling methods we covered like uniform, normal, and binomial are implemented as generator methods as well.

``````python
generator.uniform(low = 1.0, high = 2.0, size = (2, 2))
## array([[1.1759059 , 1.81209451],
##        [1.923345  , 1.2765744 ]])
generator.normal(loc = 0.0, scale = 1.0, size = 2)
## array([-0.31659545, -0.32238912])
generator.binomial(n = 10, p = 0.25, size = (3, 2))
## array([[2, 2],
##        [4, 1],
##        [3, 3]])
``````
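The row-sampling examples from earlier carry over too. Here’s a minimal sketch of two ways to sample three rows of `foo` without replacement: the index trick, and `generator.choice()` applied to the 2d array directly with its `axis` argument. The variable names are illustrative.

```python
import numpy as np

foo = np.array([
    [1, 2],
    [3, 4],
    [5, 6],
    [7, 8],
    [9, 10]
])

generator = np.random.default_rng(seed=123)

# Option 1: sample row indices, then use them to select rows.
rand_rows = generator.choice(a=foo.shape[0], size=3, replace=False)
rows_by_index = foo[rand_rows]

# Option 2: let choice() sample along axis 0 of the 2d array itself.
rows_direct = generator.choice(a=foo, size=3, replace=False, axis=0)

print(rows_by_index.shape, rows_direct.shape)
## (3, 2) (3, 2)
```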