# Python Pandas For Your Grandpa | Section 2.9 | Series Missing Values

# Course Contents

- Introduction
- Series

2.1 Series Creation

2.2 Series Basic Operations

2.3 Series Basic Indexing

2.4 Series Overwriting Data

2.5 Series Apply

2.6 Series Concatenation

2.7 Series Boolean Indexing

2.8 Series View Vs Copy

**2.9 Series Missing Values**

2.10 Series Challenges

```
import numpy as np
import pandas as pd
```

One of the fundamental features of pandas is its ability to represent missing or invalid data using NaN.

This is an interesting lecture for me because I basically spent months putting this course together and right before I published my first lecture, pandas dropped a bomb on me - they released version 1.0.0. This version had breaking changes and major new features. Fortunately, most of my lectures weren’t affected - but this lecture, I basically had to re-write it from scratch.

So, back in the day, if you wanted to represent missing or invalid data, you had to use NumPy’s special floating point constant, `np.nan`. So, if you had a pandas Series of integers like this

```
roux = pd.Series([1, 2, 3])
```

And then you set the 2nd element to `np.nan`

```
roux.iloc[1] = np.nan
```

The series would get cast to floats because NaN only existed as a floating point value.

```
print(roux)
## 0 1.0
## 1 NaN
## 2 3.0
## dtype: float64
```

Today, pandas has a nullable integer datatype called Int64, with a capital *I* to differentiate it from NumPy’s int64 with a lower case *i*. So, let’s rebuild that Series, this time specifying `dtype='Int64'`.

```
roux = pd.Series([1, 2, 3], dtype='Int64')
```

And, again, let’s set the 2nd element to `np.nan`

```
roux.iloc[1] = np.nan
```

This time, the series retains its Int64 datatype and doesn’t get cast to float. A couple of other, probably better, ways to do this would be

```
roux.iloc[1] = None
roux.iloc[1] = pd.NA
```
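Printing the Series confirms what happened: the missing entry shows up as `<NA>` and the dtype is still Int64. Here's a quick, self-contained check continuing from the snippet above.

```python
import numpy as np
import pandas as pd

# Rebuild the nullable-integer Series and knock out the 2nd element
roux = pd.Series([1, 2, 3], dtype='Int64')
roux.iloc[1] = np.nan

print(roux)
## 0       1
## 1    <NA>
## 2       3
## dtype: Int64
```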

Now let’s build a Series of strings, set the 2nd element to `None`, and set the 3rd element to `np.nan`.

```
gumbo = pd.Series(['a', 'b', 'c'])
gumbo.iloc[1] = None
gumbo.iloc[2] = np.nan
```

If we print the Series, you’ll notice that this time pandas doesn’t convert anything - the `None` and the NaN are stored as-is.

```
gumbo
## 0 a
## 1 None
## 2 NaN
## dtype: object
```

That’s because a series of strings is a series of objects, and a series of objects is really just a NumPy array of pointers that can point to anything in memory.
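To see what “can point to anything” means in practice, here’s a hypothetical mixed Series - each element is a completely different Python type, and pandas happily stores them all under the object dtype.

```python
import pandas as pd

# An object Series is just an array of pointers, so each element
# can be a different Python object entirely
mixed = pd.Series(['a', 42, 3.14, None])

print(mixed.dtype)
## object
print([type(v).__name__ for v in mixed])
## ['str', 'int', 'float', 'NoneType']
```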

Of course, in pandas 1.0.0, there’s a new experimental string datatype that makes everything I just said somewhat wrong or outdated. Now you can do stuff like this.

```
gumbo = pd.Series(['a', 'b', 'c'], dtype='string')
gumbo.iloc[1] = None
gumbo.iloc[2] = np.nan
print(gumbo)
## 0 a
## 1 <NA>
## 2 <NA>
## dtype: string
```

In any case, pandas provides two helper functions for identifying NaN values. If you have a Series `x` with some NaN values, and then you check `x == np.nan`, you’ll get back a series of all False values. That’s because NumPy designed `nan` so that `nan == nan` returns False.

```
x = pd.Series([1.0, np.nan, 3.0, np.nan])
x == np.nan
## 0 False
## 1 False
## 2 False
## 3 False
## dtype: bool
```

If you want to pick out NaN values from a Series, you should use the function `pd.isna()`, and if you want to pick out non-NaN values, use `pd.notna()`.

```
pd.isna(x)
## 0 False
## 1 True
## 2 False
## 3 True
## dtype: bool
pd.notna(x)
## 0 True
## 1 False
## 2 True
## 3 False
## dtype: bool
```
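These masks plug straight into boolean indexing, so keeping only the non-NaN values is a one-liner. (There’s also a built-in `dropna()` method that does the same thing.)

```python
import numpy as np
import pandas as pd

x = pd.Series([1.0, np.nan, 3.0, np.nan])

# Keep only the non-NaN values with a boolean mask
print(x[pd.notna(x)])
## 0    1.0
## 2    3.0
## dtype: float64

# Equivalent, using the built-in dropna() method
print(x.dropna())
```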

If you want to replace NaN values with -1, you could do something like

```
x.loc[pd.isna(x)] = -1
```

and this works, but pandas provides a really convenient `fillna()` method that makes this even simpler.

```
x.fillna(-1)
## 0 1.0
## 1 -1.0
## 2 3.0
## 3 -1.0
## dtype: float64
```
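Note that `fillna()` isn’t limited to constants. A common pattern - shown here as an illustrative example - is to fill missing values with a statistic computed from the Series itself, like the mean.

```python
import numpy as np
import pandas as pd

x = pd.Series([1.0, np.nan, 3.0, np.nan])

# Fill NaNs with the mean of the non-missing values: (1.0 + 3.0) / 2 = 2.0
print(x.fillna(x.mean()))
## 0    1.0
## 1    2.0
## 2    3.0
## 3    2.0
## dtype: float64
```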

Just remember that this returns a modified copy of `x`, so `x` doesn’t actually get changed here. If you did want to update `x`, you could do the same thing but set `inplace=True`.

```
x.fillna(-1, inplace=True)
print(x)
## 0 1.0
## 1 -1.0
## 2 3.0
## 3 -1.0
## dtype: float64
```
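One gotcha worth knowing: when you pass `inplace=True`, `fillna()` mutates the Series and returns `None`, so don’t assign its result back to a variable.

```python
import numpy as np
import pandas as pd

x = pd.Series([1.0, np.nan, 3.0, np.nan])

# With inplace=True, fillna() mutates x and returns None
result = x.fillna(-1, inplace=True)

print(result)
## None
print(x.iloc[1])
## -1.0
```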