Python Pandas For Your Grandpa - 3.2 DataFrame To And From CSV

In this section, we’ll see how you can read from and write to CSV files with Pandas.

One of the most common ways to make a DataFrame is to load it from some pre-existing CSV file. Pandas has an awesome CSV reader for this, but before we use it, let’s make a DataFrame from scratch and write it to CSV so we have something to read. Here I’ll make a DataFrame called df with three columns and a thousand rows.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': np.arange(1000),
    'b': np.random.normal(size=1000),
    'c': pd.Series(np.random.choice(['cat', 'dog', 'hippo'], size=1000, replace=True), dtype="string")
})
print(df)
##       id         b      c
## 0      0  0.255534    dog
## 1      1  0.454496  hippo
## 2      2 -0.868720    dog
## 3      3 -0.275906  hippo
## 4      4  0.045178  hippo
## ..   ...       ...    ...
## 995  995 -0.376002    cat
## 996  996 -0.645262    cat
## 997  997  0.041527  hippo
## 998  998  0.785299  hippo
## 999  999 -0.183583    dog
## [1000 rows x 3 columns]

Writing this DataFrame to CSV is pretty simple. You just use the built-in DataFrame.to_csv() method and pass in a name for the file. For example,

df.to_csv('pets.csv')

By default, the file gets written to your current working directory. If you don’t know where that is, you can check by importing os and running os.getcwd().

import os
os.getcwd()  # /your/current/working/directory

Alternatively, you can write the file somewhere else by passing a full file path to the path_or_buf parameter, like df.to_csv(path_or_buf='/some/special/path/pets.csv').
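Here’s a small sketch of writing to an explicit path. Since '/some/special/path' is only a placeholder, the sketch assumes a throwaway directory from Python’s tempfile module instead, and uses a tiny made-up frame in place of the 1,000-row df. It also shows that path_or_buf is the first positional parameter, so you can skip the keyword.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Tiny stand-in for the df built earlier (made-up data)
df = pd.DataFrame({'id': np.arange(5), 'b': np.random.normal(size=5)})

# Assumption: a temporary directory stands in for '/some/special/path'
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, 'pets.csv')

# These two calls do the same thing; path_or_buf is the first positional parameter
df.to_csv(path_or_buf=out_path)
df.to_csv(out_path)

print(os.path.exists(out_path))  # True
```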

Also by default, to_csv() includes the row index in the output, so if you open the CSV file in a text editor, don’t be surprised to find a nameless column at the front. For me, 9 times out of 10 I don’t want this, and it’s easy to prevent by setting index=False, as in df.to_csv('pets.csv', index=False).
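To see the difference side by side, this sketch writes a made-up three-row frame both ways. To avoid touching disk, it calls to_csv() without a path, in which case pandas returns the CSV text as a string instead of writing a file.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame({'id': np.arange(3), 'c': ['cat', 'dog', 'hippo']})

with_index = df.to_csv()                  # default: row index written as an unnamed first column
without_index = df.to_csv(index=False)    # index left out

print(with_index.splitlines()[0])     # ,id,c
print(without_index.splitlines()[0])  # id,c

# Reading the indexed version back leaves a leftover 'Unnamed: 0' column
print(pd.read_csv(io.StringIO(with_index)).columns.tolist())  # ['Unnamed: 0', 'id', 'c']
```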

If you look at the documentation for to_csv(), you’ll find there are a ton of parameters that let you really customize the way a DataFrame is written to CSV. We won’t go through them here since it would get boring fast, but they’re pretty well documented so I’ll encourage you to peruse them on your own time.
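As a small taste, here’s a sketch using three of those parameters (sep, columns, and float_format) on a made-up two-row frame. Again, calling to_csv() without a path returns the text rather than writing a file.

```python
import pandas as pd

df = pd.DataFrame({'id': [0, 1], 'b': [0.255534, -0.868720], 'c': ['dog', 'dog']})

text = df.to_csv(
    index=False,
    sep=';',              # semicolon-delimited instead of comma-delimited
    columns=['id', 'b'],  # only write a subset of the columns
    float_format='%.2f',  # round floats to two decimals on the way out
)
print(text)
# id;b
# 0;0.26
# 1;-0.87
```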

Now that we have a CSV file on disk, let’s try loading it into a new DataFrame called pets. In this case, we’ll use the top-level function pd.read_csv(), passing in the name of the file.

pets = pd.read_csv('pets.csv')

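It might not seem that impressive, but if you’ve ever tried reading data from a file using a lower-level language like C or C++, you should appreciate everything that’s happening under the hood. Not only did Pandas recognize the column names, it also inferred sensible data types without any help from us: id comes back as integers and b as floats. The one catch is c, which comes back as a generic text column (object dtype in pandas 2.x) rather than the "string" dtype we originally gave it, unless you request that dtype with read_csv()’s dtype parameter.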
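You can check what read_csv() inferred with the dtypes attribute. This sketch round-trips a small made-up frame through an in-memory buffer (io.StringIO) rather than pets.csv, then uses read_csv()’s dtype parameter to get the "string" dtype back for c. Exactly which text dtype c gets by default depends on your pandas version.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': np.arange(5),
    'b': np.random.normal(size=5),
    'c': pd.Series(['cat', 'dog', 'hippo', 'cat', 'dog'], dtype='string'),
})

# Write to an in-memory buffer instead of a file on disk
buf = io.StringIO()
df.to_csv(buf, index=False)

# Let read_csv() infer the types on its own
buf.seek(0)
inferred = pd.read_csv(buf)
print(inferred.dtypes)  # id and b come back numeric; c comes back as a generic text dtype

# Ask for the 'string' dtype explicitly for column c
buf.seek(0)
pets = pd.read_csv(buf, dtype={'c': 'string'})
print(pets.dtypes)
```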

Of course, in the real world, things don’t always go so smoothly. Sometimes you’ll have to steer read_csv() in the right direction using parameters like:

  • sep to specify the value separator if your file is something other than comma-delimited
  • header to tell pandas which row, if any, contains the column names
  • index_col to indicate which column, if any, should be used as the row index
  • usecols to tell pandas “only read a certain subset of columns”
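Here’s a sketch that uses all four of those parameters on a made-up, tab-delimited file with no header row, held in memory with io.StringIO. Because there’s no header, it also passes names (another read_csv() parameter) to supply the column labels.

```python
import io

import pandas as pd

# Hypothetical tab-delimited data with no header row
raw = "0\t0.25\tdog\n1\t0.45\thippo\n2\t-0.87\tdog\n"

pets = pd.read_csv(
    io.StringIO(raw),
    sep='\t',                # values are separated by tabs
    header=None,             # the file has no header row...
    names=['id', 'b', 'c'],  # ...so we supply column names ourselves
    index_col='id',          # use the id column as the row index
    usecols=['id', 'c'],     # skip column b entirely
)
print(pets)
```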

Since read_csv() has even more parameters than to_csv(), we won’t cover all of them either, but they’re also well documented so, again, I’ll encourage you to read through them yourself.
