
Quick Guide to Regex in Python

The purpose of this guide is to bridge the gap between understanding what a regular expression is and knowing how to use one in Python. If you’re brand new to regular expressions, I highly recommend checking out RegexOne first.

For this guide, we’ll use Python’s built-in re module, which makes working with regular expressions a breeze.

Setup

import re # import the re module

sentence = "We bought our  Golden Retriever, Snuggles, for $30 on 1/1/2015 at 1017 Main St. where they   have many dogs."
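If you expect to reuse a pattern many times, one option is to compile it once with re.compile and then call the same search/findall/sub methods on the resulting pattern object. A minimal sketch (number_pattern and the other variable names are just for illustration):

```python
import re

sentence = "We bought our  Golden Retriever, Snuggles, for $30 on 1/1/2015 at 1017 Main St. where they   have many dogs."

# Compile once; the Pattern object offers the same methods as the re module,
# minus the pattern argument
number_pattern = re.compile(r"\b\d+\b")  # illustrative name

numbers = number_pattern.findall(sentence)
first_number = number_pattern.search(sentence).group()

print(numbers)       # ['30', '1', '1', '2015', '1017']
print(first_number)  # 30
```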

Does the string contain a pattern?

# Does the sentence contain the word “the”?

# match "the" anywhere, even inside another word (here it matches part of "they")
re.search(r"the", sentence) is not None
## True
# require word boundaries on both sides, so "the" must be a standalone word
re.search(r"\bthe\b", sentence) is not None
## False
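When the word to look for comes from a variable rather than a hard-coded pattern, a small helper along these lines can work. This is a sketch, not part of the re module: contains_word is a hypothetical name, and it assumes the word starts and ends with letters or digits so that \b behaves as expected. re.escape keeps characters like . or $ in the word from being treated as regex syntax.

```python
import re

sentence = "We bought our  Golden Retriever, Snuggles, for $30 on 1/1/2015 at 1017 Main St. where they   have many dogs."


def contains_word(text: str, word: str, ignore_case: bool = False) -> bool:
    """Hypothetical helper: True if `word` appears in `text` as a standalone word."""
    # re.escape stops metacharacters in `word` from being interpreted as regex syntax
    pattern = r"\b" + re.escape(word) + r"\b"
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, text, flags) is not None


print(contains_word(sentence, "the"))                   # False
print(contains_word(sentence, "we"))                    # False ("We" is capitalized)
print(contains_word(sentence, "we", ignore_case=True))  # True
```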

Extracting patterns

# What’s the first number that appears in the sentence?

# find the first digit
re.search(r"\d", sentence).group()
## '3'
# find the first sequence of digits
re.search(r"\d+", sentence).group()
## '30'
# find the first sequence of digits together with the non-space character
# right before it (here, the dollar sign), ending at a word boundary
re.search(r"\S\d+\b", sentence).group()
## '$30'
# find every standalone sequence of digits
re.findall(r"\b\d+\b", sentence)
## ['30', '1', '1', '2015', '1017']
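Parentheses turn parts of a pattern into capture groups, which lets a single search pull out several pieces at once. The sketch below assumes the date is written month/day/year; it also uses re.finditer, which yields Match objects that record where each number starts.

```python
import re

sentence = "We bought our  Golden Retriever, Snuggles, for $30 on 1/1/2015 at 1017 Main St. where they   have many dogs."

# Each pair of parentheses is a capture group (assumes month/day/year order)
date_match = re.search(r"(\d{1,2})/(\d{1,2})/(\d{4})", sentence)
month, day, year = date_match.groups()
print(month, day, year)  # 1 1 2015

# finditer yields Match objects; .start() is the index where each match begins
numbers_with_positions = [(m.group(), m.start()) for m in re.finditer(r"\b\d+\b", sentence)]
print(numbers_with_positions)
```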

Counting matching patterns

# How many times does the word “dog” appear in the sentence?

# count occurrences of the substring "dog" (this also matches the "dog" in "dogs")
len(re.findall(r"dog", sentence))
## 1
# count occurrences of the word "dog", requiring word boundaries
# on both sides of the word
len(re.findall(r"\bdog\b", sentence))
## 0
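Building on the word-boundary count, an optional character can widen what counts as a match. In this sketch, s? lets the pattern match both "dog" and "dogs" as whole words; counting with re.finditer gives the same number without building a list of matched strings.

```python
import re

sentence = "We bought our  Golden Retriever, Snuggles, for $30 on 1/1/2015 at 1017 Main St. where they   have many dogs."

# s? makes the trailing "s" optional, so both "dog" and "dogs" count as words
dog_count = len(re.findall(r"\bdogs?\b", sentence))
print(dog_count)  # 1

# Same count via finditer, without building a list of matched strings
dog_count_lazy = sum(1 for _ in re.finditer(r"\bdogs?\b", sentence))
print(dog_count_lazy)  # 1
```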

Replacing matching patterns

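# Replace the 2nd digit with a 9
# (?<=\d) starts the match right after a digit, (\D*) captures any non-digits
# in between, and \g<1> puts them back so only the 2nd digit itself changes
# count=1 means replace only the first match
re.sub(r"(?<=\d)(\D*)\d", r"\g<1>9", sentence, count=1)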
## 'We bought our  Golden Retriever, Snuggles, for $39 on 1/1/2015 at 1017 Main St. where they   have many dogs.'
# Replace every 0 or 1 with a 6
# count=0 (the default) means replace all matches
re.sub(r"[01]", "6", sentence, count=0)
## 'We bought our  Golden Retriever, Snuggles, for $36 on 6/6/2665 at 6667 Main St. where they   have many dogs.'
# Replace every run of 2 or more whitespace characters with a single space
re.sub(r"\s{2,}", " ", sentence)
## 'We bought our Golden Retriever, Snuggles, for $30 on 1/1/2015 at 1017 Main St. where they have many dogs.'
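re.sub can do more than swap in fixed text. A minimal sketch of three variations: backreferences such as \1 reinsert captured groups (again assuming a month/day/year date), a function can compute each replacement from its Match object (double_price is a hypothetical helper), and re.subn returns the new string together with the number of substitutions made.

```python
import re

sentence = "We bought our  Golden Retriever, Snuggles, for $30 on 1/1/2015 at 1017 Main St. where they   have many dogs."

# Backreferences: \3, \1 and \2 reinsert the 3rd, 1st and 2nd capture groups,
# turning M/D/YYYY into YYYY-M-D (assumes the month comes first)
reordered = re.sub(r"(\d{1,2})/(\d{1,2})/(\d{4})", r"\3-\1-\2", sentence)
print(reordered)
# We bought our  Golden Retriever, Snuggles, for $30 on 2015-1-1 at 1017 Main St. where they   have many dogs.


# Hypothetical helper: re.sub calls it once per match and uses its return value
def double_price(match: re.Match) -> str:
    return "$" + str(int(match.group(1)) * 2)


doubled = re.sub(r"\$(\d+)", double_price, sentence)
print(doubled)
# We bought our  Golden Retriever, Snuggles, for $60 on 1/1/2015 at 1017 Main St. where they   have many dogs.

# re.subn returns a (new_string, number_of_substitutions) tuple
collapsed, n_replacements = re.subn(r"\s{2,}", " ", sentence)
print(n_replacements)  # 2
```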