Quick Guide to Regex in Python

April 2, 2016
Python tutorial

The purpose of this guide is to bridge the gap between understanding what regular expressions are and knowing how to use them in Python. If you’re brand new to regular expressions, I highly recommend checking out RegexOne.

For this guide, we’ll use Python’s re module, which makes using regular expressions a breeze.

Setup

import re  # import the re module

sentence = "We bought our  Golden Retriever, Snuggles, for $30 on 1/1/2015 at 1017 Main St. where they   have many dogs."

Does the string contain a pattern?

# Does the sentence contain the word “the”?

# disregard adjacent characters (the "the" inside "they" counts)
print(re.search(r"the", sentence) is not None)
## True
# require word boundaries on both sides of the word "the"
print(re.search(r"\bthe\b", sentence) is not None)
## False
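
When the word to look for comes from a variable, the same word-boundary check can be wrapped in a small helper. This is only a sketch: contains_word is a hypothetical name, not part of the re module, and re.escape is used so that any special characters in the word are matched literally.

```python
import re

sentence = "We bought our  Golden Retriever, Snuggles, for $30 on 1/1/2015 at 1017 Main St. where they   have many dogs."

def contains_word(word, text):
    """Return True if `word` appears in `text` as a whole word (hypothetical helper)."""
    # re.escape makes characters such as "." or "$" match literally
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text) is not None

print(contains_word("the", sentence))
## False
print(contains_word("they", sentence))
## True
print(contains_word("St", sentence))
## True
```

Note that \b only marks a boundary next to a letter, digit, or underscore, so this helper is meant for ordinary words rather than strings that start or end with punctuation.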

Extracting patterns

# What’s the first number that appears in the sentence?

# find the first digit
print(re.search(r"\d", sentence).group())
## 3
# find the first sequence of digits
print(re.search(r"\d+", sentence).group())
## 30
# find the first match for [^\b]\d+ followed by a word boundary.
# Inside a character class, \b means the backspace character, not a word
# boundary, so [^\b]\d+ = any character except a backspace, then 1 or more digits
print(re.search(r"[^\b]\d+(?=\b)", sentence).group())
## $30
# find all sequences of digits
print(re.findall(r"\b\d+\b", sentence))
## ['30', '1', '1', '2015', '1017']
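
The following sketch illustrates two points about the patterns in this section: how \b behaves inside square brackets, and how a capture group can pull out just part of a match. It also shows re.finditer, which yields match objects so that positions are available as well. The variable names match and positions are illustrative choices.

```python
import re

sentence = "We bought our  Golden Retriever, Snuggles, for $30 on 1/1/2015 at 1017 Main St. where they   have many dogs."

# Inside [...], \b is the backspace character (\x08), not a word boundary
print(re.search(r"[\b]", "a\x08b") is not None)
## True
print(re.search(r"[\b]", "a b") is not None)
## False

# Match a literal dollar sign followed by digits, capturing only the digits
match = re.search(r"\$(\d+)", sentence)
print(match.group())   # the whole match
## $30
print(match.group(1))  # the first capture group
## 30

# Collect every run of digits together with its start index in the sentence
positions = [(m.start(), m.group()) for m in re.finditer(r"\b\d+\b", sentence)]
print(positions)
```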

Counting matching patterns

# How many times does the word “dog” appear in the sentence?

# count occurrences of the string "dog" (this also matches inside "dogs")
print(len(re.findall(r"dog", sentence)))
## 1
# count occurrences of the word "dog" and require word boundaries
# on both sides of the word
print(len(re.findall(r"\bdog\b", sentence)))
## 0
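
To count whole words while still allowing the plural, one option is an optional trailing "s" combined with word boundaries. The sketch below also uses the re.IGNORECASE flag; the text variable is a made-up example string, not taken from the sentence above.

```python
import re

sentence = "We bought our  Golden Retriever, Snuggles, for $30 on 1/1/2015 at 1017 Main St. where they   have many dogs."

# "dogs?" matches "dog" or "dogs"; \b keeps out words that merely contain "dog"
print(len(re.findall(r"\bdogs?\b", sentence)))
## 1

# Made-up example text to show the effect of case and of word boundaries
text = "Dogs love other dogs, but my dog prefers hotdogs."
print(len(re.findall(r"\bdogs?\b", text)))
## 2
print(len(re.findall(r"\bdogs?\b", text, flags=re.IGNORECASE)))
## 3
```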

Replacing matching patterns

# Replace the 2nd digit with a 9
# count=1 means replace only the 1st match
# (the match also consumes any non-digits between the two digits; here there are none)
print(re.sub(r"(?<=\d)[^\d]*(\d)", "9", sentence, count=1))
## We bought our  Golden Retriever, Snuggles, for $39 on 1/1/2015 at 1017 Main St. where they   have many dogs.
# Replace every 0 or 1 with a 6
# count=0 (the default) means replace all matches
print(re.sub(r"(0|1)", "6", sentence, count=0))
## We bought our  Golden Retriever, Snuggles, for $36 on 6/6/2665 at 6667 Main St. where they   have many dogs.
# Replace every run of two or more whitespace characters with a single space
print(re.sub(r"\s{2,}", " ", sentence))
## We bought our Golden Retriever, Snuggles, for $30 on 1/1/2015 at 1017 Main St. where they have many dogs.
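
The second-digit pattern above replaces everything it matches, so when non-digit characters sit between the first and second digit they are replaced as well. One possible alternative, sketched below, captures the text to keep and puts it back with a group reference. The last example passes a function as the replacement; double_price is a hypothetical helper written for this illustration.

```python
import re

sentence = "We bought our  Golden Retriever, Snuggles, for $30 on 1/1/2015 at 1017 Main St. where they   have many dogs."

# Group 1 captures everything up to (but not including) the 2nd digit.
# \g<1>9 means "group 1, then a literal 9"; a plain \19 would be read as group 19.
second_digit = r"^(\D*\d\D*)\d"
print(re.sub(second_digit, r"\g<1>9", "a1b2c3"))
## a1b9c3
print(re.sub(second_digit, r"\g<1>9", sentence))
## We bought our  Golden Retriever, Snuggles, for $39 on 1/1/2015 at 1017 Main St. where they   have many dogs.

def double_price(match):
    """Hypothetical helper: return the matched dollar amount, doubled."""
    return "$" + str(int(match.group(1)) * 2)

# re.sub calls double_price once per match and inserts its return value
doubled = re.sub(r"\$(\d+)", double_price, sentence)
print(doubled)
## We bought our  Golden Retriever, Snuggles, for $60 on 1/1/2015 at 1017 Main St. where they   have many dogs.
```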
