# Quick Guide to Regex in R

The purpose of this guide is to bridge the gap between understanding what a [regular expression](https://en.wikipedia.org/wiki/Regular_expression) *is* and *how to use one* in R. If you’re brand new to regular expressions, I highly recommend checking out [RegexOne](http://regexone.com/).

Hadley Wickham’s [stringr package](https://github.com/hadley/stringr) makes using regular expressions in R a breeze: every function starts with `str_` and takes the string as its first argument. I use it to avoid base R’s regex functions `grep`, `grepl`, `regexpr`, `gregexpr`, `sub` and `gsub`, whose names alone are cryptic.
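For comparison, here is a minimal sketch lining up three of the stringr calls used in this guide with their base R counterparts. It assumes R's default (extended) regex engine for the base functions, which accepts `\\d` and `\\s` as extensions. stringr runs patterns through the ICU engine (via stringi), so the two can disagree on more advanced syntax such as lookarounds, which base R only supports with `perl = TRUE`.

```r
library(stringr)

sentence <- "We bought our  Golden Retriever, Snuggles, for $30 on 1/1/2015 at 1017 Main St. where they   have many dogs."

# Detect: base R takes the pattern first, stringr takes the string first
grepl("\\d+", sentence)
## [1] TRUE
str_detect(sentence, "\\d+")
## [1] TRUE

# Extract the first match: base R needs regexpr() and regmatches() together
regmatches(sentence, regexpr("\\d+", sentence))
## [1] "30"
str_extract(sentence, "\\d+")
## [1] "30"

# Replace every match: both calls return the same string
identical(gsub("\\s{2,}", " ", sentence), str_replace_all(sentence, "\\s{2,}", " "))
## [1] TRUE
```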

## Setup

```r
library(stringr)

sentence <- "We bought our  Golden Retriever, Snuggles, for $30 on 1/1/2015 at 1017 Main St. where they   have many dogs."
```
 
## Does the string contain a pattern?

```r
# Does the sentence contain the word “the”?

# match "the" anywhere, even inside other words such as "where" and "they"
str_detect(sentence, "the")
## [1] TRUE

# require a word boundary (\\b) on both sides, so only the standalone word "the" matches
str_detect(sentence, "\\bthe\\b")
## [1] FALSE
```
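Every pattern in this guide doubles its backslashes. The sketch below shows why: R string escaping happens first, so `"\\b"` is a single backslash followed by `b` by the time the regex engine sees it. The raw-string syntax `r"(...)"` assumes R 4.0.0 or later, and `fixed()` is stringr's way to match a pattern literally instead of as a regex.

```r
library(stringr)

sentence <- "We bought our  Golden Retriever, Snuggles, for $30 on 1/1/2015 at 1017 Main St. where they   have many dogs."

# "\\" in an R string literal is one backslash, so the regex engine receives \bthe\b
writeLines("\\bthe\\b")
## \bthe\b

# Raw strings (R >= 4.0.0) skip string-level escaping entirely
identical(r"(\bthe\b)", "\\bthe\\b")
## [1] TRUE

# A literal special character needs a regex-level escape too: \$ is a dollar sign,
# while a bare $ means "end of string"
str_detect(sentence, "\\$")
## [1] TRUE

# Alternatively, fixed() turns off regex interpretation altogether
str_detect(sentence, fixed("$"))
## [1] TRUE
```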

## Extracting patterns

```r
# What’s the first number that appears in the sentence?

# find the first digit
str_extract(sentence, "\\d")
## [1] "3"

# find the first sequence of digits
str_extract(sentence, "\\d+")
## [1] "30"

# find the first dollar amount: a literal "$" (escaped as \\$ because a bare $
# means end of string) followed by one or more digits
str_extract(sentence, "\\$\\d+")
## [1] "$30"
 
# find all standalone sequences of digits (word boundaries on both sides)
str_extract_all(sentence, "\\b\\d+\\b")
## [[1]]
## [1] "30"   "1"    "1"    "2015" "1017"
```
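When you need the pieces of a match rather than the whole thing, capture groups help. This sketch uses stringr's `str_match()`, which returns a character matrix: column 1 holds the full match and each following column holds one capture group. The date pattern assumes an m/d/yyyy layout like the one in `sentence`. Extracted values are strings, so convert them before doing arithmetic.

```r
library(stringr)

sentence <- "We bought our  Golden Retriever, Snuggles, for $30 on 1/1/2015 at 1017 Main St. where they   have many dogs."

# Month, day and year as three capture groups (assumes an m/d/yyyy date)
date_parts <- str_match(sentence, "(\\d{1,2})/(\\d{1,2})/(\\d{4})")

# Row 1 holds the full match followed by each group, in order
date_parts[1, ]

# str_extract_all() returns character vectors; convert before computing with them
as.numeric(str_extract_all(sentence, "\\b\\d+\\b")[[1]])
## [1]   30    1    1 2015 1017
```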

## Counting matching patterns

```r
# How many times does the word “dog” appear in the sentence?

# count occurrences of "dog" anywhere, including inside "dogs"
str_count(sentence, "dog")
## [1] 1

# count occurrences of the word "dog" and require word boundaries
# on both sides of the word
str_count(sentence, "\\bdog\\b")
## [1] 0
```
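Two small tweaks cover the most common counting needs. An optional character (`s?`) lets one whole-word pattern match both "dog" and "dogs", and stringr's `regex(ignore_case = TRUE)` makes a pattern case-insensitive. `str_count()` is also vectorised over `pattern`, so several counts can come from one call.

```r
library(stringr)

sentence <- "We bought our  Golden Retriever, Snuggles, for $30 on 1/1/2015 at 1017 Main St. where they   have many dogs."

# "s?" makes the trailing s optional, so both "dog" and "dogs" count as whole words
str_count(sentence, "\\bdogs?\\b")
## [1] 1

# Matching is case-sensitive by default, so "we" misses the leading "We"
str_count(sentence, "\\bwe\\b")
## [1] 0
str_count(sentence, regex("\\bwe\\b", ignore_case = TRUE))
## [1] 1

# One count per pattern: whole-word dog(s), whole-word "the", and single digits
str_count(sentence, c("\\bdogs?\\b", "\\bthe\\b", "\\d"))
## [1]  1  0 12
```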

## Replacing matching patterns

```r
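# Replace the 2nd digit with a 9: the lookbehind (?<=\\d) requires a digit just
# before the match, [^\\d]* skips any non-digits, and (\\d) matches the next digit.
# Caveat: skipped non-digits are replaced too; this works here only because
# the first two digits ("30") are adjacent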
str_replace(sentence, "(?<=\\d)[^\\d]*(\\d)", "9")
## [1] "We bought our  Golden Retriever, Snuggles, for $39 on 1/1/2015 at 1017 Main St. where they   have many dogs."
 
# Replace every 0 or 1 with a 6
str_replace_all(sentence, "(0|1)", "6")
## [1] "We bought our  Golden Retriever, Snuggles, for $36 on 6/6/2665 at 6667 Main St. where they   have many dogs."
 
# Collapse every run of two or more whitespace characters into a single space
str_replace_all(sentence, "\\s{2,}", " ")
## [1] "We bought our Golden Retriever, Snuggles, for $30 on 1/1/2015 at 1017 Main St. where they have many dogs."
```
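Two follow-ups on replacement, offered as a sketch. First, backreferences: `\\1`, `\\2` and so on in the replacement string insert whatever each capture group matched, which makes reordering easy (the date pattern assumes m/d/yyyy). Second, the lookbehind pattern above also deletes any non-digits sitting between the first two digits. The helper `replace_nth_digit()` below is hypothetical (not part of stringr); it avoids that problem by locating every digit with `str_locate_all()` and overwriting only the chosen one with `str_sub<-`.

```r
library(stringr)

sentence <- "We bought our  Golden Retriever, Snuggles, for $30 on 1/1/2015 at 1017 Main St. where they   have many dogs."

# Backreferences: reorder m/d/yyyy into yyyy-m-d
str_replace(sentence, "(\\d{1,2})/(\\d{1,2})/(\\d{4})", "\\3-\\1-\\2")
## [1] "We bought our  Golden Retriever, Snuggles, for $30 on 2015-1-1 at 1017 Main St. where they   have many dogs."

# Hypothetical helper: replace the nth digit by position, leaving everything else intact
replace_nth_digit <- function(string, n, replacement) {
  # str_locate_all() returns one matrix of start/end positions per input string
  positions <- str_locate_all(string, "\\d")[[1]]
  if (nrow(positions) < n) return(string)  # fewer than n digits: return unchanged
  pos <- positions[n, "start"]
  str_sub(string, pos, pos) <- replacement
  string
}

# Same result as the lookbehind pattern on this sentence...
replace_nth_digit(sentence, 2, "9")
## [1] "We bought our  Golden Retriever, Snuggles, for $39 on 1/1/2015 at 1017 Main St. where they   have many dogs."

# ...but different when non-digits separate the first two digits
str_replace("1 2", "(?<=\\d)[^\\d]*(\\d)", "9")
## [1] "19"
replace_nth_digit("1 2", 2, "9")
## [1] "1 9"
```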

