I think there’s a rule somewhere that says “You can’t call yourself a data scientist until you’ve used a Naive Bayes classifier”. It’s extremely useful, yet beautifully simplistic. This article is my attempt at laying the groundwork for Naive Bayes in a practical and intuitive fashion.
Motivating Problem Let’s start with a problem to motivate our formulation of Naive Bayes. (Feel free to follow along using the Python script or R script found here.

The purpose of this guide is to bridge the gap between understanding what a regular expression is and how to use them in Python. If you’re brand new to regular expressions, I highly recommend checking out RegexOne.
For this guide, we’ll use Python’s re module which makes using regular expressions a breeze.
Setup import re # imoprt the re module sentence = "We bought our Golden Retriever, Snuggles, for $30 on 1/1/2015 at 1017 Main St.

The purpose of this guide is to bridge the gap between understanding what a regular expression is and how to use them in R. If you’re brand new to regular expressions, I highly recommend checking out RegexOne.
Hadley Wickham’s stringr package makes using regular expressions in R a breeze. I use it to avoid the complexity of base R’s regex functions grep, grepl, regexpr, gregexpr, sub and gsub where even the function names are cryptic.

Here’s a practical guide for calculating customer retention and churn from transaction data.
Preface The general idea of customer retention is self explanatory; It’s a measure of how well a business retainins their customers. Unfortunately the specifics of how to calculate a retention metric are not so clear. Likewise, customer churn is the complement of retention; It’s a measure of how many customers end their relationship with a business - i.

Logistic regression is a generalized linear model most commonly used for classifying binary data. It’s output is a continuous range of values between 0 and 1 (commonly representing the probability of some event occurring), and its input can be a multitude of real-valued and discrete predictors.
Motivating Problem Suppose you want to predict the probability someone is a homeowner based solely on their age. You might have a dataset like

R’s rpart package provides a powerful framework for growing classification and regression trees. To see how it works, let’s get started with a minimal example.
Motivating Problem First let’s define a problem. There’s a common scam amongst motorists whereby a person will slam on his breaks in heavy traffic with the intention of being rear-ended. The person will then file an insurance claim for personal injury and damage to his vehicle, alleging that the other driver was at fault.