Decision Trees in R using rpart

Ben Gorman published on 2014-08-24

R’s rpart package provides a powerful framework for growing classification and regression trees. To see how it works, let’s get started with a minimal example.

Motivating Problem

First, let’s define a problem. There’s a common scam among motorists whereby a person slams on his brakes in heavy traffic with the intention of being rear-ended. The person then files an insurance claim for personal injury and damage to his vehicle, alleging that the other driver was at fault.
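As a rough illustration of what a minimal rpart call can look like, here is a sketch on a tiny made-up data set. The column names (PriorClaims, RearEnded, Fraud) and all values are hypothetical and are not taken from the post; the control settings are chosen only so that a tree can grow on ten rows.

```r
library(rpart)

# Hypothetical toy data (invented for illustration, not from the original post)
claims <- data.frame(
  PriorClaims = c(0, 1, 5, 0, 4, 6, 1, 0, 5, 7),
  RearEnded   = c(0, 0, 1, 0, 1, 1, 0, 1, 1, 1),
  Fraud       = factor(c("No", "No", "Yes", "No", "Yes",
                         "Yes", "No", "No", "Yes", "Yes"))
)

# Grow a classification tree; minsplit and cp are loosened because the
# data set is tiny, and xval = 0 skips cross-validation
fit <- rpart(
  Fraud ~ PriorClaims + RearEnded,
  data    = claims,
  method  = "class",
  control = rpart.control(minsplit = 2, cp = 0, xval = 0)
)

# Predicted class for each training row
pred <- predict(fit, claims, type = "class")
print(fit)
```

On real data you would normally keep the default control settings and evaluate on held-out rows rather than the training set.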

R – Data.Table Rolling Joins

Ben Gorman published on 2014-07-26

Rolling joins are commonly used for analyzing data involving time. A simple example: suppose you have a table of product sales and a table of commercials. You might want to associate each product sale with the most recent commercial that aired prior to the sale. In this case, you cannot do a basic join between the sales table and the commercials table because sales were not tracked with a CommercialId attribute.
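The sales and commercials scenario can be sketched with data.table's roll argument. The tables, column names (AirTime, SaleTime, JoinTime), and timestamps below are assumptions made up for illustration. Note that roll = TRUE also matches a commercial that aired at exactly the sale time.

```r
library(data.table)

# Hypothetical commercials and sales (values invented for illustration)
commercials <- data.table(
  CommercialId = c(1L, 2L, 3L),
  AirTime = as.POSIXct(c("2014-01-01 09:00:00", "2014-01-01 12:00:00",
                         "2014-01-01 18:00:00"), tz = "UTC")
)
sales <- data.table(
  SaleId = c(1L, 2L, 3L, 4L),
  SaleTime = as.POSIXct(c("2014-01-01 08:00:00", "2014-01-01 10:30:00",
                          "2014-01-01 12:05:00", "2014-01-01 20:00:00"), tz = "UTC")
)

# Copy each timestamp into a shared key column so the originals survive the join
commercials[, JoinTime := AirTime]
sales[, JoinTime := SaleTime]
setkey(commercials, JoinTime)
setkey(sales, JoinTime)

# For each sale, roll the most recent commercial (at or before the sale) forward;
# a sale that precedes every commercial gets NA
matched <- commercials[sales, roll = TRUE]
matched[, .(SaleId, SaleTime, CommercialId, AirTime)]
```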

R – Introduction to Data.Table Joins

Ben Gorman published on 2014-07-23

R’s data.table package provides fast methods for handling large tables of data with simple syntax. The following is an introduction to basic join operations using data.table. Suppose you have two data.tables – a table of insurance policies

    library(data.table)

    policies <- data.table(
      PolicyNumber = c(1, 2, 3),
      EffectiveDate = as.Date(c("2012-1-1", "2012-1-1", "2012-7-1")),
      ExpirationDate = as.Date(c("2012-12-31", "2012-6-30", "2012-12-31"))
    )
    policies
    ##    PolicyNumber EffectiveDate ExpirationDate
    ## 1:            1    2012-01-01     2012-12-31
    ## 2:            2    2012-01-01     2012-06-30
    ## 3:            3    2012-07-01     2012-12-31

and a table of insurance claims.
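The excerpt stops before the claims table is shown, so the sketch below uses a hypothetical one (the ClaimNumber and ClaimCost columns and all their values are invented) to illustrate what a basic keyed join between the two tables might look like.

```r
library(data.table)

policies <- data.table(
  PolicyNumber = c(1, 2, 3),
  EffectiveDate = as.Date(c("2012-1-1", "2012-1-1", "2012-7-1")),
  ExpirationDate = as.Date(c("2012-12-31", "2012-6-30", "2012-12-31"))
)

# Hypothetical claims table (not the one from the original post)
claims <- data.table(
  ClaimNumber = c(123, 124, 125, 126),
  PolicyNumber = c(1, 1, 3, 4),
  ClaimCost = c(100, 2400, 350, 8000)
)

# Key both tables on the shared column
setkey(policies, PolicyNumber)
setkey(claims, PolicyNumber)

# Look up the policy for every claim; claims with no matching policy
# keep their row and get NA for the policy columns
claims_with_policy <- policies[claims]

# Inner join: drop claims whose PolicyNumber is not in policies
inner <- policies[claims, nomatch = 0]
```

In X[Y] form, the result has one row per row of Y (here, claims), which is why policy 2, having no claims, does not appear in either result.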

R – Introduction to Factors Tutorial

Ben Gorman published on 2014-07-21

A factor variable (commonly called a categorical variable outside of R) is a variable that takes on a limited set of values. For example, a vector that stores days of the week {Sunday, Monday, etc.} or colors from the set {Red, Blue, Green} should be encoded as a factor. By contrast, a vector of person names {Bill, Sue, Jane, …} should generally be designated as a character vector since there is an unlimited set of possible names a person can take.
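A small base-R sketch of the distinction, using made-up vectors: days of the week are stored as a factor with a fixed set of levels, while person names stay a plain character vector.

```r
# Days of the week come from a fixed set, so store them as a factor
day_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")
days <- factor(c("Monday", "Sunday", "Monday", "Friday"), levels = day_levels)

levels(days)       # all seven allowed values, including unobserved ones
table(days)        # counts per level; unobserved levels show as 0
as.integer(days)   # underlying integer codes: 2 1 2 6

# Names come from an open-ended set, so keep them as character
people <- c("Bill", "Sue", "Jane")
is.factor(people)  # FALSE
```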

A Concrete Example Of What An Actuary Does

Ben Gorman published on 2014-07-09

When I was in high school, I knew I wanted to pursue a career involving math. I did an internship working for some mechanical engineers at an oil platform consultant company, but I never witnessed my mentors do more than basic geometry or algebra. That’s when I started looking into actuarial science. It sounded like a more challenging and stimulating career for me. Problem was, I was having a hard time understanding exactly what an actuary does.

Magic Behind Constructing A Decision Tree

Ben Gorman published on 2014-06-02

Before we get started, I need to clarify something. Theoretical decision trees can have two or more branches protruding from a single node. However, allowing more than two branches can be computationally expensive, so most implementations of decision trees only allow binary splits. Recall our example problem: Bill is a user on our online dating site, and we want to build a decision tree that predicts whether he would message a certain woman. (If so, we say that she’s “date-worthy.”)