# R – Data.Table Rolling Joins

Rolling joins are commonly used for analyzing data involving time. As a simple example, suppose you have a table of product sales and a table of commercials. You might want to associate each product sale with the most recent commercial that aired prior to the sale. In this case, you cannot do a basic join between the sales table and the commercials table because sales are not tracked with a CommercialId attribute.
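The sales-and-commercials scenario above can be sketched with data.table's `roll` argument. This is a minimal illustration, not the original post's code: the table contents and the column names `SaleId`, `SaleTime`, `CommercialTime`, and `JoinTime` are assumptions made up for the example. Note that `roll = TRUE` also matches a commercial that aired on exactly the sale date, so it means "at or before" rather than strictly "prior to".

```r
library(data.table)

# Hypothetical data: names and values are invented for illustration
sales <- data.table(
  SaleId = 1:3,
  SaleTime = as.Date(c("2014-01-05", "2014-01-12", "2014-01-20"))
)
commercials <- data.table(
  CommercialId = c("A", "B"),
  CommercialTime = as.Date(c("2014-01-03", "2014-01-10"))
)

# Copy each time column into a shared column to join on,
# so the original time columns survive in the result
sales[, JoinTime := SaleTime]
commercials[, JoinTime := CommercialTime]

# For each sale, carry forward the latest commercial whose
# JoinTime is at or before the sale's JoinTime
result <- commercials[sales, on = "JoinTime", roll = TRUE]
result[, .(SaleId, SaleTime, CommercialId, CommercialTime)]
```

With this made-up data, the sale on January 5 pairs with commercial A, and the two later sales pair with commercial B.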

# R – Introduction to Data.Table Joins

R’s data.table package provides fast methods for handling large tables of data with simple syntax. The following is an introduction to basic join operations using data.table. Suppose you have two data.tables: a table of insurance policies

```r
library(data.table)

policies <- data.table(
  PolicyNumber = c(1, 2, 3),
  EffectiveDate = as.Date(c("2012-1-1", "2012-1-1", "2012-7-1")),
  ExpirationDate = as.Date(c("2012-12-31", "2012-6-30", "2012-12-31"))
)

policies
##    PolicyNumber EffectiveDate ExpirationDate
## 1:            1    2012-01-01     2012-12-31
## 2:            2    2012-01-01     2012-06-30
## 3:            3    2012-07-01     2012-12-31
```

and a table of insurance claims.
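The excerpt stops before the claims table appears, so the sketch below assumes a hypothetical one (`ClaimNumber`, `PolicyNumber`, and `ClaimCost` with invented values) to show what two basic joins on `PolicyNumber` might look like. It is an illustration under those assumptions, not the original post's example.

```r
library(data.table)

policies <- data.table(
  PolicyNumber = c(1, 2, 3),
  EffectiveDate = as.Date(c("2012-1-1", "2012-1-1", "2012-7-1")),
  ExpirationDate = as.Date(c("2012-12-31", "2012-6-30", "2012-12-31"))
)

# Hypothetical claims table: columns and values are assumptions
claims <- data.table(
  ClaimNumber = c(123, 124, 125),
  PolicyNumber = c(1, 1, 3),
  ClaimCost = c(100, 2400, 350)
)

# Inner join: keeps only policies that have at least one claim
inner <- merge(policies, claims, by = "PolicyNumber")

# Left join from policies: every policy is kept, and
# claim columns are NA for policies without claims
left <- claims[policies, on = "PolicyNumber"]
```

Here policy 2 has no claims, so it drops out of `inner` but appears in `left` with `NA` in the claim columns.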

# R – Introduction to Factors Tutorial

A factor variable (commonly called a categorical variable outside of R) is a variable that takes on a limited set of values. For example, a vector that stores days of the week {Sunday, Monday, etc.} or colors from the set {Red, Blue, Green} should be encoded as a factor. By contrast, a vector of person names {Bill, Sue, Jane, …} should generally be designated as a character vector since there is an unlimited set of possible names a person can take.
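As a small sketch of the distinction, the snippet below encodes days of the week as a factor with a fixed set of levels and keeps person names as a plain character vector. The variable names `days` and `person_names` and the sample values are chosen only for illustration.

```r
# Days of the week come from a fixed set, so a factor fits
days <- factor(
  c("Monday", "Sunday", "Monday", "Friday"),
  levels = c("Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday")
)

levels(days)      # all seven allowed values, even unused ones
table(days)       # counts per level, including zeros
as.integer(days)  # underlying level codes: 2 1 2 6

# Person names have no fixed set of values, so keep them as characters
person_names <- c("Bill", "Sue", "Jane")
is.character(person_names)
```

Declaring the levels up front means values that never occur in the data, such as Tuesday here, still show up in summaries with a count of zero.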

# A Concrete Example Of What An Actuary Does

When I was in high school, I knew I wanted to pursue a career involving math. I did an internship working for some mechanical engineers at an oil platform consulting company, but I never witnessed my mentors do more than basic geometry or algebra. That’s when I started looking into actuarial science. It sounded like a more challenging and stimulating career for me. Problem was, I was having a hard time understanding exactly what an actuary does.

# Magic Behind Constructing A Decision Tree

Before we get started, I need to clarify something. Theoretical decision trees can have two or more branches protruding from a single node. However, this can be computationally expensive, so most implementations of decision trees only allow binary splits. Recall our example problem: Bill is a user on our online dating site and we want to build a decision tree that predicts whether he would message a certain woman. (If so, we say that she’s “date-worthy.”)

# Introduction To Decision Trees

A decision tree is a model that uses a set of criteria to classify something. Suppose you tell your single friend Bill to go out with your new friend Sally. Since Bill has never met Sally, he asks you a series of questions.

Bill: How far from me does she live?

You: 15 miles

Bill: How tall is she?

You: 5’6”

Bill: Does she have a college degree?

You: No
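One way to picture such a series of questions is as nested yes/no checks. The sketch below is purely hypothetical: the function name `would_date`, the 20-mile and 60-inch thresholds, and the order of the questions are invented for illustration and say nothing about Bill's actual preferences.

```r
# Hypothetical rules: thresholds and outcomes are invented for illustration
would_date <- function(distance_miles, height_inches, has_degree) {
  if (distance_miles > 20) {
    return(FALSE)   # first split: lives too far away
  }
  if (height_inches < 60) {
    return(FALSE)   # second split: height below the made-up cutoff
  }
  has_degree        # final split: decided by the degree answer
}

# Sally's answers from the conversation: 15 miles, 5'6" (66 inches), no degree
would_date(distance_miles = 15, height_inches = 66, has_degree = FALSE)
```

Each `if` acts as one node of the tree, and each returned value acts as a leaf. A tree with different made-up rules could classify the same answers differently.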