Logistic regression is a generalized linear model most commonly used for binary classification. Its output is a continuous value between 0 and 1 (commonly representing the probability of some event occurring), and its inputs can be any mix of real-valued and discrete predictors.
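To make the "between 0 and 1" claim concrete, here is a minimal sketch of how a linear predictor is squashed into a probability by the logistic function. The predictor `x` and the coefficient values are hypothetical, chosen only for illustration; they are not fitted to any data.

```r
# Hypothetical coefficients (assumptions for illustration, not fitted values)
intercept <- -4
slope <- 0.1

# A hypothetical real-valued predictor
x <- c(20, 40, 60, 80)

# The linear predictor can be any real number...
linear_predictor <- intercept + slope * x

# ...but the logistic function maps it into the open interval (0, 1).
# plogis(z) is base R's implementation of 1 / (1 + exp(-z)).
prob <- plogis(linear_predictor)

print(data.frame(x, linear_predictor, prob))
```

With these made-up coefficients, the linear predictor is 0 at `x = 40`, so the modeled probability there is exactly 0.5, and it rises toward 1 as `x` grows.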
Motivating Problem
Suppose you want to predict the probability someone is a homeowner based solely on their age. You might have a dataset like

R’s rpart package provides a powerful framework for growing classification and regression trees. To see how it works, let’s get started with a minimal example.
Motivating Problem
First let’s define a problem. There’s a common scam amongst motorists whereby a person will slam on his brakes in heavy traffic with the intention of being rear-ended. The person will then file an insurance claim for personal injury and damage to his vehicle, alleging that the other driver was at fault.

Rolling joins are commonly used for analyzing data involving time. A simple example: suppose you have a table of product sales and a table of commercials. You might want to associate each product sale with the most recent commercial that aired prior to the sale. In this case, you cannot do a basic join between the sales table and the commercials table because the sales were not tracked with a CommercialId attribute.
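The sales-and-commercials idea can be sketched with data.table's `roll` argument. The tables, column names, and dates below are hypothetical stand-ins invented for this sketch; only the `CommercialId` name comes from the text above.

```r
library(data.table)

# Hypothetical sales and commercials (made-up data for illustration)
sales <- data.table(
  SaleId = 1:3,
  SaleDate = as.Date(c("2014-01-05", "2014-01-20", "2014-02-10"))
)
commercials <- data.table(
  CommercialId = 1:2,
  AirDate = as.Date(c("2014-01-01", "2014-01-15"))
)

# Give both tables a shared column to join on, keeping the originals intact
sales[, JoinDate := SaleDate]
commercials[, JoinDate := AirDate]
setkey(sales, JoinDate)
setkey(commercials, JoinDate)

# For each sale, roll the most recent commercial forward to the sale date
result <- commercials[sales, roll = TRUE]
print(result[, .(SaleId, SaleDate, CommercialId, AirDate)])
```

Two details are worth noting. `roll = TRUE` also matches a commercial that aired on the same date as the sale, so a strictly "prior" rule would need extra handling. A sale that precedes every commercial gets `NA` for `CommercialId`.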

R’s data.table package provides fast methods for handling large tables of data with concise syntax. The following is an introduction to basic join operations using data.table.
Suppose you have two data.tables – a table of insurance policies
library(data.table)

policies <- data.table(
  PolicyNumber = c(1, 2, 3),
  EffectiveDate = as.Date(c("2012-1-1", "2012-1-1", "2012-7-1")),
  ExpirationDate = as.Date(c("2012-12-31", "2012-6-30", "2012-12-31"))
)
policies
##    PolicyNumber EffectiveDate ExpirationDate
## 1:            1    2012-01-01     2012-12-31
## 2:            2    2012-01-01     2012-06-30
## 3:            3    2012-07-01     2012-12-31

and a table of insurance claims.

A factor variable (commonly called a categorical variable outside of R) is a variable that takes on a limited set of values. For example, a vector that stores days of the week {Sunday, Monday, etc.} or colors from the set {Red, Blue, Green} should be encoded as a factor. By contrast, a vector of person names {Bill, Sue, Jane, …} should generally be designated as a character vector since there is an unlimited set of possible names a person can take.
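As a small sketch of that distinction in base R (the sample values are made up for illustration):

```r
# Days of the week come from a fixed set, so encode them as a factor.
# Listing the levels explicitly fixes both the allowed values and their order.
day_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")
days <- factor(c("Monday", "Sunday", "Monday"), levels = day_levels)

levels(days)       # all seven allowed values, even those not observed
table(days)        # counts per level, including zeros for unobserved days
as.integer(days)   # underlying integer codes that index into the levels

# Person names have no fixed set of values, so keep them as characters.
person_names <- c("Bill", "Sue", "Jane")
is.factor(days)              # TRUE
is.character(person_names)   # TRUE
```

Because the levels are declared up front, `table(days)` reports a count of zero for days that never appear, which a plain character vector would not do.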

When I was in high school, I knew I wanted to pursue a career involving math. I did an internship working for some mechanical engineers at an oil platform consulting company, but I never witnessed my mentors do more than basic geometry or algebra. That’s when I started looking into actuarial science. It sounded like a more challenging and stimulating career for me. Problem was, I was having a hard time understanding exactly what an actuary does.