
R – Introduction to Factors Tutorial

A factor variable (commonly called a categorical variable outside of R) is a variable that takes on a limited set of values. For example, a vector that stores days of the week {Sunday, Monday, etc.} or colors from the set {Red, Blue, Green} should be encoded as a factor. By contrast, a vector of person names {Bill, Sue, Jane, …} should generally be stored as a character vector, since the set of possible names a person can have is effectively unlimited.
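As a minimal sketch of the distinction above, the snippet below encodes a small vector of colors as a factor and leaves a vector of names as plain character data. The variable names (`colors`, `color_factor`, `person_names`) and the sample values are illustrative assumptions, not taken from the tutorial itself.

```r
# Illustrative sample data (assumed for this sketch)
colors <- c("Red", "Blue", "Red", "Green", "Blue")

# Encode as a factor, stating the limited set of allowed values explicitly
color_factor <- factor(colors, levels = c("Red", "Blue", "Green"))

# The levels are the distinct categories the factor may take
print(levels(color_factor))

# Internally, each element is stored as an integer index into the levels
print(as.integer(color_factor))

# Counts per category, in level order
print(table(color_factor))

# Names have no fixed set of values, so they stay a character vector
person_names <- c("Bill", "Sue", "Jane")
print(is.character(person_names))
```

Listing the levels explicitly is a design choice: it fixes their order and keeps a category such as Green in the set even when it is absent from a particular sample.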

A Concrete Example Of What An Actuary Does

When I was in high school, I knew I wanted to pursue a career involving math. I did an internship working for some mechanical engineers at an oil platform consulting company, but I never saw my mentors do more than basic geometry or algebra. That’s when I started looking into actuarial science. It sounded like a more challenging and stimulating career for me. The problem was, I was having a hard time understanding exactly what an actuary does.

Magic Behind Constructing A Decision Tree

Before we get started, I need to clarify something. Theoretical decision trees can have two or more branches protruding from a single node. However, this can be computationally expensive, so most implementations of decision trees only allow binary splits. Recall our example problem – Bill is a user on our online dating site, and we want to build a decision tree that predicts whether he would message a certain woman. (If so, we say that she’s “date-worthy.”)

Introduction To Decision Trees

A decision tree is a model that uses a set of criteria to classify something. Suppose you tell your single friend Bill to go out with your new friend Sally. Since Bill has never met Sally, he asks you a series of questions.

Bill: How far from me does she live?
You: 15 miles
Bill: How tall is she?
You: 5’6”
Bill: Does she have a college degree?
You: No
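One way to picture "a set of criteria" is as a chain of yes/no checks. The sketch below is purely hypothetical: the function name `is_date_worthy`, the order of the questions, and the thresholds (20 miles, 64 inches) are assumptions made for illustration, not Bill's actual preferences or a tree learned from data.

```r
# Hypothetical hand-written decision tree; all thresholds are assumed
is_date_worthy <- function(distance_miles, height_inches, has_degree) {
  # First criterion: too far away ends the decision immediately
  if (distance_miles > 20) {
    return(FALSE)
  }
  # Second criterion: a college degree is enough on its own
  if (has_degree) {
    return(TRUE)
  }
  # Final criterion: otherwise fall back on height
  height_inches >= 64
}

# Answers from the conversation: 15 miles away, 5'6" (66 inches), no degree
sally <- is_date_worthy(distance_miles = 15, height_inches = 66, has_degree = FALSE)
print(sally)
```

Each `if` corresponds to one node of the tree, and each early `return` to a leaf where the classification is made.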