The purpose of this article is to hold your hand through the process of designing and training a neural network. Note that this article is Part 2 of Introduction to Neural Networks. R code for this tutorial is provided here in the Machine Learning Problem Bible. Description of the problem We start with a motivational problem. We have a collection of 2x2 grayscale images. We’ve identified each image as having a “stairs” like pattern or not.
In this tutorial you will learn how to define geometries (points, lines, polygons) plot those geometries execute spatial joins (which points are contained in a polygon?) get the distance between a set of points do all of the above within the context of geospatial data (e.g. cities, roads, counties) Important This tutorial is based on sf version 0.5-3 and ggplot2 version 126.96.36.1990. The Basics To get started we need to learn how to define and operate on abstract geometries without a coordinate reference system (CRS).
If linear regression was a Toyota Camry, then gradient boosting would be a UH-60 Blackhawk Helicopter. A particular implementation of gradient boosting, XGBoost, is consistently used to win machine learning competitions on Kaggle. Unfortunately many practitioners (including my former self) use it as a black box. It’s also been butchered to death by a host of drive-by data scientists’ blogs. As such, the purpose of this article is to lay the groundwork for classical gradient boosting, intuitively and comprehensively.
Introduction Stacking (also called meta ensembling) is a model ensembling technique used to combine information from multiple predictive models to generate a new model. Often times the stacked model (also called 2nd-level model) will outperform each of the individual models due its smoothing nature and ability to highlight each base model where it performs best and discredit each base model where it performs poorly. For this reason, stacking is most effective when the base models are significantly different.
The Problem You sell software that helps stores manage their inventory. You collect leads on thousands of potential customers, and your strategy is to cold-call them and pitch your product. You can only make 100 phone calls per day, so you want to identify leads with a high probability of converting to a sale. By calling leads randomly, you only generate about two sales per day - a 2% hit ratio.
In three months (as of June 2016) the New Orleans Saints will play a football game against the Atlanta Falcons. I want to know who will win. I ask my friend and he says the Saints. Technically this is a predictive model, but it’s probably not worth much. I can improve upon this model by asking other people who they think will win. Someone might pick the Saints because we have a better quarterback.