In August 2016, Red Hat challenged data scientists to create a classification algorithm that accurately identifies which customers have the most potential business value for Red Hat based on their characteristics and activities. The competition was hosted by Kaggle, spanned two months, and included 1,462 teams.
I placed 10th (top 1%) – good enough to win a gold medal and, since I had already earned the other requirements (two silver medals), to reach Kaggle’s “Master” status.
For this competition, I broke into the top 20 pretty quickly even though I joined just a few weeks before the end. I leveraged my position to lure Mike Kim (a Kaggle Grandmaster ranked 11th in the world) onto my team. After ensembling our models together, we jumped into the top 10, reaching as high as 6th place on the leaderboard with a few days to go before falling to 10th at the end.
I used a number of tricks to get near the top, but perhaps the most effective was training my models on different subsets of the training data so that they’d generalize well to the test dataset.
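The general idea can be sketched as follows. This is a minimal illustration, not the competition code: the toy data, the plain gradient-descent logistic regression, and the subset fraction are all assumptions standing in for the real features and models; only the pattern of fitting separate models on different random slices of the training rows and averaging their predictions reflects the technique described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the competition data: 1,000 rows, 5 features,
# binary target (purely illustrative).
X = rng.normal(size=(1000, 5))
w_true = np.array([0.8, -0.5, 0.3, 0.0, 0.6])
y = (X @ w_true + rng.normal(scale=0.5, size=1000) > 0).astype(float)

def fit_logistic(X, y, lr=0.1, n_iter=500):
    """Plain gradient-descent logistic regression (stand-in for any model)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

# Train one model per random subset of the training rows.  Each model
# sees a different slice of the data, so averaging their predictions
# is less tied to the quirks of any single sample.
n_models, subset_frac = 5, 0.6
models = []
for seed in range(n_models):
    idx = np.random.default_rng(seed).choice(
        len(X), size=int(subset_frac * len(X)), replace=False
    )
    models.append(fit_logistic(X[idx], y[idx]))

# Average the per-model predicted probabilities on unseen rows.
X_test = rng.normal(size=(200, 5))
preds = np.mean([1.0 / (1.0 + np.exp(-(X_test @ w))) for w in models], axis=0)
print(preds.shape)  # (200,)
```

The same pattern extends naturally to non-random subsets (for example, splitting by a grouping column so no group appears in more than one model's training slice), which is often what makes it effective when train and test sets differ systematically.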