In June 2016, Grupo Bimbo challenged Kagglers to forecast inventory demand from historical sales data for over 100 products supplied to more than a million stores across 45,000 routes in Mexico. The competition ran for three months on Kaggle and drew 1,969 teams.
I finished 58th, placing in the top 3%.
This competition was particularly challenging for two reasons: 1) the massive training dataset (roughly 80 million rows) and 2) the nature of forecasting, which requires careful model construction so that features built from past sales don't leak future information. My best model used Tianqi Chen's very popular gradient boosting implementation, XGBoost. Its parallel tree construction let me iterate through multiple feature ideas before settling on about 60 for my final model, which took roughly five hours to train on my MacBook Pro (16 GB RAM).