vermorel 6 weeks ago

The M5 forecasting competition was notable on several fronts:

  • It uses a sizeable real-world store-level sales dataset provided by Walmart. With 40k SKUs, it is - to date - the largest publicly accessible retail sales dataset.
  • It features a probabilistic perspective, letting the participants compete over a series of quantile estimates. To date, it's the only forecasting competition whose scoring criterion targets neither the average nor the median.
  • With 1,137 participants, it was a very sizeable event. To date, I don't know of any other forecasting competition that has even approached this level of participation.
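
For readers unfamiliar with quantile scoring, here is a minimal sketch of the pinball loss, the usual building block for scoring a quantile estimate. As far as I recall, the M5 used a weighted and scaled variant of it; the function name and the toy numbers below are purely illustrative and not taken from the competition.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average pinball (quantile) loss at quantile level tau, 0 < tau < 1.

    Under-forecasts (actual above the estimate) are penalized by tau,
    over-forecasts by (1 - tau), so a high tau rewards estimates that
    sit above most of the observed values.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for actual, estimate in zip(y_true, y_pred):
        diff = actual - estimate
        # diff >= 0: under-forecast, weight tau; otherwise weight (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


# Toy daily unit sales (hypothetical) and a flat estimate of 1 unit per day.
sales = [0, 2, 5, 1]
estimate = [1, 1, 1, 1]
loss_q90 = pinball_loss(sales, estimate, tau=0.9)
loss_q50 = pinball_loss(sales, estimate, tau=0.5)
```

At tau = 0.5 this reduces to half the mean absolute error, which is why a median forecast is just one special case of the quantile perspective.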

The findings are not overly surprising: gradient boosted trees and deep learning models - which dominate the vast majority of Kaggle competitions - end up dominating the M5 as well.

Caveat emptor: those models are dramatically greedy in terms of computing resources. For a retail network of the scale of Walmart, I would not recommend those classes of machine learning models.