3 points by tpmurray97 Sep 25, 2023 | 1 comment

Hello,

I was reading the cross-entropy definition from the Lokad website, and I am having trouble understanding how to use cross-entropy loss in demand forecasting.

https://www.lokad.com/cross-entropy-definition/

I am familiar with cross-entropy loss in classification problems, where

$\text{Loss} = -\sum_{i=1}^{n} y_i \log \hat{y}_{i}$

and $n$ is the number of classes we have. In this case, $y_i$ would always be either 0 or 1, and $\hat{y}_{i}$ is the softmax probability for the $i$-th class. Usually, we average the cross-entropy loss across all data points and adjust the weights to minimize that average with gradient descent.
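For concreteness, here is a minimal Python sketch of the classification loss described above. The logits and the one-hot target are made-up values for illustration only; with a one-hot target, the sum collapses to minus the log of the probability assigned to the true class.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y_true, y_prob):
    """Loss = -sum_i y_i * log(y_hat_i); terms with y_i == 0 contribute nothing."""
    return -sum(y * math.log(p) for y, p in zip(y_true, y_prob) if y > 0)

# Illustrative values (not from any real model): 3 classes, true class is index 0.
probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy([1, 0, 0], probs)
print(loss)
```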

But how do we formulate this in a forecasting problem? Would our classes be all the non-negative integers in $[0,\infty)$, with every $y_i$ equal to 0 except for the actual demand value, which would be 1?

In the article, it is mentioned that Lokad collected empirical data which supports the claim that Cross Entropy is usually the most efficient metric to optimize, rather than MSE, MAPE, CRPS, etc. Is it possible to view that data?

Thanks,
Tom Murray

vermorel Oct 09, 2023

Cross-entropy is merely a variant of the likelihood in probability theory. Cross-entropy works on any probability distribution as long as a density function is available. See, for example, https://docs.lokad.com/reference/jkl/loglikelihood.negativebinomial/

If you can produce a parametric probability density, then, putting pathological situations aside, you can regress its parameters through differentiable programming. See fleshed-out examples at https://www.lokad.com/tv/2023/1/11/lead-time-forecasting/
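To make this concrete, here is a minimal Python sketch, not Lokad's implementation. It assumes a negative binomial parameterized by its mean `mu` and a dispersion `r` (the parameterization of the Envision function linked above may differ), a made-up demand history, and finite-difference gradients standing in for the automatic differentiation that a differentiable programming stack would provide. With a one-hot target on the observed demand, the cross-entropy of one observation reduces to minus the log-probability the model assigns to that value, so no explicit enumeration of the classes is needed.

```python
import math

def nb_log_pmf(k, mu, r):
    """Log-probability of observing k units under a negative binomial
    with mean mu and dispersion r (variance = mu + mu**2 / r)."""
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))

def mean_nll(theta, demand):
    """Average cross-entropy (negative log-likelihood) over the observations.
    theta holds (log mu, log r) so that both parameters stay positive."""
    mu, r = math.exp(theta[0]), math.exp(theta[1])
    return -sum(nb_log_pmf(k, mu, r) for k in demand) / len(demand)

def fit(demand, steps=2000, lr=0.1, eps=1e-5):
    """Plain gradient descent; gradients come from central finite differences."""
    theta = [0.0, 0.0]  # start at mu = 1, r = 1
    for _ in range(steps):
        grad = []
        for j in range(2):
            hi = list(theta)
            hi[j] += eps
            lo = list(theta)
            lo[j] -= eps
            grad.append((mean_nll(hi, demand) - mean_nll(lo, demand)) / (2 * eps))
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return math.exp(theta[0]), math.exp(theta[1])

demand = [0, 2, 1, 4, 0, 3, 7, 1, 2, 0, 5, 1]  # hypothetical daily demand
mu_hat, r_hat = fit(demand)
initial_nll = mean_nll([0.0, 0.0], demand)
fitted_nll = mean_nll([math.log(mu_hat), math.log(r_hat)], demand)
print(mu_hat, r_hat, initial_nll, fitted_nll)
```

In a real setting, `mu` and `r` would typically be outputs of a model driven by features (seasonality, price, and so on) rather than two free constants, but the loss being minimized stays the same.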

> In the article, it is mentioned that Lokad collected empirical data which supports the claim that Cross Entropy is usually the most efficient metric to optimize, rather than MSE, MAPE, CRPS, etc. Is it possible to view that data?

No, unfortunately, for two major reasons.

First, Lokad has strict NDAs in place with all our client companies. We do not share anything, not even derivative data, without the consent of all the parties involved.

Second, this claim should be understood from the perspective of the experimental optimization paradigm, which is (most likely) not what you think. See https://www.lokad.com/tv/2021/3/3/experimental-optimization/

Hope it helps,
Joannes