It actually makes perfect sense if you think about what a zero-inflated lognormal method is meant to do.
Imagine a simple case where a customer has an LTV of $0 with 99% probability and an LTV of exactly $100 otherwise.
When we use a zero-inflated method for LTV, we are estimating the probability mass of zero LTV customers (classification) and we are estimating the conditional expected LTV for the non-zero LTV customers.
So in the case above, a perfect model would estimate that the customer has a 1% chance of a non-zero LTV and that, given a non-zero LTV, their expected LTV is $100.
But if we took only the regression output, we would say the expected LTV of our customers is $100, which is clearly not true: the true expected LTV is 0.01 × $100 = $1. We have to multiply the probability that the customer is non-zero by their expected LTV conditioned on being non-zero.
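As a quick sanity check of the toy example above, here is a minimal sketch that compares the analytic expectation with a simulated average. The names `expected_ltv` and `simulate_mean_ltv` are made up for illustration and are not part of the library.

```python
import random

P_NONZERO = 0.01         # probability of a non-zero LTV (from the example above)
LTV_IF_NONZERO = 100.0   # LTV in dollars given that it is non-zero (from the example above)


def expected_ltv(p_nonzero, ltv_if_nonzero):
    """E(y) = P(y > 0) * E(y | y > 0) for a non-negative y."""
    return p_nonzero * ltv_if_nonzero


def simulate_mean_ltv(n_customers, seed=0):
    """Draw n_customers from the toy distribution and return their average LTV."""
    rng = random.Random(seed)
    total = sum(
        LTV_IF_NONZERO if rng.random() < P_NONZERO else 0.0
        for _ in range(n_customers)
    )
    return total / n_customers


analytic = expected_ltv(P_NONZERO, LTV_IF_NONZERO)  # 0.01 * 100 dollars
empirical = simulate_mean_ltv(200_000)              # should land close to the analytic value
print(f"analytic: {analytic:.2f}, simulated: {empirical:.2f}")
```

The simulated average sits near $1, nowhere near the $100 that the regression output alone would suggest.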
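If we assume that y is non-negative, then by the law of total expectation we can see that:

E(y) = P(y = 0) · 0 + P(y > 0) · E(y | y > 0) = P(y > 0) · E(y | y > 0)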
Our model is essentially estimating P(y > 0) with the classification output and it is estimating E(y | y > 0) with the regression output.
So that is why we multiply the probability of non-zero LTV by the conditional expected LTV: the product is the true customer expected LTV that we care about, which is E(y).
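To make the multiplication concrete, here is a rough NumPy sketch of how a zero-inflated lognormal prediction could be assembled. It is not the code in the referenced file. It assumes a model head with three outputs per customer (a classification logit, a lognormal location `mu`, and a raw scale passed through softplus), and it uses the standard lognormal mean exp(mu + sigma²/2) for E(y | y > 0). The function name `ziln_expected_value` and the output layout are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softplus(x):
    # log(1 + exp(x)), computed stably
    return np.logaddexp(0.0, x)


def ziln_expected_value(outputs):
    """Combine the two heads into E(y) = P(y > 0) * E(y | y > 0).

    `outputs` has shape [..., 3]; the layout (logit, mu, raw_sigma) is an
    assumption of this sketch, not a documented library contract.
    """
    outputs = np.asarray(outputs, dtype=float)
    p_positive = sigmoid(outputs[..., 0])             # classification head: P(y > 0)
    mu = outputs[..., 1]                              # lognormal location
    sigma = softplus(outputs[..., 2])                 # lognormal scale, kept positive
    conditional_mean = np.exp(mu + 0.5 * sigma ** 2)  # regression head: E(y | y > 0)
    return p_positive * conditional_mean


# Two hypothetical customers: one unlikely to convert, one likely to convert.
preds = ziln_expected_value([[-4.0, 4.0, 0.0], [2.0, 4.0, 0.0]])
print(preds)
```

Both customers share the same conditional mean here, so the difference in their predicted LTV comes entirely from the classification head.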
Ref:
lifetime_value/lifetime_value/zero_inflated_lognormal.py
Line 35 in dd41896