[ENH] Add Lambert W x F distributions #186

Open
gmgeorg opened this issue Jan 28, 2024 · 4 comments
Labels
feature request New feature or request implementing algorithms Implementing algorithms, estimators, objects native to skpro module:probability&simulation probability distributions and simulators

Comments

@gmgeorg

gmgeorg commented Jan 28, 2024

Is your feature request related to a problem? Please describe.

For modeling skewed and/or heavy-tailed distributions, I'd like to have support for Lambert W x F distributions. Beyond modeling, Lambert W x F distributions also allow one to "Gaussianize" the observed data.

This is especially useful for financial time series data, which are often skewed and/or heavy-tailed.
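A minimal NumPy/SciPy sketch of what "Gaussianizing" means, assuming the heavy-tailed Lambert W x Gaussian variant with a single tail parameter `delta >= 0` (chosen here because its transform is strictly increasing, hence bijective). The function names are hypothetical and are not the pylambertw or torchlambertw API; only `scipy.special.lambertw` is a real library call. Data are generated by pushing a standard normal `u` through `z = u * exp(delta * u**2 / 2)` and rescaling; the inverse uses the Lambert W function to map observations back to the latent Gaussian scale.

```python
import numpy as np
from scipy.special import lambertw


def heavy_tail_transform(u, delta):
    """Forward map u -> u * exp(delta * u**2 / 2); delta > 0 fattens the tails."""
    u = np.asarray(u, dtype=float)
    return u * np.exp(0.5 * delta * u**2)


def w_delta(z, delta):
    """Inverse of heavy_tail_transform for delta >= 0 (strictly increasing, so bijective)."""
    z = np.asarray(z, dtype=float)
    if delta == 0:
        return z.copy()
    # delta * z**2 >= 0, so the principal branch of Lambert W is real-valued here.
    return np.sign(z) * np.sqrt(np.real(lambertw(delta * z**2)) / delta)


def sample_lambertw_gaussian(n, mu=0.0, sigma=1.0, delta=0.1, seed=None):
    """Draw Y = mu + sigma * Z with Z = U * exp(delta * U**2 / 2), U ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n)
    return mu + sigma * heavy_tail_transform(u, delta)


def gaussianize(y, mu, sigma, delta):
    """Back-transform observed data to the latent Gaussian scale."""
    return mu + sigma * w_delta((np.asarray(y, dtype=float) - mu) / sigma, delta)


y = sample_lambertw_gaussian(10_000, mu=1.0, sigma=2.0, delta=0.2, seed=0)
x = gaussianize(y, mu=1.0, sigma=2.0, delta=0.2)
```

With the true `mu`, `sigma` and `delta`, `x` is an exact draw from N(1, 2²); in practice the parameters have to be estimated first, which this sketch does not attempt.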

Describe the solution you'd like

This exists in the LambertW R package and the pylambertw Python module, which is an sklearn transformer/estimator wrapper around torchlambertw.

Describe alternatives you've considered

Other heavy-tailed distributions; but none of the typical ones offer the same ease of interpretation of the heavy-tail parameter, the input/output system view of the transformation, and a bijective back-transformation.

Additional context

I'd be happy to open a PR to implement a first version of Lambert W x Gaussian distributions, but would like some guidance/pointers on best practices for skpro.

@gmgeorg gmgeorg added the feature request New feature or request label Jan 28, 2024
@fkiraly
Collaborator

fkiraly commented Jan 28, 2024

Very interesting. For anyone looking for a mathematical reference, the Annals article is available on arXiv: https://arxiv.org/abs/0912.4554.

I am intrigued; please confirm whether I understand this correctly:

  • Lambert W transformed distributions are actually dependent objects, i.e., the Lambert W transform of a distribution D, which makes them compositional
  • the practical intention behind them is to make distributions more normal for modelling, since normality is a common assumption in machine learning

If I understand correctly, there are also multiple related "objects":

  • the transformation itself, operating on empirical samples (matrices)
  • the distribution depending on another distribution that gets transformed
  • a regressor that applies the sample transformation to the data and inverts the transformation on the predicted distribution

The last one especially is related to the "transformed distribution" proposed in #30.
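The first of these objects, a transformation acting on a data matrix, might look roughly like the sketch below. It is hypothetical: `LambertWGaussianizer` only borrows sklearn's `fit`/`transform`/`inverse_transform` naming (it does not subclass sklearn), `delta` is assumed known and shared across columns, and only location and scale are fitted per column. Because the heavy-tail Lambert W x Gaussian is symmetric and its transform is monotone in `|u|`, the median recovers `mu` and the median absolute deviation equals `sigma * q * exp(delta * q**2 / 2)`, with `q` the standard normal 0.75 quantile; this gives a consistent scale estimate once `delta` is fixed.

```python
import numpy as np
from scipy.special import lambertw
from scipy.stats import norm


class LambertWGaussianizer:
    """Hypothetical column-wise heavy-tail Lambert W x Gaussian transformer.

    delta is fixed by the user; loc_ and scale_ are fitted per column.
    """

    def __init__(self, delta=0.1):
        self.delta = delta

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.loc_ = np.median(X, axis=0)
        mad = np.median(np.abs(X - self.loc_), axis=0)
        q = norm.ppf(0.75)  # median of |U| for U ~ N(0, 1)
        # For this symmetric family, MAD = scale * q * exp(delta * q**2 / 2).
        self.scale_ = mad / (q * np.exp(0.5 * self.delta * q**2))
        return self

    def transform(self, X):
        """Gaussianize: map observed data to the latent Gaussian scale."""
        Z = (np.asarray(X, dtype=float) - self.loc_) / self.scale_
        if self.delta == 0:
            U = Z
        else:
            U = np.sign(Z) * np.sqrt(np.real(lambertw(self.delta * Z**2)) / self.delta)
        return self.loc_ + self.scale_ * U

    def inverse_transform(self, X):
        """Apply the forward heavy-tail transform to latent-scale data."""
        U = (np.asarray(X, dtype=float) - self.loc_) / self.scale_
        return self.loc_ + self.scale_ * U * np.exp(0.5 * self.delta * U**2)


# Three columns with different locations/scales and a shared delta = 0.2.
rng = np.random.default_rng(1)
U = rng.standard_normal((20_000, 3))
X = np.array([0.0, 5.0, -2.0]) + np.array([1.0, 2.0, 0.5]) * U * np.exp(0.1 * U**2)
gz = LambertWGaussianizer(delta=0.2).fit(X)
X_gauss = gz.transform(X)
```

Estimating `delta` itself (as the LambertW R package does) is deliberately left out; it is the part where the real work lies.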

@fkiraly
Collaborator

fkiraly commented Jan 28, 2024

I'd be happy to open a PR to implement a first version of Lambert W x Gaussian distributions, but would like some guidance/pointers on best practices for skpro.
Thanks, that would be nice!

skpro generally follows sklearn extension patterns. The distribution extension contract is not well documented at the moment, since it is still maturing; however, you could look at the classes in distributions, where all methods have proper docstrings. Perhaps Normal is the best template for now.

The one thing to note, perhaps, is that distributions are of matrix/table shape, i.e., a matrix/table with distributions (possibly dependent but usually independent) as entries. This is because in tabular probabilistic regression, this object is the output.

Questions:

  • would it not be nicer to have Lambert W x any distribution? Or are the transforms of Gaussians more explicit than in the arbitrary case?
    • this would be representable in the interface: it would be a distribution that takes another distribution as an argument.
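On the Gaussian-versus-arbitrary question, a short change-of-variables derivation suggests that, at least for the heavy-tailed variant (tail parameter $\delta \ge 0$, assumed here), nothing Gaussian-specific is needed: the cdf and pdf of the transformed variable only require the cdf and pdf of the input distribution plus the Lambert W function $W$. Here $\mu$ and $\sigma$ denote a location and scale of the input $X \sim F_X$.

```latex
% Heavy-tail Lambert W x F construction (delta >= 0), for any input X ~ F_X
U = \frac{X - \mu}{\sigma}, \qquad
Z = U \exp\!\left(\tfrac{\delta}{2} U^2\right), \qquad
Y = \mu + \sigma Z

% z -> u is strictly increasing; its inverse is given by the Lambert W function
W_\delta(z) = \operatorname{sign}(z) \sqrt{\frac{W(\delta z^2)}{\delta}}, \qquad
\frac{d W_\delta(z)}{dz} = \frac{W_\delta(z)}{z \left(1 + W(\delta z^2)\right)} \quad (z \neq 0), \qquad
\left.\frac{d W_\delta(z)}{dz}\right|_{z = 0} = 1

% Hence, with z = (y - \mu)/\sigma and x(y) = \mu + \sigma W_\delta(z):
F_Y(y) = F_X\big(x(y)\big), \qquad
f_Y(y) = f_X\big(x(y)\big) \, \frac{W_\delta(z)}{z \left(1 + W(\delta z^2)\right)}
```

For the skewed variant $Z = U \exp(\gamma U)$, the map is not monotone on the whole real line (its derivative $e^{\gamma u}(1 + \gamma u)$ vanishes at $u = -1/\gamma$), so inverting it requires choosing a branch of $W$; that is where case distinctions might become necessary.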

@fkiraly fkiraly added module:probability&simulation probability distributions and simulators implementing algorithms Implementing algorithms, estimators, objects native to skpro labels Jan 28, 2024
@gmgeorg
Author

gmgeorg commented Jan 28, 2024

@fkiraly yes to all your points in first reply.

re 2nd: yes, implementing Lambert W x Gaussian shouldn't be much different from just implementing a Lambert W x F abstract class and then inheriting/setting base_distribution=Gaussian. This is what I ended up doing for the torchlambertw as well as the xgboostlss implementations.

I need to get more familiar with skpro first to see how this would actually work in this framework. I will take a look and see whether I run into any issues implementing the generic LambertWDistribution class first, with LambertWGaussian, LambertWExponential, etc. as special cases.
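The inheritance-based design described here could be sketched as follows, under several assumptions: the class names are hypothetical, the base distributions are standardized `scipy.stats` distributions rather than skpro objects, only the heavy-tailed variant with parameter `delta` is covered, and none of this follows skpro's actual distribution extension contract. Array-valued `loc`, `scale` and `delta` are broadcast so that one object represents a table of independent distributions, mirroring the matrix/table shape of skpro distributions mentioned above.

```python
import numpy as np
from scipy import stats
from scipy.special import lambertw


class LambertWDistribution:
    """Hypothetical generic heavy-tail Lambert W x F distribution (sketch only).

    Subclasses set ``base_distribution`` to a standardized scipy.stats
    distribution. loc, scale and delta may be arrays; they are broadcast to a
    common shape, so one object represents a table of independent entries.
    """

    base_distribution = None  # set by subclasses

    def __init__(self, loc, scale, delta):
        self.loc, self.scale, self.delta = np.broadcast_arrays(
            np.asarray(loc, dtype=float),
            np.asarray(scale, dtype=float),
            np.asarray(delta, dtype=float),
        )

    def _latent(self, y):
        """Map observed y to the latent standardized input u = W_delta(z)."""
        z = (np.asarray(y, dtype=float) - self.loc) / self.scale
        safe_delta = np.where(self.delta > 0, self.delta, 1.0)  # avoids 0-division
        u = np.sign(z) * np.sqrt(np.real(lambertw(safe_delta * z**2)) / safe_delta)
        return np.where(self.delta > 0, u, z)  # delta == 0 means no transform

    def cdf(self, y):
        # The transform is strictly increasing, so probabilities carry over.
        return self.base_distribution.cdf(self._latent(y))

    def pdf(self, y):
        u = self._latent(y)
        # Jacobian du/dz = 1 / (exp(delta u^2 / 2) (1 + delta u^2)); dz/dy = 1 / scale.
        du_dz = 1.0 / (np.exp(0.5 * self.delta * u**2) * (1.0 + self.delta * u**2))
        return self.base_distribution.pdf(u) * du_dz / self.scale

    def sample(self, n, seed=None):
        rng = np.random.default_rng(seed)
        u = self.base_distribution.rvs(size=(n,) + self.loc.shape, random_state=rng)
        return self.loc + self.scale * u * np.exp(0.5 * self.delta * u**2)


class LambertWGaussian(LambertWDistribution):
    base_distribution = stats.norm


class LambertWLaplace(LambertWDistribution):  # another hypothetical special case
    base_distribution = stats.laplace


dist = LambertWGaussian(
    loc=[[0.0, 1.0], [2.0, -1.0]],
    scale=[[1.0, 2.0], [0.5, 1.0]],
    delta=[[0.0, 0.1], [0.2, 0.3]],
)
table_cdf = dist.cdf(np.full((2, 2), 0.7))  # one cdf value per table entry
```

A subclass only has to pin `base_distribution`; `cdf` carries over because the transform is strictly increasing, and `pdf` picks up the Jacobian, which equals 1 when `delta = 0`.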

@fkiraly
Collaborator

fkiraly commented Jan 28, 2024

shouldn't be much different from just implementing a Lambert W x F abstract class and then inheriting/setting base_distribution=Gaussian

I see!

I need to get more familiar with skpro first to see how this would actually work in this framework.

I would recommend looking at distributions.normal for an example. We have not gotten round to writing an extension template, but I hope the structure is self-explanatory.

Will take a look at this and see if I run into any issues trying to implement the generic LambertWDistribution class first, with LambertWGaussian, LambertWExponential

The way I imagined it would be something along the lines of:

any_inner_dist = InnerDist(a=a_arr, b=b_arr)
lambert_trafo_dist = LambertW(any_inner_dist, gamma=0.5)

That is, any distribution can be taken as an argument of LambertW - what is passed is the actual distribution, not a string.

In the example, InnerDist could be Gaussian or Laplace or anything else, and it provides the methods that all distributions have. Do you think it can be implemented in this high degree of generality, or do we need to make case distinctions for inner distributions, e.g., due to limitations in our knowledge of the explicit form of distribution generating functions?
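One way to probe this question is a composition-style sketch mirroring the `LambertW(any_inner_dist, ...)` call above, with `inner` any frozen `scipy.stats` distribution. Again this is hypothetical: `LambertW` here is not an existing skpro class, it uses the heavy-tail parameter `delta` instead of `gamma`, and it standardizes with the inner distribution's mean and standard deviation by default (which must therefore exist, or `loc`/`scale` must be passed explicitly). The loop at the end checks, for four quite different inner distributions, that integrating the pdf over an interval reproduces the cdf difference.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad
from scipy.special import lambertw


class LambertW:
    """Hypothetical composition-style wrapper: LambertW(inner, delta=...).

    ``inner`` is any frozen scipy.stats distribution. Standardization uses
    inner.mean() and inner.std() unless loc/scale are given (an assumption).
    """

    def __init__(self, inner, delta, loc=None, scale=None):
        if delta < 0:
            raise ValueError("this sketch only covers the heavy-tail case delta >= 0")
        self.inner = inner
        self.delta = float(delta)
        self.loc = inner.mean() if loc is None else loc
        self.scale = inner.std() if scale is None else scale

    def _to_input(self, y):
        """Return (x, u): input-scale value x = loc + scale * u, with u = W_delta(z)."""
        z = (np.asarray(y, dtype=float) - self.loc) / self.scale
        if self.delta == 0:
            u = z
        else:
            u = np.sign(z) * np.sqrt(np.real(lambertw(self.delta * z**2)) / self.delta)
        return self.loc + self.scale * u, u

    def cdf(self, y):
        x, _ = self._to_input(y)
        return self.inner.cdf(x)

    def pdf(self, y):
        x, u = self._to_input(y)
        # Change of variables: dx/dy = du/dz = 1 / (exp(delta u^2 / 2) (1 + delta u^2)).
        jacobian = 1.0 / (np.exp(0.5 * self.delta * u**2) * (1.0 + self.delta * u**2))
        return self.inner.pdf(x) * jacobian

    def rvs(self, size, random_state=None):
        x = self.inner.rvs(size=size, random_state=random_state)
        u = (x - self.loc) / self.scale
        return self.loc + self.scale * u * np.exp(0.5 * self.delta * u**2)


# Generality check across quite different inner distributions: the pdf
# integrated over [0.5, 4] should match the cdf difference.
for inner in [stats.norm(1, 2), stats.laplace(0, 1), stats.t(df=5), stats.gamma(a=3)]:
    dist = LambertW(inner, delta=0.2)
    mass, _ = quad(dist.pdf, 0.5, 4.0)
    print(inner.dist.name, np.isclose(mass, dist.cdf(4.0) - dist.cdf(0.5)))
```

In this sketch only `cdf`, `pdf`, `rvs`, `mean` and `std` of the inner distribution are used, which suggests that for the heavy-tail variant no case distinctions are needed beyond the existence of a location and scale.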

Projects
None yet
Development

No branches or pull requests

2 participants