Replies: 8 comments 1 reply
-
Thanks for your interest in the project. You can derive any metric of interest from the predicted distribution, e.g., mean, variance, quantiles, etc. Probably the easiest would be to use the samples that are drawn from the predicted distribution. In the following example, it is assumed you have a trained xgblss model.

```python
# Number of samples to draw from the predicted distribution
# (make sure to set this to a high number in case quantiles are needed)
n_samples = 10000

# Sample from the predicted distribution
pred_samples = xgblss.predict(dtest, pred_type="samples", n_samples=n_samples, seed=123)

# Calculate mean
mean_pred = pred_samples.mean(axis=1)

# Calculate standard deviation
std_pred = pred_samples.std(axis=1)

# Calculate quantiles
quantile_pred = pred_samples.quantile(q=[0.1, 0.5, 0.9], axis=1).T
```

Let me know in case of open questions.
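As a small usage sketch (assuming the objects created above; the column names here are just illustrative), you could collect these quantities into a single per-observation summary:

```python
import pandas as pd

# Combine the derived quantities into one per-observation summary table
# (column names are illustrative only)
summary = pd.DataFrame({
    "mean": mean_pred,
    "std": std_pred,
    "q10": quantile_pred[0.1],
    "q50": quantile_pred[0.5],
    "q90": quantile_pred[0.9],
})
```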
-
Thank you very much for your prompt response. I have a few queries regarding this, which are as follows: Can we use XGBoostLSS for an active learning process? As I understand it, we can use the difference between the upper and lower quantile as an uncertainty measure and, based on that, determine the data points with the highest uncertainty in my unknown dataset. However, after running this process iteratively, we have to stop at a certain point, once we achieve the required accuracy. Measuring this requires metrics for point predictions, such as RMSE or MAE, and also some connection with algorithms for point prediction (not prediction intervals), such as XGBoost. So, can we use XGBoost (or other such algorithms) together with XGBoostLSS, so that it provides uncertainty quantification for the active learning process (to include the most uncertain data in training) and, at the end, the final point prediction for unknown data (finally we need point predictions, not prediction intervals)?
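For example, the selection step I have in mind would look roughly like this (just a sketch; `dpool` for the unlabelled candidate data and `k` for the batch size are hypothetical names):

```python
# Sketch of the selection step: score unlabelled candidates by quantile spread
# (dpool and k are hypothetical names for the candidate DMatrix and batch size)
pool_samples = xgblss.predict(dpool, pred_type="samples", n_samples=10000, seed=123)

# Uncertainty score: width of the 10%-90% interval per candidate
q = pool_samples.quantile(q=[0.1, 0.9], axis=1).T
uncertainty = q[0.9] - q[0.1]

# Pick the k most uncertain candidates to label and add to the training set
query_idx = uncertainty.sort_values(ascending=False).index[:k]
```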
-
I am not exactly sure I properly understand your question. Can you be more specific, ideally with a step-by-step process that you would need to follow to solve your task? That'll help to answer the question.
-
Ok, let me elaborate point by point.
-
Since you can draw samples from the predicted distribution, you get point predictions (as shown above) as well as intervals via quantiles (as shown above). You can derive any quantity from the samples.
Since XGBoostLSS is trained on an objective function, while cross-validation and early stopping are evaluated on a metric function, you can use a measure that evaluates the conditional mean (such as RMSE or MSE) for early stopping of the model via the metric function. Is this what you are asking?
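For illustration, a minimal sketch of such a point-prediction check, assuming `y_test` holds the held-out targets (hypothetical name):

```python
import numpy as np

# Point prediction = mean of the samples drawn from the predicted distribution
pred_samples = xgblss.predict(dtest, pred_type="samples", n_samples=10000, seed=123)
point_pred = pred_samples.mean(axis=1)

# Point-prediction metrics (y_test is assumed to hold the true targets)
rmse = np.sqrt(np.mean((y_test - point_pred) ** 2))
mae = np.mean(np.abs(y_test - point_pred))
```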
-
Yes, this is what I was asking. Many thanks for the answer. I still have one doubt: whether I can execute an active learning process using this.
-
What you are describing sounds interesting. I have never used XGBoost or XGBoostLSS for active learning, but you are invited to give it a go.
Given that you would use a metric function for early stopping that evaluates the conditional mean (e.g., MSE), I am not sure how this affects the estimation of all distributional parameters (for the Normal, the model estimates mu and sigma) or the quality of the uncertainty estimates. Hence, there is no guarantee that the estimated distributional parameters are close to the "true" ones. Since the metric function currently evaluates the NLL for cross-validation and early stopping, do you need assistance in replacing it with the MSE?
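As a rough sketch of what I mean (the `train` call, `params`, `dtrain`, `dvalid`, and `y_valid` are assumed names and signatures, not verified against the library), you could also select the number of boosting rounds manually by validation RMSE of the conditional mean rather than the NLL:

```python
import numpy as np

# Sketch only: pick the number of boosting rounds by validation RMSE of the
# conditional mean instead of the NLL-based early stopping.
# Training call and parameter names below are assumptions.
best_rmse, best_rounds = np.inf, None
for num_boost_round in [50, 100, 200, 400]:
    xgblss.train(params, dtrain, num_boost_round=num_boost_round)
    val_samples = xgblss.predict(dvalid, pred_type="samples", n_samples=1000, seed=123)
    rmse = np.sqrt(np.mean((y_valid - val_samples.mean(axis=1)) ** 2))
    if rmse < best_rmse:
        best_rmse, best_rounds = rmse, num_boost_round
```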
-
Any update on this? Can we close it?
-
Is it possible to do point prediction using XGBoostLSS? If it can only predict a prediction interval, then after uncertainty quantification, how could we know whether the predictions are improving? Most importantly, along with the uncertainty quantification we need the point prediction of the parameters in our research. So, any help regarding this problem would be very helpful.