
Treelite gives different predictions than base XGBoost model #585

Open
juliuscoburger opened this issue Sep 23, 2024 · 4 comments
@juliuscoburger

I noticed that my Treelite model returns different scores than the original XGBoost model. I was able to boil the issue down to the use of base_score during training. Could it be that this value is not being translated?

Code to replicate the issue:

import numpy as np
import xgboost as xgb
import treelite

np.random.seed(42)
N = 10
X = np.random.random((N, 10))
y = np.random.random((N,))
dtrain = xgb.DMatrix(X, label=y)
bst = xgb.train({
    'objective': 'count:poisson'
}, dtrain, 10)
bst.save_model('/tmp/bst.json')
tl_model = treelite.frontend.load_xgboost_model('/tmp/bst.json')
# Treelite gives the same predictions as xgboost
np.testing.assert_almost_equal(treelite.gtil.predict(tl_model, data=X).squeeze(), bst.predict(dtrain))


# Poisson will fail for sufficiently high predictions, see https://github.com/dmlc/xgboost/issues/10486
y = np.random.random((N,)) * 3000
dtrain = xgb.DMatrix(X, label=y)
# But the issue can be mitigated by setting sufficiently high base score
bst = xgb.train({
    'objective': 'count:poisson',
    'base_score': 3000
}, dtrain, 10)
bst.save_model('/tmp/bst.json')

tl_model = treelite.frontend.load_xgboost_model('/tmp/bst.json')
# Unfortunately, treelite now gives different predictions
np.testing.assert_almost_equal(treelite.gtil.predict(tl_model, data=X).squeeze(), bst.predict(dtrain))
@hcho3
Collaborator

hcho3 commented Sep 26, 2024

It looks like the floating-point error starts to creep in, from two sources:

  • Use of large base_scores
  • Order of summation is different in XGBoost's predictor and Treelite GTIL

The check passes if you relax the required tolerance:

np.testing.assert_almost_equal(treelite.gtil.predict(tl_model, data=X).squeeze(), bst.predict(dtrain), decimal=2)

@juliuscoburger
Author

The check is just there to showcase that the scores are not equal. I was under the impression that GTIL always returns the same scores.

@hcho3
Collaborator

hcho3 commented Sep 26, 2024

Can it be that this value is not being translated?

I double-checked, and base_score is being properly translated and handled, so the discrepancy is not due to a logic error.

I was under the impression that GTIL always returns the same scores.

GTIL may evaluate trees and leaf nodes in a different order than XGBoost. Addition of floating-point values is not associative (in general, a + (b + c) != (a + b) + c), so rounding error can accumulate, especially when some values in the sum are much larger than the others, as in this example.
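A minimal illustration of the non-associativity point (the values here are chosen purely for demonstration and are not taken from the model above): when one term is much larger than the others, float32 addition can give different results depending on grouping.

```python
import numpy as np

a = np.float32(1e8)  # a large value, analogous to a big base_score
b = np.float32(3.0)
c = np.float32(3.0)

# The ulp (spacing between adjacent float32 values) near 1e8 is 8.0,
# so adding 3.0 alone is lost to rounding, while adding 6.0 is not.
left = a + (b + c)   # 100000008.0
right = (a + b) + c  # 100000000.0
print(left == right)  # False
```

This is the same mechanism at work in the example above: summing many leaf values against a base_score of 3000 in a different order produces slightly different totals.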

To minimize error due to floating-point arithmetic, consider scaling the target, e.g. with scikit-learn's StandardScaler.

@hcho3 hcho3 closed this as completed Sep 26, 2024
@hcho3 hcho3 reopened this Sep 26, 2024
@hcho3
Collaborator

hcho3 commented Sep 26, 2024

I'll probably have to add a note to the GTIL documentation about the possibility of floating-point error and how to mitigate it.
