Duplicated coefficients in LinearClassifier #1066

wsperat · 2024-01-31T15:44:07Z

I recently inherited a legacy project that contains a Logistic Regression for binary classification, stored in ONNX format and converted with skl2onnx. While doing some model introspection, I was surprised to find that both the model coefficients and its intercept are stored twice, with the second copy apparently being the negative of the first. Looking into the conversion code, I found the following lines in skl2onnx.operator_converters.linear_classifier:

```python
op = operator.raw_operator
coefficients = op.coef_.flatten().astype(float).tolist()
classes = get_label_classes(scope, op)
number_of_classes = len(classes)
# some more code
if number_of_classes == 2:
    coefficients = list(map(lambda x: -1 * x, coefficients)) + coefficients
    intercepts = list(map(lambda x: -1 * x, intercepts)) + intercepts
```

This certainly explains the duplication, but I was wondering about the reason for storing values in this way.
Thanks!
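One way to see why a negated copy is harmless is to recompute what LinearClassifier does with it. The pure-Python sketch below assumes the operator computes one score per class (row c of the flattened [n_classes, n_features] coefficient list dotted with the input, plus intercept c) followed by a logistic post-transform. It is not onnxruntime's implementation, and the weights are hypothetical. With rows [-w, w] the two scores are -z and z, and because sigmoid(-z) = 1 - sigmoid(z), the two per-class probabilities sum to 1.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function applied to each class score."""
    return 1.0 / (1.0 + math.exp(-z))


def duplicate_for_binary(coef, intercept):
    """Mirror the converter excerpt: prepend a negated copy for class 0."""
    coefficients = [-c for c in coef] + list(coef)
    intercepts = [-intercept, intercept]
    return coefficients, intercepts


def linear_classifier_scores(x, coefficients, intercepts):
    """One score per class: row c of the row-major coefficient list dotted with x, plus intercept c."""
    n_features = len(x)
    scores = []
    for c, b in enumerate(intercepts):
        row = coefficients[c * n_features:(c + 1) * n_features]
        scores.append(sum(w * xi for w, xi in zip(row, x)) + b)
    return scores


# Hypothetical fitted weights for a 3-feature binary model (not from this issue).
coef = [0.5, -1.2, 2.0]
intercept = 0.3
x = [1.0, 0.5, -0.25]

coefficients, intercepts = duplicate_for_binary(coef, intercept)
scores = linear_classifier_scores(x, coefficients, intercepts)
probs = [sigmoid(s) for s in scores]

# Probability of class 1 computed directly from the single, non-duplicated weight vector.
p1 = sigmoid(sum(w * xi for w, xi in zip(coef, x)) + intercept)
print(scores, probs, p1)
```

Under these assumptions, the duplication lets the generic multi-class operator emit one probability column per class without a binary-specific code path.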


sdpython · 2024-02-01T13:53:50Z

The goal of the first pass of converters was to cover as many models as possible, as quickly as possible. It could certainly be improved. Are you willing to contribute?

wsperat · 2024-02-02T13:46:49Z

@sdpython Absolutely! However, I'd like to understand the reason behind doing it this way, mostly because I'm not sure how ONNX Runtime handles the duplicated coefficients.

xadupre · 2024-02-08T13:36:18Z

This was done many years ago, and I don't remember the reasons behind everything we did. The PR onnx/onnx#5874 changes the way TreeEnsembleRegressor and TreeEnsembleClassifier are defined: they are now a single operator. We could do the same with LinearClassifier and LinearRegressor, or simply deprecate them, since they can be expressed with regular operators.

A change involves three steps:

1. Change the ONNX standard if needed.
2. Implement a kernel in onnxruntime that runs the modified or added operator.
3. Update the converter library to use the new operator, or the new way of converting regressors and classifiers.

How would you like to contribute?
