Is it possible to access the input column names in onnx converter functions? #1088

paranjapeved15 · 2024-04-23T21:46:45Z

In the code below, I append the value of vehicleType to a prefix to decide which column of my input DataFrame to read. For example, if vehicleType = 'car', the transformer returns the value in the 'features_car' column.
Data:
df = pandas.DataFrame([["car",0.1,0.0],["car",0.2,0.0],["suv",0.0,0.2]],columns=['vehicleType','features_car','features_suv'])
Custom Transformer:

import numpy
import pandas
from sklearn.base import BaseEstimator, TransformerMixin


class GetScore(BaseEstimator, TransformerMixin):  # type: ignore
    """Select, for each row, the feature column named prefix + vehicleType."""

    def __init__(self, prefix: str):
        """Store the prefix shared by the candidate feature columns."""
        self.prefix = prefix

    def dot_product(self, x: pandas.Series) -> float:
        """Return the value of the column named prefix + the row's vehicleType."""
        # e.g. prefix "features_" and vehicleType "car" reads x["features_car"]
        return x[self.prefix + x["vehicleType"]]


    def fit(self, X, y=None):  # type: ignore
        """Fit the transformer."""
        return self

    def transform(self, X: pandas.DataFrame | numpy.ndarray, y: None = None) -> numpy.ndarray:
        """Return one column holding the selected value for each row."""
        if isinstance(X, pandas.DataFrame):
            x = X.apply(self.dot_product, axis=1)
            return x.to_numpy().reshape((-1, 1))
        # Column names are needed to pick the value, so anything other than a
        # DataFrame is rejected instead of silently returning None.
        raise TypeError(f"GetScore expects a pandas.DataFrame, got {type(X).__name__}")

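    def get_feature_names_out(self, input_features=None) -> numpy.ndarray:
        """Return feature names. Required for onnx conversion."""
        # The transformer emits a single column; the name used here is illustrative.
        return numpy.asarray([self.prefix + "score"], dtype=object)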

sklearn pipeline:

from sklearn.compose import ColumnTransformer

preprocessor = ColumnTransformer(
    transformers=[
        # ("", make_pipeline(OneHotEncoder(categories=[["car", "suv"]], sparse_output=False)), ['vehicleType', 'features_car', 'features_suv']),
        ("features_computed", GetScore("features_"), ['vehicleType', 'features_car', 'features_suv']),
    ],
    # remainder="passthrough",
    verbose_feature_names_out=False,
)

To write a custom converter for GetScore, I would need to access the input by column name. Is that information available in the converter's inputs, or would I have to take another approach?


xadupre · 2024-05-20T09:22:25Z

The library was started before scikit-learn implemented this feature, and this information is not used right now. We could probably make a code change to improve the mapping between ONNX names and scikit-learn names. I'm not sure an exact mapping can be guaranteed, in particular if scikit-learn allows names to be reused, but it is not a very complicated change. Do you think it should be the exact name, or the name extended with a prefix or suffix?
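
One possible workaround while column names are not available to converters, sketched here and not taken from the thread: resolve the names to positions once, ahead of conversion, and select by position afterwards. The helper names `build_column_index` and `select_scores` are hypothetical, and the sketch uses plain NumPy, not skl2onnx. Whether the same positional lookup can be expressed in a given converter is an assumption that has not been tested here.

```python
import numpy as np


def build_column_index(feature_columns, prefix):
    """Map each suffix after `prefix` to the position of its column.

    Hypothetical helper: ["features_car", "features_suv"] with prefix
    "features_" gives {"car": 0, "suv": 1}.
    """
    return {
        name[len(prefix):]: position
        for position, name in enumerate(feature_columns)
        if name.startswith(prefix)
    }


def select_scores(values, vehicle_types, column_index):
    """Pick, for each row, the value in the column matching its vehicle type."""
    values = np.asarray(values, dtype=np.float64)
    # One column position per row, looked up from the row's vehicle type.
    positions = np.array([column_index[v] for v in vehicle_types], dtype=np.int64)
    # Row-wise positional gather; the result has shape (n_rows, 1).
    return np.take_along_axis(values, positions[:, None], axis=1)


# Same data as in the question, split into the key column and the feature columns.
vehicle_types = ["car", "car", "suv"]
feature_columns = ["features_car", "features_suv"]
feature_values = [[0.1, 0.0], [0.2, 0.0], [0.0, 0.2]]

column_index = build_column_index(feature_columns, "features_")
scores = select_scores(feature_values, vehicle_types, column_index)
print(scores)
```

Because the lookup table is built from the column names up front, nothing downstream depends on names, only on column order. The row-wise gather is similar in spirit to ONNX's GatherElements operator, although mapping the string vehicle type to an integer position inside a graph would need its own step.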

xadupre self-assigned this Jun 21, 2024
