Is it possible to access the input column names in onnx converter functions? #1088

Open
paranjapeved15 opened this issue Apr 23, 2024 · 1 comment

paranjapeved15 commented Apr 23, 2024

In the code below, I append the value of `vehicleType` to a prefix to decide which column of my input DataFrame to read. For example, if `vehicleType` is `'car'`, the transformer returns the value of the `features_car` column for that row.
Data:

```python
import pandas

df = pandas.DataFrame(
    [["car", 0.1, 0.0], ["car", 0.2, 0.0], ["suv", 0.0, 0.2]],
    columns=["vehicleType", "features_car", "features_suv"],
)
```
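The per-row lookup described above can also be written in vectorized form: map each `vehicleType` to the position of its `features_<type>` column, then pick that column in each row with NumPy fancy indexing. This is a minimal sketch, assuming the list of vehicle types (`categories` below, an illustrative name) is known in advance and matches the column order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["car", 0.1, 0.0], ["car", 0.2, 0.0], ["suv", 0.0, 0.2]],
    columns=["vehicleType", "features_car", "features_suv"],
)

prefix = "features_"
# Assumption: every vehicle type is known up front and has a "features_<type>" column.
categories = ["car", "suv"]
feature_cols = [prefix + c for c in categories]

# Position of each row's vehicleType within `categories` (-1 if it is unknown).
idx = pd.Categorical(df["vehicleType"], categories=categories).codes
if (idx < 0).any():
    raise ValueError("vehicleType value without a matching feature column")

# For row i, take column idx[i] of the feature block.
values = df[feature_cols].to_numpy()
scores = values[np.arange(len(df)), idx].reshape(-1, 1)
print(scores.ravel())  # [0.1 0.2 0.2]
```

The explicit check for `-1` matters: without it, an unknown type would silently index the last column.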
Custom transformer:

```python
import numpy
import pandas
from sklearn.base import BaseEstimator, TransformerMixin


class GetScore(BaseEstimator, TransformerMixin):  # type: ignore
    """For each row, return the value of the column named prefix + vehicleType."""

    def __init__(self, prefix: str):
        """Store the prefix used to build the column name."""
        self.prefix = prefix

    def dot_product(self, x: pandas.Series) -> float:
        """Return x[prefix + x.vehicleType] for a single row x."""
        return x[self.prefix + x.vehicleType]

    def fit(self, X, y=None):  # type: ignore
        """Nothing to learn; return self."""
        return self

    def transform(self, X: pandas.DataFrame, y: None = None) -> numpy.ndarray:
        """Return an (n_rows, 1) array with the selected value for each row."""
        if not isinstance(X, pandas.DataFrame):
            # The lookup needs column names, which a plain numpy.ndarray does not carry.
            raise TypeError("GetScore.transform expects a pandas.DataFrame")
        x = X.apply(self.dot_product, axis=1)
        return x.to_numpy(dtype=float).reshape((-1, 1))

    def get_feature_names_out(self, input_features=None) -> numpy.ndarray:
        """Return the output column name, used by ColumnTransformer.get_feature_names_out."""
        # "score" is an illustrative name for the single output column.
        return numpy.asarray(["score"], dtype=object)
```

scikit-learn pipeline:

```python
from sklearn.compose import ColumnTransformer

preprocessor = ColumnTransformer(
    transformers=[
        # ("", make_pipeline(OneHotEncoder(categories=[["car", "suv"]], sparse_output=False)), ['vehicleType', 'features_car', 'features_suv']),
        ("features_computed", GetScore("features_"), ["vehicleType", "features_car", "features_suv"]),
    ],
    # remainder="passthrough",
    verbose_feature_names_out=False,
)

scores = preprocessor.fit_transform(df)  # uses df and GetScore from above; shape (3, 1)
```

To write a custom converter for `GetScore`, I need to access each input by its column name. Are the column names available in the converter's inputs, or do I need a different approach?
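One possible alternative, sketched under the assumption that the set of vehicle types is fixed and known at conversion time, is to rewrite the lookup so that it depends only on column positions, not names: one-hot encode `vehicleType`, multiply it element-wise with the feature columns (in the same category order), and sum each row. Each step has a standard counterpart (scikit-learn's `OneHotEncoder`, which skl2onnx can convert, and the ONNX `Mul` and `ReduceSum` operators). The NumPy version below only illustrates the arithmetic; it is not a skl2onnx converter.

```python
import numpy as np

# Columns as positional arrays, the way a converter sees them (no names attached).
vehicle_type = np.array(["car", "car", "suv"])
features = np.array([[0.1, 0.0], [0.2, 0.0], [0.0, 0.2]])  # features_car, features_suv
categories = np.array(["car", "suv"])  # assumed fixed, same order as the feature columns

# One-hot encode vehicleType: shape (n_rows, n_categories), 1.0 where the type matches.
one_hot = (vehicle_type[:, None] == categories[None, :]).astype(features.dtype)

# Zero out the non-matching columns, then sum across each row.
score = (one_hot * features).sum(axis=1, keepdims=True)
print(score.ravel())
```

One behavioral difference: an unknown vehicle type yields a score of 0 here, whereas the name-based lookup raises a `KeyError`.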

xadupre (Collaborator) commented May 20, 2024

The library was started before scikit-learn implemented this feature (feature names), and this information is not used right now. We could probably change the code to improve the mapping between ONNX names and scikit-learn names. I'm not sure an exact mapping can be guaranteed, in particular if scikit-learn allows names to be reused, but it is not a very complicated change. Do you think it should be the exact name, or the name extended with a prefix or suffix?
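To illustrate the trade-off in that question: ONNX graph inputs must have distinct names, so an exact mapping breaks down as soon as two scikit-learn feature names collide. The hypothetical helper below (not part of skl2onnx) keeps the exact name when it can, supports an optional prefix, and appends a numeric suffix only on collisions.

```python
def onnx_input_names(feature_names: list[str], prefix: str = "") -> list[str]:
    """Hypothetical helper: map scikit-learn feature names to distinct ONNX input names.

    Each name is kept as-is (plus an optional prefix); "_1", "_2", ... is appended
    only when that name has already been used.
    """
    used: set[str] = set()
    result: list[str] = []
    for name in feature_names:
        base = prefix + name
        candidate = base
        k = 1
        while candidate in used:
            candidate = f"{base}_{k}"
            k += 1
        used.add(candidate)
        result.append(candidate)
    return result


print(onnx_input_names(["vehicleType", "features_car", "features_car"]))
# ['vehicleType', 'features_car', 'features_car_1']
print(onnx_input_names(["vehicleType"], prefix="in_"))
# ['in_vehicleType']
```

With this scheme, unique names map exactly, and only duplicates get a suffix.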

@xadupre xadupre self-assigned this Jun 21, 2024