Allow normalizers to skip NaN values #333

Merged: 2 commits merged on Dec 26, 2024


27pchrisl (Contributor)

Hi,

I have a dataset with a lot of missing data, and I wanted to use the HotDeckImputer with the Gower kernel to fill in the blanks. I had preprocessed the dataset so that nulls were converted to NaN or `?`, depending on the data type (NaN for continuous features, `?` for categorical ones).

The problem was that Gower expects continuous features to be normalized. I wanted to use the MinMaxNormalizer on the continuous features, but it doesn't handle NaN: with NaN present, essentially every value is normalized to zero.

I updated the MinMaxNormalizer and the MaxAbsoluteScaler to skip NaN values. They now compute the min/max (or the maximum absolute value) from the finite values only, and leave NaN values where they were in the original dataset.
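The approach described above can be illustrated with a minimal sketch. This is written in Python rather than RubixML's PHP, and the function names `min_max_normalize` and `max_abs_scale` are hypothetical; it is not the library's API. It assumes a single column of floats, collapses fitting and transforming into one call, and assumes that a constant column maps to the lower bound (min-max) or is left unchanged (max-abs). The real transformers may handle those cases differently.

```python
import math
from typing import List


def min_max_normalize(column: List[float], lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Hypothetical sketch: rescale finite values to [lo, hi], leaving NaN in place."""
    # Statistics are computed from finite values only, so NaN cannot poison min/max.
    finite = [x for x in column if math.isfinite(x)]
    if not finite:
        return list(column)  # nothing to fit on; return the column unchanged
    col_min, col_max = min(finite), max(finite)
    span = col_max - col_min
    # Assumption: a constant column maps every finite value to `lo`.
    scale = (hi - lo) / span if span > 0 else 0.0
    # Non-finite entries (e.g. NaN) are passed through untouched.
    return [lo + (x - col_min) * scale if math.isfinite(x) else x for x in column]


def max_abs_scale(column: List[float]) -> List[float]:
    """Hypothetical sketch: divide finite values by the max |x|, leaving NaN in place."""
    max_abs = max((abs(x) for x in column if math.isfinite(x)), default=0.0)
    if max_abs == 0.0:
        return list(column)  # assumption: all-zero or all-NaN column is left unchanged
    return [x / max_abs if math.isfinite(x) else x for x in column]


nan = float("nan")
print(min_max_normalize([2.0, nan, 6.0, 4.0]))  # [0.0, nan, 1.0, 0.5]
print(max_abs_scale([-4.0, nan, 2.0]))          # [-1.0, nan, 0.5]
```

Leaving NaN in place, rather than replacing it, keeps the missing entries visible to a downstream imputer, which is the point of running the normalizer before the Gower-based HotDeckImputer.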

Being new to ML, I wasn't sure whether this is a valid approach for using the normalizers together with the Gower kernel in the HotDeckImputer. Feedback welcome!

andrewdalpino changed the base branch from master to 3.0 on May 23, 2024, 17:47

andrewdalpino (Member)

Targeting the ML 3.0 release with this, since it can be construed as a backwards-compatibility break.

andrewdalpino merged commit 646b1a2 into RubixML:3.0 on Dec 26, 2024
1 of 13 checks passed
The github-actions bot locked the conversation and limited it to collaborators on Dec 26, 2024.