Allow normalizers to skip NaN values #333
Merged
Hi,
I have a dataset with a lot of missing data, and I wanted to use the `HotDeckImputer` with the `Gower` kernel to fill in the blanks. I had preprocessed my dataset so that `null`s were converted to `NAN` or `?`, depending on the data type.

The problem was that `Gower` expects continuous features to have been normalized. I wanted to use the `MinMaxNormalizer` to do this on the continuous features in the dataset, but it doesn't handle `NAN`: essentially every value is normalized to zero.

I updated the `MinMaxNormalizer` and the `MaxAbsoluteScaler` to skip `NAN` values, compute the min/max or max absolute value from only the finite values, and leave the `NAN` values where they were in the original dataset.

Being new to ML, I wasn't sure if this was a valid approach for using the normalizers together with the Gower imputer - feedback welcome!
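
For reviewers who want to see the idea in isolation, here is a minimal sketch of NaN-skipping normalization in plain Python. It is not the code from this PR or the library's API: the function names `min_max_skip_nan` and `max_abs_skip_nan` are hypothetical, and mapping a constant column to the lower bound is an assumption made only for this illustration.

```python
import math


def min_max_skip_nan(column, low=0.0, high=1.0):
    """Scale the finite values of one column into [low, high].

    Non-finite entries (NaN, +/-inf) are ignored when fitting the
    minimum and maximum and are passed through unchanged.
    """
    finite = [v for v in column if math.isfinite(v)]
    if not finite:
        # Nothing to fit on, so return the column as it was.
        return list(column)

    col_min, col_max = min(finite), max(finite)
    span = col_max - col_min

    scaled = []
    for v in column:
        if not math.isfinite(v):
            scaled.append(v)  # leave the missing value in place
        elif span == 0:
            scaled.append(low)  # constant column (assumption for this sketch)
        else:
            scaled.append(low + (v - col_min) / span * (high - low))
    return scaled


def max_abs_skip_nan(column):
    """Divide finite values by the largest finite absolute value."""
    max_abs = max((abs(v) for v in column if math.isfinite(v)), default=0.0)
    if max_abs == 0:
        # All values are zero or missing, so there is nothing to scale.
        return list(column)
    return [v / max_abs if math.isfinite(v) else v for v in column]


if __name__ == "__main__":
    nan = float("nan")
    print(min_max_skip_nan([2.0, nan, 4.0, 6.0]))
    print(max_abs_skip_nan([-4.0, nan, 2.0]))
```

The key point is that the statistics are fitted on the finite values only, so a single `NAN` cannot distort the minimum, maximum, or maximum absolute value, and the missing entries stay in their original positions for the imputer to fill in afterwards.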