Missing value handling #687

mnwright · 2023-08-24T08:38:41Z

This is a quick and dirty implementation of missing value handling, as done by other packages such as xgboost, lightgbm and scikit-learn. When splitting a node, missing values are ignored when calculating the initial split criterion value (i.e., the decrease of impurity). Then, for the best split, the missing values are tried in the left child node and in the right child node, and the direction is again chosen based on the split criterion value. This direction is saved as the "default" direction for prediction. If a missing value occurs in prediction at a node that has no default direction, it goes left.
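To make the procedure concrete, here is a minimal Python sketch for a single numeric predictor with class labels. It is not the code from this PR: `split_score`, `best_split_with_missing` and `route` are hypothetical names, midpoint thresholds are an assumption, and sending ties to the left is a choice made for this sketch. `split_score` sums, over both children, the squared class counts divided by the child size, which for a fixed set of samples ranks splits the same way as the decrease in Gini impurity.

```python
import math


def split_score(left, right):
    """Sum over children of (sum of squared class counts / child size); higher is better."""
    def sum_sq_over_n(labels):
        if not labels:
            return 0.0
        counts = {}
        for label in labels:
            counts[label] = counts.get(label, 0) + 1
        return sum(c * c for c in counts.values()) / len(labels)
    return sum_sq_over_n(left) + sum_sq_over_n(right)


def best_split_with_missing(x, y):
    """Return (threshold, missing_goes_left) for one predictor, or None if no split exists."""
    observed = [(xi, yi) for xi, yi in zip(x, y) if not math.isnan(xi)]
    missing_y = [yi for xi, yi in zip(x, y) if math.isnan(xi)]

    # Step 1: choose the threshold from the observed values only.
    best_threshold, best_score = None, -math.inf
    values = sorted({xi for xi, _ in observed})
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2  # assumption: split at midpoints
        left = [yi for xi, yi in observed if xi <= threshold]
        right = [yi for xi, yi in observed if xi > threshold]
        score = split_score(left, right)
        if score > best_score:
            best_threshold, best_score = threshold, score
    if best_threshold is None:
        return None

    # Step 2: for the chosen threshold, try the missing values in each child.
    left = [yi for xi, yi in observed if xi <= best_threshold]
    right = [yi for xi, yi in observed if xi > best_threshold]
    score_left = split_score(left + missing_y, right)
    score_right = split_score(left, right + missing_y)
    # Assumption of this sketch: ties (including "no missing values") go left.
    return best_threshold, score_left >= score_right


def route(value, threshold, missing_goes_left=True):
    """Pick a child for one value; missing values follow the stored default direction."""
    if math.isnan(value):
        return "left" if missing_goes_left else "right"
    return "left" if value <= threshold else "right"


nan = float("nan")
split = best_split_with_missing([1.0, 2.0, 3.0, 4.0, nan, nan], [0, 0, 1, 1, 1, 1])
```

In this toy data the two samples with a missing predictor both have class 1, so placing them in the right child (which already holds only class 1) gives the higher score, and the stored default direction is right.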

Caveats:

~~So far only implemented for classification with the default (Gini) splitting, also not for probability=TRUE~~
Only missing values in the predictors x are handled, not in the outcome y
~~Potentially slow implementation, have to check computation speed~~
~~Dirty implementation with repetitive code and no shortcuts if no data is missing at all~~
~~Not well tested~~

mnwright · 2024-07-03T05:30:53Z

This is now also implemented for probability and regression forests. It's also better tested and faster (no performance impact if no data is missing).
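One way to get "no impact if no missing data" is to scan the predictors once up front and only switch to the NA-aware split routine when a missing value is actually found. The Python sketch below illustrates that idea only; `any_missing` and `choose_split_routine` are hypothetical names and do not reflect how the check is written in this PR.

```python
import math


def any_missing(columns):
    """Return True if any predictor column contains a NaN."""
    return any(math.isnan(value) for column in columns for value in column)


def choose_split_routine(columns):
    """Decide once per dataset which split routine to use."""
    # The NA-aware routine does extra work per split, so it is only
    # selected when the data actually contains missing values.
    return "na_aware" if any_missing(columns) else "standard"
```

The scan is a single pass over the data, so complete datasets pay only that one-time check and then run the unchanged splitting code.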

mayer79 · 2024-12-07T08:45:21Z

Fantastic, thanks @mnwright

mnwright added 2 commits August 24, 2023 10:28

quick and dirty missing value handling for classification

81bf7d8

merge with master

29f3ca0

mnwright added the not to be merged (for now) label May 16, 2024

mnwright added 11 commits May 16, 2024 20:38

Merge branch 'master' into missing_values

191e298

only use NA code if any NAs

a457fc6

tests for missing data

db28116

put NA splitting in seperate functions

1d1bb34

NA splitting only for certain split rules

71558cb

NA splitting only for certain split rules

47ec4c5

revert small changes

6510c76

add missing value handling for probability and regression

0ab1efb

more tests for missings

e779a65

implement na.omit

57a07eb

merge with master

9733b14

mnwright changed the title ~~Draft missing value handling for classification~~ Missing value handling Jul 2, 2024

version and docs

71c3c46

mnwright removed the not to be merged (for now) label Oct 28, 2024

merge missing values with master

6176aed

mnwright merged commit 776cf96 into master Oct 28, 2024
8 checks passed
