Support undersampling and oversampling #221

Open
ecsalomon opened this issue Sep 18, 2017 · 1 comment

Comments

@ecsalomon
Contributor

With severely imbalanced classes, people often undersample the more frequent class or oversample the less frequent class (see https://www3.nd.edu/~dial/publications/hoens2013imbalanced.pdf). There are some standardized methods for this that might be good to implement, but even basic random under/oversampling of a given percentage would be good to have.
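A minimal sketch of the basic random variant, assuming labels are available as a plain Python list. The function names and the `keep_fraction` / `extra_fraction` parameters are hypothetical and are not an existing Triage API; both functions return row indices so the caller decides how to subset the actual data.

```python
import random


def undersample_indices(labels, majority_label, keep_fraction, seed=0):
    """Return indices of all non-majority rows plus a random
    keep_fraction of the majority rows (hypothetical helper)."""
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be between 0 and 1")
    rng = random.Random(seed)
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    n_keep = round(len(majority) * keep_fraction)
    kept = set(rng.sample(majority, n_keep))  # sampled without replacement
    return [i for i, y in enumerate(labels) if y != majority_label or i in kept]


def oversample_indices(labels, minority_label, extra_fraction, seed=0):
    """Return every original index plus randomly repeated minority indices.
    extra_fraction=2.0 adds two extra copies per minority row on average
    (hypothetical helper)."""
    if extra_fraction < 0:
        raise ValueError("extra_fraction must be non-negative")
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    n_extra = round(len(minority) * extra_fraction)
    extra = [rng.choice(minority) for _ in range(n_extra)]  # with replacement
    return list(range(len(labels))) + extra
```

With 90 rows of class 0 and 10 of class 1, `undersample_indices(labels, 0, 0.5)` returns 55 indices and `oversample_indices(labels, 1, 2.0)` returns 120. Resampling like this would normally apply to training data only, leaving test data at its original class balance.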

@thcrock
Contributor

thcrock commented Jan 19, 2018

I'm curious how this would be implemented in something like Triage. Triage isn't sampling at all, is it? It just runs on all the data it is told to.

This is potentially achievable before Triage, by selectively including entities in the cohort that Triage is told about.

We could definitely do something like this internally in Triage, though it would add a whole other layer of communication with the user, given that there is no sampling currently: who made it into the sample, and why?
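One way to picture that communication layer is a sampler that returns an audit record for every entity alongside the decision. This is only a sketch under assumptions: `undersample_with_audit`, the record fields, and the `entity_labels` mapping are all hypothetical, and nothing here reflects how Triage builds cohorts today.

```python
import random


def undersample_with_audit(entity_labels, majority_label, keep_fraction, seed=0):
    """Randomly undersample the majority class and record, per entity,
    whether it was included and why (hypothetical helper).

    entity_labels: dict mapping entity_id -> class label.
    """
    rng = random.Random(seed)
    audit = []
    # Sort so the result is reproducible for a given seed.
    for entity_id, label in sorted(entity_labels.items()):
        if label != majority_label:
            included, reason = True, "minority class: always kept"
        elif rng.random() < keep_fraction:
            included, reason = True, f"majority class: drawn (p={keep_fraction})"
        else:
            included, reason = False, f"majority class: not drawn (p={keep_fraction})"
        audit.append(
            {"entity_id": entity_id, "label": label,
             "included": included, "reason": reason, "seed": seed}
        )
    return audit


# The same records can drive the "before Triage" approach: pass only the
# included entities on as the cohort, and keep the audit list for reporting.
entity_labels = {1: 0, 2: 0, 3: 1, 4: 0, 5: 1}
audit = undersample_with_audit(entity_labels, majority_label=0, keep_fraction=0.5)
cohort = [row["entity_id"] for row in audit if row["included"]]
```

Storing the seed and the inclusion probability with each record means the sample can be regenerated and explained later, which is the part a silent sampler would lose.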
