ValueError: probabilities do not sum to 1 #4

Open
zhao-zilong opened this issue Sep 11, 2024 · 1 comment
Comments

@zhao-zilong

The bug comes from this line:

draws = np.random.choice(a=range(unique_bnds.shape[0]), p = unique_bnds['cvg'] / self.num_trees, size=n)

I think it happens because this line of code intentionally sets some cvg values to zero:

bnds.loc[bnds['cvg'] == 1/pred.shape[0],'cvg'] = 0
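To illustrate the suspected mechanism, here is a minimal sketch with made-up numbers. The names `num_trees`, `n_obs`, and the toy `cvg` array are assumptions standing in for `self.num_trees`, `pred.shape[0]`, and the `cvg` column; this is not the package's actual data flow. Each tree's coverages sum to 1, so dividing by the number of trees gives valid probabilities until the single-observation leaves are zeroed out.

```python
import numpy as np

# Hypothetical toy setup: 2 trees, 4 observations, 2 leaves per tree.
num_trees = 2
n_obs = 4

# Leaf coverages; within each tree they sum to 1.
cvg = np.array([0.75, 0.25, 0.5, 0.5])

# Before zeroing, cvg / num_trees is a valid probability vector.
total_before = (cvg / num_trees).sum()

# Mimic the zeroing step: leaves covering exactly one observation get cvg = 0.
cvg_zeroed = np.where(cvg == 1 / n_obs, 0.0, cvg)
p = cvg_zeroed / num_trees
total_after = p.sum()  # the removed mass is not redistributed

# np.random.choice checks that p sums to 1 and raises otherwise.
try:
    np.random.choice(a=range(len(p)), p=p, size=3)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(total_before, total_after, error_message)
```

In this toy case the probabilities no longer sum to 1 after the zeroing step, and `np.random.choice` rejects them.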

Do you have some insights on this?

@mnwright
Member

Can you give a reproducible example for the ValueError?

We set those to zero because we cannot estimate variances for single-observation leaves. In R, we now avoid that completely by using "class-wise min.bucket", which avoids nodes with fewer than 2 (or whatever the user sets for the parameter) real observations (see imbs-hl/ranger#721). But I think that is not possible with scikit-learn's RandomForestClassifier: there is min_samples_leaf, but I don't think it can be class-specific.
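One possible workaround on the Python side, sketched here as an assumption and not as a fix confirmed for this package, would be to renormalize the remaining coverages before drawing, so that zeroed leaves are never sampled and the rest sum to 1. The helper `normalized_leaf_probs` is a hypothetical name introduced only for this illustration. Note that this redistributes the dropped probability mass proportionally across the remaining leaves, which may or may not be the intended behavior.

```python
import numpy as np

def normalized_leaf_probs(cvg):
    """Hypothetical helper: rescale leaf coverages so they sum to 1."""
    cvg = np.asarray(cvg, dtype=float)
    total = cvg.sum()
    if total <= 0:
        # Every leaf was zeroed out, so there is nothing to sample from.
        raise ValueError("all leaf coverages are zero")
    return cvg / total

# Toy coverages after single-observation leaves were set to zero.
cvg = np.array([0.75, 0.0, 0.5, 0.5])
p = normalized_leaf_probs(cvg)

# Draw leaf indices; the zero-coverage leaf (index 1) is never selected.
rng = np.random.default_rng(0)
draws = rng.choice(len(p), size=1000, p=p)
```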
