-
-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Biased estimator for Janitza p-values #555
Comments
For completeness, if I were implementing this I would add the following at line 127 in importance.R:
|
Hm... I'm trying to remember what the problem with the +1 approach was. I think one problem is that you cannot increase the number of "permutation" with the Janitza approach because that's the number of zero-or-negative-importance variables. @snembrini Do you remember anything else? |
I also had that problem so have been returning the number of "permutations" for the Janitza method, so there is some idea of the precision of the estimated p-value. In the end I decided that not being able to increase the number of permutations was not as serious as being unable to do a multiple testing correction (due to the p-values being exactly zero), so I have gone with bootstrapped CIs for the impurity_corrected scores instead. |
@mnwright It is probably because the null distribution has a fixed m. The information underlying m could be summarized using a Binomial confidence interval: With a small m, the CI will be wider. I personally do not think that most P-value corrections can be applicable to the Janitza method, because very often Pvalues are required to be uniform under the null, which is not the case. Also @jonathanishhorowicz, I know it is the same approach of Ishwaran, but CIs where there is no population parameters, is a term that should be avoided because it brings a lot of confusion |
Sorry for taking so long to reply to this! Thank you @snembrini, I will look at the binomial confidence intervals for my application, although I suspect that the Bonferroni correction will make them extremely wide. The approach I am using is not quite that of Ishwaran, since I am using your debaised impurity VIMP measure I was under the impression I did not have to account for the double bootstrap effect, since the impurity is a feature of the tree and is not calculated with additional samples. I have just wrapped my RF fitting in a bootstrap resampling procedure - is this not a valid approach? |
Hi, thanks again for this excellent package.
While using it for genomics applications I have had some problems with p-values that are exactly zero when using the Janitza method. The paper "Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn" by Phipson and Smith, this should not be the case and it is better to use a positively-biased estimator for a p-value with m permutations:
p = (b + 1)/(m + 1),
where b is the number of permutations where the test statistic is greater than the observed value.
This functionality was removed in #206 so is there a reason I should stick with the unbiased estimator (which is b/m)?
Many thanks!
The text was updated successfully, but these errors were encountered: