Biased estimator for Janitza p-values #555

jonathanishhorowicz · 2021-03-14T17:06:47Z

Hi, thanks again for this excellent package.

While using it for genomics applications I have had some problems with p-values that are exactly zero when using the Janitza method. The paper "Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn" by Phipson and Smith, this should not be the case and it is better to use a positively-biased estimator for a p-value with m permutations:
p = (b + 1)/(m + 1),
where b is the number of permutations where the test statistic is greater than the observed value.

This functionality was removed in #206 so is there a reason I should stick with the unbiased estimator (which is b/m)?

Many thanks!

jonathanishhorowicz · 2021-03-14T17:58:08Z

For completeness, if I were implementing this I would add the following at line 127 in importance.R:

pval.biased <- (1+length(vimp)-nSmaller)/(1+length(vimp))

mnwright · 2021-03-15T08:31:59Z

Hm... I'm trying to remember what the problem with the +1 approach was. I think one problem is that you cannot increase the number of "permutation" with the Janitza approach because that's the number of zero-or-negative-importance variables.

@snembrini Do you remember anything else?

jonathanishhorowicz · 2021-03-15T17:00:41Z

I also had that problem so have been returning the number of "permutations" for the Janitza method, so there is some idea of the precision of the estimated p-value. In the end I decided that not being able to increase the number of permutations was not as serious as being unable to do a multiple testing correction (due to the p-values being exactly zero), so I have gone with bootstrapped CIs for the impurity_corrected scores instead.

snembrini · 2021-03-15T17:47:27Z

@mnwright It is probably because the null distribution has a fixed m.
As I had previously shared with you, m could be increased by adding permutations of the regressors, but that inevitably slightly changes the original approach.

The information underlying m could be summarized using a Binomial confidence interval: With a small m, the CI will be wider.

I personally do not think that most P-value corrections can be applicable to the Janitza method, because very often Pvalues are required to be uniform under the null, which is not the case.

Also @jonathanishhorowicz, I know it is the same approach of Ishwaran, but CIs where there is no population parameters, is a term that should be avoided because it brings a lot of confusion

jonathanishhorowicz · 2021-04-03T15:02:19Z

Sorry for taking so long to reply to this!

Thank you @snembrini, I will look at the binomial confidence intervals for my application, although I suspect that the Bonferroni correction will make them extremely wide.

The approach I am using is not quite that of Ishwaran, since I am using your debaised impurity VIMP measure I was under the impression I did not have to account for the double bootstrap effect, since the impurity is a feature of the tree and is not calculated with additional samples. I have just wrapped my RF fitting in a bootstrap resampling procedure - is this not a valid approach?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Biased estimator for Janitza p-values #555

Biased estimator for Janitza p-values #555

jonathanishhorowicz commented Mar 14, 2021 •

edited

Loading

jonathanishhorowicz commented Mar 14, 2021

mnwright commented Mar 15, 2021

jonathanishhorowicz commented Mar 15, 2021

snembrini commented Mar 15, 2021 •

edited

Loading

jonathanishhorowicz commented Apr 3, 2021

Biased estimator for Janitza p-values #555

Biased estimator for Janitza p-values #555

Comments

jonathanishhorowicz commented Mar 14, 2021 • edited Loading

jonathanishhorowicz commented Mar 14, 2021

mnwright commented Mar 15, 2021

jonathanishhorowicz commented Mar 15, 2021

snembrini commented Mar 15, 2021 • edited Loading

jonathanishhorowicz commented Apr 3, 2021

jonathanishhorowicz commented Mar 14, 2021 •

edited

Loading

snembrini commented Mar 15, 2021 •

edited

Loading