MP-Value vs. Grit #8
Comments
Nice to see this comparison, congrats! Edit: I think the last point is a bit confused. A (hopefully) clearer example: if you have three profiles A, B and C such that A = 2B = 4C, then PCC(A,B) = PCC(A,C) = 1, yet distance(A,B) < distance(A,C). In this case the grit and mp-values would be significantly different.
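To make the A = 2B = 4C example concrete, here is a minimal sketch. The profile values are made up for illustration, and plain Euclidean distance stands in for the statistical distance behind mp-values, so treat it as a toy check rather than a reproduction of either metric.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical profile C (values are assumptions); A = 2B = 4C as in the comment.
C = [1.0, -2.0, 0.5, 3.0]
B = [2 * v for v in C]
A = [4 * v for v in C]

pcc_ab = pearson(A, B)      # scaling does not change the correlation
pcc_ac = pearson(A, C)
dist_ab = math.dist(A, B)   # Euclidean distance, equals 2 * ||C||
dist_ac = math.dist(A, C)   # Euclidean distance, equals 3 * ||C||

print(f"PCC(A,B)={pcc_ab:.3f}  PCC(A,C)={pcc_ac:.3f}")
print(f"distance(A,B)={dist_ab:.3f}  distance(A,C)={dist_ac:.3f}")
```

Both correlations come out at 1 while the distances differ, which is the situation where a correlation-based score and a distance-based score can disagree.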
Interesting... indeed, in #11, we observed that negative-control based grit was lower than whole-plate grit: see https://raw.githubusercontent.com/broadinstitute/grit-benchmark/main/2.compare-metrics/cell-health/figures/plate_normalization/cell_health_grit_platenormalization_comparison.png Your hypothesis is that mp-value will be unchanged with different normalization methods?
Good point! This supports our original thought that "Having low median replicate correlation might not be reflected well in Mahalanobis distance calculations," right? In other words, the high MP-value/low Grit perturbations could be profiles that are substantially far away from controls but still highly correlated with them? Not really sure what that would mean biologically...
It would be unchanged for any centering / scaling. Other transformations (e.g. log-transform) might change things a lot.
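A small sketch of why centering and scaling should not matter, under simplifying assumptions: two features only, made-up control and treated values, and a Mahalanobis distance from the control mean using the control covariance (the full mp-value procedure has more steps than this). The helper names are hypothetical.

```python
import math

def mean_vec(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def cov2(rows):
    """Sample covariance matrix (2x2) of a list of 2-D points."""
    n = len(rows)
    m = mean_vec(rows)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        dev = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += dev[i] * dev[j] / (n - 1)
    return s

def mahalanobis(point, controls):
    """Distance of `point` from the control mean, using the control covariance."""
    m = mean_vec(controls)
    (s11, s12), (s21, s22) = cov2(controls)
    det = s11 * s22 - s12 * s21
    dx, dy = point[0] - m[0], point[1] - m[1]
    # Quadratic form with the explicit inverse of the 2x2 covariance matrix.
    q = (s22 * dx * dx - (s12 + s21) * dx * dy + s11 * dy * dy) / det
    return math.sqrt(q)

# Made-up data (assumption): five control profiles and one treated profile.
controls = [(1.0, 2.0), (2.0, 2.5), (3.0, 5.0), (4.0, 4.0), (5.0, 7.0)]
treated = (8.0, 3.0)

def rescale(p):
    """Arbitrary per-feature shift and scale."""
    return ((p[0] - 3.0) / 10.0, (p[1] + 1.0) * 2.5)

def log_tf(p):
    """Non-linear transform for comparison."""
    return (math.log(p[0]), math.log(p[1]))

d_raw = mahalanobis(treated, controls)
d_scaled = mahalanobis(rescale(treated), [rescale(c) for c in controls])
d_log = mahalanobis(log_tf(treated), [log_tf(c) for c in controls])

print(f"raw={d_raw:.3f}  rescaled={d_scaled:.3f}  log={d_log:.3f}")
```

The raw and rescaled distances agree up to floating-point error because the covariance is re-estimated after the transformation, while the log-transformed distance is noticeably different.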
I think so. Either a high correlation to control or a low correlation to replicates. "Close profiles" would not correlate well between replicates, leading to low Grit and low log-mp-values, while "far-away profiles" would lead to high log-mp-values but not necessarily to high Grit scores, which actually matches what you observe! Biologically, I feel that mp-values would be appropriate when dose-dependent effects are of interest (for instance drug concentration or editing efficiency), while grit would be better suited when you want to pool these effects together ("is a drug inducing changes?" rather than "is this drug concentration enough to induce changes?").
Interesting! Do you think that calculating grit with respect to distance instead of Pearson correlation would be better? One reason Pearson correlation is preferred is that the normalization (as long as it's consistent) doesn't matter as much, and comparing grit scores across datasets is easier. For the dose-dependent/edit efficiency effects, we do observe that grit handles this well: https://raw.githubusercontent.com/broadinstitute/grit-benchmark/main/2.compare-metrics/perturb-seq/figures/GSE132080_crispri_grit_relative_activity_comparison.png ☝️ in a CRISPRi dataset, grit tracks nicely with a measure of gene expression knockdown (relative efficiency)
I feel there's no perfect general solution; it all depends on the specific goal of the experiment. Offering a correlation-based metric probably does more to diversify the available options, given what already exists with mp-values and statistical distances and...
In my experience, this part is particularly tricky with statistical distances. If you want to compare two perturbations, either you compare the Mahalanobis distances, which only consider the dispersion of the control, or you compare mp-values (which are empirical p-values), which don't tell you much about the effect size. Other distances still don't have the nice interpretation you mention in the README.
That's interesting! Does it hold if you center your data? I can probably look into the code myself anyway...
In #6 I compare Grit to mp-value. Here are the results:
The patterns are quite interesting! It is especially interesting that many perturbations have high mp-value, but very low grit (cc @shntnu). I am not sure how to rationalize this. I was thinking that these high mp, low grit samples could be ones with relatively low median replicate correlation. Having low median replicate correlation might not be reflected well in Mahalanobis distance calculations.
I visualized this relationship (using 5,000 permutations), and it looks like it is indeed playing a role, but it probably doesn't paint the full picture.
Perhaps another reason for this is that 5,000 permutations are too few.
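As a rough illustration of why the number of permutations could matter: an empirical p-value cannot resolve anything below roughly 1/N, so its -log10 hits a ceiling. The sketch below assumes the common (1 + count) / (1 + N) convention for permutation p-values; whether the mp-value code used here follows that convention is an assumption, not something checked against the implementation.

```python
import math

def min_empirical_p(n_permutations):
    """Smallest p-value obtainable when no permuted statistic beats the
    observed one, assuming the (1 + count) / (1 + N) convention."""
    return 1.0 / (n_permutations + 1)

def log_p_ceiling(n_permutations):
    """Largest achievable -log10(p) for a given number of permutations."""
    return -math.log10(min_empirical_p(n_permutations))

ceiling_5000 = log_p_ceiling(5000)

for n in (1_000, 5_000, 100_000):
    print(f"N={n:>7}: min p = {min_empirical_p(n):.2e}, "
          f"max -log10(p) = {log_p_ceiling(n):.2f}")
```

With 5,000 permutations every strong perturbation would pile up at the same ceiling (about 3.7 on the -log10 scale), so differences in effect size among them would not be visible in the mp-value.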
@koalive, as an mp-value expert, do you have any thoughts on why this might be happening?