Not a big deal or anything, but it might be nice to add some extra context (e.g. another tutorial notebook) about using Qurro in a more ML-ish style, where differential abundance is run on only a subset of the samples (the training samples), these samples are used to select a log-ratio in Qurro, and that log-ratio is then tested against the held-out testing samples.
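As a rough illustration of that workflow, here's a minimal stdlib-only Python sketch. Nothing in it is Qurro's API: the helper names (`log_ratio`, `group_difference`), the dict-of-dicts feature table, the `group` metadata column, and the toy counts are all hypothetical, and the `TrainTest` column just mirrors the Songbird-style setup. The log-ratio is taken as the log of summed numerator counts over summed denominator counts, and this sketch simply skips samples where either sum is zero.

```python
import math
from statistics import mean


def log_ratio(counts, numerator, denominator):
    """log(sum of numerator counts / sum of denominator counts), or None if either sum is zero."""
    num = sum(counts.get(f, 0) for f in numerator)
    den = sum(counts.get(f, 0) for f in denominator)
    if num == 0 or den == 0:
        return None  # undefined log-ratio; this sketch just drops the sample
    return math.log(num / den)


def group_difference(table, metadata, sample_ids, numerator, denominator,
                     group_col="group", a="A", b="B"):
    """Mean log-ratio of group a minus mean log-ratio of group b, over sample_ids only."""
    values = {a: [], b: []}
    for sid in sample_ids:
        lr = log_ratio(table[sid], numerator, denominator)
        grp = metadata[sid][group_col]
        if lr is not None and grp in values:
            values[grp].append(lr)
    if not values[a] or not values[b]:
        return None
    return mean(values[a]) - mean(values[b])


# Made-up toy data, for illustration only.
table = {
    "S1": {"F1": 10, "F2": 2, "F3": 5},
    "S2": {"F1": 8, "F2": 4, "F3": 5},
    "S3": {"F1": 1, "F2": 9, "F3": 5},
    "S4": {"F1": 2, "F2": 7, "F3": 5},
    "S5": {"F1": 12, "F2": 3, "F3": 5},
    "S6": {"F1": 1, "F2": 10, "F3": 5},
}
metadata = {
    "S1": {"group": "A", "TrainTest": "Train"},
    "S2": {"group": "A", "TrainTest": "Train"},
    "S3": {"group": "B", "TrainTest": "Train"},
    "S4": {"group": "B", "TrainTest": "Train"},
    "S5": {"group": "A", "TrainTest": "Test"},
    "S6": {"group": "B", "TrainTest": "Test"},
}
train = [s for s, md in metadata.items() if md["TrainTest"] == "Train"]
test = [s for s, md in metadata.items() if md["TrainTest"] == "Test"]

# Pick the F1 / F2 log-ratio by looking only at the training samples...
train_diff = group_difference(table, metadata, train, ["F1"], ["F2"])
# ...then check whether it still separates the groups on the held-out samples.
test_diff = group_difference(table, metadata, test, ["F1"], ["F2"])
print(train_diff, test_diff)
```

The point of the split is that `test_diff` is computed on samples that played no part in choosing `F1 / F2`, so a difference that survives there is harder to attribute to the selection step itself.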
The advantage of this approach is that it provides a stronger argument for how reliable a log-ratio's association with some metadata is, since the log-ratio has held up to an extra round of validation. This is kind of a philosophical difference and not really Qurro's problem (you could argue that many differential abundance approaches don't really account for this by default, and/or that Songbird's train/test setup already accounts for this), but it may be worth mentioning somewhere at least.
One way to support this in Qurro's codebase would involve adding a parameter that takes as input a `TrainTest` column (analogous to what Songbird asks for with the `--training-column`/`--p-training-column` parameter), and then generates two separate Qurro visualizations (one for the training samples, one for the testing samples). That might get kind of clunky, though! An extension to this would be adding an `Import selected features` button (analogous to the `Export currently selected features` button) so that the user can easily test the same log-ratios in multiple visualizations.
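A sketch of what the sample-splitting half of such a parameter might do, before each subset is handed off to build its own visualization. This is hypothetical: `split_by_training_column` and `subset_table` are made-up names, the dict-based table and metadata are stand-ins for whatever Qurro actually uses internally, and the `"Train"`/`"Test"` labels are assumed to follow Songbird's convention for its training column.

```python
def split_by_training_column(metadata, column="TrainTest"):
    """Partition sample IDs into (train_ids, test_ids) using a Train/Test metadata column.

    metadata maps sample ID -> dict of metadata fields. Any value other than
    "Train" or "Test" raises ValueError, so a typo can't silently drop samples.
    """
    train_ids, test_ids = [], []
    for sample_id, fields in metadata.items():
        label = fields.get(column)
        if label == "Train":
            train_ids.append(sample_id)
        elif label == "Test":
            test_ids.append(sample_id)
        else:
            raise ValueError(
                f"Sample {sample_id!r} has {column}={label!r}; expected 'Train' or 'Test'"
            )
    return train_ids, test_ids


def subset_table(table, sample_ids):
    """Copy of the feature table restricted to sample_ids (one input per visualization)."""
    return {sid: dict(table[sid]) for sid in sample_ids}


metadata = {
    "S1": {"TrainTest": "Train"},
    "S2": {"TrainTest": "Train"},
    "S3": {"TrainTest": "Test"},
}
table = {"S1": {"F1": 4}, "S2": {"F1": 1}, "S3": {"F1": 7}}

train_ids, test_ids = split_by_training_column(metadata)
train_table = subset_table(table, train_ids)  # would feed the "training" visualization
test_table = subset_table(table, test_ids)    # would feed the "testing" visualization
```

Failing loudly on unexpected labels seems safer than treating them as one group or the other, since a mislabeled sample leaking into the wrong subset would undermine the whole point of holding samples out.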
...That all being said, once the user tries more than one log-ratio on both the training and testing datasets, this kind of loses its effectiveness, since the testing samples are then effectively being used to pick a log-ratio too! The exploratory data analysis approach Qurro uses might be somewhat at odds with this idea of validation, since it's inherently susceptible to the whole multiple-comparisons thing.
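To put a rough number on the multiple-comparisons issue: if every log-ratio checked against the testing samples is treated as its own test at level `alpha`, the chance of at least one spurious "hit" grows quickly with the number of log-ratios tried. The sketch below assumes the tests are independent and every null hypothesis is true, which is a simplification (log-ratios sharing features will be correlated); under those assumptions, trying 20 log-ratios at `alpha = 0.05` gives roughly a 64% chance of at least one false positive. The Bonferroni threshold is shown as one standard (conservative) correction; it bounds the familywise error rate even without independence.

```python
def familywise_error_rate(alpha, k):
    """P(at least one false positive) over k independent tests at level alpha, all nulls true."""
    return 1 - (1 - alpha) ** k


def bonferroni_alpha(alpha, k):
    """Per-test threshold keeping the familywise error rate at or below alpha (Bonferroni)."""
    return alpha / k


for k in (1, 5, 20):
    print(k, round(familywise_error_rate(0.05, k), 3), bonferroni_alpha(0.05, k))
```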
Anyway, I figured I should write this up somewhere, if nothing else to document that this might be worth thinking about more at some point. Partially inspired by going through this preprint :)