Discuss log-ratio generalizability / training and test splits in a tutorial? #317

fedarko · 2021-07-17T05:13:33Z

Not a big deal or anything, but it might be nice to add some extra context (e.g. another tutorial notebook) about using Qurro in a more ML-ish style, where differential abundance is run on only a subset of the samples (the training samples), and these samples are used to select a log-ratio in Qurro which can then be tested against the held-out testing samples.

The advantage of this approach is that it provides a stronger argument for how reliable a log-ratio's association with some metadata is, since the log-ratio has held up to an extra round of validation. This is kind of a philosophical difference and not really Qurro's problem (you could argue that many differential abundance approaches don't really account for this by default, and/or that Songbird's train/test setup already accounts for this), but it may be worth mentioning somewhere at least.
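To make the idea concrete, here is a minimal sketch of the workflow in plain Python. Everything in it is made up for illustration: the sample records, the feature names `A` and `B`, the `case`/`control` grouping, and the helper names `log_ratio` and `mean_difference` are assumptions, not part of Qurro. The log-ratio is picked by looking only at the training samples, and the held-out samples are then used once to check whether the group difference points the same way.

```python
import math
from statistics import mean

# Hypothetical toy data: per-sample feature counts, a metadata group,
# and a Train/Test label. None of this comes from a real dataset.
SAMPLES = [
    {"id": "S1", "split": "Train", "group": "case", "counts": {"A": 40, "B": 10}},
    {"id": "S2", "split": "Train", "group": "case", "counts": {"A": 30, "B": 10}},
    {"id": "S3", "split": "Train", "group": "control", "counts": {"A": 10, "B": 10}},
    {"id": "S4", "split": "Train", "group": "control", "counts": {"A": 5, "B": 10}},
    {"id": "S5", "split": "Test", "group": "case", "counts": {"A": 20, "B": 10}},
    {"id": "S6", "split": "Test", "group": "control", "counts": {"A": 10, "B": 20}},
    {"id": "S7", "split": "Test", "group": "control", "counts": {"A": 0, "B": 10}},
]


def log_ratio(counts, numerator, denominator):
    """Natural log of (summed numerator counts / summed denominator counts).

    Returns None when either sum is zero, since the log-ratio is undefined.
    """
    num = sum(counts.get(f, 0) for f in numerator)
    den = sum(counts.get(f, 0) for f in denominator)
    if num <= 0 or den <= 0:
        return None
    return math.log(num / den)


def mean_difference(samples, numerator, denominator, split):
    """Mean log-ratio of 'case' minus mean log-ratio of 'control' in one split.

    Samples with an undefined log-ratio are skipped. statistics.mean raises
    StatisticsError if a group ends up empty.
    """
    by_group = {"case": [], "control": []}
    for sample in samples:
        if sample["split"] != split:
            continue
        value = log_ratio(sample["counts"], numerator, denominator)
        if value is not None:
            by_group[sample["group"]].append(value)
    return mean(by_group["case"]) - mean(by_group["control"])


# Choose the log-ratio using the training samples only...
train_diff = mean_difference(SAMPLES, ["A"], ["B"], "Train")
# ...then look at the held-out samples once, with the same features.
test_diff = mean_difference(SAMPLES, ["A"], ["B"], "Test")
print(f"train difference: {train_diff:.3f}, test difference: {test_diff:.3f}")
```

A difference in means is just the simplest thing to compute here; a real tutorial would presumably use whatever statistic fits the metadata field in question.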

One way to support this in Qurro's codebase would involve adding a parameter that takes as input a `TrainTest` column (analogous to what Songbird asks for with the `--training-column` / `--p-training-column` parameter), and then generates two separate Qurro visualizations (one for the training samples, one for the testing samples). That might get kind of clunky, though! An extension to this would be adding an "Import selected features" button (analogous to the "Export currently selected features" button) so that the user can easily test the same log-ratios in multiple visualizations.
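As a rough sketch of what the splitting step could look like, the snippet below partitions sample IDs by a train/test metadata column using only the standard library. The function name `split_samples_by_column`, the `sample-id` and `TrainTest` column names, and the `Train`/`Test` values are all assumptions for illustration; this is not existing Qurro code, and the actual parameter design is still open.

```python
import csv
import io

# Hypothetical sample metadata in TSV form (made up for this example).
METADATA_TSV = (
    "sample-id\tbody-site\tTrainTest\n"
    "S1\tgut\tTrain\n"
    "S2\tgut\tTrain\n"
    "S3\tskin\tTest\n"
    "S4\tskin\tTrain\n"
    "S5\tgut\tTest\n"
)


def split_samples_by_column(metadata_tsv, column="TrainTest", id_column="sample-id"):
    """Group sample IDs by their value in a train/test metadata column.

    Raises ValueError on any value other than 'Train' or 'Test', so that
    typos in the metadata don't silently drop samples.
    """
    groups = {"Train": [], "Test": []}
    reader = csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t")
    for row in reader:
        label = row[column]
        if label not in groups:
            raise ValueError(f"Unexpected {column} value {label!r} for {row[id_column]}")
        groups[label].append(row[id_column])
    return groups


groups = split_samples_by_column(METADATA_TSV)
print(groups)
```

Each of the two ID lists could then be used to filter the feature table and metadata before generating its own visualization.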

...That all being said, after the user tries, like, more than one log-ratio on both the training and testing datasets this kind of loses its effectiveness! The exploratory data analysis approach Qurro uses might be at odds somewhat with this idea of validation, since it's inherently susceptible to the whole multiple-comparisons thing.
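To put a rough number on that: if each log-ratio tried against the testing samples is treated as an independent test at level alpha (a simplifying assumption, since log-ratios that share features are correlated), the chance of at least one spurious hit grows quickly with the number of log-ratios tried. The sketch below computes that, along with the Bonferroni-adjusted threshold, which is one conservative correction and not something Qurro does. The function names are made up for this example.

```python
def family_wise_error_rate(alpha, n_tested):
    """Chance of at least one false positive across n independent tests,
    each run at uncorrected significance level alpha."""
    if n_tested < 1:
        raise ValueError("n_tested must be at least 1")
    return 1 - (1 - alpha) ** n_tested


def bonferroni_threshold(alpha, n_tested):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    if n_tested < 1:
        raise ValueError("n_tested must be at least 1")
    return alpha / n_tested


for n in (1, 5, 10):
    fwer = family_wise_error_rate(0.05, n)
    threshold = bonferroni_threshold(0.05, n)
    print(f"{n} log-ratios tried: FWER {fwer:.3f}, Bonferroni threshold {threshold:.4f}")
```

So trying ten log-ratios on the testing samples at an uncorrected 0.05 level gives roughly a 40% chance of at least one false positive under this independence assumption, which is why the held-out check only means much if it's used sparingly.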

Anyway, I figured I should write this up somewhere, if nothing else to document that this might be worth thinking about more at some point. Partially inspired by going through this preprint :)


fedarko added the question (Further information is requested) and docs (README, tutorials, demos, etc.) labels on Jul 17, 2021
