Use h5ad reference in cell-type-wilms-tumor-06 #902
…man/OpenScPCA-analysis into sjspielman/866-use-zellkonverter
The changes here look good to me. I reviewed the code and made sure the characterization notebook for the reference was re-rendered and that looks good.
By using Ensembl IDs, we get a slightly different set of features, which does slightly change the label transfer results. I've run the code and looked at results for a couple of samples, and all differences are super minimal and not a cause for concern "scientifically".
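To put a number on "super minimal" for a given sample, one option is to compare the per-cell labels from the gene-symbol run against those from the Ensembl-ID run. The sketch below is a minimal Python illustration of that comparison. The module itself is R. It assumes each run's predictions have been exported as a mapping from cell barcode to predicted label. The function name and the toy barcodes and labels are hypothetical and do not come from the module or its results.

```python
from collections import Counter


def label_agreement(labels_a, labels_b):
    """Compare two label-transfer runs cell by cell.

    labels_a, labels_b: dicts mapping cell barcode -> predicted label
    (hypothetical export format). Only barcodes present in both runs are
    compared. Returns (fraction of shared cells with identical labels,
    Counter of (label_a, label_b) pairs for the cells that changed).
    """
    shared = set(labels_a) & set(labels_b)
    if not shared:
        raise ValueError("no cell barcodes in common between the two runs")
    # Tally only the cells whose label differs between the two runs
    changed = Counter(
        (labels_a[cell], labels_b[cell])
        for cell in shared
        if labels_a[cell] != labels_b[cell]
    )
    n_changed = sum(changed.values())
    return 1 - n_changed / len(shared), changed


# Toy example with made-up barcodes and labels
symbol_run = {"c1": "stroma", "c2": "podocyte", "c3": "stroma", "c4": "cap mesenchyme"}
ensembl_run = {"c1": "stroma", "c2": "podocyte", "c3": "cap mesenchyme", "c4": "cap mesenchyme"}
agreement, changed = label_agreement(symbol_run, ensembl_run)
print(agreement, changed)
```

The `Counter` of label pairs is often more informative than the single agreement number. It shows whether the few changed cells move between closely related cell types or between unrelated ones.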
I do think we generally want to use Ensembl IDs over gene symbols when we can, and I also don't think it's a good idea to have outdated notebooks living in the repo. That seems misleading.
I think we should either (a) regenerate everything and keep the notebooks as they are, (b) switch to scripts so we don't have a bunch of notebooks, or (c) keep gene symbols so we don't have to regenerate results. Which one depends on how much effort we want to put into updating this module, and I don't know if I should be the person to answer that question. My inclination is to at least start with option (a), and then we can decide whether to put in the effort to convert everything to scripts.
Just to weigh in here - it's probably not this one.
This one unfortunately isn't an option, since the h5ad file from CELLxGENE doesn't have gene symbols in its metadata that would let us reliably reproduce however Seurat originally did that conversion. I'm wary of a time sink trying to match their exact gene symbol order for almost no payoff.
Probably we'll just regenerate everything then, but in a separate PR (or several 🎢) since it will do wild things to the GitHub diff. Does this sound OK, @allyhawkins?
This works for me!
Issue opened here: #906
LGTM
Closes #866
This PR updates the `cell-type-wilms-tumor-06` module to download the fetal kidney reference used for label transfer from CELLxGENE as an h5ad (AnnData) file instead of a Seurat object. To implement this, I made the following changes:

- `analyses/cell-type-wilms-tumor-06/scripts/prepare-fetal-references.R` is the script that prepares references for label transfer. Instead of reading in a Seurat object directly, it now reads in the h5ad file and creates a Seurat object from its raw counts, while carrying over the rownames and metadata needed for label transfer.
  - I also dropped arguments to `AzimuthReference()` that were already set at their defaults in the function, so they weren't needed.
- `analyses/cell-type-wilms-tumor-06/notebook_template/00b_characterize_fetal_kidney_reference_Stewart.Rmd`: I re-rendered this notebook too, since I also had to update some of its text.
- `analyses/cell-type-wilms-tumor-06/notebook_template/02b_label-transfer_fetal_kidney_reference_Stewart.Rmd` does the label transfer itself, along with `analyses/cell-type-wilms-tumor-06/notebook_template/utils/label-transfer-functions.R`. In the past we've had to convert to gene symbols to use Azimuth-formatted references. Now that we work directly from an h5ad reference rather than a Seurat reference, we actually have Ensembl IDs! This means we can skip the conversion for this label transfer, so I implemented that here. It also means the module workflow no longer needs to pass the homolog file parameter to this notebook.
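The ID handling described above can be sketched roughly as follows. This Python code illustrates the branching logic only. It is not the R code in `label-transfer-functions.R`. The following are all assumptions:

- the function names
- the Ensembl gene ID pattern
- the 90% threshold
- the `ensembl_to_symbol` dict, a hypothetical stand-in for the homolog file

```python
import re

# Ensembl gene IDs look like ENSG00000141510 (human) or ENSMUSG... (mouse),
# optionally with a version suffix such as ".17". Pattern is an assumption.
ENSEMBL_GENE_ID = re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$")


def looks_like_ensembl(ids, min_fraction=0.9):
    """Heuristic: treat a feature list as Ensembl-keyed if most IDs match."""
    ids = list(ids)
    if not ids:
        return False
    hits = sum(bool(ENSEMBL_GENE_ID.match(i)) for i in ids)
    return hits / len(ids) >= min_fraction


def shared_features(reference_ids, query_ensembl_ids, ensembl_to_symbol=None):
    """Return the features shared by reference and query, in the reference's ID space.

    Query features are assumed to be Ensembl IDs. ensembl_to_symbol is a
    hypothetical Ensembl ID -> gene symbol dict standing in for the homolog file.
    """
    reference = set(reference_ids)
    if looks_like_ensembl(reference):
        # Ensembl-keyed reference (as with the h5ad file): no conversion needed
        query = set(query_ensembl_ids)
    else:
        # Symbol-keyed reference (as with the old Seurat/Azimuth reference):
        # convert query IDs to symbols, dropping IDs with no mapping
        if ensembl_to_symbol is None:
            raise ValueError("a symbol-keyed reference needs an Ensembl-to-symbol mapping")
        query = {ensembl_to_symbol[g] for g in query_ensembl_ids if g in ensembl_to_symbol}
    return sorted(reference & query)


query = ["ENSG00000141510", "ENSG00000012048", "ENSG00000157764"]
print(shared_features(["ENSG00000141510", "ENSG00000139618", "ENSG00000012048"], query))
```

With an Ensembl-keyed reference, the sketch never uses the mapping argument. This mirrors why the homolog file no longer needs to be passed to the label transfer notebook.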