Use h5ad reference in cell-type-wilms-tumor-06 #902

Merged

sjspielman
Member

Closes #866

This PR updates the cell-type-wilms-tumor-06 module to download the fetal kidney reference used for label transfer from CELLxGENE as an h5ad (AnnData) file instead of a Seurat object. To implement this, I made the following changes:

  • analyses/cell-type-wilms-tumor-06/scripts/prepare-fetal-references.R is the script that prepares references for label transfer. Instead of directly reading in a Seurat object, I now read in the h5ad file and create a Seurat object from its raw counts, while grabbing the rownames and metadata needed for label transfer.
    • Note that I also removed some arguments from the call to AzimuthReference() that were set to their function defaults, so they weren't needed.
    • I also now export a plain Seurat version of this reference for use in analyses/cell-type-wilms-tumor-06/notebook_template/00b_characterize_fetal_kidney_reference_Stewart.Rmd. I also re-rendered this notebook since I had to update some of its text.
  • The notebook analyses/cell-type-wilms-tumor-06/notebook_template/02b_label-transfer_fetal_kidney_reference_Stewart.Rmd does the label transfer itself, along with analyses/cell-type-wilms-tumor-06/notebook_template/utils/label-transfer-functions.R. We've had to convert to gene symbols for use with Azimuth-formatted references in the past, but now that we are working directly from an h5ad reference and not a Seurat reference, we actually have Ensembl IDs! This means we can skip conversion for this label transfer, so I implemented that code here. This also means that, in the module workflow, we no longer need to pass the homolog file parameter to this notebook.
    • However, this has consequences! In using ensembl ids, we have a slightly different set of features, which does slightly change the label transfer results. I've run the code and looked at results for a couple samples, and all differences are super minimal and not a cause for concern "scientifically". Logistically is another story: this does mean we'll have to rerun label transfer and regenerate notebooks. To that end, there are a lot of processing notebooks in this module which could be scripts (01 prepares Seurat objects, 02a does the first label transfer, 02b does the second label transfer). Whenever something gets updated, we have to regenerate all these notebooks, and GitHub (reasonably) hates the diff. I wonder if it's worth converting some of these notebooks to scripts to avoid that problem altogether before we regenerate all these results? We can open more issues for whatever we decide here. The solution may also be "add a note to the README that notebooks may not be rendered at the most recent code version"? Is that fishy?
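
To make the point about a "slightly different set of features" concrete, here is a toy sketch in base R. It is not the module's code: the IDs, symbols, and the `id_to_symbol` table are all made up for illustration, and `to_symbols()` is a hypothetical helper. It only shows one plausible reason the shared feature set can differ, namely that converting to gene symbols can drop unmapped IDs and collapse IDs that share a symbol.

```r
# Toy illustration only: every ID, symbol, and mapping below is made up.

# Reference features as they might appear in an h5ad file (Ensembl-style IDs)
reference_ids <- c("ENSG0001", "ENSG0002", "ENSG0003", "ENSG0004")

# Query (sample) features, also Ensembl-style IDs
query_ids <- c("ENSG0001", "ENSG0002", "ENSG0003", "ENSG0005")

# Hypothetical ID-to-symbol map: one ID has no symbol, two IDs share a symbol
id_to_symbol <- c(
  ENSG0001 = "GENEA",
  ENSG0002 = "GENEB",
  ENSG0003 = "GENEB",
  ENSG0004 = NA,
  ENSG0005 = "GENEC"
)

# Hypothetical helper: convert IDs to symbols, dropping unmapped IDs
# and collapsing duplicated symbols
to_symbols <- function(ids, map) {
  symbols <- unname(map[ids])
  unique(symbols[!is.na(symbols)])
}

# Features shared by reference and query when matching directly on IDs
shared_by_id <- intersect(reference_ids, query_ids)

# Features shared by reference and query after converting both to symbols
shared_by_symbol <- intersect(
  to_symbols(reference_ids, id_to_symbol),
  to_symbols(query_ids, id_to_symbol)
)
```

In this toy case, matching on IDs keeps three shared features while matching on symbols keeps two, because two IDs collapse to one symbol. The real difference in the module depends on the actual reference and on how the earlier symbol conversion was done.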

@sjspielman sjspielman marked this pull request as ready for review November 19, 2024 21:06
@sjspielman sjspielman requested review from allyhawkins and removed request for jaclyn-taroni November 19, 2024 21:06
Member

@allyhawkins allyhawkins left a comment

The changes here look good to me. I reviewed the code and made sure the characterization notebook for the reference was re-rendered and that looks good.

> In using ensembl ids, we have a slightly different set of features, which does slightly change the label transfer results. I've run the code and looked at results for a couple samples, and all differences are super minimal and not a cause for concern "scientifically".

I do think we generally want to use Ensembl IDs over gene symbols when we can, and I also don't think it's a good idea to have outdated notebooks living in the repo. That seems misleading.

I think we should either a) regenerate everything and leave as is with notebooks, b) update to using scripts so we don't have a bunch of notebooks or c) leave as is with gene symbols so we don't regenerate results. I think that depends on how much we want to put into updating this module and I don't know if I should be the person to answer that question. My inclination is that we should at least choose option a to start and then we can decide if we want to put in the effort to update everything to scripts.

@jaclyn-taroni
Member

> b) update to using scripts so we don't have a bunch of notebooks

Just to weigh in here - it's probably not this one

@sjspielman
Member Author

> c) leave as is with gene symbols so we don't regenerate results.

This one unfortunately isn't an option, since the h5ad file from CELLxGENE doesn't have gene symbols in its metadata for a reliable conversion that would match however Seurat originally did that conversion. I'm wary of a time sink trying to match their exact gene symbol order, for almost no payoff.

> a) regenerate everything and leave as is with notebooks

Probably we'll just regenerate everything then, but in a separate PR (or several 🎢 ) since it will do wild things to the GitHub diff. Does this sound ok @allyhawkins ?

@allyhawkins
Member

> Probably we'll just regenerate everything then, but in a separate PR (or several 🎢 ) since it will do wild things to the GitHub diff. Does this sound ok @allyhawkins ?

This works for me!

@sjspielman
Member Author

Issue opened here: #906

Member

@allyhawkins allyhawkins left a comment

LGTM

@sjspielman sjspielman merged commit c476b7c into AlexsLemonade:main Nov 20, 2024
3 checks passed
@sjspielman sjspielman deleted the sjspielman/866-use-zellkonverter branch November 20, 2024 19:08
Successfully merging this pull request may close these issues.

Update wilms-06 kidney reference link
3 participants