Use h5ad reference in cell-type-wilms-tumor-06 #902

sjspielman · 2024-11-19T20:04:50Z

Closes #866

This PR updates the cell-type-wilms-tumor-06 module to download the fetal kidney reference used for label transfer from CELLxGENE as an h5ad (AnnData) object instead of a Seurat object. To implement this, I made the following changes:

analyses/cell-type-wilms-tumor-06/scripts/prepare-fetal-references.R is the script that prepares references for label transfer. Instead of directly reading in a Seurat object, I now read in the h5ad file and create a Seurat object from its raw counts, while grabbing the rownames and metadata needed for label transfer.
- Note that I also removed some arguments from the call to AzimuthReference() that were set at their defaults in the function, so they weren't needed
- I also now export a plain Seurat version of this reference for use in analyses/cell-type-wilms-tumor-06/notebook_template/00b_characterize_fetal_kidney_reference_Stewart.Rmd. I re-rendered this notebook too since I had to update some text in it also.
The notebook analyses/cell-type-wilms-tumor-06/notebook_template/02b_label-transfer_fetal_kidney_reference_Stewart.Rmd does the label transfer itself, along with analyses/cell-type-wilms-tumor-06/notebook_template/utils/label-transfer-functions.R. We've had to convert to gene symbols for use with Azimuth-formatted references in the past, but now that we are working directly from an h5ad reference and not a Seurat reference, we actually have Ensembl IDs! This means we can skip conversion for this label transfer, so I implemented that code here. This also means that, in the module workflow, we no longer need to pass the homolog file parameter to this notebook.
- However, this has consequences! In using ensembl ids, we have a slightly different set of features, which does slightly change the label transfer results. I've run the code and looked at results for a couple samples, and all differences are super minimal and not a cause for concern "scientifically". Logistics are another story: this does mean we'll have to rerun label transfer and regenerate notebooks. To that end, there are a lot of processing notebooks in this module which could be scripts (01 prepares Seurat objects, 02a does the first label transfer, 02b does the second label transfer). Whenever something gets updated, we have to regenerate all these notebooks, and GitHub (reasonably) hates the diff. I wonder if it's worth converting some of these notebooks to scripts to avoid that problem altogether before we regenerate all these results? We can open more issues for whatever we decide here. The solution may also be "add a note to the README that notebooks may not be rendered at the most recent code version"? Is that fishy?
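
To make the feature-set point concrete, here is a minimal sketch of the conversion bypass. It is written in Python purely for illustration (the module itself is R), and it is not the code in label-transfer-functions.R: the function name, the `convert` flag, and the toy symbol map are all hypothetical. It shows why skipping conversion can change the set of features: any ID without an entry in the symbol map is dropped when converting, but kept when working with Ensembl IDs directly.

```python
def prepare_features(gene_ids, symbol_map=None, convert=False):
    """Return the feature names to use for label transfer.

    Hypothetical helper for illustration only.
    gene_ids: Ensembl IDs from the query object.
    symbol_map: dict mapping Ensembl ID -> gene symbol (needed if convert=True).
    convert: if False, bypass conversion and keep the Ensembl IDs as-is.
    """
    if not convert:
        # Bypass: the reference already uses Ensembl IDs, so nothing is dropped
        return list(gene_ids)
    if symbol_map is None:
        raise ValueError("symbol_map is required when convert=True")
    # IDs with no known symbol cannot be converted and are dropped
    return [symbol_map[g] for g in gene_ids if g in symbol_map]


# Toy inputs; the map is deliberately incomplete to mimic unmapped genes
query_ids = ["ENSG00000000003", "ENSG00000000005", "ENSG00000000419"]
toy_symbol_map = {
    "ENSG00000000003": "TSPAN6",
    "ENSG00000000419": "DPM1",
}

with_conversion = prepare_features(query_ids, toy_symbol_map, convert=True)
without_conversion = prepare_features(query_ids)

print(len(with_conversion), len(without_conversion))
```

In this toy case the converted feature list is one gene shorter than the unconverted one, which is the kind of small difference in features described above.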

allyhawkins

The changes here look good to me. I reviewed the code and made sure the characterization notebook for the reference was re-rendered and that looks good.

> In using ensembl ids, we have a slightly different set of features, which does slightly change the label transfer results. I've run the code and looked at results for a couple samples, and all differences are super minimal and not a cause for concern "scientifically".

I do think generally we want to use ensembl IDs over gene symbols when we can and I also don't think it's a good idea to have outdated notebooks that live in the repo. That seems misleading.

I think we should either a) regenerate everything and leave as is with notebooks, b) update to using scripts so we don't have a bunch of notebooks or c) leave as is with gene symbols so we don't regenerate results. I think that depends on how much we want to put into updating this module and I don't know if I should be the person to answer that question. My inclination is that we should at least choose option a to start and then we can decide if we want to put in the effort to update everything to scripts.

jaclyn-taroni · 2024-11-20T15:54:54Z

> b) update to using scripts so we don't have a bunch of notebooks

Just to weigh in here - it's probably not this one

sjspielman · 2024-11-20T17:56:23Z

> c) leave as is with gene symbols so we don't regenerate results.

This one unfortunately isn't an option, since the h5ad file from CELLxGENE doesn't have gene symbols in the metadata for a reliable conversion that would match however Seurat originally did that conversion. I'm wary of a time sink trying to match their exact gene symbol order, for almost no payoff.

> a) regenerate everything and leave as is with notebooks

Probably we'll just regenerate everything then, but in a separate PR (or several 🎢 ) since it will do wild things to the GitHub diff. Does this sound ok @allyhawkins ?

allyhawkins · 2024-11-20T18:32:53Z

> Probably we'll just regenerate everything then, but in a separate PR (or several 🎢 ) since it will do wild things to the GitHub diff. Does this sound ok @allyhawkins ?

This works for me!

sjspielman · 2024-11-20T18:57:05Z

Issue opened here: #906

allyhawkins

LGTM

sjspielman added 9 commits November 19, 2024 14:43

update script to parse h5ad

c8527c0

update notebook to match

02750e4

allow gene conversion bypass since this reference has ensembl ids

c9847aa

update link

173cc0c

update workflow file to match changes

f7e5a88

Merge branch 'main' into sjspielman/866-use-zellkonverter

9c6e340

render notebook

9961d57

Merge branch 'sjspielman/866-use-zellkonverter' of github.com:sjspielman/OpenScPCA-analysis into sjspielman/866-use-zellkonverter

c5d4fd3

be explicit with arguments

c2e5a8e

sjspielman marked this pull request as ready for review November 19, 2024 21:06

sjspielman requested a review from jaclyn-taroni as a code owner November 19, 2024 21:06

sjspielman requested review from allyhawkins and removed request for jaclyn-taroni November 19, 2024 21:06

allyhawkins reviewed Nov 20, 2024

Merge branch 'main' into sjspielman/866-use-zellkonverter

20573f4

sjspielman mentioned this pull request Nov 20, 2024

Regenerate notebooks for cell-type-wilms-tumor-06 #906

Open

sjspielman requested a review from allyhawkins November 20, 2024 18:57

allyhawkins approved these changes Nov 20, 2024

sjspielman merged commit c476b7c into AlexsLemonade:main Nov 20, 2024
3 checks passed

sjspielman deleted the sjspielman/866-use-zellkonverter branch November 20, 2024 19:08
