Draft annotation #844

maud-p · 2024-10-29T23:20:00Z

Purpose/implementation Section

In this PR, I would like to make a first draft of annotations for the Wilms tumor 06 dataset.

Please link to the GitHub issue that this pull request addresses.

I opened the issue:
#839

What is the goal of this pull request?

To summarize the analyses performed so far and try to combine them to annotate the Wilms tumor dataset.

Briefly describe the general approach you took to achieve this goal.

The aim is to combine label transfer and CNV inference to annotate Wilms tumor samples in SCPCP000006. The proposed annotation will be based on the combination of:

- the label transfer from the fetal kidney reference (Stewart et al.), in particular the fetal_kidney_predicted.compartment and fetal_kidney_predicted.cell_type, as well as the mapping.score for each compartment,
- the predicted CNV calculated using intra-sample endothelial and immune cells (--reference both) as normal reference.

In a second step, we will explore and validate the chosen annotation.

We will use some of the marker genes to visually validate the annotations.

The analysis can be summarized as follows, where the thresholds cnv.thr and map.thr still need to be discussed:

| first level annotation | second level annotation | selection of the cells | marker genes for validation | cnv validation |
|---|---|---|---|---|
| normal | endothelial | `compartment == "endothelium" & mapping_score > map.thr & cnv_score < cnv.thr` | VWF | no cnv |
| normal | immune | `compartment == "immune" & mapping_score > map.thr & cnv_score < cnv.thr` | PTPRC, CD163, CD68 | no cnv |
| normal | kidney | `cell_type %in% c("kidney cell", "kidney epithelial", "podocyte") & mapping_score > map.thr & cnv_score < cnv.thr` | CDH1, PODXL, LTL | no cnv |
| normal | stroma | `compartment == "stroma" & mapping_score > map.thr & cnv_score < cnv.thr` | VIM | no cnv |
| cancer | stroma | `compartment == "stroma" & cnv_score > cnv.thr` | VIM | proportion_cnv_chr -1 -4 -11 -16 -17 -18 |
| cancer | blastema | `compartment == "fetal_nephron" & cell_type == "mesenchymal cell" & cnv_score > cnv.thr` | CITED1 | proportion_cnv_chr -1 -4 -11 -16 -17 -18 |
| cancer | epithelial | `compartment == "fetal_nephron" & cell_type != "mesenchymal cell" & cnv_score > cnv.thr` | CDH1 | proportion_cnv_chr -1 -4 -11 -16 -17 -18 |
| unknown | - | the rest of the cells | - | proportion_cnv_chr -1 -4 -11 -16 -17 -18 |
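
The selection rules in the table can be sketched with `dplyr::case_when()`. This is only a minimal sketch under assumptions: the toy data frame and its values are made up, the threshold values are placeholders (the real `map.thr` and `cnv.thr` are still to be discussed), the helper columns `is_normal` and `is_cancer` are hypothetical names, and cells that match no rule get `"unknown"` at both levels rather than `-`.

```r
library(dplyr)

# Placeholder thresholds: the actual values are still under discussion
map.thr <- 0.75
cnv.thr <- 0.5

# Toy per-cell table with made-up values; column names follow the table above
cell_type_df <- data.frame(
  compartment = c("endothelium", "immune", "stroma", "stroma",
                  "fetal_nephron", "fetal_nephron", "fetal_nephron", "fetal_nephron"),
  cell_type = c("endothelial cell", "macrophage", "stroma cell", "stroma cell",
                "mesenchymal cell", "podocyte", "podocyte", "kidney epithelial"),
  mapping_score = c(0.9, 0.95, 0.9, 0.4, 0.6, 0.9, 0.3, 0.5),
  cnv_score = c(0.1, 0.0, 0.2, 0.8, 0.9, 0.1, 0.2, 0.9)
)

annotated_df <- cell_type_df |>
  mutate(
    # normal cells need a confident mapping and a low CNV score
    is_normal = mapping_score > map.thr & cnv_score < cnv.thr,
    # cancer cells are called from the CNV score alone
    is_cancer = cnv_score > cnv.thr,
    second.level_annotation = case_when(
      is_normal & compartment == "endothelium" ~ "endothelial",
      is_normal & compartment == "immune" ~ "immune",
      is_normal & cell_type %in% c("kidney cell", "kidney epithelial", "podocyte") ~ "kidney",
      is_normal & compartment == "stroma" ~ "stroma",
      is_cancer & compartment == "stroma" ~ "stroma",
      is_cancer & compartment == "fetal_nephron" & cell_type == "mesenchymal cell" ~ "blastema",
      is_cancer & compartment == "fetal_nephron" & cell_type != "mesenchymal cell" ~ "epithelial",
      TRUE ~ "unknown"
    ),
    first.level_annotation = case_when(
      second.level_annotation == "unknown" ~ "unknown",
      is_normal ~ "normal",
      is_cancer ~ "cancer",
      TRUE ~ "unknown"
    )
  )

print(annotated_df[, c("compartment", "first.level_annotation", "second.level_annotation")])
```

Because `case_when()` returns the first matching rule, the order of the conditions matters: here the normal rules are listed before the cancer rules, mirroring the row order of the table.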

If known, do you anticipate filing additional pull requests to complete this analysis module?

I think quite a few points need to be discussed and can be improved or checked in later analyses.

Provide directions for reviewers

I think the present notebook is not completely done, but I wanted to share with you what I have been able to summarize and explore so far.
Happy to discuss every step and how it can be improved.

What are the software and computational requirements needed to be able to run the code in this PR?

Are there particular areas you'd like reviewers to have a close look at?

Is there anything that you want to discuss further?

Author checklists

Check all those that apply.
Note that you may find it easier to check off these items after the pull request is actually filed.

Analysis module and review

This analysis module uses the analysis template and has the expected directory structure.
The analysis module README.md has been updated to reflect code changes in this pull request.
The analytical code is documented and contains comments.
Any results and/or plots this code produces have been added to your S3 bucket for review.

Reproducibility checklist

Code in this pull request has been added to the GitHub Action workflow that runs this module.
The dependencies required to run the code in this pull request have been added to the analysis module Dockerfile.
If applicable, the dependencies required to run the code in this pull request have been added to the analysis module conda environment.yml file.
If applicable, R package dependencies required to run the code in this pull request have been added to the analysis module renv.lock file.

sjspielman

Overall I think this is a really nice first draft of annotations! I've left some initial feedback about where I think we can make the code more robust, and some spots where I have questions about the approach.

sjspielman · 2024-10-30T13:46:32Z

analyses/cell-type-wilms-tumor-06/notebook/07_annotation_Across_Samples_exploration.Rmd

This currently has the same file name as the 04 notebook, except with 07. We definitely want to more clearly distinguish these, so can you rename this one? Maybe like combined annotation across samples, since it's more than just label transfer?

sjspielman · 2024-10-30T14:00:52Z

analyses/cell-type-wilms-tumor-06/notebook/07_annotation_Across_Samples_exploration.Rmd

+
+```{r fig.width=10, fig.height=10, out.width='100%', results='asis'}
+
+


Again, you'll want to use dplyr::case_when() here

sjspielman · 2024-10-30T14:06:34Z

analyses/cell-type-wilms-tumor-06/notebook/07_annotation_Across_Samples_exploration.Rmd

+cell_type_df$first.level_annotation <- "unknown"
+
+
+# Define normal cells


One question I have here (and more generally for the notebook) is whether you want to use the scores when assigning labels based on results from label transfer.

For example, in your first condition here, you check whether a cell is fetal nephron or stroma. Cells which have scores of say 0.1 (aka, not very confident!) but are labeled nephron will be regarded as nephron, but there is an approach where you might say "any cell with a score below a given threshold has an UNKNOWN compartment", and then not label these cells at all. This would be a separate condition: If the score for the compartment is less than a certain value, then keep that cell as unknown since we don't have reliable label transfer results.

That said, I don't think this matters quite so much for compartment, since those results are more reliable, but it may matter for the cell_type annotations from label transfer which has many more categories.

This is a good idea, thanks!
I would however only apply it to "normal" cells, as we expect cancer cells to have lower predicted.score?

What about having, at the end, a quick check of the density of the predicted.scores for each of the first/second.level_annotation and filtering out some of the annotations with too low confidence?

> I would however only apply it to "normal" cells, as we expect cancer cells to have lower predicted.score?

This makes sense to me, but please add a sentence to the notebook that explicitly states this expectation. I see you've added something about using the scores for normal cells (great!) so let's add this explanation too for why you don't use them for cells we're calling as cancer.

> What about having, at the end, a quick check of the density of the predicted.scores for each of the first/second.level_annotation and filtering out some of the annotations with too low confidence?

I think it would certainly be worth looking at the distribution of scores here, and then we can think about filtering. But, I might open a separate issue for this as something to circle back to after the deadline!
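
As a rough illustration of what such a check could look like, here is a minimal sketch that plots the score density for each first-level annotation and tabulates the median score. The data are simulated, and the column names `first.level_annotation` and `predicted.score` are assumed to match the per-cell table used in the notebook; nothing here reflects actual results.

```r
library(dplyr)
library(ggplot2)

set.seed(2024)

# Simulated per-cell table; the values are made up for illustration
cell_type_df <- data.frame(
  first.level_annotation = rep(c("normal", "cancer", "unknown"), each = 100),
  predicted.score = c(
    rbeta(100, 8, 2), # simulated "normal" cells: mostly high scores
    rbeta(100, 4, 4), # simulated "cancer" cells: intermediate scores
    rbeta(100, 2, 6)  # simulated "unknown" cells: mostly low scores
  )
)

# Density of label-transfer scores, one panel per annotation
score_density_plot <- ggplot(cell_type_df) +
  aes(x = predicted.score, fill = first.level_annotation) +
  geom_density(alpha = 0.5) +
  facet_wrap(vars(first.level_annotation), ncol = 1) +
  theme_bw()

# Numeric summary to read alongside the plot
score_summary <- cell_type_df |>
  group_by(first.level_annotation) |>
  summarize(
    median_score = median(predicted.score),
    n_cells = n()
  )

print(score_summary)
```

The same pattern would apply to `second.level_annotation` by swapping the grouping column.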

sjspielman · 2024-10-30T14:08:11Z

analyses/cell-type-wilms-tumor-06/notebook/07_annotation_Across_Samples_exploration.Rmd

+
+
+```{r fig.width=20, fig.height=20, out.width='100%', results='asis'}
+ggplot(cell_type_df[cell_type_df$first.level_annotation == "normal",], aes( x = umap.umap_1, y = umap.umap_2, color = second.level_annotation), shape = 19, size = 1)+


I'm not sure this is informative to only show normal cells (and similarly below in your next plot to only show cancer cells). Can you explain more of your reasoning for these plots so I understand how they help interpretation?

For most cancer types, I guess we are used to having cancer cells that cluster separately from normal cells.

For Wilms tumor, I think this is however more complicated, as cancer cells comprise epithelial, stroma and blastema cancer cells, which have more (transcriptional) similarities with their normal counterparts (i.e. normal kidney epithelium, normal reactive stroma) than with each other.

For that reason, I expect for example epithelial cancer and normal cells to be close, if not mixed, in the UMAP.

I therefore found it easier to visualize cancer and normal cells separately. But it might actually be better to have the two plots side by side for each patient.

I think what might help here is a plotting strategy I have used before to highlight cells in a UMAP - you can make all cells light gray, and then on top of that add a layer of your cells of interest that are colored. This way, you can clearly see the cells you care about, but still see the full context of the UMAP.

Here's an example of how you might code something like this:

    library(dplyr)
    library(ggplot2)

    # data frame that only contains points of interest
    subsetted_iris <- iris |>
      filter(Species == "versicolor")

    ggplot(iris) +
      aes(x = Sepal.Length, y = Sepal.Width) +
      geom_point(color = "gray") +
      # add layer with points of interest colored
      geom_point(
        data = subsetted_iris,
        aes(color = Species)
      )

Again though, this might be something to do later after the deadline!

sjspielman · 2024-10-30T14:08:31Z

analyses/cell-type-wilms-tumor-06/notebook/07_annotation_Across_Samples_exploration.Rmd

+
+```{r fig.width=10, fig.height=10, out.width='100%', results='asis'}
+
+cell_type_df$second.level_annotation[  cell_type_df$compartment %in% c("stroma") &


Again, please use case_when

maud-p · 2024-10-30T23:21:14Z

Thank you @sjspielman for looking into it! I just pushed the few changes. Thank you for the case_when suggestion, I didn't know it. I should definitely use dplyr more 😃
Let me know if anything in my answers is not clear!
Thanks!

sjspielman

> Thank you for the case_when suggestion

case_when is so useful, glad I could introduce you to it 😄

I left a few comments about other areas I think this can be improved, but overall I think this notebook is a great "first draft" of your annotations and you should plan (if you want to still contribute!) to do some of the additional code changes later since you may not have time before the deadline. In this case, I would encourage you to open an issue where you can track future updates to this notebook. You can always copy and paste some of my comments there to help write the issue, too! Another benefit to writing this issue is just so everyone knows there is potentially more work planned for this module (even if that work doesn't happen, that's ok! we still will have the record of discussing it!).

For now, here's what we need at least:

- Please make the corresponding change to fix the has_cnv_score "bug" in the 06 notebook, so it uses <= and > instead of :.
- I do not see any code that actually runs inferCNV across all samples, only the 5 we have explored more in depth. This needs to be part of the workflow. It's clear that you've run the code, so maybe it just isn't committed?
- This notebook needs to be added to the workflow and to the module's README file
- Remove the old HTML that is still in this PR (from before you renamed the notebook)
- Have this notebook export a TSV file of draft annotations that meets these guidelines: https://openscpca.readthedocs.io/en/latest/grant-opportunities/#submission-acceptance-criteria. You can still have your first-level and second-level annotations, but we'll want the columns described in the link above too. Since you did not actually use marker genes for the annotation, just to explore annotation and do a little bit of validation, you won't need to make that second TSV described in the link.
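
To illustrate why `:` can misbehave in a threshold check, here is a small, generic R example. The actual code from the 06 notebook is not shown in this thread, so the variable names, scores, and threshold below are hypothetical and only demonstrate the general behavior of `:` versus explicit comparisons.

```r
# Hypothetical scores and threshold, for illustration only
scores <- c(0, 0.5, 1, 2, 2.5, 3)
threshold <- 2

# `0:threshold` expands to the whole numbers 0, 1, 2,
# so fractional scores such as 0.5 never match
matched_with_colon <- scores %in% 0:threshold

# Explicit comparisons place every score in exactly one of two groups
at_or_below <- scores <= threshold
above <- scores > threshold

print(data.frame(scores, matched_with_colon, at_or_below, above))
```

With the colon version, the score 0.5 matches neither `0:threshold` nor `scores > threshold`, so it would silently end up in no group; the `<=` and `>` pair avoids that gap.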

sjspielman · 2024-10-31T18:19:52Z

@maud-p in the interest of time given the deadline, I'm going to go ahead and push code to your branch that addresses a couple of my reviews, including:

> - I do not see any code that actually runs inferCNV across all samples, only the 5 we have explored more in depth. This needs to be part of the workflow. It's clear that you've run the code, so maybe it just isn't committed?
> - This notebook needs to be added to the workflow and to the module's README file
> - Remove the old HTML that is still in this PR (from before you renamed the notebook)
> - Have this notebook export a TSV file of draft annotations that meets these guidelines: https://openscpca.readthedocs.io/en/latest/grant-opportunities/#submission-acceptance-criteria. You can still have your first-level and second-level annotations, but we'll want the columns described in the link above too. Since you did not actually use marker genes for the annotation, just to explore annotation and do a little bit of validation, you won't need to make that second TSV described in the link.

Then, I will be able to approve this PR which will hopefully make your results eligible in time :)

maud-p · 2024-10-31T21:53:05Z

@sjspielman thank you so much for your help!
I am trying to catch up on the review of this PR, but I see you are already far along with the changes!
I think what remains to be done is the README.md file update and the final annotation TSV file. I will start on these now, is that OK?
Or would it introduce conflicts?
Thank you again so much, really appreciated :)

sjspielman · 2024-10-31T21:55:14Z

> Or would it introduce conflicts?

It would definitely introduce conflicts, since along the way I have caught a few bugs and am addressing them too. I will report back here tomorrow with the full details, since I'm still working on it, but I will handle the TSV from here!

maud-p · 2024-10-31T22:02:15Z

> Or would it introduce conflicts?
>
> It would definitely introduce conflicts, since along the way I have caught a few bugs and am addressing them too. I will report back here tomorrow with the full details, since I'm still working on it, but I will handle the TSV from here!

OK thank you very much! Don't hesitate to let me know at the end of your working day what/if I can continue tomorrow morning (CEST time)!
Thank you!

sjspielman · 2024-11-01T13:46:57Z

I have implemented the following changes:

- Updated documentation in the module to reflect the 07 notebook
- Updated the 07 notebook to export a properly-formatted TSV of annotations
- Fixed the 0:threshold bug in the 06 notebook
- Fixed a few bugs in the inferCNV script which were previously missed because all samples we had run through had normal cells for a reference
- Updated 00_run_workflow.R:
  - Added step to process all samples, where currently possible, through inferCNV with HMM i3 and "both" reference
  - Added step to render the 07 notebook
  - Fixed the for loops to only loop over relevant samples and not duplicate samples

This PR can therefore be approved! 🎉

These are the additional review comments which have not been implemented:

@maud-p, you may wish to open a new issue about addressing these comments or other steps that you might be interested in pursuing in the future! But, we don't want them in this PR since it's being approved, and we'd like to merge it in to meet the deadline. Either way, thank you again for all your time and effort to get this draft of annotations done 🎆 🥳 !!!

maud-p · 2024-11-01T14:30:16Z

@sjspielman thank you so much for all your reviews, advice etc. I am really happy about the job we did together 🥳 thanks +++ for your great great help these last days to meet the deadline!!!!
I'll open the next issue on Monday/Tuesday, I'd like to take the time to think and summarize what/how I would like to pursue this analysis. But I'd definitely like to continue 😃
Thank you!!!

sjspielman · 2024-11-01T14:37:01Z

Noting that I was also able to get one more sample running through inferCNV, so now only 1 sample remains unannotated and it's probably due to some cryptic bug in inferCNV which can be investigated in the future.

sjspielman · 2024-11-01T15:22:26Z

Alright, I was able to get the last sample working!! All samples now have a draft annotation 🎉

maud-p · 2024-11-01T15:33:21Z

> Alright, I was able to get the last sample working!! All samples now have a draft annotation 🎉

Thank you!!!!

Draft annotation

b6256b2

maud-p requested a review from jaclyn-taroni as a code owner October 29, 2024 23:20

jaclyn-taroni requested review from sjspielman and removed request for jaclyn-taroni October 30, 2024 09:51

sjspielman reviewed Oct 30, 2024

maud-p and others added 5 commits October 30, 2024 22:48

Update analyses/cell-type-wilms-tumor-06/notebook/07_annotation_Acros…

f64394d

…s_Samples_exploration.Rmd Co-authored-by: Stephanie Spielman <[email protected]>

Update analyses/cell-type-wilms-tumor-06/notebook/07_annotation_Acros…

aecb774

…s_Samples_exploration.Rmd Co-authored-by: Stephanie Spielman <[email protected]>

Update analyses/cell-type-wilms-tumor-06/notebook/07_annotation_Acros…

ed4384a

…s_Samples_exploration.Rmd Co-authored-by: Stephanie Spielman <[email protected]>

Update analyses/cell-type-wilms-tumor-06/notebook/07_annotation_Acros…

3688208

…s_Samples_exploration.Rmd Co-authored-by: Stephanie Spielman <[email protected]>

changes to PR844

52ba1d1

maud-p requested a review from sjspielman October 30, 2024 23:21

sjspielman reviewed Oct 31, 2024

sjspielman added 10 commits October 31, 2024 14:23

Add infercnv and 07 notebook steps to the workflow script

8d091a5

merge main

304d23d

remove workflow script that ended up in top level directory

8501e4b

Remove old notebook

40dfff0

the notebook uses both, not pull

f4049e6

ensure unique only to prevent duplicate runs

0c8eb7f

fix bug in infercnv script that prevented samples from running with '…

5fa6b09

…both' if they dont have both immune and epithelial

add additional check needed for infercnv that there is at least 1 nor…

a0d3a3a

…mal cell if a reference is being used. also, added seed which was missing

fix order of logic

c2eda32

update code to create a final table of annotations

4f40d13

sjspielman added 3 commits November 1, 2024 09:12

add recommended option

bba0816

some README updates

eba9640

fix colon bug in 06 notebook

aeb2566

sjspielman added 4 commits November 1, 2024 09:23

only loop over relevant sample ids, and clean up the 07 notebook steps

5fef7d9

render final notebook

f39a3fd

update notebook readme

d9f0bf2

Merge branch 'main' into 06_updated_branch2

97e2790

sjspielman self-requested a review November 1, 2024 13:47

sjspielman approved these changes Nov 1, 2024

Include infercnv results for SCPCS000190 by using none as a reference…

0fea762

…, so we can get some draft annotations for it

add seed even though not used, so include note to that effect

09e3970

sjspielman added 3 commits November 1, 2024 10:52

need dashes

447a0a6

Found infercnv argument to get infercnv running for the sample with t…

742e77e

…he missing annotations, and updated code accordingly to run this sample

only map over relevant sample ids, and knit updated notebook

79ceef2

Merge branch 'main' into 06_updated_branch2

612ab84

sjspielman merged commit 2e289a2 into AlexsLemonade:main Nov 1, 2024
3 checks passed

maud-p mentioned this pull request Nov 5, 2024

Improve Wilms Tumor Dataset Annotation (SCPCP000006) - explore predicted.score and has_cnv.score thresholds #856

Open

3 tasks
