Draft annotation #844
Overall I think this is a really nice first draft of annotations! I've left some initial feedback about where I think we can make the code more robust, and some spots where I have questions about the approach.
analyses/cell-type-wilms-tumor-06/notebook/07_annotation_Across_Samples_exploration.Rmd
This currently has the same file name as the 04 notebook, except with 07. We definitely want to more clearly distinguish these, so can you rename this one? Maybe something like "combined annotation across samples", since it's more than just label transfer?
    ```{r fig.width=10, fig.height=10, out.width='100%', results='asis'}
Again, you'll want to use dplyr::case_when() here.
    cell_type_df$first.level_annotation <- "unknown"

    # Define normal cells
One question I have here (and more generally for the notebook) is whether you want to use the scores when assigning labels based on results from label transfer.
For example, in your first condition here, you check whether a cell is fetal nephron or stroma. Cells that have scores of, say, 0.1 (i.e., not very confident!) but are labeled nephron will be regarded as nephron. An alternative approach is to say "any cell with a score below a given threshold has an UNKNOWN compartment", and then not label these cells at all. This would be a separate condition: if the score for the compartment is less than a certain value, then keep that cell as unknown, since we don't have reliable label transfer results.
That said, I don't think this matters quite so much for compartment, since those results are more reliable, but it may matter for the cell_type annotations from label transfer, which has many more categories.
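As a rough sketch of what such a separate low-score condition might look like with dplyr::case_when(): the column names compartment and compartment_score, the toy values, and the 0.5 cutoff are all assumptions made for illustration, not values taken from the notebook.

```r
library(dplyr)

# Toy stand-in for the label transfer results (hypothetical columns and values)
cell_type_df <- data.frame(
  barcode = c("cell1", "cell2", "cell3", "cell4"),
  compartment = c("fetal_nephron", "fetal_nephron", "stroma", "immune"),
  compartment_score = c(0.95, 0.10, 0.80, 0.30),
  stringsAsFactors = FALSE
)

# Hypothetical cutoff: below this, the label transfer result is not trusted
score_threshold <- 0.5

cell_type_df <- cell_type_df |>
  mutate(
    compartment_filtered = case_when(
      # evaluated first, so low-confidence cells never reach later conditions
      compartment_score < score_threshold ~ "unknown",
      # otherwise keep the label assigned by label transfer
      TRUE ~ compartment
    )
  )
```

Because case_when() evaluates conditions in order, putting the score check first means any later compartment-based rules only ever see cells that passed the cutoff.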
This is a good idea, thanks!
I would however only apply it to "normal" cells, as we expect cancer cells to have lower predicted.score?
What about having at the end a quick check of the density of the predicted.scores for each of the first/second.level_annotation and filter out some of the annotations with too low confidence?
> I would however only apply it to "normal" cells, as we expect cancer cells to have lower predicted.score?
This makes sense to me, but please add a sentence to the notebook that explicitly states this expectation. I see you've added something about using the scores for normal cells (great!), so let's also add this explanation for why you don't use them for cells we're calling as cancer.
> What about having at the end a quick check of the density of the predicted.scores for each of the first/second.level_annotation and filter out some of the annotations with too low confidence?
I think it would certainly be worth looking at the distribution of scores here, and then we can think about filtering. But, I might open a separate issue for this as something to circle back to after the deadline!
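For whenever this gets picked up, a minimal sketch of a first look at the scores per annotation, before deciding on any filtering. The column names second.level_annotation and predicted.score follow the ones mentioned in this thread; the toy data frame and its values are made up for illustration.

```r
library(dplyr)

# Toy data standing in for the annotated cells (values are invented)
cell_type_df <- data.frame(
  second.level_annotation = c("kidney", "kidney", "kidney",
                              "immune", "immune", "immune"),
  predicted.score = c(0.9, 0.8, 0.7, 0.2, 0.4, 0.3),
  stringsAsFactors = FALSE
)

# Summarize the score distribution within each annotation label
score_summary <- cell_type_df |>
  group_by(second.level_annotation) |>
  summarize(
    n_cells = n(),
    median_score = median(predicted.score),
    min_score = min(predicted.score),
    .groups = "drop"
  ) |>
  # lowest-confidence annotations first
  arrange(median_score)
```

The same grouped data could also be passed to ggplot2's geom_density() to look at the full distributions rather than summary statistics.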
    ```{r fig.width=20, fig.height=20, out.width='100%', results='asis'}
    ggplot(cell_type_df[cell_type_df$first.level_annotation == "normal",], aes( x = umap.umap_1, y = umap.umap_2, color = second.level_annotation), shape = 19, size = 1)+
I'm not sure this is informative to only show normal cells (and similarly below in your next plot to only show cancer cells). Can you explain more of your reasoning for these plots so I understand how they help interpretation?
For most cancer types, I guess we are used to having cancer cells that cluster separately from normal cells.
For Wilms tumor, however, I think this is more complicated, as the cancer cells comprise epithelial, stroma and blastema cancer cells, which have more (transcriptional) similarities with their normal counterparts (i.e. normal kidney epithelium, normal reactive stroma) than with each other.
For that reason, I expect for example epithelial cancer and normal cells to be close, if not mixed, in the umap reduction.
I therefore found it easier to visualize cancer and normal cells separately. But it might actually be better to have the two plots side by side for each patient.
I think what might help here is a plotting strategy I have used before to highlight cells in a UMAP: you can make all cells light gray, and then on top of that add a layer of your cells of interest that are colored. This way, you can clearly see the cells you care about, but still see the full context of the UMAP.
Here's an example of how you might code something like this:
    library(dplyr)
    library(ggplot2)

    # data frame that only contains points of interest
    subsetted_iris <- iris |>
      filter(Species == "versicolor")

    ggplot(iris) +
      aes(x = Sepal.Length, y = Sepal.Width) +
      # base layer: all points in gray for context
      geom_point(color = "gray") +
      # add layer with points of interest colored
      geom_point(
        data = subsetted_iris,
        aes(color = Species)
      )
Again though, this might be something to do later after the deadline!
    ```{r fig.width=10, fig.height=10, out.width='100%', results='asis'}

    cell_type_df$second.level_annotation[ cell_type_df$compartment %in% c("stroma") &
Again, please use case_when
…s_Samples_exploration.Rmd Co-authored-by: Stephanie Spielman <[email protected]>
Thank you @sjspielman for looking into it! I just pushed the few changes. Thank you for the case_when suggestion
> Thank you for the case_when suggestion
case_when is so useful, glad I could introduce you to it 😄
I left a few comments about other areas I think this can be improved, but overall I think this notebook is a great "first draft" of your annotations and you should plan (if you want to still contribute!) to do some of the additional code changes later since you may not have time before the deadline. In this case, I would encourage you to open an issue where you can track future updates to this notebook. You can always copy and paste some of my comments there to help write the issue, too! Another benefit to writing this issue is just so everyone knows there is potentially more work planned for this module (even if that work doesn't happen, that's ok! we still will have the record of discussing it!).
For now, here's what we need at least:
- Please make the corresponding change to fix the has_cnv_score "bug" in the 06 notebook, so it uses <= and > instead of:
- I do not see any code that actually runs inferCNV across all samples, only the 5 we have explored more in depth. This needs to be part of the workflow. It's clear that you've run the code, so maybe it just isn't committed?
- This notebook needs to be added to the workflow and to the module's README file
- Remove the old HTML that is still in this PR (from before you renamed the notebook)
- Have this notebook export a TSV file of draft annotations that meets these guidelines: https://openscpca.readthedocs.io/en/latest/grant-opportunities/#submission-acceptance-criteria. You can still have your first-level and second-level annotations, but we'll want the columns described in the link above too. Since you did not actually use marker genes for the annotation, just to explore annotation and do a little bit of validation, you won't need to make that second TSV described in the link.
@maud-p in the interest of time given the deadline, I'm going to go ahead and push code to your branch that addresses a couple of my reviews, including:
Then, I will be able to approve this PR which will hopefully make your results eligible in time :)
…both' if they dont have both immune and epithelial
…mal cell if a reference is being used. also, added seed which was missing
@sjspielman thank you so much for your help!
It would definitely introduce conflicts, since along the way I have caught a few bugs and am addressing them too. I will report back here tomorrow with the full details, since I'm still working on it, but I will handle the TSV from here!
OK thank you very much! Don't hesitate to let me know at the end of your working day what/if I can continue tomorrow morning (CEST time)!
I have implemented the following changes:
This PR can therefore be approved! 🎉 These are the additional review comments which have not been implemented: @maud-p, you may wish to open a new issue about addressing these comments or other future steps that you might be interested in doing in the future! But, we don't want them in this PR since it's being approved, and we'd like to merge it in to meet the deadline. Either way, thank you again for all your time and effort to get this draft of annotations done 🎆 🥳 !!!
…, so we can get some draft annotations for it
@sjspielman thank you so much for all your reviews, advice etc. I am really happy about the job we did together 🥳 thanks +++ for your great great help these last days to meet the deadline!!!!
Noting that I was also able to get one more sample running through inferCNV, so now only 1 sample remains unannotated and it's probably due to some cryptic bug in inferCNV which can be investigated in the future.
…he missing annotations, and updated code accordingly to run this sample
Alright, I was able to get the last sample working!! All samples now have a draft annotation 🎉
Thank you!!!!
Purpose/implementation Section
In this PR, I would like to make a first draft of annotations for the Wilms tumor 06 dataset.
Please link to the GitHub issue that this pull request addresses.
I opened the issue:
#839
What is the goal of this pull request?
To summarize the analyses performed so far and combine them to annotate the Wilms tumor dataset.
Briefly describe the general approach you took to achieve this goal.
The aim is to combine label transfer and CNV inference to annotate Wilms tumor samples in SCPCP000006. The proposed annotation will be based on the combination of:
- the label transfer from the fetal kidney reference (Stewart et al.), in particular the fetal_kidney_predicted.compartment and fetal_kidney_predicted.cell_type, as well as the mapping.score for each compartment,
- the predicted CNV calculated using intra-sample endothelial and immune cells (--reference both) as normal reference.
In a second step, we will explore and validate the chosen annotation.
We will use some of the marker genes to visually validate the annotations.
The analysis can be summarized as follows:
Where cnv.thr and map.thr need to be discussed
If known, do you anticipate filing additional pull requests to complete this analysis module?
I think quite a few points need to be discussed and can be improved or checked in later analyses.
Provide directions for reviewers
I think the present notebook is not completely done, but I wanted to share with you what I have been able to summarize and explore so far.
Happy to discuss every step and how it can be improved.
What are the software and computational requirements needed to be able to run the code in this PR?
Are there particularly areas you'd like reviewers to have a close look at?
Is there anything that you want to discuss further?
Author checklists
Check all those that apply.
Note that you may find it easier to check off these items after the pull request is actually filed.
Analysis module and review
README.md has been updated to reflect code changes in this pull request.
Reproducibility checklist
Dockerfile.
environment.yml file.
renv.lock file.