diff --git a/.github/components/dictionary.txt b/.github/components/dictionary.txt index a1ecebf06..2b69193a7 100644 --- a/.github/components/dictionary.txt +++ b/.github/components/dictionary.txt @@ -29,6 +29,7 @@ Cao Carpentries CellAssign CELLxGENE +chemotherapies chondrocytes chr CLI @@ -50,6 +51,7 @@ demultiplexing dendogram derangements designee +diploidy discoverable DM DNT @@ -107,6 +109,7 @@ immunities impactful indicia InferCNV +intra Jaccard Jitter JSON @@ -160,7 +163,10 @@ overclustered Panglao PanglaoDB PDX +peritubular pluripotent +PMID +PNG podman podocyte Posit @@ -175,6 +181,7 @@ README redistribution redistributions renv +repartition repo reproducibility reproducibly @@ -190,6 +197,7 @@ SCE ScPCA SCPCP scRNA +scRNAseq scrublet SEACells SemVar @@ -197,19 +205,23 @@ seq SingleR snRNA socio +Spearman SSO +stemness stroma stromal Stumptown subdiagnosis sublicensable subtypes +subunits symlink symlinked synched TBD Tirode trainings +transcriptional transferrable transphobic Treg diff --git a/analyses/cell-type-wilms-tumor-06/README.md b/analyses/cell-type-wilms-tumor-06/README.md index 1aad00a03..d26f1af77 100644 --- a/analyses/cell-type-wilms-tumor-06/README.md +++ b/analyses/cell-type-wilms-tumor-06/README.md @@ -9,20 +9,20 @@ Each of these groups is composed of the blastemal, epithelial, and stromal popul Here, we first aim to annotate the Wilms Tumor snRNA-seq samples in the SCPCP000006 (n=40) dataset. 
To do so we will: -• Provide annotations of normal cells composing the kidney, including normal kidney epithelium, endothelium, stroma and immune cells +- Provide annotations of normal cells composing the kidney, including normal kidney epithelium, endothelium, stroma and immune cells -• Provide annotations of tumor cell populations that may be present in the WT samples, including blastemal, epithelial, and stromal populations of cancer cells +- Provide annotations of tumor cell populations that may be present in the WT samples, including blastemal, epithelial, and stromal populations of cancer cells Based on the provided annotation, we would like to additionally provide a reference of marker genes for the three cancer cell populations, which is so far lacking for the WT community. The analysis is/will be divided as the following: - [x] Metadata file: compilation of a metadata file of marker genes for expected cell types that will be used for validation at a later step - [x] Script: clustering of cells across a set of parameters for few samples -- [x] Script: label transfer from the fetal kidney atlas reference using runAzimuth -- [x] Script: run copykat and inferCNV +- [x] Script: label transfer from the fetal kidney atlas reference using an Azimuth-adapted approach +- [x] Script: run `copyKat` and `inferCNV` - [x] Notebook: explore results from steps 2 to 4 for about 5 to 10 samples -- [x] Script: Run inferCNV for all samples -- [x] Notebook: explore results from step 6, integrate all samples together and annotate the dataset using (i) metadatafile, (ii) CNV information, (iii) label transfer information +- [x] Script: Run `inferCNV` for all samples +- [x] Notebook: explore results from step 6 and annotate the dataset using (i) metadata file, (ii) CNV information, (iii) label transfer information ## Usage @@ -45,7 +45,7 @@ Of note, this requires AWS CLI setup to run as intended: https://openscpca.readt ```shell ../../download-data.py --projects SCPCP000006 ``` 
-This is saving the data in OpenScPCA-analysis/data/current/SCPCP000006 +This is saving the data in `OpenScPCA-analysis/data/current/SCPCP000006` 4. Run the module: ```shell @@ -61,15 +61,15 @@ Some information can be helpful for annotation and validation: We expect few changes between the 2 conditions, including a higher immune infiltration and more DNA damages pathways in treated samples. - histology: the COG classifies Wilms tumor as either (i) Favorable or (ii) Anaplastic. -Some differenices are expected, some marker genes or pathways are associated with anaplasia (see sets of marker gene). +Some differences are expected, some marker genes or pathways are associated with anaplasia (see sets of marker gene). ## Output files for each of the steps, we have two types of `output`: -- the `notebook` saved in the `notebook` directory, with a subfolder for each sample. +- the `notebook` saved in the `notebook` directory, with a folder for each sample. -- the created objects saved in `results` directory, with a subfolder for each sample. +- the created objects saved in `results` directory, with a folder for each sample. # Analysis @@ -77,9 +77,9 @@ for each of the steps, we have two types of `output`: ## Marker sets We first build a resource for later validation of the annotated cell types. -We gather from the litterature marker genes and specific genomic alterations that could help us characterizing the Wilms tumor ecosystem, including cancer and non-cancer cells. +We gather from the literature marker genes and specific genomic alterations that could help us characterizing the Wilms tumor ecosystem, including cancer and non-cancer cells. 
-### The table CellType_metadata.csv contains the following column and information: +### The table `CellType_metadata.csv` contains the following column and information: - "gene_symbol" contains the symbol of the described gene, using the HUGO Gene Nomenclature - ENSEMBL_ID contains the stable identifier from the ENSEMBL database @@ -90,31 +90,31 @@ We gather from the litterature marker genes and specific genomic alterations tha |gene_symbol|ENSEMBL_ID|cell_class|cell_type|DOI|comment| |---|---|---|---|---|---| - |WT1|ENSG00000184937|malignant|cancer_cell|10.1242/dev.153163|Tumor_suppressor_WT1_is_lost_in_some_WT_cells| - |IGF2|ENSG00000167244|malignant|cancer_cell|10.1038/ng1293-408|NA| - |TP53|ENSG00000141510|malignant|anaplastic|10.1158/1078-0432.CCR-16-0985|Might_also_be_in_small_non_anaplastic_subset| - |MYCN|ENSG00000134323|malignant|anaplastic|10.18632/oncotarget.3377|Also_in_non_anaplastic_poor_outcome| - |MAX|ENSG00000125952|malignant|anaplastic|10.1016/j.ccell.2015.01.002|Also_in_non_anaplastic_poor_outcome| - |SIX1|ENSG00000126778|malignant|blastema|10.1016/j.ccell.2015.01.002|NA| - |SIX2|ENSG00000170577|malignant|blastema|10.1016/j.ccell.2015.01.002|NA| - |CITED1|ENSG00000125931|malignant|blastema|10.1593/neo.07358|Also_in_embryonic_kidney| - |PTPRC|ENSG00000081237|immune|NA|10.1101/gr.273300.120|NA| - |CD68|ENSG00000129226|immune|myeloid|10.1186/1746-1596-7-12|NA| - |CD163|ENSG00000177575|immune|macrophage|10.1186/1746-1596-7-12|NA| - |VWF|ENSG00000110799|endothelium|endothelium|10.1134/S1990747819030140|NA| - |CD3E|ENSG00000198851|immune|T_cell|10.1101/gr.273300.120|NA| - |MS4A1|ENSG00000156738|immune|B_cell|10.1101/gr.273300.120|NA| - |FOXP3|ENSG00000049768|immune|T_cell|10.1101/gr.273300.120|Treg| - |CD4|ENSG00000010610|immune|T_cell|10.1101/gr.273300.120|NA| - |CD8A|ENSG00000153563|immune|T_cell|10.1101/gr.273300.120|NA| - |EPCAM|ENSG00000119888|NA|epithelial|10.1016/j.stemcr.2014.05.013|epithelial_malignant_and_non_malignant| - 
|NCAM1|ENSG00000149294|malignant|blastema|10.1016/j.stemcr.2014.05.013|might_also_be_expressed_in_non_malignant| - |PODXL|ENSG00000128567|non-malignant|podocyte|10.1016/j.stem.2019.06.009|NA| - |COL6A3|ENSG00000163359|malignant|mesenchymal|10.2147/OTT.S256654|might_also_be_expressed_in_non_malignant_stroma| - |THY1|ENSG00000154096|malignant|mesenchymal|10.1093/hmg/ddq042|might_also_be_expressed_in_non_malignant_stroma| - - -### The table GeneticAlterations_metadata.csv contains the following column and information: + |`WT1`|`ENSG00000184937`|malignant|cancer_cell|`10.1242/dev.153163`|Tumor_suppressor_WT1_is_lost_in_some_WT_cells| + |`IGF2`|`ENSG00000167244`|malignant|cancer_cell|`10.1038/ng1293-408`|NA| + |`TP53`|`ENSG00000141510`|malignant|anaplastic|`10.1158/1078-0432.CCR-16-0985`|Might_also_be_in_small_non_anaplastic_subset| + |`MYCN`|`ENSG00000134323`|malignant|anaplastic|`10.18632/oncotarget.3377`|Also_in_non_anaplastic_poor_outcome| + |`MAX`|`ENSG00000125952`|malignant|anaplastic|`10.1016/j.ccell.2015.01.002`|Also_in_non_anaplastic_poor_outcome| + |`SIX1`|`ENSG00000126778`|malignant|blastema|`10.1016/j.ccell.2015.01.002`|NA| + |`SIX2`|`ENSG00000170577`|malignant|blastema|`10.1016/j.ccell.2015.01.002`|NA| + |`CITED1`|`ENSG00000125931`|malignant|blastema|`10.1593/neo.07358`|Also_in_embryonic_kidney| + |`PTPRC`|`ENSG00000081237`|immune|NA|`10.1101/gr.273300.120`|NA| + |`CD68`|`ENSG00000129226`|immune|myeloid|`10.1186/1746-1596-7-12`|NA| + |`CD163`|`ENSG00000177575`|immune|macrophage|`10.1186/1746-1596-7-12`|NA| + |`VWF`|`ENSG00000110799`|endothelium|endothelium|`10.1134/S1990747819030140`|NA| + |`CD3E`|`ENSG00000198851`|immune|T_cell|`10.1101/gr.273300.120`|NA| + |`MS4A1`|`ENSG00000156738`|immune|B_cell|`10.1101/gr.273300.120`|NA| + |`FOXP3`|`ENSG00000049768`|immune|T_cell|`10.1101/gr.273300.120`|Treg| + |`CD4`|`ENSG00000010610`|immune|T_cell|`10.1101/gr.273300.120`|NA| + |`CD8A`|`ENSG00000153563`|immune|T_cell|`10.1101/gr.273300.120`|NA| + 
|`EPCAM`|`ENSG00000119888`|NA|epithelial|`10.1016/j.stemcr.2014.05.013`|epithelial_malignant_and_non_malignant| + |`NCAM1`|`ENSG00000149294`|malignant|blastema|`10.1016/j.stemcr.2014.05.013`|might_also_be_expressed_in_non_malignant| + |`PODXL`|`ENSG00000128567`|non-malignant|podocyte|`10.1016/j.stem.2019.06.009`|NA| + |`COL6A3`|`ENSG00000163359`|malignant|mesenchymal|`10.2147/OTT.S256654`|might_also_be_expressed_in_non_malignant_stroma| + |`THY1`|`ENSG00000154096`|malignant|mesenchymal|`10.1093/hmg/ddq042`|might_also_be_expressed_in_non_malignant_stroma| + + +### The table `GeneticAlterations_metadata.csv` contains the following column and information: - alteration contains the number and portion of the affected chromosome - gain_loss contains the information regarding the gain or loss of the corresponding genetic alteration @@ -125,11 +125,11 @@ We gather from the litterature marker genes and specific genomic alterations tha |alteration|gain_loss|cell_class|cell_type|DOI|PMID|comment |---|---|---|---|---|---|---| -|11p13|loss|malignant|NA|10.1242/dev.153163|NA|NA| -|11p15|loss|malignant|NA|10.1128/mcb.9.4.1799-1803.1989|NA|NA| +|11p13|loss|malignant|NA|`10.1242/dev.153163`|NA|NA| +|11p15|loss|malignant|NA|`10.1128/mcb.9.4.1799-1803.1989`|NA|NA| |16q|loss|malignant|NA|NA|1317258|Associated_with_relapse| |1p|loss|malignant|NA|NA|8162576|Associated_with_relapse| -|1q|gain|malignant|NA|10.1016/S0002-9440(10)63982-X|NA|Associated_with_relapse| +|1q|gain|malignant|NA|`10.1016/S0002-9440(10)63982-X`|NA|Associated_with_relapse| ## workflow description @@ -152,22 +152,23 @@ The `00_run_workflow.sh` contains the following steps: - Exploration of clustering, label transfers, marker genes and pathways: `03_clustering_exploration.Rmd` in `notebook_template` - - CNV inference using [`infercnv`](https://github.com/broadinstitute/inferCNV/wiki) with endothelial and immune cells as reference from either the same patient or a pool of upfront resection Wilms tumor samples: 
`06_infercnv.R` in `script` + - CNV inference using [`inferCNV`](https://github.com/broadinstitute/inferCNV/wiki) with endothelial and immune cells as reference from either the same patient or a pool of upfront resection Wilms tumor samples: `06_inferCNV.R` in `script` -While we only selected the `infercnv` method with endothelium and immune cells as normal reference for the main workflow across samples, our analysis includes an exploration of cnv inference methods based on `copykat` and `infercnv` on a subselection of samples: -the `script` `explore-cnv-methods.R` calls the independent scripts `05_copyKAT.R` and `06_infercnv.R` for the samples - - "SCPCS000179", - - "SCPCS000184", - - "SCPCS000194", - - "SCPCS000205", - - "SCPCS000208". +While we only selected the `inferCNV` method with endothelium and immune cells as normal reference for the main workflow across samples, our analysis includes an exploration of CNV inference methods based on `copyKAT` and `inferCNV` on a subset of samples. +The script `explore-cnv-methods.R` calls the independent scripts `05_copyKAT.R` and `06_inferCNV.R` for these samples: + + - `SCPCS000179` + - `SCPCS000184` + - `SCPCS000194` + - `SCPCS000205` + - `SCPCS000208` In addition, we explored the results for all samples in one notebook twice during the analysis: - the notebook `04_annotation_Across_Samples_exploration.Rmd` explored the annotations obtained by label transfer in all samples -- the notebook `07_annotation_Across_Samples_exploration.Rmd` explored the potential of combining label transfer and cnv to finalize the annotation of the Wilms tumor dataset. +- the notebook `07_annotation_Across_Samples_exploration.Rmd` explored the potential of combining label transfer and CNV to finalize the annotation of the Wilms tumor dataset. For each sample and each of the step, an html report is generated and accessible in the directory `notebook`. 
@@ -200,20 +201,20 @@ Here we will use an `Azimuth`-adapted approach to transfer labels from the refer ### Input and outputs We start with the `_process.Rds` data to run `01_seurat-processing.Rmd`. -The output of `01_seurat-processing.Rmd` is saved in `results` in a subfolder for each sample and is the input of the second step `02a_label-transfer_fetal_full_reference_Cao.Rmd`. +The output of `01_seurat-processing.Rmd` is saved in `results` in a folder for each sample and is the input of the second step `02a_label-transfer_fetal_full_reference_Cao.Rmd`. The output of `02a_label-transfer_fetal_full_reference_Cao.Rmd` is then the input of `02b_label-transfer_fetal_kidney_reference_Stewart.Rmd`. -Following the same approach, the output of `02b_label-transfer_fetal_kidney_reference_Stewart.Rmd` is the input of `03_clustering_exploration.Rmd` and `06_infercnv.R`. -The outputs of `06_infercnv.R` `06_infercnv_HMM-i3_{sample_id}_{reference-type}.rds` is finally the input of `07_combined_annotation_across_samples_exploration.Rmd`, which produces a TSV with annotations in `results/SCPCP000006-annotations.tsv `. +Following the same approach, the output of `02b_label-transfer_fetal_kidney_reference_Stewart.Rmd` is the input of `03_clustering_exploration.Rmd` and `06_inferCNV.R`. +The outputs of `06_inferCNV.R` `06_inferCNV_HMM-i3_{sample_id}_{reference-type}.rds` is finally the input of `07_combined_annotation_across_samples_exploration.Rmd`, which produces a TSV with annotations in `results/SCPCP000006-annotations.tsv `. All inputs/outputs generated and used in the main workflow are saved in the `results/{sample_id}` folder. -Results in subfolders such as `results/{sample_id}/05_copyKAT` or `results/{sample_id}/06_infercnv` have been obtained for a subselection of samples in the exploratory analysis, and are thus kept separated from the results of the main workflow. 
+Results in folders such as `results/{sample_id}/05_copyKAT` or `results/{sample_id}/06_inferCNV` have been obtained for a subset of samples in the exploratory analysis, and are thus kept separated from the results of the main workflow. -At the end of the workflow, we have a `Seurat`object that contains: +At the end of the workflow, we have a `Seurat` object that contains: - normalization and clustering, dimensional reductions - label transfer from the fetal full reference - label transfer from the fetal kidney reference -- cnv predictions using `infercnv` +- CNV predictions using `inferCNV` ## Software requirements diff --git a/analyses/cell-type-wilms-tumor-06/notebook/00-reference/00b_characterization_fetal_kidney_reference_Stewart.html b/analyses/cell-type-wilms-tumor-06/notebook/00-reference/00b_characterization_fetal_kidney_reference_Stewart.html index 5651a7b7e..15a769467 100644 --- a/analyses/cell-type-wilms-tumor-06/notebook/00-reference/00b_characterization_fetal_kidney_reference_Stewart.html +++ b/analyses/cell-type-wilms-tumor-06/notebook/00-reference/00b_characterization_fetal_kidney_reference_Stewart.html @@ -5487,7 +5487,7 @@

2024-08-07

Introduction

The aim is to characterize the human fetal kidney from the kidney cell atlas. You can find more about the human kidney atlas here: https://www.kidneycellatlas.org/ [1] The rds data can be
-download using the download link https://datasets.cellxgene.cziscience.com/40ebb8e4-1a25-4a33-b8ff-02d1156e4e9b.rds
+download using the download link: https://datasets.cellxgene.cziscience.com/40ebb8e4-1a25-4a33-b8ff-02d1156e4e9b.rds.
The reference was downloaded and created in the script scripts/prepare-fetal-references.R.

@@ -5512,14 +5512,14 @@

Base directories

Input files

The input file /Users/sjspielman/ALSF/open-scpca/OpenScPCA-analysis/analyses/cell-type-wilms-tumor-06/scratch/fetal_kidney.rds
-is the output of the R script
-prepare-fetal-references.R.
+is the output of the script prepare-fetal-references.R.

Output file

We will save the result of the differential expression analysis in
-results/references/00b_marker_genes_fetal_kidney_Stewart.csv Notebook is
-saved in the notebook/00-reference directory
+results/references/00b_marker_genes_fetal_kidney_Stewart.csv.
+Notebook is saved in the notebook/00-reference
+directory.

path_to_output <- file.path(module_base, "results", "references")
@@ -5541,11 +5541,12 @@

Characterization of compartment and cell types in the reference

characterized the different compartments and cell types.

This is just to get markers genes of the different population, in case some could be of interest for the Wilms tumor annotations.

-We run DElegate::FindAllMarkers2 to find markers of the different
-clusters and manually check if they do make sense.
-DElegate::FindAllMarkers2 is an improved version of
-Seurat::FindAllMarkers based on pseudobulk differential expression
-method. Please check the preprint from Chistoph Hafemeister: https://www.biorxiv.org/content/10.1101/2023.03.28.534443v1
+We run DElegate::FindAllMarkers2() to find markers of
+the different clusters and manually check if they do make sense.
+DElegate::FindAllMarkers2() is an improved version of
+Seurat::FindAllMarkers() based on pseudobulk differential
+expression method. Please check the preprint from Hafemeister and
+Halbritter: https://www.biorxiv.org/content/10.1101/2023.03.28.534443v2
and tool described here: https://github.com/cancerbits/DElegate

Find marker genes for each of the compartment

@@ -5645,7 +5646,7 @@

References

Session info

sessionInfo()
-## R version 4.4.1 (2024-06-14)
+## R version 4.4.0 (2024-04-24)
 ## Platform: aarch64-apple-darwin20
 ## Running under: macOS 15.1
 ## 
@@ -5685,12 +5686,12 @@ 

Session info

## [40] RSpectra_0.16-2 irlba_2.3.5.1 crosstalk_1.2.1
## [43] labeling_0.4.3 progressr_0.14.0 timechange_0.3.0
## [46] fansi_1.0.6 spatstat.sparse_3.1-0 httr_1.4.7
-## [49] polyclip_1.10-7 abind_1.4-5 compiler_4.4.1
+## [49] polyclip_1.10-7 abind_1.4-5 compiler_4.4.0
## [52] bit64_4.0.5 withr_3.0.1 viridis_0.6.5
## [55] fastDummies_1.7.4 highr_0.11 MASS_7.3-61
-## [58] tools_4.4.1 lmtest_0.9-40 httpuv_1.6.15
+## [58] tools_4.4.0 lmtest_0.9-40 httpuv_1.6.15
## [61] future.apply_1.11.2 goftest_1.2-3 glue_1.7.0
-## [64] nlme_3.1-166 promises_1.3.0 grid_4.4.1
+## [64] nlme_3.1-166 promises_1.3.0 grid_4.4.0
## [67] Rtsne_0.17 cluster_2.1.6 reshape2_1.4.4
## [70] generics_0.1.3 gtable_0.3.5 spatstat.data_3.1-2
## [73] tzdb_0.4.0 data.table_1.16.0 hms_1.1.3
@@ -5698,7 +5699,7 @@

Session info

## [79] ggrepel_0.9.5 RANN_2.6.2 pillar_1.9.0
## [82] vroom_1.6.5 limma_3.60.4 yulab.utils_0.1.7
## [85] spam_2.10-0 RcppHNSW_0.6.0 later_1.3.2
-## [88] splines_4.4.1 lattice_0.22-6 bit_4.0.5
+## [88] splines_4.4.0 lattice_0.22-6 bit_4.0.5
## [91] renv_1.0.7 survival_3.7-0 deldir_2.0-4
## [94] tidyselect_1.2.1 locfit_1.5-9.10 miniUI_0.1.1.1
## [97] pbapply_1.7-2 knitr_1.48 gridExtra_2.3
@@ -5710,7 +5711,7 @@

Session info

## [115] xtable_1.8-4 reticulate_1.38.0 munsell_0.5.1
## [118] jquerylib_0.1.4 Rcpp_1.0.13 globals_0.16.3
## [121] spatstat.random_3.3-1 png_0.1-8 spatstat.univar_3.0-0
-## [124] parallel_4.4.1 assertthat_0.2.1 dotCall64_1.1-1
+## [124] parallel_4.4.0 assertthat_0.2.1 dotCall64_1.1-1
## [127] sparseMatrixStats_1.16.0 listenv_0.9.1 viridisLite_0.4.2
## [130] scales_1.3.0 ggridges_0.5.6 crayon_1.5.3
## [133] leiden_0.4.3.1 rlang_1.1.4 cowplot_1.1.3
diff --git a/analyses/cell-type-wilms-tumor-06/notebook/04_annotation_Across_Samples_exploration.Rmd b/analyses/cell-type-wilms-tumor-06/notebook/04_annotation_Across_Samples_exploration.Rmd index 0a2c5a58d..e4667177a 100644 --- a/analyses/cell-type-wilms-tumor-06/notebook/04_annotation_Across_Samples_exploration.Rmd +++ b/analyses/cell-type-wilms-tumor-06/notebook/04_annotation_Across_Samples_exploration.Rmd @@ -155,7 +155,7 @@ The report will be saved in the `notebook` directory. -#### do_Feature_mean +#### `do_Feature_mean` `do_Feature_mean` shows a heatmap of mean expression of a feature grouped by a metadata. @@ -183,7 +183,7 @@ do_Feature_mean <- function(df, group.by, feature) { ``` -#### do_Feature_boxplot +#### `do_Feature_boxplot` `do_Feature_boxplot` shows boxplot of expression of a feature grouped by a metadata. @@ -214,7 +214,7 @@ do_Feature_boxplot <- function(df, group.by, feature, split.by) { } ``` -#### do_Feature_densityplot +#### `do_Feature_densityplot` `do_Feature_densityplot` shows boxplot of expression of a feature grouped by a metadata. @@ -387,7 +387,7 @@ DT::datatable(compartment_df, ### Label transfer predicted.score for the four compartments -The vertical line drawn correspods to the threshold explored in the notebook. +The vertical line drawn corresponds to the threshold explored in the notebook. ```{r fig.height=5, fig.width=10, message=FALSE, warning=FALSE, out.width='100%'} p <- do_Feature_densityplot( @@ -573,9 +573,9 @@ do_Feature_boxplot( ggtitle("boxplot of PTPRC expression for all cells") ``` -### Umap reduction +### UMAP reduction -We look at the umap reduction per sample. +We look at the UMAP reduction per sample. 
#### All cells diff --git a/analyses/cell-type-wilms-tumor-06/notebook/04_annotation_Across_Samples_exploration_predicted.score_threshold_0.5.html b/analyses/cell-type-wilms-tumor-06/notebook/04_annotation_Across_Samples_exploration_predicted.score_threshold_0.5.html index 72fa0448c..f24a09b8e 100644 --- a/analyses/cell-type-wilms-tumor-06/notebook/04_annotation_Across_Samples_exploration_predicted.score_threshold_0.5.html +++ b/analyses/cell-type-wilms-tumor-06/notebook/04_annotation_Across_Samples_exploration_predicted.score_threshold_0.5.html @@ -11,7 +11,7 @@ - + Annotation exploration for SCPCP000006, predicted.score threshold 0.5 @@ -5478,7 +5478,7 @@

Annotation exploration for SCPCP000006, predicted.score threshold 0.5

Maud PLASCHKA

-2024-11-14
+2024-11-15

@@ -5604,7 +5604,7 @@

Output file

Functions

-do_Feature_mean
+do_Feature_mean

do_Feature_mean shows a heatmap of mean expression of a feature grouped by a metadata.

    @@ -5633,7 +5633,7 @@

    do_Feature_mean

    }
-do_Feature_boxplot
+do_Feature_boxplot

do_Feature_boxplot shows boxplot of expression of a feature grouped by a metadata.

-do_Feature_densityplot
+do_Feature_densityplot

do_Feature_densityplot shows boxplot of expression of a feature grouped by a metadata.

-
- +
+

Predicted compartment

@@ -5794,8 +5794,8 @@

Predicted compartment

buttons = c("csv", "excel") ) )
-
- +
+

What is the predicted organ of cells that are not labeled as kidney cell? Please note that this table is not sample-specific but contains all samples pooled into one.

@@ -5816,8 +5816,8 @@

Predicted compartment

buttons = c("csv", "excel") ) ) -
- +
+

We also checked the number of cell in each compartment per sample, to assess the presence/absence of non-cancer cells (endothelia and immune) that could help the inference of copy number alterations.

@@ -5838,12 +5838,12 @@

Predicted compartment

buttons = c("csv", "excel") ) ) -
- +
+

Label transfer predicted.score for the four compartments

-The vertical line drawn correspods to the threshold explored in the
+The vertical line drawn corresponds to the threshold explored in the
notebook.

p <- do_Feature_densityplot(
   df = cell_type_df,
@@ -5921,8 +5921,8 @@ 
Disease timing
buttons = c("csv", "excel") ) )
-
- +
+
@@ -6001,8 +6001,8 @@
Immune cells
-Umap reduction
-We look at the umap reduction per sample.
+UMAP reduction
+We look at the UMAP reduction per sample.

All cells

Point are colored per compartment:

diff --git a/analyses/cell-type-wilms-tumor-06/notebook/04_annotation_Across_Samples_exploration_predicted.score_threshold_0.75.html b/analyses/cell-type-wilms-tumor-06/notebook/04_annotation_Across_Samples_exploration_predicted.score_threshold_0.75.html index 2bf19e864..ee87a7a32 100644 --- a/analyses/cell-type-wilms-tumor-06/notebook/04_annotation_Across_Samples_exploration_predicted.score_threshold_0.75.html +++ b/analyses/cell-type-wilms-tumor-06/notebook/04_annotation_Across_Samples_exploration_predicted.score_threshold_0.75.html @@ -11,7 +11,7 @@ - + Annotation exploration for SCPCP000006, predicted.score threshold 0.75 @@ -5478,7 +5478,7 @@

Annotation exploration for SCPCP000006, predicted.score threshold 0.75

Maud PLASCHKA

-2024-11-14
+2024-11-15

@@ -5604,7 +5604,7 @@

Output file

Functions

-do_Feature_mean
+do_Feature_mean

do_Feature_mean shows a heatmap of mean expression of a feature grouped by a metadata.

    @@ -5633,7 +5633,7 @@

    do_Feature_mean

    }
-do_Feature_boxplot
+do_Feature_boxplot

do_Feature_boxplot shows boxplot of expression of a feature grouped by a metadata.

-do_Feature_densityplot
+do_Feature_densityplot

do_Feature_densityplot shows boxplot of expression of a feature grouped by a metadata.

-
- +
+

Predicted compartment

@@ -5794,8 +5794,8 @@

Predicted compartment

buttons = c("csv", "excel") ) )
-
- +
+

What is the predicted organ of cells that are not labeled as kidney cell? Please note that this table is not sample-specific but contains all samples pooled into one.

@@ -5816,8 +5816,8 @@

Predicted compartment

buttons = c("csv", "excel") ) ) -
- +
+

We also checked the number of cell in each compartment per sample, to assess the presence/absence of non-cancer cells (endothelia and immune) that could help the inference of copy number alterations.

@@ -5838,12 +5838,12 @@

Predicted compartment

buttons = c("csv", "excel") ) ) -
- +
+

Label transfer predicted.score for the four compartments

-The vertical line drawn correspods to the threshold explored in the
+The vertical line drawn corresponds to the threshold explored in the
notebook.

p <- do_Feature_densityplot(
   df = cell_type_df,
@@ -5921,8 +5921,8 @@ 
Disease timing
buttons = c("csv", "excel") ) )
-
- +
+
@@ -6001,8 +6001,8 @@
Immune cells
-Umap reduction
-We look at the umap reduction per sample.
+UMAP reduction
+We look at the UMAP reduction per sample.

All cells

Point are colored per compartment:

diff --git a/analyses/cell-type-wilms-tumor-06/notebook/04_annotation_Across_Samples_exploration_predicted.score_threshold_0.85.html b/analyses/cell-type-wilms-tumor-06/notebook/04_annotation_Across_Samples_exploration_predicted.score_threshold_0.85.html index e961e14d0..8cadb57aa 100644 --- a/analyses/cell-type-wilms-tumor-06/notebook/04_annotation_Across_Samples_exploration_predicted.score_threshold_0.85.html +++ b/analyses/cell-type-wilms-tumor-06/notebook/04_annotation_Across_Samples_exploration_predicted.score_threshold_0.85.html @@ -11,7 +11,7 @@ - + Annotation exploration for SCPCP000006, predicted.score threshold 0.85 @@ -5478,7 +5478,7 @@

Annotation exploration for SCPCP000006, predicted.score threshold 0.85

Maud PLASCHKA

-2024-11-14
+2024-11-15

@@ -5604,7 +5604,7 @@

Output file

Functions

-do_Feature_mean
+do_Feature_mean

do_Feature_mean shows a heatmap of mean expression of a feature grouped by a metadata.

    @@ -5633,7 +5633,7 @@

    do_Feature_mean

    }
-do_Feature_boxplot
+do_Feature_boxplot

do_Feature_boxplot shows boxplot of expression of a feature grouped by a metadata.

-do_Feature_densityplot
+do_Feature_densityplot

do_Feature_densityplot shows boxplot of expression of a feature grouped by a metadata.

-
- +
+

Predicted compartment

@@ -5794,8 +5794,8 @@

Predicted compartment

buttons = c("csv", "excel") ) )
-
- +
+

What is the predicted organ of cells that are not labeled as kidney cell? Please note that this table is not sample-specific but contains all samples pooled into one.

@@ -5816,8 +5816,8 @@

Predicted compartment

buttons = c("csv", "excel") ) ) -
- +
+

We also checked the number of cell in each compartment per sample, to assess the presence/absence of non-cancer cells (endothelia and immune) that could help the inference of copy number alterations.

@@ -5838,12 +5838,12 @@

Predicted compartment

buttons = c("csv", "excel") ) ) -
- +
+

Label transfer predicted.score for the four compartments

-The vertical line drawn correspods to the threshold explored in the
+The vertical line drawn corresponds to the threshold explored in the
notebook.

p <- do_Feature_densityplot(
   df = cell_type_df,
@@ -5921,8 +5921,8 @@ 
Disease timing
buttons = c("csv", "excel") ) )
-
- +
+
@@ -6001,8 +6001,8 @@
Immune cells
-Umap reduction
-We look at the umap reduction per sample.
+UMAP reduction
+We look at the UMAP reduction per sample.

All cells

Point are colored per compartment:

diff --git a/analyses/cell-type-wilms-tumor-06/notebook/04_annotation_Across_Samples_exploration_predicted.score_threshold_0.95.html b/analyses/cell-type-wilms-tumor-06/notebook/04_annotation_Across_Samples_exploration_predicted.score_threshold_0.95.html index 6b4cfefc1..54078d575 100644 --- a/analyses/cell-type-wilms-tumor-06/notebook/04_annotation_Across_Samples_exploration_predicted.score_threshold_0.95.html +++ b/analyses/cell-type-wilms-tumor-06/notebook/04_annotation_Across_Samples_exploration_predicted.score_threshold_0.95.html @@ -11,7 +11,7 @@ - + Annotation exploration for SCPCP000006, predicted.score threshold 0.95 @@ -5478,7 +5478,7 @@

Annotation exploration for SCPCP000006, predicted.score threshold 0.95

Maud PLASCHKA

-2024-11-14
+2024-11-15

@@ -5604,7 +5604,7 @@

Output file

Functions

-do_Feature_mean
+do_Feature_mean

do_Feature_mean shows a heatmap of mean expression of a feature grouped by a metadata.

    @@ -5633,7 +5633,7 @@

    do_Feature_mean

    }
-do_Feature_boxplot
+do_Feature_boxplot

do_Feature_boxplot shows boxplot of expression of a feature grouped by a metadata.

-do_Feature_densityplot
+do_Feature_densityplot

do_Feature_densityplot shows boxplot of expression of a feature grouped by a metadata.

-
- +
+

Predicted compartment

@@ -5794,8 +5794,8 @@

Predicted compartment

buttons = c("csv", "excel") ) )
-
- +
+

What is the predicted organ of cells that are not labeled as kidney cell? Please note that this table is not sample-specific but contains all samples pooled into one.

@@ -5816,8 +5816,8 @@

Predicted compartment

buttons = c("csv", "excel") ) ) -
- +
+

We also checked the number of cell in each compartment per sample, to assess the presence/absence of non-cancer cells (endothelia and immune) that could help the inference of copy number alterations.

@@ -5838,12 +5838,12 @@

Predicted compartment

buttons = c("csv", "excel") ) ) -
- +
+

Label transfer predicted.score for the four compartments

-The vertical line drawn correspods to the threshold explored in the
+The vertical line drawn corresponds to the threshold explored in the
notebook.

p <- do_Feature_densityplot(
   df = cell_type_df,
@@ -5921,8 +5921,8 @@ 
Disease timing
buttons = c("csv", "excel") ) )
-
- +
+
@@ -6001,8 +6001,8 @@
Immune cells
-Umap reduction
-We look at the umap reduction per sample.
+UMAP reduction
+We look at the UMAP reduction per sample.

All cells

Point are colored per compartment:

diff --git a/analyses/cell-type-wilms-tumor-06/notebook/07_combined_annotation_across_samples_exploration.Rmd b/analyses/cell-type-wilms-tumor-06/notebook/07_combined_annotation_across_samples_exploration.Rmd
index 61e02005f..8fe94884f 100644
--- a/analyses/cell-type-wilms-tumor-06/notebook/07_combined_annotation_across_samples_exploration.Rmd
+++ b/analyses/cell-type-wilms-tumor-06/notebook/07_combined_annotation_across_samples_exploration.Rmd
@@ -45,18 +45,16 @@ The analysis can be summarized as the following:
_Where `cnv.thr` and `pred.thr` need to be discussed_
-
-first level annotation | second level annotation | selection of the cells | marker genes for validation | cnv validation
--- | -- | -- | -- | --
-normal | endothelial | compartment == "endothelium" & predicted.score > pred.thr & cnv_score < cnv.thr | WVF | no cnv
-normal | immune | compartment == "immune" & predicted.score > pred.thr & cnv_score < cnv.thr | PTPRC, CD163, CD68 | no cnv
-normal | kidney | cell_type %in% c("kidney cell", "kidney epithelial", "podocyte") & predicted.score > pred.thr & cnv_score < cnv.thr | CDH1, PODXL, LTL | no cnv
-normal | stroma | compartment == "stroma" & predicted.score > pred.thr & cnv_score < cnv.thr | VIM | no cnv
-cancer | stroma | compartment == "stroma" & cnv_score > cnv.thr | VIM | proportion_cnv_chr -1 -4 -11 -16 -17 -18
-cancer | blastema | compartment == "fetal_nephron" & cell_type == "mesenchymal cell" & cnv_score > cnv.thr | CITED1 | proportion_cnv_chr -1 -4 -11 -16 -17 -18
-cancer | epithelial | compartment == "fetal_nephron" & cell_type != "mesenchymal cell" & cnv_score > cnv.thr | CDH1 | proportion_cnv_chr -1 -4 -11 -16 -17 -18
-unknown | - | the rest of the cells | - | proportion_cnv_chr -1 -4 -11 -16 -17 -18
-
+| first level annotation | second level annotation | selection of the cells | marker genes for validation | CNV validation |
+| ---------------------- | ----------------------- | ---------------------- | --------------------------- | --------------- |
+| normal | endothelial | `compartment == "endothelium" & predicted.score > pred.thr & cnv_score < cnv.thr` | `VWF`| no CNV |
+| normal | immune | `compartment == "immune" & predicted.score > pred.thr & cnv_score < cnv.thr` | `PTPRC`, `CD163`, `CD68`| no CNV |
+| normal | kidney | `cell_type %in% c("kidney cell","kidney epithelial", "podocyte") & predicted.score > pred.thr & cnv_score < cnv.thr` | `CDH1`, `PODXL`, `LTL`| no CNV |
+| normal | stroma | `compartment == "stroma" & predicted.score > pred.thr & cnv_score < cnv.thr`| `VIM`| no CNV |
+| cancer | stroma | `compartment == "stroma" & cnv_score > cnv.thr` | `VIM`| `proportion_cnv_chr: 1, 4, 11, 16, 17, 18` |
+| cancer | blastema | `compartment == "fetal_nephron" & cell_type == "mesenchymal cell" & cnv_score > cnv.thr` | `CITED1`| `proportion_cnv_chr: 1, 4, 11, 16, 17, 18` |
+| cancer | epithelial | `compartment == "fetal_nephron" & cell_type != "mesenchymal cell" & cnv_score > cnv.thr` | `CDH1`| `proportion_cnv_chr: 1, 4, 11, 16, 17, 18` |
+| unknown | - | the rest of the cells | - | -|
### Packages
@@ -284,7 +282,7 @@ do_Feature_mean <- function(df, group.by, feature) {
## Analysis
-### Global cnv score
+### Global CNV score
As done in `06_cnv_infercnv_exploration.Rmd`, we calculate single CNV score and assess its potential in identifying cells with CNV versus normal cells without CNV.
@@ -311,14 +309,14 @@ table(cell_type_df$has_cnv_score)
At first, we like to indicate in the `first.level_annotation` if a cell is normal, cancer or unknown.
-- _normal_ cells can be observe in all four compartments (`endothelium`, `immune`, `stroma` or `fetal nephron`) and do not have cnv.
-We only allow a bit of flexibility in terms of cnv profile for immune and endothelium cells that have a high predicted score.
-Indeed, we know that false positive cnv can be observed in a cell type specific manner.
+- _normal_ cells can be observe in all four compartments (`endothelium`, `immune`, `stroma` or `fetal nephron`) and do not have CNV
+We only allow a bit of flexibility in terms of CNV profile for immune and endothelium cells that have a high predicted score.
+Indeed, we know that false positive CNV can be observed in a cell type specific manner.
The threshold used for the `predicted.score` is defined as a parameter of this notebook as `r params$predicted.celltype.threshold`.
-The threshold used for the identification of cnv is also defined in the params of the notebook as `r params$cnv_threshold`.
+The threshold used for the identification of CNV is also defined as the notebook parameter `r params$cnv_threshold`.
-- _cancer_ cells are either from the `stroma` or `fetal nephron` compartments and must have at least few cnv.
+- _cancer_ cells are either from the `stroma` or `fetal nephron` compartments and must have at least few CNV
@@ -366,7 +364,7 @@ Wilms tumor cancer cells can be:
- _cancer stroma_: We define as _cancer stroma_ all cancer cells from the stroma compartment.
-- _blastema_,: we defined as _bastema_ every cancer cell that has a `fetal_kidney_predicted.cell_type == mesenchymal cell`.
+- _blastema_,: we defined as _blastema_ every cancer cell that has a `fetal_kidney_predicted.cell_type == mesenchymal cell`.
We know that these _mesenchymal_ cells are cells from the cap mesenchyme that are not expected to be in a mature kidney.
These blastema cells should express higher _CITED1_.
@@ -434,7 +432,7 @@ ggplot(cell_type_df, aes(x = umap.umap_1, y = umap.umap_2, color = second.level_ theme(text = element_text(size = 22)) ``` -### Validation cancer versus normal based on the cnv profile +### Validation cancer versus normal based on the CNV profile ```{r fig.width=20, fig.height=5, out.width='100%', results='asis'} for (i in 1:22) { @@ -444,64 +442,64 @@ for (i in 1:22) { ### Validation of second level annotation using marker genes -#### Immune, _PTPRC_ expression +#### Immune, `PTPRC` expression ```{r fig.width=20, fig.height=5, out.width='100%', results='asis'} do_Feature_mean(cell_type_df, group.by = "second.level_annotation", feature = "ENSG00000081237") ``` -#### Endothelium, _VWF_ expression +#### Endothelium, `VWF` expression ```{r fig.width=20, fig.height=5, out.width='100%', results='asis'} do_Feature_mean(cell_type_df, group.by = "second.level_annotation", feature = "ENSG00000110799") ``` -#### Stroma, _Vimentin_ expression +#### Stroma, `Vimentin` expression ```{r fig.width=20, fig.height=5, out.width='100%', results='asis'} do_Feature_mean(cell_type_df, group.by = "second.level_annotation", feature = "ENSG00000026025") ``` -#### Stroma, _COL6A3_ expression +#### Stroma, `COL6A3` expression ```{r fig.width=20, fig.height=5, out.width='100%', results='asis'} do_Feature_mean(cell_type_df, group.by = "second.level_annotation", feature = "ENSG00000163359") ``` -#### Stroma, _THY1_ expression +#### Stroma, `THY1` expression ```{r fig.width=20, fig.height=5, out.width='100%', results='asis'} do_Feature_mean(cell_type_df, group.by = "second.level_annotation", feature = "ENSG00000154096") ``` -#### Blastema, _CITED1_ expression +#### Blastema, `CITED1` expression ```{r fig.width=20, fig.height=5, out.width='100%', results='asis'} do_Feature_mean(cell_type_df, group.by = "second.level_annotation", feature = "ENSG00000125931") ``` -#### Blastema, _NCAM1_ expression +#### Blastema, `NCAM1` expression ```{r fig.width=20, fig.height=5, 
out.width='100%', results='asis'} do_Feature_mean(cell_type_df, group.by = "second.level_annotation", feature = "ENSG00000149294") ``` -#### stemness marker (blastema and primitive epithelium), _SIX2_ expression +#### stemness marker (blastema and primitive epithelium), `SIX2` expression ```{r fig.width=20, fig.height=5, out.width='100%', results='asis'} do_Feature_mean(cell_type_df, group.by = "second.level_annotation", feature = "ENSG00000170577") ``` -#### Epithelium, _CDH1_ expression +#### Epithelium, `CDH1` expression ```{r fig.width=20, fig.height=5, out.width='100%', results='asis'} do_Feature_mean(cell_type_df, group.by = "second.level_annotation", feature = "ENSG00000039068") ``` -#### Epithelium, _PODXL_ expression +#### Epithelium, `PODXL` expression ```{r fig.width=20, fig.height=5, out.width='100%', results='asis'} do_Feature_mean(cell_type_df, group.by = "second.level_annotation", feature = "ENSG00000128567") @@ -545,20 +543,20 @@ length(unique(annotations_table$scpca_sample_id)) - Combining label transfer and CNV inference we have produced draft annotations for all 40 Wilms tumor samples in SCPCP000006 -- The heatmaps of cnv proportion and marker genes support our annotations, but signals with some marker genes are very low. +- The heatmaps of CNV proportion and marker genes support our annotations, but signals with some marker genes are very low. Also, there is no universal marker for each entity of Wilms tumor that cover all tumor cells from all patient. This makes the validation of the annotations quite difficult. - However, we could try to take the problem from the other side, and used the current annotation to perform differential expression analysis and try to find marker genes that are consistent across patient and Wilms tumor histologies. - In each histology (i.e. epithelial and stroma), the distinction between cancer and non cancer cell is difficult (as expected). 
-In this analysis, we suggested to rely on the cnv score to assess the normality of the cell. +In this analysis, we suggested to rely on the CNV score to assess the normality of the cell. Here again, we could try to run differential expression analysis and compare epithelial (resp. stroma) cancer versus non-cancer cells across patient, aiming to find a share transcriptional program allowing the classification cancer versus normal. - In our annotation, we haven't taken into account the favorable/anaplastic status of the sample. -However, as anaplasia can occur in every (but do not has to) wilms tumor histology, I am not sure how to integrate the information into the annotation. +However, as anaplasia can occur in every (but do not has to) Wilms tumor histology, I am not sure how to integrate the information into the annotation. -- This notebook could be finally rendered using different parameters, i.e. threshold for the cnv score and predicted score to use. +- This notebook could be finally rendered using different parameters, i.e. threshold for the CNV score and predicted score to use. ## Session Info diff --git a/analyses/cell-type-wilms-tumor-06/notebook/07_combined_annotation_across_samples_exploration.html b/analyses/cell-type-wilms-tumor-06/notebook/07_combined_annotation_across_samples_exploration.html index 9a921ddab..80fec741e 100644 --- a/analyses/cell-type-wilms-tumor-06/notebook/07_combined_annotation_across_samples_exploration.html +++ b/analyses/cell-type-wilms-tumor-06/notebook/07_combined_annotation_across_samples_exploration.html @@ -11,7 +11,7 @@ - + Combined annotation exploration for SCPCP000006 @@ -2941,7 +2941,7 @@

Combined annotation exploration for SCPCP000006

Maud PLASCHKA

-

2024-11-14

+

2024-11-15

@@ -2968,13 +2968,13 @@

Introduction

The analysis can be summarized as follows:

Where cnv.thr and pred.thr need to be discussed

- +
+---++ @@ -2982,71 +2982,65 @@

Introduction

| first level annotation | second level annotation | selection of the cells | marker genes for validation | CNV validation |
| ---------------------- | ----------------------- | ---------------------- | --------------------------- | -------------- |
| normal | endothelial | `compartment == "endothelium" & predicted.score > pred.thr & cnv_score < cnv.thr` | `VWF` | no CNV |
| normal | immune | `compartment == "immune" & predicted.score > pred.thr & cnv_score < cnv.thr` | `PTPRC`, `CD163`, `CD68` | no CNV |
| normal | kidney | `cell_type %in% c("kidney cell","kidney epithelial", "podocyte") & predicted.score > pred.thr & cnv_score < cnv.thr` | `CDH1`, `PODXL`, `LTL` | no CNV |
| normal | stroma | `compartment == "stroma" & predicted.score > pred.thr & cnv_score < cnv.thr` | `VIM` | no CNV |
| cancer | stroma | `compartment == "stroma" & cnv_score > cnv.thr` | `VIM` | `proportion_cnv_chr: 1, 4, 11, 16, 17, 18` |
| cancer | blastema | `compartment == "fetal_nephron" & cell_type == "mesenchymal cell" & cnv_score > cnv.thr` | `CITED1` | `proportion_cnv_chr: 1, 4, 11, 16, 17, 18` |
| cancer | epithelial | `compartment == "fetal_nephron" & cell_type != "mesenchymal cell" & cnv_score > cnv.thr` | `CDH1` | `proportion_cnv_chr: 1, 4, 11, 16, 17, 18` |
| unknown | - | the rest of the cells | - | - |
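The selection rules in the table above can be sketched in base R as follows. This is only an illustration on a toy data frame, not the notebook's implementation: the helper `annotate_cells()`, the toy values, and the threshold `cnv.thr = 1` are assumptions (the table itself notes that `cnv.thr` and `pred.thr` still need to be discussed), and the column names simply follow the table.

```r
# Hypothetical helper applying the rules of the table to a data frame with the
# columns compartment, cell_type, predicted.score and cnv_score.
# Assumes there are no missing values in these columns.
annotate_cells <- function(df, pred.thr, cnv.thr) {
  normal_kidney <- c("kidney cell", "kidney epithelial", "podocyte")

  # Normal cells need a confident label transfer and a low CNV score;
  # cancer cells only need a CNV score above the threshold.
  is_normal <- df$predicted.score > pred.thr & df$cnv_score < cnv.thr
  is_cancer <- df$cnv_score > cnv.thr

  first <- rep("unknown", nrow(df))
  second <- rep("-", nrow(df))

  # Normal populations
  idx <- is_normal & df$compartment == "endothelium"
  first[idx] <- "normal"; second[idx] <- "endothelial"
  idx <- is_normal & df$compartment == "immune"
  first[idx] <- "normal"; second[idx] <- "immune"
  idx <- is_normal & df$cell_type %in% normal_kidney
  first[idx] <- "normal"; second[idx] <- "kidney"
  idx <- is_normal & df$compartment == "stroma"
  first[idx] <- "normal"; second[idx] <- "stroma"

  # Cancer populations
  idx <- is_cancer & df$compartment == "stroma"
  first[idx] <- "cancer"; second[idx] <- "stroma"
  idx <- is_cancer & df$compartment == "fetal_nephron" & df$cell_type == "mesenchymal cell"
  first[idx] <- "cancer"; second[idx] <- "blastema"
  idx <- is_cancer & df$compartment == "fetal_nephron" & df$cell_type != "mesenchymal cell"
  first[idx] <- "cancer"; second[idx] <- "epithelial"

  df$first.level_annotation <- first
  df$second.level_annotation <- second
  df
}

# Toy input, one row per rule of the table (all values are made up)
toy <- data.frame(
  compartment = c("endothelium", "immune", "fetal_nephron", "stroma",
                  "stroma", "fetal_nephron", "fetal_nephron", "immune"),
  cell_type = c("endothelial cell", "macrophage", "podocyte", "stromal cell",
                "stromal cell", "mesenchymal cell", "kidney epithelial", "macrophage"),
  predicted.score = c(0.95, 0.90, 0.92, 0.90, 0.40, 0.60, 0.70, 0.50),
  cnv_score = c(0, 0, 0, 0, 3, 4, 2, 0)
)

annotated <- annotate_cells(toy, pred.thr = 0.85, cnv.thr = 1)
print(annotated[, c("first.level_annotation", "second.level_annotation")])
```

Cells that satisfy none of the rules, such as the last toy row (low `predicted.score` and no CNV), stay `unknown`.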
@@ -3109,120 +3103,118 @@

Input files

# These samples were run with "none" as the reference none_reference_samples <- c("SCPCS000177", "SCPCS000180", "SCPCS000181", "SCPCS000190", "SCPCS000197") - + # Create a data frames of all annotations cell_type_df <- sample_ids |> purrr::map( # For each sample_id, do the following: \(sample_id) { - - if (sample_id %in% none_reference_samples) { - reference <- "none" - } else { - reference <- "both" - } - - input_file <- file.path( - result_dir, - sample_id, - glue::glue("06_infercnv_HMM-i3_{sample_id}_reference-{reference}.rds") - ) - - - # The file may not be present if this is being run in CI, which is ok. - # If we are not running in CI and the file doesn't exist, we should error out - # We should error out if the file does not exist and we are NOT testing - if (!file.exists(input_file)) { - if (params$testing) { - return(NULL) - } else { - stop("Input RDS file does not exist.") - } - } - - # Read in the Seurat object - srat <- readRDS(input_file) - - # Create and return a data frame from the Seurat object with relevant annotations - # this data frame will have four columns: barcode, sample_id, compartment, organ - data.frame( - # label transfer from the fetal kidney reference - cell_type = srat$fetal_kidney_predicted.cell_type, - compartment = srat$fetal_kidney_predicted.compartment, - - # predicted.scores from the label transfer from the fetal kidney reference - cell_type.score = srat$fetal_kidney_predicted.cell_type.score, - compartment.score = srat$fetal_kidney_predicted.compartment.score, - - # cell embedding - umap = srat@reductions$umap@cell.embeddings, - - # marker genes - PTPRC = FetchData(object = srat, vars = "ENSG00000081237", layer = "counts"), - VWF = FetchData(object = srat, vars = "ENSG00000110799", layer = "counts"), - VIM = FetchData(object = srat, vars = "ENSG00000026025", layer = "counts"), - CITED1 = FetchData(object = srat, vars = "ENSG00000125931", layer = "counts"), - CDH1 = FetchData(object = srat, vars = "ENSG00000039068", layer = 
"counts"), - PODXL = FetchData(object = srat, vars = "ENSG00000128567", layer = "counts"), - COL6A3 = FetchData(object = srat, vars = "ENSG00000163359", layer = "counts"), - SIX2 = FetchData(object = srat, vars = "ENSG00000170577", layer = "counts"), - NCAM1 = FetchData(object = srat, vars = "ENSG00000149294", layer = "counts"), - THY1 = FetchData(object = srat, vars = "ENSG00000154096", layer = "counts"), - - # proportion of cnv per chromosome - proportion_cnv_chr1 = srat$proportion_cnv_chr1, - proportion_cnv_chr2 = srat$proportion_cnv_chr2, - proportion_cnv_chr3 = srat$proportion_cnv_chr3, - proportion_cnv_chr4 = srat$proportion_cnv_chr4, - proportion_cnv_chr5 = srat$proportion_cnv_chr5, - proportion_cnv_chr6 = srat$proportion_cnv_chr6, - proportion_cnv_chr7 = srat$proportion_cnv_chr7, - proportion_cnv_chr8 = srat$proportion_cnv_chr8, - proportion_cnv_chr9 = srat$proportion_cnv_chr9, - proportion_cnv_chr10 = srat$proportion_cnv_chr10, - proportion_cnv_chr11 = srat$proportion_cnv_chr11, - proportion_cnv_chr12 = srat$proportion_cnv_chr12, - proportion_cnv_chr13 = srat$proportion_cnv_chr13, - proportion_cnv_chr14 = srat$proportion_cnv_chr14, - proportion_cnv_chr15 = srat$proportion_cnv_chr15, - proportion_cnv_chr16 = srat$proportion_cnv_chr16, - proportion_cnv_chr17 = srat$proportion_cnv_chr17, - proportion_cnv_chr18 = srat$proportion_cnv_chr18, - proportion_cnv_chr19 = srat$proportion_cnv_chr19, - proportion_cnv_chr20 = srat$proportion_cnv_chr20, - proportion_cnv_chr21 = srat$proportion_cnv_chr21, - proportion_cnv_chr22 = srat$proportion_cnv_chr22, - - # cnv global estimation per chromosome - has_cnv_chr1 = srat$has_cnv_chr1, - has_cnv_chr2 = srat$has_cnv_chr2, - has_cnv_chr3 = srat$has_cnv_chr3, - has_cnv_chr4 = srat$has_cnv_chr4, - has_cnv_chr5 = srat$has_cnv_chr5, - has_cnv_chr6 = srat$has_cnv_chr6, - has_cnv_chr7 = srat$has_cnv_chr7, - has_cnv_chr8 = srat$has_cnv_chr8, - has_cnv_chr9 = srat$has_cnv_chr9, - has_cnv_chr10 = srat$has_cnv_chr10, - has_cnv_chr11 = 
srat$has_cnv_chr11, - has_cnv_chr12 = srat$has_cnv_chr12, - has_cnv_chr13 = srat$has_cnv_chr13, - has_cnv_chr14 = srat$has_cnv_chr14, - has_cnv_chr15 = srat$has_cnv_chr15, - has_cnv_chr16 = srat$has_cnv_chr16, - has_cnv_chr17 = srat$has_cnv_chr17, - has_cnv_chr18 = srat$has_cnv_chr18, - has_cnv_chr19 = srat$has_cnv_chr19, - has_cnv_chr20 = srat$has_cnv_chr20, - has_cnv_chr21 = srat$has_cnv_chr21, - has_cnv_chr22 = srat$has_cnv_chr22 - ) |> - tibble::rownames_to_column("barcode") |> - dplyr::mutate(sample_id = sample_id) - } - ) |> - # now combine all dataframes to make one big one - dplyr::bind_rows()
+ if (sample_id %in% none_reference_samples) { + reference <- "none" + } else { + reference <- "both" + } + + input_file <- file.path( + result_dir, + sample_id, + glue::glue("06_infercnv_HMM-i3_{sample_id}_reference-{reference}.rds") + ) + + # The file may not be present if this is being run in CI, which is ok. + # If we are not running in CI and the file doesn't exist, we should error out + # We should error out if the file does not exist and we are NOT testing + if (!file.exists(input_file)) { + if (params$testing) { + return(NULL) + } else { + stop("Input RDS file does not exist.") + } + } + + # Read in the Seurat object + srat <- readRDS(input_file) + + # Create and return a data frame from the Seurat object with relevant annotations + # this data frame will have four columns: barcode, sample_id, compartment, organ + data.frame( + # label transfer from the fetal kidney reference + cell_type = srat$fetal_kidney_predicted.cell_type, + compartment = srat$fetal_kidney_predicted.compartment, + + # predicted.scores from the label transfer from the fetal kidney reference + cell_type.score = srat$fetal_kidney_predicted.cell_type.score, + compartment.score = srat$fetal_kidney_predicted.compartment.score, + + # cell embedding + umap = srat@reductions$umap@cell.embeddings, + + # marker genes + PTPRC = FetchData(object = srat, vars = "ENSG00000081237", layer = "counts"), + VWF = FetchData(object = srat, vars = "ENSG00000110799", layer = "counts"), + VIM = FetchData(object = srat, vars = "ENSG00000026025", layer = "counts"), + CITED1 = FetchData(object = srat, vars = "ENSG00000125931", layer = "counts"), + CDH1 = FetchData(object = srat, vars = "ENSG00000039068", layer = "counts"), + PODXL = FetchData(object = srat, vars = "ENSG00000128567", layer = "counts"), + COL6A3 = FetchData(object = srat, vars = "ENSG00000163359", layer = "counts"), + SIX2 = FetchData(object = srat, vars = "ENSG00000170577", layer = "counts"), + NCAM1 = FetchData(object = srat, vars = 
"ENSG00000149294", layer = "counts"), + THY1 = FetchData(object = srat, vars = "ENSG00000154096", layer = "counts"), + + # proportion of cnv per chromosome + proportion_cnv_chr1 = srat$proportion_cnv_chr1, + proportion_cnv_chr2 = srat$proportion_cnv_chr2, + proportion_cnv_chr3 = srat$proportion_cnv_chr3, + proportion_cnv_chr4 = srat$proportion_cnv_chr4, + proportion_cnv_chr5 = srat$proportion_cnv_chr5, + proportion_cnv_chr6 = srat$proportion_cnv_chr6, + proportion_cnv_chr7 = srat$proportion_cnv_chr7, + proportion_cnv_chr8 = srat$proportion_cnv_chr8, + proportion_cnv_chr9 = srat$proportion_cnv_chr9, + proportion_cnv_chr10 = srat$proportion_cnv_chr10, + proportion_cnv_chr11 = srat$proportion_cnv_chr11, + proportion_cnv_chr12 = srat$proportion_cnv_chr12, + proportion_cnv_chr13 = srat$proportion_cnv_chr13, + proportion_cnv_chr14 = srat$proportion_cnv_chr14, + proportion_cnv_chr15 = srat$proportion_cnv_chr15, + proportion_cnv_chr16 = srat$proportion_cnv_chr16, + proportion_cnv_chr17 = srat$proportion_cnv_chr17, + proportion_cnv_chr18 = srat$proportion_cnv_chr18, + proportion_cnv_chr19 = srat$proportion_cnv_chr19, + proportion_cnv_chr20 = srat$proportion_cnv_chr20, + proportion_cnv_chr21 = srat$proportion_cnv_chr21, + proportion_cnv_chr22 = srat$proportion_cnv_chr22, + + # cnv global estimation per chromosome + has_cnv_chr1 = srat$has_cnv_chr1, + has_cnv_chr2 = srat$has_cnv_chr2, + has_cnv_chr3 = srat$has_cnv_chr3, + has_cnv_chr4 = srat$has_cnv_chr4, + has_cnv_chr5 = srat$has_cnv_chr5, + has_cnv_chr6 = srat$has_cnv_chr6, + has_cnv_chr7 = srat$has_cnv_chr7, + has_cnv_chr8 = srat$has_cnv_chr8, + has_cnv_chr9 = srat$has_cnv_chr9, + has_cnv_chr10 = srat$has_cnv_chr10, + has_cnv_chr11 = srat$has_cnv_chr11, + has_cnv_chr12 = srat$has_cnv_chr12, + has_cnv_chr13 = srat$has_cnv_chr13, + has_cnv_chr14 = srat$has_cnv_chr14, + has_cnv_chr15 = srat$has_cnv_chr15, + has_cnv_chr16 = srat$has_cnv_chr16, + has_cnv_chr17 = srat$has_cnv_chr17, + has_cnv_chr18 = srat$has_cnv_chr18, + 
has_cnv_chr19 = srat$has_cnv_chr19, + has_cnv_chr20 = srat$has_cnv_chr20, + has_cnv_chr21 = srat$has_cnv_chr21, + has_cnv_chr22 = srat$has_cnv_chr22 + ) |> + tibble::rownames_to_column("barcode") |> + dplyr::mutate(sample_id = sample_id) + } + ) |> + # now combine all dataframes to make one big one + dplyr::bind_rows()
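The 44 per-chromosome columns listed by hand above could also be selected by name. The sketch below shows the idea on a toy stand-in for the Seurat metadata; `toy_meta` and its values are made up for illustration, and it assumes the columns are named `proportion_cnv_chr1` to `proportion_cnv_chr22` and `has_cnv_chr1` to `has_cnv_chr22`, as in the code above.

```r
# Build the per-chromosome column names instead of typing them out
chromosomes <- 1:22
cnv_cols <- c(
  paste0("proportion_cnv_chr", chromosomes),
  paste0("has_cnv_chr", chromosomes)
)

# Toy stand-in for the cell-level metadata of a Seurat object (values made up)
toy_meta <- as.data.frame(
  matrix(
    0,
    nrow = 2,
    ncol = length(cnv_cols),
    dimnames = list(c("cell_a", "cell_b"), cnv_cols)
  )
)
toy_meta$nCount_RNA <- c(1200, 950) # unrelated column that should be left out

# Keep only the CNV columns, in a fixed order
cnv_df <- toy_meta[, cnv_cols, drop = FALSE]
dim(cnv_df)
```

With a real object, the same selection would presumably be applied to the metadata slot rather than to `toy_meta`; whether that is preferable to the explicit listing is a matter of taste.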

Output file

@@ -3264,7 +3256,7 @@

do_Feature_mean

Analysis

-

Global cnv score

+

Global CNV score

As done in 06_cnv_infercnv_exploration.Rmd, we calculate a single CNV score and assess its potential to distinguish cells with CNV from normal cells without CNV.
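One plausible way to obtain such a score is to count, for each cell, the chromosomes flagged as carrying a CNV. The sketch below assumes that definition and uses toy data; the column name `has_cnv_score` appears in the notebook, but the exact computation, the toy values, and the threshold of 1 are assumptions made for illustration.

```r
# Toy stand-in for cell_type_df with a few per-chromosome CNV flags (made up)
toy_cnv <- data.frame(
  barcode = c("cell_a", "cell_b", "cell_c"),
  has_cnv_chr1 = c(0, 1, 1),
  has_cnv_chr4 = c(0, 0, 1),
  has_cnv_chr11 = c(0, 1, 1)
)

# Sum the per-chromosome flags into one score per cell
flag_cols <- grep("^has_cnv_chr[0-9]+$", colnames(toy_cnv), value = TRUE)
toy_cnv$has_cnv_score <- unname(rowSums(toy_cnv[, flag_cols, drop = FALSE]))

# Compare the score to a threshold (assumed value, for illustration only)
cnv_threshold <- 1
toy_cnv$above_threshold <- toy_cnv$has_cnv_score > cnv_threshold

table(toy_cnv$has_cnv_score)
```

A count of flagged chromosomes is easy to interpret, but it weights a small focal event the same as a whole-chromosome change, which is one reason the threshold needs to be checked against the per-chromosome heatmaps.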

@@ -3292,19 +3284,18 @@

First level annotation

  • normal cells can be observed in all four compartments (endothelium, immune, stroma or fetal nephron) and do not have CNV. We only allow a bit of flexibility in the CNV profile for immune and endothelium cells that have a high predicted score. Indeed, we know that false positive CNV can be observed in a cell-type-specific manner.

The threshold used for the predicted.score is defined as a parameter of this notebook as 0.85. The threshold used for the identification of CNV is also defined as a notebook parameter, here 0.

  • cancer cells are either from the stroma or fetal nephron compartments and must have at least a few CNV.
# Define normal cells
 # We first pick up the immune and endothelial cells annotated via the label transfer compartments under the condition that the predicted score is above the threshold
@@ -3345,7 +3336,7 @@ 

Cancer cells

  • cancer stroma: We define as cancer stroma all cancer cells from the stroma compartment.

  • blastema: we define as blastema every cancer cell that has a fetal_kidney_predicted.cell_type == mesenchymal cell. We know that these mesenchymal cells are cells from the cap @@ -3406,7 +3397,7 @@

    Cancer and normal cells

-

Validation cancer versus normal based on the cnv profile

+

Validation cancer versus normal based on the CNV profile

for (i in 1:22) {
   print(do_Feature_mean(cell_type_df, group.by = "second.level_annotation", feature = glue::glue("proportion_cnv_chr", i)))
 }
@@ -3415,53 +3406,53 @@

Validation cancer versus normal based on the cnv profile

Validation of second level annotation using marker genes

Immune, PTPRC expression

do_Feature_mean(cell_type_df, group.by = "second.level_annotation", feature = "ENSG00000081237")

Endothelium, VWF expression

do_Feature_mean(cell_type_df, group.by = "second.level_annotation", feature = "ENSG00000110799")

Stroma, Vimentin expression

do_Feature_mean(cell_type_df, group.by = "second.level_annotation", feature = "ENSG00000026025")

Stroma, COL6A3 expression

do_Feature_mean(cell_type_df, group.by = "second.level_annotation", feature = "ENSG00000163359")

Stroma, THY1 expression

do_Feature_mean(cell_type_df, group.by = "second.level_annotation", feature = "ENSG00000154096")

Blastema, CITED1 expression

do_Feature_mean(cell_type_df, group.by = "second.level_annotation", feature = "ENSG00000125931")

Blastema, NCAM1 expression

do_Feature_mean(cell_type_df, group.by = "second.level_annotation", feature = "ENSG00000149294")

stemness marker (blastema and primitive epithelium), SIX2 expression

do_Feature_mean(cell_type_df, group.by = "second.level_annotation", feature = "ENSG00000170577")

Epithelium, CDH1 expression

do_Feature_mean(cell_type_df, group.by = "second.level_annotation", feature = "ENSG00000039068")

Epithelium, PODXL expression

do_Feature_mean(cell_type_df, group.by = "second.level_annotation", feature = "ENSG00000128567")

@@ -3499,7 +3490,7 @@

Conclusion

  • Combining label transfer and CNV inference we have produced draft annotations for all 40 Wilms tumor samples in SCPCP000006

  • The heatmaps of CNV proportion and marker genes support our annotations, but signals for some marker genes are very low. Also, there is no universal marker for each entity of Wilms tumor that covers all tumor cells from all patients. This makes the validation of the @@ -3510,17 +3501,17 @@

    Conclusion

    and Wilms tumor histologies.

  • In each histology (i.e. epithelial and stroma), the distinction between cancer and non-cancer cells is difficult (as expected). In this analysis, we suggest relying on the CNV score to assess the normality of the cell. Here again, we could try to run differential expression analysis and compare epithelial (resp. stroma) cancer versus non-cancer cells across patients, aiming to find a shared transcriptional program allowing the classification of cancer versus normal cells.

  • In our annotation, we haven’t taken into account the favorable/anaplastic status of the sample. However, as anaplasia can (but does not have to) occur in every Wilms tumor histology, I am not sure how to integrate the information into the annotation.

  • Finally, this notebook could be rendered using different parameters, i.e. different thresholds for the CNV score and the predicted score.

diff --git a/analyses/cell-type-wilms-tumor-06/notebook/README.md b/analyses/cell-type-wilms-tumor-06/notebook/README.md index 28c4442f9..211be679f 100644 --- a/analyses/cell-type-wilms-tumor-06/notebook/README.md +++ b/analyses/cell-type-wilms-tumor-06/notebook/README.md @@ -1,11 +1,10 @@ # Notebook directory instructions -The notebook directory holds subdirectory for each of the sample in the Wilms tumor dataset SCPCP000006 and the fetal kidney reference that we used for label transfer. +The notebook directory holds a directory for each of the samples in the Wilms tumor dataset `SCPCP000006` and the fetal kidney reference that we used for label transfer. -## Azimuth compatible fetal kidney reference +## Fetal kidney reference -To perform label transfer using Azimuth and the fetal kidney atlas, a reference is built via [`scripts/download-and-create-fetal-kidney-ref.R`](../scripts/download-and-create-fetal-kidney-ref.R) using the fetal_full.Rds object download from: -"https://datasets.cellxgene.cziscience.com/40ebb8e4-1a25-4a33-b8ff-02d1156e4e9b.rds" +To perform label transfer using an Azimuth-adapted approach and the fetal kidney atlas, a reference is built via [`scripts/prepare-fetal-references.R`](../scripts/prepare-fetal-references.R). As part of the `00b_characterize_fetal_kidney_reference_Stewart.Rmd` notebook template, we characterized the fetal kidney reference and generated lists of marker genes for the compartment and cell types composing the reference. @@ -20,10 +19,10 @@ In brief, the `_processed.rds` `sce object` is converted to `Seurat` and normali Dimensionality reduction (`RunPCA` and `RunUMAP`) and clustering (`FindNeighbors` and `FindClusters`) are performed before saving the `Seurat` object. - [x] `02a_fetal_full_label-transfer_{sample-id}.html` is the output of the [`02a_label-transfer_fetal_full_reference_Cao.Rmd`](../notebook_template/02a_label-transfer_fetal_full_reference_Cao.Rmd) notebook template.
-In brief, we used `Azimuth` to transfer labels from the `Azimuth` fetal full reference (Cao et al.) +In brief, we used an Azimuth-adapted approach to transfer labels from the Azimuth fetal full reference (Cao et al.) - [x] `02b_fetal_kidney_label-transfer_{sample-id}.html` is the output of the [`02b_label-transfer_fetal_kidney_reference_Stewart.Rmd`](../notebook_template/02b_label-transfer_fetal_kidney_reference_Stewart.Rmd) notebook template. -In brief, we used `Azimuth` to transfer labels from the fetal kidney reference (Stewart et al.) +In brief, we used an Azimuth-adapted approach to transfer labels from the fetal kidney reference (Stewart et al.) - [x] `03_clustering_exploration_{sample-id}.html` is the output of the [`03_clustering_exploration.Rmd`](../notebook_template/03_clustering_exploration.Rmd) notebook template. In brief, we explore the clustering results, we look into some marker genes, pathways enrichment and label transfer. @@ -34,9 +33,9 @@ In brief, we explore the clustering results, we look into some marker genes, pat The next step in analysis is to identify tumor vs. normal cells. - [x] `04_annotation_Across_Samples_exploration.html` is the output of the [`04_annotation_Across_Samples_exploration.Rmd`](../notebook/04_annotation_Across_Samples_exploration.Rmd) notebook. -In brief, we explored the label transfer results across all samples in the Wilms tumor dataset SCPCP000006 in order to identify a few samples that we can begin next analysis steps with. +In brief, we explored the label transfer results across all samples in the Wilms tumor dataset `SCPCP000006` in order to identify a few samples that we can begin next analysis steps with. -One way to evaluate the label transfer is to look at the `predicted.score` for each label being transfered, which more or less correspond to the certainty for a label transfer to be _TRUE_. 
More informations on the cell-level metric `predicted.score` can be found in the [mapping QC](https://azimuth.hubmapconsortium.org/#Mapping%20QC) section of `Azimuth` documentation. +One way to evaluate the label transfer is to look at the `predicted.score` for each label being transferred, which more or less correspond to the certainty for a label transfer to be `TRUE`. More information on the cell-level metric `predicted.score` can be found in the [mapping QC](https://azimuth.hubmapconsortium.org/#Mapping%20QC) section of `Azimuth` documentation. We render the notebook with different thresholds for the `predicted.score` and evaluate the impact of filtering out cells with a `predicted.score` below 0.5, 0.75, 0.85 and 0.95. @@ -44,11 +43,11 @@ Of important notes: - The stroma compartment often has a poor `predicted.score`. This is for me an indication that these cells might be cancer cells and not normal stromal cells. -- We would rather use the `predicted.score` threshold to select normal cells for which we have a high confidency, i.e. immune and endothelial cells, but not to filter out all cells below the threshold. +- We would rather use the `predicted.score` threshold to select normal cells for which we have a high confidence, i.e. immune and endothelial cells, but not to filter out all cells below the threshold. - While a `predicted.score` of 0.5 is much too low (almost all cells having a higher `predicted.score`) and 0.95 is too high (so few cells pass the threshold), 0.75 and 0.85 looked both appropriate for our purpose. ---> We decided to go with the most stringent threshold of 0.85 as we want to be sure of our selection of normal cells (i.e. endothelial and immune cells) that we will use to run `infercnv`. +--> We decided to go with the most stringent threshold of 0.85 as we want to be sure of our selection of normal cells (i.e. endothelial and immune cells) that we will use to run `inferCNV`. 
- [x] `07_combined_annotation_across_samples_exploration.html` is the output of the [`07_combined_annotation_across_samples_exploration.Rmd`](../notebook/07_combined_annotation_across_samples_exploration.Rmd) notebook. This notebook performs a draft annotation of samples using information from CNV inference and label transfer. @@ -56,15 +55,16 @@ This notebook performs a draft annotation of samples using information from CNV ## Exploratory analysis We selected in [`04_annotation_Across_Samples_exploration.Rmd`](../notebook/04_annotation_Across_Samples_exploration.Rmd) 5 samples to test for aneuploidy and CNV inference: -- sample SCPCS000194 has > 85 % of cells predicted as kidney and 234 + 83 endothelium and immune cells. -- sample SCPCS000179 has > 94 % of cells predicted as kidney and 25 + 111 endothelium and immune cells. -- sample SCPCS000184 has > 96 % of cells predicted as kidney and 39 + 70 endothelium and immune cells. -- sample SCPCS000205 has > 89 % of cells predicted as kidney and 92 + 76 endothelium and immune cells. -- sample SCPCS0000208 has > 95 % of cells predicted as kidney and 18 + 35 endothelium and immune cells. + +- sample SCPCS000194 +- sample SCPCS000179 +- sample SCPCS000184 +- sample SCPCS000205 +- sample SCPCS000208 - [x] `05_copykat_exploration_{sample_id}.html` is the output of the [`05_copykat_exploration.Rmd`](../notebook_template/05_copykat_exploration.Rmd) notebook template. -In brief, we wanted to test `copykat` results obtained with or without normal cells as reference, using either an euclidean or statistical (spearman) method for CNV heatmap clustering. +In brief, we wanted to test `copykat` results obtained with or without normal cells as reference, using either an euclidean or statistical (Spearman) method for CNV heatmap clustering. This impact the final decision made by `copykat` for each cell to be either aneuploid or diploid, and it is thus crucial to explore the results using the different methods. 
For each of the selected samples, we explore the results in the template `notebook` [`05_copykat_exploration.Rmd`](../notebook_template/05_copykat_exploration.Rmd), which creates a notebook `05_cnv_copykat_exploration_{sample_id}.html` for each sample. These `notebooks` are inspired by the plots written for the Ewing Sarcoma analysis in [`03-copykat.Rmd`](https://github.com/AlexsLemonade/OpenScPCA-analysis/blob/main/analyses/cell-type-ewings/exploratory_analysis/03-copykat.Rmd). diff --git a/analyses/cell-type-wilms-tumor-06/notebook_template/00b_characterize_fetal_kidney_reference_Stewart.Rmd b/analyses/cell-type-wilms-tumor-06/notebook_template/00b_characterize_fetal_kidney_reference_Stewart.Rmd index a2c5f54e3..f9a9838fc 100644 --- a/analyses/cell-type-wilms-tumor-06/notebook_template/00b_characterize_fetal_kidney_reference_Stewart.Rmd +++ b/analyses/cell-type-wilms-tumor-06/notebook_template/00b_characterize_fetal_kidney_reference_Stewart.Rmd @@ -32,7 +32,7 @@ knitr::opts_chunk$set( The aim is to characterize the human fetal kidney from the kidney cell atlas. You can find more about the human kidney atlas here: https://www.kidneycellatlas.org/ [1] -The rds data can be download using the download link https://datasets.cellxgene.cziscience.com/40ebb8e4-1a25-4a33-b8ff-02d1156e4e9b.rds +The rds data can be downloaded using the download link: . The reference was downloaded and created in the script `scripts/prepare-fetal-references.R`. ## Packages @@ -61,13 +61,13 @@ module_base <- file.path(repository_base, "analyses", "cell-type-wilms-tumor-06" ## Input files -The input file `r params$fetal_kidney_path` is the output of the `R script` `prepare-fetal-references.R`. +The input file `r params$fetal_kidney_path` is the output of the script `prepare-fetal-references.R`.
## Output file -We will save the result of the differential expression analysis in results/references/00b_marker_genes_fetal_kidney_Stewart.csv -Notebook is saved in the `notebook/00-reference` directory +We will save the result of the differential expression analysis in `results/references/00b_marker_genes_fetal_kidney_Stewart.csv`. +The notebook is saved in the `notebook/00-reference` directory. ```{r path_to_output} path_to_output <- file.path(module_base, "results", "references") @@ -92,9 +92,9 @@ Here, we use an unbiased approach to find transcripts that characterized the dif This is just to get markers genes of the different population, in case some could be of interest for the Wilms tumor annotations. -We run DElegate::FindAllMarkers2 to find markers of the different clusters and manually check if they do make sense. -DElegate::FindAllMarkers2 is an improved version of Seurat::FindAllMarkers based on pseudobulk differential expression method. -Please check the preprint from Chistoph Hafemeister: https://www.biorxiv.org/content/10.1101/2023.03.28.534443v1 +We run `DElegate::FindAllMarkers2()` to find markers of the different clusters and manually check whether they make sense. +`DElegate::FindAllMarkers2()` is an improved version of `Seurat::FindAllMarkers()` based on a pseudobulk differential expression method.
+Please check the preprint from Hafemeister and Halbritter: https://www.biorxiv.org/content/10.1101/2023.03.28.534443v2 and tool described here: https://github.com/cancerbits/DElegate ### Find marker genes for each of the compartment diff --git a/analyses/cell-type-wilms-tumor-06/notebook_template/02b_label-transfer_fetal_kidney_reference_Stewart.Rmd b/analyses/cell-type-wilms-tumor-06/notebook_template/02b_label-transfer_fetal_kidney_reference_Stewart.Rmd index 7ef0751c7..327b9fb4b 100644 --- a/analyses/cell-type-wilms-tumor-06/notebook_template/02b_label-transfer_fetal_kidney_reference_Stewart.Rmd +++ b/analyses/cell-type-wilms-tumor-06/notebook_template/02b_label-transfer_fetal_kidney_reference_Stewart.Rmd @@ -233,17 +233,17 @@ f2 <- SCpubr::do_BarPlot( Note: -For some reason, the "cap-mesenchyme cells" has been renamed in cellxgene in "mesenchymal cells". +For some reason, the "cap-mesenchyme cells" has been renamed in CELLxGENE as "mesenchymal cells". -The cap mesenchyme is a cap of condensed metanephric mesenchyme, comprised of cells which epithelialize and sequentially form the pretubular aggregate (PA), renal vesicle (RV), C-, and S-shaped bodies, and finally the mature nephron. +The cap mesenchyme is a cap of condensed metanephric mesenchyme, comprised of cells which epithelialize and sequentially form the peritubular aggregate (PA), renal vesicle (RV), C-, and S-shaped bodies, and finally the mature nephron. The CM contains nephron progenitor cells. This can be confusing and we just need to pay attention that : - the fetal nephron / mesenchymal cells are cap-mesenchyme cells. -In our case, cap-mesenchyme contains blastema and primitive epitheliul cancer cells. +In our case, cap-mesenchyme contains blastema and primitive epithelial cancer cells. -- the stroma / mesenchymal stem are likelly mesenchymal cancer or normal cells. +- the stroma / mesenchymal stem are likely mesenchymal cancer or normal cells. 
## Save the `Seurat`object diff --git a/analyses/cell-type-wilms-tumor-06/notebook_template/03_clustering_exploration.Rmd b/analyses/cell-type-wilms-tumor-06/notebook_template/03_clustering_exploration.Rmd index ced32c199..e6220b15f 100644 --- a/analyses/cell-type-wilms-tumor-06/notebook_template/03_clustering_exploration.Rmd +++ b/analyses/cell-type-wilms-tumor-06/notebook_template/03_clustering_exploration.Rmd @@ -122,7 +122,7 @@ The pre-processed and annotated `Seurat` object per samples are saved in the `re Here we defined function that will be used multiple time all along the notebook. -#### Visualize seurat clusters and metadata +#### Visualize Seurat clusters and metadata For a Seurat object `object`and a metadata `metadata`, the function `visualize_metadata` will plot `FeaturePlot` and `BarPlot` @@ -168,7 +168,7 @@ visualize_metadata <- function(object, meta, group.by) { } ``` -#### Visualize seurat clusters and markers genes +#### Visualize Seurat clusters and markers genes For a Seurat object `object`and a features `features`, the function `visualize_feature` will plot `FeaturePlot` and `ViolinPlot` @@ -276,7 +276,7 @@ Enrichment_plot <- function(category, signatures, background) { `do_Table_Heatmap` shows heatmap of counts of cells for combinations of two metadata variables -- `data` seurat object +- `data` Seurat object - `first_group` is the name of the first metadata to group the cells - `last_group` is the name of the second metadata to group the cells @@ -319,7 +319,7 @@ if (params$testing) { ``` -### Visualize seurat clusters +### Visualize Seurat clusters We expect up to 5 set of clusters: @@ -374,9 +374,9 @@ for (feature in CellType_metadata$ENSEMBL_ID[CellType_metadata$ENSEMBL_ID %in% r ### Look at specific pathways -#### TP53 pathway +#### `TP53` pathway -here we will calculate a TP53 score using `AddModuleScore` and the genes of the HALLMARK_P53_PATHWAY gene set. 
+here we will calculate a `TP53` score using `AddModuleScore` and the genes of the HALLMARK_P53_PATHWAY gene set. ```{r fig.height=4, fig.width=20, warning=FALSE, out.width='100%'} srat <- MSigDB_score(object = srat, category = "H", gs_name = "HALLMARK_P53_PATHWAY", name = "TP53_score", nbin = seurat_nbins) @@ -413,7 +413,7 @@ visualize_metadata(srat, meta = "DICER1_score1", group.by = "seurat_clusters") ``` -### Find marker genes for each of the seurat clusters +### Find marker genes for each of the Seurat clusters In addition to the list of known marker genes, we used an unbiased approach to find transcripts that characterized the different clusters. diff --git a/analyses/cell-type-wilms-tumor-06/notebook_template/05_copykat_exploration.Rmd b/analyses/cell-type-wilms-tumor-06/notebook_template/05_copykat_exploration.Rmd index 52950010b..ab9e36f33 100644 --- a/analyses/cell-type-wilms-tumor-06/notebook_template/05_copykat_exploration.Rmd +++ b/analyses/cell-type-wilms-tumor-06/notebook_template/05_copykat_exploration.Rmd @@ -1,12 +1,12 @@ --- -title: "Copykat CNV results exploration for `r params$sample_id`" +title: "CopyKAT CNV results exploration for `r params$sample_id`" author: "Maud PLASCHKA" date: "`r Sys.Date()`" params: sample_id: "SCPCS000179" seed: 12345 -output: - html_document: +output: + html_document: toc: yes toc_float: yes code_folding: hide @@ -36,19 +36,19 @@ subdiagnosis <- readr::read_tsv( dplyr::pull(subdiagnosis) ``` -This notebook explores using [`CopyKAT`](https://github.com/navinlabcode/copykat) to estimate tumor and normal cells in `r params$sample_id` from SCPCP000006. +This notebook explores using [`CopyKAT`](https://github.com/navinlabcode/copykat) to estimate tumor and normal cells in `r params$sample_id` from SCPCP000006. This sample has a(n) `r subdiagnosis` subdiagnosis. -`CopyKAT` was run using the `05_copyKAT.R` script using either an euclidean or statistical (spearman) method to calculate distance in `copyKAT`. 
+`CopyKAT` was run using the `05_copyKAT.R` script using either an euclidean or statistical (Spearman) method to calculate distance in `copyKAT`. `CopyKAT` was run with and without a normal reference. Immune and endothelial cells as identified by label transfer were used as the references cells where applicable. -These results are read into this notebook and used to: - - - Visualize diploid and aneuploid cells on the UMAP. - - Evaluate common copy number gains and losses in Wilms tumor. - - Compare the annotations from `CopyKAT` to cell type annotations using label transfer and the fetal (kidney) references. +These results are read into this notebook and used to: + + - Visualize diploid and aneuploid cells on the UMAP. + - Evaluate common copy number gains and losses in Wilms tumor. + - Compare the annotations from `CopyKAT` to cell type annotations using label transfer and the fetal (kidney) references. ### Packages @@ -123,13 +123,13 @@ for (ref_value in c("ref", "noref")) { ### Output file -Reports will be saved in the `notebook` directory. +Reports will be saved in the `notebook` directory. The pre-processed and annotated `Seurat` object per samples are saved in the `result` folder. ## Functions -Here we defined function that will be used multiple time all along the notebook. +Here we defined function that will be used multiple time all along the notebook. ## Analysis @@ -144,7 +144,7 @@ DefaultAssay(srat) <- "SCT" ### CopyKAT results -Below we look at the heatmaps produced by `CopyKAT`. +Below we look at the heatmaps produced by `CopyKAT`. #### Heatmap without reference @@ -169,8 +169,8 @@ Below we look at the heatmaps produced by `CopyKAT`. #### UMAP -Below we prepare and plot a UMAP that shows which cells are classified as diploid, aneuploid, and not defined by `CopyKAT`. -We show a side by side UMAP with results from running `CopyKAT` both with and without a reference of normal cells. 
+Below we prepare and plot a UMAP that shows which cells are classified as diploid, aneuploid, and not defined by `CopyKAT`. +We show a side by side UMAP with results from running `CopyKAT` both with and without a reference of normal cells. ```{r} # read in ck predictions from both reference types (no_normal and with_normal) @@ -197,19 +197,19 @@ ggplot(cnv_df, aes(x = umap_1, y = umap_2, color = copykat.pred)) + ### Validate common CNAs found in Wilms tumor -To validate some of these annotations, we can also look at some [commonly found copy number variations](https://github.com/AlexsLemonade/OpenScPCA-analysis/tree/main/analyses/cell-type-wilms-tumor-06#the-table-geneticalterations_metadatacsv-contains-the-following-column-and-information) in Wilms tumor patients: - +To validate some of these annotations, we can also look at some [commonly found copy number variations](https://github.com/AlexsLemonade/OpenScPCA-analysis/tree/main/analyses/cell-type-wilms-tumor-06#the-table-geneticalterations_metadatacsv-contains-the-following-column-and-information) in Wilms tumor patients: + - Loss of Chr1p - Gain of Chr1q - Loss of Chr11p13 - Loss of Chr11p15 - Loss of Chr16q - -Although these are the most frequent, there are patients who do not have any of these alterations and patients that only have some of these alterations. -See [Tirode et al.,](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4264969/) and [Crompton et al.](https://doi.org/10.1158/2159-8290.CD-13-1037). - -`CopyKAT` outputs a matrix that contains the estimated copy numbers for each gene in each cell. -We can read that in and look at the mean estimated copy numbers for each chromosome across each cell. + +Although these are the most frequent, there are patients who do not have any of these alterations and patients that only have some of these alterations. +See [Tirode et al.,](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4264969/) and [Crompton et al.](https://doi.org/10.1158/2159-8290.CD-13-1037). 
+ +`CopyKAT` outputs a matrix that contains the estimated copy numbers for each gene in each cell. +We can read that in and look at the mean estimated copy numbers for each chromosome across each cell. We might expect that tumor cells would show an increased estimated copy number in Chr1q, and/or a loss of Chr1p, Chr11p and Chr16q. ```{r} @@ -236,8 +236,8 @@ cnv_df <- cnv_df |> dplyr::left_join(full_cnv_df, by = c("barcodes", "reference_used")) |> dplyr::filter(!is.na(chrom)) ``` - -Let's look at the distribution of CNV estimation in cells that are called aneuploid and diploid by `CopyKAT`. + +Let's look at the distribution of CNV estimation in cells that are called aneuploid and diploid by `CopyKAT`. ```{r, fig.height=15, fig.width=10} # create faceted density plots showing estimation of CNV detection across each chr of interest @@ -255,9 +255,9 @@ ggplot(cnv_df, aes(x = mean_cnv_detection, color = copykat.pred)) + ## Conclusions -From the heatmap of CNV and the mean CNV detection plots, there does not appear to be any pattern that drives the identification of aneuploid cells. -The assignment of the aneuploidy/diploidy value might relies on very few CNV and/or an arbitrary threshold. -This might be why the assignement of aneuploidy/diploidy values differs between condition (and between runs!!). +From the heatmap of CNV and the mean CNV detection plots, there does not appear to be any pattern that drives the identification of aneuploid cells. +The assignment of the aneuploidy/diploidy value might relies on very few CNV and/or an arbitrary threshold. +This might be why the assignment of aneuploidy/diploidy values differs between condition (and between runs!!). 
## Session Info diff --git a/analyses/cell-type-wilms-tumor-06/notebook_template/06_cnv_infercnv_exploration.Rmd b/analyses/cell-type-wilms-tumor-06/notebook_template/06_cnv_infercnv_exploration.Rmd index 6584a3ab7..d4bbf8789 100644 --- a/analyses/cell-type-wilms-tumor-06/notebook_template/06_cnv_infercnv_exploration.Rmd +++ b/analyses/cell-type-wilms-tumor-06/notebook_template/06_cnv_infercnv_exploration.Rmd @@ -42,7 +42,7 @@ This sample has a(n) `r subdiagnosis` subdiagnosis. `infercnv` was run using the `06_inferCNV.R` script with and without a normal reference, from the same patient or from an inter-patient pull of normal cells. We tested the impact of the sub-selection of normal cells using either immune, and/or endothelial cells as healthy reference. In addition, we are exploring the use of the [HMM based CNV Prediction Methods](https://github.com/broadinstitute/infercnv/wiki/inferCNV-HMM-based-CNV-Prediction-Methods). -`infercnv` currently support two models for HMM-based CNV prediction, what we refer to as the i3 and i6 models. These are set in the 'infercnv::run()' as HMM_type='i3' or HMM_type='i6' (i6 is default). Each method operates on the 'preliminary infercnv object' which has been processed through the standard inferCNV processing routines, involving subtraction of signal corresponding to 'normal (reference)' cells and smoothing operations. +`infercnv` currently support two models for HMM-based CNV prediction, what we refer to as the i3 and i6 models. These are set in the `infercnv::run()` as HMM_type='i3' or HMM_type='i6' (i6 is default). Each method operates on the preliminary `infercnv` object which has been processed through the standard inferCNV processing routines, involving subtraction of signal corresponding to "normal (reference)" cells and smoothing operations. - i3 HMM is a three-state CNV model representing deletion, neutral, and amplification states. 
- i6 HMM: a six-state CNV model that predicts the following CNV levels: @@ -109,7 +109,7 @@ Do_CNV_heatmap <- function(object, infercnv_obj, group.by, reference_value) { #### Calculate a global CNV score per cell to check the general distribution For a `Seurat` object an `infercnv` object created with the script `06_infercnv.R` using `reference_value` as a reference, the function `Do_CNV_score` calculate a CNV score per cell. -The score is calculated based on the [biostar discussion](https://www.biostars.org/p/9573777/). +The score is calculated based on [this discussion](https://www.biostars.org/p/9573777/). The function `Do_CNV_score` returns the `Seurat` object with an additional metadata named `CNV-score_{reference_value}.` - `reference_value` is the selection of normal cells used for `infercnv` @@ -128,9 +128,9 @@ Do_CNV_score <- function(seurat_oject, infercnv_obj, reference_value) { ``` -#### Visualize seurat clusters and metadata +#### Visualize Seurat clusters and metadata -For a Seurat object `object`and a metadata `metadata`, the function `visualize_metadata` will plot `FeaturePlot` and `BarPlot` +For a Seurat object `object` and a metadata `metadata`, the function `visualize_metadata` will plot `FeaturePlot` and `BarPlot` - `object` is the Seurat object @@ -231,9 +231,9 @@ visualize_density <- function(object, features, group.by) { } ``` -#### Wrapper function to explore `infercnv` HMM cnv prediction +#### Wrapper function to explore `infercnv` HMM CNV prediction -The `wrapper_explore_hmm` take as input the `infercnv_obj` generated with `infercnv` HMM cnv predictions. +The `wrapper_explore_hmm` function takes as input the `infercnv_obj` generated with `infercnv` HMM CNV predictions. The wrapper allows the following steps and plots: @@ -242,7 +242,7 @@ The wrapper allows the following steps and plots: For each chromosome, we look at the repartition of the `proportion_cnv_` in cells labeled as immune, endothelial, stroma and fetal nephron.
`proportion_cnv_` is the proportion in number of genes that are part of any cnv/loss/duplication in the given chr. -##### Distribution of CNV estimation in the Wilms tumor copartments +##### Distribution of CNV estimation in the Wilms tumor compartments For each chromosome, we look at the distribution of the `proportion_cnv_` in cells labeled as immune, endothelial, stroma and fetal nephron. `proportion_cnv_` is the proportion in number of genes that are part of any cnv/loss/duplication in the given chr. @@ -254,9 +254,9 @@ We do not know if fetal nephron and stroma cells are a mix of normal and cancer Would they be a group of normal cells, we should expect a single peak center on 0 for every chromosome. As we expect to have a large number of cancer with heterogeneous CNV, we should see multiple peaks. -##### DotPlot +##### Dot Plot -The `Dotplot` representation summarizes the percentage of cells in each compartment with cnv in each of the 22 chromosomes. +The `Dotplot` representation summarizes the percentage of cells in each compartment with CNV in each of the 22 chromosomes. ##### CNV score @@ -334,10 +334,10 @@ srat <- readRDS(file.path(module_base, "results", params$sample_id, glue::glue(" ## Analysis -### Heatmap of infercnv results +### Heatmap of `infercnv` results Here we plot the output of `infercnv` as heatmaps of CNV. -We first look at the png file generated by the `infercnv` function. +We first look at the PNG file generated by the `infercnv` function. We then used the `infercnv object` to look at mean CNV value across compartments (immune, endothelial, stroma and fetal nephron). #### Without reference @@ -372,7 +372,7 @@ for (reference_value in c("reference-none", "reference-immune", "reference-endot ``` These heatmaps emphasize the importance of the selection of normal cells prior the inference of CNV. The normal reference should contain as much cell types as possible, in order to minimize false positive CNV. 
-In our case, we should take immune and entodethial cells when possible. +In our case, we should take immune and endothelial cells when possible. Of note: By default if no reference is provided, `infercnv` take the mean of expression as normal reference. The risk is that the main cell population (in our case the fetal nephron compartment) might be mistaken as the normal baseline. @@ -380,8 +380,8 @@ The risk is that the main cell population (in our case the fetal nephron compart ### Summary CNV score -We want to calculate a single CNV score and asess if/how it can be use to define cells with CNV versus stable/normal cells. -We defined the score as discribed in the [biostar discussion](https://www.biostars.org/p/9573777/). +We want to calculate a single CNV score and assess if/how it can be used to define cells with CNV versus stable/normal cells. +We defined the score as described in the [biostar discussion](https://www.biostars.org/p/9573777/). We would expect: @@ -409,7 +409,7 @@ We might have to select chromosomes we would like to look at, i.e. the one relev ### HMM-i3 inference prediction with both immune and endothelium cells as reference -We then explore infercnv results generated with immune and endothelial cells as reference, using a [HMM-i3 prediction models](https://github.com/broadinstitute/infercnv/wiki/infercnv-i3-HMM-type). +We then explore `infercnv` results generated with immune and endothelial cells as reference, using a [HMM-i3 prediction models](https://github.com/broadinstitute/infercnv/wiki/infercnv-i3-HMM-type). We load the `Seurat` object generated in `06_infercnv.R` @@ -418,7 +418,7 @@ We load the `Seurat` object generated in `06_infercnv.R` srat_i3 <- readRDS(file.path(module_base, "results", params$sample_id, glue::glue("06_infercnv_HMM-i3_", params$sample_id, "_reference-both.rds"))) ``` -and explore the CNV results using the `wrapper_explore_hmm` fucntion. +and explore the CNV results using the `wrapper_explore_hmm` function. 
```{r fig.width=16, fig.height=8, out.width='100%', out.height='100%', warning=FALSE} p <- list() @@ -482,15 +482,15 @@ pull ``` -## Comparisons of inter- and intra-patient global cnv score with HMM prediction model +## Comparisons of inter- and intra-patient global CNV score with HMM prediction model We compare here the binary CNV scores calculated with the three HMM prediction models: -- HMM-i3 with inter-patient endothelial and immunce cells as reference +- HMM-i3 with inter-patient endothelial and immune cells as reference -- HMM-i3 with intra-patient endothelial and immunce cells as reference +- HMM-i3 with intra-patient endothelial and immune cells as reference -- HMM-i6 with intra-patient endothelial and immunce cells as reference +- HMM-i6 with intra-patient endothelial and immune cells as reference ```{r fig.width=20, fig.height=8, out.width='100%'} @@ -515,23 +515,23 @@ In our case, we advise taking at least immune and endothelial cells as normal re - The HMM prediction models help exploring the `infercnv` results. In this notebook, we have compared three HMM prediction models: - + HMM-i3 with inter-patient endothelial and immunce cells as reference + + HMM-i3 with inter-patient endothelial and immune cells as reference - + HMM-i3 with intra-patient endothelial and immunce cells as reference + + HMM-i3 with intra-patient endothelial and immune cells as reference - + HMM-i6 with intra-patient endothelial and immunce cells as reference + + HMM-i6 with intra-patient endothelial and immune cells as reference - Globally, the three scores seems to drive similar conclusions, with the majority of fetal nephron and stroma cells being cancer cells, at least in the sample selected. - + The HMM-i3 model with inter-patient endothelial and immunce cells as reference has the advantage to be usable for all Wilms tumor samples, including the ones with a very low number of immune and/or endothelial cells. 
+ + The HMM-i3 model with inter-patient endothelial and immune cells as reference has the advantage to be usable for all Wilms tumor samples, including the ones with a very low number of immune and/or endothelial cells. - + The HMM-i3 model with intra-patient endothelial and immunce cells as reference seems to be the cleaner, ~fast to run (10 minutes per samples) and is more precise than the HMM-i3 with the inter-patient reference. + + The HMM-i3 model with intra-patient endothelial and immune cells as reference seems to be the cleaner, ~fast to run (10 minutes per samples) and is more precise than the HMM-i3 with the inter-patient reference. - + The HMM-i6 model with intra-patient endothelial and immunce cells as reference is very slow (~2 hours per sample) and couldn't be used for the entire cohorte. + + The HMM-i6 model with intra-patient endothelial and immune cells as reference is very slow (~2 hours per sample) and couldn't be used for the entire cohort. It is more noisy than the i3 version. However, it could have the potential to detect cancer cells with very low CNV profile. -- Surprisingly, running `infercnv` with emdothelial and immune cells from (i) the same patient or (ii) a set of Wilms tumor patients do not seem to affect drastically the results. +- Surprisingly, running `infercnv` with endothelial and immune cells from (i) the same patient or (ii) a set of Wilms tumor patients do not seem to affect drastically the results. Some false positive CNV might occur in every patient due to the inter-patient variability. By comparing the results in conditions (i) and (ii), we should be able to understand which false positive are recurrent and do not take them into account. 
diff --git a/analyses/cell-type-wilms-tumor-06/results/README.md b/analyses/cell-type-wilms-tumor-06/results/README.md index 01e9e9ff1..d0eab53e1 100644 --- a/analyses/cell-type-wilms-tumor-06/results/README.md +++ b/analyses/cell-type-wilms-tumor-06/results/README.md @@ -1,7 +1,7 @@ # Azimuth compatible fetal references To perform label transfer using code adapted from Azimuth, we prepare two references in [`scripts/prepare-fetal-references.R`](../scripts/prepare-fetal-references.R). -- First, we use the fetal_full.Rds object downloaded from . +- First, we use the fetal_full.Rds object downloaded from CELLxGENE. This is a fetal kidney reference from Stewart et al., and it is saved in `references/stewart_formatted_ref.rds`. - Second, we format the Azimuth "fetusref" reference. This is a fetal organ reference from Cao et al., and it is saved in `references/cao_formatted_ref.rds`. @@ -58,31 +58,31 @@ For each sample and each condition (reference and distance), we saved in `result We also tried to infer large CNV in cancer cells using `infercnv` and tested the sensibility of the output in regard to the definition of the normal cells. We selected previously 5 samples to test for these parameters: -- sample SCPCS000194 has > 85 % of cells predicted as kidney and 234 + 83 endothelium and immune cells. -- sample SCPCS000179 has > 94 % of cells predicted as kidney and 25 + 111 endothelium and immune cells. -- sample SCPCS000184 has > 96 % of cells predicted as kidney and 39 + 70 endothelium and immune cells. -- sample SCPCS000205 has > 89 % of cells predicted as kidney and 92 + 76 endothelium and immune cells. -- sample SCPCS0000208 has > 95 % of cells predicted as kidney and 18 + 35 endothelium and immune cells. 
+- sample SCPCS000194 +- sample SCPCS000179 +- sample SCPCS000184 +- sample SCPCS000205 +- sample SCPCS000208 `infercnv` requires a gene position file that we build in `06a_build-geneposition.R` and saved as `gencode_v19_gen_pos.complete.txt` in `results/references`. For each sample and each condition (reference), we saved in `results/{sample_id}/06_infercnv/reference-{selection}`: -- the final `infercnv`rds object in `06_infercnv_{sample_id}_reference-{selection}.rds` +- the final `infercnv` object in `06_infercnv_{sample_id}_reference-{selection}.rds` - the heatmap of CNV in `06_infercnv_{sample_id}_reference-{selection}_heatmap.png` Of note, the final `infercnv` rds object includes the following slots: -- 'infercnv_obj@ expr.data' : contains the processed expression matrix as it exists at the end of that stage for which that inferCNV object represents. +- `infercnv_obj@expr.data` : contains the processed expression matrix as it exists at the end of that stage for which that inferCNV object represents. -- 'infercnv_obj@reference_grouped_cell_indices' : list containing the expression matrix column indices that correspond to each of the normal (reference) cell types. +- `infercnv_obj@reference_grouped_cell_indices` : list containing the expression matrix column indices that correspond to each of the normal (reference) cell types. -- 'infercnv_obj@observation_grouped_cell_indices' : similar list as above, but corresponds to the tumor cell types. +- `infercnv_obj@observation_grouped_cell_indices` : similar list as above, but corresponds to the tumor cell types. Based on the above slots, it would be straightforward to extract info of interest and/or move data into other analysis frameworks. -In addition, fot the condition `reference = "both"`, we ran `infercnv` with `HMM = TRUE`. +In addition, for the condition `reference = "both"`, we ran `infercnv` with `HMM = TRUE`. 
[HMM CNV prediction methods](https://github.com/broadinstitute/infercnv/wiki/inferCNV-HMM-based-CNV-Prediction-Methods) will allow us to explore the CNV results better, with an easy [merge](https://github.com/broadinstitute/infercnv/wiki/Extracting-features) of `infercnv` result with the `Seurat` object. -However, HMM CNV prediction methods uses a lot of resources, including time (~2h/sample/condition), and often lead to RSession end. +However, HMM CNV prediction methods use a lot of resources, including time (~2h/sample/condition), and often cause the R session to crash. This is why we only ran the HMM model for one `reference` condition. After selection of the best reference to use, we will run it for all samples.