Merge branch 'jashapiro/round-robin1' of https://github.com/AlexsLemonade/ScPCA-manuscript into jashapiro/round-robin1
jashapiro committed Mar 20, 2024
2 parents b7c4852 + 633cc1a commit 90cdc9e
Showing 2 changed files with 5 additions and 5 deletions.
2 changes: 1 addition & 1 deletion content/02.introduction.md
@@ -20,7 +20,7 @@ To address this unmet need, Alex's Lemonade Stand Foundation and the Childhood C
The ScPCA Portal holds uniformly processed summarized gene expression from 10x Genomics droplet-based single-cell and single-nuclei RNA-seq for over 500 samples from a diverse set of over 50 types of pediatric cancers.
Originally comprised of data from ten projects funded by Alex's Lemonade Stand Foundation, the Portal has since expanded to include data contributed by pediatric cancer research community members.
In addition to gene expression data from single-cell and single-nuclei RNA-seq, the Portal includes data obtained from bulk RNA-seq, spatial transcriptomics, and feature barcoding methods, such as CITE-seq and cell hashing.
-All data provided on the portal are available in formats ready for downstream analysis with common workflow ecosystems such as `R/Bioconductor`'s `SingleCellExperiment` [@doi:10.1038/s41592-019-0654-x] or the Python-based `AnnData` [@doi:10.1186/s13059-017-1382-0].
+All data provided on the portal are available in formats ready for downstream analysis with common workflow ecosystems such as `SingleCellExperiment` objects used by `R/Bioconductor`[@doi:10.1038/s41592-019-0654-x] or `AnnData` objects used by `Scanpy` and related Python modules [@doi:10.1186/s13059-017-1382-0].
Downloaded objects contain normalized gene expression counts, dimensionality reduction results, and cell type annotations.
<!-- JAS: I changed the above to refer to workflows, but I wonder if we want to specifically call out Scanpy as part of that -->

8 changes: 4 additions & 4 deletions content/03.results.md
@@ -59,7 +59,7 @@ This unfiltered counts matrix is stored in a `SingleCellExperiment` object [@doi
`scpca-nf` performs filtering of empty droplets, removal of low-quality cells, normalization, dimensionality reduction, and cell type annotation (Figure {@fig:fig2}A).
The unfiltered gene by cell counts matrices are filtered to remove any barcodes that are not likely to contain cells using `DropletUtils::emptyDropsCellRanger()`[@doi:10.1186/s13059-019-1662-y], and all cells that pass are saved in a `SingleCellExperiment` object and `.rds` file with the suffix `_filtered.rds`.
Low-quality cells are identified and removed with `miQC` [@doi:10.1371/journal.pcbi.1009290], which jointly models the proportion of mitochondrial reads and detected genes per cell and calculates a probability that each cell is compromised.
-The remaining cells' counts are normalized [@doi:10.1186/s13059-016-0947-7] and reduced-dimension representations are caclulated using both principal component analysis (PCA) and UMAP.
+The remaining cells' counts are normalized [@doi:10.1186/s13059-016-0947-7], and reduced-dimension representations are calculated using both principal component analysis (PCA) and UMAP.
Finally, cell types are classified using two automated methods, `SingleR`[@doi:10.1038/s41590-018-0276-y] and `CellAssign`[@doi:10.1038/s41592-019-0529-1].
The results from this analysis are stored in a processed `SingleCellExperiment` object saved to an `.rds` file with the suffix `_processed.rds`.

@@ -131,7 +131,7 @@ The output is a single TSV file with the gene by sample counts matrix for all sa
This gene by sample matrix is only included with project downloads on the Portal.

To quantify spatial transcriptomics data, `scpca-nf` takes the RNA FASTQ and slide image as input (Figure {@fig:figS3}B).
-As there is not yet full support for spatial transcriptomics with `alevin-fry`, `scpca-nf` uses Space Ranger to quantify all spatial transcriptomics data [@url:https://www.10xgenomics.com/support/software/space-ranger/latest].
+As `alevin-fry` does not yet fully support spatial transcriptomics data, `scpca-nf` uses Space Ranger to quantify all spatial transcriptomics data [@url:https://www.10xgenomics.com/support/software/space-ranger/latest].
The output includes the spot by gene matrix along with a summary report produced by Space Ranger.

## Downloading projects from the ScPCA Portal
@@ -156,7 +156,7 @@ If the ScPCA project includes samples with bulk RNA-seq, two additional files ar
Providing data for all samples within a single file facilitates performing joint gene-level analyses, such as differential expression or gene set enrichment analyses, on multiple samples simultaneously.
Therefore, we provide a single, merged object for each project containing all raw and normalized gene expression data and metadata for all single-cell and single-nuclei RNA-seq libraries within a given ScPCA project.
We provide merged objects for all projects in the Portal except for those with multiplexing, due to potential ambiguity in identifying samples across multiplexed libraries.
-The data in the merged object has simply been combined wihtout further processing; no batch-corrected or integrated data is included.
+The data in the merged object has simply been combined without further processing; no batch-corrected or integrated data is included.
If downloading data from an ScPCA project as a single, merged file, the download will include a single `.rds` or `.hdf5` file, a summary report for the merged object, and a folder with all individual QC and cell type reports for each library found in the merged object (Figure {@fig:fig3}B).

To build the merged objects, we created an additional stand-alone workflow for merging the output from `scpca-nf`, `merge.nf` (Figure {@fig:fig3}C).
@@ -201,7 +201,7 @@ We calculated the delta median statistic for each cell in the dataset by subtrac
The delta median statistic helps evaluate how confident `SingleR` is in assigning each cell to a specific cell type, where low delta median values indicate ambiguous assignments and high delta median values indicate confident assignments [@url:https://bioconductor.org/books/release/SingleRBook/annotation-diagnostics.html#based-on-the-deltas-across-cells].
<!-- TODO: ⚠️ For review - What do you think of the next sentence? -->
<!-- JAS: removed phrase about names and ontologies to simplify -->
-Using this measure, we found that the `BlueprintEncodeData` reference [@doi:10.3324/haematol.2013.094243; @doi:10.1038/nature11247], which includes a variety of normal cell types, tended to perform best or at least similarly to other references across samples from different disease types (Figure {@fig:figS4}).
+Using this measure, we found that the `BlueprintEncodeData` reference [@doi:10.3324/haematol.2013.094243; @doi:10.1038/nature11247], which includes a variety of normal cell types, tended to perform better than or at least similarly to other references across samples from different disease types (Figure {@fig:figS4}).
Based on these findings, we used the `BlueprintEncodeData` reference to annotate cells from all libraries on the Portal, as using a single reference is potentially valuable for cross-project analyses.

In contrast, `CellAssign` is a marker-gene-based annotation method that requires a binary matrix with all cell types and all associated marker genes as the reference.