The aim of this workshop is to give researchers with no prior coding experience the skills they need to analyze a single-cell RNA-seq dataset on their own. The list of topics discussed in this workshop can be found here.
A basic knowledge of the R programming language is beneficial, though not strictly necessary to perform an analysis. Several free beginner's courses for R are available, and a simple web search will turn up more. Here are a couple:
- https://lumc.github.io/rcourse/HLO_202301/S01L01l_index.html (Taught at LUMC)
- https://r-crash-course.github.io/
Seurat is a toolkit for single-cell genomics written in R. It is developed and maintained by the Satija Lab at the New York Genome Center. It is probably the most popular single-cell analysis package.
The most common tasks that are relevant for any single-cell data analysis are covered in this Seurat tutorial, complete with a toy dataset. We typically use this tutorial as a basis for any analysis.
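To give a feel for what these common tasks look like in code, here is a minimal sketch of a basic Seurat clustering workflow. It is not a substitute for the tutorial: the helper name `run_basic_workflow`, its arguments, and the parameter values (10 principal components, resolution 0.5, the `^MT-` pattern for human mitochondrial genes) are assumptions chosen for illustration and should be tuned for your own data.

```r
library(Seurat)

# Hypothetical helper (not part of Seurat): runs a basic clustering
# workflow on a gene-by-cell count matrix and returns the Seurat object.
run_basic_workflow <- function(counts, n_pcs = 10, resolution = 0.5) {
  # Keep genes seen in at least 3 cells and cells with at least 200 genes
  obj <- CreateSeuratObject(counts = counts, min.cells = 3, min.features = 200)

  # Percentage of mitochondrial reads per cell (assumes human "MT-" gene names)
  obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^MT-")

  # Normalize, select variable genes, scale, and reduce dimensionality
  obj <- NormalizeData(obj)
  obj <- FindVariableFeatures(obj, selection.method = "vst", nfeatures = 2000)
  obj <- ScaleData(obj)
  obj <- RunPCA(obj, npcs = n_pcs)

  # Build the neighbor graph, cluster, and compute a UMAP for plotting
  obj <- FindNeighbors(obj, dims = 1:n_pcs)
  obj <- FindClusters(obj, resolution = resolution)
  obj <- RunUMAP(obj, dims = 1:n_pcs)
  obj
}

# Example usage (the directory is a placeholder for your own 10x output):
# counts <- Read10X(data.dir = "path/to/filtered_feature_bc_matrix")
# obj <- run_basic_workflow(counts)
# DimPlot(obj, reduction = "umap")
```

In a real analysis you would also inspect quality-control metrics such as `percent.mt` and filter cells before normalizing; the sketch computes the metric but leaves the filtering thresholds to you.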
Identifying cell populations present across multiple datasets can be problematic under standard workflows. Seurat includes a set of methods to match, or align, shared cell populations across datasets. These methods first identify cross-dataset pairs of cells that are in a matched biological state (‘anchors’). The anchors can then be used both to correct for technical differences between datasets (i.e., batch effect correction) and to perform comparative scRNA-seq analysis across experimental conditions. A workflow employing the default Seurat integration approach can be found here. Note, however, that in our experience this default approach typically over-corrects; in other words, it removes too much biological variance. We find that the RunFastMNN function (from the SeuratWrappers package) typically gives more balanced results.
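As an illustration of the fastMNN route, the sketch below follows the usage pattern from the SeuratWrappers vignette. It assumes a Seurat v3/v4-style object whose metadata has a column identifying the dataset of origin; the helper name `integrate_with_fastmnn`, the column name `batch`, and the choice of 30 dimensions are assumptions for illustration. Newer Seurat releases may expose fastMNN through a different interface, so check the documentation for your installed version.

```r
library(Seurat)
library(SeuratWrappers)  # provides RunFastMNN; relies on the Bioconductor package batchelor

# Hypothetical helper (not part of Seurat): integrates datasets with fastMNN
# and clusters on the corrected embedding.
# `batch_col` is an assumed name for the metadata column labelling each dataset.
integrate_with_fastmnn <- function(obj, batch_col = "batch", n_dims = 30) {
  obj <- NormalizeData(obj)
  obj <- FindVariableFeatures(obj)

  # Split by dataset and run fastMNN; the corrected embedding is stored
  # as the "mnn" reduction on the returned object
  obj <- RunFastMNN(object.list = SplitObject(obj, split.by = batch_col))

  # Downstream steps use the corrected "mnn" reduction instead of PCA
  obj <- RunUMAP(obj, reduction = "mnn", dims = 1:n_dims)
  obj <- FindNeighbors(obj, reduction = "mnn", dims = 1:n_dims)
  obj <- FindClusters(obj)
  obj
}

# Example usage (assumes `merged` is a Seurat object with a "batch" column):
# merged <- integrate_with_fastmnn(merged, batch_col = "batch")
# DimPlot(merged, group.by = "batch")
```

Plotting the UMAP colored by dataset and by known marker genes is a simple way to judge whether the datasets are mixed without biologically distinct populations being merged.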
A few useful links to Seurat documentation:
- Essential Seurat command list
- A complete list of all Seurat functions. You may want to browse it to find your next favorite (plotting) function.
Scanpy (Single-Cell Analysis in Python) is a scalable toolkit for analyzing single-cell gene expression data. It has some advantages over Seurat, the most prominent being access to advanced machine learning algorithms from the Python ecosystem and its scalability (the ability to process very large datasets). We will not use Scanpy in this workshop.