This repository contains a worked example showing how to cluster and visualize a mass cytometry (CyTOF) data set, using FlowSOM for clustering and Rtsne for visualization.
FlowSOM is an R/Bioconductor package for clustering flow cytometry and mass cytometry (CyTOF) data (see paper and Bioconductor package). The clustering algorithm is based on self-organizing maps and hierarchical consensus meta-clustering.
We previously showed that FlowSOM performs very well for clustering high-dimensional CyTOF data, and in particular has extremely fast runtimes (see paper published in Cytometry Part A and code repository on GitHub).
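The two stages described above (a self-organizing map followed by consensus meta-clustering) can be sketched as follows. This is a minimal illustration on synthetic data, not the script shipped in this repository: the toy matrix, the column names `marker1` to `marker5`, the choice of `k = 2`, and the seed values are all assumptions made for the example. It assumes the `flowCore` and `FlowSOM` packages are installed from Bioconductor.

```r
library(flowCore)
library(FlowSOM)

# Toy data (assumption): 1000 "cells" x 5 "markers", drawn from two
# shifted Gaussians so that there are two obvious populations.
set.seed(1234)
toy <- rbind(
  matrix(rnorm(500 * 5, mean = 0), ncol = 5),
  matrix(rnorm(500 * 5, mean = 4), ncol = 5)
)
colnames(toy) <- paste0("marker", 1:5)

# FlowSOM takes a flowFrame (or FCS files) as input
ff <- flowCore::flowFrame(toy)

# Stage 1: train the self-organizing map on the chosen columns
fsom <- FlowSOM::ReadInput(ff, transform = FALSE, scale = FALSE)
fsom <- FlowSOM::BuildSOM(fsom, colsToUse = 1:5)

# SOM node assigned to each cell
node_per_cell <- fsom$map$mapping[, 1]

# Stage 2: consensus meta-clustering of the SOM node codes into k clusters
k <- 2
meta <- FlowSOM::metaClustering_consensus(fsom$map$codes, k = k, seed = 1234)

# Map the meta-cluster of each node back to the individual cells
labels <- meta[node_per_cell]
table(labels)
```

For real CyTOF data, the marker columns passed to `colsToUse` and any transformation applied beforehand depend on the data set and would need to be chosen accordingly.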
Rtsne is an R implementation of the popular t-SNE algorithm (see t-SNE algorithm page, Rtsne development page, and Rtsne package on CRAN).
The t-SNE algorithm projects high-dimensional data to 2 or 3 dimensions for visualization. In this respect it plays a similar role to principal component analysis (PCA). However, t-SNE is non-linear while PCA is linear, which often makes t-SNE better suited to biological data where populations are not separated along linear axes.
On a t-SNE plot of flow or mass cytometry data, points that lie near each other can be interpreted as belonging to the same or similar cell populations. However, the precise distances in the plot are not meaningful, so care should be taken not to over-interpret the plot. The algorithm also starts from a random initialization, so unless a random seed is set (as in this example), each run will look slightly different.
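The effect of the random start can be illustrated with a short sketch. This is an assumption-laden toy example, not the code from this repository: the data matrix, the helper name `run_tsne`, and the seed values are hypothetical, and `Rtsne` is called with its default single-threaded settings.

```r
library(Rtsne)

# Toy data (assumption): 200 "cells" x 10 "markers" of random noise
set.seed(1)
toy <- matrix(rnorm(200 * 10), nrow = 200, ncol = 10)

# Hypothetical helper: set the seed, run t-SNE, return the 2D coordinates
run_tsne <- function(seed) {
  set.seed(seed)
  Rtsne::Rtsne(toy, dims = 2, perplexity = 30, check_duplicates = FALSE)$Y
}

run1 <- run_tsne(123)
run2 <- run_tsne(123)  # same seed as run1
run3 <- run_tsne(456)  # different seed

# Compare the embeddings: same seed versus different seed
isTRUE(all.equal(run1, run2))
isTRUE(all.equal(run1, run3))
```

Setting the seed immediately before the call to `Rtsne` is what makes a plot reproducible; runs with different seeds are expected to give layouts that differ in detail even though the same groups of points tend to stay together.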
The repository contains the following files:
- R code script: FlowSOM_Rtsne_example.R
- data file: data/Samusik_01_notransform.fcs
- output plot file: plots/FlowSOM_Rtsne_plot.pdf
- output file of FlowSOM cluster labels: results/cluster_labels_FlowSOM.txt