Comprehensive single-cell clustering implementations in Python
Read data from .mat, .csv, and .txt files.
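As a rough illustration of reading the three file types, here is a minimal sketch using NumPy and SciPy. The function name `load_matrix` and the `.mat` variable name `data` are hypothetical and are not taken from this repository; the sketch also assumes purely numeric files without header rows.

```python
from pathlib import Path

import numpy as np
from scipy.io import loadmat


def load_matrix(path, mat_key="data"):
    """Load a 2-D numeric matrix from a .mat, .csv, or .txt file.

    `mat_key` is a hypothetical variable name stored inside the .mat file.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".mat":
        # loadmat returns a dict that maps variable names to arrays
        return np.asarray(loadmat(str(path))[mat_key], dtype=float)
    if suffix == ".csv":
        # assumes comma-separated numbers with no header row
        return np.loadtxt(path, delimiter=",", ndmin=2)
    if suffix == ".txt":
        # assumes whitespace-separated numbers with no header row
        return np.loadtxt(path, ndmin=2)
    raise ValueError(f"unsupported file type: {suffix}")
```

Files with gene or cell names in the first row or column would need a different reader, for example `pandas.read_csv`.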
Utility tools, including:
- Get plotting colors based on labels.
- Scatter plot with a label legend.
- Accuracy computation.
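The label-to-color and accuracy helpers could look roughly like the sketch below. The names `colors_for_labels` and `accuracy` are hypothetical, and the default palette assumes Matplotlib's `"C0"` to `"C9"` color-cycle names; none of this is taken from the repository's actual code.

```python
import numpy as np


def colors_for_labels(labels, palette=None):
    """Map each label to a color so that equal labels share one color."""
    if palette is None:
        # Matplotlib color-cycle names; the palette wraps after 10 classes
        palette = [f"C{i}" for i in range(10)]
    unique = sorted(set(labels))
    lookup = {lab: palette[i % len(palette)] for i, lab in enumerate(unique)}
    return [lookup[lab] for lab in labels]


def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))
```

The returned color list can be passed as the `c` argument of `matplotlib.pyplot.scatter`; drawing one `scatter` call per class with a `label` argument is one way to obtain a legend.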
Dimensionality reduction methods, including:
- t-SNE (t-distributed stochastic neighbor embedding)
- PCA (principal component analysis)
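Both reductions are available in scikit-learn. The sketch below assumes scikit-learn is the backend (the repository may wrap it differently) and uses a synthetic random matrix in place of a real expression matrix.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Synthetic stand-in for an expression matrix: 60 cells x 200 genes
X = rng.normal(size=(60, 200))

# Linear projection onto the two directions of largest variance
X_pca = PCA(n_components=2).fit_transform(X)

# Nonlinear embedding; perplexity must be smaller than the number of samples
X_tsne = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(X)
```

Perplexity loosely controls how many neighbors each point attends to, which is why small datasets need small values.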
Clustering methods, including:
- k-means
- k-NN (k-nearest neighbors)
- HCA (hierarchical clustering)
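A minimal sketch of the three methods, assuming scikit-learn and SciPy as backends and two synthetic, well-separated groups as data. Note that k-NN is a supervised classifier, so it needs labels; here it is fitted and then asked to predict its own training samples.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in: two well-separated groups of 20 points each
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

# k-means: unsupervised, the number of clusters is chosen up front
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# k-NN: supervised, fitted on labeled samples and used for self-prediction
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
knn_pred = knn.predict(X)

# Hierarchical clustering: build the Ward linkage tree, then cut it into 2 clusters
Z = linkage(X, method="ward")
hca_labels = fcluster(Z, t=2, criterion="maxclust")
```

The linkage matrix `Z` can also be passed to `scipy.cluster.hierarchy.dendrogram` to draw the tree.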
Support vector machine implementations, including:
- SVM model training and prediction
- Cross-validation
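Training, prediction, and cross-validation can be sketched with scikit-learn's `SVC` and `cross_val_score`. The linear kernel, `C=1.0`, the 5 folds, and the synthetic data are illustrative assumptions, not settings taken from this repository.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in: two well-separated classes of 25 samples each
X = np.vstack([rng.normal(0, 0.5, (25, 5)), rng.normal(4, 0.5, (25, 5))])
y = np.array([0] * 25 + [1] * 25)

clf = SVC(kernel="linear", C=1.0)

# 5-fold cross-validation: one accuracy score per held-out fold
scores = cross_val_score(clf, X, y, cv=5)

# Train on all samples, then predict
clf.fit(X, y)
pred = clf.predict(X)
```

The mean of `scores` gives a less optimistic estimate than accuracy measured on the training samples themselves.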
Evaluation metrics, including:
- Accuracy
- Adjusted Rand index (ARI)
- Normalized mutual information (NMI)
- Weighted F1 score
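All four metrics exist in `sklearn.metrics`. The sketch below assumes that module is used and evaluates a small made-up labeling in which one of six samples is wrong.

```python
from sklearn.metrics import (
    accuracy_score,
    adjusted_rand_score,
    f1_score,
    normalized_mutual_info_score,
)

# Toy labels for illustration only: the last sample is misassigned
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 1, 2, 1]

acc = accuracy_score(y_true, y_pred)              # fraction of exact matches
ari = adjusted_rand_score(y_true, y_pred)         # chance-corrected pair agreement
nmi = normalized_mutual_info_score(y_true, y_pred)
f1w = f1_score(y_true, y_pred, average="weighted")  # per-class F1 weighted by support
```

ARI and NMI do not depend on how cluster ids are numbered, so they can be applied to raw clustering output. Accuracy and F1 compare label values directly and only make sense once cluster ids have been matched to class labels.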
Example workflows.
Xin dataset (human pancreatic islet cells), 1600 samples: dimensionality reduction with t-SNE and visualization
perplexity = 50
perplexity = 5
This dataset file is too large to upload; please download it from:
Xin, Y. et al. RNA Sequencing of Single Human Islet Cells Reveals Type 2 Diabetes Genes. Cell Metab. 24, 608–615 (2016)
Yan dataset (human embryonic development), 90 samples: dimensionality reduction with t-SNE and visualization
perplexity = 40
perplexity = 5
Yan, L. et al. Single-cell RNA-Seq profiling of human preimplantation embryos and embryonic stem cells. Nat. Struct. Mol. Biol. 20, 1131–1139 (2013)
Human islet dataset / 60 samples / perplexity = 5
k-NN training and self-prediction:
Accuracy = 0.9667 (58/60). Both samples of the class 'delta' are misclassified.
Hierarchical clustering dendrogram:
Scatter plot of the hierarchical clustering result:
Human cancer cell dataset / 33 samples / perplexity = 5
Jiang, H., Sohn, L., Huang, H., & Chen, L. (2018). Single Cell Clustering Based on Cell-Pair Differentiability Correlation and Variance Analysis. (May).