stDyer is a spatial domain clustering method for spatially resolved transcriptomic data.
Install dependencies
# clone project
git clone https://github.com/ericcombiolab/stDyer.git
cd stDyer
# create conda environment
conda env create -f stdyer.yml
conda activate stdyer
There is a tutorial notebook, tutorial.ipynb, that demonstrates how to train the model on a single-slice dataset. For more advanced usage via the command line, please refer to the following sections:
Train the model with a chosen experiment configuration from configs/experiment/
python run.py experiment=example.yaml
The predicted spatial domain labels will be saved to AnnData (.h5ad) files in the logs/logger_logs folder. The raw predicted spatial domain labels are in adata.obs["pred_labels"]. The autoencoder-refined labels are in adata.obs["mlp_fit"].
The detected spatially variable genes will be saved in adata.uns["svg_dict"].
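A minimal sketch of inspecting the saved results with anndata is shown below; the exact output file name under logs/logger_logs depends on your run, so the path here is a placeholder:

import anndata as ad

# Placeholder path: use the actual .h5ad file produced by your run
adata = ad.read_h5ad("logs/logger_logs/your_run/adata.h5ad")

# Raw predicted spatial domain labels
print(adata.obs["pred_labels"].value_counts())

# Autoencoder-refined labels
print(adata.obs["mlp_fit"].value_counts())

# Detected spatially variable genes
svg_dict = adata.uns["svg_dict"]
print(list(svg_dict.keys()))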
You can override any parameter from the command line like this:
python run.py trainer.max_epochs=20
Train the model with a chosen experiment configuration from configs/experiment/ using multiple GPUs
CUDA_VISIBLE_DEVICES=0,1 python run.py experiment=example_ddp.yaml trainer.devices=2
To train the model with your own dataset, copy configs/experiment/example_ddp.yaml to configs/experiment/your_experiment.yaml and modify it to your needs. The required data format is h5ad, which can be created with AnnData. The "spatial" key in the obsm attribute of the AnnData object (adata.obsm["spatial"]) holds the spatial coordinates and is necessary for constructing the spatial adjacency graph. The full path to the h5ad file is data_dir/dataset_dir/data_file_name. You can also specify the required number of spatial domains with the parameter num_classes in your_experiment.yaml. The config file has rich comments explaining the parameters. A minimal sketch of preparing such an h5ad file is shown after the commands below.
cp configs/experiment/example_ddp.yaml configs/experiment/your_experiment.yaml
python run.py experiment=your_experiment.yaml
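If your data is not yet in h5ad format, a minimal sketch of building one with AnnData follows; the counts matrix, coordinates, and output path are placeholders, and the file should be written to the location given by data_dir/dataset_dir/data_file_name in your_experiment.yaml:

import anndata as ad
import numpy as np

# Placeholder inputs: a spots-by-genes counts matrix and per-spot x/y coordinates
counts = np.random.poisson(1.0, size=(100, 2000)).astype(np.float32)
coords = np.random.uniform(0, 1000, size=(100, 2))

adata = ad.AnnData(X=counts)
adata.obs_names = [f"spot_{i}" for i in range(adata.n_obs)]
adata.var_names = [f"gene_{i}" for i in range(adata.n_vars)]

# stDyer reads spatial coordinates from adata.obsm["spatial"]
adata.obsm["spatial"] = coords

# Write to the path configured as data_dir/dataset_dir/data_file_name
adata.write_h5ad("path/to/data_dir/dataset_dir/your_data.h5ad")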
To train with a dataset containing multiple slices, you first need to align the slices with paste2; refer to align_multiple_slices_with_paste2.ipynb for the preprocessing steps. You can then train with configs/experiment/example_multi_slices.yaml. For your own dataset, make sure the obs attribute of the AnnData object has a "batch" column (adata.obs["batch"]) indicating the slice index. Set z_scale to a meaningful value (refer to the config file for details), as adata.obs["batch"] * z_scale * min_two_units_xy_distance will be used as the third coordinate for constructing the spatial adjacency graph, in addition to the two coordinates in adata.obsm["spatial"]. A minimal sketch of preparing such a multi-slice h5ad file is shown after the command below.
python run.py experiment=example_multi_slices.yaml
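For your own multi-slice data, a minimal sketch of concatenating aligned slices and adding the required "batch" column follows; the slice file names and output path are placeholders, and alignment with paste2 is assumed to have been done already:

import anndata as ad

# Placeholder paths: slices already aligned with paste2
slice_files = ["slice_0_aligned.h5ad", "slice_1_aligned.h5ad"]
slices = [ad.read_h5ad(f) for f in slice_files]

# Record the slice index as an integer so it can be scaled by z_scale
for i, s in enumerate(slices):
    s.obs["batch"] = i

# Concatenate the slices; obsm["spatial"] entries present in all slices are kept
adata = ad.concat(slices, join="inner")

adata.write_h5ad("path/to/data_dir/dataset_dir/multi_slice.h5ad")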
The processed data and reproducible Jupyter notebooks can be downloaded from https://doi.org/10.5281/zenodo.11315101. Please read the README.md inside the zip file for details.