ericcombiolab/stDyer

stDyer

PyTorch Lightning Config: Hydra Template

Description

stDyer is a spatial domain clustering method for spatially resolved transcriptomic data.

How to run

Install dependencies

# clone project
git clone https://github.com/ericcombiolab/stDyer.git
cd stDyer

# create conda environment
conda env create -f stdyer.yml
conda activate stdyer

Tutorial

The notebook tutorial.ipynb demonstrates how to train the model on a single-slice dataset. For more advanced command-line usage, refer to the following sections.

For a dataset with a single slice

Train the model with a chosen experiment configuration from configs/experiment/:

python run.py experiment=example.yaml

The predicted spatial domain labels are saved to AnnData (.h5ad) files in the logs/logger_logs folder. The raw predicted spatial domain labels are in adata.obs["pred_labels"], and the autoencoder-refined labels are in adata.obs["mlp_fit"].

The detected spatially variable genes are saved in adata.uns["svg_dict"].
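To inspect these outputs after a run, a minimal sketch along the following lines may help. It assumes the output file can be read with anndata.read_h5ad. The helper names summarize_domains and load_and_summarize are hypothetical, and the file path is left to the caller because the exact file name under logs/logger_logs depends on your run. The internal layout of svg_dict is not described here, so the sketch assumes it is a dict and reports only its top-level keys.

```python
from collections import Counter


def summarize_domains(adata):
    """Count spots per domain for the raw and the refined labels.

    Only `adata.obs` and `adata.uns` are accessed, so any object that
    exposes the same keys works.
    """
    summary = {}
    for key in ("pred_labels", "mlp_fit"):
        # One label per spot: count how many spots fall into each domain.
        summary[key] = dict(Counter(adata.obs[key]))
    # Assumption: svg_dict is a dict; only its top-level keys are listed.
    summary["svg_keys"] = sorted(str(k) for k in adata.uns.get("svg_dict", {}))
    return summary


def load_and_summarize(h5ad_path):
    """Read an output file written by a run and summarize it."""
    import anndata  # imported lazily so summarize_domains has no hard dependency

    return summarize_domains(anndata.read_h5ad(h5ad_path))
```

Calling load_and_summarize with the path of a file from logs/logger_logs returns the per-domain spot counts for both label columns, which makes it easy to see how much the refinement step changed the assignment.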

You can override any parameter from the command line, for example:

python run.py trainer.max_epochs=20

For a large dataset (multiple GPUs)

Train the model with a chosen experiment configuration from configs/experiment/ on multiple GPUs:

CUDA_VISIBLE_DEVICES=0,1 python run.py experiment=example_ddp.yaml trainer.devices=2

To train the model on your own dataset, copy configs/experiment/example_ddp.yaml to configs/experiment/your_experiment.yaml and modify it to suit your needs. The required data format is h5ad, which can be created with AnnData. The "spatial" key in the obsm attribute of the AnnData object (adata.obsm["spatial"]) holds the spatial coordinates and is required for constructing the spatial adjacency graph. The full path to the h5ad file is data_dir/dataset_dir/data_file_name. You can also specify the required number of spatial domains with the num_classes parameter in your_experiment.yaml. The config file is extensively commented to explain the parameters.

cp configs/experiment/example_ddp.yaml configs/experiment/your_experiment.yaml
python run.py experiment=your_experiment.yaml
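As a rough illustration of the input requirements, the sketch below composes the file path from its three parts, checks that spatial coordinates are present, and writes a tiny random dataset. All function names are hypothetical, the random counts are placeholders with no biological meaning, and any preprocessing stDyer expects beyond adata.obsm["spatial"] is not covered; consult the config comments for that.

```python
import os


def h5ad_path(data_dir, dataset_dir, data_file_name):
    """Compose the input path as data_dir/dataset_dir/data_file_name."""
    return os.path.join(data_dir, dataset_dir, data_file_name)


def check_spatial(obsm, n_obs):
    """Return a list of problems with the "spatial" entry (empty if none)."""
    if "spatial" not in obsm:
        return ['missing obsm["spatial"]']
    problems = []
    coords = obsm["spatial"]
    if len(coords) != n_obs:
        problems.append("number of coordinate rows differs from number of spots")
    if any(len(row) < 2 for row in coords):
        problems.append("each spot needs at least x and y coordinates")
    return problems


def write_example(path):
    """Write a small random dataset with spatial coordinates (illustration only)."""
    import anndata  # lazy imports: the helpers above need only the standard library
    import numpy as np

    rng = np.random.default_rng(0)
    # 100 spots x 50 genes of made-up counts.
    adata = anndata.AnnData(X=rng.poisson(1.0, size=(100, 50)).astype(np.float32))
    # One (x, y) pair per spot.
    adata.obsm["spatial"] = rng.uniform(0, 10, size=(100, 2))
    assert not check_spatial(adata.obsm, adata.n_obs)
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    adata.write_h5ad(path)
```

For example, write_example(h5ad_path("data", "my_dataset", "my_slice.h5ad")) would create a file at a location matching hypothetical settings data_dir=data, dataset_dir=my_dataset and data_file_name=my_slice.h5ad.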

For a dataset with multiple slices

To train on a dataset with multiple slices, first align the slices with paste2; refer to align_multiple_slices_with_paste2.ipynb for the preprocessing steps. You can then train with configs/experiment/example_multi_slices.yaml. For your own dataset, make sure the obs attribute of the AnnData object has a "batch" column (adata.obs["batch"]) that indicates the slice index. Set z_scale to a meaningful value (refer to the config file for details): adata.obs["batch"] * z_scale * min_two_units_xy_distance is used as the third coordinate, alongside the two coordinates in adata.obsm["spatial"], when constructing the spatial adjacency graph.

python run.py experiment=example_multi_slices.yaml
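The third-coordinate formula can be illustrated with a small worked example. The sketch below is not stDyer's implementation. It assumes that min_two_units_xy_distance means the smallest distance between two spots within one slice, and that adata.obs["batch"] holds integer slice indices; both are assumptions, as are the function names and the made-up grid coordinates.

```python
import math
from itertools import combinations


def min_xy_distance(xy):
    """Smallest distance between any two spots (brute force, O(n^2))."""
    return min(math.dist(a, b) for a, b in combinations(xy, 2))


def add_z(xy, batch, z_scale, unit):
    """Append z = batch * z_scale * unit to every (x, y) pair."""
    return [(x, y, b * z_scale * unit) for (x, y), b in zip(xy, batch)]


# Two aligned slices sharing the same 2 x 2 grid with spacing 1.0.
slice_xy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
xy = slice_xy + slice_xy
batch = [0] * 4 + [1] * 4

# Measure the unit within a single slice: after alignment, spots from
# different slices can share the same (x, y) position.
unit = min_xy_distance(slice_xy)
coords = add_z(xy, batch, z_scale=2.0, unit=unit)
print(coords[0], coords[4])
```

With z_scale=2.0 the second slice sits two grid units above the first, so spots in the other slice are farther away than in-plane neighbours; smaller values of z_scale bring the slices closer together in the resulting graph.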

For reproducing the results in the paper

The processed data and reproducible Jupyter notebooks can be downloaded from https://doi.org/10.5281/zenodo.11315101. Please read the README.md inside the zip file for details.
