- InfoGenomeR is the Integrative Framework for Genome Reconstruction that uses a breakpoint graph to model the connectivity among genomic segments at the genome-wide scale. InfoGenomeR integrates cancer purity and ploidy, total CNAs, allele-specific CNAs, and haplotype information to identify the optimal breakpoint graph representing cancer genomes.
- The InfoGenomeR workflow is run with Snakemake.
- Environments are described in workflow/envs.
- Rules are described in workflow/Snakefile.
wget https://github.com/conda-forge/miniforge/releases/download/24.1.2-0/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh
mamba create -c conda-forge -c bioconda -n snakemake snakemake
conda config --set channel_priority strict
conda activate snakemake
git clone https://github.com/dmcblab/InfoGenomeR.git
InfoGenomeR_repo=${PWD}/InfoGenomeR
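Before continuing, it can help to confirm that the tools used above are on the `PATH`. The following is a minimal sketch; `require_cmd` is a hypothetical helper written for illustration and is not part of InfoGenomeR.

```shell
# Hypothetical helper (not part of InfoGenomeR): report whether a command is on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Check the tools used in the steps above; missing ones are reported on stderr.
for tool in git conda mamba snakemake; do
    require_cmd "$tool" || true
done
```

Run it after `conda activate snakemake`, so that the `snakemake` executable from the new environment is visible.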
After git clone, follow the steps below.
snakemake --cores all --use-conda InfoGenomeR_env
snakemake --cores all --use-conda InfoGenomeR_download
Download the low-coverage example (~50 GB). Check that the example runs, then replace the example files with your own.
snakemake --cores all --use-conda InfoGenomeR_example_download
# go to the InfoGenomeR repository.
cd ${InfoGenomeR_repo}
# make a workspace directory
workspace_dir=InfoGenomeR_workspace1
mkdir -p ${workspace_dir}
# link the reference in the workspace directory
ln -s ${PWD}/humandb/ref ${workspace_dir}/ref
Use the low-coverage example in examples/fastq:
- fastq/normal1.fq.gz (optional; used in somatic mode)
- fastq/normal2.fq.gz (optional; used in somatic mode)
- fastq/tumor1.fq.gz
- fastq/tumor2.fq.gz
ln -s ${PWD}/examples/fastq ${workspace_dir}/fastq
Then, go to InfoGenomeR run
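The file names listed above can be checked before starting a run. This is a minimal sketch; `check_fastq_inputs` is a hypothetical helper, and it assumes that somatic mode needs both normal FASTQ files while total mode needs only the tumor pair.

```shell
# Hypothetical helper (not part of InfoGenomeR): verify that the FASTQ names
# listed above exist under <workspace>/fastq.
# $1 = workspace directory, $2 = mode (somatic or total)
check_fastq_inputs() {
    ws=$1
    mode=$2
    required="tumor1.fq.gz tumor2.fq.gz"
    # Assumption: somatic mode needs the matched normal pair as well.
    if [ "$mode" = "somatic" ]; then
        required="$required normal1.fq.gz normal2.fq.gz"
    fi
    status=0
    for f in $required; do
        # -e follows symlinks, so a dangling link also counts as missing.
        if [ ! -e "$ws/fastq/$f" ]; then
            echo "missing: $ws/fastq/$f" >&2
            status=1
        fi
    done
    return $status
}

# Usage: check_fastq_inputs "${workspace_dir}" somatic
```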
If you start from BAM files, provide your own bam folder, with the files named as follows.
- bam/normal_sorted.bam (optional; used in somatic mode)
- bam/tumor_sorted.bam
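# use an absolute path; a relative link target is resolved from inside the workspace directory
ln -s ${PWD}/bam ${workspace_dir}/bam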
Then, go to InfoGenomeR run
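When linking your own BAM folder, a relative symlink target is interpreted relative to the directory that holds the link, so it is safer to resolve the folder to an absolute path first. The sketch below does that and then checks the expected file names; `link_bam_dir` is a hypothetical helper, not part of InfoGenomeR.

```shell
# Hypothetical helper (not part of InfoGenomeR): link a BAM folder into the
# workspace and verify the expected file names.
# $1 = folder holding your BAM files, $2 = workspace directory
link_bam_dir() {
    src=$1
    ws=$2
    # Refuse to overwrite an existing bam entry in the workspace.
    if [ -e "$ws/bam" ] || [ -L "$ws/bam" ]; then
        echo "already exists: $ws/bam" >&2
        return 1
    fi
    # Resolve to an absolute path so the link does not depend on the
    # directory it is created from.
    abs_src=$(cd -- "$src" >/dev/null && pwd) || return 1
    ln -s "$abs_src" "$ws/bam" || return 1
    # tumor_sorted.bam is always required; normal_sorted.bam is only reported.
    if [ ! -e "$ws/bam/tumor_sorted.bam" ]; then
        echo "missing: $ws/bam/tumor_sorted.bam" >&2
        return 1
    fi
    if [ -e "$ws/bam/normal_sorted.bam" ]; then
        echo "matched normal found"
    else
        echo "no matched normal"
    fi
}

# Usage: link_bam_dir bam "${workspace_dir}"
```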
Select either somatic mode (if a matched normal exists) or total mode (all variants in the tumor).
# Run the InfoGenomeR workflow in one of the two modes. The example is triploid.
# somatic mode
snakemake --cores all --use-conda ${workspace_dir}/InfoGenomeR_output --config mode=somatic min_ploidy=2.5 max_ploidy=3.5
# total mode
snakemake --cores all --use-conda ${workspace_dir}/InfoGenomeR_output --config mode=total min_ploidy=2.5 max_ploidy=3.5
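The choice between the two modes can be derived from the inputs present in the workspace. The sketch below only prints the command and does not run it; `infogenomer_cmd` is a hypothetical helper, and the rule "a normal BAM or normal FASTQ present means somatic" is an assumption based on the file lists above.

```shell
# Hypothetical helper (not part of InfoGenomeR): pick the mode from the inputs
# in the workspace and print the snakemake command without running it.
# $1 = workspace directory, $2 = min_ploidy, $3 = max_ploidy
infogenomer_cmd() {
    ws=$1
    # Assumption: a matched normal (BAM or FASTQ) implies somatic mode.
    if [ -e "$ws/bam/normal_sorted.bam" ] || [ -e "$ws/fastq/normal1.fq.gz" ]; then
        mode=somatic
    else
        mode=total
    fi
    echo "snakemake --cores all --use-conda $ws/InfoGenomeR_output --config mode=$mode min_ploidy=$2 max_ploidy=$3"
}

# Usage for the triploid example: infogenomer_cmd "${workspace_dir}" 2.5 3.5
```

Appending Snakemake's `-n` (dry-run) flag to the printed command previews the planned jobs before committing to a full run.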