MAGscreen - discovering new microbial species

Snakemake workflow to identify novel microbial species from a set of genomes.

Genomes are first quality-filtered based on their CheckM statistics, then compared against a genome database using Mash and MUMmer. Genomes with no known match are extracted, clustered at the species level using dRep, and further quality-controlled with GUNC.

Installation

Install conda and snakemake.
Clone the repository:

git clone https://github.com/alexmsalmeida/magscreen.git

How to run

Edit the config.yml file to point to the input, output and databases directories. The input directory should contain the .fa assemblies to analyse and a .csv file with their CheckM completeness and contamination scores. The databases directory should contain the GUNC diamond database and a custom Mash database (.msh) built from the genomes you want to screen against.

(option 1) Run the pipeline locally (adjust -j based on the number of available cores):

snakemake --use-conda -k -j 4

(option 2) Run the pipeline on a cluster (e.g., LSF):

snakemake --use-conda -k -j 100 --cluster-config cluster.yml --cluster 'bsub -n {cluster.nCPU} -M {cluster.mem} -o {cluster.output}'
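Before launching either command, it can help to confirm that the directories named in config.yml have the layout described above. The following is a minimal sketch, not part of the workflow: the function name check_layout and its arguments are hypothetical, and it only checks the file extensions mentioned in this README (.fa, .csv, .msh). The GUNC diamond database is not checked because its file name is not specified here.

```python
from pathlib import Path


def check_layout(input_dir, db_dir):
    """Return a list of problems found in the input and databases directories.

    Hypothetical helper: it only looks for the file extensions named in
    the README and does not validate file contents.
    """
    problems = []
    inp, db = Path(input_dir), Path(db_dir)

    # Assemblies to analyse are expected as .fa files.
    if not any(inp.glob("*.fa")):
        problems.append(f"no .fa assemblies found in {inp}")

    # CheckM completeness and contamination scores are expected as a .csv file.
    if not any(inp.glob("*.csv")):
        problems.append(f"no .csv file with CheckM scores found in {inp}")

    # The custom Mash database is expected as a .msh file.
    if not any(db.glob("*.msh")):
        problems.append(f"no Mash database (.msh) found in {db}")

    return problems


if __name__ == "__main__":
    # Replace these placeholder paths with the ones set in config.yml.
    for problem in check_layout("input", "databases"):
        print("WARNING:", problem)
```

An empty list means the three file types were found; it does not guarantee that the files themselves are valid.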

Output

The main output is located in the new_species/ directory, which contains the best-quality representative genome (.fa file) of each new species. New species matching all of the following criteria are filtered out:

Flagged by GUNC: clade_separation_score >0.45; contamination_portion >0.05; reference_representation_score >0.5
Are singletons (dRep clusters with only one member)
Are <90% complete based on CheckM
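The filtering rule above can be sketched as a small predicate. This is an illustration of the stated criteria, not the code used in the Snakefile: the Candidate class, its field names and the function names are hypothetical, and the sketch assumes the three GUNC thresholds must all be exceeded for a genome to count as flagged.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one species representative."""
    name: str
    clade_separation_score: float
    contamination_portion: float
    reference_representation_score: float
    cluster_size: int      # number of genomes in the dRep cluster
    completeness: float    # CheckM completeness, in percent


def gunc_flagged(c):
    # Assumption: all three GUNC thresholds must be exceeded together.
    return (
        c.clade_separation_score > 0.45
        and c.contamination_portion > 0.05
        and c.reference_representation_score > 0.5
    )


def is_filtered_out(c):
    # A candidate is dropped only when it matches ALL listed criteria:
    # flagged by GUNC, a singleton cluster, and <90% complete.
    return gunc_flagged(c) and c.cluster_size == 1 and c.completeness < 90.0


def kept_names(candidates):
    """Names of the candidates that survive the filter."""
    return [c.name for c in candidates if not is_filtered_out(c)]
```

Under this reading, a GUNC-flagged genome is still kept if its cluster has more than one member or if it is at least 90% complete.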
