This is a bioinformatics workflow built with Snakemake to automate the downstream processing of Next Generation Sequencing (NGS) reads. It currently supports only paired-end Illumina sequence data. The project bundles a number of Snakefiles for de novo assembly, pan-genome construction, and detection of antimicrobial resistance and virulence genes. A brief description of the Snakefiles follows.
denovoassembly.Snakefile
- Quality assessment of raw reads using FastQC
- Quality trimming using Sickle
- De novo assembly using SPAdes
- Contig filtering by length and coverage
- Taxonomic classification of fastq reads using minikraken
- Taxonomic classification of assembled contigs using kraken
- Calculate read coverage stats for mapped reads
- Quality assessment of assemblies using quast
- A final informative .html report using MultiQC
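The contig filtering step listed above can be illustrated with a minimal sketch. It is not the workflow's own script: it relies on the contig naming that SPAdes produces (`NODE_<n>_length_<len>_cov_<cov>`), and the function names and the thresholds `min_length` and `min_coverage` are hypothetical values chosen for illustration.

```python
import re

# SPAdes names contigs like: NODE_1_length_245873_cov_35.1772
HEADER_RE = re.compile(r"^NODE_\d+_length_(\d+)_cov_([\d.]+)")


def parse_spades_header(header):
    """Return (length, coverage) parsed from a SPAdes contig header, or None."""
    match = HEADER_RE.match(header.lstrip(">"))
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def filter_contigs(fasta_lines, min_length=500, min_coverage=2.0):
    """Yield FASTA lines of contigs that pass both thresholds.

    min_length and min_coverage are illustrative defaults, not the
    values used by the workflow.
    """
    keep = False
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            stats = parse_spades_header(line)
            keep = (
                stats is not None
                and stats[0] >= min_length
                and stats[1] >= min_coverage
            )
        if keep and line:
            yield line


# Small in-memory example: only the first contig passes both thresholds
example = [
    ">NODE_1_length_1200_cov_35.5",
    "ACGTACGT",
    ">NODE_2_length_300_cov_40.0",
    "ACGT",
    ">NODE_3_length_900_cov_1.2",
    "GGCC",
]
kept = list(filter_contigs(example))
print(kept)
```

Reading length and coverage from the header avoids re-mapping reads just to filter contigs, at the cost of trusting the k-mer coverage reported by the assembler.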
pangenome.Snakefile
- Prokka for rapid contig annotation
- Roary to construct a pan-genome
- FastTree to create an ML phylogeny
- Replace run directives in rules with shell directives, so that conda packages are downloaded and used
- Allow each of the Snakefiles to run independently if needed
- Visualising the phylogenetic tree and plotting metadata is done outside Snakemake
- The option --use-conda within Snakemake is not currently feasible
- The config file MUST include both fastq and fasta files
The workflow was used with the following versions of software
- snakemake v5.3.0 https://snakemake.readthedocs.io/en/stable/
- Fastqc v0.11.5 https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
- sickle version 1.33 https://github.com/najoshi/sickle
- SPAdes v3.12.0 http://cab.spbu.ru/software/spades/
- Kraken v0.10.6 https://ccb.jhu.edu/software/kraken/
- KronaTools 2.7 https://github.com/marbl/Krona
- QualiMap v.2.2.1 http://qualimap.bioinfo.cipf.es/
- QUAST v4.3 http://bioinf.spbau.ru/quast
- multiqc v1.5 https://multiqc.info/
- Prokka v1.13.3 https://github.com/tseemann/prokka
- Roary v3.6.1 https://sanger-pathogens.github.io/Roary/
- Bowtie2 v2.3.0 http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
- BWA v0.7.12-r1039 http://bio-bwa.sourceforge.net/
- SAMTools v1.4-14-g90b995f http://samtools.sourceforge.net/
- FastTree v2.1.9 http://www.microbesonline.org/fasttree/
- Set up a project folder for the run
mkdir NGS-Project
cd NGS-Project
- Download the latest version from GitLab
git clone https://gitlab.com/Mostafa.Abdel-Glil/snakepipelines_bacterialgenomes.git
A bash script, generateConfigSnakemake.sh, automatically generates a config file in YAML format from a folder that holds the raw data as well as the already assembled genomes. The generated config file lists all raw data and assembled genomes and contains the paths of databases and scripts. The config file must then be edited to set the correct database paths.
USAGE:
bash ./Scripts/generateConfigSnakemake.sh -d DIR/ -o File -g STRING
DESCRIPTION:
Generate Config file for Snakemake workflow in yaml format.
REQUIRED ARGUMENTS:
-d, --directory DIR
directory path where fastq/fasta files are stored.
-o, --output STRING
The output config file for Snakemake workflow in yaml format.
-g, --genus STRING
The name of genus for the fastq/fasta files
OPTIONAL ARGUMENTS:
-h, --help
Show this message.
EXAMPLE:
bash ./Scripts/generateConfigSnakemake.sh -d ./fastqReads/ -o config.yaml -g Campylobacter
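To make the idea behind the config generator concrete, here is a rough Python sketch of how paired-end reads and assembled genomes in one folder might be collected into a YAML-style listing. It does not reproduce generateConfigSnakemake.sh: the `_R1`/`_R2` file naming, the function names, and the `genus`/`samples`/`genomes` layout are all assumptions made for this example.

```python
import re
from pathlib import Path

# Assumed naming scheme: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz
FASTQ_RE = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])\.fastq(\.gz)?$")


def collect_samples(file_names):
    """Group paired-end FASTQ files by sample and list FASTA assemblies."""
    reads, genomes = {}, {}
    for name in sorted(file_names):
        match = FASTQ_RE.match(name)
        if match:
            reads.setdefault(match["sample"], {})["R" + match["mate"]] = name
        elif name.endswith((".fasta", ".fa", ".fna")):
            genomes[Path(name).stem] = name
    # Keep only samples for which both mates were found
    paired = {s: m for s, m in reads.items() if set(m) == {"R1", "R2"}}
    return paired, genomes


def render_config(paired, genomes, genus):
    """Render a hypothetical YAML layout as plain text (no YAML library needed)."""
    lines = [f"genus: {genus}", "samples:"]
    for sample, mates in paired.items():
        lines.append(f"  {sample}:")
        lines.append(f"    R1: {mates['R1']}")
        lines.append(f"    R2: {mates['R2']}")
    lines.append("genomes:")
    for name, path in genomes.items():
        lines.append(f"  {name}: {path}")
    return "\n".join(lines)


# Example file listing; s2 lacks its R2 mate and is therefore skipped
files = ["s1_R1.fastq.gz", "s1_R2.fastq.gz", "s2_R1.fastq.gz", "ref.fasta"]
paired, genomes = collect_samples(files)
text = render_config(paired, genomes, "Campylobacter")
print(text)
```

On a real folder, the listing could come from something like `[p.name for p in Path("fastqReads").iterdir()]`. Dropping samples with a missing mate reflects the workflow's restriction to paired-end data.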
The following paths for databases should be adjusted in the config file.
DB:
  minikraken: /home/mostafa.abdel/dbs/miniKraken/minikraken_20171019_8GB
  kraken: /home/DB_RAM/KrakenDB
  krona: /home/DB/Krona_Taxonomy
  AMR_db: /data/AGr110/mostafa/ariba_fmt_dbs/ariba_Card
  VF_db: /data/AGr110/mostafa/ariba_fmt_dbs/ariba_vfdb_full
  micDATA: /home/mostafa.abdel/aProjects/Campylobacter/snakemakeProject/Final-Snake-Project/data/micData.txt
  AMR_db_abricate: card
  VF_db_abricate: vfdb
tools:
  scripts_dir: /home/mostafa.abdel/aProjects/Campylobacter/snakemakeProject/Final-Snake-Project/Scripts
  multiqc_bin: /home/mostafa.abdel/.local/bin
directories:
  snakemake_folder: /home/mostafa.abdel/aProjects/Campylobacter/snakemakeProject/Final-Snake-Project
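Because these paths are specific to the original author's machine, it can help to check them before starting a run. The sketch below is a hypothetical helper, not part of the repository. It assumes the config has already been loaded into a nested dictionary (for example with a YAML parser) with the DB, tools, and directories sections shown above, and it treats only values starting with `/` as paths, so that database names such as card and vfdb are ignored.

```python
import os


def missing_paths(config, sections=("DB", "tools", "directories")):
    """Return 'section.key' names whose absolute paths do not exist."""
    missing = []
    for section in sections:
        for key, value in config.get(section, {}).items():
            # Values such as 'card' or 'vfdb' are database names, not paths
            if isinstance(value, str) and value.startswith("/"):
                if not os.path.exists(value):
                    missing.append(f"{section}.{key}")
    return missing


# Example with one path that is expected to be absent on any machine
config = {
    "DB": {
        "kraken": "/nonexistent/KrakenDB",
        "AMR_db_abricate": "card",
    },
    "tools": {"scripts_dir": os.getcwd()},
}
problems = missing_paths(config)
print(problems)
```

An empty list means every absolute path in the checked sections exists. It says nothing about whether the databases themselves are complete or correctly formatted.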
It is always a good idea to display what the workflow will do before executing it. To do so, use the following command.
snakemake -np --quiet --snakefile master.Snakefile
Execute the commands in the pipeline by removing the -np option
snakemake --snakefile master.Snakefile
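The dry run and the real run can also be chained, so that execution starts only if the preview succeeds. The following is a small sketch of that idea using the two commands above. The function names and the injectable `runner` parameter are assumptions added for illustration and easy testing, and the sketch does not call Snakemake unless `run_with_preview` is invoked with the default runner.

```python
import shlex
import subprocess


def snakemake_command(snakefile, dry_run=True):
    """Build the snakemake command line used in this README."""
    cmd = ["snakemake"]
    if dry_run:
        # -n: dry run, -p: print shell commands
        cmd += ["-np", "--quiet"]
    cmd += ["--snakefile", snakefile]
    return cmd


def run_with_preview(snakefile, runner=subprocess.run):
    """Run a dry run first and execute the workflow only if it succeeds."""
    preview = runner(snakemake_command(snakefile, dry_run=True))
    if preview.returncode != 0:
        return preview.returncode
    return runner(snakemake_command(snakefile, dry_run=False)).returncode


# Show the two command lines without running anything
preview_line = shlex.join(snakemake_command("master.Snakefile"))
execute_line = shlex.join(snakemake_command("master.Snakefile", dry_run=False))
print(preview_line)
print(execute_line)
```

Calling `run_with_preview("master.Snakefile")` from the project folder would then perform both steps in sequence.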
Comments should be addressed to [email protected]