SALADS (SALmonella entericA preDiction and Subtyping)

http://groupa.bioappgenome22.biosci.gatech.edu/

Team Members: Amartya Mandal, Mannan Bhola, Erin Connolly, Varsha Bhat, Zack Mudge, Shreyash Gupta, Palak Aggarwal

We present SALADS (SALmonella entericA** preDiction and Subtyping), a browser-based predictive webserver to identify and analyze Salmonella strain diversity for users with limited coding experience. SALADS aims to support bioinformaticians with a functional toolbox, allowing seamless integration of de novo assembly with SNP-based and phylogenomic analytic pipelines for unprecedented precision and detection of pathogenic Salmonella strains. SALADS can perform the following analyses given an input of raw sequence reads:

De novo assembly
Strain identification
Average Nucleotide Identity
Computational phenotyping
Visualization of results

Background & Motivation

Infections caused by pathogens commonly transmitted by food are common, potentially all preventable and therefore of major public health importance. The global burden of foodborne illnesses estimates that each year as many as 600 million, or almost 1 in 10 people in the world, fall ill after consuming contaminated food. Of these, 420,000 people die, including 125,000 children under the age of 5 years [1]. Therefore, our ability to rapidly detect, investigate, and stop foodborne outbreaks is vital to public health.

Webserver Walkthrough

File Upload, Enter Email, Run Pipeline!

Click on “Choose File" to select the first of the paired reads and then select "Choose File" below to complete the pair.
Enter your email
Select ‘Run Pipeline’ You will be notified by email when the analysis is complete.

Congratulations! You have just completed your analysis.

Results

About the Webserver

Technical Stack:

The technical stack implemented comprises:

Django: Python web framework
Gunicorn: Python web server gateway interface (WSGI) HTTP server
NGINX: Reverse proxy
Postgres: Database

System architecture:

About the Pipeline

General workflow:

Pipeline workflow:

Step 1: Genome Assembly

Main Aim: To stitch together the original bacterial genome from the fragmented reads of DNA

How do we assemble the genome?

The assembly pipeline performs read trimming and de novo assembly given paired-end read files in FASTQ format. Initial read quality assessment and trimming are performed with FastQC and BBDuk, respectively. MultiQC is used to compile read quality metrics for all isolates into one quality assessment output. Assemblies are produced using SPAdes, and assembly quality reports are produced using QUAST.

Step 2: Gene Prediction

Main Aim: To computationally infer the portions of DNA sequence that are biologically significant

How do we predict genes?

The prediction pipeline predicts coding and noncoding (RNA) genes given the output from genome assembly. Glimmer, Genemark-S2 and Prodigal predict the coding regions, and Infernal is used to predict the noncoding genes. Bedtools is used to merge the results from the aforementioned tools. The output comprises the GFF, FNA and FAA files with the predicted coding and noncoding genes

Step 3: Functional Annotation

Main Aim: To attach biological information about what cellular processes a gene carries out.

How do we predict genes?

The functional annotation pipeline uses both homology and ab-initio tools. For homology based prediction, Eggnog mapper, CARD-RGI(antibiotic resistance) and VFDB(virulence factors) are used. The ab-initio tools used for signal & membrane proteins annotation are SignalP and TMHMM. For non-coding genes, the assembled contigs are used as input for Piler-CR. The final output is a merged GFF file.

Step 4: Comparative Genomics

Main Aim: To identify outbreak strains through the genetic relatedness between isolates.

How do we compare isolates?

The comparative genomics pipeline takes fasta files from the genome assembly output and identifies clusters and similarities using fastANI. fastANI provides a phylogenetic tree and heatmap as output.

References

“Who's First Ever Global Estimates of Foodborne Diseases Find Children under 5 Account for Almost One Third of Deaths.” World Health Organization, World Health Organization, 3 Dec. 2015, www.who.int/news/item/03-12-2015-who-s-first-ever-global-estimates-of-foodborne-diseases-find-children-under-5-account-for-almost-one-third-of-deaths.
Andrews, S. (2010). FastQC: a quality control tool for high throughput sequence data. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc
Chen, L., Yang, J., Yu, J., Yao, Z., Sun, L., Shen, Y., Jin, Q. (2005). VFDB: a reference database for bacterial virulence factors .Nucleic Acids Res. 33:D325-8.
Goris J, Konstantinidis KT, Klappenbach JA, Coenye T, Vandamme P, et al. (2007). DNA-DNA hybridization values and their relationship to whole-genome sequence similarities. Int J Syst Evol Micr 57: 81-91. doi:10.1099/ijs.0.64483-0.
Lomsadze A, Gemayel K, Tang S, Borodovsky M. Modeling leaderless transcription and atypical genes results in more accurate gene prediction in prokaryotes.
Arthur L. Delcher, Kirsten A. Bratke, Edwin C. Powers, Steven L. Salzberg, Identifying bacterial genes and endosymbiont DNA with Glimmer, Bioinformatics
Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010 Mar 8;11:119. doi: 10.1186/1471-2105-11-119.
Pertea G and Pertea M. GFF Utilities: GffRead and GffCompare [version 1; peer review: 3 approved]. F1000Research 2020, 9:304

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
app		app
backend		backend
imgs		imgs
LICENSE		LICENSE
README.md		README.md
load_envs.sh		load_envs.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SALADS (SALmonella entericA preDiction and Subtyping)

Background & Motivation

Webserver Walkthrough

Results

About the Webserver

Technical Stack:

System architecture:

About the Pipeline

Step 1: Genome Assembly

Step 2: Gene Prediction

Step 3: Functional Annotation

Step 4: Comparative Genomics

References

About

Releases

Packages

Languages

License

Hanuphant/Webserver_Computational_Genomics

Folders and files

Latest commit

History

Repository files navigation

SALADS (SALmonella entericA preDiction and Subtyping)

Background & Motivation

Webserver Walkthrough

Results

About the Webserver

Technical Stack:

System architecture:

About the Pipeline

Step 1: Genome Assembly

Step 2: Gene Prediction

Step 3: Functional Annotation

Step 4: Comparative Genomics

References

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages