snpeff_annotation-nf

Nextflow DSL2 pipeline to annotate VCF files with SnpEff and dbSNP

This repository contains a Nextflow DSL2 pipeline for annotating genetic variants in VCF files using SnpEff, the dbSNP database, and the dbNSFP database. The pipeline filters the input VCF files, annotates them in several steps, and writes a single tab-separated annotation file.

Prerequisites

Make sure you have the following dependencies installed before running the pipeline:

Nextflow
conda
dbSNP database
dbNSFP database

Pipeline Overview

FilterInputFiles: Filters input VCF files using PLINK 2 to retain PASS variants with a maximum of 2 alleles.
AnnotateWithRSID: Annotates variants with RSID using SnpSift and the dbSNP database.
AnnotateWithImpact: Annotates variants with functional impact using snpEff and a specified reference genome.
FullyAnnotateWithDbSNP: Performs comprehensive annotation using SnpSift and the dbNSFP database, including gene impact, gnomAD data, REVEL scores, ClinVar information, and more.
ExtractFields: Extracts relevant fields from the annotated VCF files and creates a tab-separated text file with a header for downstream analysis.
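
To make the FilterInputFiles criterion concrete, here is a minimal Python sketch of the rule "PASS variants with a maximum of 2 alleles". It is only an illustration of the criterion as described above, not the pipeline's implementation (the pipeline uses PLINK 2 for this step); the function name `keep_variant` and the example records are made up for this sketch.

```python
def keep_variant(vcf_line: str) -> bool:
    """Return True for a PASS record with at most two alleles (REF plus one ALT)."""
    fields = vcf_line.rstrip("\n").split("\t")
    # Standard VCF column order: CHROM POS ID REF ALT QUAL FILTER INFO ...
    alt, filt = fields[4], fields[6]
    # Multiple ALT alleles are comma-separated; REF counts as one allele.
    n_alleles = 1 + len(alt.split(","))
    return filt == "PASS" and n_alleles <= 2


# Hypothetical records, for illustration only.
records = [
    "1\t100\t.\tA\tG\t50\tPASS\t.",       # biallelic, PASS: kept
    "1\t200\t.\tC\tT,G\t50\tPASS\t.",     # three alleles: dropped
    "1\t300\t.\tG\tA\t10\tLowQual\t.",    # not PASS: dropped
]
kept = [r for r in records if keep_variant(r)]
print(len(kept))
```

PLINK 2 may treat edge cases (for example a missing FILTER value) differently from this sketch.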

Usage

Clone the repository:

git clone https://github.com/IARCbioinfo/snpeff_annotation-nf
cd snpeff_annotation-nf

Adjust the nextflow.config file if necessary. Package versions are specified in the environment.yml file.
Run the pipeline with:
```
nextflow run main.nf -profile conda
```

Input

| Name | Default value | Description |
|------|---------------|-------------|
| `--input_folder_with_VCF_files` | `${baseDir}/VCFs/` | Folder containing `*vcf.gz` files |
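
Before launching a run, it can help to check which files in the input folder match the `*vcf.gz` pattern. The following is a small sketch of such a check; the helper name `list_input_vcfs` and the temporary demo files are assumptions made for this example and are not part of the pipeline.

```python
import tempfile
from pathlib import Path


def list_input_vcfs(folder):
    """Return the files in `folder` matching the `*vcf.gz` pattern, sorted by name."""
    return sorted(Path(folder).glob("*vcf.gz"))


# Demo on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("sample1.vcf.gz", "sample2.vcf.gz", "notes.txt"):
        (Path(tmp) / name).touch()
    found = [p.name for p in list_input_vcfs(tmp)]

print(found)
```

In practice you would call `list_input_vcfs("VCFs")` (or whatever folder you pass to `--input_folder_with_VCF_files`) and confirm the list is not empty.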

Parameters

Optional

| Name | Default value | Description |
|------|---------------|-------------|
| `--reference_genome` | `GRCh37.75` | Reference genome |
| `--dbNSF_path` | `${baseDir}/dbNSFP4.1a.txt.gz` | dbNSFP database |
| `--dbSNP_path` | `${baseDir}/dbsnp150.vcf.gz` | dbSNP database |
| `--output_path` | `${baseDir}/output` | Output folder |
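
Any of these parameters can be overridden on the command line by appending `--name value` to the run command. The sketch below assembles such a command from Python, for example to launch runs from a script. The helper `build_command` and the paths `/data/dbsnp150.vcf.gz` and `results` are placeholders invented for this example.

```python
import shlex


def build_command(**params):
    """Assemble the run command, appending each pipeline parameter as `--name value`."""
    cmd = ["nextflow", "run", "main.nf", "-profile", "conda"]
    for name, value in params.items():
        cmd += [f"--{name}", str(value)]
    return cmd


# Placeholder values; replace them with your own paths.
cmd = build_command(
    reference_genome="GRCh37.75",
    dbSNP_path="/data/dbsnp150.vcf.gz",
    output_path="results",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to start the pipeline.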

Output

The final annotated and extracted information will be available in the output directory as full_annotation.txt.
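
Because the output is a tab-separated text file with a header line, it can be loaded for downstream analysis with standard tools. Here is a minimal sketch using Python's `csv` module; the column names in the in-memory sample are hypothetical, since the actual columns depend on the fields selected in the ExtractFields step.

```python
import csv
import io


def read_annotations(handle):
    """Parse a tab-separated file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


# In-memory stand-in for full_annotation.txt, with made-up columns.
sample = io.StringIO("CHROM\tPOS\tREF\tALT\n1\t100\tA\tG\n")
rows = read_annotations(sample)
print(rows[0]["POS"])
```

For a real run, open the file instead, for example `with open("output/full_annotation.txt", newline="") as fh: rows = read_annotations(fh)`.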

Customization

Adjust resource settings, such as memory requirements, in the nextflow.config file.
Customize the annotation processes in the main.nf script based on your specific requirements.

Acknowledgments

This pipeline utilizes various bioinformatics tools and databases, including PLINK, bcftools, SnpSift, snpEff, dbNSFP, and dbSNP.
