NGS_processing_pipeline

Overview:

This pipeline:

Creates the required file structure
Demultiplexes sequences based on the presence of a primer
Does other things
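
As a rough illustration of primer-based demultiplexing, the sketch below bins reads by the primer found at a fixed offset from the start of each read. It is not the code in demultiplex.py: the region names, primer sequences, and the exact-match rule are all assumptions made for the example, and real primers may contain degenerate bases that need fuzzier matching.

```python
# Hypothetical primer table: gene region -> primer sequence.
# These names and sequences are made up for illustration.
PRIMERS = {
    "GAG_1": "ACGTTGCA",
    "ENV_C1C2": "TTGACCGA",
}


def assign_region(read, primers, offset=0):
    """Return the gene region whose primer occurs in `read` at `offset`, or None."""
    for region, primer in primers.items():
        if read.startswith(primer, offset):
            return region
    return None


def demultiplex(reads, primers, offset=0):
    """Group reads by matched primer; unmatched reads are stored under None."""
    bins = {}
    for read in reads:
        bins.setdefault(assign_region(read, primers, offset), []).append(read)
    return bins


# Two bases precede the primer in each of these toy reads.
reads = ["NNACGTTGCAGGGG", "NNTTGACCGATTTT", "NNCCCCCCCCCCCC"]
bins = demultiplex(reads, PRIMERS, offset=2)
```

The `offset` argument stands in for the "length of sequence preceding the primer" that the primer CSV file records.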

Setup process:

Clone the git repository
etc

Dependencies:

Data from PrimerID workflow

1) MotifBinner
2) contam repo
3) python3
4) scipy?
5) pandas
6) numpy
7) Biopython?

If you call the pipeline through demultiplex.py, you will need to specify the config.json settings file and a primer description CSV file. Templates for these are located in the repo as template_demultiplex_config.json and template_master_primer_file.csv.

To run:

python3 demultiplex.py --config_file <config.json> --output_dir <output directory>

Using the config file:

The configuration for the run is stored in a JSON file for reproducibility. An example file is provided which can be edited to suit your run.

The JSON file contains:

input_data

fwd_fastq_file - This is the fastq file containing the forward (R1) sequences.
rev_fastq_file - This is the fastq file containing the reverse (R2) sequences.
primer_csv - This file contains the primers and their associated gene regions, as well as information on whether the sequences are overlapping, and the length of sequence preceding the primer in the fastq file.
patient_list - This is a list (a JSON array, read into Python as a list) containing the patient / sample names.
out_folder - This is the full path to the output folder.
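
The sketch below shows one way to load a config and check that the input_data keys listed above are present before a run starts. It assumes that input_data is a top-level key in the JSON file holding the five entries as a nested object; that layout and all file paths are illustrative guesses, so treat template_demultiplex_config.json in the repo as the authoritative reference.

```python
import json

# Keys described in the input_data section of this README.
REQUIRED_INPUT_KEYS = (
    "fwd_fastq_file",
    "rev_fastq_file",
    "primer_csv",
    "patient_list",
    "out_folder",
)


def check_input_data(config):
    """Return the input_data section, raising if a described key is missing."""
    section = config["input_data"]
    missing = [key for key in REQUIRED_INPUT_KEYS if key not in section]
    if missing:
        raise KeyError("input_data is missing: " + ", ".join(missing))
    if not isinstance(section["patient_list"], list):
        raise TypeError("patient_list should be a list")
    return section


# Hypothetical config contents; the paths and sample name are placeholders.
example = json.loads("""
{
  "input_data": {
    "fwd_fastq_file": "/path/to/sample_R1.fastq",
    "rev_fastq_file": "/path/to/sample_R2.fastq",
    "primer_csv": "/path/to/master_primer_file.csv",
    "patient_list": ["CAP177"],
    "out_folder": "/path/to/output"
  }
}
""")

input_data = check_input_data(example)
```

For a file on disk, `json.load(open_file_handle)` would replace the `json.loads` call.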

pipelineSettings
out_prefix: The prefix name of your outfile.
frame: The reading frame (1, 2 or 3).
stops: Whether to remove sequences that contain stop codons.
min_read_length: The minimum read length.
run_step: The step at which to resume the analysis if it is interrupted.
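
To make the frame, stops, and min_read_length settings concrete, here is a minimal sketch of how such a filter could behave. It is an illustration only, not the logic in remove_bad_sequences.py: it assumes stops is a boolean, that frame is 1-based, and that any in-frame stop codon (including one at the very end of the read) counts.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_stop_codon(seq, frame=1):
    """Return True if a stop codon occurs in the given 1-based reading frame."""
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    seq = seq.upper()
    start = frame - 1
    # Walk the sequence codon by codon, ignoring any trailing partial codon.
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return any(codon in STOP_CODONS for codon in codons)


def filter_reads(seqs, frame=1, stops=True, min_read_length=0):
    """Drop reads that are too short or, if `stops` is set, contain a stop codon."""
    kept = []
    for seq in seqs:
        if len(seq) < min_read_length:
            continue
        if stops and has_stop_codon(seq, frame):
            continue
        kept.append(seq)
    return kept


toy_reads = ["ATGTAAGGG", "ATGGGGCCC", "ATG"]
kept = filter_reads(toy_reads, frame=1, stops=True, min_read_length=6)
```

In this toy input the first read is dropped for its in-frame TAA and the third for being shorter than six bases.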

haplotype_settings
infile: The path and name of the aligned fasta file.
field: The field that differentiates your samples/time points (use the last field if there are multiple, e.g. 4 for 'CAP177_2000_004wpi_V3C4_GGGACTCTAGTG_28', or 2 for 'SVB008_SP_GGTAGTCTAGTG_231').
script_folder: The path to the folder containing the pipeline scripts.
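
One reading of the field setting that is consistent with both examples above is that it counts how many leading underscore-separated parts of a sequence name make up the sample/time point label. The sketch below illustrates that reading; it is an assumption and has not been checked against haplotyper_freq.py, and the function name is hypothetical.

```python
from collections import defaultdict


def sample_prefix(seq_name, field):
    """Join the first `field` underscore-separated parts of a sequence name."""
    return "_".join(seq_name.split("_")[:field])


def group_by_sample(seq_names, field):
    """Group sequence names by their sample/time point prefix."""
    groups = defaultdict(list)
    for name in seq_names:
        groups[sample_prefix(name, field)].append(name)
    return dict(groups)


cap_prefix = sample_prefix("CAP177_2000_004wpi_V3C4_GGGACTCTAGTG_28", 4)
svb_prefix = sample_prefix("SVB008_SP_GGTAGTCTAGTG_231", 2)
```

Under this reading, field 4 yields CAP177_2000_004wpi_V3C4 for the first example name and field 2 yields SVB008_SP for the second.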

Output:

The output of this pipeline is X in Y format

ColinAnthony/NGS_processing_pipeline
