pgs_variants_coords

Pipeline written to fetch and store the variants locations on different assembly, in order to be quickly retrieve in the pgs-harmonizer.

Requirements

Install Nextflow
Install SQLite
Install VCFtools
Install Python 3.9 or more recent and then run pip install requirements.txt
Download variants VCF files, e.g. https://ftp.ensembl.org/pub/current_variation/vcf/homo_sapiens/homo_sapiens-chr##.vcf.gz (and their indexes - .vcf.gz.csi)

If you start your SQLite knowledge base (KB) from scratch, please run the following command, e.g.:

sqlite3 variant_locations_37.db < knowledge_base_schema.sql
sqlite3 variant_locations_38.db < knowledge_base_schema.sql

Of course, you can name and place the SQLite knowledge base wherever you prefer. As the SQLite knowledge base can become quite big, we found that it is more convinient to have a distinct KB per genome build.

Running pipeline

Nextflow pipeline configuration (nextflow.config)

root_dir = <path to the working directory (other than Nextflow's)>

params {
    pgs_num_from = <PGS ID start number, e.g. '20' for PGS000020>
    pgs_num_to = <PGS ID end number, e.g. '140' for PGS000140>
    genomebuild = <target genomebuild, e.g. '38'>
    pgs_ids_file = <name of the locally generated PGS IDs list file, e.g. "pgs_ids.txt">
    rest_api_url = <URL to the REST API server>
    vars_list_file_path = <path to the variants list file name, e.g. "$root_dir/vars_list.txt">
    var_files_path = <wildcard file name to store the variant coordinates on each VCF file (i.e. chromosome), e.g. "$root_dir/vars_files/vars_chr*.txt">
    merged_var_file_path = <file to merge the content of all the var_files_path, e.g. "$root_dir/vars_files/merged_var_file.txt">
    no_coord_var_file_path = <path to the file listing the variants missing in the VCF files that will be searched via the Ensembl REST API, e.g. "$root_dir/vars_files/no_coord_var_file.txt">
    missing_var_file_path = <path to the file listing the variants missing in the VCF files and the Ensembl REST API, e.g. "$root_dir/vars_files/vars_with_missing_info.txt">
    var_file_path_ensembl = <path to the file containing all the variant data fetched via the Ensembl REST API, e.g. "$root_dir/vars_list_ensembl.txt">
    sqlite_file_path = <path to the SQLite database>
    loc_pipeline = <path to the 'scoring_files_pipeline' directory>
    loc_score_dir = <path to the directory where the scoring files are stored, e.g. '${params.work_dir}/scorefiles/'>
    loc_vcftools = <path to the vcftools bin file, e.g. "..../bin/vcftools">
    loc_vcfs = <path to the 'Ensembl VCF' directory>
    loc_filtered_vcfs = <path to the directory to store the filtered VCF files>
    loc_variants_files = <path to the directory to store the var_files, e.g. "$root_dir/vars_files">
    vcf_file_prefix = <file name prefix used on each VCF files, e.g. "homo_sapiens-chr">'
}

If it runs on LSF, here is an example about what to add to nextflow.config:

process {
    queue = 'short'
    executor = 'lsf'
    withLabel: medium_mem {
        memory = '2 GB'
    }
    withLabel: retry_increasing_mem {
        errorStrategy = 'retry'
        memory = {1.GB * task.attempt}
        maxRetries = { task.exitStatus in [130,140] ? 6 : 2 }
    }
}

Run Nextflow pipeline

nextflow pipeline_dsl2.nf -c nextflow_37.config

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
bin		bin
nf_modules		nf_modules
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
knowledge_base_schema.sql		knowledge_base_schema.sql
nextflow.config_lsf_template		nextflow.config_lsf_template
pipeline_dsl2.nf		pipeline_dsl2.nf
pipeline_list_dsl2.nf		pipeline_list_dsl2.nf
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pgs_variants_coords

Requirements

Running pipeline

Nextflow pipeline configuration (nextflow.config)

Run Nextflow pipeline

About

Releases

Packages

Languages

License

PGScatalog/pgs_variants_coords

Folders and files

Latest commit

History

Repository files navigation

pgs_variants_coords

Requirements

Running pipeline

Nextflow pipeline configuration (nextflow.config)

Run Nextflow pipeline

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages