Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prioritizing non-coding variants #501

Draft
wants to merge 105 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
105 commits
Select commit Hold shift + click to select a range
61f29bd
Added custom VEP annotation for GREEN-DB constraint value.
JonathanKlimp Oct 10, 2022
098c7bf
Added the .bed.gz files containing the GREEN-DB constraint values
JonathanKlimp Oct 10, 2022
d689eef
Changed Force report coordinates from 1 to 0
JonathanKlimp Oct 10, 2022
ae5fa42
Fixed the quotes (") for the added custom GREEN-DB constraint command
JonathanKlimp Oct 10, 2022
2baf60d
added ../ to the input path for the GREEN-DB constraint command
JonathanKlimp Oct 10, 2022
6227a07
Renamed and added the GREEN-DB constraint bed files for both build 38…
JonathanKlimp Oct 10, 2022
797d7f3
Added path variable for the constraint .bed file to the pipeline
JonathanKlimp Oct 10, 2022
5ce7d4e
Revert "Renamed and added the GREEN-DB constraint bed files for both …
JonathanKlimp Oct 10, 2022
3e9e28c
update on the path variable to the green db constraint value file
JonathanKlimp Oct 10, 2022
3a31d2c
renamed variable with GREEN-DB to GREEN_DB
JonathanKlimp Oct 10, 2022
4dfa763
added missing bracket
JonathanKlimp Oct 10, 2022
8b94611
Changed the custom argument for GREEN-DB constraint annotation from e…
JonathanKlimp Oct 11, 2022
c47997f
Update on the path variable for GREEN_DB constraint value
JonathanKlimp Oct 11, 2022
85283e3
Added annotation arguments for region annotation of UCNE, DNase and TFBS
JonathanKlimp Oct 11, 2022
3412ad7
Added annotation arguments for region annotation of UCNE, DNase and TFBS
JonathanKlimp Oct 11, 2022
9eee6f1
Update: changed short name from constraint to region for the region a…
JonathanKlimp Oct 12, 2022
0831672
added command variable for fathm_MKL scores
JonathanKlimp Oct 12, 2022
658deb1
added missing qoute
JonathanKlimp Oct 12, 2022
372c34c
added fathmm variable for GRCh37
JonathanKlimp Oct 12, 2022
b556255
update on the fathmm_mkl variable
JonathanKlimp Oct 12, 2022
e55960a
Edit on Fathmm_MKL custom argument variables
JonathanKlimp Oct 26, 2022
8b9838f
Update on nextflow.config. Changed fathmmMKL variable from .._fathmm_…
JonathanKlimp Oct 27, 2022
f87e4fa
added variables for ncER and ReMM tool scores
JonathanKlimp Oct 28, 2022
dd95616
commented the fathmmMKL variable for testing
JonathanKlimp Nov 1, 2022
11e949c
commented ReMM for testing
JonathanKlimp Nov 1, 2022
5fea08c
uncommented ReMM variable
JonathanKlimp Nov 1, 2022
2a2bb07
Uncommented fathmMKL for testing
JonathanKlimp Nov 2, 2022
fcfd680
added variable for HPO phenotype annotation
JonathanKlimp Nov 10, 2022
952e9d7
update on the file name of fathmm scores
JonathanKlimp Nov 11, 2022
b3c8cab
commented ReMM variable for testing
JonathanKlimp Nov 11, 2022
0daf71a
Update on the region_phenos variable. Changed the file name to the bu…
JonathanKlimp Nov 11, 2022
fdbec8c
fixed type in resources for pheno_region variable
JonathanKlimp Nov 11, 2022
4f72358
uncommented ReMM variable
JonathanKlimp Nov 11, 2022
fda3583
commented ReMM score variable
JonathanKlimp Nov 14, 2022
2658ccf
uncommented ReMM and added an "_" to the ReMM variable in annotate.nf
JonathanKlimp Nov 14, 2022
6f24e4b
changed fathmm VEP variable from overlap to exact
JonathanKlimp Nov 14, 2022
973ecdf
Added extra fathmm for the fathmm variable to let VEP know which valu…
JonathanKlimp Nov 16, 2022
0930c73
Added base for VIPVaranLevel plugin
JonathanKlimp Nov 17, 2022
f2ef945
added print statement for $transcript_variantion_allele
JonathanKlimp Nov 17, 2022
c018ba0
added args for VIPVaranLevel plugin
JonathanKlimp Nov 17, 2022
a5c0ef3
added basic logic for the level calculation
JonathanKlimp Nov 17, 2022
7f15e84
Moved the logic into functions (subs)
JonathanKlimp Nov 17, 2022
6cc976c
added test values for the region and scores
JonathanKlimp Nov 17, 2022
5e422f6
fixed variable types
JonathanKlimp Nov 17, 2022
45cebea
fixed variable typo
JonathanKlimp Nov 17, 2022
da0ab25
added missing "$" for variables
JonathanKlimp Nov 17, 2022
fdefe1b
changed % to $ fow when the hash is used to gather variable
JonathanKlimp Nov 17, 2022
206c760
hashed VIPVaranLevel plugin for testing
JonathanKlimp Nov 18, 2022
a0e4f0f
Added VEP plugin fathmm_MKL from the ensembl vep plugins github
JonathanKlimp Nov 18, 2022
87ba1b2
added variables with data location and VEP argument for the added FAT…
JonathanKlimp Nov 18, 2022
13b39bc
added code to print to a log file.
JonathanKlimp Nov 18, 2022
2e85e37
added "die" satement if the file can't be openend
JonathanKlimp Nov 18, 2022
b380ef9
added "my" to file declaration
JonathanKlimp Nov 18, 2022
8c70e11
removed fathmm custom variable for testing
JonathanKlimp Nov 18, 2022
487fe39
changed the print statements
JonathanKlimp Nov 18, 2022
d157f18
hashed fathmm for testing
JonathanKlimp Nov 18, 2022
03ade6c
added fathmm custom command back
JonathanKlimp Nov 21, 2022
e9ead37
added new print statements to log file
JonathanKlimp Nov 21, 2022
b7829e7
removed FATHMM VEP plugin
JonathanKlimp Nov 22, 2022
91fffbc
removed the catch when opening the log file
JonathanKlimp Nov 22, 2022
f7761d2
removed print statement
JonathanKlimp Nov 22, 2022
b59db94
removed the FATHMM plugin variables
JonathanKlimp Nov 22, 2022
2a9695b
added test print statement outside of the sub "run"
JonathanKlimp Nov 22, 2022
a1ce40a
removed print statements which gave errors
JonathanKlimp Nov 22, 2022
846ae2a
added ncER variables
JonathanKlimp Nov 22, 2022
10dfba7
removed the ncER variables and added new $self print statement
JonathanKlimp Nov 22, 2022
388ecfc
added line hash variable and print statement
JonathanKlimp Nov 22, 2022
68e584b
changed line_hash print and added new get_data from $self
JonathanKlimp Nov 22, 2022
0391bb1
removed faulty print statement
JonathanKlimp Nov 22, 2022
b88ebd1
added new tests statements
JonathanKlimp Nov 22, 2022
e438f53
added ncER tests statements
JonathanKlimp Nov 22, 2022
36cca5e
added test statemetns for ncER
JonathanKlimp Nov 22, 2022
e73ea55
removed ncER statement that raised an error
JonathanKlimp Nov 22, 2022
f41baae
removed print statement which printed non existing variable
JonathanKlimp Nov 22, 2022
96bc0de
removed statement that raised an error
JonathanKlimp Nov 22, 2022
8396c36
edited logging to file
JonathanKlimp Dec 13, 2022
62d3b31
fixed syntax error
JonathanKlimp Dec 14, 2022
f136380
fixed log file location
JonathanKlimp Dec 14, 2022
e8d22bb
added new print statment of self
JonathanKlimp Dec 14, 2022
ab27467
added data dumper print
JonathanKlimp Dec 14, 2022
4905306
update on the print to log file
JonathanKlimp Dec 14, 2022
cfaddc3
removed print statements
JonathanKlimp Dec 14, 2022
b2a0dab
fixed syntax error
JonathanKlimp Dec 14, 2022
919cb1f
added new print statement for $self
JonathanKlimp Dec 14, 2022
08c39ab
added new print statements for the contents of $self
JonathanKlimp Dec 14, 2022
160b1d4
removed print statement
JonathanKlimp Dec 14, 2022
3ae59bb
added -> to to hash call fix error
JonathanKlimp Dec 14, 2022
a9d3403
removed syntax error
JonathanKlimp Dec 14, 2022
1a56f50
fixed correct printing of arrays inside the $self hash
JonathanKlimp Dec 14, 2022
a27323f
fixed syntax error
JonathanKlimp Dec 14, 2022
60a85ae
changed $ to @ for the arrays
JonathanKlimp Dec 14, 2022
21f3dbe
fixed error accesing the array in the for each loop
JonathanKlimp Dec 14, 2022
2a643c9
added extra loop for array printing
JonathanKlimp Dec 14, 2022
b3ff673
added print statement for the custom array in cofig
JonathanKlimp Dec 14, 2022
b4b90d1
added pick_order to the prints
JonathanKlimp Dec 14, 2022
d9b2a62
commented all custom/plugins except from VIPVaranLevel
JonathanKlimp Dec 20, 2022
ae87191
uncommented printing of $self for testing
JonathanKlimp Dec 20, 2022
81f7a65
removed VIPVaranlevel plugin from annotate.sh
JonathanKlimp Dec 20, 2022
64eca9c
changed ReMM from overlap to exact
JonathanKlimp Jan 25, 2023
d41d484
Merge branch 'main' into feat/non-coding
JonathanKlimp Jan 26, 2023
b265a7c
changend ReMM back from exact to overlap
JonathanKlimp Jan 26, 2023
383471b
moved ReMM one line up for testing
JonathanKlimp Jan 26, 2023
87c1830
moved ReMM back
JonathanKlimp Jan 26, 2023
ce07cf3
Update README.md
JonathanKlimp Feb 3, 2023
21ef3ab
Added pre-processing steps for each annotation resource
JonathanKlimp Feb 3, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
112 changes: 112 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,117 @@
# Variant Interpretation Pipeline

This is VIP with added non-coding functionality based on the [GREEN-DB](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8934622/) method.
It has extra annotations resources which includes:
- TFBS regions (The TF that can bind the region)
- UCNE regions
- DNase regions
- FATHMM-MKL scores
- ReMM scores
- ncER scores
- GREEN-DB constraint values
- HPO phenotypes

This branch has the annotations that are expected input for the [vip-decision-tree](https://github.com/molgenis/vip-decision-tree/tree/feat/annotation) score annotation branch. This score annotation tool calculates and annotates a variants score based on the added annotations.

These added annotations require resources that are not downloaded by the install.sh and need to be downloaded manually.
They can be downloaden from the [GREEN-DB download website](https://green-varan.readthedocs.io/en/latest/Download.html). The downloaded resources require some preprocessing follow the following steps for each annotation resource:

[BCFtools](https://samtools.github.io/bcftools/) is required te perform the pre processing steps

## TFBS/ncER


File: {Build}_ncER_perc.bed.gz
File: {Build}_TFBS_merged.bed.gz

Unpack the downloaded gz files and follow these steps:

```
grep -v "#" file.bed | sort -k1,1 -k2,2n -k3,3n -t$'\t' | bgzip -c > file.bed.gz
tabix -p bed file.bed.gz
```

## UCNE/DNase

File: {Build}_UCNE.bed.gz
File: {Build}_DNase_merged.bed.gz

unpack the downloaded gz files and follow these steps:

Perform this extra step for the **DNase** and **UCNE** file to add a fourth column.
Unpack the file and perform this step:

```
awk 'BEGIN{ FS = OFS = "\t" } { print $1,$2,$3, (NR==1? "id" : "UCNE") }' {Build}_UCNE.bed > tmp && mv tmp GRCh38_UCNE.bed

awk 'BEGIN{ FS = OFS = "\t" } { print $1,$2,$3, (NR==1? "id" : "DNase") }' {Build}_DNase.merged.bed > tmp && mv tmp GRCh37_DNase.bed
```

And then for both files:
```
grep -v "#" file.bed | sort -k1,1 -k2,2n -k3,3n -t$'\t' | bgzip -c > file.bed.gz
tabix -p bed file.bed.gz
```

## GREEN-DB constraint value

File: {build}_GREEN-DB.bed.gz

Unpack the downloaded gz files and follow these steps:
Or use this [script](https://github.com/JonathanKlimp/Graduation-scripts/blob/main/data_conversion_scripts/process_bed_constraint_VEP.sh)

First step is to remove all unused columns so that only the constraint value remains.
Second step removes all values smaller than 0.7 and NA values.
```
cut -f 1,2,3,9 {Build}_GREEN-DB.bed > tmp.txt

awk '$4 > 0.7 && $4 != "NA"' tmp.txt > {Build}_GREEN-DB-constraint.bed

grep -v "#" file.bed | sort -k1,1 -k2,2n -k3,3n -t$'\t' | bgzip -c > file.bed.gz
tabix -p bed file.bed.gz
```

## ReMM

File: {Build}_ReMM.tsv.gz

Perform these steps:

```
gunzip {Build}_ReMM.tsv.gz

awk 'BEGIN{ FS = OFS = "\t" } { print $1,$2,$2,$3 }' {Build}_ReMM.tsv > {Build}_ReMM_new_column.bed
#grep -v "#" {Build}_ReMM_new_column.bed | sort -k1,1 -k2,2n -k3,3n -t$'\t' | bgzip -c > {Build}_ReMM.bed.gz
tabix -p bed {Build}_ReMM.bed.gz

```

## FATHMM-MKL

File: {Build}_FATHMM_MKL.tsv.gz

Use this script to convert the tsv file to vcf format: [conversion script](https://github.com/JonathanKlimp/Graduation-scripts/blob/main/data_conversion_scripts/convert_score_tsv_to_vcf.sh)

## Phenotyepes

File: GREEN-DB_v2.5.db.gz

Extract the HPO phenotypes table and the table with the regions from the DB file using [SQLite](https://sqlite.org/index.html) or another preferred DB browser.

Once extracted perform the following:
This command checks if an ID is matched in both files. If this is the case the chrom, pos, stop, phenotype and phenotypeID are printed to a new file.

```
awk -v FS='\t' -v OFS='\t' 'FNR==NR{a[$4]=$1 FS $2 FS $3;next} ($1 in a) {print a[$1],$3,$4}' GREEN-DB_{Build}_regions.csv GREEN-DB_phenotypes.csv > GREEN_DB_{Build}_region_phenos.bed
```

Then perform the following command:

```
grep -v "#" GREEN_DB_{Build}_region_phenos.bed | sort -k1,1 -k2,2n -k3,3n -t$'\t' | bgzip -c > GREEN_DB_{Build}_region_phenos.bed.gz
tabix -p bed GREEN_DB_{Build}_region_phenos.bed
```

## Requirements
- POSIX compatible system (Linux, OS X, etc) / Windows through [WSL](https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux)
- Bash 3.2 (or later)
Expand Down
8 changes: 8 additions & 0 deletions modules/annotate.nf
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,14 @@ process annotate {
capiceInputPath = "${id}_chunk${order}_capice_input.tsv"
capiceOutputPath = "${id}_chunk${order}_capice_output.tsv.gz"
capiceModelPath = params[params.assembly + "_annotate_capice_model"]
greenDbConstraintPath = params[params.assembly + "_constraint_GREEN_DB"]
dnaseRegionsPath = params[params.assembly + "_DNase_regions"]
tfbsRegionsPath = params[params.assembly + "_TFBS_regions"]
ucneRegionsPath = params[params.assembly + "_UCNE_regions"]
fathmmMKLScoresPath = params[params.assembly + "_fathmm_MKL_scores"]
ncErScoresPath = params[params.assembly + "_ncER_scores"]
reMMScoresPath = params[params.assembly + "_ReMM_scores"]
regionPhenosPath = params[params.assembly + "_region_phenotypes"]

template 'annotate.sh'
}
Expand Down
16 changes: 16 additions & 0 deletions nextflow.config
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,14 @@ params {
GRCh37_annotate_vep_plugin_vkgl = "${projectDir}/resources/GRCh37/vkgl_consensus_20211201.tsv"
GRCh37_annotate_capice_model = "${projectDir}/resources/GRCh37/capice_model_v4.0.0-v2.pickle.dat"
GRCh37_report_genes = "${projectDir}/resources/GRCh37/GCF_000001405.25_GRCh37.p13_genomic_g1k.gff.gz"
GRCh37_constraint_GREEN_DB = "${projectDir}/resources/GRCh37/GRCh37_GREEN_DB_constraint.bed.gz"
GRCh37_DNase_regions = "${projectDir}/resources/GRCh37/GRCh37_DNase.bed.gz"
GRCh37_TFBS_regions = "${projectDir}/resources/GRCh37/GRCh37_TFBS.bed.gz"
GRCh37_UCNE_regions = "${projectDir}/resources/GRCh37/GRCh37_UCNE.bed.gz"
GRCh37_fathmm_MKL_scores = "${projectDir}/resources/GRCh37/GRCh37_FATHMM-MKL_NC.vcf.gz"
GRCh37_ncER_scores = "${projectDir}/resources/GRCh37/GRCh37_ncER_perc.bed.gz"
GRCh37_ReMM_scores = "${projectDir}/resources/GRCh37/GRCh37_ReMM.bed.gz"
GRCh37_region_phenotypes = "${projectDir}/resources/GRCh37/GREEN_DB_GRCh37_region_phenos.bed.gz"

GRCh38_reference = "${projectDir}/resources/GRCh38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz"
GRCh38_annotate_vep_custom_gnomad = "${projectDir}/resources/GRCh38/gnomad.genomes.v3.1.2.sites.stripped.vcf.gz"
Expand All @@ -54,6 +62,14 @@ params {
GRCh38_annotate_vep_plugin_vkgl = "${projectDir}/resources/GRCh38/vkgl_consensus_20211201.tsv"
GRCh38_annotate_capice_model = "${projectDir}/resources/GRCh38/capice_model_v4.0.0-v2.pickle.dat"
GRCh38_report_genes = "${projectDir}/resources/GRCh38/GCF_000001405.39_GRCh38.p13_genomic_mapped.gff.gz"
GRCh38_constraint_GREEN_DB = "${projectDir}/resources/GRCh38/GRCh38_GREEN_DB_constraint.bed.gz"
GRCh38_DNase_regions = "${projectDir}/resources/GRCh38/GRCh38_DNase.bed.gz"
GRCh38_TFBS_regions = "${projectDir}/resources/GRCh38/GRCh38_TFBS.bed.gz"
GRCh38_UCNE_regions = "${projectDir}/resources/GRCh38/GRCh38_UCNE.bed.gz"
GRCh38_fathmm_MKL_scores = "${projectDir}/resources/GRCh38/GRCh38_FATHMM-MKL_NC.vcf.gz"
GRCh38_ncER_scores = "${projectDir}/resources/GRCh38/GRCh38_ncER_perc.bed.gz"
GRCh38_ReMM_scores = "${projectDir}/resources/GRCh38/GRCh38_ReMM.bed.gz"
GRCh38_region_phenotypes = "${projectDir}/resources/GRCh38/GREEN_DB_GRCh38_region_phenos.bed.gz"

singularity_image_dir = "${projectDir}/images"
chunk_size = 10000
Expand Down
196 changes: 196 additions & 0 deletions resources/vep/plugins/VIPVaranLevel.pm
Original file line number Diff line number Diff line change
@@ -0,0 +1,196 @@
package VIPVaranLevel;

use strict;
use warnings;

use Bio::EnsEMBL::Variation::Utils::BaseVepPlugin;
use base qw(Bio::EnsEMBL::Variation::Utils::BaseVepPlugin);

=head1 NAME
VIPVaranLevel
=head1 SYNOPSIS
mv VIPVaranLevel.pm ~/.vep/Plugins
=head1 DESCRIPTION
Plugin to annotate VIPVaran level per variant based on constraint level, fathmm, ncER, ReMM scores and region.
=cut

sub version {
return '0.1';
}

# Feature includes regulatory features
sub feature_types {
return [ 'Transcript', 'Intergenic', 'Feature'];
}

sub get_header_info {
return {
VIPVaranLevel => "Score between 1 and 3 based on constraint level, fathmm, ncER, ReMM scores and region"
};
}

my %min_scores = (
fathmm => 0.5,
ncER => 0.5,
ReMM => 0.499,
constraint => 0.7
);

sub is_in_region {
my $region_values = $_[0];

## if region contains value
unless($region_values eq "") {
return 1;
} else {
return 0;
}
}

sub tool_min_score {
my $fathmm_score = $_[0];
my $ncER_score = $_[1];
my $ReMM_score = $_[2];

# add logic for when there are multiple scores for the same variant from the same tool.
# example: 99.7852&99.7217 (can be more than 2) ReMM: 0.1710&0.9490&0.9560 (low and high score what to do?)
if($fathmm_score >= $min_scores{"fathmm"} || $ncER_score >= $min_scores{"ncER"} || $ReMM_score >= $min_scores{"ReMM"}) {
return 1;
} else {
return 0;
}
}

sub constraint_min_score {
my $constraint_score = $_[0];

if ($constraint_score >= $min_scores{"constraint"}) {
return 1;
} else {
return 0;
}
}

sub run {
my ($self, $transcript_variation_allele, $line_hash) = @_;

my $base_variation_feature = $transcript_variation_allele->base_variation_feature;
my @vcf_line = @{$base_variation_feature->{_line}};

#my $data = @{$self->get_data}; get_data is not a thing

# my $test3 = $self->ncER; werkt ook niet met $line_hash
#my @test_data = @{$self->green_db_tool_scores};
# code to write to file
my $filename = '/groups/solve-rd/tmp10/projects/vip/feat/non-coding/test/VIPVaranLevel.log';
open(my $file, '>>', $filename) or die $!;
#print($file "HIER onder is vcf line");
# foreach (@vcf_line) {
# print($file "$_\n");
# }
print($file "params\n");
my @params_array = $self->{"params"};
foreach (@params_array) {
print($file "$_\n");
foreach (@$_) {
print($file "$_\n");
}
}
print($file "variant feature types\n");
my @vft_array = $self->{"variant_feature_types"};
foreach (@vft_array) {
print($file "$_\n");
foreach (@$_) {
print($file "$_\n");
}
}
print($file "feature types\n");
my @f_types_array = $self->{"feature_types"};
foreach (@f_types_array) {
print($file "$_\n");
foreach (@$_) {
print($file "$_\n");
}
}
print($file "variant feature types wanted\n");
foreach my $variant_feature_wanted ($self->{"variant_feature_types_wanted"}) {
foreach my $key (keys %$variant_feature_wanted) {
print($file "in keys loop variant feature types wanted\n");
print($file $key);
print($file "=>");
print($file $variant_feature_wanted->{$key});
print($file "\n")
}
}
print($file "feature types wanted\n");
foreach my $feature_wanted ($self->{"feature_types_wanted"}) {
foreach my $key (keys %$feature_wanted) {
print($file "in keys loop feature types wanted\n");
print($file $key);
print($file "=>");
print($file $feature_wanted->{$key});
print($file "\n")
}
}
print($file "config\n");
foreach my $config ($self->{"config"}) {
my @config_custom_array = $self->{"plugin"};
foreach (@config_custom_array) {
print($file "$_\n");
}
my @pick_order = $self->{"pick_order"};
foreach (@pick_order) {
print($file "$_\n");
}
foreach my $key (keys %$config) {
print($file "in keys loop config\n");
print($file $key);
print($file "=>");
print($file $config->{$key});
print($file "\n")
}
}
#print($file @vcf_line); # bevat chrom pos ref alt, 0 en 3x "."
# print($file $transcript_variation_allele); # is een hash
close($file);
# score is 0 by default
my $score = 0;

# $region = something...
# $fathmm = something..
# $ReMM = something...
# $ncER = something...
# $constraint = something...
my $region = 0;
my $fathmm_score = 0;
my $ncER_score = 0;
my $ReMM_score = 0;
my $constraint_score = 0;

# Higher level = higher chance of pathogenicity
# If variant lays in in TFBS,DNase or UCNE = level 1
# If variant has score of FATHMM above 0.5, ncER above 0.5 or ReMM above 0.499 and above = level 2
# If variant has constraint value above 0.7 and above = level 3
# Something with phenotype for level 4?
# What if scores are sufficient but it is not in one of the regions?
# Something with UTRAnnotator?

if(is_in_region($region) && tool_min_score($fathmm_score, $ncER_score, $ReMM_score) && constraint_min_score($constraint_score)) {
$score = 3;
} elsif(is_in_region($region) && tool_min_score($fathmm_score, $ncER_score, $ReMM_score) && !constraint_min_score($constraint_score)) {
$score = 2;
} elsif(is_in_region($region) && !tool_min_score($fathmm_score, $ncER_score, $ReMM_score) && !constraint_min_score($constraint_score)) {
$score = 1;
} else {
$score = 0;
}

# return {
# #VIPVaranLevel => $results
# VIPVaranLevel => $score
# };
#return $score;
return {};
}

1;
9 changes: 9 additions & 0 deletions templates/annotate.sh
Original file line number Diff line number Diff line change
Expand Up @@ -186,6 +186,15 @@ vep() {
args+=("--plugin" "UTRannotator,!{vepPluginUtrAnnotatorPath}")
args+=("--custom" "!{vepCustomPhyloPPath},phyloP,bigwig,exact,0")

args+=("--custom" "!{greenDbConstraintPath},constraint,bed,overlap,0")
args+=("--custom" "!{dnaseRegionsPath},region,bed,overlap,0")
args+=("--custom" "!{tfbsRegionsPath},region,bed,overlap,0")
args+=("--custom" "!{ucneRegionsPath},region,bed,overlap,0")
args+=("--custom" "!{fathmmMKLScoresPath},fathmm,vcf,exact,0,fathmm")
args+=("--custom" "!{ncErScoresPath},ncER,bed,overlap,0")
args+=("--custom" "!{reMMScoresPath},ReMM,bed,overlap,0")
args+=("--custom" "!{regionPhenosPath},phenotype,bed,overlap,0")

if [ -n "!{vepPluginArtefact}" ]; then
args+=("--plugin" "Artefact,!{vepPluginArtefact}")
fi
Expand Down