Currently it outputs a CSV of all possible matches in the log file (along with which one was best), for example:
| row_nr | accession | chr_name | chr_position | effect_allele | other_allele | effect_weight | effect_type | ID | REF | ALT | matched_effect_allele | match_type | is_multiallelic | ambiguous | match_flipped | best_match | exclude | duplicate_best_match | duplicate_ID | match_IDs | match_status | dataset |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PGS000018_hmPOS_GRCh37 | 1 | 2245570 | G | C | -0.0276009 | additive | 1:2245570:C:G | C | G | G | altref | false | true | false | true | true | false | false | false | excluded | 1000G-chr2 |
| 0 | PGS000018_hmPOS_GRCh37 | 1 | 2245570 | G | C | -0.0276009 | additive | 1:2245570:C:G | C | G | C | refalt_flip | false | true | true | false | true | false | false | false | not_best | 1000G-chr2 |
| 1 | PGS000018_hmPOS_GRCh37 | 1 | 22132518 | G | A | 0.023934 | additive | 1:22132518:G:A | G | A | G | refalt | false | false | false | true | true | false | false | false | excluded | 1000G-chr2 |
| 2 | PGS000018_hmPOS_GRCh37 | 1 | 38386727 | G | A | -0.0174935 | additive | 1:38386727:G:A | G | A | G | refalt | false | false | false | true | true | false | false | false | excluded | 1000G-chr2 |
| 3 | PGS000018_hmPOS_GRCh37 | 1 | 55496039 | T | C | 0.0293005 | additive | 1:55496039:T:C | T | C | T | refalt | false | false | false | true | true | false | false | false | excluded | 1000G-chr2 |
| 8 | PGS000018_hmPOS_GRCh37 | 1 | 110298166 | G | C | 0.0245969 | additive | 1:110298166:G:C | G | C | G | refalt | false | true | false | true | true | false | false | false | excluded | 1000G-chr2 |
| 8 | PGS000018_hmPOS_GRCh37 | 1 | 110298166 | G | C | 0.0245969 | additive | 1:110298166:G:C | G | C | C | altref_flip | false | true | true | false | true | false | false | false | not_best | 1000G-chr2 |
| 9 | PGS000018_hmPOS_GRCh37 | 1 | 151762308 | G | C | 0.0209215 | additive | 1:151762308:C:G | C | G | G | altref | false | true | false | true | true | false | false | false | excluded | 1000G-chr2 |
| 9 | PGS000018_hmPOS_GRCh37 | 1 | 151762308 | G | C | 0.0209215 | additive | 1:151762308:C:G | C | G | C | refalt_flip | false | true | true | false | true | false | false | false | not_best | 1000G-chr2 |
| 10 | PGS000018_hmPOS_GRCh37 | 1 | 154395946 | G | A | -0.0197906 | additive | 1:154395946:A:G | A | G | G | altref | false | false | false | true | true | false | false | false | excluded | 1000G-chr2 |
| 28 | PGS000018_hmPOS_GRCh37 | 2 | 164945044 | G | C | 0.0213456 | additive | 2:164945044:G:C | G | C | G | refalt | false | true | false | true | true | false | false | true | excluded | 1000G-chr2 |
| 28 | PGS000018_hmPOS_GRCh37 | 2 | 164945044 | G | C | 0.0213456 | additive | 2:164945044:G:C | G | C | C | altref_flip | false | true | true | false | true | false | false | true | not_best | 1000G-chr2 |
| 29 | PGS000018_hmPOS_GRCh37 | 2 | 202799924 | C | T | -0.0226885 | additive | 2:202799924:T:C | T | C | C | altref | false | false | false | true | false | false | false | true | matched | 1000G-chr2 |
| 30 | PGS000018_hmPOS_GRCh37 | 2 | 203829225 | A | C | -0.0526925 | additive | 2:203829225:A:C | A | C | A | refalt | false | false | false | true | false | false | false | true | matched | 1000G-chr2 |
The rows of this file could then be processed into the output format of the current HmVCF, i.e. a single row per scoring file variant with information about how it was matched or excluded (harmonisation code).
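To make the "single row per scoring file variant" idea concrete, here is a minimal sketch using only the Python standard library. It assumes the log is a CSV with the column names shown in the example above and that booleans are written as `true`/`false`. The function name `summarise_matches` is hypothetical, and the mapping from `match_status` to an actual harmonisation code is deliberately left out.

```python
import csv
import io


def summarise_matches(rows):
    """Collapse candidate-match rows into one row per scoring file variant.

    Hypothetical helper: keeps the candidate flagged as best_match for each
    (accession, row_nr) pair, falling back to the first candidate seen.
    """
    summary = {}
    for row in rows:
        key = (row["accession"], row["row_nr"])
        is_best = row["best_match"] == "true"
        # First candidate is kept unless a best match for the same variant appears
        if key not in summary or is_best:
            summary[key] = {
                "accession": row["accession"],
                "row_nr": row["row_nr"],
                "ID": row["ID"],
                "match_type": row["match_type"],
                "match_status": row["match_status"],
            }
    return list(summary.values())


# A few rows from the example log above, reduced to the columns used here
log = """row_nr,accession,ID,match_type,best_match,exclude,match_status
0,PGS000018_hmPOS_GRCh37,1:2245570:C:G,altref,true,true,excluded
0,PGS000018_hmPOS_GRCh37,1:2245570:C:G,refalt_flip,false,true,not_best
29,PGS000018_hmPOS_GRCh37,2:202799924:T:C,altref,true,false,matched
"""

rows = list(csv.DictReader(io.StringIO(log)))
summary = summarise_matches(rows)
for record in summary:
    print(record["row_nr"], record["ID"], record["match_type"], record["match_status"])
```

Each scoring file variant ends up with one record carrying its best candidate's `match_type` and `match_status`, which is the information a harmonisation code would be derived from.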
I am wondering if we should/could separate the match_variants() call from the argparser so that we could use the module directly and set the parameters independently.
I think that would work. I guess you could draft it as a PR? Although I think the output (the log_and_write function) will have to be edited, so maybe you would have to use the actual methods directly?
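As a rough illustration of the separation discussed above, the sketch below keeps argument parsing in a thin wrapper and exposes the matching step as a plain function that other code can call with its own parameters. All names here (`run_matching`, `parse_args`, `--scorefile`, `--target`, `--min_overlap`) are hypothetical and are not taken from pgscatalog_utils. The function body is a placeholder, not the real matching logic.

```python
import argparse


def run_matching(scorefile, target, min_overlap=0.75):
    """Hypothetical library entry point: plain parameters, no argparse."""
    # Placeholder: the real matching logic would go here. This sketch only
    # echoes the settings so the call path can be checked.
    return {"scorefile": scorefile, "target": target, "min_overlap": min_overlap}


def parse_args(argv=None):
    """Hypothetical CLI parser, kept separate from the matching function."""
    parser = argparse.ArgumentParser(description="Hypothetical CLI wrapper")
    parser.add_argument("--scorefile", required=True)
    parser.add_argument("--target", required=True)
    parser.add_argument("--min_overlap", type=float, default=0.75)
    return parser.parse_args(argv)


def main(argv=None):
    """CLI path: parse arguments, then delegate to the plain function."""
    args = parse_args(argv)
    return run_matching(args.scorefile, args.target, args.min_overlap)


# Direct use from another module, bypassing the argument parser entirely
result = run_matching("scores.txt", "target.bim", min_overlap=0.5)
```

With this layout the command-line behaviour is unchanged, while a caller such as the harmonisation code can import the function and handle the output itself.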
The current code is very slow. Using match_variants would be much faster and more consistent with what we're doing in pgsc_calc.
This would use a variant of this script: https://github.com/PGScatalog/pgscatalog_utils/blob/main/pgscatalog_utils/match/match_variants.py