Delta position score interpretation for deletions. #150

Open
drjraclarke opened this issue Aug 23, 2024 · 0 comments
drjraclarke commented Aug 23, 2024

Hi. How should we interpret the delta position scores for a deletion?

For example, for a 6 nt deletion: variant 17:81668137 GTAGCGA>G, with output 0.16|0.03|0.01|0.01|7|1|706|1891
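To make the example concrete, here is a minimal sketch that splits the output string into its eight fields. It assumes the first four fields are delta scores and the last four are delta positions, in the order acceptor gain, acceptor loss, donor gain, donor loss. That order, the labels, and the helper name `parse_delta_output` are assumptions for illustration, not something stated in this post.

```python
from typing import Dict, NamedTuple


class DeltaAnnotation(NamedTuple):
    scores: Dict[str, float]     # delta score per category
    positions: Dict[str, int]    # delta position per category


# Assumed category order (hypothetical labels): acceptor gain/loss, donor gain/loss.
LABELS = ("AG", "AL", "DG", "DL")


def parse_delta_output(text: str) -> DeltaAnnotation:
    """Split 's1|s2|s3|s4|p1|p2|p3|p4' into labelled scores and positions."""
    fields = text.strip().split("|")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    scores = {label: float(v) for label, v in zip(LABELS, fields[:4])}
    positions = {label: int(v) for label, v in zip(LABELS, fields[4:])}
    return DeltaAnnotation(scores, positions)


ann = parse_delta_output("0.16|0.03|0.01|0.01|7|1|706|1891")
# ann.positions -> {"AG": 7, "AL": 1, "DG": 706, "DL": 1891}
```

Under this assumed ordering, the delta position of 1 discussed below is the second position field.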

As I understand it, to encode a deletion in the input VCF, POS must be the genomic coordinate of the nucleotide that remains after the deletion (in this case G). So the deletion technically starts at genomic position 81668137 + 1 = 81668138 (POS + 1 on the positive strand).
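The coordinate arithmetic above can be sketched as follows, assuming a simple deletion in which ALT is a prefix of REF (the usual left-anchored VCF encoding). The function name `deleted_interval` is hypothetical.

```python
from typing import Tuple


def deleted_interval(pos: int, ref: str, alt: str) -> Tuple[int, int]:
    """Return the 1-based, inclusive reference coordinates removed by a deletion.

    Assumes ALT is a prefix of REF, so the first len(alt) bases are retained.
    """
    if len(ref) <= len(alt) or not ref.startswith(alt):
        raise ValueError("not a simple left-anchored deletion")
    start = pos + len(alt)       # first deleted base
    end = pos + len(ref) - 1     # last deleted base
    return start, end


start, end = deleted_interval(81668137, "GTAGCGA", "G")
# start == 81668138, end == 81668143 (6 deleted bases)
```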

Should the delta position score be added to (or subtracted from) POS, or the actual deletion start position?

Either way, this raises an issue, evidenced by this example and many others I see genome-wide. A delta position score of 1 means the predicted location of the splicing change is 81668137 + 1 = 81668138 (or 81668138 + 1 = 81668139, depending on the answer to the question above). Using the wild-type sequence as the reference, this genomic position is actually removed by the 6 nt deletion. I do not understand how any delta position score for a deletion of size 6 could be less than 6. This gene is on the positive strand, so I understand how -1 would be possible, but I see this for genes on both strands.
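A small sketch of the check described above: it computes both candidate coordinates for a delta position (relative to POS, and relative to the deletion start) and reports whether each one falls inside the deleted reference interval. The helper name `candidate_sites` is hypothetical, and neither interpretation is confirmed here; the sketch only makes the conflict explicit.

```python
from typing import Dict, Tuple


def candidate_sites(pos: int, ref: str, alt: str,
                    delta_position: int) -> Dict[str, Tuple[int, bool]]:
    """Map each interpretation to (reference coordinate, is_inside_deletion)."""
    del_start = pos + len(alt)       # first deleted reference base
    del_end = pos + len(ref) - 1     # last deleted reference base
    candidates = {
        "relative_to_POS": pos + delta_position,
        "relative_to_deletion_start": del_start + delta_position,
    }
    return {
        name: (coord, del_start <= coord <= del_end)
        for name, coord in candidates.items()
    }


sites = candidate_sites(81668137, "GTAGCGA", "G", 1)
# sites["relative_to_POS"]            -> (81668138, True)
# sites["relative_to_deletion_start"] -> (81668139, True)
```

For this variant, both interpretations land on a base that no longer exists in the variant sequence.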

Should we be using the variant sequence as the reference, or the WT? If the variant, are the genomic positions essentially "shifted" by 6 nucleotides in this example, so that a delta position score of +1 points to the first nucleotide after the deletion as the predicted location of the splicing change? That was all I could think of to explain this, and your examples only include substitutions. Again, I see this many times, for position scores in all four categories and for deletions of varying size.
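The "shifted" reading can be written down as a short sketch. It is only the hypothesis raised in this question: offsets are counted along the variant sequence from POS, so any offset past the retained anchor base(s) skips over the deleted bases when mapped back to reference coordinates. The function name `variant_offset_to_reference` is hypothetical, and this is not a confirmed description of how the tool reports positions.

```python
def variant_offset_to_reference(pos: int, ref: str, alt: str,
                                delta_position: int) -> int:
    """Map an offset counted on the variant sequence to a reference coordinate.

    Hypothetical interpretation: offsets at or before the retained ALT bases map
    directly; offsets after them are shifted right by the number of deleted bases.
    """
    removed = len(ref) - len(alt)
    if removed <= 0 or not ref.startswith(alt):
        raise ValueError("not a simple left-anchored deletion")
    last_retained_offset = len(alt) - 1   # offset 0 is POS itself
    if delta_position > last_retained_offset:
        return pos + delta_position + removed
    return pos + delta_position


downstream = variant_offset_to_reference(81668137, "GTAGCGA", "G", 1)
upstream = variant_offset_to_reference(81668137, "GTAGCGA", "G", -1)
# downstream == 81668144 (first reference base after the deletion)
# upstream   == 81668136 (unaffected by the deletion)
```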

This is important in my case specifically, as I am trying to extract the sequence at the predicted location of the splicing change. Thank you for your help.
