How are motifs defined in the structures of pathogenic STRs? #45

Open
mletexier-cnrgh opened this issue Oct 23, 2024 · 7 comments

Comments

@mletexier-cnrgh

Dear team,
Is there an error in the motifs included in the pathogenic STR BED file?
For example, for the DAB1 gene, you have indicated the following motifs:
image

However, several publications report ATTTC as the pathogenic motif. With the structure defined in the BED file, this motif never seems to be examined. Or am I misunderstanding how TRGT works?
image

Please enlighten me on this subject, as many other genes do not have the pathogenic motif in their structure.

Thank you very much in advance for your interest in my message.
All the best,
Mélanie :)

@pbsena
Contributor

pbsena commented Oct 23, 2024

Hi Mélanie,

In the BED file, motifs are usually defined as the motif units on the hg38 reference strand. Is the DAB1 locus by any chance transcribed on the negative strand relative to hg38? Because in that case the ATTTC motif would correspond to the GAAAT unit shown above, with AAAAT possibly flanking the expansion.
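
The strand relationship can be checked with a few lines of Python. This is only a minimal sketch: the helper names (`reverse_complement`, `rotations`, `same_repeat`) are made up for illustration and are not part of TRGT. It also treats cyclic shifts of a motif as equivalent, which is an assumption about how one might want to compare repeat units.

```python
# Hypothetical helpers for comparing repeat motifs across strands.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Return the motif as read on the opposite strand."""
    return motif.upper().translate(_COMPLEMENT)[::-1]


def rotations(motif):
    """Return all cyclic shifts of a motif, e.g. GAAAT -> AAATG -> ..."""
    return {motif[i:] + motif[:i] for i in range(len(motif))}


def same_repeat(motif_a, motif_b):
    """True if both motifs describe the same repeat on either strand,
    allowing for a cyclic shift of the repeat unit."""
    a = motif_a.upper()
    b = motif_b.upper()
    return b in rotations(a) or b in rotations(reverse_complement(a))


print(reverse_complement("ATTTC"))    # GAAAT
print(same_repeat("ATTTC", "GAAAT"))  # True
print(same_repeat("ATTTC", "AAAAT"))  # False
```

So a motif reported as ATTTC in gene orientation would show up as GAAAT when the locus is described on the reference strand.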

Best,
Guilherme

@egor-dolzhenko
Collaborator

Hi Mélanie. Just to add to Guilherme's reply, STRchive (from @hdashnow and the team) is a great resource for definitions of known pathogenic repeats. For example, the entry for DAB1 (https://strchive.org/database/DAB1.html) specifies motifs in both reference and gene orientations.

Best wishes,
Egor

@mletexier-cnrgh
Author

Ah, but of course, that seems obvious now.
Thank you both for your replies and for sharing the database; it's a lot clearer now.

Mélanie

@mletexier-cnrgh
Author

By chance, do you have a tool that can tell whether an STR is pathogenic according to given thresholds?
Mélanie

@dnil

dnil commented Oct 23, 2024

You might try STRanger
https://github.com/Clinical-Genomics/stranger

That is what we and some others do. It benefits from (and adds extra information based on) its own set of extra fields in the repeat definition files, which you can find in that repository if you like.

We love Egor and his tools, but I don't know whether, morally speaking, we should encourage TRGT, or support it in Stranger in the long run, as it has this odd partly closed license excluding use with other chemistries. It adds a bit of complexity to pipelines etc. and leaves a bit of a bad taste. ExpansionHunter was cleaner that way. 😔 I hope it changes soon!

@mletexier-cnrgh
Author

Thank you @dnil,
This tool works very well. I'm trying to break down the results to avoid any misunderstandings.

I have the impression that the repeats BED file is the key to obtaining good results. Depending on the version of the pathogenic_repeats.hg38.TRGT.bed file used, the motif definitions differ, and I think this can lead to false negatives.
pathogenic_repeats.hg38.TRGT.bed (two versions of the BEAN1 entry):
chr16 66490398 66490453 ID=BEAN1;MOTIFS=TAAAA;STRUC=(TAAAA)n
chr16 66490398 66490467 ID=BEAN1;MOTIFS=TGGAA,TAAAA;STRUC=

I was happy to have found an STR expansion in my index case, but TAAAA alone is not the pattern reported as pathogenic in the literature; that pattern is (TGGAA)*TAAAA.
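
To make the difference between the two BEAN1 entries concrete, here is a small Python sketch that parses both lines and compares their motif lists and intervals. The function names are hypothetical, and the sketch assumes the motif list sits in a `key=value` field named `MOTIFS` (pass a different `key` if your file names it otherwise). The second entry's `STRUC` value is left empty, exactly as it appears above.

```python
def parse_repeat_line(line):
    """Split a repeat-definition line into coordinates and key=value fields."""
    chrom, start, end, info = line.split(None, 3)
    fields = {}
    for item in info.strip().split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return {"chrom": chrom, "start": int(start), "end": int(end), "fields": fields}


def motif_set(record, key="MOTIFS"):
    """Return the set of motifs listed under the given field name."""
    value = record["fields"].get(key, "")
    return {m for m in value.split(",") if m}


old = parse_repeat_line("chr16 66490398 66490453 ID=BEAN1;MOTIFS=TAAAA;STRUC=(TAAAA)n")
new = parse_repeat_line("chr16 66490398 66490467 ID=BEAN1;MOTIFS=TGGAA,TAAAA;STRUC=")

print(sorted(motif_set(new) - motif_set(old)))  # ['TGGAA']
print(new["end"] - old["end"])                  # 14
```

With the first definition, TGGAA is not among the listed motifs at all, which is consistent with the concern that the older definition could miss it.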

Mélanie

@dnil

dnil commented Oct 24, 2024

Thank you for the feedback - I'll move this comment as an issue on the STRanger repo instead!
But, quite right, STRanger only deals with the order of the motifs, not their content. This is a particular issue for the non-reference expansions. Compare RFC1 in particular. Most downstream users import the results and images into some graphical environment for evaluation anyway. This is something we plan to add, any year now! 😄
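
As a rough illustration of what a content-aware check could look like, here is a Python sketch that separates "the pathogenic motif is expanded" from "something at the locus is expanded". It is not how STRanger works and not a planned design; the `Call` class, the `classify` function, the copy numbers, and the threshold of 100 are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Call:
    locus: str
    motif_counts: dict  # motif -> number of copies observed in the allele


def classify(call, pathogenic_motif, threshold):
    """Label an allele by whether the pathogenic motif itself reaches the threshold."""
    count = call.motif_counts.get(pathogenic_motif, 0)
    total = sum(call.motif_counts.values())
    if count >= threshold:
        return "pathogenic_motif_expanded"
    if total >= threshold:
        # Long allele, but made of motifs other than the pathogenic one.
        return "expanded_other_motif"
    return "below_threshold"


# Made-up numbers: an allele expanded only in TAAAA, checked against TGGAA.
call = Call(locus="BEAN1", motif_counts={"TAAAA": 120, "TGGAA": 0})
print(classify(call, pathogenic_motif="TGGAA", threshold=100))  # expanded_other_motif
```

A result like `expanded_other_motif` would flag the BEAN1 situation described above for manual review instead of reporting it as pathogenic outright.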
