Building variant catalog from CRAM file error #199

Open
roel1289 opened this issue Nov 20, 2024 · 3 comments

@roel1289

Hello,

I am currently working on developing a variant catalog that contains every location of the (GGGAGA)* repeat across the whole human genome (a couple thousand locations).

To make this I am using CRAM files (aligned to hg38), and I am keeping track of the location of each GGGAGA repeat using string searches. I am then using these locations to produce a variant catalog. Here is an example of a shortened variant catalog I have made:

[
    {
        "LocusId": "SVA",
        "LocusStructure": "(GGGAGA)*",
        "ReferenceRegion": [
            "chr2:32916460-32916478",
            "chr12:48501530-48501548"
        ],
        "VariantType": "Repeat"
    }
]

Except I keep getting this error:
2024-11-20T13:41:25,[Error loading locus SVA: Locus SVA must specify reference regions for 1 variants]
How can I make a variant catalog like this without knowing the reference region from the reference genome?

Thanks!
Ross

@andreasssh

Hi there,

You should create a new entry for each locus, e.g.:

[
    {
        "LocusId": "locus_id_for_this_region",
        "LocusStructure": "(GGGAGA)*",
        "ReferenceRegion": "chr2:32916460-32916478",
        "VariantType": "Repeat"
    },
    {
        "LocusId": "and_locus_id_for_this_region",
        "LocusStructure": "(GGGAGA)*",
        "ReferenceRegion": "chr12:48501530-48501548",
        "VariantType": "Repeat"
    }
]
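
With a couple thousand locations, writing these entries by hand is not practical. The following is a minimal Python sketch of how a catalog with one entry per locus could be generated from a list of region strings. The `build_catalog` helper and the `LocusId` naming scheme (motif plus coordinates) are assumptions for illustration, not part of ExpansionHunter; any scheme works as long as each `LocusId` is unique.

```python
import json


def build_catalog(regions, motif="GGGAGA"):
    """Return one catalog entry per region (hypothetical helper)."""
    catalog = []
    for region in regions:
        # Assumed naming scheme: motif plus coordinates,
        # e.g. "GGGAGA_chr2_32916460_32916478".
        locus_id = f"{motif}_" + region.replace(":", "_").replace("-", "_")
        catalog.append({
            "LocusId": locus_id,
            "LocusStructure": f"({motif})*",
            # A single string per entry, not a list of regions.
            "ReferenceRegion": region,
            "VariantType": "Repeat",
        })
    return catalog


regions = ["chr2:32916460-32916478", "chr12:48501530-48501548"]
catalog = build_catalog(regions)
print(json.dumps(catalog, indent=4))
```

The printed JSON can be redirected to a file and passed to ExpansionHunter as the variant catalog.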

@roel1289
Author

Thank you for the help.
Is there any way to get around this error: [Error loading locus SVA: Flanks can contain at most 5 characters N but found 985 Ns]?

It seems that some people have made changes to get rid of this error (https://github.com/bw2/ExpansionHunter).
What is the best route to fix this error?

Thanks!

@andreasssh

Yep, both the bw2 version and my version (https://gitlab.com/andreassh/ExpansionHunter) can skip the loci that trigger this error and continue the analysis, instead of terminating like the original version does. The error occurs when a locus is close to a chromosome edge or in an unsequenced part of the reference genome, so its flanking sequence consists mostly of Ns.
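
If you would rather stay on the original version, another option is to drop such loci from the catalog before running it. Below is a rough Python sketch of that check. The helper name `flanks_have_too_many_ns` is hypothetical, the limit of 5 Ns is taken from the error message, and the flank length of 1000 bp, the 0-based half-open coordinates, and counting Ns across both flanks combined are all assumptions that should be checked against your ExpansionHunter settings. It takes the chromosome sequence as a plain string, so loading the reference is left to whatever FASTA reader you already use.

```python
def flanks_have_too_many_ns(chrom_seq, start, end, flank_len=1000, max_n=5):
    """Return True if the locus should be skipped (hypothetical helper).

    chrom_seq: chromosome sequence as a string.
    start, end: repeat region, assumed 0-based and half-open.
    flank_len: assumed flank size in bp.
    max_n: limit taken from the error message.
    """
    left = chrom_seq[max(0, start - flank_len):start]
    right = chrom_seq[end:end + flank_len]
    # A flank cut short by a chromosome edge is treated as unusable.
    truncated = len(left) < flank_len or len(right) < flank_len
    # Assumption: Ns are counted over both flanks together.
    n_count = (left + right).upper().count("N")
    return truncated or n_count > max_n


# Toy sequence: 10 Ns, 20 bp of flank, an 18 bp repeat, 20 bp of flank.
toy_seq = "N" * 10 + "ACGT" * 5 + "GGGAGA" * 3 + "ACGT" * 5
ok_with_short_flanks = not flanks_have_too_many_ns(toy_seq, 30, 48, flank_len=20)
skip_with_long_flanks = flanks_have_too_many_ns(toy_seq, 30, 48, flank_len=30)
```

Entries for which the check returns True can simply be left out of the catalog.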
