stoi error when running with large variant catalog #179

Open
uguenke opened this issue May 24, 2023 · 2 comments

Comments

@uguenke

uguenke commented May 24, 2023

Hi,

I am trying to run EH on WGS cram files, in GRCh38, with a large variant catalog (more than 6000 loci).
When I run it, I have this message and the process just stops:

/path/to/ExpansionHunter --reads /path/to/SAMPLE.cram --reference /path/to/reference/hg38.fa --variant-catalog /path/to/variant_catalog/variant_catalog_cgg_hg38.json --output-prefix SAMPLE
2023-05-24T10:05:14,[Starting ExpansionHunter v5.0.0]
2023-05-24T10:05:14,[Analyzing sample SAMPLE]
2023-05-24T10:05:14,[Initializing reference /path/to/reference/hg38.fa]
2023-05-24T10:05:14,[Loading variant catalog from disk /path/to/variant_catalog/variant_catalog_cgg_hg38.json
2023-05-24T10:05:14,[stoi]

I tried switching to streaming mode and adding more threads, but the error is the same.
When I split the catalog, the smaller catalogs (around 60 loci) run fine, but anything larger gives the same message.

Is there anything I can do? I am working on a shared computing server.

Thank you very much for your help

Kevin

@dwill023

Yeah, I'm getting the same [stoi] error, but my JSON file has only 20 LocusIds. Subsets of 6 loci, or of just 1 locus, run fine, but when I try to run all 20 I get the error.

I've attached my JSON. I've tried combing through it but can't see any issues.

Please advise
codis_strs.json

@dwill023

I've managed to solve the issue. One LocusId (below) has an incorrect last ReferenceRegion: its start coordinate has an extra digit "8" (2180148922 instead of 218014922). Once I fixed this, I was able to run the entire JSON.

{
    "LocusId": "D2S1338",
    "LocusStructure": "(GGAA)*GGAC(GGAA)*(GGCA)*",
    "ReferenceRegion": [
        "2:218014858-218014866",
        "2:218014870-218014922",
        "2:2180148922-218014950"
    ],
    "VariantType": [
        "Repeat",
        "Repeat",
        "Repeat"
    ]
}
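
For anyone hitting the same [stoi] message, here is a minimal Python sketch for scanning a catalog for this kind of typo before running ExpansionHunter. It is not part of ExpansionHunter, and the function name `check_catalog` is made up for illustration. It assumes the catalog is a JSON array of locus objects whose ReferenceRegion is either one "chrom:start-end" string or a list of them. The 32-bit limit is also an assumption: the bare "stoi" message is consistent with a C++ `std::stoi` call failing on a number too large for an `int`, and 2180148922 is larger than 2147483647, but how ExpansionHunter actually parses coordinates has not been confirmed in this thread.

```python
import json
import re

# Assumed upper bound: the largest value a signed 32-bit int can hold.
INT32_MAX = 2**31 - 1

# "chrom:start-end", e.g. "2:218014858-218014866".
REGION_RE = re.compile(r"^(.+):(\d+)-(\d+)$")


def check_catalog(catalog):
    """Return (locus_id, region, problem) tuples for suspicious regions."""
    problems = []
    for locus in catalog:
        locus_id = locus.get("LocusId", "<missing LocusId>")
        regions = locus.get("ReferenceRegion", [])
        # ReferenceRegion may be a single string or a list of strings.
        if isinstance(regions, str):
            regions = [regions]
        for region in regions:
            match = REGION_RE.match(region)
            if match is None:
                problems.append((locus_id, region, "not in chrom:start-end form"))
                continue
            start, end = int(match.group(2)), int(match.group(3))
            if start > INT32_MAX or end > INT32_MAX:
                problems.append((locus_id, region, "coordinate exceeds 32-bit int range"))
            elif start > end:
                problems.append((locus_id, region, "start is greater than end"))
    return problems


# The entry from this thread, inlined so the example runs on its own.
# For a real catalog, load it with json.load(open(path)) instead.
example = json.loads("""
[{
    "LocusId": "D2S1338",
    "LocusStructure": "(GGAA)*GGAC(GGAA)*(GGCA)*",
    "ReferenceRegion": [
        "2:218014858-218014866",
        "2:218014870-218014922",
        "2:2180148922-218014950"
    ],
    "VariantType": ["Repeat", "Repeat", "Repeat"]
}]
""")

for locus_id, region, problem in check_catalog(example):
    print(f"{locus_id}\t{region}\t{problem}")
```

The check only looks at the form and size of each coordinate. It does not verify that regions match the reference or the LocusStructure, so a catalog that passes can still be wrong in other ways.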
