Error in Clustering Step of LTR Pipeline #241
Comments
Sorry for the delay. If you still have these files, could you check that "JANCLY010000001" is in fact a sequence in your input file:
and that it also occurs in the twobit file:
When I download the assembly from the link you provided (GCF_025583915.1_AnoSag2.1_genomic.fna.gz) I do not see a sequence named JANCLY010000001. Did you alter the assembly in any way, or get it from a different source?
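For anyone wanting to run the FASTA half of this check themselves, here is a minimal Python sketch. It is not part of RepeatModeler; the function names are made up for illustration. It reports identifiers that equal the query or merely start with it, in case the name differs only by a version suffix. It does not inspect the twobit file.

```python
import gzip
import io


def header_id(line):
    """Return the first whitespace-delimited token of a FASTA header line."""
    fields = line[1:].split()
    return fields[0] if fields else ""


def find_ids(handle, query):
    """Return every sequence ID in the FASTA stream that equals or starts with query."""
    return [
        header_id(line)
        for line in handle
        if line.startswith(">") and header_id(line).startswith(query)
    ]


def open_fasta(path):
    """Open a plain or gzip-compressed FASTA file in text mode (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


if __name__ == "__main__":
    # Small in-memory example; replace with open_fasta("your_genome.fa") for a real file.
    demo = io.StringIO(">JANCLY010000001.1 example\nACGT\n>other\nTTGA\n")
    print(find_ids(demo, "JANCLY010000001"))
```

An empty result means the name does not occur as an identifier (or identifier prefix) in the file.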
For that genome I ended up using an older version of RepeatModeler (a standalone Singularity container), but that one gave me a segmentation fault on several assemblies (I believe related to this), so I'm still troubleshooting the TE Tools version. Yes, it is in the FASTA file (this is a different assembly, but the exact same error):
And the twobit file:
Oh, also, I forgot to mention that the error I initially got from the Anolis sagrei assembly happened because I was using the GenBank version, then switched to the RefSeq version, but forgot to remove the output directory before re-running RepeatModeler. The "clustering" error I submitted, however, occurs on several assemblies, and it doesn't seem to be related to size or content.
OK, I have tracked down the issue here. The problem is in LTR_retriever, which deals with long sequence identifiers (>13 characters) in a strange way. It attempts to truncate the identifiers (provided the truncated names still form a unique set for the genome) and warns in the log output that it worked around the issue on its own. The problem with that approach is that RepeatModeler doesn't know how to translate the shortened identifiers back to the full-length ones. I will have to add some code to RepeatModeler to fix this. In the meantime, the only ways to get this to work are to feed RepeatModeler only genomes whose sequence identifiers are <= 13 characters long, or to leave the LTR pipeline out of the run.
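As a stopgap along those lines, the following Python sketch lists identifiers longer than 13 characters and writes a copy of the FASTA with short replacement names, keeping a map back to the originals. This is only one possible workaround under stated assumptions, not the fix planned for RepeatModeler: the function names and the "seq" prefix are invented for illustration, every record is renamed (to avoid collisions with existing short names), and header descriptions are dropped.

```python
import io

MAX_ID_LEN = 13  # identifier length limit described in this thread


def header_id(line):
    """Return the first whitespace-delimited token of a FASTA header line."""
    fields = line[1:].split()
    return fields[0] if fields else ""


def too_long(handle, max_len=MAX_ID_LEN):
    """List the sequence IDs that exceed max_len characters."""
    return [
        header_id(line)
        for line in handle
        if line.startswith(">") and len(header_id(line)) > max_len
    ]


def rename_fasta(src, dst, max_len=MAX_ID_LEN, prefix="seq"):
    """Copy FASTA from src to dst with short IDs; return {short_id: original_id}."""
    mapping = {}
    count = 0
    for line in src:
        if line.startswith(">"):
            count += 1
            short = f"{prefix}{count}"
            if len(short) > max_len:
                raise ValueError(f"generated ID {short!r} exceeds {max_len} characters")
            mapping[short] = header_id(line)
            dst.write(f">{short}\n")
        else:
            dst.write(line)
    return mapping


if __name__ == "__main__":
    src = io.StringIO(">JANCLY010000001.1 example\nACGT\n>chr1\nTTGA\n")
    print(too_long(src))
    src.seek(0)
    out = io.StringIO()
    print(rename_fasta(src, out))
```

Saving the returned mapping (for example as a two-column text file) makes it possible to translate the short names in any downstream output back to the original identifiers.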
Hi, I encountered the same problem, but only when using TEtools 1.88.5. The container was set up using Docker Engine.
The LTR pipeline failed to run in TEtools 1.88.5, but it worked fine in both 1.88 and 1.89.2.
Describe the issue
I'm trying to run the LTR pipeline on its own to add to some libraries and then re-mask some genomes. So far, this is only happening with one genome. I use the -LTRStruct flag for RepeatModeler runs with no issues, so I'm not sure what the problem is here.
Reproduction steps
My exact commands are:
srun apptainer exec --bind=/projects:/projects /common/contrib/containers/tetools-v1.88.sif LTRPipeline ${species_name}.genome.fa -threads 40
I don't know how to reproduce this exactly. I ran it twice, and it failed with the same error message both times.
This is the genome that's giving an error: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_025583915.1/
Log output
Environment (please include as much of the following information as you can find out):
Using TETools apptainer on Slurm HPC
How did you install RepeatModeler? e.g. manual installation from repeatmasker.org, bioconda, the Dfam TE Tools container, or as part of another bioinformatics tool?
TE Tools v1.88
Which version of RepeatModeler do you have? The output of RepeatModeler without any options will be a help page with the version of the program displayed at the top.
Version 2.0.5
Which version of RepeatMasker is this RepeatModeler installation using? Have you installed RepBase RepeatMasker Edition for RepeatMasker, or the full Dfam database?
RepeatMasker 4.1.6
Operating system and version. The output of uname -a and lsb_release -a can be used to find this.
Linux wind 4.18.0-513.9.1.el8_9.x86_64 #1 SMP Thu Nov 16 10:29:04 EST 2023 x86_64 x86_64 x86_64 GNU/Linux