MAFFT failed while running RepeatModeler #256

sjd028 · 2024-07-15T14:49:47Z

Describe the bug
While running RepeatModeler, I consistently get an error at the point where the de novo LTR sequences found by LtrRetriever are aligned with MAFFT. The RepeatScout / Recon pipeline works and its sequences are included in the final consensus sequences, but the LTR pipeline seemingly fails after LtrRetriever completes: MAFFT does not run correctly, so the LTR sequences are missing from the final consensus sequences. I have run RepeatModeler several times on the same data and received the same error message each time. I have attached a screenshot of the error message. Here is the full line with the error message:

/opt/mafft/bin/mafft: line 2718: 2108323 Killed "$prefix/disttbfast" -q $npickup -E $cycledisttbfast -V "-"$gopdist -s $unalignlevel $legacygapopt $mergearg -W $tuplesize $termgapopt $outnum $addarg $add2ndhalfarg -C $numthreads-$numthreadstb $memopt $weightopt $treeinopt $treeoutopt $distoutopt $seqtype $model -g $gexp -f "-"$gop -Q $spfactor -h $aof $param_fft $algopt $treealg $scoreoutarg $anchoropt -x $maxanchorseparation $oneiterationopt < infile > pre 2>> "$progressfile"

To Reproduce
Genome I used: https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_000778455.1/

Making blast database:
singularity run $dfam BuildDatabase -name DmelFixDfamDb GCA_000778455.1_CA_8.2_MHAP_genomic.fna

Running RepeatModeler:
nohup singularity run $dfam RepeatModeler -database DmelFixDfamDb -threads 20 -LTRStruct >& run2.out &
(or, running without nohup, I receive the same error message:)
singularity run $dfam RepeatModeler -database PpecDfamDb -threads 20 -LTRStruct

Expected behavior
The final FASTA files with consensus families should include sequences from both the Recon/RepeatScout pipeline and the LTR pipeline, but I am not getting any LTR families. Since this genome was used for benchmarking in the publication that presented RepeatModeler2, I expected ~734 families; however, I get only ~430 families on every run.

Host system (please complete as much of the following information as you can find out):
This was run on a computing cluster running Linux. More info:
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: Rocky
Description: Rocky Linux release 8.9 (Green Obsidian)
Release: 8.9
Codename: GreenObsidian

Singularity version: apptainer version 1.3.1-1.el8
The singularity container was downloaded on July 2, 2024

RepeatModeler was run on a computing cluster using 1 node, 4 cores, and 3G per core. Job efficiency:
CPU Utilized: 2-02:29:28
CPU Efficiency: 81.56% of 2-13:54:16 core-walltime
Job Wall-clock time: 15:28:34
Memory Utilized: 9.82 GB
Memory Efficiency: 81.83% of 12.00 GB

asgray · 2024-09-24T22:16:08Z

Sorry about the delay, but if this is still an issue, it looks to me like MAFFT was killed by the system for running out of memory. I'd suggest using at least 8 GB per core, rather than 3.
Let me know if that works.
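
As a rough illustration of the diagnosis above, the sketch below checks a log for the shell's "Killed" notice from the MAFFT wrapper and works out the total memory a job gets for a given per-core allocation. The function names `mafft_was_killed` and `total_mem_gb` are hypothetical helpers, not part of RepeatModeler or MAFFT, and the log file name in the usage comment is taken from the redirect in the original command.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# Succeeds if the log contains a "Killed" notice emitted by the shell for a
# process started from the mafft wrapper script. This is the signature seen
# in the report; it is consistent with an out-of-memory kill but does not
# prove one on its own.
mafft_was_killed() {
    grep -Eq 'mafft: line [0-9]+: +[0-9]+ +Killed' "$1"
}

# Total job memory in GB: cores * GB-per-core.
total_mem_gb() {
    echo $(( $1 * $2 ))
}

# Example usage (commented out so the file can be sourced safely):
#   mafft_was_killed run2.out && echo "MAFFT helper was killed"
#   total_mem_gb 4 3   # allocation described in the report
#   total_mem_gb 4 8   # allocation with the suggested 8 GB per core
```

With the 4 cores described in the report, 3 GB per core gives the 12 GB total shown in the job efficiency summary, while 8 GB per core would give 32 GB. How the larger allocation is requested depends on the cluster's scheduler, which the report does not name.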

rmhubley · 2024-10-01T16:45:31Z

In addition to Anthony's suggested fix, I wanted to address the issue of fewer families with DMel. The RepeatModeler2 paper was based on version 2.0.0 of RepeatModeler, and much has changed in the software since then. Notably, the approach to masking families was greatly improved in later versions, reducing the number of redundant families produced while maintaining the overall annotation coverage of the genome. For example, using the paper-generated library for DMel ( https://github.com/jmf422/TE_annotation/tree/master/benchmark_libraries/RM2 [734 families] ), 32.45% of the genome is annotated. Generating a new library with 2.0.5 (seed "1570222393" -- NOTE: reproducibility in RepeatModeler depends on both the seed and the version), I obtain a library with 451 families, which masks 32.48% of the genome. The newer versions result in a significantly more concise library.
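
For anyone comparing family counts between runs, a minimal sketch of counting FASTA records in a consensus library follows. The helper name `count_families` is hypothetical, the file name in the usage comment is an assumption based on the database name used above, and the `ltr-` header prefix for LTR-pipeline families is likewise an assumption that should be checked against the actual headers in your output.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration only.

# Count FASTA records (one header line per family) in a library file.
# Optional second argument: an extended regex that the header line must match.
# With no second argument, every header is counted.
count_families() {
    local fasta=$1
    local pattern=${2:-}
    grep '^>' "$fasta" | grep -Ec -e "$pattern"
}

# Example usage (commented out; file name and prefix are assumptions):
#   count_families DmelFixDfamDb-families.fa            # all families
#   count_families DmelFixDfamDb-families.fa '^>ltr-'   # LTR-pipeline families only
```

A count of zero for the LTR pattern alongside a non-zero total would match the symptom described in the original report, where the RepeatScout / Recon families are present but the LTR families are not.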

rmhubley · 2024-10-01T18:53:58Z

This is more of a RepeatModeler issue, so I am moving it to the RepeatModeler GitHub issues page.

asgray self-assigned this Sep 24, 2024

rmhubley pinned this issue Oct 1, 2024

rmhubley transferred this issue from Dfam-consortium/TETools Oct 1, 2024
