Does -srand <number> option cause RepeatModeler to be deterministic? #203

Open
alankuo1 opened this issue Mar 22, 2023 · 6 comments
@alankuo1

I ran RepeatModeler twice on the same genome, passing the -srand option with the same value both times. However, the resulting repeat libraries differed in the number of elements. I wonder whether my understanding or expectation of -srand is incorrect. I expected identical outputs.

I run RepeatModeler from a Docker container. The command, which I ran twice, is:
shifter --image=docker:dfam/tetools:1.7 RepeatModeler -database Polarella_glacialis_CCMP2088 -threads 30 -srand 2756104381

The output file Polarella_glacialis_CCMP2088-families.fa has 1529 sequences in the 1st run, and 1510 sequences in the 2nd run.
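A difference in sequence counts alone does not show how far the two libraries diverge. The following is a minimal sketch, not part of RepeatModeler, that counts the records in two FASTA files and reports how many sequences are identical between them. The function names `read_fasta` and `compare_libraries` are hypothetical, and the comparison assumes that exact (case-insensitive) sequence identity is a useful first check, since family names may be numbered differently between runs.

```python
def read_fasta(path):
    """Return a list of (name, sequence) tuples from a FASTA file.

    The name is the first whitespace-delimited token of the header line.
    """
    records = []
    name = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records.append((name, "".join(chunks)))
                name = line[1:].split()[0]
                chunks = []
            elif name is not None:
                chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def compare_libraries(path_a, path_b):
    """Summarize record counts and exact sequence overlap of two libraries."""
    records_a = read_fasta(path_a)
    records_b = read_fasta(path_b)
    # Compare by sequence content, ignoring case and family names.
    seqs_a = {seq.upper() for _, seq in records_a}
    seqs_b = {seq.upper() for _, seq in records_b}
    return {
        "count_a": len(records_a),
        "count_b": len(records_b),
        "shared_sequences": len(seqs_a & seqs_b),
        "only_in_a": len(seqs_a - seqs_b),
        "only_in_b": len(seqs_b - seqs_a),
    }
```

Calling `compare_libraries` on the two `Polarella_glacialis_CCMP2088-families.fa` files (saved under separate, hypothetical directories such as `run1/` and `run2/`) would show whether the runs differ only in a few families or across most of the library.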

@rmhubley
Member

rmhubley commented Aug 4, 2023

I am not sure which versions of RepeatModeler (e.g., 2.0.3, 2.0.4) and RMBlast (e.g., 2.13.0, 2.14.0) the shifter image is using. There was a problem in RMBlast (fixed in 2.14.0) where it could generate slightly different (but equally scoring) alignments in a multi-threaded context. As a result, running RepeatModeler with more than one thread (e.g., -pa 10) could produce different results even when the same seed was used. If you upgrade to RepeatModeler 2.0.4 and RMBlast 2.14.0, this problem should go away.
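To illustrate why a fixed seed does not help with this kind of problem, here is a toy sketch. It is not RMBlast's actual code, and the hit records and function names are invented for illustration. When two candidate alignments have the same score, a "keep the first best hit seen" rule depends on the order in which worker threads deliver their results, which the random seed does not control. A tie-break on the alignment coordinates removes that dependence.

```python
import random


def best_hit_first_seen(hits):
    """Keep the first hit with the top score; the result depends on arrival order."""
    best = None
    for hit in hits:
        if best is None or hit["score"] > best["score"]:
            best = hit
    return best


def best_hit_stable(hits):
    """Break score ties by coordinates so arrival order does not matter."""
    return max(hits, key=lambda h: (h["score"], -h["start"], -h["end"]))


# Hypothetical hits: the first two are different alignments with equal scores.
HITS = [
    {"score": 250, "start": 100, "end": 400},
    {"score": 250, "start": 102, "end": 402},
    {"score": 180, "start": 90, "end": 300},
]


def simulate(pick, trials=50, seed=0):
    """Return the set of distinct winners over many arrival orders."""
    rng = random.Random(seed)
    outcomes = set()
    for _ in range(trials):
        arrival = HITS[:]
        # The shuffle stands in for threads finishing in different orders.
        rng.shuffle(arrival)
        best = pick(arrival)
        outcomes.add((best["start"], best["end"]))
    return outcomes
```

In this sketch, `simulate(best_hit_first_seen)` can return either of the two tied hits depending on arrival order, while `simulate(best_hit_stable)` always selects the same one. Small differences of this kind can propagate through later clustering steps and change the final number of families.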

@rmhubley rmhubley closed this as completed Aug 7, 2023
@alankuo1
Author

alankuo1 commented Aug 29, 2023 via email

@rdhayes

rdhayes commented Aug 29, 2023

Hello, our last test was with the v1.7 docker container described at https://github.com/Dfam-consortium/TETools

Alan, it appears that we should test with the more recent v1.85 release, which would upgrade rmblast from 2.13.0 to 2.14.0, according to that repo's changelog.

@alankuo1
Author

alankuo1 commented Aug 30, 2023 via email

@rdhayes

rdhayes commented Aug 30, 2023

Hello, we have confirmed that our most recent tests with non-deterministic results (22724 scaffolds, 4.2 Gbases) were done with the TETools container v1.85. That release's changelog indicates that we used:

  • rmblast 2.14.0
  • repeatmasker 4.1.5
  • repeatmodeler 2.0.4

@rmhubley
Member

rmhubley commented Sep 6, 2023

Do you have the log files from both runs? Also, if you share the sequence file, I could kick off a reproduction run on our servers to see if I can locate the issue.

@rmhubley rmhubley reopened this Sep 6, 2023