Distributed tskit #2747
jeromekelleher started this conversation in Ideas
Replies: 1 comment
-
Seeing as tsinfer is heading this way too, it makes sense to offer this method of parallelism in tskit. One option would be to make dask an optional dependency that is only imported as needed, with a friendly error message if it is missing. This would add a lot of cruft to the |
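To make the optional-dependency idea concrete, here is a minimal sketch of a lazy import helper. The name `import_optional` and the wording of the message are hypothetical and not part of tskit; the only library call used is the standard `importlib.import_module`.

```python
import importlib


def import_optional(module_name, feature):
    """Import an optional dependency on demand (hypothetical helper, not tskit API).

    Returns the imported module, or raises ImportError with a message that
    names the feature that needs the missing package.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{feature} requires the optional dependency '{module_name}'. "
            f"Install it (for example with 'pip install {module_name}') "
            "and try again."
        ) from err
```

A function that needs dask would then call something like `dask = import_optional("dask", "Distributed statistics")` as its first line, so importing tskit itself never touches dask.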
-
It turns out that with a few straightforward changes to the stats API (and perhaps other places) we can support not only thread-based parallelism, but full distributed computation using dask. Here's an example based on the divergence matrix computation (which is compute-intensive) in #2736:
This is pretty cool, but I don't think we should go all in and add Dask as a hard dependency in tskit. It's just too heavy and problematic a dependency, and would be really annoying for things like msprime that would never use it.
We could make a "tskit.distributed" (or something) package which provides wrappers like the one above for key functions that would benefit from this kind of distribution.
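As a rough illustration of what such a wrapper might do (this is not the example from the original post), the sketch below splits a set of window breakpoints into contiguous chunks, evaluates a statistic on each chunk in parallel, and concatenates the per-window results. Everything here is an assumption: `chunk_windows` and `map_over_windows` are hypothetical names, and a standard-library thread pool stands in for a dask scheduler so that the sketch runs without extra dependencies.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_windows(breakpoints, num_chunks):
    """Split window breakpoints into contiguous chunks.

    Adjacent chunks share their boundary breakpoint, so every original
    window appears in exactly one chunk. (Hypothetical helper.)
    """
    num_windows = len(breakpoints) - 1
    if num_windows < 1:
        raise ValueError("need at least two breakpoints")
    num_chunks = max(1, min(num_chunks, num_windows))
    base, extra = divmod(num_windows, num_chunks)
    chunks = []
    start = 0
    for j in range(num_chunks):
        # The first `extra` chunks take one additional window each.
        size = base + (1 if j < extra else 0)
        chunks.append(list(breakpoints[start : start + size + 1]))
        start += size
    return chunks


def map_over_windows(stat, breakpoints, num_chunks, executor=None):
    """Evaluate stat(chunk_breakpoints) on each chunk and concatenate.

    `stat` must return one value per window in its chunk. `executor` may be
    any concurrent.futures.Executor; a thread pool is used by default.
    """
    chunks = chunk_windows(breakpoints, num_chunks)
    if executor is None:
        with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
            results = list(pool.map(stat, chunks))
    else:
        results = list(executor.map(stat, chunks))
    return [value for part in results for value in part]
```

In tskit, `stat` would presumably wrap a windowed statistic evaluated on a tree sequence; that wiring, and the swap from a thread pool to a distributed scheduler, is left out because it depends on the API changes discussed above.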
Any thoughts?