Distributed tskit #2747

jeromekelleher (Maintainer) · 2023-04-25T08:37:39Z

It turns out that with a few straightforward changes to the stats API (and perhaps other places) we can support not only thread-based parallelism, but full distributed computation using Dask. Here's an example based on the divergence matrix computation (which is compute-intensive) in #2736:

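import dask
import msprime


def divergence_matrix(ts, chunks=16):
    # Each task computes the divergence matrix over one genomic interval.
    @dask.delayed
    def worker(ts, interval):
        return ts.divergence_matrix(windows=interval)

    # Split the sequence into `chunks` intervals using a private helper.
    work = ts._chunk_sequence_by_tree(chunks)

    # Wrap the tree sequence once so every task references the same delayed object.
    ts_delayed = dask.delayed(ts)
    tasks = [worker(ts_delayed, interval) for interval in work]
    results = dask.compute(*tasks)

    # Combine the per-interval matrices.
    return sum(results)


ts = msprime.sim_ancestry(5000, recombination_rate=1, sequence_length=1000, random_seed=2)
print(ts)
D = divergence_matrix(ts, 4)
print(D)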

This is pretty cool, but I don't think we should go all in and add Dask as a hard dependency in tskit. It's just too heavy and problematic a dependency, and would be really annoying for things like msprime that would never use it.

We could make a "tskit.distributed" (or something) package which provides wrappers like the one above for key functions that would benefit from this kind of distribution.
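To make the wrapper idea concrete without depending on Dask or tskit, here is a rough, standard-library-only sketch of the same split, map, and sum pattern. It swaps in `concurrent.futures.ThreadPoolExecutor` for Dask, splits by evenly spaced coordinates rather than by tree, and uses a stand-in per-interval function; `split_into_intervals`, `map_and_sum`, and `interval_length` are hypothetical names, not part of tskit.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_intervals(sequence_length, num_chunks):
    """Return num_chunks contiguous (left, right) intervals covering
    [0, sequence_length]. Hypothetical helper: evenly spaced, not by tree."""
    step = sequence_length / num_chunks
    edges = [i * step for i in range(num_chunks)] + [sequence_length]
    return [(edges[i], edges[i + 1]) for i in range(num_chunks)]


def map_and_sum(func, intervals, max_workers=4):
    """Apply func to each interval in a thread pool and add up the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(func, intervals))
    return sum(partials)


def interval_length(interval):
    """Stand-in for a per-interval statistic whose results are additive."""
    left, right = interval
    return right - left


intervals = split_into_intervals(1000, 4)
total = map_and_sum(interval_length, intervals)
print(intervals)
print(total)
```

A wrapper module could expose one such function per statistic and keep the choice of scheduler (threads, Dask, or something else) out of the core library.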

Any thoughts?

benjeffery (Maintainer) · 2023-04-25T09:17:39Z

Seeing as tsinfer is heading this way too, it makes sense to offer this method of parallelism in tskit.

One option would be to make Dask an optional dependency that is only imported as needed, with a friendly error message if it is missing. This would add a lot of cruft to the TreeSequence class though, so I'm leaning towards a separate module as you suggest.
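For illustration, the "import as needed with a friendly error message" option might look roughly like the sketch below. `require_optional` is a hypothetical helper, not an existing tskit function, and the install hint assumes the package name on PyPI matches the module name.

```python
import importlib


def require_optional(module_name, feature):
    """Import an optional dependency lazily, or fail with a readable message.

    Hypothetical helper: module_name is the module to import and feature is
    a short description of what needs it.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} requires the optional dependency '{module_name}'. "
            f"Install it with: pip install {module_name}"
        ) from exc


# Demonstrate the error path with a module name that does not exist.
try:
    require_optional("not_a_real_module_xyz", "Distributed statistics")
except ImportError as err:
    message = str(err)
    print(message)
```

A distributed wrapper would call `require_optional("dask", ...)` at the top of its body, so users who never touch it never pay for the import.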
