Chunking up large scans in memory #36

Open
sjperkins opened this issue Jul 3, 2019 · 0 comments
As it stands with the MS format, I think there are three approaches to creating in-memory windows:

  1. Create the full resolution window for the scan in memory and pack chunks of the MS into the scan. Then parallelise over the window baselines.

  2. Read the entire MS scan into memory, pack per-baseline windows and flag each window separately.

  3. The previous options assume a fairly general MS. There is an optimal path if we know that the MS is well-behaved, by which I mean:

    • TIME monotonically increases
    • all baselines are present for each unique TIME
    • baselines are ordered the same way for each TIME value

    Then it should be possible to interleave baseline reads from different timesteps and avoid holding an entire scan in memory.
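To illustrate the well-behaved fast path in (3): under those three conditions, the rows for baseline `b` sit at `b, b + nbl, b + 2*nbl, ...`, so each per-baseline window can be gathered with a strided read rather than holding the whole scan. A minimal numpy sketch, with illustrative shapes and a hypothetical `baseline_window` helper (not an API in this repository):

```python
import numpy as np

# Illustrative dimensions for a single well-behaved scan
ntime, nbl, nchan = 5, 3, 4
nrow = ntime * nbl

# Stand-in for the scan's DATA column, in MS row order:
# all baselines for time 0, then all baselines for time 1, ...
data = np.arange(nrow * nchan, dtype=np.float64).reshape(nrow, nchan)

def baseline_window(data, b, nbl):
    """Gather the (ntime, nchan) window for baseline b.

    Because baselines are identically ordered within each TIME,
    a simple stride of nbl rows selects exactly baseline b's rows.
    """
    return data[b::nbl]

# Each baseline's window is recoverable independently,
# without the full scan ever being packed in memory at once
for b in range(nbl):
    window = baseline_window(data, b, nbl)
    assert window.shape == (ntime, nchan)
```

In practice the strided selection would be issued as interleaved reads against the MS on disk rather than against an in-memory array, which is where the memory saving comes from.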

(1) is the current approach. Whichever way we go about it, we need an entire scan's worth of memory (in either MS or window format) in order to perform the packing.
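A minimal numpy sketch of approach (1): allocate the full-resolution window for the scan up front, then scatter MS row chunks into it using TIME and (ANTENNA1, ANTENNA2) indices. The shapes and the row-chunk size here are illustrative, not taken from the codebase:

```python
import numpy as np

# Illustrative scan dimensions
ntime, nchan, ncorr = 10, 4, 2
ant1, ant2 = np.triu_indices(4, 1)          # 6 baselines from 4 antennas
nbl = ant1.size
nrow = ntime * nbl

# Simulated MS-format columns for one scan
time = np.repeat(np.arange(ntime, dtype=np.float64), nbl)
antenna1 = np.tile(ant1, ntime)
antenna2 = np.tile(ant2, ntime)
data = np.random.rand(nrow, nchan, ncorr)

# Map each MS row to a (baseline index, time index) pair
utime, time_inv = np.unique(time, return_inverse=True)
ubl, bl_inv = np.unique(np.stack([antenna1, antenna2], axis=1),
                        axis=0, return_inverse=True)

# Approach (1): full-resolution window for the entire scan
window = np.zeros((len(ubl), len(utime), nchan, ncorr), dtype=data.dtype)

# Pack chunks of MS rows into the window as they arrive
chunk = 16
for start in range(0, nrow, chunk):
    sel = slice(start, start + chunk)
    window[bl_inv[sel], time_inv[sel]] = data[sel]

# window[b] is now a complete (time, chan, corr) window for baseline b,
# so flagging can be parallelised over the leading baseline axis
```

The packing itself is chunk-friendly, but the destination `window` array must cover the whole scan, which is the memory cost described above.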

(2) might actually result in smaller chunks, as we can chunk over both MS rows and per-baseline windows. But we'd still need all MS rows for the scan in memory, as we don't know in advance which rows contribute to a given window.
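Approach (2) can be sketched as follows: with the whole scan resident, select each baseline's rows, pack them into one small window, flag it, and scatter the flags back to MS row order. The threshold flagger and all names here are stand-ins for illustration, not the repository's actual flagging code:

```python
import numpy as np

# Illustrative scan dimensions
ntime, nchan, ncorr = 8, 4, 2
ant1, ant2 = np.triu_indices(3, 1)          # 3 baselines from 3 antennas
nbl = ant1.size
nrow = ntime * nbl

# Simulated MS columns: the entire scan held in memory
time = np.repeat(np.arange(ntime, dtype=np.float64), nbl)
antenna1 = np.tile(ant1, ntime)
antenna2 = np.tile(ant2, ntime)
data = np.random.rand(nrow, nchan, ncorr)

flags = np.zeros(data.shape, dtype=bool)

# Approach (2): pack and flag one per-baseline window at a time
for b in range(nbl):
    rows = np.where((antenna1 == ant1[b]) & (antenna2 == ant2[b]))[0]
    window = data[rows]                      # (ntime, nchan, ncorr) window
    bl_flags = np.abs(window) > 0.99         # stand-in threshold flagger
    flags[rows] = bl_flags                   # scatter flags back to MS rows
```

Only one baseline window is materialised at a time, but `data` still spans the full scan, matching the caveat above that all MS rows must be in memory before any window can be assembled.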
