Chunking up large scans in memory #36

Open
sjperkins opened this issue Jul 3, 2019 · 0 comments
As it stands with the MS format, I think there are three approaches to creating in-memory windows:

  1. Create the full resolution window for the scan in memory and pack chunks of the MS into the scan. Then parallelise over the window baselines.

  2. Read the entire MS scan into memory, pack per-baseline windows and flag each window separately.

  3. The previous options assume a fairly general MS. There is an optimal path if we know that the MS is well-behaved, by which I mean:

    • TIME monotonically increases
    • all baselines are present for each unique TIME
    • baselines are ordered the same way for each TIME value

    Then it should be possible to interleave baseline reads from different timesteps and avoid holding an entire scan in memory.
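To illustrate the well-behaved fast path in (3): under those three conditions, the rows for baseline `b` sit at `b, b + nbl, b + 2*nbl, ...`, so each per-baseline window can be gathered with a strided read rather than holding the whole scan. A minimal numpy sketch, with illustrative shapes and a hypothetical `baseline_window` helper (not an API in this repository):

```python
import numpy as np

# Illustrative dimensions for a single well-behaved scan
ntime, nbl, nchan = 5, 3, 4
nrow = ntime * nbl

# Stand-in for the scan's DATA column, in MS row order:
# all baselines for time 0, then all baselines for time 1, ...
data = np.arange(nrow * nchan, dtype=np.float64).reshape(nrow, nchan)

def baseline_window(data, b, nbl):
    """Gather the (ntime, nchan) window for baseline b.

    Because baselines are identically ordered within each TIME,
    a simple stride of nbl rows selects exactly baseline b's rows.
    """
    return data[b::nbl]

# Each baseline's window is recoverable independently,
# without the full scan ever being packed in memory at once
for b in range(nbl):
    window = baseline_window(data, b, nbl)
    assert window.shape == (ntime, nchan)
```

In practice the strided selection would be issued as interleaved reads against the MS on disk rather than against an in-memory array, which is where the memory saving comes from.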

(1) is the current approach. Whichever way we go about it, we need an entire scan's worth of memory (in either MS or window format) in order to perform the packing.
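A minimal numpy sketch of approach (1): allocate the full-resolution window for the scan up front, then scatter MS row chunks into it using TIME and (ANTENNA1, ANTENNA2) indices. The shapes and the row-chunk size here are illustrative, not taken from the codebase:

```python
import numpy as np

# Illustrative scan dimensions
ntime, nchan, ncorr = 10, 4, 2
ant1, ant2 = np.triu_indices(4, 1)          # 6 baselines from 4 antennas
nbl = ant1.size
nrow = ntime * nbl

# Simulated MS-format columns for one scan
time = np.repeat(np.arange(ntime, dtype=np.float64), nbl)
antenna1 = np.tile(ant1, ntime)
antenna2 = np.tile(ant2, ntime)
data = np.random.rand(nrow, nchan, ncorr)

# Map each MS row to a (baseline index, time index) pair
utime, time_inv = np.unique(time, return_inverse=True)
ubl, bl_inv = np.unique(np.stack([antenna1, antenna2], axis=1),
                        axis=0, return_inverse=True)

# Approach (1): full-resolution window for the entire scan
window = np.zeros((len(ubl), len(utime), nchan, ncorr), dtype=data.dtype)

# Pack chunks of MS rows into the window as they arrive
chunk = 16
for start in range(0, nrow, chunk):
    sel = slice(start, start + chunk)
    window[bl_inv[sel], time_inv[sel]] = data[sel]

# window[b] is now a complete (time, chan, corr) window for baseline b,
# so flagging can be parallelised over the leading baseline axis
```

The packing itself is chunk-friendly, but the destination `window` array must cover the whole scan, which is the memory cost described above.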

(2) might actually result in smaller chunks, as we can chunk over both MS rows and per-baseline windows. But we'd still need all MS rows for the scan in memory, as we don't know in advance which rows contribute to a given window.
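Approach (2) can be sketched as follows: with the whole scan resident, select each baseline's rows, pack them into one small window, flag it, and scatter the flags back to MS row order. The threshold flagger and all names here are stand-ins for illustration, not the repository's actual flagging code:

```python
import numpy as np

# Illustrative scan dimensions
ntime, nchan, ncorr = 8, 4, 2
ant1, ant2 = np.triu_indices(3, 1)          # 3 baselines from 3 antennas
nbl = ant1.size
nrow = ntime * nbl

# Simulated MS columns: the entire scan held in memory
time = np.repeat(np.arange(ntime, dtype=np.float64), nbl)
antenna1 = np.tile(ant1, ntime)
antenna2 = np.tile(ant2, ntime)
data = np.random.rand(nrow, nchan, ncorr)

flags = np.zeros(data.shape, dtype=bool)

# Approach (2): pack and flag one per-baseline window at a time
for b in range(nbl):
    rows = np.where((antenna1 == ant1[b]) & (antenna2 == ant2[b]))[0]
    window = data[rows]                      # (ntime, nchan, ncorr) window
    bl_flags = np.abs(window) > 0.99         # stand-in threshold flagger
    flags[rows] = bl_flags                   # scatter flags back to MS rows
```

Only one baseline window is materialised at a time, but `data` still spans the full scan, matching the caveat above that all MS rows must be in memory before any window can be assembled.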
