
Add support for zip #23

Open
mxmlnkn opened this issue Sep 23, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

mxmlnkn (Owner) commented Sep 23, 2023

This might make mxmlnkn/ratarmount#105 more performant. Implementing it purely on the ratarmount side would result in having to open one ParallelGzipReader for each compressed file. This can quickly lead to memory overload and thread contention because each of these ParallelGzipReaders would have its own cache and prefetcher.

One fix might be to make it possible to share the ThreadPool and the chunk cache. However, this would complicate the implementation and make it harder to reason about.

The other idea would be some kind of "native" support for zip in ParallelGzipReader. In theory, it should be possible to provide a StenciledFileReader that cuts out all zip headers and footers and concatenates only the raw deflate streams into one large stream. This could actually also be done directly in ratarmount! It might even be sufficiently fast! Maybe that is already the solution.
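
A minimal sketch of the stencil computation, assuming the standard-library zipfile module and only deflate-compressed members (no encryption); the function name deflate_stream_spans is illustrative, not an existing API:

```python
import struct
import zipfile


def deflate_stream_spans(archive_path):
    """Return (offset, size) stencils covering only the raw deflate data of each
    member, i.e., skipping local file headers, data descriptors, and the central
    directory. Concatenating these byte ranges lines up the raw deflate streams
    back to back."""
    spans = []
    with zipfile.ZipFile(archive_path) as archive, open(archive_path, "rb") as raw:
        for info in archive.infolist():
            if info.compress_type != zipfile.ZIP_DEFLATED:
                continue  # Stored or otherwise compressed members would need separate handling.
            # The central directory only records where the local file header starts.
            # Parse the fixed 30-byte part of that header to get the variable-length
            # field sizes, which may differ from those in the central directory.
            raw.seek(info.header_offset)
            header = raw.read(30)
            name_length, extra_length = struct.unpack("<HH", header[26:30])
            data_offset = info.header_offset + 30 + name_length + extra_length
            spans.append((data_offset, info.compress_size))
    return spans
```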

The other idea was to provide these stencils as an imported index, which should be constructible purely from the zip metadata. In this case, one chunk would correspond to one file and the back-references would be empty because each file is compressed independently. However, this would not enable parallel decompression of large members, and many small members might also be problematic, especially because the cache can only hold a fixed number of chunks instead of enforcing a memory limit.

Therefore, it would be necessary to split large chunks and join small ones, either during import (which unfortunately would not be on-the-fly) or on first touch, which would be very complex to implement, especially as the chunk index database was designed to be created in a streaming manner from lowest to highest offset and then never change. It might be hard to change this assumption error-free at all code locations, even though there are a lot of tests.
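
For illustration, a rough sketch of what such an imported index could contain, assuming one chunk per member and using only zip metadata. ChunkInfo and its fields are illustrative, not rapidgzip's actual index format:

```python
import zipfile
from dataclasses import dataclass
from typing import List


@dataclass
class ChunkInfo:
    """One chunk per zip member. Compressed offsets are positions inside the
    stenciled stream of concatenated raw deflate data (see sketch above)."""
    compressed_offset: int    # Start of the member's deflate data in the stenciled stream.
    compressed_size: int      # info.compress_size
    decompressed_offset: int  # Running sum of uncompressed member sizes.
    decompressed_size: int    # info.file_size
    window: bytes = b""       # Empty back-reference window: members are compressed independently.


def chunk_index_from_zip(archive_path) -> List[ChunkInfo]:
    chunks = []
    compressed_offset = 0
    decompressed_offset = 0
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            if info.compress_type != zipfile.ZIP_DEFLATED:
                continue
            chunks.append(ChunkInfo(compressed_offset, info.compress_size,
                                    decompressed_offset, info.file_size))
            compressed_offset += info.compress_size
            decompressed_offset += info.file_size
    return chunks
```

This also makes the drawbacks visible: a multi-gigabyte member stays a single chunk with no internal parallelism, while an archive with millions of tiny members produces millions of tiny chunks.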
