Add an out of memory dataset #831
Conversation
This PR closes #830
For a proper .tif with internal block sizes, we see a memory leak within trainer.predict!

[Figure: memory usage of the out-of-memory dataset]

[Figure: memory usage of the in-memory dataset]

**These results show that the in-memory dataset releases memory during trainer.predict, whereas the out-of-memory dataset gains an enormous amount of memory.**
Success! Closing the rasterio dataset on each step fixes the leak and only slows prediction down a bit.
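A minimal sketch of the close-per-step pattern described above, assuming a PyTorch-style dataset (the class name and structure here are illustrative, not the PR's actual code):

```python
import rasterio
from torch.utils.data import Dataset


class OutOfMemoryDataset(Dataset):
    """Read one raster block per step, closing the file handle each time."""

    def __init__(self, raster_path):
        self.raster_path = raster_path
        # Enumerate the block windows once; keep only the lightweight window list.
        with rasterio.open(raster_path) as src:
            self.windows = [window for _, window in src.block_windows(1)]

    def __len__(self):
        return len(self.windows)

    def __getitem__(self, idx):
        # Open and close the dataset on every step so cached blocks are
        # released instead of accumulating across trainer.predict batches.
        with rasterio.open(self.raster_path) as src:
            return src.read(window=self.windows[idx])
```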
The reasons not to have only an out-of-memory dataset are: 1) it doesn't allow parallelization, since rasterio is not thread safe, and 2) it only works well on tiled objects; for untiled objects such as .JPGs, it is slower, since it loads the entire image on every read (see the check below).
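For reference, rasterio exposes enough metadata to tell which case a file falls into; a quick sketch (the path is a placeholder):

```python
import rasterio

with rasterio.open("my_raster.tif") as src:  # placeholder path
    print(src.is_tiled)      # True for internally tiled GeoTIFFs
    print(src.block_shapes)  # per-band (height, width) of the natural blocks;
                             # full-width strips indicate an untiled file
```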
In response to #830 and #829, I think it was time to make an out-of-memory prediction tool for when rasters are too large to read into memory all at once, i.e. when opening the raster itself would yield a memory error. It doesn't (yet) address any memory load within the existing TileDataset class. I expect that reading blocks through rasterio will be slower than reading from memory, and since rasterio is not thread safe, workers is set to 0. Not that we have in general seen much difference between workers = 0 and workers = 1 with multiprocessing.
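Building on the sketch above, wiring such a dataset into a loader with workers pinned to 0 might look like this (names and paths are illustrative, not the PR's implementation):

```python
from torch.utils.data import DataLoader

dataset = OutOfMemoryDataset("large_raster.tif")  # placeholder path
# num_workers=0: blocks are read in the main process because rasterio
# is not thread safe; batch_size=1 also avoids collating edge blocks
# that have different shapes.
loader = DataLoader(dataset, batch_size=1, num_workers=0)

for batch in loader:
    ...  # each batch holds one block read lazily from disk
```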