Loading BOM data into ram #41

Open
cameronko opened this issue Dec 23, 2022 · 3 comments

Comments

@cameronko

Hello,

I am trying to train this model on Australia's BOM radar data, but I am having trouble loading the data into memory.

I have one year's worth of data in netCDF4 format at 5-minute time steps. Each time step is a separate NC file. For example, the path to the precipitation field for 01/01/2022 at 12:30 pm is: BOM Rain Rate Data 2022 (folder) > 20220101 (folder) > 20220101_123000.nc. The precipitation field is stored as an array of int64 values in mm/h under a variable called 'rain_rate' in the netCDF file. I have tried the netCDF4 and xarray libraries for Python and receive an OOM error.

The problem is that loading all of the available 2022 data (~300 days) would require approximately 180 GB of RAM, which I do not have. The netCDF files must be compressed, as the 2022 data takes up only ~5 GB on disk.
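For a rough sense of where a figure like 180 GB comes from, the uncompressed size is just the number of time steps times the grid size times the bytes per value. The sketch below is only an estimate: the 512 x 512 grid is a hypothetical size (the grid dimensions are not stated here), and `in_memory_bytes` is an illustrative helper, not part of any library.

```python
# Rough estimate of the uncompressed in-memory size of the radar archive.
# ASSUMPTION: the 512 x 512 grid used in the example call is hypothetical;
# substitute the real dimensions of the 'rain_rate' variable.

STEPS_PER_DAY = 24 * 60 // 5  # one file every 5 minutes


def in_memory_bytes(days: int, height: int, width: int, itemsize: int = 8) -> int:
    """Bytes needed to hold every frame at once (itemsize=8 for int64)."""
    return days * STEPS_PER_DAY * height * width * itemsize


if __name__ == "__main__":
    as_int64 = in_memory_bytes(300, 512, 512)                # int64, 8 bytes per value
    as_float32 = in_memory_bytes(300, 512, 512, itemsize=4)  # float32, 4 bytes per value
    print(f"int64:   {as_int64 / 1e9:.1f} GB")
    print(f"float32: {as_float32 / 1e9:.1f} GB")
```

Under that assumed grid, a narrower dtype halves the footprint, but it would still be far larger than the ~5 GB on disk, so reading frames on demand matters more than the dtype.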

How would I go about efficiently loading all this data and passing it into the DGMR?

Thanks for your help.

@jacobbieker
Member

Hey, sorry for the delay, I just missed this issue. I wouldn't load it all into RAM at once. For training, we lazily load the data we need from either the UK Nimrod dataset or US MRMS, so only the small examples are in memory at any given time. We tend to use Zarr and xarray, which work fairly well for that.
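As a minimal sketch of the lazy loading described above, adapted to the one-file-per-time-step layout from the question: build an index of file paths up front and read a file only when a training example asks for it. This is not the loader used for Nimrod or MRMS. `frame_path`, `read_rain_rate` and `LazyRadarSequences` are hypothetical names, the `window` length is an assumption to be set to whatever sequence length the model expects, and the cast to float32 is an illustrative choice.

```python
from datetime import datetime
from pathlib import Path
from typing import Callable, Iterable, List


def frame_path(root: Path, when: datetime) -> Path:
    """Path of one time step, e.g. root/20220101/20220101_123000.nc."""
    return root / when.strftime("%Y%m%d") / when.strftime("%Y%m%d_%H%M%S.nc")


def read_rain_rate(path: Path):
    """Read the 'rain_rate' variable of a single file into memory."""
    import xarray as xr  # imported here so the index can be built without it

    with xr.open_dataset(path) as ds:
        # ASSUMPTION: float32 is precise enough; it halves the int64 footprint.
        return ds["rain_rate"].astype("float32").values


class LazyRadarSequences:
    """Windows of consecutive frames; files are read only when indexed.

    ASSUMPTION: the sorted paths are contiguous in time. Gaps in the
    archive are not detected by this sketch.
    """

    def __init__(
        self,
        paths: Iterable[Path],
        window: int,
        loader: Callable[[Path], object] = read_rain_rate,
    ) -> None:
        self.paths: List[Path] = sorted(paths)  # timestamped names sort in time order
        self.window = window
        self.loader = loader

    def __len__(self) -> int:
        return max(0, len(self.paths) - self.window + 1)

    def __getitem__(self, i: int) -> list:
        if not 0 <= i < len(self):
            raise IndexError(i)
        # Only `window` files are opened, so memory use is independent of
        # the total size of the archive.
        return [self.loader(p) for p in self.paths[i : i + self.window]]
```

A usage sketch would be `LazyRadarSequences(Path("BOM Rain Rate Data 2022").glob("*/*.nc"), window=...)`, with the resulting object wrapped in whatever batching the training loop uses. If dask is installed, `xarray.open_mfdataset` is another way to open many files lazily, and converting the archive to a single Zarr store once is closer to the Zarr and xarray setup mentioned above.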

@peterdudfield
Contributor

@all-contributors please add @primeoc for question

@allcontributors
Contributor

@peterdudfield

I've put up a pull request to add @primeoc! 🎉
