I am trying to train this model on Australia's BOM Radar Data, however, I am having trouble loading the data into memory.
I have 1 year's worth of data in netCDF4 format at time steps of 5 minutes. Each time step is a separate NC file. The file structure to access the precipitation field at 01/01/2022 at 12:30pm would be: BOM Rain Rate Data 2022 (folder) > 20220101 (folder) > 20220101_123000.nc (the precipitation field is stored as an array of int64 values in mm/h under a variable called 'rain_rate' in the netCDF file). I have tried the netCDF4 and xarray libraries for Python and receive an OOM error.
The problem is that loading all the available 2022 data (~300 days) would require approximately 180 GB of RAM, which I do not have. netCDF must be compressing the data, as the 2022 data only takes ~5 GB on disk.
How would I go about efficiently loading all this data and passing it into the DGMR?
Thanks for your help.
Hey, sorry for the delay, I just missed this issue. I wouldn't load it all into RAM at once. For training, we lazily load the data we need from either the UK Nimrod dataset or US MRMS, so we only have the small examples in memory at any given time. We tend to use Zarr and xarray, which work fairly well for doing that, but yeah, not loading it all into memory at once.
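A minimal sketch of that lazy-loading idea, assuming xarray is installed: index all the file paths up front, but only read one time step from disk when it is requested. The `LazyRadarDataset` class and the directory layout are illustrative, not part of the DGMR codebase, and `int32` stands in here for the `int64` `rain_rate` field in the real data.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Build a tiny synthetic tree mirroring the layout from the issue:
# <root>/20220101/20220101_HHMMSS.nc, each file one 'rain_rate' frame.
root = tempfile.mkdtemp()
day = os.path.join(root, "20220101")
os.makedirs(day)
for hhmmss in ("120000", "120500", "121000"):
    frame = xr.DataArray(
        np.zeros((4, 4), dtype=np.int32), name="rain_rate"
    )
    frame.to_netcdf(os.path.join(day, f"20220101_{hhmmss}.nc"))


class LazyRadarDataset:
    """Hold only file paths in memory; read a single time step on demand.

    Peak memory is one frame, not the full ~180 GB year.
    """

    def __init__(self, root):
        self.paths = sorted(
            os.path.join(dirpath, fname)
            for dirpath, _, fnames in os.walk(root)
            for fname in fnames
            if fname.endswith(".nc")
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        # Open, pull just this frame into memory, and close the file.
        with xr.open_dataset(self.paths[i]) as ds:
            return ds["rain_rate"].values


radar = LazyRadarDataset(root)
print(len(radar), radar[0].shape)
```

A class like this slots straight into a training loop (e.g. wrapped in a PyTorch `DataLoader`, which calls `__getitem__` per batch). If per-file open overhead becomes a bottleneck, one option is consolidating the year into a single chunked Zarr store once (e.g. `xr.open_mfdataset(...)` followed by `.to_zarr(...)`) and then slicing the lazy, dask-backed arrays during training.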