In issue #12 we discussed the prospect of caching grids in an efficient binary format such as .npy files. However, that issue implied that the user would do the caching on a per-subset basis. Here we instead propose storing an entire grid once, in an efficient format.
Open Questions: Which storage format do we adopt (.npy, .h5, .parquet, .feather, etc.)? Where do we store it? Strong preference for Zenodo.
Requirements:
The format must be able to store metadata (teff, logg, Z, fsed, etc.)
Should be binary (small on disk and fast to read) and should support efficient columnar data access
Format should be stable and have longevity
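To make the metadata and columnar-access requirements concrete, here is a minimal sketch using only the Python standard library. It is not a proposed format: the on-disk layout, the `.json` sidecar, and the names `write_grid` and `read_column` are all hypothetical, and a real candidate (.h5, .parquet, etc.) would provide the same properties natively. Each grid point is one column of float64 fluxes, and a JSON index records that column's metadata and byte offset so a single column can be read without touching the rest of the file.

```python
import json
from array import array
from pathlib import Path


def write_grid(path, columns):
    """Write flux columns to one binary file plus a JSON metadata index.

    `columns` is a list of (metadata_dict, flux_values) pairs, one per
    grid point. Hypothetical layout, for illustration only.
    """
    index = []
    with open(path, "wb") as f:
        for meta, flux in columns:
            data = array("d", flux)  # float64, native byte order
            # Record where this column starts so it can be read on its own.
            index.append({"meta": meta, "offset": f.tell(), "n": len(data)})
            data.tofile(f)
    Path(str(path) + ".json").write_text(json.dumps(index))


def read_column(path, **wanted):
    """Return the first column whose metadata matches every keyword given."""
    index = json.loads(Path(str(path) + ".json").read_text())
    for entry in index:
        if all(entry["meta"].get(k) == v for k, v in wanted.items()):
            data = array("d")
            with open(path, "rb") as f:
                f.seek(entry["offset"])  # jump straight to the column
                data.fromfile(f, entry["n"])
            return list(data)
    raise KeyError(f"no grid point matching {wanted}")
```

With something like this, `read_column("grid.bin", teff=1100, logg=4.0)` reads only the bytes of the matching grid point, which is the access pattern the requirements ask the chosen format to make cheap.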
Pros:
Users will only have to download once
Significantly faster IO
Storage impact is significantly lower
Easier for gollum developers to support cross-platform data downloads
Cons:
Must be managed by gollum maintainers instead of by grid creators/users
Our secondary copies could diverge from the grid creators' primary source files, so QA is required
Use case workflow:
User installs gollum, says "I want to work with Sonora Diamondback"
User runs SonoraDiamondbackSpectrum.download_grid(location=)
A heavily compressed archive is stored somewhere online
The function fetches the archive and unpacks it into the directory the user specified with location
Now when the user asks for a Sonora spectrum, all we do is pick out the columns (grid points) of our dataframe that we want and load them (very fast, because the data layout has been optimized for this operation)
Same with grid loading, presto
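The download step of this workflow could look roughly like the sketch below. It is a standalone function, not the `SonoraDiamondbackSpectrum.download_grid` method itself, and it rests on assumptions not settled in this issue: the archive is a zip file, the `url` parameter and the `grid_archive.zip` file name are hypothetical, and no real Zenodo address is implied.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def download_grid(url, location):
    """Fetch a zipped grid archive from `url` and unpack it into `location`.

    Hypothetical helper: `url` would point at wherever the maintainers
    end up hosting the archive.
    """
    location = Path(location).expanduser()
    location.mkdir(parents=True, exist_ok=True)
    archive = location / "grid_archive.zip"  # assumed local file name

    # Download only if the archive is not already present (one-time cost).
    if not archive.exists():
        with urllib.request.urlopen(url) as response, open(archive, "wb") as out:
            shutil.copyfileobj(response, out)

    # Unpack next to the archive; later spectrum loads read from here.
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(location)
    return location
```

Calling it a second time with the same `location` skips the network transfer, which matches the "users will only have to download once" goal. A real implementation would likely also verify a checksum, since the archive is a secondary copy of the original grid.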