Load hdf5 dataset > 2GB #13
I don't know if this component is still actively developed. You might also want to consider alternatives such as BDV-HDF5 or N5. For what it's worth, here's a Groovy script I once created to read large HDF5 files in chunks of 2 GB:

#@ File (label = "HDF5 File Input", style = "extensions:h5/hdf5") h5file
#@ Boolean (label = "Automatic Chunk Size") autoChunkSize
#@ String (visibility = MESSAGE, persist = false, value = "If checked, the following value is ignored") msg
#@ Integer (label = "Chunk Size (number of time points)", min = 1, value = 1000) chunk
#@output imgs
#@ LogService log
import ch.systemsx.cisd.hdf5.HDF5Factory
import ch.systemsx.cisd.hdf5.HDF5DataClass
import ij.ImagePlus
import ij.process.ShortProcessor
import net.imglib2.img.array.ArrayImgs
reader = HDF5Factory.openForReading(h5file)
info = reader.getDataSetInformation("/images")
log.info("Dataset found: $info")
// Make sure we have uint16
assert(!info.getTypeInformation().isSigned()) // u
assert(info.getTypeInformation().getDataClass() == HDF5DataClass.INTEGER) // int
assert(info.getTypeInformation().getElementSize() == 2) // 16
// Make sure we have 3 dimensions (tyx)
dims = info.getDimensions()
assert(dims.length == 3)
// automatically determine optimal chunk size
final twoGiga = 2L * 1024 * 1024 * 1024
// 2 GiB / 2 bytes per uint16 sample / width / height = time points per chunk
optimalChunkSize = twoGiga / (16/8) / dims[2] / dims[1]
log.info("Optimal chunk size: $optimalChunkSize")
if (autoChunkSize) {
    chunk = optimalChunkSize as int
}
log.info("Using chunk size $chunk")
// Round up so a trailing partial chunk is included (and no spurious extra chunk is read)
numberOfChunks = Math.ceil(dims[0] / chunk) as int
log.info("Creating $numberOfChunks chunks in total")
imgs = []
numberOfChunks.times { index ->
    log.info("Reading chunk ${index+1}")
    // Block number [index, 0, 0] addresses the index-th slab of `chunk` time points
    shortArray = reader.uint16().readMDArrayBlock("/images", [chunk, dims[1], dims[2]] as int[], [index, 0, 0] as long[])
    // Create ArrayImg from MDShortArray
    aDims = []
    shortArray.dimensions().each { d ->
        aDims << d
    }
    // Reverse the dimensions: HDF5 reports them slowest-first (t, y, x), imglib2 expects fastest-first (x, y, t)
    imgs << ArrayImgs.unsignedShorts(shortArray.getAsFlatArray(), aDims.reverse() as long[])
}
// Close HDF5 File
reader.close()
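If the chunks then need to behave as one volume again, something along these lines might work (not part of the script above; it assumes imglib2's Views.concatenate is available and that all chunks share the same x/y extent):

import net.imglib2.util.Intervals
import net.imglib2.view.Views

// Stitch the per-chunk images back together along the time axis (axis 2 after the reverse above)
full = imgs.size() > 1 ? Views.concatenate(2, imgs as net.imglib2.RandomAccessibleInterval[]) : imgs[0]
log.info("Concatenated dimensions: ${Intervals.dimensionsAsLongArray(full)}")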
Thanks for your code!
The limit is not actually 2 GB, it is 2G array elements. This plugin can load HDF5 files that are 32-bit floats with dimensions 1024x1024x2047, which is nearly 8 GB. But it cannot load files with dimensions 1024x1024x2048 of any data type, i.e. 8-bit, 16-bit, or 32-bit, because 1024 × 1024 × 2048 = 2^31 elements, which is more than a single Java array can hold.
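A quick sanity check of those numbers (just standard Groovy, using the dataset shapes quoted above):

long maxElements = Integer.MAX_VALUE    // 2,147,483,647: the largest possible Java array length
long loads = 1024L * 1024 * 2047        // 2,146,435,072 elements -> loads (about 8 GB as float32)
long fails = 1024L * 1024 * 2048        // 2,147,483,648 elements -> fails at any bit depth
assert loads <= maxElements
assert fails > maxElements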
Note that the HDF5 plugin below will open large HDF5 datasets fine if "virtual stack" is selected in the dialog box. However, for many applications virtual stacks are not what is needed, because they are read-only. For example, I have an 8 GB signed-integer HDF5 dataset, so I need to read it into a real stack and apply a calibration to make the display correctly show signed integers. This works fine when I read the data from a netCDF-3 file, but the native Java HDF5 reader plugin fails when the number of array elements is 2^31 or greater. This is a serious and rather silly limitation these days, when 128 GB of RAM costs less than $1,000.
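The calibration step I mean is roughly the following (a sketch, not the exact code I use; it assumes the signed values were stored as unsigned 16-bit with a +32768 offset, ImageJ's usual convention):

import ij.IJ
import ij.measure.Calibration

imp = IJ.getImage()   // the freshly loaded 16-bit stack
cal = new Calibration(imp)
// Map stored unsigned values 0..65535 back to signed -32768..32767 for display and measurements
cal.setFunction(Calibration.STRAIGHT_LINE, [-32768.0d, 1.0d] as double[], "gray value")
imp.setCalibration(cal)
imp.resetDisplayRange()
imp.updateAndDraw()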
We can and should fix this by employing imglib2. BigDataViewer and/or n5-viewer can read large HDF5 datasets without difficulty. This is on my todo list after having updated JHDF5 to 19.04.01. |
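For reference, the imglib2 route I have in mind looks roughly like this (a sketch only; it assumes n5-hdf5 and n5-imglib2 are on the classpath, and the file path is just a placeholder):

import org.janelia.saalfeldlab.n5.hdf5.N5HDF5Reader
import org.janelia.saalfeldlab.n5.imglib2.N5Utils

// Open the HDF5 file through the N5 API; blocks are loaded lazily as cells,
// so no single Java array ever needs to hold 2^31 elements.
n5 = new N5HDF5Reader("/path/to/file.h5", 64, 64, 64)   // placeholder path and default block size
img = N5Utils.open(n5, "/images")   // a lazily cached CellImg of arbitrary total size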
@MarkRivers if you have a few minutes for a chat, could you contact me at [email protected]. |
Dear team,
Could I ask for support for loading datasets larger than 2 GB? I think HDF5 is usually chosen precisely for such large data, and it performs better than a TIFF stack.
Thanks!