Load hdf5 dataset > 2GB #13

Open
JiangXL opened this issue Apr 16, 2020 · 6 comments · May be fixed by #17

@JiangXL

JiangXL commented Apr 16, 2020

Dear team,

Could you add support for loading datasets larger than 2 GB? HDF5 is typically chosen for much larger data, and it performs better than stacked TIFF.

Thanks!!!

@imagejan
Member

I don't know whether this component is still actively developed. You may also want to consider alternatives such as BDV-HDF5 or n5.

For what it's worth, here's a Groovy script I once created to read large HDF5 files in chunks of 2GB:

#@ File (label = "HDF5 File Input", style = "extensions:h5/hdf5") h5file
#@ Boolean (label = "Automatic Chunk Size") autoChunkSize
#@ String (visibility = MESSAGE, persist = false, value = "If checked, the following value is ignored") msg
#@ Integer (label = "Chunk Size (number of time points)", min = 1, value = 1000) chunk
#@output imgs
#@ LogService log

import ch.systemsx.cisd.hdf5.HDF5Factory
import ch.systemsx.cisd.hdf5.HDF5DataClass
import ij.ImagePlus
import ij.process.ShortProcessor
import net.imglib2.img.array.ArrayImgs

reader = HDF5Factory.openForReading(h5file)

info = reader.getDataSetInformation("/images")

log.info("Dataset found: $info")

// Make sure we have uint16
assert(!info.getTypeInformation().isSigned()) // u
assert(info.getTypeInformation().getDataClass() == HDF5DataClass.INTEGER) // int
assert(info.getTypeInformation().getElementSize() == 2) // 16

// Make sure we have 3 dimensions (tyx)
dims = info.getDimensions()
assert(dims.length == 3)

// automatically determine optimal chunk size (see the worked example after this script)
final twoGiga = 2l * 1024 * 1024 * 1024
optimalChunkSize = twoGiga / (16/8) / dims[2] / dims[1]
log.info("Optimal chunk size: $optimalChunkSize")
if (autoChunkSize) {
	chunk = optimalChunkSize as int
}
log.info("Using chunk size $chunk")

numberOfChunks = ((dims[0] / chunk) as int) + 1
log.info("Creating $numberOfChunks chunks in total")

imgs = []
numberOfChunks.times { index ->
	log.info("Reading chunk ${index+1}")
	shortArray = reader.uint16().readMDArrayBlock("/images", [chunk, dims[1], dims[2]] as int[], [index, 0, 0] as long[])
	// Create ArrayImg from MDShortArray
	aDims = []
	shortArray.dimensions().each { d ->
		aDims << d
	}
	imgs << ArrayImgs.unsignedShorts(shortArray.getAsFlatArray(), aDims.reverse() as long[])
}

// Close HDF5 File
reader.close()
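
For reference, here is the arithmetic behind the chunk-size formula in the script (the 1024x1024 frame size is just an illustrative assumption; the point is that each chunk is budgeted at 2 GiB of uint16 pixels):

// worked example of the chunk-size formula above (sketch only; frame size is hypothetical)
long twoGiga = 2L * 1024 * 1024 * 1024             // 2 GiB byte budget per chunk
int width = 1024, height = 1024                    // example frame dimensions
def framesPerChunk = twoGiga / 2 / width / height  // 2 bytes per uint16 pixel
assert framesPerChunk == 1024                      // about 1024 time points per chunk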

@JiangXL
Author

JiangXL commented Apr 24, 2020

Thanks for your code!
For now I'm using BigTIFF for data analysis (in Julia) and visualization (in Fiji), but HDF5 is still better and natively supported in many programming environments.

@ExtraE113 linked a pull request on Jul 23, 2022 that will close this issue
@MarkRivers

The limit is not actually 2 GB; it is 2G (2^31) array elements. This plugin can load HDF5 files of 32-bit floats with dimensions 1024x1024x2047, which is nearly 8 GB, but it cannot load files with dimensions 1024x1024x2048 in any data type, i.e. 8-bit, 16-bit, or 32-bit.
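
As a quick sanity check of those numbers (a Groovy sketch, not the plugin's code): a single Java array is indexed by int, so it can never hold more than Integer.MAX_VALUE (2^31 - 1) elements, whatever the element size:

// element-count arithmetic behind the limit described above
long fits     = 1024L * 1024 * 2047   // 2,146,435,072 elements: fits in one Java array
long tooLarge = 1024L * 1024 * 2048   // 2,147,483,648 elements: exceeds 2^31 - 1
assert fits     <= Integer.MAX_VALUE
assert tooLarge >  Integer.MAX_VALUE  // fails for 8-, 16-, and 32-bit data alike
println "1024x1024x2047 of 32-bit floats: ${fits * 4 / (1024 * 1024 * 1024)} GiB"  // ~8 GiB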

@MarkRivers

Note that the HDF5 plugin below will open large HDF5 datasets fine if "virtual stack" is selected in the dialog box:
https://github.com/paulscherrerinstitute/ch.psi.imagej.hdf5

However, for many applications virtual stacks are not what is needed, because they are read-only. For example, I have an 8 GB signed-integer HDF5 dataset, so I need to read it into a real stack and apply a calibration so that the display correctly shows signed integers. This works fine when I read the data from a netCDF-3 file, but the native Java HDF5 reader plugin fails when the number of array elements is 2^31 or greater. This is a serious and rather silly limitation these days, when 128 GB of RAM costs less than $1,000.
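
For reference, a minimal sketch of that workflow in ImageJ (the dimensions are hypothetical, and in practice the pixel data would come from the file): build a regular in-memory stack and attach ImageJ's signed-16-bit calibration so the unsigned pixel buffer is displayed as signed values:

import ij.ImagePlus
import ij.ImageStack
import ij.process.ShortProcessor

int w = 1024, h = 1024, nSlices = 8               // hypothetical dimensions
def stack = new ImageStack(w, h)
nSlices.times { i ->
    // in practice each plane would be filled from the HDF5/netCDF reader;
    // the signed values sit unchanged in the (unsigned) short pixel buffer
    stack.addSlice("slice ${i + 1}", new ShortProcessor(w, h))
}
def imp = new ImagePlus("signed data", stack)
// map the raw 0..65535 pixel values to -32768..32767 for display and measurements
imp.getCalibration().setSigned16BitCalibration()
imp.show()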

@mkitti

mkitti commented Dec 5, 2022

We can and should fix this by employing imglib2. BigDataViewer and/or n5-viewer can read large HDF5 datasets without difficulty. This is on my todo list after having updated JHDF5 to 19.04.01.
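
For illustration, here is a rough Groovy sketch of that direction (the "/images" dataset name and tyx layout follow the script earlier in this thread; the file path and one-time-point-per-cell layout are assumptions). imglib2's CellImg backs the image with many small arrays, so the total element count can exceed 2^31 as long as each cell stays under the single-array limit:

import ch.systemsx.cisd.hdf5.HDF5Factory
import net.imglib2.img.array.ArrayImgs
import net.imglib2.img.cell.CellImgFactory
import net.imglib2.type.numeric.integer.UnsignedShortType
import net.imglib2.view.Views

def reader = HDF5Factory.openForReading(new File("/path/to/large.h5"))  // hypothetical path
def dims = reader.getDataSetInformation("/images").getDimensions()      // [t, y, x]
def imgDims = [dims[2], dims[1], dims[0]] as long[]                     // imglib2 order: x, y, t

// one cell per time point, so each backing array holds only x*y elements
def img = new CellImgFactory<UnsignedShortType>(new UnsignedShortType(),
        dims[2] as int, dims[1] as int, 1).create(imgDims)

// copy the data one time point at a time via JHDF5 block reads
for (long t = 0; t < dims[0]; t++) {
    def block = reader.uint16().readMDArrayBlock("/images",
            [1, dims[1] as int, dims[2] as int] as int[], [t, 0, 0] as long[])
    def src = ArrayImgs.unsignedShorts(block.getAsFlatArray(), dims[2], dims[1], 1L)
    def dst = Views.interval(img, [0, 0, t] as long[], [dims[2] - 1, dims[1] - 1, t] as long[])
    def srcCursor = Views.flatIterable(src).cursor()
    Views.flatIterable(dst).each { it.set(srcCursor.next().get()) }
}
reader.close()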

@mkitti

mkitti commented Dec 5, 2022

@MarkRivers if you have a few minutes for a chat, could you contact me at [email protected]?
