-
Notifications
You must be signed in to change notification settings - Fork 263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support SWMR in HDF5 (updated) #2653
base: main
Are you sure you want to change the base?
Conversation
* main: (1776 commits) Invert solution as discussed at Unidata#2618 Correct a potential null pointer dereference. Fix a logic error that was resulting in an easy-to-miss error when running configure. Fix issue with dangling test file not getting cleaned up. Turn nczarr zip support off by default in cmake, add a status line indicating whether nczarr-zip-support is available, in libnetcdf.settings. Update the version of the cache action used by github action from v2 to v3. Explicit cast to unsigned char. More issues returned by sanitizer, related to attempts to assign MAX_UNSIGNED_CHAR (255) to a variable of type char. Fixed an issue where memcpy was potentially passed a null pointer. Correct another uninitialized issue. Correct undefined variable error. Fixing issues uncovered by compiling with '-fsanitize=undefined' and running nc_test/nc_test, in support of Unidata#1983, amongst others. The cast as was being used was undefined behavior, and had to be worked around in a style approximating C++'s 'reinterpret_cast' Remove a stray character at the head file. Fix a distcheck failure with nczarr_test/run_interop.sh Turn benchmarks off by default. They require netcdf4, additional logic is required in order to have them on by default. Add execute bit to test scripts Fix missing endif statement Add generated parallel tests for nc_perf, cmake-based build system. Correct typo in CMakeLists.txt Wiring performance benchmarks into cmake, cleaned up serial compression performance test dependency on MPI. ...
Awesome work! Definitely add a flag to netcdf_meta/nc-config for this! |
Ok, I've added |
HDF5 has lots of features that are not supported in netcdf-c. |
@DennisHeimbigner SWMR is a very useful feature for HPC users. It is frequently the case that a model is writing results and other processes want to see those results. To calculate the next timestep, all processors need some data from the previous timestep. The simplest solution would be for those reading processes to read the file and get what they want, but without SWMR it will not work. A NOAA group just asked about this recently, so I believe it is a feature that will be used by multiple modeling groups, There's also a good chance that PIO would allow access to this features, and PIO is used in several important models at NOAA and NCAR. |
@gsjaardema what do you think about SWMR? |
The inability to watch netCDF4 files while they are being written is the main complaint I get from researchers when upgrading applications and libraries. Using SWMR removes that limitation. This is a pretty basic QoL feature that would benefit a lot of users. It would be even better it this feature could be enabled and used completely automatically, but it's not clear if that's even possible. |
I agree with @ZedThree that many users want to be able to query the HDF5 file while it is being written and the inability to do that is a big complaint. Especially when it works for netcdf-3 and netcdf-5 but not netcdf-4. Explaining that to users is confusing. At one time, SWMR did not work with parallel files; I don't remember if that has changed yet. We would at a minimum need to support parallel writing with serial read; parallel write and parallel read would be nice. Serial write/serial read which SWMR definitely supports is also useful for us though. There is also a question of overhead -- does use of SWMR entail any performance hit on the writing. If no or minimal, then I could see using it at all times; if there is a performance hit, then we would probably need it to be turned on/off on a file-by-file basis. |
Just to be clear, in case there's any confusion, this is a completely opt-in feature currently, and requires netCDF to be configured and built with explicit support, as well as a runtime flag when opening the file. I think SWMR is incompatible with parallel files, and only works in serial. I have seen some references to "VFD SWMR", "full SWMR", and "MWMR" (multiple writer, multiple reader), but I'm not sure of the availability of these features. |
Hi all, great stuff! I was just wondering if there had been any updates on this work regarding building into the main branch or any other initiatives in NetCDF over the past six months that do something similar? Very keen to be able to read NetCDF4 model result files (in QGIS for example) while a hydrodynamic model is still running. The ability to review results 'on-the-fly' as a model is running is an important part of flood and coastal modelling workflow, and this functionality will provide great value to researchers and practitioners. |
Where is this change; I do not see it in the files changed for this PR. |
It looks like this branch has fallen out of date a little bit, I'll try and fix that up. I don't think there has been any other work on this though. If you're able, it would be nice to see if you could try out this branch and see if there's any major issues I've missed!
This is just a difference in implementing this feature between #1448 and this PR, so it doesn't appear in the diff between this branch and In if (mode & NC_WRITE && mode & NC_HDF5_SWMR) {
if ((retval = H5Fstart_swmr_write(h5->hdfid))) {
BAIL(retval);
}
} and this PR I instead changed the file access flags: if((mode & NC_HDF5_SWMR)) {
flags |= H5F_ACC_SWMR_WRITE;
} The former starts SWMR mode after the file as been opened; the latter opens the file immediately in SWMR mode. I don't recall exactly why this made any difference, except that it was required to get the tests to pass. |
Hi @ZedThree, (and an early disclaimer here about me being a hack), I've been testing the past few days and have compiled #2652 on top of HDF5 (have completed tests on both HDF5 1.10.11 and 1.14.1-2). For the most part it compiles successfully however tst_filter appears to fall over. See outputs at the bottom of this comments.Any advice is certainly welcome. Despite tst_filter I've pushed on. Our program uses netcdf-fortran to write results, so to try and test your commits i've built v4.6.1 (https://github.com/Unidata/netcdf-fortran/tree/v4.6.1) on top of #2653. Sadly, I appear to be still getting file lock issues (no doubt user error here...). Are you aware of anything within the netcdf-fortran libs what will also need to be updated to enabled swmr, or potentially if there are additional flags I should be setting when first opening our result files via nf90_create. Happy to send more information through. Many thanks for your help. Cheers, Mitch.
|
@Mitch3007 Thanks for taking a look! I'm not sure about the test errors -- it passes everything in CI, and this feature shouldn't interact with the filters at all, so it's possible there's an issue with your environment.
Ah yes the Fortran API will need the There's currently no docs for this feature, but to use it properly, you should first open the file normally and create all the variables and attributes. Then, close and reopen the file in SWMR mode ( Reading through the HDF5 docs, I realise this PR is actually missing a crucial part, a call to |
* main: Fixed various UBSan warnings about working with NULL pointers Fixed misaligned memory access flagged by UBSan disable test that depend on ncpathcvt in cmake build w/o utilities Update RELEASENOTES Remove the execinfo capability fix memory leak ckp remove conflicts Fixed various UBSan warnings about invalid bit shifting Add fenceposting so DLL flags are only introduced wwhen we are compiling DLL-based shared libraries. Working to get the proposed change working with Visual studio CMake: Use helper libraries for nczarr tests Fix most float conversion warnings Update release notes Update internal tinyxml2 code to the latest version Update release notes Improve fetch performance of DAP4
Previously, setting some HDF5 CMake config flags directly would skip both the check for HDF5 1.10 and setting the `HDF5_HAS_SWMR` flag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be mistaken but I think the version handling code is wrong in places. You'll always want the highest version possible, but limit downwards to 1.10
(when SWMR
was introduced) when using this feature.
@magnusuMET Yes, I think you're right. Actually, I'm not sure why we can't just sidestep the issue and only call |
@magnusuMET You prompted me to look into this properly -- it turns out that since the original PR (#1448) that this one is based on, a new function The bounds that are currently set (in |
Libver bounds already taken care of elsewhere, and merely opening the file in SWMR mode will correctly set format to at least v110
I've added Things that are now missing:
This is currently gated behind a build-time flag (as well as the runtime flag). Does that still make sense? I notice that the CI now only runs on HDF5 1.10.8+, so if 1.8.x is no longer supported, this feature should always be available (though not always turned on of course!). It would also mean not needing an additional CI job. @WardF @DennisHeimbigner thoughts on how to proceed from here? |
Has anyone tested this under Windows (Visual Studio)? |
I don't have access to a Windows dev environment myself unfortunately. When the Windows tests are running on CI, could use that? |
Hi @ZedThree, I have access to both Linux (using CentOS7) and Windows 10 VS Studio 2019 build environments so am happy to undertake some Visual Studio tests once I've got things up and running properly on Linux. I've had some time to look at this again over the past week, however I'm struggling to build the swmr tests within netcdf-c/nc_test4/CMakeLists.txt: if (ENABLE_HDF5_SWMR) The terminal output when running the tests (which doesn't include test_hdf5_swmr) is provided as follows.
Is there anything additional I need to add to my build environment to add the nc_test4 list? As a test, I've also tried to commenting out the ENABLE_HDF5_SWMR if as follows: #if (ENABLE_HDF5_SWMR) However this still doesn't seem to add the tests. Any thoughts on what I might be doing wrong or need to add? Very much appreciate any help you can provide. I've attached the build script I'm using to build netcdf-c, which I hope provides some further context on how I'm building. build_netcdf.txt. I've renamed from .sh to .txt here to hopefully avoid any security issues. The HDF version I'm building on is 1.14.1-2. Cheers, Mitch. |
@Mitch3007 When you run cmake, there should be a line in the output that either confirms 'Found bash: |
Hi @WardF, thanks for taking a look at this :) I've reviewed the terminal output when building netcdf-c. I don't seem to see any 'Found bash:' or 'bash shell not found output' however the build environment looks to be using -bash: BASH_VERSION = 4.2.46(2)-release. Does that look ok to you, or is there any specific log/output file I can take a look at further? I've attached the full terminal output: build_netcdf_all_terminal_output.txt if it's of any use as reference. For further context the build process I'm using is as follows:
Also of interest, I was expecting test_hdf5_swmr_reader.c and test_hdf5_swmr_writer.c to produce some .o files in netcdf-c/nc_test4 but I don't appear to be building them. Thanks again, Mitch. |
Hi @WardF , Despite the nc_test4 build issues mentioned above I've been able successfully build pull request commit ff7bde5 on both Linux and Windows build environments. To test swmr, I'm using fortran 90 and have put together two small fortran prototype programs. The first opens the file in swmr mode, declares the file then closes it and opens it again for writing. It writes out 20 time steps to a CF-compilant NetCDF grid on long lat coordinates. The second program opens the NetCDF in swmr read mode, reports some information on the length of the time dimension, then closes the file. There is a short pause (0.1 seconds) and then it reopens and repeats the process 200 times. To get it working on fortran I've pulled down netcdf-fortran 138a6b7 and have locally added to netcdf-fortran/fortran/netcdf_constants.F90:
So @DennisHeimbigner and @ZedThree, swmr appears to be working on Windows and Linux. Keen to hear where we need go from here. Happy to upload my test programs if you think it will be useful. Please let me know if you need anything further from me to help progress this one. |
The next step will be to figure out these conflicts, I will take a stab at that. Any additional thoughts @DennisHeimbigner ? |
The only thing that comes to mind is the threading issue. |
I've taken the work @eivindlm started in #1448, and done the following:
main
and fixed conflicts--enable-hdf5-swmr
option toconfigure.ac
add_definition
in CMake to defining a macro inconfig.h
, to be consistent with autotools versionH5Pset_libver_bounds
wasn't set when opening a fileH5Pstart_swmr_write
with file access flagsThis feature needs some documentation, as it's not quite straightforward to use. Where's best to put that?
Should this also have a feature flag set in
netcdf_meta.h
, and innc-config
?Given that this feature needs explicitly enabling at configure time, should it be added to an existing/new CI build?