Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON backend #338

Merged
merged 2 commits into from
Dec 12, 2018
Merged

JSON backend #338

merged 2 commits into from
Dec 12, 2018

Conversation

franzpoeschel
Copy link
Contributor

@franzpoeschel franzpoeschel commented Sep 10, 2018

Implementation of a JSON backend for openPMD.

Todo/Issues:

  • user documentation
  • Some of the methods in AbstractIOHandlerImpl.hpp did not behave exactly as documented (modulo misunderstandings on my side). I will address this separately.
  • see the discussion in JSON #65
  • integration with the build system: Build System Support for JSON Backend #384
  • remove commits from Integer Type System: Refactor #337 and rebase onto master
  • put appropriate license, copyright and author headers in every file

@ax3l ax3l changed the title JSON backend [WIP] JSON backend Sep 11, 2018
@ax3l
Copy link
Member

ax3l commented Sep 13, 2018

@franzpoeschel I know the changes were quite heavy recently (big sorry again). Can you please try the following to update this branch? I would like to remove the number of commits and would like to have no merge commit in it.

# backup in case things go south
git checkout -b json-backend-bak
# switch back to json-backend 
git checkout -

# rebase: squash
git fetch --all
# I assume your "git remote -v" for this mainline repo is called "mainline"
git rebase -i mainline
# an editor will open
#   change "pick" to "s" ("squash") to all your commits *but the first entry*
#   remove the lines with "merge commits",
#   remove commits a7a7974 and 176d37d 
# save
# solve merge conflicts one

# if anything goes very wrong
# git rebase --abort

# in the end, update the PR via
git push -f

@franzpoeschel franzpoeschel force-pushed the json-backend branch 2 times, most recently from 60f1c58 to d507bcf Compare September 13, 2018 14:44
@franzpoeschel
Copy link
Contributor Author

Concerning user documentation, is there a place where the file formats are / should be documented?

@ax3l Done, thanks for the steps.

@ax3l
Copy link
Member

ax3l commented Sep 13, 2018 via email

@ax3l
Copy link
Member

ax3l commented Nov 2, 2018

@franzpoeschel I added complete build system support for you in #384.
Is it ok if I rebase this branch and force-push update it against those changes?

@ax3l
Copy link
Member

ax3l commented Nov 2, 2018

Pushed the rebased branch :)
Please make sure that you don't have uncommited/unpushed local changes before continuing to work on this. Fetch the latest online version, revert the last local commit if they were identical before my rebase and pull the fresh commit.

git status -sb
# nothing uncommited?
# latest commits were pushed online?

# assuming yes

# fetch the updates I pushed on your branch and remove local "outdated" commit
git fetch --all
git reset --hard HEAD~
git pull --ff-only

After that, you can continue as usual.

If you do have local changes that were not yet pushed - commit them and rebase the branch against the online commit (also removing the one outdated commit but keeping all other changes on top).

git fetch --all

git rebase -i mainline/dev
# remove outdated commit
# keep all others

@franzpoeschel
Copy link
Contributor Author

Thanks for the help and the detailed instructions!

@franzpoeschel franzpoeschel force-pushed the json-backend branch 6 times, most recently from 410b9fe to 36ccc25 Compare November 14, 2018 16:31
@franzpoeschel franzpoeschel force-pushed the json-backend branch 3 times, most recently from f7edcce to b905669 Compare November 15, 2018 14:32
@ax3l ax3l changed the title [WIP] JSON backend JSON backend Nov 16, 2018
@franzpoeschel franzpoeschel force-pushed the json-backend branch 2 times, most recently from 757a9e0 to ab2fb04 Compare November 22, 2018 09:32
Copy link
Member

@ax3l ax3l left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dang, I reviewed this a week ago and forgot to submit my comments.

@@ -24,8 +24,10 @@
#include "openPMD/Datatype.hpp"

#include <functional>
#include<iostream>

This comment was marked as resolved.

This comment was marked as resolved.

@@ -87,5 +87,27 @@ remove_directory(std::string const& path);
*/
bool
remove_file(std::string const& path);

// /** Opens the file identified by the given path.

This comment was marked as resolved.

This comment was marked as resolved.

struct JSONFilePosition :
public AbstractFilePosition
#if openPMD_HAVE_JSON
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please implement the bidy in a .cpp file? Avoids pulling in the public nlohmann/json.hpp header into our public API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.... but I still need to include nlohmann/json.hpp in the header file?
This being said, JSONFilePosition.hpp is only ever included in JSONIOHandlerImpl.hpp, which is included in JSONIOHandler.hpp and that header file is only included in *.cpp files, so it should be fine?

Copy link
Member

@ax3l ax3l Nov 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, if no symbols/declarations are used from nlohmann/json.hpp in the header file, then do not include it there.

This being said, [...]

That sounds good, I wasn't sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if no symbols/declarations are used from nlohmann/json.hpp in the header file

They are. JSONFilePosition.hpp is implemented as a wrapper around json::json_pointer.

* the passed arguments and the template parameter type corresponding to the
* openPMD type.
*/

This comment was marked as resolved.

This comment was marked as resolved.

@@ -0,0 +1,154 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we are finished with the review (until then, pls update here):
Shall we rather put this file into https://github.com/openPMD/openPMD-example-datasets ?

The example code in the :ref:`usage section <usage-serial>` will produce the following JSON serialization
when picking the JSON backend:

.. literalinclude:: json_example.json
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe link to https://github.com/openPMD/openPMD-example-datasets when it's included there?

},
"subgroups": {
"data": {
"subgroups": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you add a subgroups directory here and before data? Can we skip those maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Short answer: separating namespaces. If this indirection were dropped, the subgroups' and datasets' names will have the same namespace as all other JSON objects at that level, including the objects attributes and for the top-level group platform_byte_widths (meaning you can't have groups with those names). This seems like a nasty corner case for a user of the API to run into, hence I chose clarity at the expense of a certain informational overhead as is explained in here.
Do you have a better solution? I can imagine wrapping the subgroups into quotes or maybe reserving a prefix for "keywords" (such as __attributes maybe). Would this be preferrable?

Copy link
Member

@ax3l ax3l Nov 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is a risk of collisions which in practice should not be too often the case. Nevertheless, rather adding a dict of attributes inside an attributes group would probably be best (no __). This keeps the collisions minimal and at the same time is very readable. But that's already the case, no?

The same is true for data sets and groups. Without the subgroups "namespace" one can not write the same group and data set name in the same level and neither can be named "attributes". But imho that's fine.

Copy link
Contributor Author

@franzpoeschel franzpoeschel Nov 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that's already the case, no?

yep

I changed it now, so attributes keep their separate namespace but datasets and subgroups are now directly linked into the containing group's object, see here.

Copy link
Member

@ax3l ax3l Nov 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!
Do you think you need the "object_type": "dataset" and "object_type": "group" keys? They don't hurt, but I wonder if one can already guess by the name (e.g. data) and content already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest commit eliminates the extent field, next up will be the elimination of object_type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pre-filling the dataset with null values

makes probably sense, this only works in serial anyway, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest commit eliminates the extent field, next up will be the elimination of object_type.

Can you please update the checked-in example file? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this only works in serial anyway, right?

right
object_type is no more and the example file and documentation are updated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, thanks! looks good :)


#if openPMD_HAVE_JSON

TEST_CASE( "json_write_test", "[serial][json]")

This comment was marked as resolved.

This comment was marked as resolved.

@franzpoeschel franzpoeschel force-pushed the json-backend branch 7 times, most recently from feabd6c to ca0b7e9 Compare November 30, 2018 12:37
@ax3l
Copy link
Member

ax3l commented Nov 30, 2018

@franzpoeschel I think I can implement #389 now :)
That means: when it is merged you can remove the changes you made to the vendored nlohmann-json files, which is a good thing. If you like, rebase and squash some commits when you are on it? (But again, wait for both until #389 is merged) :)

@ax3l
Copy link
Member

ax3l commented Dec 1, 2018

@franzpoeschel ready for rebase, remove of the JSON third party patch and some squashing of commits :)

@franzpoeschel franzpoeschel force-pushed the json-backend branch 2 times, most recently from 38bd210 to e6fb8d4 Compare December 12, 2018 09:17
@ax3l
Copy link
Member

ax3l commented Dec 12, 2018

@franzpoeschel just do (in case your git remote for the main repo is called mainline, see git remote -v):

git fetch mainline
git rebase -i mainline/dev
git push -f

Implement JSON backend to dummy state

Misses the actual implementation of AbstractIOHandlerImpl

Declare IOHandlerImpl for JSON and integrate it with other places

Misses the implementation.

Undebugged minimum implementation for JSON writing

First basically runnable version of JSON writing

To address:
No reading or deleting yet.
Datatypes are currently ignored and the data is assumed to be int64_t.
Attribute values are ignored and replaced with a dummy value.
If a subgroup name can be parsed as a nonnegative string, the JSON API
will create a JSON array rather than a JSON object (associative array) as intended.

Correctly handle groups that can be parsed as int

See last commit's description.

Fix index calculation with offsets in WriteData

Fix some mistakes in JSON writing

Correctly handle overwriting files:
-> overwritten files should not be possible to access any longer and be
clearly distinguished from the newly-created file
Make some verifications execute independent of compiler options.

Full implementation of JSON writing

Respects all datatypes now.

Format code according to Clion Stylesheet

https://github.com/ComputationalRadiationPhysics/contributing/blob/master/IDESettings/CLion/CRP_CLion2016_1.xml

Add generic branching over an openPMD datatype

First runnable version of JSON Reading

Cleanup and implementation of dataset extension

Undebugged version of JSON deletion

Properly (de)serialize datatypes

Instead of casting the Datatype enum to and from int (which is likely to break
when altering the enum), serialize to and from String values.

Fix a number of mistakes in JSON reading and writing

Cleanup

Add JSON tests and fix bugs found thusly

Add further tests and fix a further bug

The JSON library does not understand long double values (i.e. 128bit
floats), represent them as a char array.

Handle floating point special values

JSON represents +/-Infinity and NaN values as null. The JSON library
will correctly serialize those values *to* JSON, implement
(semi)-correct handly for deserialization. As it is unclear which exact
value a null represents, deserialize it to NaN.
Take notice that large floating point values (128 bit) might be
serialized to null as well.

Use std::is_floating_point to distinguish them from other types

Additionally write the byte width of the underlying type

Not yet used in reading

Mark the writable written after successfully extending a dataset

Remove support for absolute paths from openPath

Fix some rough edges from rebasing

Add documentation for the JSON backend

Integrate the JSON backend with the build system

Make platform bytewidth information global per JSON file

Was previously annotated for every single Dataset and Attribute.

Fixes

-> shadowed variables
-> return file streams by shared pointer so that Clang won't be mad
about missing copy constructors
-> handle inconsistencies between GCC, Clang and MSVC in switchType
function

Add licensing information and cleanup imports

Cleanup

Remove unused code from Filesystem.(h|c)pp
Unify replace_all and replace_all_nonrecursively in StringManip.hpp

Move implementation of JSONFilePosition class to source file

Formatting

Refactor tests

Move tests into functions of the backend to be used.

Remove /subgroups/ and /datasets/ indirection

Make optional_paths_110_test HDF5 specific again

Eliminate "extent" field from datasets

Remove object_type key
@ax3l
Copy link
Member

ax3l commented Dec 12, 2018

Ouch, looks like #389 did not trigger -isystem yet... (Sorry!)

@ax3l
Copy link
Member

ax3l commented Dec 12, 2018

@franzpoeschel I don't know why this did not work, but you can fix it by adding an extra commit changing the following:

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 2c73e2d..418428f 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -127,8 +127,10 @@ else()
 endif()
 if(openPMD_HAVE_JSON)
     add_library(openPMD::thirdparty::nlohmann_json INTERFACE IMPORTED)
-    target_link_libraries(openPMD::thirdparty::nlohmann_json
-        INTERFACE nlohmann_json::nlohmann_json)
+    #target_link_libraries(openPMD::thirdparty::nlohmann_json
+    #    INTERFACE nlohmann_json::nlohmann_json)
+    get_target_property(lib_include_dirs nlohmann_json::nlohmann_json INTERFACE_INCLUDE_DIRECTORIES)
+    target_include_directories(openPMD::thirdparty::nlohmann_json SYSTEM INTERFACE ${lib_include_dirs})
 endif()
 
 # external library: HDF5 (optional)

Should have been automatically since wrapped in an `IMPORTED`
target... but for some reason did include still with `-I`.
@ax3l
Copy link
Member

ax3l commented Dec 12, 2018

I pushed it for you.

Copy link
Member

@ax3l ax3l left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hurray, thanks a lot this is great! ✨

@ax3l ax3l merged commit b7986f4 into openPMD:dev Dec 12, 2018
@ax3l ax3l mentioned this pull request Dec 12, 2018
Closed
@franzpoeschel franzpoeschel deleted the json-backend branch January 28, 2021 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants