Zarr-v3 Consolidated Metadata #2113
Merged
jhamman
merged 84 commits into
zarr-developers:v3
from
TomAugspurger:user/tom/feature/consolidated-metadata
Oct 10, 2024
73d53d7
Fixed MemoryStore.list_dir
TomAugspurger 90940a0
fixup s3
TomAugspurger 8ee89f4
recursive Group.members
TomAugspurger 65a8bd4
Zarr-v3 Consolidated Metadata
TomAugspurger 2515ca3
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger cdaf81f
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger 5a86789
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger a839f16
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger 5a31390
fixup
TomAugspurger 79bf235
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger fc901eb
read zarr-v2 consolidated metadata
TomAugspurger 3a3eb9d
check writablem
TomAugspurger 78af362
Handle non-root paths
TomAugspurger 750668c
Some error handling
TomAugspurger 63697ab
cleanup
TomAugspurger 5d79274
refactor open
TomAugspurger 0c67972
remove dupe file
TomAugspurger 657ad1e
v2 getitem
TomAugspurger 511ff76
fixup
TomAugspurger b360eb4
Optimzied members
TomAugspurger abcdbe6
Impl flatten
TomAugspurger b9bcfe8
Fixups
TomAugspurger 3575cda
doc
TomAugspurger 7b6bd17
nest the tests
TomAugspurger 500a91e
fixup
TomAugspurger 22d501e
Fixups
TomAugspurger 762cf96
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger d6c6cc7
fixup
TomAugspurger 6755fbc
fixup
TomAugspurger e406f86
fixup
TomAugspurger 07248ea
fixup
TomAugspurger bdf15ad
fixup
TomAugspurger 18eb172
consistent open_consolidated handling
TomAugspurger c11f1ad
fixup
TomAugspurger f6397f4
make clear that flat_to_nested mutates
TomAugspurger f55aa37
fixujp
TomAugspurger 123dc60
fixup
TomAugspurger 34c7720
fixup
TomAugspurger 4db042b
Fixup
TomAugspurger 8febba3
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger d730350
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger a1f1ebb
fixup
TomAugspurger 35a3832
fixup
TomAugspurger c1837fd
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger d03f4bd
fixup
TomAugspurger cddd01f
fixup
TomAugspurger 9303cd0
added docs
TomAugspurger 87b65f1
fixup
TomAugspurger ee5d130
Ensure empty dict
TomAugspurger af9788f
fixed name
TomAugspurger 5a08466
fixup nested
TomAugspurger d236e53
removed dupe tests
TomAugspurger 2824de6
fixup
TomAugspurger 08a7682
doc fix
TomAugspurger b8b5f51
fixups
TomAugspurger ba4fb47
fixup
TomAugspurger 10d062f
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger e6142d8
fixup
TomAugspurger 8ad3738
v2 writer
TomAugspurger fc94933
fixup
TomAugspurger 79246dd
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger a62240b
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger 4bfad1b
fixup
TomAugspurger ae02bb5
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger 3265abd
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger 8728440
path fix
TomAugspurger 20c97a4
Fixed v2 use_consolidated=False
TomAugspurger f7e5b3f
fixupg
TomAugspurger c31f8a1
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger 483681b
Special case object dtype
TomAugspurger 7e76e9e
fixup
TomAugspurger 19b9271
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger 418bc6b
Merge branch 'tom/fix/dtype-str-special-case' into user/tom/feature/c…
TomAugspurger 97fa2a0
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger 6fab362
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger cbffcbb
docs
TomAugspurger 56d2704
pr review
TomAugspurger 8ade87d
must_understand
TomAugspurger b5fb721
Updated from_dict checking
TomAugspurger d17f955
cleanup
TomAugspurger 1d17140
cleanup
TomAugspurger 2b2e3da
Fixed fill_value
TomAugspurger 96b274c
Merge remote-tracking branch 'upstream/v3' into user/tom/feature/cons…
TomAugspurger c9229d1
fixup
TomAugspurger
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
Consolidated Metadata | ||
===================== | ||
|
||
Zarr-Python implements the `Consolidated Metadata`_ extension to the Zarr Spec. | |
Consolidated metadata can reduce the time needed to load the metadata for an | ||
entire hierarchy, especially when the metadata is being served over a network. | ||
Consolidated metadata essentially stores all the metadata for a hierarchy in the | ||
metadata of the root Group. | ||
|
||
Usage | ||
----- | ||
|
||
If consolidated metadata is present in a Zarr Group's metadata then it is used | ||
by default. The initial read to open the group will need to communicate with | ||
the store (reading from a file for a :class:`zarr.store.LocalStore`, making a | ||
network request for a :class:`zarr.store.RemoteStore`). After that, any subsequent | ||
metadata reads to get child Group or Array nodes will *not* require reads from the store. | |
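
The read-count claim above can be illustrated with a small, store-agnostic sketch in plain Python. It does not use zarr-python: ``CountingStore``, ``child_metadata``, ``build_store``, and ``reads_to_load_all`` are hypothetical names, and the JSON layout is a simplified assumption rather than the exact on-disk format.

```python
import json


class CountingStore:
    """Hypothetical in-memory key/value store that counts reads."""

    def __init__(self, data):
        self._data = dict(data)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self._data[key]


# Simplified, assumed metadata for three child arrays.
children = {
    name: {"node_type": "array", "shape": shape}
    for name, shape in [("a", [1]), ("b", [2, 2]), ("c", [3, 3, 3])]
}


def build_store(consolidate):
    """Create a store, optionally embedding child metadata in the root."""
    root = {"node_type": "group"}
    if consolidate:
        root["consolidated_metadata"] = {"metadata": children}
    data = {"zarr.json": json.dumps(root)}
    data.update({f"{n}/zarr.json": json.dumps(m) for n, m in children.items()})
    return CountingStore(data)


def child_metadata(store, root_meta, name):
    """Return a child's metadata, touching the store only when the root
    metadata carries no consolidated copy."""
    consolidated = root_meta.get("consolidated_metadata")
    if consolidated is not None:
        return consolidated["metadata"][name]  # served from memory
    return json.loads(store.get(f"{name}/zarr.json"))  # one read per child


def reads_to_load_all(consolidate):
    """Count store reads needed to load the root and every child."""
    store = build_store(consolidate)
    root_meta = json.loads(store.get("zarr.json"))  # the initial read
    for name in children:
        child_metadata(store, root_meta, name)
    return store.reads


print(reads_to_load_all(consolidate=True))   # 1
print(reads_to_load_all(consolidate=False))  # 4
```

With consolidation, the single read of the root covers every child; without it, each child costs one more read, which matters most when each read is a network request.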
|
||
In Python, the consolidated metadata is available on the ``.consolidated_metadata`` | ||
attribute of the ``GroupMetadata`` object. | ||
|
||
.. code-block:: python | ||
|
||
>>> import zarr | ||
>>> store = zarr.store.MemoryStore({}, mode="w") | ||
>>> group = zarr.open_group(store=store) | ||
>>> group.create_array(shape=(1,), name="a") | ||
>>> group.create_array(shape=(2, 2), name="b") | ||
>>> group.create_array(shape=(3, 3, 3), name="c") | ||
>>> zarr.consolidate_metadata(store) | ||
|
||
If we open that group, the Group's metadata has a :class:`zarr.ConsolidatedMetadata` | ||
that can be used. | ||
|
||
.. code-block:: python | ||
|
||
>>> consolidated = zarr.open_group(store=store) | ||
>>> consolidated.metadata.consolidated_metadata.metadata | ||
{'b': ArrayV3Metadata(shape=(2, 2), fill_value=np.float64(0.0), ...), | ||
'a': ArrayV3Metadata(shape=(1,), fill_value=np.float64(0.0), ...), | ||
'c': ArrayV3Metadata(shape=(3, 3, 3), fill_value=np.float64(0.0), ...)} | ||
|
||
Operations on the group to get children automatically use the consolidated metadata. | ||
|
||
.. code-block:: python | ||
|
||
>>> consolidated["a"] # no read / HTTP request to the Store is required | ||
<Array memory://.../a shape=(1,) dtype=float64> | ||
|
||
With nested groups, the consolidated metadata is available on the children, recursively. | ||
|
||
.. code-block:: python | |
|
||
>>> child = group.create_group("child", attributes={"kind": "child"}) | ||
>>> grandchild = child.create_group("child", attributes={"kind": "grandchild"}) | ||
>>> consolidated = zarr.consolidate_metadata(store) | ||
|
||
>>> consolidated["child"].metadata.consolidated_metadata | ||
ConsolidatedMetadata(metadata={'child': GroupMetadata(attributes={'kind': 'grandchild'}, zarr_format=3, )}, ...) | ||
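
One way to picture the recursive structure is as a conversion from path-keyed metadata to a nested tree. The commit list mentions a ``flat_to_nested`` step; the sketch below borrows that name but is only an illustrative assumption in plain Python, not the implementation in this PR. The dictionary layout (``"metadata"`` and ``"children"`` keys) is hypothetical, and it assumes every parent path is present in the input.

```python
def flat_to_nested(flat):
    """Build a nested tree from path-keyed metadata.

    Unlike an in-place variant, this sketch leaves ``flat`` untouched.
    """
    tree = {}
    # A parent path is a string prefix of its descendants, so sorting
    # guarantees "child" is visited before "child/child".
    for path in sorted(flat):
        parts = path.split("/")
        level = tree
        for part in parts[:-1]:
            level = level[part]["children"]  # descend to the parent node
        level[parts[-1]] = {"metadata": flat[path], "children": {}}
    return tree


# Hypothetical flat metadata mirroring the example above.
flat = {
    "a": {"node_type": "array", "shape": [1]},
    "child": {"node_type": "group", "attributes": {"kind": "child"}},
    "child/child": {"node_type": "group", "attributes": {"kind": "grandchild"}},
}

nested = flat_to_nested(flat)
grandchild = nested["child"]["children"]["child"]
print(grandchild["metadata"]["attributes"]["kind"])  # grandchild
```

Each group node ends up holding only its own descendants, which matches the idea that a child group exposes consolidated metadata for its subtree.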
|
||
Synchronization and Concurrency | ||
------------------------------- | ||
|
||
Consolidated metadata is intended for read-heavy use cases on slowly changing | ||
hierarchies. For hierarchies where new nodes are constantly being added, | ||
removed, or modified, consolidated metadata may not be desirable. | ||
|
||
1. It will add some overhead to each update operation, since the metadata | ||
would need to be re-consolidated to keep it in sync with the store. | ||
2. Readers using consolidated metadata will regularly see a "past" version | |
of the metadata, as it was when they read the root node with its consolidated | |
metadata. | |
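
Both points can be sketched with a toy model in plain Python. Nothing here is zarr-python API: the ``store`` dictionary, ``consolidate``, and ``open_consolidated`` are hypothetical stand-ins that only mimic the snapshot behavior described above.

```python
import copy

# Hypothetical toy store: live per-node metadata plus a consolidated copy.
store = {"nodes": {"a": {"shape": [1]}}, "consolidated": None}


def consolidate(store):
    """Copy the live node metadata into the consolidated snapshot."""
    store["consolidated"] = copy.deepcopy(store["nodes"])


def open_consolidated(store):
    """Return the snapshot a reader would hold after opening the root."""
    return copy.deepcopy(store["consolidated"])


consolidate(store)
reader = open_consolidated(store)

# A writer adds a node without re-consolidating.
store["nodes"]["b"] = {"shape": [2, 2]}
stale_view = sorted(reader)                             # existing reader: only "a"
unconsolidated_view = sorted(open_consolidated(store))  # new readers: also only "a"

# Re-consolidating (the per-update overhead) and reopening picks up "b".
consolidate(store)
fresh_view = sorted(open_consolidated(store))

print(stale_view, unconsolidated_view, fresh_view)
```

The new node becomes visible only after the writer pays the cost of re-consolidating and the reader reopens the root.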
|
||
.. _Consolidated Metadata: https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#consolidated-metadata | ||
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,6 +10,7 @@ Zarr-Python | |
|
||
getting_started | ||
tutorial | ||
consolidated_metadata | ||
api/index | ||
spec | ||
release | ||
|
These docs are great @TomAugspurger! 👏