[v3] Missing array attributes: nbytes, nchunks, nchunks_initialized #2027

Closed
tomwhite opened this issue Jul 11, 2024 · 6 comments · Fixed by #2065
Labels
bug Potential issues with the zarr-python library

Comments

@tomwhite
Contributor

Zarr version

3.0.0a1

Numcodecs version

0.12.1

Python Version

3.11.9

Operating System

Mac

Installation

pip

Description

The array attributes nbytes, nchunks, and nchunks_initialized are all missing in v3.

Steps to reproduce

>>> import zarr.v2 as zarr
>>> z = zarr.open(store='example-v2.zarr', mode='w', shape=(3, 2))
>>> z.nbytes, z.nchunks, z.nchunks_initialized
(48, 1, 0)
>>> import zarr
>>> z = zarr.open(store='example-v3.zarr', mode='w', shape=(3, 2))
>>> z.nbytes
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Array' object has no attribute 'nbytes'
>>> z.nchunks
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Array' object has no attribute 'nchunks'. Did you mean: 'chunks'?
>>> z.nchunks_initialized
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Array' object has no attribute 'nchunks_initialized'

Additional output

No response

@tomwhite tomwhite added the bug Potential issues with the zarr-python library label Jul 11, 2024
@d-v-b
Contributor

d-v-b commented Jul 11, 2024

nbytes and nchunks should be easy, but I'm wondering whether and how we should implement nchunks_initialized, since computing it triggers a lot of IO.
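
To illustrate why nbytes and nchunks are the easy ones: both can be derived from array metadata alone, with no store access. The following is only a sketch; the helper names `array_nbytes` and `array_nchunks` are hypothetical and not part of zarr-python, and the item size is passed in explicitly so the example needs no third-party imports.

```python
import math


def array_nbytes(shape, itemsize):
    """Uncompressed size in bytes: number of elements times bytes per element."""
    return math.prod(shape) * itemsize


def array_nchunks(shape, chunks):
    """Number of chunks in the chunk grid, counting partial edge chunks."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


# The (3, 2) float64 array from the report, stored as a single (3, 2) chunk.
print(array_nbytes((3, 2), itemsize=8))   # 48
print(array_nchunks((3, 2), (3, 2)))      # 1
```

These values match the `(48, 1, ...)` reported by the v2 session in the issue description.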

@d-v-b
Contributor

d-v-b commented Jul 11, 2024

@tomwhite how essential is nchunks_initialized for your usage?

@tomwhite
Contributor Author

> @tomwhite how essential is nchunks_initialized for your usage?

It's not as important as the other two, but we use it in Cubed to see if an array has been fully computed when resuming a computation after a restart. Currently we don't need to know how many chunks have been initialized, only whether any haven't been, in which case we'll recompute them all. So an API like the one mentioned in zarr-developers/zarr-specs#300 (comment) for TensorStore would be sufficient.

@d-v-b
Contributor

d-v-b commented Jul 11, 2024

So my personal view is that "what's going on with my chunks" is a very important question, and zarr-python should provide basic tools with which people can answer it. But making nchunks_initialized an array attribute gives the false impression that this is a cheap quantity to compute accurately, which it is not. So for me the easiest thing would be to spin nchunks_initialized off into a standalone function in array.py, with a docstring that explains the caveats and whatnot. Would that work for Cubed?
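
A rough sketch of what such a standalone function could look like, to make the cost visible. Everything here is an assumption for illustration: the store is modelled as a plain mapping from key to bytes, chunk keys follow the v2-style `"0.0"` layout with a `.` separator, and the array has at least one dimension. It is not the zarr-python implementation.

```python
import itertools
import math
from collections.abc import Mapping


def nchunks_initialized(store: Mapping, shape, chunks, sep="."):
    """Count the chunks of an array that have an object in the store.

    Caveats: this iterates over every key in the store, so the cost grows
    with the number of stored objects and, on remote stores, means real IO.
    The result is a snapshot and may be stale as soon as it is returned.
    """
    # One range of chunk indices per dimension, including partial edge chunks.
    grid = [range(math.ceil(s / c)) for s, c in zip(shape, chunks)]
    expected = {sep.join(map(str, idx)) for idx in itertools.product(*grid)}
    # Listing the store is the expensive step.
    return sum(1 for key in store if key in expected)


# Hypothetical store: two chunk objects plus one metadata document.
store = {".zarray": b"{}", "0.0": b"\x00", "1.0": b"\x00"}
print(nchunks_initialized(store, shape=(9, 2), chunks=(3, 2)))  # 2
```

Here the chunk grid is 3 by 1, so one chunk (`"2.0"`) is still missing.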

@tomwhite
Contributor Author

> Would that work for Cubed?

Yes, thanks

@d-v-b
Contributor

d-v-b commented Jul 11, 2024

While we are at it, chunks_initialized(array) -> tuple[chunk_identifier, ...] seems a bit better than nchunks_initialized(array) -> int. Something to think about when this PR gets opened.
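
A sketch of the tuple-returning variant, showing that both the count and the "has everything been computed?" check that Cubed needs fall out of it. The signature differs from the one proposed above because this example takes a plain mapping as a stand-in store rather than an array; the v2-style `"0.0"` chunk keys and the use of index tuples as chunk identifiers are likewise assumptions.

```python
import itertools
import math


def chunks_initialized(store, shape, chunks, sep="."):
    """Return the grid indices of all chunks that exist in the store.

    Checks each expected chunk key individually, so the cost is one
    existence check per chunk in the grid.
    """
    grid = [range(math.ceil(s / c)) for s, c in zip(shape, chunks)]
    return tuple(
        idx
        for idx in itertools.product(*grid)
        if sep.join(map(str, idx)) in store
    )


shape, chunks = (6, 4), (3, 2)            # a 2 x 2 chunk grid
store = {"0.0": b"", "0.1": b"", "1.1": b""}

done = chunks_initialized(store, shape, chunks)
total = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))

print(done)                # ((0, 0), (0, 1), (1, 1))
print(len(done))           # 3, the old nchunks_initialized value
print(len(done) == total)  # False, so chunk (1, 0) still needs computing
```

Returning identifiers rather than a count also tells the caller which chunks are missing, which a resumed computation could use to recompute only those.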
