-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't allow to write chunks to invalid chunk coordinates #175
Comments
@jhamman do you think implementing this could cause any kind of problem with zarr-python? For example, this would demand that an array (metadata) is resized before it's new chunks are set. |
@paraseba Would like to try and give this one a shot! Let me know any context and background. |
@emacollins Great! So, the key to this one is in the To know if the coordinates are valid, we need to look into the array metadata, which you can find in the response of You'll probably need to add a new error case to Probably some of our tests fail after this change, and also, we'll want to add tests that verify the new behavior. Let me know if that's enough to get you started or if you want more context. And ping me with any questions. Thank you! |
@paraseba Thanks for the detailed explanation! I think this is enough to get started. I'll familiarize myself about with each of these parts and see what I can do |
Sounds great @emacollins ! Take you time and let me know if you have questions. |
Another change we would need, but it could be done in a separate ticket, would be to make sure that if the user shrinks an array by changing its metadata, all the chunks that are outside of the array bounds disappear from the manifest. |
Alright, it doesn't look like I am able to publish my branch for this (permissions error?), but just want to check in and make sure I am on the right track. My thinking was to add a method to pub struct ZarrArrayMetadata {
pub shape: ArrayShape,
pub data_type: DataType,
pub chunk_shape: ChunkShape,
pub chunk_key_encoding: ChunkKeyEncoding,
pub fill_value: FillValue,
pub codecs: Vec<Codec>,
pub storage_transformers: Option<Vec<StorageTransformer>>,
pub dimension_names: Option<DimensionNames>,
}
impl ZarrArrayMetadata {
pub fn chunk_indicies_valid(&self, chunk_indices: &ChunkIndices) -> bool {
// Check if provided chunk indices are valid for array
// For example, given an array with shape (10000, 10000)
// and chunk shape (1000, 1000) there will be 100 chunks
// laid out in a 10 by 10 grid. The chunk with indices (0, 0)
// provides data for rows 0-999 and columns 0-999 and is stored
// under the key “0.0”; the chunk with indices (2, 4) provides data
// for rows 2000-2999 and columns 4000-4999 and is stored under the
//key “2.4”; etc.
assert_eq!(self.shape.len(), self.chunk_shape.len());
let mut num_chunks: u64;
let mut max_chunk_index: u64;
let mut requested_chunk_index: &u32;
let mut chunk_shape: &NonZeroU64;
for i in 0..self.shape.len() {
chunk_shape = self.chunk_shape.get(i).unwrap();
num_chunks = (self.shape[i] + chunk_shape - 1) / chunk_shape;
max_chunk_index = num_chunks - 1;
requested_chunk_index = chunk_indices.get(i).unwrap();
if max_chunk_index > *requested_chunk_index as u64 {
continue;
} else {
return false;
}
}
return true;
}
} |
Great work @emacollins . My notes:
You need to fork the repository, push a new branch there, and then create a pull request once it's ready. I can also review the fork branch before you want to create a PR.
Great
Yes, very much so. That sound like a fine approach. A few more notes:
|
@paraseba Great, thank you very much for all the detail! I really wanted to make sure I got the core logic of what was needed down first. Now time to clean up and make sure this is implemented idiomatically :) |
No description provided.
The text was updated successfully, but these errors were encountered: