-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error when chunk coordinates are invalid #395
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really good. Some comments added.
icechunk/src/format/snapshot.rs
Outdated
/// # Returns | ||
/// | ||
/// An iterator over the maximum permitted chunk indices. | ||
fn max_chunk_indices_permitted(&self) -> impl Iterator<Item = u64> + '_ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think chunk indices are u32:
pub struct ChunkIndices(pub Vec<u32>);
At the beginning of this method, it could be a good place to:
debug_assert_eq!(self.shape.len(), self.chunk_shape.0.len())
or similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed a new commit to address this. I ended up returning a ChunkIndices
type from this using collect()
on the iterator and type casting the calculation result to u32. Not sure if there is a performance consideration with using collect prior to returning, but it seemed to make sense that the type returned be ChunkIndices
. This way the actual validation function compares two ChunkIndices.
Shape and chunk shape were both u64, and I assume chunk indices are u32 since they will always be much smaller values.
Also added the debug assert, which is causing tests in repository.rs
to fail. I noticed this is because the chunk shape initialized is a single value, even when the shape is multiple. Do we assume that if a single chunk shape value is present, it applies to all dimensions? Or should this fail? Example of test data from one of the existing tests.
let zarr_meta = ZarrArrayMetadata {
shape: vec![1, 1, 2],
data_type: DataType::Int32,
chunk_shape: ChunkShape(vec![NonZeroU64::new(2).unwrap()]),
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up returning a ChunkIndices
perfect!
I assume chunk indices are u32
yes, you cand valitade looking here
chunk shape initialized is a single value, even when the shape is multiple.
this is probably a bad test, we should fix it to match. We need to add more validations to the metadata to avoid this kind of thing, but it's not critical because zarr also validates.
Do we assume that if a single chunk shape value is present, it applies to all dimensions
No, I don't think so, probably just a bad test, please change it to match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for answers! I will return to this PR next week after the holiday.
icechunk/src/format/snapshot.rs
Outdated
self.shape | ||
.iter() | ||
.zip(self.chunk_shape.0.iter()) | ||
.map(|(s, cs)| ((s + cs.get() - 1)) / cs.get() - 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The math looks a bit weird to me. Shouldn't it be just s / cs.get() - 1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I don't know why I left this expanded like this. Simplified it down to (s - 1) / cs.get()
, which should be equivalent and also seems to handle all cases properly.
I added tests to confirm this, but if there are additional edge cases I haven't thought of, let me know. Will continue to think on it.
icechunk/src/format/snapshot.rs
Outdated
/// # Errors | ||
/// | ||
/// Returns `IcechunkFormatError::ChunkCoordinatesNotFound` if the chunk coordinates are invalid. | ||
pub fn valid_chunk_coord(&self, coord: &ChunkIndices) -> IcechunkResult<bool> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this function doesn't really fail. Things would be easier if you make it return a simple bool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
I am not sure I am handling the false case in the most elegant way in set_chunk_ref
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really great @emacollins
icechunk/src/format/snapshot.rs
Outdated
serde_json::Value::from(42), | ||
)))), | ||
}], | ||
storage_transformers: Some(vec![StorageTransformer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can pass None
here to make it shorter
icechunk/src/format/snapshot.rs
Outdated
serde_json::Value::from(42), | ||
)))), | ||
}]), | ||
dimension_names: Some(vec![ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
icechunk/src/format/snapshot.rs
Outdated
let coord2 = ChunkIndices(vec![10, 11, 10]); | ||
let coord3 = ChunkIndices(vec![0, 0 ,0]); | ||
|
||
assert_eq!(true, zarr_meta1.valid_chunk_coord(&coord1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace these by assert!(...)
or assert!(! ...)
message: "getting an array".to_string(), | ||
}) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can add a test for Repository
too? that validates that the error is correctly bubbling up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emacollins Let's just change the error as shown in my comment and we are ready to merge. Thank you!
self.change_set.set_chunk_ref(node_snapshot.id, coord, data); | ||
Ok(()) | ||
} else { | ||
Err(RepositoryError::FormatError(IcechunkFormatError::ChunkCoordinatesNotFound { coords: coord })) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that i think about it, this is not really a FormatError
but a user error. This deserves a new addition to RepositoryError
, something like InvalidIndex
or something like that.
@emacollins you may also want to rebase your PR so it has no conflicts. |
Closes #175