-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error when chunk coordinates are invalid #395
base: main
Are you sure you want to change the base?
Changes from 3 commits
089307a
107a72c
4804bfa
8b03ab2
d566e82
3742bf1
a2d330e
9e19403
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,7 +16,7 @@ use crate::metadata::{ | |
use super::{ | ||
format_constants, manifest::ManifestRef, AttributesId, IcechunkFormatError, | ||
IcechunkFormatVersion, IcechunkResult, ManifestId, NodeId, ObjectId, Path, | ||
SnapshotId, TableOffset, | ||
SnapshotId, TableOffset, ChunkIndices | ||
}; | ||
|
||
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] | ||
|
@@ -49,6 +49,55 @@ pub struct ZarrArrayMetadata { | |
pub dimension_names: Option<DimensionNames>, | ||
} | ||
|
||
impl ZarrArrayMetadata { | ||
|
||
/// Returns an iterator over the maximum permitted chunk indices for the array. | ||
/// | ||
/// This function calculates the maximum chunk indices based on the shape of the array | ||
/// and the chunk shape. | ||
/// | ||
/// # Returns | ||
/// | ||
/// An iterator over the maximum permitted chunk indices. | ||
fn max_chunk_indices_permitted(&self) -> impl Iterator<Item = u64> + '_ { | ||
self.shape | ||
.iter() | ||
.zip(self.chunk_shape.0.iter()) | ||
.map(|(s, cs)| ((s + cs.get() - 1)) / cs.get() - 1) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The math looks a bit weird to me. Shouldn't it be just There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes I don't know why I left this expanded like this. Simplified it down to I added tests to confirm this, but if there are additional edge cases I haven't thought of, let me know. Will continue to think on it. |
||
} | ||
|
||
/// Validates the provided chunk coordinates for the array. | ||
/// | ||
/// This function checks if the provided chunk indices are valid for the array. | ||
/// | ||
/// # Arguments | ||
/// | ||
/// * `coord` - The chunk indices to validate. | ||
/// | ||
/// # Returns | ||
/// | ||
/// An `IcechunkResult` indicating whether the chunk coordinates are valid. | ||
/// | ||
/// # Errors | ||
/// | ||
/// Returns `IcechunkFormatError::ChunkCoordinatesNotFound` if the chunk coordinates are invalid. | ||
pub fn valid_chunk_coord(&self, coord: &ChunkIndices) -> IcechunkResult<bool> { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually, this function doesn't really fail. Things would be easier if you make it return a simple There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done! I am not sure I am handling the false case in the most elegant way in |
||
|
||
let valid: bool = coord | ||
.0 | ||
.iter() | ||
.zip(self.max_chunk_indices_permitted()) | ||
.all(|(index, index_permitted)| *index <= index_permitted as u32); | ||
|
||
if valid { | ||
Ok(true) | ||
} else { | ||
Err(IcechunkFormatError::ChunkCoordinatesNotFound { coords: (coord.clone()) }) | ||
} | ||
|
||
} | ||
} | ||
|
||
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)] | ||
pub enum NodeData { | ||
Array(ZarrArrayMetadata, Vec<ManifestRef>), | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -424,9 +424,20 @@ impl Repository { | |
coord: ChunkIndices, | ||
data: Option<ChunkPayload>, | ||
) -> RepositoryResult<()> { | ||
self.get_array(&path) | ||
.await | ||
.map(|node| self.change_set.set_chunk_ref(node.id, coord, data)) | ||
|
||
let node_snapshot = self.get_array(&path).await?; | ||
|
||
if let NodeData::Array(zarr_metadata, _, ) = node_snapshot.node_data { | ||
zarr_metadata.valid_chunk_coord(&coord)?; | ||
self.change_set.set_chunk_ref(node_snapshot.id, coord, data); | ||
Ok(()) | ||
} else { | ||
Err(RepositoryError::NotAnArray { | ||
node: node_snapshot, | ||
message: "getting an array".to_string(), | ||
}) | ||
} | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe we can add a test for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done! |
||
} | ||
|
||
pub async fn get_node(&self, path: &Path) -> RepositoryResult<NodeSnapshot> { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think chunk indices are u32:
At the beginning of this method, it could be a good place to:
or similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed a new commit to address this. I ended up returning a
ChunkIndices
type from this usingcollect()
on the iterator and type casting the calculation result to u32. Not sure if there is a performance consideration with using collect prior to returning, but it seemed to make sense that the type returned beChunkIndices
. This way the actual validation function compares two ChunkIndices.Shape and chunk shape were both u64, and I assume chunk indices are u32 since they will always be much smaller values.
Also added the debug assert, which is causing tests in
repository.rs
to fail. I noticed this is because the chunk shape initialized is a single value, even when the shape is multiple. Do we assume that if a single chunk shape value is present, it applies to all dimensions? Or should this fail? Example of test data from one of the existing tests.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perfect!
yes, you cand valitade looking here
this is probably a bad test, we should fix it to match. We need to add more validations to the metadata to avoid this kind of thing, but it's not critical because zarr also validates.
No, I don't think so, probably just a bad test, please change it to match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for answers! I will return to this PR next week after the holiday.