feat: support serialize/deserialize DataFile into avro bytes #797

Merged: 3 commits, Jan 2, 2025

Contributor

@ZENOTME ZENOTME commented Dec 13, 2024

This PR exposes _serde::DataFile so that users can serialize and deserialize a data file. Related issue: #774

Contributor Author

ZENOTME commented Dec 13, 2024

cc @liurenjie1024 @Xuanwo @Fokko @sdd

Contributor Author

ZENOTME commented Dec 18, 2024

I changed this PR to add an interface that serializes/deserializes DataFile to and from Avro bytes. The idea comes from #774 (comment).

I think it can be a good start for #774. It gives users an interface to serialize/deserialize a DataFile.

Later, we can discuss whether to make DataFile itself serializable. Essentially, that would mean storing more info in the DataFile, so that we no longer need to pass that info (e.g. partition type) as parameters to the interface.
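To illustrate why the partition type currently has to be passed in, here is a minimal, std-only sketch. Everything in it (the types `PartitionType`, `FieldType`, `Value`, the toy `DataFile`, and the functions `serialize`/`deserialize`) is a hypothetical stand-in, not the iceberg-rust API, and the byte layout is a toy format rather than Avro. The point it tries to show: the encoded bytes carry no type information, so the reader can only decode the partition tuple if the caller supplies its type.

```rust
// Hypothetical stand-ins for illustration only; NOT the iceberg-rust types.
#[derive(Debug, Clone, PartialEq)]
enum FieldType { Int, Str }

#[derive(Debug, Clone, PartialEq)]
enum Value { Int(i32), Str(String) }

/// Describes the partition tuple: one (name, type) pair per partition field.
#[derive(Debug, Clone, PartialEq)]
struct PartitionType { fields: Vec<(String, FieldType)> }

/// A data file whose partition values carry no schema of their own.
#[derive(Debug, Clone, PartialEq)]
struct DataFile { file_path: String, partition: Vec<Value> }

/// Appends a length-prefixed UTF-8 string (u32 little-endian length, then bytes).
fn write_str(out: &mut Vec<u8>, s: &str) {
    out.extend_from_slice(&(s.len() as u32).to_le_bytes());
    out.extend_from_slice(s.as_bytes());
}

/// Splits `n` bytes off the front of the input, advancing the cursor.
fn take<'a>(bytes: &mut &'a [u8], n: usize) -> Result<&'a [u8], String> {
    let current: &'a [u8] = *bytes;
    if current.len() < n {
        return Err("unexpected end of input".to_string());
    }
    let (head, tail) = current.split_at(n);
    *bytes = tail;
    Ok(head)
}

/// Reads a length-prefixed UTF-8 string written by `write_str`.
fn read_str(bytes: &mut &[u8]) -> Result<String, String> {
    let raw = take(bytes, 4)?;
    let len = u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]) as usize;
    let body = take(bytes, len)?;
    String::from_utf8(body.to_vec()).map_err(|e| e.to_string())
}

/// Toy encoding (not Avro): the partition type decides how each value is written.
fn serialize(file: &DataFile, partition_type: &PartitionType) -> Result<Vec<u8>, String> {
    if file.partition.len() != partition_type.fields.len() {
        return Err("partition arity mismatch".to_string());
    }
    let mut out = Vec::new();
    write_str(&mut out, &file.file_path);
    for (value, (name, ty)) in file.partition.iter().zip(&partition_type.fields) {
        match (value, ty) {
            (Value::Int(v), FieldType::Int) => out.extend_from_slice(&v.to_le_bytes()),
            (Value::Str(s), FieldType::Str) => write_str(&mut out, s),
            _ => return Err(format!("value for `{}` does not match its declared type", name)),
        }
    }
    Ok(out)
}

/// The bytes hold no type tags, so decoding needs the same partition type again.
fn deserialize(mut bytes: &[u8], partition_type: &PartitionType) -> Result<DataFile, String> {
    let file_path = read_str(&mut bytes)?;
    let mut partition = Vec::with_capacity(partition_type.fields.len());
    for (_, ty) in &partition_type.fields {
        partition.push(match ty {
            FieldType::Int => {
                let raw = take(&mut bytes, 4)?;
                Value::Int(i32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]))
            }
            FieldType::Str => Value::Str(read_str(&mut bytes)?),
        });
    }
    Ok(DataFile { file_path, partition })
}
```

If the DataFile carried its own partition type, the second parameter of both functions could go away, which is the follow-up question raised above.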

@ZENOTME ZENOTME changed the title from "feat: expose _serde::DataFile" to "feat: support serialize/deserialize DataFile into avro bytes" Dec 19, 2024
Contributor

@liurenjie1024 liurenjie1024 left a comment

Thanks @ZENOTME for this PR. I left some suggestions to improve API consistency; the rest LGTM.

@@ -656,6 +656,38 @@ mod _const_schema {
})
};

fn data_file_fields_v2(partition_type: StructType) -> Vec<NestedFieldRef> {
Suggested change
- fn data_file_fields_v2(partition_type: StructType) -> Vec<NestedFieldRef> {
+ fn data_file_fields_v2(partition_type: &StructType) -> Vec<NestedFieldRef> {

]
}

pub(super) fn data_file_schema_v2(partition_type: StructType) -> Result<AvroSchema, Error> {
Suggested change
- pub(super) fn data_file_schema_v2(partition_type: StructType) -> Result<AvroSchema, Error> {
+ pub(super) fn data_file_schema_v2(partition_type: &StructType) -> Result<AvroSchema, Error> {

)),
];
let schema = Schema::builder().with_fields(fields).build()?;
schema_to_avro_schema("manifest_entry", &schema)
}

fn data_file_fields_v1(partition_type: StructType) -> Vec<NestedFieldRef> {
Suggested change
- fn data_file_fields_v1(partition_type: StructType) -> Vec<NestedFieldRef> {
+ fn data_file_fields_v1(partition_type: &StructType) -> Vec<NestedFieldRef> {
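A minimal sketch of what the borrowing signatures buy the caller, assuming the helpers only read the partition type. The `StructType` below is a hypothetical stand-in (not the iceberg-rust type), and `data_file_field_names` and `build_twice` are illustrative names, as are the returned field names:

```rust
// Hypothetical stand-in for the partition struct type; NOT iceberg-rust's `StructType`.
#[derive(Debug, Clone, PartialEq)]
struct StructType {
    field_names: Vec<String>,
}

/// Borrowing version: the helper only reads the partition type, so `&StructType`
/// is enough and the caller keeps ownership.
fn data_file_field_names(partition_type: &StructType) -> Vec<String> {
    // Illustrative field names only.
    let mut names = vec!["file_path".to_string(), "file_format".to_string()];
    names.extend(
        partition_type
            .field_names
            .iter()
            .map(|n| format!("partition.{}", n)),
    );
    names
}

/// Calls the helper twice with the same partition type. With a by-value
/// parameter, the first call would move the value and the second would
/// need a `clone()`.
fn build_twice(partition_type: &StructType) -> (Vec<String>, Vec<String>) {
    (
        data_file_field_names(partition_type),
        data_file_field_names(partition_type),
    )
}
```

With `partition_type: StructType`, a caller building more than one schema from the same partition type would have to clone it for each call; the reference pushes any needed clone down to the one place that actually stores the value.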

crates/iceberg/src/spec/manifest.rs: 3 outdated review threads, resolved
Member

@Xuanwo Xuanwo left a comment

Only a small nit.

crates/iceberg/src/spec/manifest.rs: 1 outdated review thread, resolved
@ZENOTME ZENOTME requested a review from Xuanwo January 2, 2025 05:41
Member

@Xuanwo Xuanwo left a comment

Thank you @ZENOTME for working on this!

@Xuanwo Xuanwo merged commit 09fa1fa into apache:main Jan 2, 2025
16 checks passed
3 participants