Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: First version of rest catalog. #78

Merged
merged 13 commits into from
Oct 28, 2023

Conversation

liurenjie1024
Copy link
Contributor

@liurenjie1024 liurenjie1024 commented Oct 13, 2023

In this pr we add initial support for rest, which finished simple rest apis.

Complex apis such as create table, update table, commits which be added in following pr so that we can make each pr's size reasonable.

related: #60

@liurenjie1024
Copy link
Contributor Author

cc @JanKaul @Xuanwo @Fokko @ZENOTME PTAL

crates/iceberg/Cargo.toml Show resolved Hide resolved
crates/iceberg/src/catalog/rest.rs Outdated Show resolved Hide resolved
crates/iceberg/src/catalog/rest.rs Outdated Show resolved Hide resolved
crates/iceberg/src/catalog/rest.rs Outdated Show resolved Hide resolved
Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

crates/catalog/rest/src/catalog.rs Outdated Show resolved Hide resolved
crates/catalog/rest/src/catalog.rs Outdated Show resolved Hide resolved
Comment on lines +347 to +355
/// Update a table to the catalog.
async fn update_table(&self, _table: &TableIdent, _commit: TableCommit) -> Result<Table> {
todo!()
}

/// Update multiple tables to the catalog as an atomic operation.
async fn update_tables(&self, _tables: &[(TableIdent, TableCommit)]) -> Result<()> {
todo!()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is the way that we want to expose update tables. It is important that the right updates and requirements are set, otherwise, race conditions might occur. Also, there is little validation here, for example, can you do two distinct schema changes (so two updates), in a single commit? I would maybe leave this out for now so we can decide later on.

We might want to introduce a similar API as PyIceberg (which is inspired on Java): https://py.iceberg.apache.org/api/#schema-evolution

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree. It should be called by transaction api rather by user directly. The main challenge is that we should limit its visibility rather than exposing directly to user. I will figure out a way when actually implementing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I have said in pr description, the goal of this pr is to implement simple apis. Others will be left in following pr.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Just some context around the concern: Within Iceberg, when creating a new table, the IDs are being reassigned (python, java). This can lead to confusion and potentially also errors. Therefore we refrain users from assigning IDs as much as possible (or reassigning them).

tbl = table.create_table(
    Schema(
        Field(1, "id", IntegerType()),
        Field(2, "str", StringType())
    )
)

tbl.set_schema(
    Schema(
        Field(1, "id", IntegerType()),
        Field(3, "dt", DateType()),
        Field(2, "str", StringType())
    )
)

Because the IDs are re-assigned, it will evolve from:

Schema {
  1: id
  2: str
}

to:

Schema {
  1: id
  2: dt
  3: str
}

And this will corrupt the table since there is no valid promotion from string to date. The general consensus is that users shouldn't care about the IDs themselves, and this should not be exposed to users through APIs. The API should handle it.

crates/catalog/rest/src/catalog.rs Outdated Show resolved Hide resolved
crates/catalog/rest/src/catalog.rs Outdated Show resolved Hide resolved
@liurenjie1024
Copy link
Contributor Author

cc @Fokko Any other comments?

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a great start. Thanks for working on it @liurenjie1024, and @Xuanwo @ZENOTME for the review!

@Fokko Fokko merged commit f17bf30 into apache:main Oct 28, 2023
6 checks passed
@Fokko Fokko mentioned this pull request Apr 24, 2024
73 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants