-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Impl rest catalog + table updates & requirements #146
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this! I like the idea in general. I've done a first pass to list a bunch of requested changes.
type MetadataV1 struct { | ||
Schema iceberg.Schema `json:"schema"` | ||
type metadataV1 struct { | ||
Schema *iceberg.Schema `json:"schema"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't this change mean we'll have to perform nil checks everywhere we try to use the schema now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the metadataV1 is now unexported, so this shoulod be relatively few places. But yes, it does. However, if we assign schemas directly, vet will throw an error as we are copiyng a lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
next round of comments
catalog/rest.go
Outdated
type Identifier struct { | ||
Namespace []string `json:"namespace"` | ||
Name string `json:"name"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we're going to export this type, we should probably name it RestIdentifier
or something equivalent to separate it from other catalog identifier types.
catalog/rest.go
Outdated
Identifiers []struct { | ||
Namespace []string `json:"namespace"` | ||
Name string `json:"name"` | ||
} `json:"identifiers"` | ||
Identifiers []Identifier `json:"identifiers"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the identifier type used anywhere other than here? The reason I had done it inline here was because it was only used in this one spot and i didn't want it to get confused with table.Identifier
catalog/rest.go
Outdated
for k, v := range ret.Config { | ||
config[k] = v | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why loop instead of just doing maps.Copy
(which does the loop internally)
catalog/rest.go
Outdated
for k, v := range ret.Config { | ||
config[k] = v | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question, why not maps.Copy(config, ret.Config)
?
partitions.go
Outdated
// Fields returns a clone of the partition fields in this spec. | ||
func (ps *PartitionSpec) Fields() []PartitionField { | ||
return slices.Clone(ps.fields) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we're okay with bumping to go1.23
we could use iter
here and do slices.Values(ps.Fields)
this way we don't have clone the entire slice but also maintain that we disallow users from modifying the slice.
Thus this function would be:
func (ps *PartitionSpec) Fields() iter.Seq[PartitionField] {
return slices.Values(ps.fields)
}
and a user would be able to iterate over the fields:
for f := range spec.Fields() {
// do something
}
Alternately, you could use slices.All
if you want to preserve the index, value
nature of the range
table/metadata.go
Outdated
func containsBy[S []E, E any](elems S, found func(e E) bool) bool { | ||
for _, e := range elems { | ||
if found(e) { | ||
return true | ||
} | ||
} | ||
return false | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace with slices.ContainsFunc
table/metadata.go
Outdated
|
||
// maxBy returns the maximum value of extract(e) for all e in elems. | ||
// If elems is empty, returns 0. | ||
func maxBy[S []E, E any](elems S, extract func(e E) int) int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use ~[]E
for better coverage of generics and future proofing
table/metadata.go
Outdated
} | ||
|
||
func (c *commonMetadata) Ref() SnapshotRef { return c.SnapshotRefs[MainBranch] } | ||
func (c *commonMetadata) Refs() map[string]SnapshotRef { return maps.Clone(c.SnapshotRefs) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets use maps.All
like i mentioned before for slices.All
/slices.Values
so that we can return an iterator without having to clone the whole map.
table/metadata.go
Outdated
func (c *commonMetadata) SnapshotLogs() []SnapshotLogEntry { return slices.Clone(c.SnapshotLog) } | ||
func (c *commonMetadata) PreviousFiles() []MetadataLogEntry { return slices.Clone(c.MetadataLog) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as before about using iterators and slices.Values
or slices.All
table/metadata.go
Outdated
if c == nil || other == nil { | ||
return c == other | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in what scenario is c nil?
Hi @zeroshade I sincerely apologise for not getting back to you sooner, I suddenly found myself with very little time on my hands. |
@jwtryg any updates? |
@zeroshade thank you for your patience :) I have implemented your feedback and made some other changes - specifically, being more mindful of what should be exported, and then undoing the manic nil-pointer checking I somehow got myself into. |
My apologies for the long delay here, could you resolve the conflicts? I should be able to give this a new pass of review in the next day or so. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a final few questions / nitpicks for after the conflicts are resolved.
@jwtryg Is there anything else outstanding on this or is this ready for review again? |
Hi @zeroshade
I think it's really cool that you are working on a golang-iceberg implementation, and I would like to contribute if I can.
I have tried to finish the rest catalog implementation, and then I have added table updates and requirements, using a new MetadataBuilder struct that can simplify updating metadata. Like the existing implementation, I have tried to stay close to the pyIceberg implementation.
I thought this could be a good starting point for also implementing transactions and table updates. I would love to get your input and change/adjust the implementation accordingly.