Accounting writer #72

Closed
thorfour wants to merge 34 commits from accounting-writer

Conversation

@thorfour thorfour commented May 1, 2024

No description provided.

thorfour and others added 30 commits April 10, 2024 13:29
* replace io.IO with objstore.Bucket
* move Avro schemas out of internal and support template for entries
* Support for Hive catalog
* Support for writing manifest files with builders and writers.
* populate requires specs so they are not nil

* rename hive to hdfs
Spark complains if the field is missing
* table: SnapshotWriter

* remove catalogs

- Glue
- Rest

This removes the opencensus imports, which leak a goroutine
* icebucket: used to strip paths from bucket filenames

* hdfs unit test
schemas is an optional field in v1 metadata
* Partition support for manifest lists
* fix: setting upper/lower bounds from Parquet file

Use the single-value serialization required by the spec

* sort the schema by name

* write the table schema into the manifest metadata

* correctly set the upper/lower bounds of the manifest entry

The entry is written using the merged table schema, which
leaves a field empty if it doesn't exist in the file

* update the partition spec if schema changes
* added options for deleting metadata files after commit
Put snapshotwriter into its own struct and have the hdfs snapshotwriter
embed the snapshot writer to perform the generic portions of the writer

This will allow future table implementations to reuse the writer.
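The embedding described in this commit can be sketched roughly as follows. This is only an illustration of Go struct embedding: the type names, fields, and methods (`snapshotWriter`, `hdfsSnapshotWriter`, `Append`, `Files`, `Close`) are hypothetical and are not taken from the PR's code.

```go
package main

import "fmt"

// snapshotWriter holds the generic portion of the writer.
// Hypothetical: the real struct in the PR may look quite different.
type snapshotWriter struct {
	files []string
}

// Append records a data file to be included in the snapshot.
func (w *snapshotWriter) Append(path string) {
	w.files = append(w.files, path)
}

// Files returns the data files recorded so far.
func (w *snapshotWriter) Files() []string {
	return w.files
}

// hdfsSnapshotWriter embeds the generic writer and adds only the
// table-specific commit step (stubbed out here).
type hdfsSnapshotWriter struct {
	snapshotWriter
	committed bool
}

// Close stands in for the table-specific commit logic.
func (w *hdfsSnapshotWriter) Close() error {
	w.committed = true
	return nil
}

func main() {
	w := &hdfsSnapshotWriter{}
	w.Append("data/a.parquet") // promoted from the embedded snapshotWriter
	if err := w.Close(); err != nil {
		fmt.Println("close failed:", err)
		return
	}
	fmt.Println(len(w.Files()), w.committed)
}
```

Another table implementation could embed the same `snapshotWriter` and supply its own `Close`, which is the reuse this commit message is aiming for.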
Previously, 64-bit values were stored with 32-bit precision
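A minimal sketch of this class of bug, assuming the affected values were `float64` (the commit message doesn't say which types were involved). The helper names are hypothetical and only show how narrowing to 32 bits before encoding loses information.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeNarrow shows the lossy pattern: the value is narrowed to
// float32 before being written, so only 4 bytes are stored.
func encodeNarrow(v float64) []byte {
	b := make([]byte, 4)
	binary.LittleEndian.PutUint32(b, math.Float32bits(float32(v)))
	return b
}

// encodeFull writes all 8 bytes of the float64.
func encodeFull(v float64) []byte {
	b := make([]byte, 8)
	binary.LittleEndian.PutUint64(b, math.Float64bits(v))
	return b
}

// decodeNarrow reads back a value written by encodeNarrow.
func decodeNarrow(b []byte) float64 {
	return float64(math.Float32frombits(binary.LittleEndian.Uint32(b)))
}

// decodeFull reads back a value written by encodeFull.
func decodeFull(b []byte) float64 {
	return math.Float64frombits(binary.LittleEndian.Uint64(b))
}

func main() {
	v := 0.1
	fmt.Println(decodeNarrow(encodeNarrow(v)) == v) // narrowed value no longer round-trips
	fmt.Println(decodeFull(encodeFull(v)) == v)     // full-width value round-trips exactly
}
```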
Previously, values were compared byte by byte regardless of type, which
doesn't work for little-endian encoded integers
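To see why a raw byte comparison goes wrong, here is a small sketch using `int64` values encoded as 8 little-endian bytes. The helpers `encodeInt64` and `compareInt64LE` are hypothetical names for illustration, not the PR's functions.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// encodeInt64 returns v as 8 little-endian bytes.
func encodeInt64(v int64) []byte {
	b := make([]byte, 8)
	binary.LittleEndian.PutUint64(b, uint64(v))
	return b
}

// compareInt64LE decodes both operands before comparing, so the
// result follows numeric order rather than byte order.
func compareInt64LE(a, b []byte) int {
	x := int64(binary.LittleEndian.Uint64(a))
	y := int64(binary.LittleEndian.Uint64(b))
	switch {
	case x < y:
		return -1
	case x > y:
		return 1
	default:
		return 0
	}
}

func main() {
	a, b := encodeInt64(256), encodeInt64(1)
	// 256 encodes as 00 01 00 ... and 1 as 01 00 00 ..., so the
	// least significant byte decides a lexicographic comparison.
	fmt.Println(bytes.Compare(a, b))  // -1: claims 256 < 1
	fmt.Println(compareInt64LE(a, b)) // 1: 256 > 1
}
```

A byte comparison also mishandles negative numbers, since the sign lives in the last byte of a little-endian encoding.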
thorfour and others added 4 commits April 30, 2024 13:18
If a snapshot is expired, walk the expired snapshot's manifests and
determine whether any are orphaned and require removal
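One way to picture the orphan check is as a set difference: a manifest is orphaned if an expired snapshot references it and no remaining snapshot does. The sketch below assumes that definition. The `snapshot` type and `orphanedManifests` function are hypothetical and do not mirror the PR's data structures.

```go
package main

import (
	"fmt"
	"sort"
)

// snapshot is a simplified stand-in: an ID plus the paths of the
// manifests it references.
type snapshot struct {
	ID        int64
	Manifests []string
}

// orphanedManifests returns manifests referenced by expired
// snapshots that no live snapshot still references.
func orphanedManifests(expired, live []snapshot) []string {
	inUse := make(map[string]struct{})
	for _, s := range live {
		for _, m := range s.Manifests {
			inUse[m] = struct{}{}
		}
	}

	seen := make(map[string]struct{})
	var orphans []string
	for _, s := range expired {
		for _, m := range s.Manifests {
			if _, ok := inUse[m]; ok {
				continue // still referenced by a live snapshot
			}
			if _, dup := seen[m]; dup {
				continue // already reported via another expired snapshot
			}
			seen[m] = struct{}{}
			orphans = append(orphans, m)
		}
	}
	sort.Strings(orphans) // deterministic order for callers and tests
	return orphans
}

func main() {
	expired := []snapshot{
		{ID: 1, Manifests: []string{"m1.avro", "m2.avro"}},
		{ID: 2, Manifests: []string{"m2.avro", "m3.avro"}},
	}
	live := []snapshot{
		{ID: 3, Manifests: []string{"m3.avro", "m4.avro"}},
	}
	fmt.Println(orphanedManifests(expired, live))
}
```

Here `m3.avro` survives because snapshot 3 still references it, while `m1.avro` and `m2.avro` are candidates for removal.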
* calculate added rows for manifest

* fix unit test
Then we don't have to call attributes
@thorfour thorfour closed this May 1, 2024
@thorfour thorfour deleted the accounting-writer branch May 1, 2024 21:21
@thorfour thorfour restored the accounting-writer branch May 1, 2024 21:21
@thorfour thorfour deleted the accounting-writer branch May 1, 2024 21:22