Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spec: snapshot_id is optional for V1 #8704

Closed
wants to merge 1 commit into from

Conversation

Fokko
Copy link
Contributor

@Fokko Fokko commented Oct 3, 2023

It should be required according to the spec:

image

And with Spark, we just write the V2 struct for a V1 table:

{
    "status": 1,
    "snapshot_id": {
        "long": 3668892875277885400
    },
    "data_sequence_number": null,
    "file_sequence_number": null,
    "data_file": {
        "content": {
            "int": 0
        },
        "file_path": "s3://warehouse/default/coordinates/data/00000-0-0397b63a-731b-4ab6-8bde-25672c92546c-0.parquet",
        "file_format": "PARQUET",
        "partition": {},
        "record_count": 3,
        "file_size_in_bytes": 1108,
        "block_size_in_bytes": {
            "long": 67108864
        },
        "column_sizes": {
            "array": [
                {
                    "key": 1,
                    "value": 113
                },
                {
                    "key": 2,
                    "value": 113
                }
            ]
        },
        "value_counts": {
            "array": [
                {
                    "key": 1,
                    "value": 3
                },
                {
                    "key": 2,
                    "value": 3
                }
            ]
        },
        "null_value_counts": {
            "array": [
                {
                    "key": 1,
                    "value": 0
                },
                {
                    "key": 2,
                    "value": 0
                }
            ]
        },
        "nan_value_counts": {
            "array": []
        },
        "lower_bounds": {
            "array": [
                {
                    "key": 1,
                    "value": "ß3\\u0012¡\\u0011\nJ@"
                },
                {
                    "key": 2,
                    "value": "´è�\n¸'\\u0011@"
                }
            ]
        },
        "upper_bounds": {
            "array": [
                {
                    "key": 1,
                    "value": "ÑvLÝ�1J@"
                },
                {
                    "key": 2,
                    "value": "\\u0002\\u0012M ��\\u0013@"
                }
            ]
        },
        "key_metadata": null,
        "split_offsets": {
            "array": [
                4
            ]
        },
        "equality_ids": {
            "array": []
        },
        "sort_order_id": {
            "int": 0
        },
        "spec_id": {
            "int": 0
        }
    }
}

Where:

"snapshot_id": {
        "long": 3668892875277885400
    }

Indicates that it is an optional field (it is encoded as a union [None, Long], where the long is present).

Disclaimer: Don't look too much at the code, I was playing around to see what's needed to get this fixed. But the fixes are not correct since I just generated new SnapshotIds (but they should come from the snapshot.

@rdblue
Copy link
Contributor

rdblue commented Oct 28, 2023

I'm going to close this. The reason why it was optional is that the snapshot_id can be inherited in v1 using a table setting to enable breaking forward compatibility.

@rdblue rdblue closed this Oct 28, 2023
@JFinis
Copy link
Contributor

JFinis commented Dec 12, 2023

@rdblue you closed this issue, so the code continues treating the snapshot-id as optional. However, the spec still says it's required in v1. Do I see it correctly, that if it is considered optional in v1, then the spec should be fixed?

@Fokko FYI

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants