Spec: `snapshot_id` is optional for V1 #8704

Fokko · 2023-10-03T10:38:06Z

According to the spec, `snapshot_id` should be required for V1.

And with Spark, we just write the V2 struct for a V1 table:

```json
{
    "status": 1,
    "snapshot_id": {
        "long": 3668892875277885400
    },
    "data_sequence_number": null,
    "file_sequence_number": null,
    "data_file": {
        "content": {
            "int": 0
        },
        "file_path": "s3://warehouse/default/coordinates/data/00000-0-0397b63a-731b-4ab6-8bde-25672c92546c-0.parquet",
        "file_format": "PARQUET",
        "partition": {},
        "record_count": 3,
        "file_size_in_bytes": 1108,
        "block_size_in_bytes": {
            "long": 67108864
        },
        "column_sizes": {
            "array": [
                {
                    "key": 1,
                    "value": 113
                },
                {
                    "key": 2,
                    "value": 113
                }
            ]
        },
        "value_counts": {
            "array": [
                {
                    "key": 1,
                    "value": 3
                },
                {
                    "key": 2,
                    "value": 3
                }
            ]
        },
        "null_value_counts": {
            "array": [
                {
                    "key": 1,
                    "value": 0
                },
                {
                    "key": 2,
                    "value": 0
                }
            ]
        },
        "nan_value_counts": {
            "array": []
        },
        "lower_bounds": {
            "array": [
                {
                    "key": 1,
                    "value": "ß3\\u0012¡\\u0011\nJ@"
                },
                {
                    "key": 2,
                    "value": "´è�\n¸'\\u0011@"
                }
            ]
        },
        "upper_bounds": {
            "array": [
                {
                    "key": 1,
                    "value": "ÑvLÝ�1J@"
                },
                {
                    "key": 2,
                    "value": "\\u0002\\u0012M ��\\u0013@"
                }
            ]
        },
        "key_metadata": null,
        "split_offsets": {
            "array": [
                4
            ]
        },
        "equality_ids": {
            "array": []
        },
        "sort_order_id": {
            "int": 0
        },
        "spec_id": {
            "int": 0
        }
    }
}
```

Here, the entry

```json
"snapshot_id": {
    "long": 3668892875277885400
}
```

indicates that `snapshot_id` is an optional field: it is encoded as the union `["null", "long"]`, with the `long` branch present.
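For readers less familiar with Avro's JSON encoding: the Avro spec encodes a union value as JSON `null` when the null branch is used, and otherwise as an object with a single key naming the branch type. The minimal Python sketch below (stdlib only; `encode_union_json`, `is_optional` and the int-to-`long` mapping are hypothetical simplifications, not part of any Avro library) shows why a `{"long": ...}` wrapper implies a `["null", "long"]` union rather than a plain required `long`.

```python
import json
from typing import Any, List, Optional, Union

AvroType = Union[str, List[str]]  # simplified: a primitive name or a union of names


def avro_primitive_name(value: Any) -> str:
    """Map a Python value to an Avro primitive type name (simplified sketch)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"  # assumption: every int is treated as an Avro long
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value: {value!r}")


def encode_union_json(value: Any, branches: List[str]) -> Optional[dict]:
    """Avro JSON encoding of a union value: null stays null, anything else is
    wrapped in an object keyed by the name of the branch it matches."""
    name = avro_primitive_name(value)
    if name not in branches:
        raise ValueError(f"{name!r} is not a branch of union {branches}")
    return None if name == "null" else {name: value}


def is_optional(field_type: AvroType) -> bool:
    """A field is nullable when its type is a union that includes "null"."""
    return isinstance(field_type, list) and "null" in field_type


if __name__ == "__main__":
    snapshot_id_type = ["null", "long"]  # the type implied by the dump above
    print(json.dumps({"snapshot_id": encode_union_json(3668892875277885400, snapshot_id_type)}))
    # {"snapshot_id": {"long": 3668892875277885400}}
    print(json.dumps({"snapshot_id": encode_union_json(None, snapshot_id_type)}))
    # {"snapshot_id": null}
    print(is_optional(snapshot_id_type), is_optional("long"))
    # True False
```

A required `long` field would instead be encoded as a bare number (`"snapshot_id": 3668892875277885400`), which is why the wrapper in the dump points to an optional field.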

Disclaimer: don't look too closely at the code; I was playing around to see what's needed to fix this. The fixes are not correct, since I just generated new snapshot IDs (they should come from the snapshot).

rdblue · 2023-10-28T20:01:19Z

I'm going to close this. `snapshot_id` is optional because in v1 it can be inherited when a table setting is enabled; that setting opts in to breaking forward compatibility.
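Roughly, inheritance means a writer may leave `snapshot_id` null in a manifest entry, and a reader fills it in from the snapshot that added the manifest (the manifest list's `added_snapshot_id`). The Python sketch below assumes that rule; the dataclasses and functions are hypothetical stand-ins, not PyIceberg or Java APIs, and the table property name `compatibility.snapshot-id-inheritance.enabled` is assumed from the Java implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumed name of the Java table property that opts a v1 table into inheritance.
SNAPSHOT_ID_INHERITANCE = "compatibility.snapshot-id-inheritance.enabled"


@dataclass
class ManifestFile:
    """Hypothetical stand-in for a manifest list entry."""
    path: str
    added_snapshot_id: int  # snapshot that added this manifest


@dataclass
class ManifestEntry:
    """Hypothetical stand-in for a manifest entry."""
    status: int                 # 0 = EXISTING, 1 = ADDED, 2 = DELETED
    snapshot_id: Optional[int]  # None when the writer relied on inheritance
    file_path: str


def may_omit_snapshot_id(format_version: int, props: Dict[str, str]) -> bool:
    """Whether a writer may leave snapshot_id null and rely on readers to inherit it."""
    if format_version >= 2:
        return True  # v2: the field is inherited when null
    # v1: only when the table explicitly opts in. Readers that predate
    # inheritance would then see a null snapshot_id, which is the
    # forward-compatibility break mentioned above.
    return props.get(SNAPSHOT_ID_INHERITANCE, "false").lower() == "true"


def resolve_snapshot_id(entry: ManifestEntry, manifest: ManifestFile) -> int:
    """Reader side: fall back to the snapshot that added the manifest."""
    if entry.snapshot_id is not None:
        return entry.snapshot_id
    return manifest.added_snapshot_id


if __name__ == "__main__":
    manifest = ManifestFile(path="manifest-a.avro", added_snapshot_id=42)
    entry = ManifestEntry(status=1, snapshot_id=None, file_path="data-0.parquet")
    print(resolve_snapshot_id(entry, manifest))  # 42
    print(may_omit_snapshot_id(1, {}))  # False
    print(may_omit_snapshot_id(1, {SNAPSHOT_ID_INHERITANCE: "true"}))  # True
```

Under this reading, the v1 schema has to declare `snapshot_id` as nullable even though most v1 writers always populate it, since a reader cannot tell from the schema alone whether the table opted in.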

JFinis · 2023-12-12T14:13:36Z

@rdblue you closed this issue, so the code continues to treat `snapshot_id` as optional. However, the spec still says it is required in v1. Am I right that, if it is considered optional in v1, the spec should be fixed?

@Fokko FYI

Commit c429b01: Spec: `snapshot_id` is optional for V1

github-actions bot added spark core flink labels Oct 3, 2023

rdblue closed this Oct 28, 2023
