diff --git a/format/spec.md b/format/spec.md index 32f86428f1e1..a3e12d482f87 100644 --- a/format/spec.md +++ b/format/spec.md @@ -1166,8 +1166,8 @@ Each partition field in the fields list is stored as an object. See the table fo | **`month`** | `JSON string: "month"` | `"month"` | | **`day`** | `JSON string: "day"` | `"day"` | | **`hour`** | `JSON string: "hour"` | `"hour"` | -| **`Partition Field`** | `JSON object: {`
  `"source-id": ,`
  `"field-id": ,`
  `"name": ,`
  `"transform": `
`}` | `{`
  `"source-id": 1,`
  `"field-id": 1000,`
  `"name": "id_bucket",`
  `"transform": "bucket[16]"`
`}` | -| **`Partition Field with multi-arg transform`** [2] | `JSON object: {`
  `"source-id": -1,`
  `"source-ids": ,`
  `"field-id": ,`
  `"name": ,`
  `"transform": `
`}` | `{`
  `"source-id": -1,`
  `"source-ids": [1,2],`
  `"field-id": 1000,`
  `"name": "id_type_bucket",`
  `"transform": "bucketV2[16]"`
`}` | +| **`Partition Field`** [2] | `JSON object: {`
  `"source-id": ,`
  `"field-id": ,`
  `"name": ,`
  `"transform": `
`}` | `{`
  `"source-id": 1,`
  `"field-id": 1000,`
  `"name": "id_bucket",`
  `"transform": "bucket[16]"`
`}` | +| **`Partition Field with multi-arg transform`** [3] | `JSON object: {`
  `"source-id": -1,`
  `"source-ids": ,`
  `"field-id": ,`
  `"name": ,`
  `"transform": `
`}` | `{`
  `"source-id": -1,`
  `"source-ids": [1,2],`
  `"field-id": 1000,`
  `"name": "id_type_bucket",`
  `"transform": "bucketV2[16]"`
`}` | In some cases partition specs are stored using only the field list instead of the object format that includes the spec ID, like the deprecated `partition-spec` field in table metadata. The object format should be used unless otherwise noted in this spec. @@ -1177,8 +1177,8 @@ Notes: 1. For multi-arg bucket, the serialized form is `bucketV2[N]` instead of `bucket[N]` to distinguish it from the single-arg bucket transform. Therefore, old readers/writers will identify this transform as an unknown transform, old writer will stop writing the table if it encounters this transform, but old readers would still be able to read the table by scanning all the partitions. This makes adding multi-arg transform a forward-compatible change, but not a backward-compatible change. -2. For partition fields with a transform of multiple arguments, the ids of the source fields are set on `source-ids`. To preserve backward compatibility, `source-id` is set to -1. -3. For partition field with a transform with a single argument, the id of the source field is set on `source-id`, and `source-ids` is omitted. +2. For partition fields with a transform with a single argument, the id of the source field is set on `source-id`, and `source-ids` is omitted. +3. For partition fields with a transform of multiple arguments, the ids of the source fields are set on `source-ids`. To preserve backward compatibility, `source-id` is set to -1. ### Sort Orders @@ -1191,15 +1191,14 @@ Sort orders are serialized as a list of JSON object, each of which contains the Each sort field in the fields list is stored as an object with the following properties: -|Field|JSON representation|Example| -|--- |--- |--- | -|**`Sort Field`**|`JSON object: {`
  `"transform": ,`
  `"source-id": ,`
  `"direction": ,`
  `"null-order": `
`}`|`{`
  ` "transform": "bucket[4]",`
  ` "source-id": 3,`
  ` "direction": "desc",`
  ` "null-order": "nulls-last"`
`}`| +| Field | JSON representation | Example | +|-----------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| **`Sort Field`** [1] | `JSON object: {`
  `"transform": ,`
  `"source-id": ,`
  `"direction": ,`
  `"null-order": `
`}` | `{`
  ` "transform": "bucket[4]",`
  ` "source-id": 3,`
  ` "direction": "desc",`
  ` "null-order": "nulls-last"`
`}` | +| **`Sort Field with multi-arg transform`** [2] | `JSON object: {`
  `"transform": ,`
  `"source-id": -1,`
  `"source-ids": ,`
  `"direction": ,`
  `"null-order": `
`}` | `{`
  ` "transform": "bucketV2[4]",`
  ` "source-id": -1,`
  ` "source-ids": [1,2],`
  ` "direction": "desc",`
  ` "null-order": "nulls-last"`
`}` | -Similar with partition fields, sort fields could also contain multi source-ids for sorting: - -| Field | JSON representation | Example | -|---------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| **`Sort Field(multi-arg transform)`** | `JSON object: {`
  `"transform": ,`
  `"source-id": -1,`
  `"source-ids": ,`
  `"direction": ,`
  `"null-order": `
`}` | `{`
  ` "transform": "bucketV2[4]",`
  ` "source-id": -1,`
  ` "source-id": [1,2],`
  ` "direction": "desc",`
  ` "null-order": "nulls-last"`
`}` | +Notes: +1. For sort fields with a transform with a single argument, the id of the source field is set on `source-id`, and `source-ids` is omitted. +2. For sort fields with a transform of multiple arguments, the ids of the source fields are set on `source-ids`. To preserve backward compatibility, `source-id` is set to -1. The following table describes the possible values for the some of the field within sort field: