Docs: Update spark partition transform as per spec. #8640

Merged 1 commit on Sep 25, 2023
docs/spark-ddl.md: 18 changes (10 additions, 8 deletions)
@@ -79,15 +79,17 @@ PARTITIONED BY (bucket(16, id), days(ts), category)

Supported transformations are:

- * `years(ts)`: partition by year
- * `months(ts)`: partition by month
- * `days(ts)` or `date(ts)`: equivalent to dateint partitioning
- * `hours(ts)` or `date_hour(ts)`: equivalent to dateint and hour partitioning
+ * `year(ts)`: partition by year
+ * `month(ts)`: partition by month
+ * `day(ts)` or `date(ts)`: equivalent to dateint partitioning
+ * `hour(ts)` or `date_hour(ts)`: equivalent to dateint and hour partitioning
* `bucket(N, col)`: partition by hashed value mod N buckets
* `truncate(L, col)`: partition by value truncated to L
* Strings are truncated to the given length
* Integers and longs truncate to bins: `truncate(10, i)` produces partitions 0, 10, 20, 30, ...

+ Note: The old syntax of `years(ts)`, `months(ts)`, `days(ts)` and `hours(ts)` is also supported for compatibility.
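
For example, using the updated transform names, the page's running `prod.db.sample` table might be created as follows (a minimal sketch; the column types are assumed):

```sql
CREATE TABLE prod.db.sample (
    id bigint,
    data string,
    category string,
    ts timestamp)
USING iceberg
PARTITIONED BY (bucket(16, id), day(ts), category)
```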

## `CREATE TABLE ... AS SELECT`

Iceberg supports CTAS as an atomic operation when using a [`SparkCatalog`](../spark-configuration#catalog-configuration). When using [`SparkSessionCatalog`](../spark-configuration#replacing-the-session-catalog), CTAS is supported but is not atomic.
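
A minimal CTAS sketch (the source table name here is illustrative):

```sql
CREATE TABLE prod.db.sample
USING iceberg
AS SELECT * FROM prod.db.source_table
```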
@@ -348,7 +350,7 @@ ALTER TABLE prod.db.sample ADD PARTITION FIELD catalog -- identity transform
```sql
ALTER TABLE prod.db.sample ADD PARTITION FIELD bucket(16, id)
ALTER TABLE prod.db.sample ADD PARTITION FIELD truncate(4, data)
- ALTER TABLE prod.db.sample ADD PARTITION FIELD years(ts)
+ ALTER TABLE prod.db.sample ADD PARTITION FIELD year(ts)
-- use optional AS keyword to specify a custom name for the partition field
ALTER TABLE prod.db.sample ADD PARTITION FIELD bucket(16, id) AS shard
```
@@ -374,7 +376,7 @@ Partition fields can be removed using `DROP PARTITION FIELD`:
ALTER TABLE prod.db.sample DROP PARTITION FIELD catalog
ALTER TABLE prod.db.sample DROP PARTITION FIELD bucket(16, id)
ALTER TABLE prod.db.sample DROP PARTITION FIELD truncate(4, data)
- ALTER TABLE prod.db.sample DROP PARTITION FIELD years(ts)
+ ALTER TABLE prod.db.sample DROP PARTITION FIELD year(ts)
ALTER TABLE prod.db.sample DROP PARTITION FIELD shard
```

@@ -396,9 +398,9 @@ Be careful when dropping a partition field because it will change the schema of
A partition field can be replaced by a new partition field in a single metadata update by using `REPLACE PARTITION FIELD`:

```sql
- ALTER TABLE prod.db.sample REPLACE PARTITION FIELD ts_day WITH days(ts)
+ ALTER TABLE prod.db.sample REPLACE PARTITION FIELD ts_day WITH day(ts)
-- use optional AS keyword to specify a custom name for the new partition field
- ALTER TABLE prod.db.sample REPLACE PARTITION FIELD ts_day WITH days(ts) AS day_of_ts
+ ALTER TABLE prod.db.sample REPLACE PARTITION FIELD ts_day WITH day(ts) AS day_of_ts
```

### `ALTER TABLE ... WRITE ORDERED BY`