Commit

Improve docs for add file procedures in Iceberg
Reword some aspects to make it more precise and ensure
that it is clear that the catalog and system schema must
be specified.
mosabua committed Oct 8, 2024
1 parent 9f267c1 commit ae96ba1
Showing 1 changed file with 27 additions and 9 deletions.
36 changes: 27 additions & 9 deletions docs/src/main/sphinx/connector/iceberg.md
@@ -568,11 +568,13 @@ nested directories, or `false` to ignore them.
(iceberg-add-files)=
#### Add files

The connector can add files from tables or locations if
The connector can add files from tables or locations to an existing table if
`iceberg.add_files-procedure.enabled` is set to `true` for the catalog.

Use the procedure `system.add_files_from_table` to add existing files from the Hive
table or `system.add_files` to add existing files from specified locations.
Use the procedure `system.add_files_from_table` to add existing files from a
Hive table or `system.add_files` to add existing files from a specified location
to an existing table.

The data files must be the Parquet, ORC, or Avro file format.

:::{warning}
@@ -584,17 +586,31 @@ relevant schema and table names supplied with the required parameters
`schema_name` and `table_name`:

```sql
ALTER TABLE testdb.iceberg_customer_orders EXECUTE add_files_from_table(
ALTER TABLE testdb.iceberg_customer_orders
EXECUTE example.system.add_files_from_table(
    schema_name => 'testdb',
    table_name => 'hive_customer_orders')
```

You need to provide a `partition_filter` argument to add files from specified partitions.
Alternatively, you can set the current catalog and schema with a `USE`
statement, and omit catalog and schema information, including the `system`
schema for the procedure from any following `ALTER TABLE` statements:

```sql
USE example.testdb;
ALTER TABLE iceberg_customer_orders
EXECUTE add_files_from_table(
    schema_name => 'testdb',
    table_name => 'hive_customer_orders')
```

Use a `partition_filter` argument to add files from specified partitions.
The following example adds files from a partition where the `region` is `ASIA` and
`country` is `JAPAN`:

```sql
ALTER TABLE testdb.iceberg_customer_orders EXECUTE add_files_from_table(
ALTER TABLE testdb.iceberg_customer_orders
EXECUTE example.system.add_files_from_table(
    schema_name => 'testdb',
    table_name => 'hive_customer_orders',
    partition_filter => map(ARRAY['region', 'country'], ARRAY['ASIA', 'JAPAN']))
@@ -604,7 +620,8 @@ In addition, you can provide a `recursive_directory` argument to migrate a
Hive table that contains subdirectories:

```sql
ALTER TABLE testdb.iceberg_customer_orders EXECUTE add_files_from_table(
ALTER TABLE testdb.iceberg_customer_orders
EXECUTE example.system.add_files_from_table(
    schema_name => 'testdb',
    table_name => 'hive_customer_orders',
    recursive_directory => 'true')
@@ -614,12 +631,13 @@ The default value of `recursive_directory` is `fail`, which causes the procedure
to throw an exception if subdirectories are found. Set the value to `true` to add
files from nested directories, or `false` to ignore them.

`add_files` procedure supports adding files from a specified location.
The `add_files` procedure supports adding files from a specified location.
The procedure does not validate file schemas for compatibility with
the target Iceberg table. The `location` property is supported for partitioned tables.

```sql
ALTER TABLE testdb.iceberg_customer_orders EXECUTE add_files(
ALTER TABLE testdb.iceberg_customer_orders
EXECUTE example.system.add_files(
    location => 's3://my-bucket/a/path',
    format => 'ORC')
```
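
The `iceberg.add_files-procedure.enabled` requirement stated at the start of this section could be met with a catalog properties file along the following lines. This is a minimal sketch, assuming a catalog named `example` (as in the SQL statements above) that is configured in `etc/catalog/example.properties`; the file name is an assumption, and other settings the catalog needs, such as the metastore configuration, are omitted:

```properties
# etc/catalog/example.properties (assumed file name, matching the `example` catalog)
connector.name=iceberg
# Allow the add_files and add_files_from_table procedures for this catalog
iceberg.add_files-procedure.enabled=true
```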