Commit

Add docs for Python UDFs
mosabua committed Dec 13, 2024
1 parent 9bf8355 commit 3616d50
Showing 5 changed files with 244 additions and 3 deletions.
1 change: 1 addition & 0 deletions docs/src/main/sphinx/udf.md
@@ -13,4 +13,5 @@ More details are available in the following sections:
udf/introduction
udf/function
udf/sql
+udf/python
```
7 changes: 6 additions & 1 deletion docs/src/main/sphinx/udf/function.md
@@ -11,6 +11,7 @@ FUNCTION name ( [ parameter_name data_type [, ...] ] )
[ CALLED ON NULL INPUT ]
[ SECURITY { DEFINER | INVOKER } ]
[ COMMENT description]
+[ WITH properties AS block]
statements
```

@@ -31,7 +32,7 @@ The `type` value after the `RETURNS` keyword identifies the [data
type](/language/types) of the UDF output.

The optional `LANGUAGE` characteristic identifies the language used for the UDF
-definition with `language`. Only `SQL` is supported.
+definition with `language`. Only `SQL` or `PYTHON` are supported.

The optional `DETERMINISTIC` or `NOT DETERMINISTIC` characteristic declares that
the UDF is deterministic. This means that repeated UDF calls with identical
@@ -58,6 +59,10 @@ The `COMMENT` characteristic can be used to provide information about the
function to other users as `description`. The information is accessible with
[](/sql/show-functions).

+Use the `WITH properties AS block` section to define the `handler` property for
+[](/udf/python) to specify the Python function to invoke. The Python code in
+`block` is enclosed in `$$`.
+
The body of the UDF can either be a simple single `RETURN` statement with an
expression, or a compound list of `statements` in a `BEGIN` block. A UDF must
contain a `RETURN` statement at the end of the top-level block, even if it's
5 changes: 3 additions & 2 deletions docs/src/main/sphinx/udf/introduction.md
@@ -4,7 +4,8 @@ A user-defined function (UDF) is a custom function authored by a user of Trino
in a client application. UDFs are scalar functions that return a single output
value, similar to [built-in functions](/functions).

-UDFs are defined and written using the [SQL routine language](/udf/sql).
+UDFs are defined and written using the [SQL routine language](/udf/sql) or
+[Python](/udf/python).

:::{note}
User-defined functions can alternatively be written in Java and deployed as a
@@ -15,7 +16,7 @@ plugin. Details are available in the [developer guide](/develop/functions).
## UDF declaration

Declare the UDF with a [](/udf/function) keyword using the supported statements
-for [](/udf/sql).
+for [](/udf/sql) or [](/udf/python).

A UDF can be declared and used as an [inline UDF](udf-inline) or declared as a
[catalog UDF](udf-catalog) and used repeatedly.
185 changes: 185 additions & 0 deletions docs/src/main/sphinx/udf/python.md
@@ -0,0 +1,185 @@
# Python user-defined functions

A Python user-defined function is a [user-defined function](/udf) that uses the
[Python programming language and statements](python-udf-lang) for the definition
of the function.

:::{warning}
Python user-defined functions are an experimental feature.
:::

## Python UDF declaration

Declare a Python UDF as [inline](udf-inline) or [catalog UDF](udf-catalog) with
the following steps:

* Use the [](/udf/function) keyword to declare the UDF name and parameters.
* Add the `RETURNS` declaration to specify the data type of the result.
* Set the `LANGUAGE` to `PYTHON`.
* Declare the name of the Python function to call with the `handler` property in
the `WITH` block.
* Use `$$` to enclose the Python code after the `AS` keyword.
* Define the Python function named in the `handler` property and ensure it
returns a value of the declared data type.
* Expand your Python code section to implement the function using the available
[Python language](python-udf-lang).

The following snippet shows pseudo-code:

```text
FUNCTION python_udf_name(input_parameter data_type)
RETURNS result_data_type
LANGUAGE PYTHON
WITH (handler = 'python_function')
AS
$$
...
def python_function(input):
    return ...
...
$$
```
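The `...` placeholders stand for any other Python code in the block. Assuming, as the more complex examples suggest, that the block can hold imports, module-level constants, and helper functions next to the handler, the body between the `$$` markers for a `varchar` input could look like the following sketch. The names `WHITESPACE` and `normalize` are hypothetical:

```python
import re

# Hypothetical module-level constant shared by the functions in the block.
WHITESPACE = re.compile(r"\s+")


def normalize(text):
    # Hypothetical helper; only the handler is named in the UDF declaration.
    return WHITESPACE.sub(" ", text).strip()


def python_function(value):
    # Handler named by WITH (handler = 'python_function'); for a varchar
    # parameter, Trino passes a Python str.
    return normalize(value).lower()


print(python_function("  Hello   Trino  "))  # hello trino
```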

A minimal example declares the UDF `doubleup` that returns the input integer
value `x` multiplied by two. The example shows declaration as [](udf-inline) and
invocation with the value `21` to yield the result `42`.

Set the language to `PYTHON` to override the default `SQL` for [](/udf/sql).
The Python code is enclosed in `$$` and must be valid Python, including
indentation.

```text
WITH
FUNCTION doubleup(x integer)
RETURNS integer
LANGUAGE PYTHON
WITH (handler = 'malzwei')
AS
$$
def malzwei(a):
    return a * 2
$$
SELECT doubleup(21);
-- 42
```
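Because the handler is plain Python, its logic can be checked in a local Python 3 interpreter before it is embedded between the `$$` markers. This sketch calls the handler directly; it does not exercise Trino's type conversion or the sandboxed runtime:

```python
def malzwei(a):
    # Trino passes the integer argument as a Python int.
    return a * 2


# Direct call, mirroring SELECT doubleup(21)
print(malzwei(21))  # 42
```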

The same UDF can also be declared as [](udf-catalog).

Refer to the [](/udf/python/examples) for more complex use cases and examples.

```{toctree}
:titlesonly: true
:hidden:
/udf/python/examples
```

(python-udf-lang)=
## Python language details

The Trino Python UDF integration uses Python 3.13.0 in a sandboxed environment.
Python code runs within a WebAssembly (WASM) runtime within the Java virtual
machine running Trino.

Python language rules, including indentation, must be observed.

Because the code runs in this sandbox, Python UDFs only have access to the Python
language and the core libraries included in the sandboxed runtime. Access to
external resources with network or file system operations is not supported.
Other Python libraries, command line tools, and package managers are not
supported.
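As a sketch of what stays available, the following handler uses only the standard library `json` module, which does not appear in the list of removed libraries. The function name `json_get` is hypothetical; the sketch assumes a UDF declared with `json` and `varchar` parameters and a `varchar` return type, and it assumes that returning `None` produces SQL `NULL`:

```python
import json


def json_get(document, key):
    # A Trino json value arrives as a Python str (see the type mapping).
    data = json.loads(document)
    if not isinstance(data, dict) or key not in data:
        # Assumption: None is returned to Trino as SQL NULL.
        return None
    value = data[key]
    # Return strings unchanged and serialize anything else back to JSON text.
    return value if isinstance(value, str) else json.dumps(value)


print(json_get('{"name": "trino", "tags": ["sql", "python"]}', "tags"))
```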

The following libraries are explicitly removed from the runtime and are
therefore not available within a Python UDF:

* `_*_support*`
* `_pyrepl`
* `bdb`
* `concurrent`
* `curses`
* `ensurepip`
* `doctest*`
* `idlelib`
* `multiprocessing`
* `pdb`
* `pydoc*`
* `socketserver*`
* `sqlite3`
* `ssl*`
* `subprocess*`
* `tkinter`
* `turtle*`
* `unittest`
* `venv`
* `webbrowser*`
* `wsgiref`
* `xmlrpc`
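Several entries contain `*`. Assuming these are shell-style wildcards matched against top-level module names, a local check like the following sketch shows whether a module you plan to import is affected. The helper `is_removed` is hypothetical and not part of Trino:

```python
from fnmatch import fnmatchcase

# Entries copied from the list above; reading "*" as a wildcard is an assumption.
REMOVED_PATTERNS = [
    "_*_support*", "_pyrepl", "bdb", "concurrent", "curses", "ensurepip",
    "doctest*", "idlelib", "multiprocessing", "pdb", "pydoc*",
    "socketserver*", "sqlite3", "ssl*", "subprocess*", "tkinter", "turtle*",
    "unittest", "venv", "webbrowser*", "wsgiref", "xmlrpc",
]


def is_removed(module_name):
    # Compare only the top-level package, for example "xmlrpc" for "xmlrpc.client".
    top_level = module_name.split(".")[0]
    return any(fnmatchcase(top_level, pattern) for pattern in REMOVED_PATTERNS)


for name in ["json", "re", "xmlrpc.client", "pydoc_data", "_osx_support"]:
    print(name, is_removed(name))
```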

## Type mapping

The following table shows supported Trino types and their corresponding Python
types for input and output values of a Python UDF:

:::{list-table} Python UDF type mapping
:widths: 50, 50
:header-rows: 1

* - Trino type
  - Python type
* - row
  - tuple
* - array
  - list
* - map
  - dict
* - boolean
  - bool
* - tinyint
  - int
* - smallint
  - int
* - integer
  - int
* - bigint
  - int
* - real
  - float
* - double
  - float
* - decimal
  - decimal.Decimal
* - varchar
  - str
* - varbinary
  - bytes
* - date
  - datetime.date
* - time
  - datetime.time
* - time with time zone
  - datetime.time with datetime.tzinfo
* - timestamp
  - datetime.datetime
* - timestamp with time zone
  - datetime.datetime with datetime.tzinfo
* - interval year to month
  - int as the number of months
* - interval day to second
  - datetime.timedelta
* - json
  - str
* - uuid
  - uuid.UUID
* - ipaddress
  - ipaddress.IPv4Address or ipaddress.IPv6Address

:::
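As an illustration of the collection and decimal mappings, the following sketch defines two hypothetical handlers: `top_key` for a UDF with a `map(varchar, bigint)` parameter, and `line_total` for a UDF with a `row(price decimal(10, 2), quantity integer)` parameter. It assumes that returning `None` produces SQL `NULL`:

```python
from decimal import Decimal


def top_key(counts):
    # map(varchar, bigint) arrives as a dict of str keys to int values.
    if not counts:
        return None  # assumption: becomes SQL NULL
    # Key with the largest value; ties go to the first key Python iterates.
    return max(counts, key=counts.get)


def line_total(item):
    # row(...) arrives as a tuple; the decimal field as decimal.Decimal.
    price, quantity = item
    return price * quantity


print(top_key({"a": 3, "b": 7, "c": 5}))  # b
print(line_total((Decimal("19.99"), 3)))  # 59.97
```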

### Date and time

Python datetime objects only support microsecond precision. Trino argument
values with greater precision are rounded when converted to Python values, and
Python return values are rounded if the Trino return type has less than
microsecond precision.
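The Python side of this limit can be seen directly. The following sketch assumes a hypothetical handler `shift` for a UDF with `timestamp(6)` and `interval day to second` parameters. Microseconds are the smallest unit that `datetime.datetime` and `datetime.timedelta` store, and Python itself rounds sub-microsecond amounts passed to `timedelta`. This shows Python's behavior only, not the exact rounding Trino applies during conversion:

```python
import datetime


def shift(ts, delta):
    # Hypothetical handler: timestamp(6) arrives as datetime.datetime and
    # interval day to second as datetime.timedelta, both at microsecond resolution.
    return ts + delta


start = datetime.datetime(2024, 12, 13, 9, 30, 0, 123456)
print(shift(start, datetime.timedelta(days=1, microseconds=1)))
# 2024-12-14 09:30:00.123457

# Python rounds fractional microseconds; 0.4 microseconds becomes zero.
print(datetime.timedelta(microseconds=0.4) == datetime.timedelta(0))  # True
```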

Only fixed-offset time zones are supported. For timestamps with a political
time zone, such as `America/New_York`, the zone is converted to its offset at
the timestamp's instant.
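A sketch of what a handler then sees, assuming a hypothetical handler `utc_offset_minutes` and modeling the incoming fixed offset locally with `datetime.timezone`; the exact `tzinfo` subclass Trino passes is an assumption here. A winter timestamp in `America/New_York` would reach Python with only its offset, UTC-05:00, so daylight-saving rules are not available to the handler:

```python
import datetime


def utc_offset_minutes(ts):
    # Hypothetical handler for a timestamp with time zone parameter.
    # The tzinfo carries a fixed offset, not a named zone.
    return int(ts.utcoffset().total_seconds() // 60)


# Local stand-in for a value Trino might pass: UTC-05:00 as a fixed offset.
fixed_minus_five = datetime.timezone(datetime.timedelta(hours=-5))
ts = datetime.datetime(2024, 12, 13, 9, 30, tzinfo=fixed_minus_five)

print(utc_offset_minutes(ts))                # -300
print(ts.astimezone(datetime.timezone.utc))  # 2024-12-13 14:30:00+00:00
```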

49 changes: 49 additions & 0 deletions docs/src/main/sphinx/udf/python/examples.md
@@ -0,0 +1,49 @@
# Example Python UDFs

This page shows valid Python UDFs that build on [](/udf/python). The UDFs are
suitable as [](udf-inline) or [](udf-catalog), after adjusting the name and the
example invocations.

The following UDF returns the exclusive OR (XOR) of two boolean values using
the `operator` module from the Python standard library:

```text
WITH FUNCTION xor(a boolean, b boolean)
RETURNS boolean
LANGUAGE PYTHON
WITH (handler = 'bool_xor')
AS $$
import operator
def bool_xor(a, b):
    return operator.xor(a, b)
$$
SELECT xor(true, false), xor(false, true);
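```

The following UDF reverses the characters of each word in a string but leaves
words that are followed or joined by punctuation, such as `don't` or `end.`,
unchanged. The query applies it to the `comment` column of the `nation` table:
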
```text
WITH FUNCTION reverse_words(s varchar)
RETURNS varchar
LANGUAGE PYTHON
WITH (handler = 'reverse_words')
AS $$
import re

def reverse(s):
    # Reverse the characters of a single word
    return s[::-1]

pattern = re.compile(r"\w+[.,'!?\"]\w*")

def process_word(word):
    # Reverse only words without punctuation
    return word if pattern.match(word) else reverse(word)

def reverse_words(payload):
    text_words = payload.split(' ')
    return ' '.join([process_word(w) for w in text_words])
$$
SELECT comment, reverse_words(comment)
FROM nation
WHERE nationkey IN (5, 6, 12)
```
