From 3616d5089434733404d0d573b59a532b6199ac40 Mon Sep 17 00:00:00 2001
From: Manfred Moser
Date: Thu, 12 Dec 2024 14:17:09 -0800
Subject: [PATCH] Add docs for Python UDFs

---
 docs/src/main/sphinx/udf.md                 |   1 +
 docs/src/main/sphinx/udf/function.md        |   7 +-
 docs/src/main/sphinx/udf/introduction.md    |   5 +-
 docs/src/main/sphinx/udf/python.md          | 185 ++++++++++++++++++++
 docs/src/main/sphinx/udf/python/examples.md |  49 ++++++
 5 files changed, 244 insertions(+), 3 deletions(-)
 create mode 100644 docs/src/main/sphinx/udf/python.md
 create mode 100644 docs/src/main/sphinx/udf/python/examples.md

diff --git a/docs/src/main/sphinx/udf.md b/docs/src/main/sphinx/udf.md
index ac715cc029a9..4e8c54a61420 100644
--- a/docs/src/main/sphinx/udf.md
+++ b/docs/src/main/sphinx/udf.md
@@ -13,4 +13,5 @@ More details are available in the following sections:
 udf/introduction
 udf/function
 udf/sql
+udf/python
 ```
diff --git a/docs/src/main/sphinx/udf/function.md b/docs/src/main/sphinx/udf/function.md
index a8d5b6425dc9..049cb5cfeafb 100644
--- a/docs/src/main/sphinx/udf/function.md
+++ b/docs/src/main/sphinx/udf/function.md
@@ -11,6 +11,7 @@ FUNCTION name ( [ parameter_name data_type [, ...] ] )
   [ CALLED ON NULL INPUT ]
   [ SECURITY { DEFINER | INVOKER } ]
   [ COMMENT description]
+  [ WITH properties AS block]
   statements
 ```
 
@@ -31,7 +32,7 @@ The `type` value after the `RETURNS` keyword identifies the
 [data type](/language/types) of the UDF output.
 
 The optional `LANGUAGE` characteristic identifies the language used for the UDF
-definition with `language`. Only `SQL` is supported.
+definition with `language`. Only `SQL` and `PYTHON` are supported.
 
 The optional `DETERMINISTIC` or `NOT DETERMINISTIC` characteristic declares that
 the UDF is deterministic. This means that repeated UDF calls with identical
@@ -58,6 +59,10 @@ The `COMMENT` characteristic can be used to provide information about the
 function to other users as `description`. The information is accessible with
 [](/sql/show-functions).
 
+Use the `WITH properties AS block` section to define the `handler` property for
+a [](/udf/python) to specify the Python method to invoke as the function. The
+Python code in `block` is encapsulated within `$$`.
+
 The body of the UDF can either be a simple single `RETURN` statement with an
 expression, or compound list of `statements` in a `BEGIN` block. UDF must
 contain a `RETURN` statement at the end of the top-level block, even if it's
diff --git a/docs/src/main/sphinx/udf/introduction.md b/docs/src/main/sphinx/udf/introduction.md
index 4056b707088a..6194776139e3 100644
--- a/docs/src/main/sphinx/udf/introduction.md
+++ b/docs/src/main/sphinx/udf/introduction.md
@@ -4,7 +4,8 @@ A user-defined function (UDF) is a custom function authored by a user of Trino
 in a client application. UDFs are scalar functions that return a single output
 value, similar to [built-in functions](/functions).
 
-UDFs are defined and written using the [SQL routine language](/udf/sql).
+UDFs are defined and written using the [SQL routine language](/udf/sql) or
+[Python](/udf/python).
 
 :::{note}
 User-defined functions can alternatively be written in Java and deployed as a
@@ -15,7 +16,7 @@ plugin. Details are available in the [developer guide](/develop/functions).
 ## UDF declaration
 
 Declare the UDF with a [](/udf/function) keyword using the supported statements
-for [](/udf/sql).
+for [](/udf/sql) or [](/udf/python).
 
 A UDF can be declared and used as an [inline UDF](udf-inline) or declared as a
 [catalog UDF](udf-catalog) and used repeatedly.
diff --git a/docs/src/main/sphinx/udf/python.md b/docs/src/main/sphinx/udf/python.md
new file mode 100644
index 000000000000..960bb2e8e12e
--- /dev/null
+++ b/docs/src/main/sphinx/udf/python.md
@@ -0,0 +1,185 @@
+# Python user-defined functions
+
+A Python user-defined function is a [user-defined function](/udf) that uses the
+[Python programming language and statements](python-udf-lang) for the definition
+of the function.
+
+:::{warning}
+Python user-defined functions are an experimental feature.
+:::
+
+## Python UDF declaration
+
+Declare a Python UDF as [inline](udf-inline) or [catalog UDF](udf-catalog) with
+the following steps:
+
+* Use the [](/udf/function) keyword to declare the UDF name and parameters.
+* Add the `RETURNS` declaration to specify the data type of the result.
+* Set the `LANGUAGE` to `PYTHON`.
+* Declare the name of the Python function to call with the `handler` property in
+  the `WITH` block.
+* Use `$$` to enclose the Python code after the `AS` keyword.
+* Add the function from the handler property and ensure it returns the declared
+  data type.
+* Expand your Python code section to implement the function using the available
+  [Python language](python-udf-lang).
+
+The following snippet shows pseudo-code:
+
+```text
+  FUNCTION python_udf_name(input_parameter data_type)
+    RETURNS result_data_type
+    LANGUAGE PYTHON
+    WITH (handler = 'python_function')
+    AS
+    $$
+    ...
+    def python_function(input):
+        return ...
+    ...
+    $$
+```
+
+A minimal example declares the UDF `doubleup` that returns the input integer
+value `x` multiplied by two. The example shows declaration as [](udf-inline) and
+invocation with the value `21` to yield the result `42`.
+
+Set the language to `PYTHON` to override the default `SQL` for [](/udf/sql).
+The Python code is enclosed with `$$` and must use valid formatting.
+
+```text
+WITH
+  FUNCTION doubleup(x integer)
+    RETURNS integer
+    LANGUAGE PYTHON
+    WITH (handler = 'malzwei')
+    AS
+    $$
+    def malzwei(a):
+        return a * 2
+    $$
+SELECT doubleup(21);
+-- 42
+```
+
+The same UDF can also be declared as [](udf-catalog).
+
+Refer to the [](/udf/python/examples) for more complex use cases and examples.
+
+```{toctree}
+:titlesonly: true
+:hidden:
+
+/udf/python/examples
+```
+
+(python-udf-lang)=
+## Python language details
+
+The Trino Python UDF integration uses Python 3.13.0 in a sandboxed environment.
+Python code runs within a WebAssembly (WASM) runtime within the Java virtual
+machine running Trino.
+
+Python language rules including indents must be observed.
+
+Python UDFs therefore only have access to the Python language and core libraries
+included in the sandboxed runtime. Access to external resources with network or
+file system operations is not supported. Usage of other Python libraries as well
+as command line tools or package managers is not supported.
+
+The following libraries are explicitly removed from the runtime and therefore
+not available within a Python UDF:
+
+* `_*_support*`
+* `_pyrepl`
+* `bdb`
+* `concurrent`
+* `curses`
+* `ensurepip`
+* `doctest*`
+* `idlelib`
+* `multiprocessing`
+* `pdb`
+* `pydoc*`
+* `socketserver*`
+* `sqlite3`
+* `ssl*`
+* `subprocess*`
+* `tkinter`
+* `turtle*`
+* `unittest`
+* `venv`
+* `webbrowser*`
+* `wsgiref`
+* `xmlrpc`
+
+## Type mapping
+
+The following table shows supported Trino types and their corresponding Python
+types for input and output values of a Python UDF:
+
+:::{list-table} Supported Trino and Python types
+:widths: 50, 50
+:header-rows: 1
+
+* - Trino type
+  - Python type
+* - row
+  - tuple
+* - array
+  - list
+* - map
+  - dict
+* - boolean
+  - bool
+* - tinyint
+  - int
+* - smallint
+  - int
+* - integer
+  - int
+* - bigint
+  - int
+* - real
+  - float
+* - double
+  - float
+* - decimal
+  - decimal.Decimal
+* - varchar
+  - str
+* - varbinary
+  - bytes
+* - date
+  - datetime.date
+* - time
+  - datetime.time
+* - time with time zone
+  - datetime.time with datetime.tzinfo
+* - timestamp
+  - datetime.datetime
+* - timestamp with time zone
+  - datetime.datetime with datetime.tzinfo
+* - interval year to month
+  - int as the number of months
+* - interval day to second
+  - datetime.timedelta
+* - json
+  - str
+* - uuid
+  - uuid.UUID
+* - ipaddress
+  - ipaddress.IPv4Address or ipaddress.IPv6Address
+
+:::
+
+### Date and time
+
+Python datetime objects only support microsecond precision. Trino argument
+values with greater precision are rounded when converted to Python values, and
+Python return values are rounded if the Trino return type has less than
+microsecond precision.
+
+Only fixed offset time zones are supported. Timestamps with political time zones
+have the zone converted to the zone's offset for the timestamp's instant.
+
diff --git a/docs/src/main/sphinx/udf/python/examples.md b/docs/src/main/sphinx/udf/python/examples.md
new file mode 100644
index 000000000000..e9e5a2451238
--- /dev/null
+++ b/docs/src/main/sphinx/udf/python/examples.md
@@ -0,0 +1,49 @@
+# Example Python UDFs
+
+After learning about [](/udf/python), the following sections show numerous examples
+of valid Python UDFs. The UDFs are suitable as [](udf-inline) or [](udf-catalog),
+after adjusting the name and the example invocations.
+
+
+```text
+WITH FUNCTION xor(a boolean, b boolean)
+RETURNS boolean
+LANGUAGE PYTHON
+WITH (handler = 'bool_xor')
+AS $$
+import operator
+def bool_xor(a, b):
+    return operator.xor(a, b)
+$$
+SELECT xor(true, false), xor(false, true);
+```
+
+
+```text
+WITH FUNCTION reverse_words(s varchar)
+RETURNS varchar
+LANGUAGE PYTHON
+WITH (handler = 'reverse_words')
+AS $$
+import re
+
+def reverse(s):
+    str = ""
+    for i in s:
+        str = i + str
+    return str
+
+pattern = re.compile(r"\w+[.,'!?\"]\w*")
+
+def process_word(word):
+    # Reverse only words without non-letter signs
+    return word if pattern.match(word) else reverse(word)
+
+def reverse_words(payload):
+    text_words = payload.split(' ')
+    return ' '.join([process_word(w) for w in text_words])
+$$
+SELECT comment, reverse_words(comment)
+FROM nation
+WHERE nationkey IN (5, 6, 12)
+```
\ No newline at end of file
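
The handler bodies in the examples above are plain Python, so they can be sanity-checked in a local interpreter before being placed between the `$$` markers. The following is a minimal sketch under that assumption: it runs on a standard CPython interpreter rather than the sandboxed runtime the docs describe, the sample sentence is made up for illustration, and slicing stands in for the character loop in `reverse`.

```python
import operator
import re

# Handler from the xor example. Per the type mapping table,
# Trino boolean arguments arrive as Python bool values.
def bool_xor(a, b):
    return operator.xor(a, b)

# Pattern from the reverse_words example: a word that contains
# one of the listed punctuation characters is left unchanged.
pattern = re.compile(r"\w+[.,'!?\"]\w*")

def reverse(s):
    # Assumption for brevity: slicing replaces the loop in the
    # example and produces the same reversed string.
    return s[::-1]

def process_word(word):
    # Reverse only words without punctuation.
    return word if pattern.match(word) else reverse(word)

def reverse_words(payload):
    return ' '.join(process_word(w) for w in payload.split(' '))

if __name__ == "__main__":
    # Hypothetical inputs, not taken from the nation table.
    print(bool_xor(True, False), bool_xor(False, True))
    print(reverse_words("hello, big world"))
```

Run locally, the script prints `True True` and then `hello, gib dlrow`. A check like this only covers the Python logic; type conversion and the removed libraries still need to be verified inside Trino.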