User-Defined Functions #1510
I believe we are now at a point where we can begin the review process for user-defined functions. I have enumerated the files in the diff and grouped them into logical "sections" below (largely corresponding to the component of the system in which they reside). Under the assumption that this PR will have two reviewers, I divided the groups into two major categories: "frontend" and "backend". These identifiers refer to the layers of the system involved: the files in the "frontend" category implement functionality above the execution engine layer, while those in the "backend" category comprise the execution engine and below. I believe the work is split nearly evenly between the two. There are more files in the "frontend" section, but the changes there are more superficial and will not require the same level of scrutiny as those in the "backend".
Auxiliary + "Frontend"
Documentation
The documentation primarily consists of notes that I made for myself when I first started working on this PR. They cover things that helped me understand how our implementation works (e.g. how do we use lambdas to make queries within UDFs work?). Anyone can take a glance at these if they feel so inclined. If we decide that they are not generally useful to have around, I can remove them from the repository and just keep them for myself.
Tests
The important tests to look at are the TPL tests for TPL closures. I covered some basic test cases but didn't get too crazy with the TPL unit tests. I left the C++ unit tests entirely untouched from Tanuj's original pull request. The primary tests for UDF functionality are the JUnit integration tests. The functions are defined in
Network
Network changes are trivial, just adding support for new SQL statements (i.e.
Traffic Cop
Nothing major is changed in the traffic cop. I added support for
Parser
Most of the changes in the parser are trivial and not worth much time. I updated the Postgres parser to add support for Beyond that, the
Binder
We require changes to the binder because now, when we bind a query, we may be doing so in the context of a user-defined function (i.e. a SQL query embedded in a UDF, either directly or in the form of a query-fed for-loop). Therefore, we may encounter names during binding that refer to PL/pgSQL variables, and we need to be able to recognize and resolve these.
Planner
Changes to the planner are made to support
Optimizer
The changes to the optimizer look non-trivial because so many files are touched, but all I did here was update the necessary files to add the
Catalog
Changes to the catalog are relatively minor. I just cleaned up the API related to creation, manipulation (querying), and dropping of procedures.
"Backend"
Execution Engine (SQL)
The DDL executors "tie together" all of the functionality of UDFs.
Execution Engine (Parser)
Updates to the execution engine parser are made to add support for TPL closures, which manifest as lambda expressions. Updates in the parser are minor and should be unsurprising for anyone familiar with parsers.
Execution Engine (AST + Semantic Analysis)
Most updates to the execution engine AST and semantic analysis components are made to add support for TPL closures. The largest part of the diff in this section, however, comes from new files
Execution Engine (Compiler)
Naturally, updates to the compiler constitute the largest part of this pull request. I will call out two specific places to look in this section. First, and most importantly, all of the code generation we do for UDFs is implemented in Second, I had to update some fundamental aspects of code generation at the intersection of operator translators and pipelines. To make a long story short, because we now (sometimes) execute SQL queries embedded in the context of a UDF, I had to make the function signatures for some of the top-level pipeline functions more flexible. For instance, embedded queries must have access to the TPL closure that implements their output callback in the top-level output translator. For this reason, the pipeline Run function must accept an additional parameter: the output callback. I am not particularly thrilled about how much complexity this adds to the code generation infrastructure, and there might be a larger discussion to have here about how to accomplish this in a more principled way. However, for now, this implementation works and does not affect queries that do not make use of output callbacks.
Execution Engine (VM)
Despite the number of files touched in this section, changes to the VM are relatively minor. We add some new bytecode operations in order to "inject" parameter values from a PL/pgSQL function into an embedded SQL query. The bytecodes themselves are simple. The only updates made at the LLVM-level (i.e.
This PR adds support for user-defined functions.
Background
For his master's thesis work, Tanuj (@tanujnay112) implemented support for user-defined functions in NoisePage on a branch in his fork of the project, the most recent version of which is here. However, because his research was primarily focused on evaluating different UDF performance enhancements (see the Froid paper for an introduction to UDF inlining), he also implemented some degree of support for other big-name features, namely common table expressions and lateral joins. These two features are now largely being handled by a separate PR, so to avoid overlap, reduce the blast radius of the PRs, and (hopefully) integrate these features in a more timely manner, we are splitting the functionality implemented by Tanuj (and others) into distinct PRs.
Therefore, this PR is concerned with cherry-picking the UDF-relevant components from the existing fork and preparing them to be integrated into master in a clean, controlled manner.
Starting Point
The basic statistics for the original PR for user-defined functions and common table expressions are as follows:
Excluding non-source files (e.g. Java, Python, etc.) we have:
(.cpp) Files
(.hpp) Files
(.tpl) Test Files
Clearly, we need a way to more accurately assess the scale of the PR as it relates to user-defined functions alone, as well as a way to track current progress, given the large number of system components that will be affected.
Current Status
The enumeration below lists all of the files in the original PR that pertain to UDF support. While the primary goal is simply to integrate these into the current master branch, I reserve the right to perform any refactoring I see fit while doing so.
Binder (5/5)
src/binder/bind_node_visitor.cpp
src/include/binder/bind_node_visitor.h
src/binder/binder_context.cpp
src/include/binder/binder_context.h
src/include/binder/binder_sherpa.h
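The binder change described in this PR (recognizing names that refer to PL/pgSQL variables while binding a query embedded in a UDF) can be illustrated with a minimal sketch. Everything here is hypothetical and for illustration only: `NameKind`, `Resolution`, `UdfVariableScope`, and `ResolveName` are not names from the NoisePage sources, and the precedence shown (table columns first, UDF variables second) is an assumption rather than a statement of what the binder actually does.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins; none of these names come from the NoisePage sources.
enum class NameKind { kColumn, kUdfVariable };

struct Resolution {
  NameKind kind;
  std::string type_name;
};

// Variables declared in the enclosing PL/pgSQL function body.
class UdfVariableScope {
 public:
  void Declare(std::string name, std::string type_name) {
    vars_[std::move(name)] = std::move(type_name);
  }

  std::optional<std::string> Lookup(const std::string &name) const {
    auto it = vars_.find(name);
    if (it == vars_.end()) return std::nullopt;
    return it->second;
  }

 private:
  std::unordered_map<std::string, std::string> vars_;
};

// Resolve a name seen while binding a query. Table columns are tried first
// (an assumed precedence); if that fails and the query is being bound inside
// a UDF (udf_scope != nullptr), the UDF's variables are tried next.
std::optional<Resolution> ResolveName(
    const std::string &name,
    const std::unordered_map<std::string, std::string> &table_columns,
    const UdfVariableScope *udf_scope) {
  if (auto it = table_columns.find(name); it != table_columns.end()) {
    return Resolution{NameKind::kColumn, it->second};
  }
  if (udf_scope != nullptr) {
    if (auto type_name = udf_scope->Lookup(name)) {
      return Resolution{NameKind::kUdfVariable, *type_name};
    }
  }
  return std::nullopt;  // Unknown name: the caller would report a binding error.
}
```

In this sketch, passing `nullptr` for `udf_scope` models binding an ordinary top-level query, so a name that matches only a UDF variable fails to resolve outside of a UDF.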
Catalog (1/1)
src/catalog/database_catalog.cpp
Execution: AST (13/13)
src/execution/ast/ast.cpp
src/include/execution/ast/ast.h
src/execution/ast/ast_clone.cpp
src/include/execution/ast/ast_clone.h
src/execution/ast/ast_dump.cpp
src/execution/ast/ast_pretty_print.cpp
src/execution/ast/context.cpp
src/execution/ast/type.cpp
src/include/execution/ast/type.h
src/execution/ast/type_printer.cpp
src/include/execution/ast/ast_node_factory.h
src/include/execution/ast/builtins.h
src/include/execution/compiler/ast_fwd.h
Execution: Compiler (18/18)
src/execution/compiler/codegen.cpp
src/include/execution/compiler/codegen.h
src/execution/compiler/compilation_context.cpp
src/include/execution/compiler/compilation_context.h
src/execution/compiler/executable_query.cpp
src/include/execution/compiler/executable_query.h
src/execution/compiler/executable_query_builder.cpp
src/execution/compiler/expression/expression_translator.cpp
src/include/execution/compiler/expression/expression_translator.h
src/execution/compiler/expression/function_translator.cpp
src/include/execution/compiler/expression/function_translator.h
src/execution/compiler/function_builder.cpp
src/include/execution/compiler/function_builder.h
src/execution/compiler/operator/output_translator.cpp
src/execution/compiler/operator/operator_translator.cpp
src/include/execution/compiler/operator/operator_translator.h
src/execution/compiler/pipeline.cpp
src/include/execution/compiler/pipeline.h
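The idea behind the more flexible pipeline signature (a Run function that optionally accepts an output callback) can be sketched in plain C++. This is only an analogy under stated assumptions: the real implementation generates TPL code and passes a TPL closure, whereas the sketch below uses `std::function`, and the names `Row`, `OutputBuffer`, `RunPipeline`, and `SumFirstColumn` are hypothetical.

```cpp
#include <functional>
#include <vector>

// Hypothetical types for illustration; not taken from the NoisePage sources.
using Row = std::vector<int>;
using OutputCallback = std::function<void(const Row &)>;

struct OutputBuffer {
  std::vector<Row> rows;
};

// A pipeline "Run" function with an optional output callback.
// - Without a callback (ordinary query): rows go to the output buffer.
// - With a callback (query embedded in a UDF): rows go to the closure.
void RunPipeline(const std::vector<Row> &input, OutputBuffer *buffer,
                 OutputCallback callback = nullptr) {
  for (const auto &row : input) {
    if (callback) {
      callback(row);
    } else {
      buffer->rows.push_back(row);
    }
  }
}

// Stand-in for a UDF body with an embedded query: the closure captures a
// local variable of the "function" and updates it once per output row.
int SumFirstColumn(const std::vector<Row> &query_result) {
  int total = 0;
  RunPipeline(query_result, nullptr,
              [&total](const Row &row) { total += row.at(0); });
  return total;
}
```

Because the callback parameter is defaulted, callers that never supply one are unaffected, which mirrors the observation above that queries not using output callbacks behave as before.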
Execution: Exec (4/4)
src/execution/exec/execution_context.cpp
src/execution/exec/output.cpp
src/include/execution/exec/execution_context.h
src/include/execution/exec/output.h
Execution: Functions (1/1)
src/include/execution/functions/function_context.h
Execution: Parsing (4/4)
src/execution/parsing/parser.cpp
src/include/execution/parsing/parser.h
src/execution/parsing/scanner.cpp
src/include/execution/parsing/token.h
Execution: SEMA (9/9)
src/execution/sema/scope.cpp
src/include/execution/sema/scope.h
src/execution/sema/sema_builtin.cpp
src/execution/sema/sema_checking.cpp
src/execution/sema/sema_decl.cpp
src/execution/sema/sema_expr.cpp
src/execution/sema/sema_stmt.cpp
src/execution/sema/sema_type.cpp
src/include/execution/sema/error_message.h
Execution: SQL (2/2)
src/execution/sql/ddl_executors.cpp
src/include/execution/sql/ddl_executors.h
Execution: VM (13/13)
src/execution/vm/bytecode_emitter.cpp
src/include/execution/vm/bytecode_emitter.h
src/execution/vm/bytecode_function_info.cpp
src/include/execution/vm/bytecode_function_info.h
src/execution/vm/bytecode_generator.cpp
src/include/execution/vm/bytecode_generator.h
src/execution/vm/bytecode_handlers.cpp
src/include/execution/vm/bytecode_handlers.h
src/execution/vm/bytecode_module.cpp
src/execution/vm/llvm_engine.cpp
src/execution/vm/module.cpp
src/execution/vm/vm.cpp
src/include/execution/vm/bytecodes.h
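As a rough illustration of what "injecting" a parameter value from a PL/pgSQL function into an embedded query might look like at the bytecode level, here is a toy interpreter. It is a sketch under heavy assumptions: the opcode names, the instruction layout, and the `Execute` function are all hypothetical and do not correspond to the actual bytecodes added in this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcodes; the real bytecode names are not shown in this PR text.
enum class Op : std::uint8_t { kInjectParam, kAddParam, kReturn };

struct Instr {
  Op op;
  std::uint32_t dst;  // Index into the embedded query's parameter slots.
  std::uint32_t src;  // Index into the UDF's locals (kInjectParam only).
};

// Toy interpreter: kInjectParam copies a UDF local into a query parameter
// slot, kAddParam adds a query parameter to the accumulator (standing in for
// "the query reads its parameter"), and kReturn yields the accumulator.
std::int64_t Execute(const std::vector<Instr> &code,
                     const std::vector<std::int64_t> &udf_locals,
                     std::size_t num_query_params) {
  std::vector<std::int64_t> query_params(num_query_params, 0);
  std::int64_t acc = 0;
  for (const auto &instr : code) {
    switch (instr.op) {
      case Op::kInjectParam:
        query_params.at(instr.dst) = udf_locals.at(instr.src);
        break;
      case Op::kAddParam:
        acc += query_params.at(instr.dst);
        break;
      case Op::kReturn:
        return acc;
    }
  }
  return acc;
}
```

The point of the sketch is only that the injection step itself is a simple copy from one frame's storage into another's, consistent with the remark that the bytecodes themselves are simple.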
Network (1/1)
src/include/network/network_defs.h
Parser (4/4)
src/parser/postgresparser.cpp
src/include/parser/postgresparser.h
src/include/parser/create_function_statement.h
src/include/parser/expression/column_value_expression.h
src/parser/expression/constant_value_expression.cpp
src/include/parser/expression/constant_value_expression.h
Parser: UDF (10/10)
src/include/parser/udf/ast_node_visitor.h
src/parser/udf/ast_nodes.cpp
src/include/parser/udf/ast_nodes.h
src/include/parser/udf/udf_ast_context.h
src/parser/udf/udf_codegen.cpp
src/include/parser/udf/udf_codegen.h
src/parser/udf/udf_handler.cpp
src/include/parser/udf/udf_handler.h
src/parser/udf/udf_parser.cpp
src/include/parser/udf/udf_parser.h
Traffic Cop (2/2)
src/traffic_cop/traffic_cop.cpp
src/traffic_cop/traffic_cop_util.cpp
TPL Test Files (4/4)
sample_tpl/agg.tpl
sample_tpl/call.tpl
sample_tpl/param.tpl
sample_tpl/struct.tpl
Questions / Comments
database_catalog.cpp, the original PR essentially hand-rolled the functionality that was already present in PgProcImpl. Why? Is there some shortcoming of the implementation within PgProcImpl that I have not encountered yet? Regardless, the update should be made in PgProcImpl rather than in DatabaseCatalog as it was in the original PR.
-lambda suffix.
src/parser/udf/ and src/include/parser/udf/ directories, I moved them to their respective parts of the system that I thought made sense. This involved creating new udf/ subdirectories in parser/, execution/compiler/ and execution/ast.
src/parser/udf/ and src/include/parser/udf/ directories, I omitted three files that were either empty, unreachable (not included anywhere) or entirely commented out: src/parser/udf/ast_nodes.cpp, src/parser/udf/udf_handler.cpp, and src/include/parser/udf/udf_handler.h.
binder_context.cpp and binder_context.h as files that were required as part of the integration. This is not the case; the changes to these files are only concerned with CTE implementation.
ast_fwd.h header in both include/execution/ast/ and include/execution/compiler; the files are identical, except when you update one and neglect to update the other... I spent about half an hour trying to debug an issue related to this. We should probably remove the duplicate forward declaration files.
binder_test.cpp as having modifications related to UDFs. This is not the case; I have since removed this file from the TODO list.