Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] Adds SQL function modules #2725

Merged
merged 3 commits into from
Aug 28, 2024
Merged

Conversation

RCHowell
Copy link
Contributor

@RCHowell RCHowell commented Aug 26, 2024

This commit defines SQLFunctions and SQLModule which are responsible for registering the Daft FunctionExpr to be accessible from SQL statements. It is basic and operates on-top-of FunctionExpr .. that is, without modifying it.

  • SQLFunctions – map of names (String) to functions (SQLFunction)
  • SQLFunction – wrapper type over FunctionExpr to get the validate method.
  • SQLModule – holds necessary logic for a FunctionExpr variant i.e. NumericExpr, Utf8Expr, etc.

Overview

/// SQLModule for FunctionExpr::Numeric
impl SQLModule for SQLModuleNumeric {
    fn register(parent: &mut SQLFunctions) {
        use FunctionExpr::Numeric as f;
        use NumericExpr::*;
        parent.add("abs", f(Abs));
        ....
    }
}

// register
let mut functions = SQLFunctions::new();
functions.register::<SQLModuleNumeric>();


// usage
let args = plan_args(args)
let func = functions.get("abs")
let expr = func.validate(args)?

Future Considerations

  • Remove SQLFunctions global lazy-static singleton; rather, make it owned by (field of) the planner.
  • Remove SQLFunction wrapper type in favor of extending FunctionEvaluator trait
  • Add arity checks once there are variants.
  • Use multi-map to support function variants.
  • Extract input validation duplication from to_field and evaluate to be used by SQLPlanner as well.
  • Derive a module from macros on the daft_dsl declarations

@github-actions github-actions bot added the enhancement New feature or request label Aug 26, 2024
@RCHowell
Copy link
Contributor Author

@universalmind303

The workflows fail due to missing max which was added in the initial SQLFunctions. I have only handled scalar functions with this PR, and I would like your opinion on how to proceed with aggregations.

  1. Should I extend SQLModule to support aggregations?
  2. Should aggregations be handled elsewhere in the planner?
  3. Should I delete the single unit test for max in order to get this work in first?

I am inclined to remove the single unit test to help get this in sooner, but I am not sure the best approach for the handling of aggregations. It may not actually be much effort to add them here just like you had done for max.

Please let me know how you would prefer to proceed, thanks!

Copy link

codspeed-hq bot commented Aug 26, 2024

CodSpeed Performance Report

Merging #2725 will not alter performance

Comparing RCHowell:sql-fns (62c6799) with main (3cacd59)

Summary

✅ 10 untouched benchmarks

@universalmind303 universalmind303 self-assigned this Aug 26, 2024
/// ```
ToDatetime,
/// [SQLFunction] extends [FunctionExpr] with basic input validation (arity-check).
pub struct SQLFunction(FunctionExpr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the issue with using FunctionExpr here is that it prevents us from using our newer exprs that have been moved over to daft-functions. Eventually we plan to migrate all of the FunctionExpr impls over to the newer ScalarUDF instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make this work with aggregates (max), as well as ScalarUDF's, we'll likely want to use a trait object instead

pub trait SQLFunction {/*...*/}

pub struct SQLFunctions {
    map: HashMap<String, Arc<dyn SQLFunction>>,
}

@RCHowell
Copy link
Contributor Author

In this revision, I've only added aggregations – I've left some notes for later PRs which might extend an existing trait upon completion of the FunctionExpr -> ScalarUDF migration.

Please let me know if you wish to include ScalarUDF in this as well, or if it's better to get this PR in as-is with a follow-up PR. Thanks!

impl SQLModule for SQLModuleAggs {
fn register(parent: &mut SQLFunctions) {
use AggExpr::*;
// HACK TO USE AggExpr as an enum rather than a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than a ...?

Comment on lines +15 to +21
let nil = Arc::new(Expr::Literal(LiteralValue::Null));
parent.add_agg("count", Count(nil.clone(), daft_core::CountMode::Valid));
parent.add_agg("sum", Sum(nil.clone()));
parent.add_agg("avg", Mean(nil.clone()));
parent.add_agg("mean", Mean(nil.clone()));
parent.add_agg("min", Min(nil.clone()));
parent.add_agg("max", Max(nil.clone()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let nil = Arc::new(Expr::Literal(LiteralValue::Null));
parent.add_agg("count", Count(nil.clone(), daft_core::CountMode::Valid));
parent.add_agg("sum", Sum(nil.clone()));
parent.add_agg("avg", Mean(nil.clone()));
parent.add_agg("mean", Mean(nil.clone()));
parent.add_agg("min", Min(nil.clone()));
parent.add_agg("max", Max(nil.clone()));
use daft_dsl::null_lit;
parent.add_agg("count", Count(null_lit(), daft_core::CountMode::Valid));
parent.add_agg("sum", Sum(null_lit());
parent.add_agg("avg", Mean(null_lit());
parent.add_agg("mean", Mean(null_lit());
parent.add_agg("min", Min(null_lit());
parent.add_agg("max", Max(null_lit());

@universalmind303 universalmind303 merged commit 372d9bc into Eventual-Inc:main Aug 28, 2024
44 checks passed
@RCHowell RCHowell deleted the sql-fns branch August 28, 2024 17:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants