-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use LogicalType for TypeSignature Numeric
and String
, Coercible
#13240
Changes from 11 commits
30be8a0
4b6d433
5cb2aa9
c9ac3c5
ec65ca1
1aba4cb
acd2ceb
3855c6f
804242c
793b0da
2cfa273
6e97cb1
99f2cd3
f96e5d8
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -19,6 +19,7 @@ | |||||||||||||||||||||||||||||
//! and return types of functions in DataFusion. | ||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||
use arrow::datatypes::DataType; | ||||||||||||||||||||||||||||||
use datafusion_common::types::LogicalTypeRef; | ||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||
/// Constant that is used as a placeholder for any valid timezone. | ||||||||||||||||||||||||||||||
/// This is used where a function can accept a timestamp type with any | ||||||||||||||||||||||||||||||
|
@@ -109,7 +110,7 @@ pub enum TypeSignature { | |||||||||||||||||||||||||||||
/// For example, `Coercible(vec![DataType::Float64])` accepts | ||||||||||||||||||||||||||||||
/// arguments like `vec![DataType::Int32]` or `vec![DataType::Float32]` | ||||||||||||||||||||||||||||||
/// since i32 and f32 can be casted to f64 | ||||||||||||||||||||||||||||||
goldmedal marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||||||||||||||
Coercible(Vec<DataType>), | ||||||||||||||||||||||||||||||
Coercible(Vec<LogicalTypeRef>), | ||||||||||||||||||||||||||||||
/// Fixed number of arguments of arbitrary types | ||||||||||||||||||||||||||||||
/// If a function takes 0 argument, its `TypeSignature` should be `Any(0)` | ||||||||||||||||||||||||||||||
Any(usize), | ||||||||||||||||||||||||||||||
|
@@ -123,8 +124,19 @@ pub enum TypeSignature { | |||||||||||||||||||||||||||||
/// Specifies Signatures for array functions | ||||||||||||||||||||||||||||||
ArraySignature(ArrayFunctionSignature), | ||||||||||||||||||||||||||||||
/// Fixed number of arguments of numeric types. | ||||||||||||||||||||||||||||||
/// See <https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#method.is_numeric> to know which type is considered numeric | ||||||||||||||||||||||||||||||
/// See [`NativeType::is_numeric`] to know which type is considered numeric | ||||||||||||||||||||||||||||||
/// | ||||||||||||||||||||||||||||||
/// [`NativeType::is_numeric`]: datafusion_common | ||||||||||||||||||||||||||||||
Numeric(usize), | ||||||||||||||||||||||||||||||
/// Fixed number of arguments of numeric types. | ||||||||||||||||||||||||||||||
/// See [`NativeType::is_numeric`] to know which type is considered numeric | ||||||||||||||||||||||||||||||
/// This signature accepts numeric string | ||||||||||||||||||||||||||||||
/// Example of functions In Postgres that support numeric string | ||||||||||||||||||||||||||||||
/// 1. Mathematical Functions, like `abs` | ||||||||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i confirm this works in PostgreSQL (was testing with PostgreSQL v 17) select abs('-123'); https://www.postgresql.org/docs/17/typeconv-oper.html#id-1.5.9.7.8 suggests why this works
indeed, However, this doesn't work in PostgreSQL select abs(CAST('-123' AS varchar)); This fails with "Query Error: function abs(character varying) does not exist" BTW, coercion rules can be retrieved from PostgreSQL from https://www.postgresql.org/docs/17/catalog-pg-cast.html PostgreSQL doesn't declare any implicit coercions between (selected) numeric types and (selected) varchar type, and with selected_types(name) AS (VALUES ('int4'), ('int8'), ('float8'), ('varchar'))
select
src.typname src_type,
dst.typname dst_type,
castcontext,
case castcontext
when 'e' then 'only explicit'
when 'a' then 'explicit | implicitly in assignment'
when 'i' then 'implicitly in expressions, as well as the other cases'
else '???' -- undocumented @ https://www.postgresql.org/docs/17/catalog-pg-cast.html
end castcontext_explained
from pg_cast
join pg_type src on pg_cast.castsource = src.oid
join pg_type dst on pg_cast.casttarget = dst.oid
where true
and src.oid != dst.oid
and src.typname in (select name from selected_types)
and dst.typname in (select name from selected_types)
order by src.typname, dst.typname
;
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree with @findepi. A similar behavior in DuckDB is with
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we want to follow PostgreSQL behavior, we would need to model There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Make sense. We need to model unknown type literal and avoid numeric string to string coercion |
||||||||||||||||||||||||||||||
/// 2. `to_timestamp` | ||||||||||||||||||||||||||||||
/// | ||||||||||||||||||||||||||||||
/// [`NativeType::is_numeric`]: datafusion_common | ||||||||||||||||||||||||||||||
NumericAndNumericString(usize), | ||||||||||||||||||||||||||||||
/// Fixed number of arguments of all the same string types. | ||||||||||||||||||||||||||||||
/// The precedence of type from high to low is Utf8View, LargeUtf8 and Utf8. | ||||||||||||||||||||||||||||||
/// Null is considerd as `Utf8` by default | ||||||||||||||||||||||||||||||
|
@@ -201,7 +213,13 @@ impl TypeSignature { | |||||||||||||||||||||||||||||
TypeSignature::Numeric(num) => { | ||||||||||||||||||||||||||||||
vec![format!("Numeric({num})")] | ||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||
TypeSignature::Exact(types) | TypeSignature::Coercible(types) => { | ||||||||||||||||||||||||||||||
TypeSignature::NumericAndNumericString(num) => { | ||||||||||||||||||||||||||||||
vec![format!("NumericAndNumericString({num})")] | ||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||
TypeSignature::Coercible(types) => { | ||||||||||||||||||||||||||||||
vec![Self::join_types(types, ", ")] | ||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||
TypeSignature::Exact(types) => { | ||||||||||||||||||||||||||||||
vec![Self::join_types(types, ", ")] | ||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||
TypeSignature::Any(arg_count) => { | ||||||||||||||||||||||||||||||
|
@@ -288,6 +306,13 @@ impl Signature { | |||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||
pub fn numeric_and_numeric_string(arg_count: usize, volatility: Volatility) -> Self { | ||||||||||||||||||||||||||||||
Self { | ||||||||||||||||||||||||||||||
type_signature: TypeSignature::NumericAndNumericString(arg_count), | ||||||||||||||||||||||||||||||
volatility, | ||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||
/// A specified number of numeric arguments | ||||||||||||||||||||||||||||||
pub fn string(arg_count: usize, volatility: Volatility) -> Self { | ||||||||||||||||||||||||||||||
Self { | ||||||||||||||||||||||||||||||
|
@@ -322,7 +347,7 @@ impl Signature { | |||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||||||
/// Target coerce types in order | ||||||||||||||||||||||||||||||
pub fn coercible(target_types: Vec<DataType>, volatility: Volatility) -> Self { | ||||||||||||||||||||||||||||||
pub fn coercible(target_types: Vec<LogicalTypeRef>, volatility: Volatility) -> Self { | ||||||||||||||||||||||||||||||
Self { | ||||||||||||||||||||||||||||||
type_signature: TypeSignature::Coercible(target_types), | ||||||||||||||||||||||||||||||
volatility, | ||||||||||||||||||||||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can make a distinction between
Debug
andDisplay
. Currently, we print something likeI imagine we can print
Int32
toInt32
, printJSON
toJSON
... 🤔However, I think it may not be the point of this PR. We can do it in another PR.