-
Notifications
You must be signed in to change notification settings - Fork 302
UDF Introduction en
Introduction to UDFs
Function | Feature | Version |
---|---|---|
contain | Determines that the field value is within a given set. A single value or multiple values are supported. | ALL |
notcontain | Determines that the field value is not within the given set. A single value or multiple values are supported. | ALL |
MATCHINDEX | Returns inverted indexes of the specified field based on the specified conditions. | ALL |
QUERY | Returns inverted indexes based on the specified conditions. The syntax of this function is the same as the syntax of the query clause supported by Havenask V3. | ALL |
sp_query_match | Returns the inverted indexes that match the conditions that you specified by using the sp syntax. | >=3.7.5 |
hashcombine | Merges values in multiple fields of the INT64 type into one value of the INT64 type. | >=3.7.5 |
rangevalue | Converts values in the specified ranges in a field to values of the specified data type. | >=3.7.5 |
sphere_distance | Calculates a distance value based on the specified longitude and latitude. | >=3.7.5 |
ha_in | Determines that the field value is within a given set. | >=3.7.5 |
index | Returns the value for multi-value fields by given subscript. | ALL, required |
- Query full data in the table
SELECT nid, price, brand, size FROM phone ORDER BY nid LIMIT 1000
USE_TIME: 0.881, ROW_COUNT: 10
------------------------------- TABLE INFO ---------------------------
nid | price | brand | size |
1 | 3599 | Huawei | 5.9 |
2 | 4388 | Huawei | 5.5 |
3 | 899 | Xiaomi | 5 |
4 | 2999 | OPPO | 5.5 |
5 | 1299 | Meizu | 5.5 |
6 | 169 | Nokia | 1.4 |
7 | 3599 | Apple | 4.7 |
8 | 5998 | Apple | 5.5 |
9 | 4298 | Apple | 4.7 |
10 | 5688 | Samsung | 5.6 |
- Syntax
boolean contain(int32 a, const string b)
boolean contain(int64 a, const string b)
boolean contain(string a, const string b)
boolean contain(ARRAY<int32> a, const string b)
boolean contain(ARRAY<int64> a, const string b)
boolean contain(ARRAY<string> a, const string b)
- Description
You can use this function to determine whether values in the a field are included in the b value set.
- Parameters
Parameter a: The input is of the int32/int64/string type
with single-value and multi-value parameters. Parameter b: The input is a constant string expression separated by |
, indicating that any item is satisfied.
- Return values
This function returns values of the BOOLEAN type. true: indicates that the value in the a field is included in the b set. false: indicates that the value in the a field is excluded from the b value set.
- Examples
In the following statement, the contain
function is used to query data entries in the rows in which the nid field value is 1, 2, or 3.[]
SELECT nid, price, brand, size FROM phone WHERE contain(nid, '1|2|3') ORDER BY nid LIMIT 100
USE_TIME: 0.059, ROW_COUNT: 3
------------------------------- TABLE INFO ---------------------------
nid | price | brand | size |
1 | 3599 | Huawei | 5.9 |
2 | 4388 | Huawei | 5.5 |
3 | 899 | Xiaomi | 5 |
- Syntax
boolean notcontain(int32 a, const string b)
boolean notcontain(int64 a, const string b)
boolean notcontain(string a, const string b)
boolean notcontain(ARRAY<int32> a, const string b)
boolean notcontain(ARRAY<int64> a, const string b)
boolean notcontain(ARRAY<string> a, const string b)
- Description
You can use this function to determine whether values in the a field are not included in the b value set.
- Parameters
a: specifies a field in the data table. The field that you specify can be a single-value field or multi-value field of the INT32, INT64, or STRING type.
b: specifies a constant string. The specified constants are separated with |
. If a value matches one of the specified constants, this function returns false.
- Return values
This function returns values of the BOOLEAN type. true: indicates that the value in the a field is not included in the b value set. false: indicates that the value in the a field is included in the b value set.
- Examples
In the following statement, the notcontain
is used to query data entries in the rows in which the nid field values are not 1, 2, or 3.[]
SELECT nid, price, brand, size FROM phone WHERE notcontain(nid, '1|2|3') ORDER BY nid LIMIT 100
USE_TIME: 0.092, ROW_COUNT: 7
------------------------------- TABLE INFO ---------------------------
nid | price | brand | size |
4 | 2999 | OPPO | 5.5 |
5 | 1299 | Meizu | 5.5 |
6 | 169 | Nokia | 1.4 |
7 | 3599 | Apple | 4.7 |
8 | 5998 | Apple | 5.5 |
9 | 4298 | Apple | 4.7 |
10 | 5688 | Samsung | 5.6 |
- Syntax
boolean MATCHINDEX(const string a, const string b)
boolean MATCHINDEX(const string a, const string b, const string c)
- Description
You can use this function to determine whether a value in the a field includes the value of the b parameter. You can use this function to obtain the index of a field.
You can use this function in only the WHERE clause to optimize inverted indexing in the index retrieval phase.
- Parameters
a: specifies a field of the STRING type. The system creates an inverted index based on the specified field to optimize the query.
b: specifies a keyword of the STRING type.
The value of the b parameter can be used as a search query string for positional indexing. For more information, see query clause.
c: optional. You can specify an analyzer, stop words, and other information.
You can include the following parameters in the value of the c parameter to specify an analyzer, stop words, and other information:
You must specify a colon (:
) between a parameter name and the corresponding value and separate multiple parameters with commas (,
).
- global_analyzer
- specific_index_analyzer
- no_token_indexes
- tokenize_query
- remove_stopwords
- default_op
- Return values
This function returns values of the BOOLEAN type. true: indicates that the value in the a field includes the value of the b parameter. false: indicates that the value in the a field does not include the value of the b parameter.****
- Examples
In the following statement, the MATCHINDEX
function is used to query data entries in the rows in which the title
field values include the Lens keyword.
SELECT nid, brand FROM phone WHERE MATCHINDEX('title', 'Lens')
------------------------------- TABLE INFO ---------------------------
nid | brand |
1 | Huawei |
- Syntax
boolean QUERY(const string a, const string b)
boolean QUERY(const string a, const string b, const string c)
- Description
You can use this function to determine whether a value in the a field includes the value of the b parameter. The system automatically converts the specified keyword to terms and performs a query.
This function supports the syntax of the query clause in SQL statements supported by Havenask V3. For information about the query clause, see query clause.
You can use this function in only the WHERE clause to optimize inverted indexing in the index retrieval phase.
- Parameters
a: specifies a field of the STRING type. The system automatically uses the specified field as the value of the default_index parameter in the config clause. For information about the config clause, see config clause.
b: specifies a keyword of the STRING type.
The system includes the value of the b parameter in the statement after the system optimizes the statement and queries data from the specified ranges. For more information, see query clause.
c: optional. You can specify an analyzer, stop words, and other information.
You must specify a colon (:
) between a parameter name and the corresponding value and separate multiple parameters with commas (,
).
You can include the following parameters in the value of the c parameter to specify an analyzer, stop words, and other information:
- global_analyzer
- specific_index_analyzer
- no_token_indexes
- tokenize_query
- remove_stopwords
- default_op
- Return values
This function returns values of the BOOLEAN type. true: indicates that the value in the a field includes the value of the b parameter. false: indicates that the value in the a field does not include the value of the b parameter.****
-
Examples
- In the following statement, the
QUERY
function is used to query data entries in the rows in which thetitle
field values include the Huawei keyword.
- In the following statement, the
SELECT nid, price, brand, size FROM phone WHERE QUERY(title, 'Huawei')
USE_TIME: 0.034, ROW_COUNT: 1
------------------------------- TABLE INFO ---------------------------
nid | price | brand | size |
2 | 4388 | Huawei | 5.5 |
- In the following statement, the Huawei keyword and the OPPO keyword are used to query data entries in the rows in which the title field values include the Huawei keyword or the OPPO keyword.``
SELECT nid, price, brand, size FROM phone
WHERE QUERY(title, 'Huawei OR OPPO')
USE_TIME: 0.03, ROW_COUNT: 2
------------------------------- TABLE INFO ---------------------------
nid | price | brand | size |
2 | 4388 | Huawei | 5.5 |
4 | 2999 | OPPO | 5.5 |
- In the following statement, the default_op parameter is included in the value of the c parameter. You can include multiple parameters in the value of the c parameter by specifying a colon
:
between a parameter and the corresponding value and separating multiple parameters with commas,
.
SELECT nid, price, brand, size FROM phone
WHERE QUERY(title, 'Huawei OPPO Mobile Phone', 'default_op:OR,remove_stopwords:true')
- Remarks
Parameter 2 of the QUERY UDF is parsed by the query syntax parser of Havenask V3. For more information about the Havenask V3 query syntax, see query clause . When you set parameter 2 to a constant string, remove the single quotation marks before and after when you use Havenask V3 for queries. For example, QUERY(title, 'Huawei OPPO ') is equivalent to query=Huawei OPPO by using Havenask V3. If you want to enclose quotation marks in the query description, for example, query='Huawei Mobile Phone' AND 'OPPO Mobile Phone' using Havenask V3, the equivalent form in QUERY UDF is QUERY(title, '''Huawei Mobile Phone" AND ''OPPO Mobile Phone"').
-
Typical errors
| Error type | Error form | Correct form |
| --- | --- | --- |
| Syntax error. No query result. | QUERY('pidvid','123:456') | QUERY('pidvid','"123:456"') |
- Syntax
boolean sp_query_match(string a)
- Description
You can use this function to query inverted indexes based on the conditions that are specified by using the sp syntax.
- Parameters
a: specifies a search query in the sp format.
-
Return values
-
Examples
In the following statement, the sp_query_match
function is used to query data entries in the rows in which the brand field value is Apple and the nid field value is 6, 7, or 8.
SELECT * FROM phone WHERE sp_query_match('brand:Apple+(nid:6|nid:7|nid:8'))
- Syntax
int64 hashcombine(int64 k1, int64 k2, ...)
- Description
You can use this function to merge values in multiple fields of the INT64 type into one value of the INT64 type. This function uses the built-in hash function to merge the specified values.
- Parameters
Multiple fields of the INT64 type.
- Return values
The value that is returned after values in the specified fields are merged based on the hash algorithm.
- Examples
In the following statement, the hashcombine
function is used to merge values in the cate_id and user_id fields.
SELECT hashcombine(cate_id,user_id) FROM phone;
- Syntax
float rangevalue(float v, string desc)
- Description
You can use this function to convert continuous values in the specified ranges to discrete values.
- Parameters
v: specifies the field in which the values that you want to convert are stored.
desc: describes the mapping rules.
- Return values
The values that are returned after the specified values are converted to the specified discrete values.
- Examples
In the following statement, the rangevalue
function is used to convert values in the price field based on the following rules: Values that are not greater than 1000 are converted to 1.0. Values that are greater than 1000 but not greater than 5000 are converted to 2.0. Other values are not converted.
SELECT rangevalue(price,'(,1000]:1.0;(1000,5000]:2.0') FROM phone;
- Syntax
double sphere_distance(LOCATION point, double longitude, double latitude)
- Description
You can use this function to calculate the distance between a document and the location specified by the latitude and longitude.
- Parameters
point: specifies a field of the LOCATION type. Values in the specified field must be in the single-value format.
longitude: specifies the longitude of a location.
latitude: specifies the latitude of a location.
- Return values
The values returned are the spherical distance, in kilometers.
- Examples
In the following statement, the sphere_distance
function is used. The geo field must be a single-value field of the LOCATION type, and a forward index is created based on this field.
SELECT sphere_distance(geo,127.0,30.0) FROM phone
- Syntax
boolean ha_in(int32 v, const string desc, [const string sep])
boolean ha_in(ARRAY<int32> v, const string desc, [const string sep])
boolean ha_in(int64 v, const string desc, [const string sep])
boolean ha_in(ARRAY<int64> v, const string desc, [const string sep])
boolean ha_in(string v, const string desc, [const string sep])
boolean ha_in(ARRAY<string> v, const string desc, [const string sep])
- Description
You can use this function in a similar manner as the contain function. You can use this function to determine whether a value in a field is included in a specified value set. You can use this function to query data from a summary table, a key-value table, or a Pkey-Skey-value table. For more information, see Query data from a summary table and Query data from a key-value table or a Pkey-Skey-value table.
- Parameters
v: specifies the field in which the data that you want to query is stored. The field can be a single-value field or multi-value field.
desc: specifies a value set. The value set can include multiple constants. If a value matches one of the specified constants, the function returns true.
sep: optional. Specifies the delimiter that is used to separate multiple constants in the value of the desc parameter. Default delimiter: |
.
- Return values
true: indicates that the value is included in the value of the desc parameter. false: indicates that the value is excluded from the value of the desc parameter.
- Examples
In the following example, the ha_in
function is used to query data entries in the rows in which the nid field values are included in the value of the desc parameter.
select nid from mainse_excellent_search where ha_in(nid, '641999912989|577150359640|123456')
USE_TIME: 32.141ms, ROW_COUNT: 2
------------------------------- TABLE INFO ---------------------------
nid(int64) |
641999912989 |
577150359640 |
- Syntax
int8 index(ARRAY<int8> mv, const int64 idx, [const int8 def])
uint8 index(ARRAY<uint8> mv, const int64 idx, [const uint8 def])
int16 index(ARRAY<int16> mv, const int64 idx, [const int16 def])
uint16 index(ARRAY<uint16> mv, const int64 idx, [const uint16 def])
int32 index(ARRAY<int32> mv, const int64 idx, [const int32 def])
uint32 index(ARRAY<uint32> mv, const int64 idx, [const uint32 def])
int64 index(ARRAY<int64> mv, const int64 idx, [const int64 def])
uint64 index(ARRAY<uint64> mv, const int64 idx, [const uint64 def])
float index(ARRAY<float> mv, const int64 idx, [const float def])
string index(ARRAY<string> mv, const int64 idx, [const string def])
- Description
You can use this function to return the specific single value based on the given subscript from a multi-value field. You can specify the default value returned when the subscript is out of range.
- Parameters
mv: specifies the multi-value field of the NUMERIC or STRING type.
idx: specifies the specific subscript that is returned, a constant of the INT64 type.
def: optional. Specifies the value that is returned when the subscript overflows, a constant of the same type as the single-value field of the mv parameter. If this parameter is unspecified, the default value of the NUMERIC type is 0 and the default value of the STRING type is an empty string.
- Return values
The value of mv[idx] is returned when the subscript is valid and the value of def is returned when the subscript overflows.
- Examples
For the three documents with the prices field, the original content is as follows:
doc1: prices=1.0 2.0 3.0
doc2: prices=100.0
doc3: prices=1000.0 2000.0
If no default value is specified, the function is executed as follows:
select index(prices, 2) as price_2 from phone
USE_TIME: 32.141ms, ROW_COUNT: 2
------------------------------- TABLE INFO ---------------------------
price_2(double) |
2.0 |
0.0 |
2000.0 |
If the default value is specified, the function is executed as follows:
select index(prices, 2, -1.0) as price_2 from phone
USE_TIME: 32.141ms, ROW_COUNT: 2
------------------------------- TABLE INFO ---------------------------
price_2(double) |
2.0 |
-1.0 |
2000.0 |
- Syntax
ARRAY<int8> index(ARRAY<int8> mv, const int64 begin, const int64 length)
ARRAY<uint8> index(ARRAY<uint8> mv, const int64 begin, const int64 length)
ARRAY<int16> index(ARRAY<int16> mv, const int64 begin, const int64 length)
ARRAY<uint16> index(ARRAY<uint16> mv, const int64 begin, const int64 length)
ARRAY<int32> index(ARRAY<int32> mv, const int64 begin, const int64 length)
ARRAY<uint32> index(ARRAY<uint32> mv, const int64 begin, const int64 length)
ARRAY<int64> index(ARRAY<int64> mv, const int64 begin, const int64 length)
ARRAY<uint64> index(ARRAY<uint64> mv, const int64 begin, const int64 length)
ARRAY<float> index(ARRAY<float> mv, const int64 begin, const int64 length)
ARRAY<double> index(ARRAY<double> mv, const int64 begin, const int64 length)
ARRAY<string> index(ARRAY<string> mv, const int64 begin, const int64 length)
- Description
You can use this function to return the subset of a given subscript from a multi-value field by its beginpoint and number.
- Parameters
mv: specifies the multi-value field of the NUMERIC or STRING type.
begin: specifies the beginpoint of the subscript that is returned. The value is a constant of the INT64 type.
length: specifies the number of values from the beginpoint. The value is a constant of the INT64 type.
- Return values
The value of mv within the range from begin to begin + length-1 is returned. The value is truncated based on the value of mv.
- Examples
For the three documents with the prices field, the original content is as follows:
doc1: prices=1.0 2.0 3.0
doc2: prices=100.0
doc3: prices=1000.0 2000.0
select array_slice(prices, 1, 2) as price_2 from phone
USE_TIME: 32.141ms, ROW_COUNT: 2
------------------------------- TABLE INFO ---------------------------
price_2(multi_double) |
2.0 3.0 |
|
2000.0 |
You can configure the following parameters to specify analyzers when you use functions such as MATCHINDEX and QUERY.
Specifies a global analyzer. The analyzer that is specified by this parameter overrides the analyzer that you specified in the schema. The specified analyzer must be included in analyzer.json.
Specifies an analyzer for the specified field. The analyzer that is specified by this parameter overrides the analyzer that you specified in the global_analyzer parameter and the analyzer that you specified in the schema.
The analysis process does not need to be performed for the index specified in the query. Other processes such as normalization and stop word removal can still be performed.``
- tokenize_query
true
: specifies to convert the search query to terms. false
: specifies to not convert the search query to terms. If you configured an analyzer when you specified the QUERY
UDF, this parameter does not take effect.
The default value is true
.
- remove_stopwords
true
: specifies to delete stop words. false
: specifies to not delete stop words. You can specify stop words when you configure the analyzer.
The default value is true
.
Specifies the logical relationship between the terms that are returned after the system tokenizes the search query by using the default analyzer. Valid values: AND
and OR
. You can specify a default operator when you configure biz. For information about how to configure biz, see biz configuration.
Function | Feature | Version |
---|---|---|
type cast | Converts values from one data type to another data type. | ALL |