-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework on OpenSearchDataType
: parse, store and use mapping information
#1314
Rework on OpenSearchDataType
: parse, store and use mapping information
#1314
Conversation
…ion (#180) Rework on `OpenSearchDataType`: * Add data types for those classes are not defined in `ExprCoreType`. * Address #180 (comment) * Remove `TextKeywordValue`. * Add changes according to the PR review. #180 * Update `IndexMapping::parseMapping` function. * Add `OpenSearchDataType::resolve` function. * Add new constructor for `OpenSearchTextType`. * Make `fields` and `properties` in `OpenSearchDataType` readonly. Update tests and mapping parser. * Move `getFields` from `OpenSearchDataType` to `OpenSearchTextType`. Update tests. * Rewrite `traverseAndFlatten` according to #180 (comment) * Minor comment fix. * A fix to avoid breaking changes. * `typeName` and `legacyTypeName` to return different type names. * Change `typeof` function and corresponding tests. * Move `convertTextToKeyword` from `ScriptUtils` to `OpenSearchTextType`. Update tests. * Update UT for `typeof` function. * Make all instances of `OpenSearchDataType` and of derived types singletones as much as possible. * Make string representations of all `ExprType`s uppercase. * Remove functions from `IndexMapping` used in tests only. Signed-off-by: Yury-Fridlyand <[email protected]> Signed-off-by: Yury-Fridlyand <[email protected]>
…have the common style. Signed-off-by: Yury-Fridlyand <[email protected]>
…mapping-use Signed-off-by: Yury-Fridlyand <[email protected]>
Codecov Report
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more @@ Coverage Diff @@
## main #1314 +/- ##
============================================
+ Coverage 98.36% 98.40% +0.03%
- Complexity 3640 3733 +93
============================================
Files 343 345 +2
Lines 9016 9210 +194
Branches 585 597 +12
============================================
+ Hits 8869 9063 +194
Misses 142 142
Partials 5 5
Flags with carried forward coverage won't be shown. Click here to find out more.
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (cachedFieldTypes == null) { | ||
cachedFieldTypes = new OpenSearchDescribeIndexRequest(client, indexName).getFieldTypes(); | ||
} | ||
return OpenSearchDataType.traverseAndFlatten(cachedFieldTypes).entrySet().stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result of traverseAndFlatten
can be cached.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renaming |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this issue opensearch-project/observability#1392 need to be updated?
After this PR they will start getting text
in some cases instead of string
, right?
No, we won't. There are no breaking changes. |
…t and keyword. Signed-off-by: Yury-Fridlyand <[email protected]>
Signed-off-by: Yury-Fridlyand <[email protected]>
Signed-off-by: Yury-Fridlyand <[email protected]>
b968514
if (cachedFieldOpenSearchTypes == null) { | ||
cachedFieldOpenSearchTypes = new OpenSearchDescribeIndexRequest(client, indexName) | ||
.getFieldTypes(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplicate with getFieldOpenSearchTypes()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost yes. getFieldOpenSearchTypes
returns full type info. It is used to create OpenSearchExprValueFactory
- a class responsible to proper parsing values received from the cluster.
getFieldTypes
used to create TypeEnvironment
in visitRelation
. TypeEnvironment
keeps simplified type info as ExprCoreType
. It is used later for function resolution.
Text("text", ExprCoreType.UNKNOWN), | ||
Keyword("keyword", ExprCoreType.STRING), | ||
Ip("ip", ExprCoreType.UNKNOWN), | ||
GeoPoint("geo_point", ExprCoreType.UNKNOWN), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question? does geo_point will in response schema? if yes, does it means?
- geo_point type should be supported in core.
- jdbc driver should support geo_point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Query select * from datatypes
(a table from IT) response in legacy:
{
"schema": [
{
"name": "boolean_value",
"type": "boolean"
},
{
"name": "double_range_value",
"type": "double_range"
},
{
"name": "ip_value",
"type": "ip"
},
{
"name": "long_range_value",
"type": "long_range"
},
{
"name": "keyword_value",
"type": "keyword"
},
{
"name": "date_range_value",
"type": "date_range"
},
{
"name": "binary_value",
"type": "binary"
},
{
"name": "date_value",
"type": "date"
},
{
"name": "text_value",
"type": "text"
},
{
"name": "nested_value",
"alias": "",
"type": "text"
},
{
"name": "object_value",
"alias": "",
"type": "text"
}
],
"total": 1,
"datarows": [[
true,
{
"lte": 4,
"gte": 1
},
"127.0.0.1",
{
"lte": 100500,
"gte": 0
},
"keyword",
{
"lte": "2019-05-15",
"gte": "2019-05-01"
},
"U29tZSBiaW5hcnkgYmxvYg==",
"2020-10-13 13:00:00.000",
"text",
[
{
"last": "Smith",
"first": "John"
},
{
"last": "White",
"first": "Alice"
}
],
{
"last": "Dale",
"first": "Dale"
}
]],
"size": 1,
"status": 200
}
From V2 without these changes:
{
"schema": [
{
"name": "date_value",
"type": "timestamp"
},
{
"name": "boolean_value",
"type": "boolean"
},
{
"name": "ip_value",
"type": "ip"
},
{
"name": "text_value",
"type": "text"
},
{
"name": "nested_value",
"type": "nested"
},
{
"name": "object_value",
"type": "object"
},
{
"name": "keyword_value",
"type": "keyword"
},
{
"name": "binary_value",
"type": "binary"
},
{
"name": "geo_value",
"type": "geo_point"
}
],
"datarows": [
[
"2020-10-13 13:00:00",
true,
"127.0.0.1",
"text",
[
{
"last": "Smith",
"first": "John"
},
{
"last": "White",
"first": "Alice"
}
],
{
"last": "Dale",
"first": "Dale"
},
"keyword",
"U29tZSBiaW5hcnkgYmxvYg==",
{
"lat": 40.71,
"lon": 74.0
}
]
],
"total": 1,
"size": 1,
"status": 200
}
And with these changes:
{
"schema": [
{
"name": "text_value",
"type": "text"
},
{
"name": "date_value",
"type": "timestamp"
},
{
"name": "boolean_value",
"type": "boolean"
},
{
"name": "ip_value",
"type": "ip"
},
{
"name": "nested_value",
"type": "nested"
},
{
"name": "object_value",
"type": "object"
},
{
"name": "keyword_value",
"type": "keyword"
},
{
"name": "binary_value",
"type": "binary"
},
{
"name": "geo_value",
"type": "geo_point"
}
],
"datarows": [
[
"text",
"2020-10-13 13:00:00",
true,
"127.0.0.1",
[
{
"last": "Smith",
"first": "John"
},
{
"last": "White",
"first": "Alice"
}
],
{
"last": "Dale",
"first": "Dale"
},
"keyword",
"U29tZSBiaW5hcnkgYmxvYg==",
{
"lat": 40.71,
"lon": 74.0
}
]
],
"total": 1,
"size": 1,
"status": 200
}
} | ||
|
||
public static OpenSearchBinaryType of() { | ||
return OpenSearchBinaryType.instance; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return instrance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A singleton
} | ||
res = new OpenSearchDataType(mappingType); | ||
res.exprCoreType = exprCoreType; | ||
instances.put(mappingType.toString(), res); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not init the instances statically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the idea! Fixed in be7e9c7.
public String legacyTypeName() { | ||
return LEGACY_TYPE_NAME_MAPPING.getOrDefault(this, typeName()); | ||
if (mappingType == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in which case mappingType is null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When object created from aggregation, see changes in AggregationQueryBuilder. Aggregation adds a new column which type has no index mapping; the column type should be passed into OpenSearchExprValueFactory
as OpenSearchDataType
to provide proper data parsing.
|
||
/** | ||
* Text field doesn't have doc value (exception thrown even when you call "get") | ||
* Limitation: assume inner field name is always "keyword". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the issue #1113 not solved, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm, will the changes in this PR fix the linked issue #794 ? |
No, not yet. It will in the upcoming PR. |
Signed-off-by: Yury-Fridlyand <[email protected]>
my understand is
|
…ion (#1314) * Rework on `OpenSearchDataType`: parse, store and use mapping information (#180) Rework on `OpenSearchDataType`: * Add data types for those classes are not defined in `ExprCoreType`. * Address Bit-Quill#180 (comment) * Remove `TextKeywordValue`. * Add changes according to the PR review. Bit-Quill#180 * Update `IndexMapping::parseMapping` function. * Add `OpenSearchDataType::resolve` function. * Add new constructor for `OpenSearchTextType`. * Make `fields` and `properties` in `OpenSearchDataType` readonly. Update tests and mapping parser. * Move `getFields` from `OpenSearchDataType` to `OpenSearchTextType`. Update tests. * Rewrite `traverseAndFlatten` according to Bit-Quill#180 (comment) * Minor comment fix. * A fix to avoid breaking changes. * `typeName` and `legacyTypeName` to return different type names. * Change `typeof` function and corresponding tests. * Move `convertTextToKeyword` from `ScriptUtils` to `OpenSearchTextType`. Update tests. * Update UT for `typeof` function. * Make all instances of `OpenSearchDataType` and of derived types singletones as much as possible. * Make string representations of all `ExprType`s uppercase. * Remove functions from `IndexMapping` used in tests only. Signed-off-by: Yury-Fridlyand <[email protected]> Signed-off-by: Yury-Fridlyand <[email protected]> (cherry picked from commit d44cd39)
…ion (#1314) (#1455) * Rework on `OpenSearchDataType`: parse, store and use mapping information (#180) Rework on `OpenSearchDataType`: * Add data types for those classes are not defined in `ExprCoreType`. * Address Bit-Quill#180 (comment) * Remove `TextKeywordValue`. * Add changes according to the PR review. Bit-Quill#180 * Update `IndexMapping::parseMapping` function. * Add `OpenSearchDataType::resolve` function. * Add new constructor for `OpenSearchTextType`. * Make `fields` and `properties` in `OpenSearchDataType` readonly. Update tests and mapping parser. * Move `getFields` from `OpenSearchDataType` to `OpenSearchTextType`. Update tests. * Rewrite `traverseAndFlatten` according to Bit-Quill#180 (comment) * Minor comment fix. * A fix to avoid breaking changes. * `typeName` and `legacyTypeName` to return different type names. * Change `typeof` function and corresponding tests. * Move `convertTextToKeyword` from `ScriptUtils` to `OpenSearchTextType`. Update tests. * Update UT for `typeof` function. * Make all instances of `OpenSearchDataType` and of derived types singletones as much as possible. * Make string representations of all `ExprType`s uppercase. * Remove functions from `IndexMapping` used in tests only. Signed-off-by: Yury-Fridlyand <[email protected]> Signed-off-by: Yury-Fridlyand <[email protected]> (cherry picked from commit d44cd39) Co-authored-by: Yury-Fridlyand <[email protected]>
Currently, V2 engine follows 4. We have #1432 to track |
Description
Rework on
OpenSearchDataType
according to Bit-Quill#169 (comment)OpenSearchDataType
is not a enum anymore, it is a class which stores all required mapping information.TODOs
Fixes could be done in scope of this PR or in follow-up fixes:
ExprCoreType
. Type resolution is hardcoded and doesn't consider real column types: https://github.com/Bit-Quill/opensearch-project-sql/blob/e4d89de579e8902a85438d56d0a4645c985f6fa6/core/src/main/java/org/opensearch/sql/data/type/ExprCoreType.java#L86traverseAndFlatten
- do recursive traverse onlyARRAY
https://github.com/Bit-Quill/opensearch-project-sql/blob/16d7ce07d6211fe4978c0b6cf698502a8446df47/opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java#L244Further future improvements
Features that should be done in scope of other PRs:
text
is not completely supported as a separate type yet.geo_shape
,ip_range
.alias
is not resolved.Map<String, ExprType> getFieldTypes()
inOpenSearchIndex
should be replaced byMap<String, OpenSearchDataType> getFieldOpenSearchTypes()
. This requires base interface change.ExprCoreType
.Issues Resolved
Assuming that these changes would fix #794, #1038, #1112 and #1113. Relevant for #1111 too.
Notes
added to bypass Ignore switch case default cases which can't be covered jacoco/jacoco#1211. See also https://www.javaspecialists.eu/archive/Issue272-Hacking-Enums-Revisited.html
See team review and discussion in Bit-Quill#180.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.