Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add JSON Support to V2 Engine #217

Merged
merged 18 commits into from
Mar 23, 2023
Merged

Conversation

margarit-h
Copy link

@margarit-h margarit-h commented Feb 3, 2023

Description

Phase 1 of adding JSON Support to V2 Engine
For example: The following request was handled by V2 engine:

curl --location --request POST 'http://localhost:9200/_plugins/_sql?format=json' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic YWRtaW46YWRtaW4=' \
--data-raw '{
    "query": "select * from location"
}'
{"took":2,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":2,"relation":"eq"},"max_score":1.0,"hits":[{"_index":"location","_id":"1","_score":1.0,"_source":{"center":{"lon":100.5,"lat":0.5},"description":"square"}},{"_index":"location","_id":"2","_score":1.0,"_source":{"center":{"lon":105.0,"lat":5.0},"description":"bigSquare"}}]}}

Returning the raw response from OpenSearch made IT tests fail due to missing in-memory operations, difference in aggregation type (V2 is composite_bucket), and some bugs were discovered. The solution for these failing IT tests were to fall back to legacy for all functions, expect failures for IT tests with a comment for issues related.

List of expected failures in legacy IT:

  • DateFormatIT.and (1436)

  • DateFormatIT.equalTo (1436)

  • DateFormatIT.greaterThan (1436)

  • DateFormatIT.greaterThanOrEqualTo (1436)

  • DateFormatIT.lessThan (1436)

  • DateFormatIT.lessThanOrEqualTo (1436)

  • DateFormatIT.or (1436)

  • DateFormatIT.sortByDateFormat (1436)

  • QueryFunctionsIT.wildcardQuery (Dependent on PR #1314)

  • QueryIT.selectAllWithFieldAndOrderBy (785)

  • QueryIT.selectAllWithFieldReturnsAll (785)

  • QueryIT.selectAllWithFieldReverseOrder (785)

  • QueryIT.selectAllWithMultipleFields (785)

  • QueryIT.notLikeTests (1425)

Proposal for phase 2 of JSON support for V2

opensearch-project#1450

Issues Resolved

opensearch-project#1317

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@margarit-h margarit-h marked this pull request as draft February 3, 2023 21:51
@codecov
Copy link

codecov bot commented Feb 3, 2023

Codecov Report

Merging #217 (b3e5dae) into integ-v2-json-support (bc39346) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@                   Coverage Diff                    @@
##             integ-v2-json-support     #217   +/-   ##
========================================================
  Coverage                    98.43%   98.43%           
- Complexity                    3775     3799   +24     
========================================================
  Files                          343      346    +3     
  Lines                         9364     9414   +50     
  Branches                       599      602    +3     
========================================================
+ Hits                          9217     9267   +50     
  Misses                         142      142           
  Partials                         5        5           
Flag Coverage Δ
sql-engine 98.43% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...rg/opensearch/sql/analysis/JsonSupportVisitor.java 100.00% <100.00%> (ø)
...search/sql/analysis/JsonSupportVisitorContext.java 100.00% <100.00%> (ø)
...a/org/opensearch/sql/executor/ExecutionEngine.java 100.00% <100.00%> (ø)
.../opensearch/sql/planner/physical/PhysicalPlan.java 100.00% <100.00%> (ø)
...opensearch/executor/OpenSearchExecutionEngine.java 100.00% <100.00%> (ø)
...ch/sql/opensearch/response/OpenSearchResponse.java 100.00% <100.00%> (ø)
...ch/sql/opensearch/storage/OpenSearchIndexScan.java 100.00% <100.00%> (ø)
...pensearch/sql/protocol/response/format/Format.java 100.00% <100.00%> (ø)
...c/main/java/org/opensearch/sql/sql/SQLService.java 100.00% <100.00%> (ø)
.../org/opensearch/sql/sql/antlr/SQLSyntaxParser.java 100.00% <100.00%> (ø)
... and 1 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@margarit-h margarit-h marked this pull request as ready for review February 6, 2023 23:52
@Yury-Fridlyand
Copy link

I did a small test and there is the difference. Can we omit missed information?
legacy:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "books",
        "_id": "1",
        "_score": 1,
        "_source": {
          "id": 2,
          "author": "Alan Alexander Milne",
          "title": "Winnie-the-Pooh"
        }
      }
    ]
  }
}

v2:

{
  "_shards": {
    "total": 0,
    "successful": 0,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "hits": [
      {
        "_source": {
          "author": "Alan Alexander Milne",
          "id": 2,
          "title": "Winnie-the-Pooh"
        }
      }
    ]
  }
}

@Yury-Fridlyand
Copy link

another small test for query

select 1, 1+1, now()
select 1, 1+1, now() from books; // for legacy

legacy

{
  "took": 413,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "books",
        "_id": "1",
        "_score": 1,
        "_source": {
          "id": 2,
          "author": "Alan Alexander Milne",
          "title": "Winnie-the-Pooh"
        },
        "fields": {
          "assign_1": [
            1
          ],
          "now()": [
            "16:36:42"
          ],
          "add(1, 1)": [
            2
          ]
        }
      }
    ]
  }
}

v2

{
  "_shards": {
    "total": 0,
    "successful": 0,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "hits": [
      {
        "_source": {
          "1": 1,
          "1+1": 2
        },
        "fields": {
          "now()": "2023-02-06 16:58:14"
        }
      }
    ]
  }
}

@dai-chen
Copy link

dai-chen commented Feb 7, 2023

Just some thoughts on our options:

  1. In the response formatter, rebuild SearchHits similarly as what we do in this PR
  2. In meta field support in V2 engine, add _source meta field. Fill original JSON response to each row in OpenSearchIndexScan for later use

@GumpacG GumpacG force-pushed the dev-v2-json-support branch from 4bf6d33 to 6e16fcc Compare March 2, 2023 00:16
@GumpacG GumpacG mentioned this pull request Mar 2, 2023
6 tasks
@GumpacG GumpacG force-pushed the dev-v2-json-support branch from 6e16fcc to 682110f Compare March 3, 2023 22:55
}

@Override
public void open() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid making member variables protected why not leave this function in the base class and override fetchNextBatch instead?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in 3de83d9. Thanks for the idea!

@dai-chen
Copy link

Does the getRawResponse() changes in core engine only work for OpenSearch? wonder have we evaluated the 2nd option in #217 (comment) ?

*/
public class JsonTypeResponseFormatter extends JsonResponseFormatter<QueryResult> {

private final Style style;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this used, can we remove it?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have this file anymore as we just return the raw response now.

public Object buildJsonObject(QueryResult response) {
JsonResponse.JsonResponseBuilder json = JsonResponse.builder();
Total total = new Total(response.size(), "eq");
Shards shards = new Shards(0, 0, 0, 0);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like we are defaulting - maybe we can add a comment here that says the response doesn't have the number of shards listed?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have this file anymore as we just return the raw response now.


@Builder
@Getter
public static class JsonResponse {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make all of these classes private

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have this file anymore as we just return the raw response now.

class QueryResponse {
private final Schema schema;
private final List<ExprValue> results;
private final String rawResponse;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment on what this is for?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in c7208ea

try {
rawResponse = ((OpenSearchIndexScan) ((ProjectOperator) physicalPlan).getInput())
.getRawResponse();
} catch (Exception e) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is is possible to be more specific that Exception here? This will catch everything and silence it.
Even better if we could avoid the try...catch block if possible (and check to see if the RawResponse exists...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in c7208ea. ClassCastException is caught because queries like SELECT 1 has a ValuesOperator that can't be casted to OpenSearchIndexScan which means it won't have rawResponse.

}
});

assertNotNull(result.get());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could check that it's equal to "not empty", right?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in c7208ea. Thanks

@@ -252,4 +285,42 @@ public String explain() {
return "explain";
}
}

public static class FakeOpenSearchIndexScan extends OpenSearchIndexScan {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like you could @Mock this class and use the when override instead of declaring these functions protected with an @Override. But maybe you tried that and it didn't work out...?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gave it another shot and it worked. Changed in c7208ea

Copy link

@GabeFernandez310 GabeFernandez310 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like an integration test might need to be fixed?

@GumpacG
Copy link

GumpacG commented Mar 16, 2023

Looks like an integration test might need to be fixed?

@GabeFernandez310 Integration tests are expected to fail here and should pass once #237 is merged

@acarbonetto
Copy link

In meta field support in V2 engine, add _source meta field. Fill original JSON response to each row in OpenSearchIndexScan for later use

Once my PR goes in, I think this is possible to retrieve the hit's result and store it in the response. It'll be slightly different for each storage source, but the _source metafield would only be available for OpenSearchIndexes. We could have something similar for Prometheus/Spark

@GumpacG
Copy link

GumpacG commented Mar 16, 2023

Does the getRawResponse() changes in core engine only work for OpenSearch? wonder have we evaluated the 2nd option in #217 (comment) ?

@dai-chen It does only work for OpenSearch. Currently, the JSON support in V2 will return the raw response from OpenSearch. In memory operations will not be included in the response yet until we have a more concrete idea of what we want/should include in the response. In the case that the user has function calls in their query with json format, the plugin will fall back to legacy and is implemented in #237.

margarit-h and others added 9 commits March 18, 2023 08:11
Signed-off-by: Margarit Hakobyan <[email protected]>
Signed-off-by: Margarit Hakobyan <[email protected]>
Signed-off-by: Margarit Hakobyan <[email protected]>
Signed-off-by: Margarit Hakobyan <[email protected]>
GumpacG and others added 8 commits March 18, 2023 08:11
Signed-off-by: Guian Gumpac <[email protected]>
Signed-off-by: Guian Gumpac <[email protected]>
Signed-off-by: Guian Gumpac <[email protected]>
Signed-off-by: Guian Gumpac <[email protected]>
* Implement json support for V2 engine

Signed-off-by: Margarit Hakobyan <[email protected]>

* Reverted some changes

Signed-off-by: Margarit Hakobyan <[email protected]>

* Removed some fields

Signed-off-by: Margarit Hakobyan <[email protected]>

* minor fix

Signed-off-by: Margarit Hakobyan <[email protected]>

* Added a unit test, cleaned up

Signed-off-by: Margarit Hakobyan <[email protected]>

* Returning raw OpenSearch response when type is json

Signed-off-by: Margarit Hakobyan <[email protected]>

* Add an integration test, fix checkstyle errors

Signed-off-by: Margarit Hakobyan <[email protected]>

* Made new engine fallback to legacy for in memory operations for json format

Signed-off-by: Guian Gumpac <[email protected]>

* Address build failures

Signed-off-by: MaxKsyunz <[email protected]>

* Added legacy fall back

Signed-off-by: Guian Gumpac <[email protected]>

* Refactored fall back logic to use visitor design pattern

Signed-off-by: Guian Gumpac <[email protected]>

* Added unit tests

Signed-off-by: Guian Gumpac <[email protected]>

* Removed unnecessary IT

Signed-off-by: Guian Gumpac <[email protected]>

* Addressed PR feedback

Signed-off-by: Guian Gumpac <[email protected]>

* Removed unnecessary context

Signed-off-by: Guian Gumpac <[email protected]>

* Added fall back for Filter functions

Signed-off-by: Guian Gumpac <[email protected]>

* Made new engine fallback to legacy for in memory operations for json format

Signed-off-by: Guian Gumpac <[email protected]>

* Address build failures

Signed-off-by: MaxKsyunz <[email protected]>

* Added legacy fall back

Signed-off-by: Guian Gumpac <[email protected]>

* Refactored fall back logic to use visitor design pattern

Signed-off-by: Guian Gumpac <[email protected]>

* Added unit tests

Signed-off-by: Guian Gumpac <[email protected]>

* Removed unnecessary IT

Signed-off-by: Guian Gumpac <[email protected]>

* Addressed PR feedback

Signed-off-by: Guian Gumpac <[email protected]>

* Removed unnecessary context

Signed-off-by: Guian Gumpac <[email protected]>

* Added fall back for Filter functions

Signed-off-by: Guian Gumpac <[email protected]>


---------

Signed-off-by: MaxKsyunz <[email protected]>

* Fixed checkstyle errors

Signed-off-by: Guian Gumpac <[email protected]>

* Addressed PR comments and fixed the visitor

Signed-off-by: Guian Gumpac <[email protected]>

* Added comment to visitor class

Signed-off-by: Guian Gumpac <[email protected]>

* Addressed PR comments to improve visitor class

Signed-off-by: Guian Gumpac <[email protected]>

* Added unit tests for JsonSupportVisitor

Signed-off-by: Guian Gumpac <[email protected]>

* Added helper function for SQLServiceTest

Signed-off-by: Guian Gumpac <[email protected]>

* Added expected failures

Signed-off-by: Guian Gumpac <[email protected]>

* Reworked the visitor class to have type Void instead of Boolean

Signed-off-by: Guian Gumpac <[email protected]>

* Fixed typo

Signed-off-by: Guian Gumpac <[email protected]>

* Added github link for tracking issue

Signed-off-by: Guian Gumpac <[email protected]>

---------

Signed-off-by: Margarit Hakobyan <[email protected]>
Signed-off-by: Guian Gumpac <[email protected]>
Signed-off-by: MaxKsyunz <[email protected]>
Co-authored-by: Margarit Hakobyan <[email protected]>
Co-authored-by: MaxKsyunz <[email protected]>
Signed-off-by: Guian Gumpac <[email protected]>
@GumpacG GumpacG force-pushed the dev-v2-json-support branch from 66ae9aa to 36a9997 Compare March 18, 2023 15:11
Signed-off-by: Guian Gumpac <[email protected]>
void testGetRawResponse() {
when(child.getRawResponse()).thenReturn("not empty", "", null);
assertEquals("not empty", testPlan.getRawResponse());
assertEquals("", testPlan.getRawResponse());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertEquals twice?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last one is for the null case. It returns "" as well.

@GumpacG GumpacG merged commit 707d88e into integ-v2-json-support Mar 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants