Star Tree Request/Response structure #227

sandeshkr419 · 2024-08-11T22:35:08Z

Description

For an index that supports a star-tree composite index, this change attempts to resolve a metric aggregation, with or without a numeric term query, using the star-tree.

In its present state, this PR primarily focuses on the flow of information from request to response. The code that computes the response values is therefore still inaccurate and a work in progress.

This PR depends on opensearch-project#14809. Until those changes are merged, I have pulled them into my private fork so that these changes can be discussed in parallel.

Approach

A new StarTreeQuery is introduced that resolves to star-tree documents. The star-tree query is formed (when possible) at the shard level rather than at the coordinator level, to avoid transporting it from node to node; all the required information is available at the shard level, and OpenSearch already does most of its query rewriting there. The star-tree query is encapsulated in an OriginalOrStarTreeQuery, which preserves the original query along with the new star-tree query. Keeping both queries allows the decision of which one to use to be made at the segment level.
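To make the wrapper idea concrete, here is a minimal Java sketch of "keep both queries, choose per segment". `SketchQuery`, `SegmentView`, and `OriginalOrStarTreeSketch` are hypothetical stand-ins, not the PR's actual `OriginalOrStarTreeQuery` and not Lucene's `Query`/`Weight` API; a real implementation would make this choice per `LeafReaderContext`.

```java
import java.util.Objects;

/** Hypothetical stand-in for a Lucene Query; only carries a label here. */
interface SketchQuery {
    String label();
}

/** Hypothetical per-segment view: whether the segment carries star-tree data. */
final class SegmentView {
    final boolean hasStarTree;

    SegmentView(boolean hasStarTree) {
        this.hasStarTree = hasStarTree;
    }
}

/** Keeps both queries and defers the choice between them to each segment. */
final class OriginalOrStarTreeSketch implements SketchQuery {
    private final SketchQuery original;
    private final SketchQuery starTree; // null when no star-tree query could be formed

    OriginalOrStarTreeSketch(SketchQuery original, SketchQuery starTree) {
        this.original = Objects.requireNonNull(original, "original");
        this.starTree = starTree;
    }

    /** Uses the star-tree query only if one exists and this segment can answer it. */
    SketchQuery queryForSegment(SegmentView segment) {
        return (starTree != null && segment.hasStarTree) ? starTree : original;
    }

    @Override
    public String label() {
        return "OriginalOrStarTree(" + original.label() + ", "
            + (starTree == null ? "none" : starTree.label()) + ")";
    }
}
```

Because the original query is always kept, a segment written before the star-tree existed (or one without star-tree data) still falls back to the default execution path.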

Example query shape:

Request:

{
    "query": {
        "term": {
            "status": 200
        }
    },
    "size": 0,
    "aggs": {
        "sum_status": {
            "sum": {
                "field": "size"
            }
        }
    }
}

Response:

{
    "took": 4038,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 42,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "sum_status": {
            "value": 24745.0
        }
    }
}

Star Tree Flow Response:

{
    "took": 21120,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "sum_status": {
            "value": 24745.0
        }
    }
}
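One plausible reading of why `hits.total` drops from 42 to 1 while the sum is unchanged: a star-tree stores pre-aggregated documents, so the raw documents sharing `status: 200` can collapse into a single star-tree document that already holds their summed `size`. The sketch below illustrates only that grouping idea under this assumption; it is not the star-tree builder, and `PreAggregationSketch` is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PreAggregationSketch {
    /** A raw document with one dimension (status) and one metric (size). */
    static final class Doc {
        final long status;
        final double size;

        Doc(long status, double size) {
            this.status = status;
            this.size = size;
        }
    }

    /** A pre-aggregated document: the metric sum plus how many raw docs it covers. */
    static final class AggregatedDoc {
        double sumSize;
        long rawDocCount;
    }

    /** Groups raw docs by dimension value, producing one aggregated doc per distinct status. */
    static Map<Long, AggregatedDoc> preAggregate(List<Doc> docs) {
        Map<Long, AggregatedDoc> byStatus = new TreeMap<>();
        for (Doc doc : docs) {
            AggregatedDoc agg = byStatus.computeIfAbsent(doc.status, k -> new AggregatedDoc());
            agg.sumSize += doc.size;
            agg.rawDocCount++;
        }
        return byStatus;
    }
}
```

With this grouping, a `status: 200` filter reads one aggregated document whose `sumSize` already equals the sum over all matching raw documents.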

Approach:

The query shape is identified at the shard level (SearchService.java), and the query/aggregation is parsed into a star-tree query if it can be resolved by the star-tree.
The star-tree query is wrapped in an OriginalOrStarTreeQuery to preserve the original query, because the decision of which implementation (default or star-tree) to use is made at the segment level.
If the star-tree can be used to answer the query, the star-tree document set is then collected by the relevant aggregator/collector. In this POC, I have changed SumAggregator to demonstrate the flow.
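The shard-level step can be sketched as a rewrite that gives up (so the original query is used) as soon as any part of the request does not map onto the star-tree. Everything here is hypothetical: the real code works on `QueryBuilder`s and aggregation factories in `SearchService`, and the dimension/metric sets would come from the composite index mapping. This version keys eligibility on the star-tree's dimensions and metrics rather than only on whether a composite index exists.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class StarTreeRewriteSketch {
    /** Hypothetical parsed request: an optional term filter plus a sum aggregation. */
    static final class Request {
        final String termField; // null when the request has no query
        final long termValue;
        final String sumField;

        Request(String termField, long termValue, String sumField) {
            this.termField = termField;
            this.termValue = termValue;
            this.sumField = sumField;
        }
    }

    /** Hypothetical star-tree mapping: which fields are dimensions and which are metrics. */
    static final class StarTreeConfig {
        final Set<String> dimensions;
        final Set<String> metrics;

        StarTreeConfig(Set<String> dimensions, Set<String> metrics) {
            this.dimensions = dimensions;
            this.metrics = metrics;
        }
    }

    /** Hypothetical star-tree query: exact-match dimension filters plus the metric to read. */
    static final class StarTreeQuery {
        final Map<String, Long> dimensionFilters;
        final String metricField;

        StarTreeQuery(Map<String, Long> dimensionFilters, String metricField) {
            this.dimensionFilters = dimensionFilters;
            this.metricField = metricField;
        }
    }

    /** Returns a star-tree query only when every part of the request maps onto the star-tree. */
    static Optional<StarTreeQuery> tryRewrite(Request request, StarTreeConfig config) {
        if (!config.metrics.contains(request.sumField)) {
            return Optional.empty(); // aggregation field is not a star-tree metric
        }
        if (request.termField == null) {
            return Optional.of(new StarTreeQuery(Collections.emptyMap(), request.sumField));
        }
        if (!config.dimensions.contains(request.termField)) {
            return Optional.empty(); // filter field is not a star-tree dimension
        }
        return Optional.of(new StarTreeQuery(Map.of(request.termField, request.termValue), request.sumField));
    }
}
```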

Related Issues

opensearch-project#15257

Check List

New functionality includes testing.
- All tests pass
New functionality has been documented.
- New functionality has javadoc added
Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
Commits are signed per the DCO using --signoff
Commit changes are listed out in CHANGELOG.md file (See: Changelog)
Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

bharath-techie · 2024-08-12T05:05:59Z

server/src/main/java/org/opensearch/index/query/QueryShardContext.java

+        // Check if the query builder is an instance of TermQueryBuilder
+        if (queryBuilder instanceof TermQueryBuilder) {
+            TermQueryBuilder tq = (TermQueryBuilder) queryBuilder;
+            String field = tq.fieldName();


Shouldn't this be converted to the star-tree dimension field name?

bharath-techie · 2024-08-12T05:06:21Z

server/src/main/java/org/opensearch/index/query/QueryShardContext.java

+            long inputQueryVal = Long.parseLong(tq.value().toString());
+
+            // Get or create the list of predicates for the given field
+            List<Predicate<Long>> predicates = predicateMap.getOrDefault(field, new ArrayList<>());


If we use predicates, we can't use binary search during star tree traversal.

I haven't put much thought into this yet; let me revisit how best to store the information needed to use filters. For the request/response parsing, I drew on changes from previous POCs.
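To illustrate the point about predicates, assume each star-tree node keeps its children sorted by dimension value. An exact value, or an inclusive `[low, high]` range kept as plain bounds, can be located with binary search, whereas an opaque `Predicate<Long>` has to be evaluated against every child. This is a hypothetical sketch of that trade-off, not the star-tree traversal code; `ChildLookupSketch` and its methods are invented for illustration.

```java
import java.util.Arrays;
import java.util.function.LongPredicate;

final class ChildLookupSketch {
    /** Exact match via binary search, O(log n). Returns the child index, or -1 if absent. */
    static int findExact(long[] sortedChildValues, long value) {
        int idx = Arrays.binarySearch(sortedChildValues, value);
        return idx >= 0 ? idx : -1;
    }

    /** First index whose value is >= key (sortedChildValues.length if none). */
    static int lowerBound(long[] sortedChildValues, long key) {
        int lo = 0;
        int hi = sortedChildValues.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedChildValues[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Inclusive range [low, high] as a half-open index range {from, to}, still O(log n). */
    static int[] findRange(long[] sortedChildValues, long low, long high) {
        int from = lowerBound(sortedChildValues, low);
        int to = high == Long.MAX_VALUE ? sortedChildValues.length : lowerBound(sortedChildValues, high + 1);
        return new int[] { from, Math.max(from, to) };
    }

    /** An opaque predicate gives no ordering information, so every child is tested, O(n). */
    static int countMatching(long[] childValues, LongPredicate predicate) {
        int matches = 0;
        for (long v : childValues) {
            if (predicate.test(v)) {
                matches++;
            }
        }
        return matches;
    }
}
```

Storing filters as exact values or bounds, rather than as predicates, keeps the ordering information that makes the logarithmic lookup possible.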

bharath-techie · 2024-08-12T05:09:49Z

server/src/main/java/org/opensearch/search/aggregations/metrics/SumAggregator.java

+            @Override
+            public void collect(int doc, long bucket) throws IOException {
+                // TODO: Fix the response for collecting star tree sum
+                sums = bigArrays.grow(sums, bucket + 1);


Can we extract the default implementation into getDefaultLeafCollector and reuse the same logic?

I really like the approach of reusing the existing aggregators.

Yes, I'll try to do that; I'm inclined towards refactoring and reusing the same implementations wherever possible.

final SortedNumericDoubleValues values = valuesSource.doubleValues(ctx);

We can check whether we are able to get SortedNumericDoubleValues; otherwise we need to convert to double for each doc.
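Following both suggestions, here is a sketch of one shared collect loop: the default path would hand it doubles directly, while a path that only has long-encoded values wraps them in an adapter that converts each value as it is read. The interfaces are hypothetical simplifications loosely modeled on Lucene's `SortedNumericDocValues` and OpenSearch's `SortedNumericDoubleValues`, not their real signatures, and the conversion function is left as a parameter.

```java
import java.util.function.LongToDoubleFunction;

/** Hypothetical per-document double source, loosely modeled on SortedNumericDoubleValues. */
interface DoubleValuesPerDoc {
    boolean advanceExact(int doc);
    int docValueCount();
    double nextValue();
}

/** Hypothetical per-document long source, loosely modeled on SortedNumericDocValues. */
interface LongValuesPerDoc {
    boolean advanceExact(int doc);
    int docValueCount();
    long nextValue();
}

/** In-memory LongValuesPerDoc for the usage example: values[doc] holds that doc's values. */
final class ArrayLongValues implements LongValuesPerDoc {
    private final long[][] values;
    private int doc = -1;
    private int next = 0;

    ArrayLongValues(long[][] values) {
        this.values = values;
    }

    public boolean advanceExact(int target) {
        doc = target;
        next = 0;
        return values[target].length > 0;
    }

    public int docValueCount() {
        return values[doc].length;
    }

    public long nextValue() {
        return values[doc][next++];
    }
}

final class SharedCollectSketch {
    /** Adapts a long source to doubles, converting each value as it is read. */
    static DoubleValuesPerDoc asDoubles(LongValuesPerDoc longs, LongToDoubleFunction convert) {
        return new DoubleValuesPerDoc() {
            public boolean advanceExact(int doc) { return longs.advanceExact(doc); }
            public int docValueCount() { return longs.docValueCount(); }
            public double nextValue() { return convert.applyAsDouble(longs.nextValue()); }
        };
    }

    /** The single collect loop that both the default and star-tree paths would share. */
    static double sumForDoc(DoubleValuesPerDoc values, int doc) {
        double sum = 0;
        if (values.advanceExact(doc)) {
            int count = values.docValueCount();
            for (int i = 0; i < count; i++) {
                sum += values.nextValue();
            }
        }
        return sum;
    }
}
```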

bharath-techie · 2024-08-12T05:12:03Z

server/src/main/java/org/opensearch/search/aggregations/metrics/NumericMetricsAggregator.java

+
+    protected StarTreeValues getStarTreeValues(LeafReaderContext ctx, CompositeIndexFieldInfo starTree) throws IOException {
+        SegmentReader reader = Lucene.segmentReader(ctx.reader());
+        if (!(reader.getDocValuesReader() instanceof CompositeIndexReader)) return null;


We need to see if it's better to load them as a doubleValuesSource, similar to how existing fields are loaded, and to load only the specific fields requested instead of the entire star-tree values. (For example, in the sum aggregator we can fetch the doubleFieldData of a particular star-tree metric field, e.g. sum_status_metric.)

Then you don't need to worry about conversion either.
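A minimal sketch of "load only the requested field": each star-tree metric field gets a loader, and a field is materialized only the first time an aggregator asks for it. The loader map, the `double[]` representation, and `LazyMetricLoaderSketch` are hypothetical; only the field name `sum_status_metric` comes from the comment above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class LazyMetricLoaderSketch {
    private final Map<String, Supplier<double[]>> loaders;
    private final Map<String, double[]> loaded = new HashMap<>();
    private int loadCount = 0;

    LazyMetricLoaderSketch(Map<String, Supplier<double[]>> loaders) {
        this.loaders = loaders;
    }

    /** Loads the requested metric field on first access and caches it; other fields stay unloaded. */
    double[] metric(String field) {
        return loaded.computeIfAbsent(field, f -> {
            Supplier<double[]> loader = loaders.get(f);
            if (loader == null) {
                throw new IllegalArgumentException("unknown star-tree field: " + f);
            }
            loadCount++;
            return loader.get();
        });
    }

    /** Number of fields actually loaded so far. */
    int loadCount() {
        return loadCount;
    }
}
```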

bharath-techie · 2024-08-12T05:12:32Z

server/src/main/java/org/opensearch/search/SearchService.java

+        // Can be marked false for majority cases for which star-tree cannot be used
+        // Will save checking the criteria later and we can have a limit on what search requests are supported
+        // As we increment the cases where star-tree can be used, this can be set back to true
+        boolean canUseStarTree = context.mapperService().isCompositeIndexPresent();


Should this be based on the dimensions and metrics instead?

This is just initialization; I will toggle this off for multiple cases (TODO in this PR itself).

bharath-techie · 2024-08-13T03:34:47Z

server/src/main/java/org/opensearch/index/query/QueryShardContext.java

+        if (queryBuilder instanceof TermQueryBuilder) {
+            TermQueryBuilder tq = (TermQueryBuilder) queryBuilder;
+            String field = tq.fieldName();
+            long inputQueryVal = Long.parseLong(tq.value().toString());


Convert to sortable long
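Assuming dimension values are stored the way numeric doc values are (plain longs for integer fields, sortable-long bits for doubles), the parsed term value needs the matching encoding before it is compared with star-tree dimension values. The sketch copies the bit trick behind Lucene's `NumericUtils.doubleToSortableLong` so it runs without Lucene; `FieldKind` and `toDimensionValue` are hypothetical.

```java
final class SortableEncodingSketch {
    enum FieldKind { LONG, DOUBLE }

    /** Same bit manipulation as Lucene's NumericUtils.sortableDoubleBits. */
    static long sortableDoubleBits(long bits) {
        return bits ^ (bits >> 63) & 0x7fffffffffffffffL;
    }

    /** Mirrors NumericUtils.doubleToSortableLong: an order-preserving long for a double. */
    static long doubleToSortableLong(double value) {
        return sortableDoubleBits(Double.doubleToLongBits(value));
    }

    /** Encodes a term value the way the dimension's doc values are assumed to be stored. */
    static long toDimensionValue(String rawValue, FieldKind kind) {
        switch (kind) {
            case LONG:
                return Long.parseLong(rawValue);
            case DOUBLE:
                return doubleToSortableLong(Double.parseDouble(rawValue));
            default:
                throw new IllegalArgumentException("unsupported field kind: " + kind);
        }
    }
}
```

Raw IEEE-754 bits of negative doubles compare in reverse order as signed longs; the sortable encoding fixes that, which is what makes exact and range matching on dimension values meaningful.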


bharath-techie · 2024-08-14T05:03:39Z

server/src/main/java/org/opensearch/search/aggregations/metrics/SumAggregator.java

+                    kahanSummation.reset(sum, compensation);
+
+                    for (int i = 0; i < valuesCount; i++) {
+                        double value = Double.longBitsToDouble(dv.nextValue());


Shouldn't we do this?

public static Double sortableLongtoDouble(Long value) {
    return NumericUtils.sortableLongToDouble(value);
}

Also, let's see if we can get the sorted numeric double doc values up front:

@Override
public SortedNumericDoubleValues getDoubleValues() {
    try {
        SortedNumericDocValues raw = DocValues.getSortedNumeric(reader, field);
        return FieldData.sortableLongBitsToDoubles(raw);
    } catch (IOException e) {
        throw new IllegalStateException("Cannot load doc values", e);
    }
}
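To show why `Double.longBitsToDouble` is not enough here: it only inverts `Double.doubleToLongBits`, so applying it to a sortable-encoded long returns the wrong value for negative doubles (positive doubles happen to encode to their raw bits, which can hide the bug). The sketch copies the logic of Lucene's `NumericUtils.sortableLongToDouble` so it runs standalone, and pairs it with a compensated (Kahan) sum in the spirit of the `kahanSummation` above; the class is a hypothetical illustration, not OpenSearch's summation helper.

```java
final class SortableDecodeSketch {
    /** Same bit manipulation as Lucene's NumericUtils.sortableDoubleBits; it is its own inverse. */
    static long sortableDoubleBits(long bits) {
        return bits ^ (bits >> 63) & 0x7fffffffffffffffL;
    }

    static long doubleToSortableLong(double value) {
        return sortableDoubleBits(Double.doubleToLongBits(value));
    }

    /** Mirrors NumericUtils.sortableLongToDouble. */
    static double sortableLongToDouble(long encoded) {
        return Double.longBitsToDouble(sortableDoubleBits(encoded));
    }

    /** Kahan-compensated sum over sortable-encoded values. */
    static double kahanSum(long[] encodedValues) {
        double sum = 0.0;
        double compensation = 0.0;
        for (long encoded : encodedValues) {
            double value = sortableLongToDouble(encoded);
            double corrected = value - compensation;
            double newSum = sum + corrected;
            compensation = (newSum - sum) - corrected;
            sum = newSum;
        }
        return sum;
    }
}
```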

sandeshkr419 · 2024-08-19T03:13:28Z

Closing this PR in favor of opensearch-project#15289.

bharath-techie reviewed Aug 12, 2024


Star stree request/response changes

da0056b

Signed-off-by: Sandesh Kumar <[email protected]>

sandeshkr419 force-pushed the poc1 branch from de44784 to da0056b August 13, 2024 02:11

sandeshkr419 changed the title from "doc values file format" to "Star Tree Request/Response structure" Aug 13, 2024

Merge branch 'startree-file-formats-codec-merge' into poc1

dcddc5c

sandeshkr419 self-assigned this Aug 13, 2024

sandeshkr419 mentioned this pull request Aug 13, 2024

[Star Tree][Search][RFC] Parse aggregation request to resolve via star tree data structure opensearch-project/OpenSearch#14871

Closed

bharath-techie reviewed Aug 13, 2024


fix sum aggregator code

54b9148

Signed-off-by: Sandesh Kumar <[email protected]>

bharath-techie reviewed Aug 14, 2024


sandeshkr419 closed this Aug 19, 2024
