Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] GROUP BY does not work properly with array values #3132

Open
normanj-bitquill opened this issue Oct 28, 2024 · 2 comments
Open

[BUG] GROUP BY does not work properly with array values #3132

normanj-bitquill opened this issue Oct 28, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@normanj-bitquill
Copy link
Contributor

What is the bug?
If a field that contains array values is used in a GROUP BY clause, each value in the array is considered as a separate row value.

Consider an index with the following data:

{1, [1, 2]}
{2, [3, 4]}
{3, [1, 5]}
{4, [1, 2]}
{5, [2, 3]}

and this query:

SELECT COUNT(x), y FROM test3 GROUP BY y;
{3, 1}
{3, 2}
{2, 3}
{1, 4}
{1, 5}

The expected results are:

{2, [1, 2]}
{1, [3, 4]}
{1, [1, 5]}
{1, [2, 3]}

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Load the data above into a new index
  2. Run the query above on the index

What is the expected behavior?
Preserve the array value and perform comparisons on the array value as a whole. Only group rows that have the same array values.

What is your host/environment?

  • OS: MacOS (should not matter)
  • Version [e.g. 3.0 code base]
  • Plugins: SQL plugin

Do you have any screenshots?
N/A

Do you have any additional context?
Issue #1300 had a change recently merged in that allows array values to be used in query evaluation and in the result set.

@normanj-bitquill normanj-bitquill added bug Something isn't working untriaged labels Oct 28, 2024
@normanj-bitquill
Copy link
Contributor Author

This appears to be due to how the OpenSearch engine performs grouping on multi-valued fields.

POST test3/_search
{
  "from" : 0,
  "size" : 10000,
  "timeout" : "1m",
  "aggregations" : {
    "composite_buckets" : {
      "composite" : {
        "size" : 1000,
        "sources" : [ {
          "y" : {
            "terms" : {
              "field" : "y",
              "missing_bucket" : true,
              "missing_order" : "first",
              "order" : "asc"
            }
          }
        } ]
      },
      "aggregations" : {
        "MIN(x)" : {
          "min" : {
            "field" : "x"
          }
        }
      }
    }
  }
}
{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "test3",
        "_id": "ATXWtZIB72QJDYYHLMzM",
        "_score": 1,
        "_source": {
          "x": 1,
          "y": [
            1,
            2
          ]
        }
      },
      {
        "_index": "test3",
        "_id": "AjXWtZIB72QJDYYHXszf",
        "_score": 1,
        "_source": {
          "x": 2,
          "y": [
            3,
            4
          ]
        }
      },
      {
        "_index": "test3",
        "_id": "AzXWtZIB72QJDYYHqczI",
        "_score": 1,
        "_source": {
          "x": 3,
          "y": [
            1,
            5
          ]
        }
      },
      {
        "_index": "test3",
        "_id": "BDXWtZIB72QJDYYHy8wa",
        "_score": 1,
        "_source": {
          "x": 4,
          "y": [
            1,
            2
          ]
        }
      },
      {
        "_index": "test3",
        "_id": "Q6i4xJIBxt0sLj_lfLXQ",
        "_score": 1,
        "_source": {
          "x": 5,
          "y": [
            2,
            3
          ]
        }
      }
    ]
  },
  "aggregations": {
    "composite_buckets": {
      "after_key": {
        "y": 5
      },
      "buckets": [
        {
          "key": {
            "y": 1
          },
          "doc_count": 3,
          "MIN(x)": {
            "value": 1
          }
        },
        {
          "key": {
            "y": 2
          },
          "doc_count": 3,
          "MIN(x)": {
            "value": 1
          }
        },
        {
          "key": {
            "y": 3
          },
          "doc_count": 2,
          "MIN(x)": {
            "value": 2
          }
        },
        {
          "key": {
            "y": 4
          },
          "doc_count": 1,
          "MIN(x)": {
            "value": 2
          }
        },
        {
          "key": {
            "y": 5
          },
          "doc_count": 1,
          "MIN(x)": {
            "value": 3
          }
        }
      ]
    }
  }
}

@andrross
Copy link
Member

[Catch All Triage - 1, 2, 3, 4, 5]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants