Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trino-Pinot-Connector throws error when a table with a JSON dimensional fields with invalid JSON is queried. #19663

Open
nizarhejazi opened this issue Nov 8, 2023 · 8 comments

Comments

@nizarhejazi
Copy link

nizarhejazi commented Nov 8, 2023

Running simple Trino SQL queries against Pinot table that has JSON dimensional fields with null handling enabled and an invalid JSON value throws the following error:
Cannot convert value to JSON: ''

SQL query:
SELECT * FROM "pinotcluster2"."default"."role_with_company_object_history_record00011" LIMIT 1000;

Error logs:

...
at io.trino.plugin.base.util.JsonTypeUtil.jsonParse(JsonTypeUtil.java:53)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2974)
at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:4793)
at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:4916)
at com.fasterxml.jackson.databind.exc.MismatchedInputException.from(MismatchedInputException.java:59)
at [Source: (InputStreamReader); line: 1, column: 0]
Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: No content to map due to end-of-input

at java.base/java.lang.Thread.run(Thread.java:833)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at io.trino.$gen.Trino_423_e_3____20231108_001947_2.run(Unknown Source)
at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:555)
...
at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:555)
at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:187)
at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:887)
...
at io.trino.plugin.pinot.PinotBrokerPageSource.getNextPage(PinotBrokerPageSource.java:123)
at io.trino.plugin.pinot.decoders.JsonDecoder.decode(JsonDecoder.java:39)
at io.trino.plugin.base.util.JsonTypeUtil.jsonParse(JsonTypeUtil.java:62)
io.trino.spi.TrinoException: Cannot convert value to JSON: ''

Table has the following property: "nullHandlingEnabled": true
Note: pushing query options (e.g. option(enableNullHandling)) is currently missing from Trino.

@nizarhejazi
Copy link
Author

Got a suggestion to use dynamic tables but it does not help. Here is the corresponding query that returned the same error:

SELECT *
FROM "pinotcluster2"."default"."SELECT * FROM role_with_company_object_history_record00011 LIMIT 2000"
LIMIT 2000

@ebyhr
Copy link
Member

ebyhr commented Nov 8, 2023

cc: @elonazoulay

@nizarhejazi nizarhejazi changed the title Trino-Pinot-Connector throws error when a table with nullable JSON dimensional fields is queried. Trino-Pinot-Connector throws error when a table with a JSON dimensional fields with invalid JSON is queried. Nov 8, 2023
@nizarhejazi
Copy link
Author

The JSON value stored in some records is invalid JSON. Nevertheless, Presto handles the invalid JSON and does not fail the whole query. The following query succeeds:

  SELECT id, field_new_value, field_new_value = CAST('' AS JSON) FROM pinot_table
  WHERE company = 'c_id' AND id IN ('id1', 'id2')

However, the following query throws: invald json vlaue:

  SELECT id, field_new_value, is_json_scalar(field_new_value) FROM pinot_table
  WHERE company = 'c_id' AND id IN ('id1', 'id2')

@elonazoulay
Copy link
Member

Can you give an example of invalid json? Also, how this is handled differently in presto. I will take a look.
In the meantime you can filter out the invalid json if it's simple enough, i.e. empty.

@elonazoulay
Copy link
Member

cc @nizarhejazi - A recent pr added support for query options. Can you try nullHandlingEnabled=true at the end of your query and switch to using the dynamic table (query in double quotes)?
For an example see the pr which added support for it: #19078 -- cc @xiangfu0 for viz.

@nizarhejazi
Copy link
Author

@elonazoulay I answered why we cannot simply switch to dynamic tables in another ticket. Also, I am waiting on #19078 to make it to a SEP release.

@elonazoulay
Copy link
Member

Got it. Can you share examples of the invalid json? Is it empty or malformed?

@nizarhejazi
Copy link
Author

It is malformed JSON. On our end, we will work on addressing this data quality issue (I hit it because of SELECT-ing all columns).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants