Update to @uroell's report (we work in the same department): #20151 fixes Trino failing on empty Hudi partitions for us. #20027 should additionally ensure that Trino does not get stuck on such errors.
Hi,
after some discussion in the Hudi Slack channel (see https://app.slack.com/client/T4D7BR6T1/C4D716NPQ/thread/C4D716NPQ-1695733714.877639), I'm opening this issue to track it.
Some information about the issue we are facing at the moment:
We have a partitioned Hudi table in which certain partitions have no data files in our S3 bucket. If we delete all of these "empty" partitions, so that our Hive Metastore no longer holds any metadata about them, the query runs successfully afterwards.
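The workaround described above (removing the metastore entries of partitions that have no data files) could be scripted roughly as follows. This is a hypothetical helper, not part of Hudi or Trino: it does not talk to S3 or the metastore, but takes a pre-fetched mapping of partition value to file names. It assumes that Hudi base files end in `.parquet`, `.orc` or `.hfile`, that log files contain `.log.` in their name, and that `.hoodie_partition_metadata` files hold no rows. The output is a list of HiveQL `ALTER TABLE ... DROP IF EXISTS PARTITION` statements to review before running.

```python
"""Hypothetical sketch: find Hudi partitions without data files and build
HiveQL statements that drop them from the Hive Metastore."""

from typing import Dict, Iterable, List

# Assumption: these suffixes identify Hudi base files.
BASE_FILE_SUFFIXES = (".parquet", ".orc", ".hfile")


def is_data_file(path: str) -> bool:
    """Return True if the file name looks like a Hudi base or log file."""
    name = path.rsplit("/", 1)[-1]
    # Partition bookkeeping file (optionally written in a base format, e.g.
    # ".hoodie_partition_metadata.parquet"); it contains no table rows.
    if name.startswith(".hoodie_partition_metadata"):
        return False
    # Assumption: log files (merge-on-read tables) contain ".log." in the name.
    if ".log." in name:
        return True
    return name.endswith(BASE_FILE_SUFFIXES)


def find_empty_partitions(listing: Dict[str, Iterable[str]]) -> List[str]:
    """Return the partition values whose listing contains no data file."""
    return sorted(
        value for value, files in listing.items()
        if not any(is_data_file(f) for f in files)
    )


def drop_partition_statements(
    table: str, key: str, values: Iterable[str]
) -> List[str]:
    """Build one HiveQL DROP PARTITION statement per partition value.

    Assumes partition values contain no single quotes.
    """
    return [
        f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({key}='{value}');"
        for value in values
    ]


if __name__ == "__main__":
    # Made-up listing: 2023-09-20 only holds the partition metadata file.
    listing = {
        "2023-09-19": [".hoodie_partition_metadata", "f1_1-0-1_20230919.parquet"],
        "2023-09-20": [".hoodie_partition_metadata"],
    }
    empty = find_empty_partitions(listing)
    for stmt in drop_partition_statements("exampletable", "happeneddayde", empty):
        print(stmt)
```

With the made-up listing, only the `2023-09-20` partition is reported, producing a single statement for `happeneddayde='2023-09-20'`.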
Basically, we run a simple "select count(*) from exampletable", which gets stuck and seems to run forever. For comparison, "select count(*) from exampletable where happeneddayde = '2023-09-19'" returns the expected result right away; happeneddayde is a partition key, and for this particular day there is data in S3. In the Trino dashboard, the stuck query is shown as blocked.
Here is the query JSON for the blocked query:
20231017_131611_00008_r7bcz_anonymized.json