
java.lang.IllegalArgumentException: requirement failed while reading migrated parquet table #8863

Closed
camper42 opened this issue Oct 18, 2023 · 9 comments
Comments


camper42 commented Oct 18, 2023

Apache Iceberg version

1.4.0 (latest release)

Query engine

Spark 3.4.1

Please describe the bug 🐞

We have an irrationally partitioned Parquet table with a 3-level partition scheme and tons of small files. Its layout looks like table/log_date=2023-10-16/date=2023-10-17/attr=attr1.

After calling spark_catalog.system.migrate("db.table") successfully, Spark reads fail on some of the partitions that contain 1k+ Parquet (zstd) files (about 30 KiB per file), but we can still read from table_backup_.
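For reference, this is roughly how the migration and the failing read were invoked; a minimal sketch, assuming the Iceberg Spark runtime and SQL extensions are configured, with db.table and the partition values below being placeholders rather than the real names:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("iceberg-migrate-repro")
  // Iceberg SQL extensions and session catalog are required for the procedures below
  .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
  .config("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog")
  .getOrCreate()

// Replace the Hive/Parquet table in place with an Iceberg table;
// the original table is kept as db.table_backup_.
spark.sql("CALL spark_catalog.system.migrate('db.table')")

// Reading one of the heavily fragmented partitions then fails with
// "requirement failed: length (...) cannot be smaller than -1".
spark.sql(
  "SELECT count(*) FROM db.table " +
  "WHERE log_date = '2023-10-16' AND date = '2023-10-17' AND attr = 'attr1'"
).show()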

Error:

23/10/18 01:08:56 WARN TaskSetManager: Lost task 70.0 in stage 2.0 (TID 1029) (10.113.6.67 executor 9): java.lang.IllegalArgumentException: requirement failed: length (-254841238) cannot be smaller than -1
	at scala.Predef$.require(Predef.scala:281)
	at org.apache.spark.rdd.InputFileBlockHolder$.set(InputFileBlockHolder.scala:79)
	at org.apache.spark.rdd.InputFileBlockHolder.set(InputFileBlockHolder.scala)
	at org.apache.iceberg.spark.source.BatchDataReader.open(BatchDataReader.java:89)
	at org.apache.iceberg.spark.source.BatchDataReader.open(BatchDataReader.java:41)
	at org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:141)
	at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:120)
	at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:158)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1(DataSourceRDD.scala:63)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(DataSourceRDD.scala:63)
	at scala.Option.exists(Option.scala:376)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.columnartorow_nextBatch_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:513)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:101)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
	at org.apache.spark.scheduler.Task.run(Task.scala:139)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)


@camper42 (Author)

Can reproduce with snapshot:
[screenshot]

Previous failure on migrated table:
[screenshot]
[screenshot]

@camper42 (Author)

I'm lost... it seems both snapshot and migrate use SparkTableUtil.importSparkTable. If reads fail on the migrated table, they should also fail on the snapshot table, but there is no error when reading from the snapshot table.

I also tried the following (see the sketch after this list):

  1. cp the failed partition to a new table (new_test)
  2. REPAIR TABLE new_test
  3. snapshot + rewrite -> no error
  4. migrate + rewrite -> no error
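A minimal sketch of what steps 3 and 4 roughly correspond to, assuming "rewrite" means the rewrite_data_files procedure, that db.new_test / db.new_test_iceberg are hypothetical names for the copied table and its Iceberg counterpart, and that spark is a session configured as in the earlier sketch:

// Step 3: create an Iceberg table from the copy, leaving the source untouched, then compact
spark.sql("CALL spark_catalog.system.snapshot('db.new_test', 'db.new_test_iceberg')")
spark.sql("CALL spark_catalog.system.rewrite_data_files(table => 'db.new_test_iceberg')")

// Step 4: migrate the copy in place instead, then compact
spark.sql("CALL spark_catalog.system.migrate('db.new_test')")
spark.sql("CALL spark_catalog.system.rewrite_data_files(table => 'db.new_test')")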

@Omega359 (Contributor)

I've just encountered this exception as well, but the circumstances are somewhat different. One process writes out an Iceberg table, primarily via appends with the occasional delete. Upon completion, a new cluster is spun up that reads that table. The issue presented itself during that read:

23/10/24 22:25:51 ERROR BaseReader: Error reading file(s): s3://vdcint-transaction-dev-txn/AR_IDw/dev/transaction_2023_10_ndleq8xl/source_alias=MC3/transaction_date_year=2016/00081-10437993-40ecbe40-af70-41a4-9fde-61bc6a6abeb2-00001.parquet
java.lang.IllegalArgumentException: requirement failed: length (-135733377) cannot be smaller than -1
	at scala.Predef$.require(Predef.scala:281) ~[scala-library-2.12.15.jar:?]
	at org.apache.spark.rdd.InputFileBlockHolder$.set(InputFileBlockHolder.scala:79) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
	at org.apache.spark.rdd.InputFileBlockHolder.set(InputFileBlockHolder.scala) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
	at org.apache.iceberg.spark.source.BatchDataReader.open(BatchDataReader.java:89) ~[app.jar:?]
	at org.apache.iceberg.spark.source.BatchDataReader.open(BatchDataReader.java:41) ~[app.jar:?]
	at org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:141) ~[app.jar:?]
	at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:120) ~[spark-sql_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
	at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:158) ~[spark-sql_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
	at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1(DataSourceRDD.scala:63) ~[spark-sql_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
	at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(DataSourceRDD.scala:63) ~[spark-sql_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
	at scala.Option.exists(Option.scala:376) ~[scala-library-2.12.15.jar:?]
	at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63) ~[spark-sql_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) ~[scala-library-2.12.15.jar:?]
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.columnartorow_nextBatch_0$(Unknown Source) ~[?:?]
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.hashAgg_doAggregateWithoutKey_0$(Unknown Source) ~[?:?]
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(Unknown Source) ~[?:?]
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:35) ~[spark-sql_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.hasNext(Unknown Source) ~[?:?]
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:959) ~[spark-sql_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) ~[scala-library-2.12.15.jar:?]
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:142) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:101) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
	at org.apache.spark.scheduler.Task.run(Task.scala:141) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557) ~[spark-core_2.12-3.4.1-amzn-1.jar:3.4.1-amzn-1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]

Iceberg 1.4.0, Spark 3.4.1 (EMR 6.14.0), JDK 17.

@camper42 (Author)

Could this be caused by the critical bug described in #8834? @bryanck

If it is, this issue could be closed.


camper42 commented Nov 8, 2023

After upgrading to 1.4.2, there are no more IllegalArgumentException failures.


Hathoute commented Nov 9, 2023

Upgrading to 1.4.1 or later AND rewriting manifests seems to resolve this.
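For reference, the manifest rewrite mentioned above is presumably the built-in Spark procedure; a minimal sketch, with db.table as a placeholder name and spark configured as in the earlier sketch:

// Rewrite (compact) the table's manifest files
spark.sql("CALL spark_catalog.system.rewrite_manifests('db.table')")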

@valerio-leap

Any idea how to overcome this issue if upgrading is not an option?

github-actions bot commented:

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions bot added the stale label Sep 24, 2024

github-actions bot commented Oct 9, 2024

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Oct 9, 2024