java.lang.IllegalArgumentException: requirement failed: length (-6235972) cannot be smaller than -1 #9689
Comments
@rjayapalan do you have a small reproducible example by any chance? That would greatly help anyone looking at this issue.
@nastra I cannot share the exact code I used because it contains confidential information, but these are the steps I used to reproduce the issue.
Thanks for the repro steps @rjayapalan! I just wanted to ask a clarifying question that would help narrow down possible culprits before I go ahead and attempt a repro. Specifically, were any writes to your table performed by Iceberg 1.4.0? There was a known issue in 1.4.0 that produced invalid split offsets for data files, which, as you saw, was fixed in the subsequent 1.4.2 release. The fix was to ignore split offsets if we detect they are out of bounds of the file (the exception you are seeing). If you did perform any writes on the table with Iceberg 1.4.0, it could be that there is still some remaining place where we should be ignoring the invalid split offsets but aren't. If you did not, then this is most likely some other issue that we should investigate, and the repro steps look like a good starting point.
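For readers unfamiliar with the fix being described, here is a minimal sketch of the kind of guard it refers to: drop the split offsets entirely whenever any of them fall outside the data file. This is an illustrative simplification, not the actual Iceberg code; the class and method names are hypothetical.

```java
import java.util.List;

// Simplified sketch of the guard described above (hypothetical names,
// not the actual Iceberg implementation): if any recorded split offset
// is out of bounds of the data file, ignore all of them rather than
// trusting corrupt metadata.
public class SplitOffsetGuard {
    static List<Long> validOffsetsOrNull(List<Long> offsets, long fileLength) {
        if (offsets == null) {
            return null;
        }
        for (long offset : offsets) {
            // A valid offset must point inside the file.
            if (offset < 0 || offset >= fileLength) {
                return null; // metadata is suspect: fall back to no offsets
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // In-bounds offsets are kept as-is.
        System.out.println(validOffsetsOrNull(List.of(4L, 100L), 1000L));
        // One out-of-bounds offset discards the whole list.
        System.out.println(validOffsetsOrNull(List.of(4L, 2000L), 1000L));
    }
}
```

A reader hitting this exception on 1.4.2 would therefore want to know which code path still consumes the raw offsets without passing through a guard like this.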
Hi @amogh-jahagirdar
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the 'not-stale' label, but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.
Apache Iceberg version
1.4.2
Query engine
Spark
Please describe the bug 🐞
I am aware of the similar issue that was addressed as part of the Iceberg 1.4.1 release:
#8834
But this one seems different to me. The error comes up after performing a DDL change on an existing Iceberg table (adding a new column) and then running the rewrite_manifests maintenance operation via the Spark stored procedure.
I was able to recreate the issue following this pattern (ALTER TABLE ... ADD COLUMN -> CALL rewrite_manifests(...)). I am not sure what is causing this, or whether it is a bug in the first place.
Error stack trace:
An error was encountered: An error occurred while calling o431.parquet. :
org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 432.0 failed 4 times, most recent failure: Lost task 3.3 in stage 432.0 (TID 23496) ([2600:1f18:610f:a400:cea3:e3ca:45b8:9398] executor 409): org.apache.spark.SparkException: [TASK_WRITE_FAILED] Task failed while writing rows to s3://bucket/prefix/crm_dev_unload.
	at org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:789)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:421)
	at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:563)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:566)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalArgumentException: requirement failed: length (-6235972) cannot be smaller than -1
	at scala.Predef$.require(Predef.scala:281)
	at org.apache.spark.rdd.InputFileBlockHolder$.set(InputFileBlockHolder.scala:79)
	at org.apache.spark.rdd.InputFileBlockHolder.set(InputFileBlockHolder.scala)
	at org.apache.iceberg.spark.source.RowDataReader.open(RowDataReader.java:93)
	at org.apache.iceberg.spark.source.RowDataReader.open(RowDataReader.java:43)
	at org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:141)
	at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:120)
	at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:158)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1(DataSourceRDD.scala:63)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(DataSourceRDD.scala:63)
	at scala.Option.exists(Option.scala:376)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:91)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:404)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1575)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:411)
	... 15 more
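For context on where the negative number in the exception can come from: Spark's InputFileBlockHolder requires the block length to be at least -1, and a reader that derives a split's length as the distance from its start offset to the next offset (or to the end of the file) will compute a negative length whenever a recorded offset points past the end of the file. The sketch below illustrates that arithmetic; the numbers are illustrative, chosen only to reproduce the -6235972 seen in the message, and are not taken from the actual failing table.

```java
// Sketch: how an out-of-bounds split offset can yield a negative block length.
// Numbers are illustrative, not taken from the failing table in this issue.
public class NegativeSplitLength {
    // Length of the split starting at offsets[i]: the distance to the next
    // offset, or to the end of the file for the last split.
    static long splitLength(long[] offsets, int i, long fileLength) {
        long end = (i + 1 < offsets.length) ? offsets[i + 1] : fileLength;
        return end - offsets[i];
    }

    public static void main(String[] args) {
        long fileLength = 1_000_000L;
        // A stale offset recorded past the end of the file:
        long[] badOffsets = {0L, 7_235_972L};
        long len = splitLength(badOffsets, 1, fileLength);
        // 1_000_000 - 7_235_972 = -6_235_972, which InputFileBlockHolder.set
        // rejects with "length (...) cannot be smaller than -1".
        System.out.println(len);
    }
}
```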
Environment: EMR 6.15, Spark 3.4, Iceberg 1.4.2