java.lang.IllegalArgumentException: requirement failed: length (-6235972) cannot be smaller than -1 #9689

Closed
rjayapalan opened this issue Feb 8, 2024 · 6 comments

rjayapalan commented Feb 8, 2024

Apache Iceberg version

1.4.2

Query engine

Spark

Please describe the bug 🐞

I am aware of this similar issue, which was addressed as part of the Iceberg 1.4.1 release:
#8834

But this one seems different to me. The error comes up after performing a DDL change on an existing Iceberg table (adding a new column) and then running the rewrite_manifests maintenance operation via the Spark stored procedure.

I was able to recreate the issue following this pattern (ALTER TABLE ADD COLUMN ... -> CALL rewrite_manifests(...)). I am not sure what is causing this, or whether it is a bug in the first place.
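
For illustration, the pattern is roughly the following (the catalog and table names here are placeholders, not the real ones):

spark.sql("ALTER TABLE hive.dev.test ADD COLUMN testcolumn string")
spark.sql("CALL hive.system.rewrite_manifests(table => 'dev.test')")
# A subsequent full read of the table then fails with the exception below.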

Error stack trace:

An error was encountered: An error occurred while calling o431.parquet.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 432.0 failed 4 times, most recent failure: Lost task 3.3 in stage 432.0 (TID 23496) ([2600:1f18:610f:a400:cea3:e3ca:45b8:9398] executor 409): org.apache.spark.SparkException: [TASK_WRITE_FAILED] Task failed while writing rows to s3://bucket/prefix/crm_dev_unload.
    at org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:789)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:421)
    at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:141)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:563)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:566)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalArgumentException: requirement failed: length (-6235972) cannot be smaller than -1
    at scala.Predef$.require(Predef.scala:281)
    at org.apache.spark.rdd.InputFileBlockHolder$.set(InputFileBlockHolder.scala:79)
    at org.apache.spark.rdd.InputFileBlockHolder.set(InputFileBlockHolder.scala)
    at org.apache.iceberg.spark.source.RowDataReader.open(RowDataReader.java:93)
    at org.apache.iceberg.spark.source.RowDataReader.open(RowDataReader.java:43)
    at org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:141)
    at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:120)
    at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:158)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1(DataSourceRDD.scala:63)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(DataSourceRDD.scala:63)
    at scala.Option.exists(Option.scala:376)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:91)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:404)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1575)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:411)
    ... 15 more

Environment : EMR 6.15 || Spark 3.4 || Iceberg 1.4.2

nastra (Contributor) commented Feb 9, 2024

@rjayapalan do you have a small reproducible example by any chance? That would greatly help anyone looking at this issue.

rjayapalan (Author) commented
@nastra I cannot share the exact code I used because it contains confidential information, but these are the steps I followed to reproduce the issue:

  1. Create an Iceberg table with 189 columns, with the following table properties specified during table creation:
spark.sql("""
    CREATE TABLE hive.dev.test (
      ........
      ........
      ........
    )
    USING iceberg
    TBLPROPERTIES (
      'format-version' = '2',
      'write.metadata.delete-after-commit.enabled' = 'true',
      'write.metadata.previous-versions-max' = '25')
"""
)
  2. Load close to 1 million rows using a MERGE command from a source table into the above target table. Repeat the load at least 6 times so that 6 commits/snapshots are produced. (A minimal MERGE sketch is included after this list.)
  3. Run the following maintenance procedures in this order:
from datetime import datetime, timedelta

now_str = datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S")
olderthan_str = (datetime.utcnow() - timedelta(days=3)).strftime("%Y-%m-%d %H:%M:%S")

spark.sql("CALL hive.system.rewrite_data_files(table => 'dev.test', strategy => 'sort' , sort_order => 'usertoken ASC NULLS LAST' )").show(truncate=False)
spark.sql(f"CALL hive.system.expire_snapshots(table => 'dev.test', older_than => TIMESTAMP '{now_str}', retain_last => 2)").show(truncate=False)
spark.sql("CALL hive.system.rewrite_manifests(table => 'dev.test', use_caching => true)").show(truncate=False)
spark.sql(f"CALL hive.system.remove_orphan_files(table => 'dev.test', older_than => TIMESTAMP '{olderthan_str}')").show(truncate=False)
  4. Now read all the data from the Iceberg table and write it out as Parquet files. This step should succeed without any issues:
df = spark.table("hive.dev.test")
df.write.mode("overwrite").parquet("s3://bucket/tmp/dev_unload/")
  5. Now add a new column to the above Iceberg table:
spark.sql("""alter table hive.dev.test add column testcolumn string""")
  6. Repeat STEP #2 and then STEP #3.
  7. Now executing STEP #4 gave me the error.
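
For reference, a minimal sketch of the MERGE load in STEP #2. The source table name (hive.dev.test_source) and the match key (usertoken) are assumptions for illustration only:

spark.sql("""
    MERGE INTO hive.dev.test AS t
    USING hive.dev.test_source AS s
    ON t.usertoken = s.usertoken
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")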

amogh-jahagirdar (Contributor) commented
Thanks for the repro steps @rjayapalan! I just wanted to ask a clarifying question that would help narrow down possible culprits before I go ahead and attempt a repro.

Specifically, did you have any writes to your table performed by Iceberg 1.4.0? There was a known issue in that version that produced invalid split offsets for data files, which, as you saw, was fixed in the subsequent 1.4.2 release. The fix was to ignore split offsets if we detect that they are out of bounds of the file (the exception you are seeing).

If you did perform any writes on the table with Iceberg 1.4.0, then it could be that there is somehow a remaining place where we should be ignoring the invalid split offsets but aren't.

If you did not, then this is most likely some other issue that we should investigate, and the repro seems like a good starting point.
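
As an illustration only (this is not the actual Iceberg code), the invariant behind that fix is roughly: every recorded split offset must lie inside the data file, otherwise the offsets are ignored. A stale offset beyond the end of the file makes the remaining split length (file_length - offset) negative, which is the kind of value seen in the stack trace. The numbers below are made up to match the reported -6235972:

def split_offsets_are_valid(split_offsets, file_length):
    # Offsets must be non-negative, strictly smaller than the file length, and ascending.
    if not split_offsets:
        return False
    return (all(0 <= offset < file_length for offset in split_offsets)
            and split_offsets == sorted(split_offsets))

# Hypothetical example: an offset of 10_235_972 recorded for a 4_000_000-byte file
# gives a split length of 4_000_000 - 10_235_972 = -6_235_972.
print(split_offsets_are_valid([0, 10_235_972], 4_000_000))  # False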

rjayapalan (Author) commented
Hi @amogh-jahagirdar
Yes, we were initially writing to the Iceberg table using version 1.4.0 until we came across this same issue. As per the recommendation, we upgraded to version 1.4.2. Since we could not recover from that error, we did an INSERT OVERWRITE on the existing table (created with version 1.4.0) from a backup using version 1.4.2, assuming new metadata files would be created and things would return to normal. Things were working fine for a while, and then we ran into this error again.
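
For context, a minimal sketch of the recovery attempt described above, assuming a hypothetical backup table hive.dev.test_backup with the same schema:

spark.sql("""
    INSERT OVERWRITE hive.dev.test
    SELECT * FROM hive.dev.test_backup
""")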

github-actions bot commented

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions bot added the stale label Oct 14, 2024
github-actions bot commented

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

github-actions bot closed this as not planned Oct 29, 2024