java.lang.IllegalArgumentException: requirement failed: length (-6235972) cannot be smaller than -1 #9689

Closed
rjayapalan opened this issue Feb 8, 2024 · 6 comments

rjayapalan commented Feb 8, 2024

Apache Iceberg version

1.4.2

Query engine

Spark

Please describe the bug 🐞

I am aware of this similar issue, which was addressed as part of the Iceberg 1.4.1 release:
#8834

But this one seems different to me. The error comes up after performing a DDL change on an existing Iceberg table (adding a new column) and then running the rewrite_manifests maintenance operation via the Spark stored procedure.

I was able to recreate the issue following this pattern (ALTER TABLE ADD COLUMN ... -> CALL rewrite_manifests(...)). I am not sure what is causing this, or whether it is a bug in the first place.
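
For illustration, the pattern is roughly the following (the catalog and table names here are placeholders, not the real ones):

spark.sql("ALTER TABLE hive.dev.test ADD COLUMN testcolumn string")
spark.sql("CALL hive.system.rewrite_manifests(table => 'dev.test')")
# A subsequent full read of the table then fails with the exception below.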

Error stack trace:

An error was encountered: An error occurred while calling o431.parquet.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 432.0 failed 4 times, most recent failure: Lost task 3.3 in stage 432.0 (TID 23496) ([2600:1f18:610f:a400:cea3:e3ca:45b8:9398] executor 409): org.apache.spark.SparkException: [TASK_WRITE_FAILED] Task failed while writing rows to s3://bucket/prefix/crm_dev_unload.
    at org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:789)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:421)
    at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:141)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:563)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:566)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalArgumentException: requirement failed: length (-6235972) cannot be smaller than -1
    at scala.Predef$.require(Predef.scala:281)
    at org.apache.spark.rdd.InputFileBlockHolder$.set(InputFileBlockHolder.scala:79)
    at org.apache.spark.rdd.InputFileBlockHolder.set(InputFileBlockHolder.scala)
    at org.apache.iceberg.spark.source.RowDataReader.open(RowDataReader.java:93)
    at org.apache.iceberg.spark.source.RowDataReader.open(RowDataReader.java:43)
    at org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:141)
    at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:120)
    at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:158)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1(DataSourceRDD.scala:63)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(DataSourceRDD.scala:63)
    at scala.Option.exists(Option.scala:376)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:91)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:404)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1575)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:411)
    ... 15 more

Environment : EMR 6.15 || Spark 3.4 || Iceberg 1.4.2

nastra (Contributor) commented Feb 9, 2024

@rjayapalan do you have a small reproducible example by any chance? That would greatly help anyone looking at this issue.

rjayapalan (Author) commented
@nastra I cannot share the exact code I used because it contains confidential information, but these are the steps I followed to reproduce the issue:

  1. Create an Iceberg table with 189 columns, with the following table properties specified during table creation:
spark.sql("""
    CREATE TABLE hive.dev.test (
      ........
      ........
      ........
    )
    USING iceberg
    TBLPROPERTIES (
      'format-version' = '2',
      'write.metadata.delete-after-commit.enabled' = 'true',
      'write.metadata.previous-versions-max' = '25')
"""
)
  2. Load close to 1 million rows using a MERGE command from a source table into the above target table. Repeat the load at least 6 times so that 6 commits/snapshots are produced. (A minimal MERGE sketch is included after this list.)
  3. Run the following maintenance procedures in this order:
from datetime import datetime, timedelta

now_str = datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S")
olderthan_str = (datetime.utcnow() - timedelta(days=3)).strftime("%Y-%m-%d %H:%M:%S")

spark.sql("CALL hive.system.rewrite_data_files(table => 'dev.test', strategy => 'sort' , sort_order => 'usertoken ASC NULLS LAST' )").show(truncate=False)
spark.sql(f"CALL hive.system.expire_snapshots(table => 'dev.test', older_than => TIMESTAMP '{now_str}', retain_last => 2)").show(truncate=False)
spark.sql("CALL hive.system.rewrite_manifests(table => 'dev.test', use_caching => true)").show(truncate=False)
spark.sql(f"CALL hive.system.remove_orphan_files(table => 'dev.test', older_than => TIMESTAMP '{olderthan_str}')").show(truncate=False)
  4. Now read all the data from the Iceberg table and write it out as Parquet files. This step should succeed without any issues:
df = spark.table("hive.dev.test")
df.write.mode("overwrite").parquet("s3://bucket/tmp/dev_unload/")
  5. Now add a new column to the above Iceberg table:
spark.sql("""alter table hive.dev.test add column testcolumn string""")
  6. Repeat STEP #2 and then STEP #3.
  7. Now executing STEP #4 gave me the error.
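
For reference, a minimal sketch of the MERGE load in STEP #2. The source table name (hive.dev.test_source) and the match key (usertoken) are assumptions for illustration only:

spark.sql("""
    MERGE INTO hive.dev.test AS t
    USING hive.dev.test_source AS s
    ON t.usertoken = s.usertoken
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")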

amogh-jahagirdar (Contributor) commented
Thanks for the repro steps @rjayapalan! I just wanted to ask a clarifying question that would help narrow down possible culprits before I go ahead and attempt a repro.

Specifically, did you have any writes to your table performed by Iceberg 1.4.0? There was a known issue in that version that produced invalid split offsets for data files, which, as you saw, was fixed in the subsequent 1.4.2 release. The fix was to ignore split offsets if we detect that they are out of bounds of the file (the exception you are seeing).

If you did perform any writes on the table with Iceberg 1.4.0, then it could be that there is somehow a remaining place where we should be ignoring the invalid split offsets but aren't.

If you did not, then this is most likely some other issue that we should investigate, and the repro seems like a good starting point.
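
As an illustration only (this is not the actual Iceberg code), the invariant behind that fix is roughly: every recorded split offset must lie inside the data file, otherwise the offsets are ignored. A stale offset beyond the end of the file makes the remaining split length (file_length - offset) negative, which is the kind of value seen in the stack trace. The numbers below are made up to match the reported -6235972:

def split_offsets_are_valid(split_offsets, file_length):
    # Offsets must be non-negative, strictly smaller than the file length, and ascending.
    if not split_offsets:
        return False
    return (all(0 <= offset < file_length for offset in split_offsets)
            and split_offsets == sorted(split_offsets))

# Hypothetical example: an offset of 10_235_972 recorded for a 4_000_000-byte file
# gives a split length of 4_000_000 - 10_235_972 = -6_235_972.
print(split_offsets_are_valid([0, 10_235_972], 4_000_000))  # False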

rjayapalan (Author) commented
Hi @amogh-jahagirdar
Yes, we were initially writing to the Iceberg table using version 1.4.0 until we came across this same issue. As per the recommendation, we upgraded to version 1.4.2. Since we could not recover from that error, we did an INSERT OVERWRITE on the existing table (created with version 1.4.0) from a backup using version 1.4.2, assuming new metadata files would be created and things would return to normal. Things were working fine for a while, and then we ran into this error again.
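
For context, a minimal sketch of the recovery attempt described above, assuming a hypothetical backup table hive.dev.test_backup with the same schema:

spark.sql("""
    INSERT OVERWRITE hive.dev.test
    SELECT * FROM hive.dev.test_backup
""")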

github-actions bot commented

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions bot added the stale label Oct 14, 2024
github-actions bot commented

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

github-actions bot closed this as not planned Oct 29, 2024