fix(deps): update dependency io.delta:delta-core_2.12 to v1.2.1 #1293

Closed
wants to merge 1 commit into from

Conversation

renovate[bot]
@renovate renovate bot commented Sep 21, 2022

This PR contains the following updates:

Package: io.delta:delta-core_2.12 (source)
Change: 1.1.0-nessie -> 1.2.1

Release Notes

delta-io/delta

v1.2.1

We are excited to announce the release of Delta Lake 1.2.1 on Apache Spark 3.2. Similar to Apache Spark™, we have released Maven artifacts for both Scala 2.12 and Scala 2.13.

Key features in this release

  • Fix an issue with loading error messages in --packages mode. The previous release had a bug that caused users to get a NullPointerException instead of the proper error message when using Delta Lake in --packages mode in either pyspark or spark-shell (Fix, Test).
  • Fix incorrect exception type thrown in some Python APIs. A bug caused pyspark to throw the wrong exception type instead of the expected AnalysisException. This issue is fixed. See issue #1086 for more details.
  • Fix for S3 multi-cluster mode configuration. A bug in the S3 multi-cluster mode caused --conf to not work for certain configuration parameters. This issue is fixed by having these configuration parameters begin with the "spark." prefix. See the updated documentation.
  • Make the GCS LogStore configuration simpler by automatically deriving the LogStore implementation class config spark.delta.logStore.gs.impl from the scheme in the table path. See the updated documentation.
  • Make SetAccumulator thread safe. SetAccumulator used by Merge was not thread safe and might cause executor heartbeat failures in rare cases. This was fixed by using a synchronized set.
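The SetAccumulator fix above boils down to guarding a shared set so that concurrent additions cannot corrupt it. The following is a minimal sketch of that idea in plain Python, not Delta Lake's actual (Scala) implementation; the class layout, the `merge` method, and the `collect_concurrently` helper are assumptions made for illustration.

```python
import threading


class SetAccumulator:
    """Toy accumulator that collects distinct values from many threads."""

    def __init__(self):
        self._values = set()
        self._lock = threading.Lock()  # guards every access to _values

    def add(self, value):
        with self._lock:
            self._values.add(value)

    def merge(self, other):
        # Snapshot the other side first so both locks are never held at once.
        incoming = other.value
        with self._lock:
            self._values |= incoming

    @property
    def value(self):
        # Return an immutable copy so callers cannot mutate shared state.
        with self._lock:
            return frozenset(self._values)


def collect_concurrently(num_threads=8, items_per_thread=1000):
    """Hypothetical driver: several threads add overlapping values."""
    acc = SetAccumulator()

    def worker(offset):
        for i in range(items_per_thread):
            acc.add((offset + i) % items_per_thread)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return acc.value
```

Returning a frozen copy from `value` is a design choice of this sketch: readers never see the set while another thread is modifying it.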

Credits

Allison Portis, Chang Yong Lik, Kam Cheung Ting, Rahul Mahadev, Scott Sandre, Venki Korukanti

v1.2.0

We are excited to announce the release of Delta Lake 1.2.0 on Apache Spark 3.2. Similar to Apache Spark™, we have released Maven artifacts for both Scala 2.12 and Scala 2.13.

Key features in this release

  • Support multi-cluster write in Delta Lake tables stored in S3. Users now have the option of specifying a new and experimental LogStore implementation that supports concurrent reads and writes to a single Delta Lake table in S3 from multiple Spark drivers. See the documentation for more details.

  • Support for compacting small files (optimize) into larger files in a Delta Lake table. A reduced number of data files improves read latency because the metadata is smaller and per-file overheads, such as opening and closing files, are lower. See the documentation for more details.

  • Support for data skipping using column statistics. Column statistics are collected for each file as part of the Delta Lake table writes. These statistics can be used during the reading of a Delta Lake table to skip reading files not matching the filters in the query. See the documentation for more details.

  • Support for restoring a Delta table to an earlier version. Restoring to an earlier version number or a version of a specific timestamp is supported using the SQL command, Scala APIs or Python APIs. See the documentation for more details.

  • Support for column renaming in a Delta Lake table without the need to rewrite the underlying Parquet data files. See the documentation for more details.

  • Support for arbitrary characters in column names in Delta tables. Previously, the list of supported characters was limited to those supported by the Parquet data format. Column names containing special characters such as space, tab, ",", "{", "(", etc. are now supported. See the documentation for more details.

  • Support for automatic data skipping using generated columns. For any partition column that is a generated column, partition filters will be automatically generated from any data filters on its generating column(s), when possible.

  • Support for Google Cloud Storage is now generally available. See the documentation on how to read and write Delta Lake tables in Google Cloud Storage.

  • Other notable changes

    • Create a new module delta-storage. This extracts out the LogStore interface and implementations in a separate module which is published as its own jar. This enables new implementations of LogStore without depending upon the complete Delta jars. See the migration guide here for more details.
    • Improve the error messages and exceptions to be better organized and queryable.
    • Support for gettimestamp expression in generated columns.
    • Snapshot/Checkpoint management improvements
      • Make loading snapshots resilient to corrupt checkpoints in Delta. When reading a checkpoint fails, we try to search for an alternative checkpoint and use it to construct a snapshot.
      • Fix snapshot writing so that the write does not fail when a checkpoint fails due to non-fatal errors.
      • Optimization to reduce the number of list calls to storage
    • Improved output metrics for DELETE table command.
    • Improved output metrics for UPDATE table command.
    • Optimize merge operation in a Delta table with a large number of columns.
    • Fix a NullPointerException when trying to reference a DeltaLog created with a SparkContext that has stopped.
    • Fix an issue in handling null partition column values in the change data capture feature.
    • Fix an issue in adding a new column to the Delta table when the preceding column is of type Array.
    • Fix an issue where the file list iterator was not closed when reading large log files in the Delta Streaming source.
    • Throw proper exceptions when searching for a Delta table in the catalog.
    • Fix a schema evolution issue when the column type is an array of structs.
    • Better handling of FileNotFoundException when reading Delta log files to distinguish between the corrupt log files and no files found.
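The data skipping feature listed above relies on per-file column statistics: a file can be skipped when its minimum/maximum range cannot satisfy the query filter. The sketch below illustrates that principle in plain Python for a single integer column and a range filter. It is not Delta Lake's implementation; the `FileStats` type, the file names, and the `files_to_scan` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    path: str        # hypothetical data file name
    min_value: int   # smallest value of the column in this file
    max_value: int   # largest value of the column in this file


def files_to_scan(files, lower, upper):
    """Return paths of files whose [min, max] range may contain a value
    satisfying lower <= col <= upper; all other files are skipped."""
    return [
        f.path
        for f in files
        if f.max_value >= lower and f.min_value <= upper
    ]


FILES = [
    FileStats("part-0000.parquet", 1, 100),
    FileStats("part-0001.parquet", 101, 200),
    FileStats("part-0002.parquet", 150, 300),
]

# Filter 120 <= col <= 160: part-0000 (max 100) cannot match and is skipped.
print(files_to_scan(FILES, 120, 160))
```

The check is conservative: a file that passes may still contain no matching rows, but a file that is skipped is guaranteed to contain none.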

Benchmark Framework

Independent of this release, we have also built a framework for writing large scale performance benchmarks on Delta tables using a real cluster. Currently, the framework provides a TPC-DS inspired benchmark to measure the ingestion time (e.g. time taken to create TPC-DS tables) and query times. But we encourage the community to contribute more benchmarks to measure performance of different real-world workloads on Delta tables.

Credits

Adam Binford, Alex Liu, Allison Portis, Anton Okolnychyi, Bart Samwel, Carmen Kwan, Chang Yong Lik, Christian Williams, Christos Stavrakakis, David Lewis, Denny Lee, Fabio Badalì, Fred Liu, Gengliang Wang, Hoang Pham, Hussein Nagree, Hyukjin Kwon, Jackie Zhang, Jan Paw, John ODwyer, Junlin Zeng, Junyong Lee, Kam Cheung Ting, Kapil Sreedharan, Lars Kroll, Liwen Sun, Maksym Dovhal, Mariusz Krynski, Meng Tong, Peng Zhong, Prakhar Jain, Pranav, Ryan Johnson, Sabir Akhadov, Scott Sandre, Shixiong Zhu, Sri Tikkireddy, Tathagata Das, Tyson Condie, Vegard Stikbakke, Venkata Sai Akhil Gudesa, Venki Korukanti, Vini Jaiswal, Wenchen Fan, Will Jones, Xinyi Yu, Yann Byron, Yaohua Zhao, Yijia Cui


Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled due to failing status checks.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, click this checkbox.

This PR has been generated by Mend Renovate. View repository job log here.

@renovate renovate bot added the dependencies Pull requests that update a dependency file label Sep 21, 2022
@renovate renovate bot enabled auto-merge (squash) September 21, 2022 20:34
@snazy snazy force-pushed the main branch 2 times, most recently from f108430 to 6883748 Compare September 21, 2022 23:22
@renovate renovate bot force-pushed the renovate/io.delta branch 16 times, most recently from 28da575 to a1d3e08 Compare September 24, 2022 03:20
@renovate renovate bot force-pushed the renovate/io.delta branch from a1d3e08 to 3fb7935 Compare September 26, 2022 02:34
@snazy snazy closed this Sep 26, 2022
auto-merge was automatically disabled September 26, 2022 08:12

Pull request was closed

@snazy snazy deleted the renovate/io.delta branch September 26, 2022 17:55

renovate bot commented Feb 21, 2023

Renovate Ignore Notification

As this PR has been closed unmerged, Renovate will now ignore this update (1.2.1). You will still receive a PR once a newer version is released, so if you wish to permanently ignore this dependency, please add it to the ignoreDeps array of your renovate config.

If this PR was closed by mistake or you changed your mind, you can simply rename this PR and you will soon get a fresh replacement PR opened.
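For the ignoreDeps route mentioned in the notification, the change amounts to listing the dependency name in the `ignoreDeps` array of the Renovate config. The sketch below shows one way to make that edit programmatically and print the resulting JSON. The starting config (including the `extends` entry) is hypothetical, and the dependency name is assumed to match the package name shown in this PR.

```python
import json


def add_ignored_dependency(config, dependency):
    """Return a copy of a Renovate config dict with `dependency` in ignoreDeps."""
    updated = dict(config)
    ignored = list(updated.get("ignoreDeps", []))
    if dependency not in ignored:  # keep the array free of duplicates
        ignored.append(dependency)
    updated["ignoreDeps"] = ignored
    return updated


# Hypothetical starting config; a real renovate.json will have other settings.
config = {"extends": ["config:base"]}
updated = add_ignored_dependency(config, "io.delta:delta-core_2.12")
print(json.dumps(updated, indent=2))
```

The function returns a new dict rather than mutating its input, so the original config can still be compared against the result before writing it back to disk.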
