
CDAP 6.2.0

@elfenheart elfenheart released this 29 May 03:44
· 362 commits to release/6.2 since this release
4549e37

Summary

This release introduces a number of new features, improvements, and bug fixes to CDAP. Some of the main highlights of the release are:

  1. Replication
    • A CDAP application that replicates data in real time and at low latency from transactional and operational databases into analytical data warehouses.
  2. Google Cloud Dataproc Runtime Improvement
    • The Google Cloud Dataproc runtime now uses native Dataproc APIs for job submission instead of SSH.
  3. Pipeline Studio Improvements
    • Added the ability to perform bulk operations (copy, delete) in the Pipeline Studio. Also added a right-click context menu for the Studio.

New Features

  • Added JDBC plugin selector widget. (CDAP-16385)
  • Introduced a new REST endpoint for fetching scheduled time for multiple programs. (CDAP-16339)
  • Added new capability to start system applications using application specific config during startup. (CDAP-16243)
  • Added Replication feature. (CDAP-16223)
  • Added support for connecting to multiple hubs through the market.base.urls property in cdap-site.xml. (CDAP-16210)
  • Added the ability to right-click on the Pipeline Studio canvas to add a Wrangler source. This allows you to add multiple Wrangler sources (source + Wrangler transform) in the same pipeline without losing context. (CDAP-16130)
  • Added support for Spark 2.4. (CDAP-16107)
  • Added date picker widget to allow users to specify a single date or date range in a plugin. (CDAP-15941)
  • Added support to launch a job using Google Cloud Dataproc APIs. (CDAP-15633)
  • Added the ability to select multiple plugins and connections in the Pipeline Studio and copy or delete them in bulk. (CDAP-9014)
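
As an illustration of the multiple-hub support (CDAP-16210), the market.base.urls property could be set in cdap-site.xml along these lines. The hub URLs below are placeholders, and the comma-separated value format is an assumption, not taken from the release notes:

```xml
<!-- cdap-site.xml: hypothetical example of connecting to several hubs.
     URLs are placeholders; the comma-separated format is an assumption. -->
<property>
  <name>market.base.urls</name>
  <value>https://hub.example.com/v2,https://internal-hub.example.org/v2</value>
  <description>Base URLs of the hubs that the market connects to.</description>
</property>
```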

Improvements

  • Added option to generate scoped GoogleCredentials with Google BigQuery and Google Drive scope for all Google BigQuery requests. (CDAP-16633)
  • Added macro support for Format field in Google Cloud Storage plugin. (CDAP-16572)
  • Added an option for Database source to replace characters in the field names. (CDAP-16525)
  • Added support for copying the header on compressed files. (CDAP-16809)
  • Added support for rendering large schemas (>1000 fields) in the Pipeline UI by collapsing complex schemas and lazy-loading fields in record types. (CDAP-16656)
  • Made the View Raw Logs and Download Logs buttons always enabled on the log viewer page. (CDAP-16616)
  • Restricted the maximum number of network tags for Dataproc VMs to 64. (CDAP-16593)
  • Changed multi-node selection in the Studio to require holding [shift] while clicking plugins (previously [ctrl] + click). (CDAP-16586)
  • Improved program startup performance by using a thread pool instead of a single thread. (CDAP-16521)
  • Added an option to skip the header in files in delimited, csv, tsv, and text formats. (CDAP-16517)
  • Reduced the memory footprint of StructuredRecord, which improves overall memory consumption during pipeline execution. (CDAP-16509)
  • Added an API that returns the names of input stages. (CDAP-16351)
  • Replaced config.getProperties with config.getRawProperties to ensure validation happens on raw values before macros are evaluated. (CDAP-16330)
  • Added macro support for Analytics plugins. (CDAP-16324)
  • Reduced preview startup time by 60%. Also added a limit on the maximum number of concurrent preview runs (10 by default). (CDAP-16308)
  • Added the ability to show dropped field operations on the field level lineage page. (CDAP-16249)
  • For field level lineage, added ability for user to view all fields in a cause or impact dataset (not just the related fields). (CDAP-16248)
  • Unified JSON structure used by REST endpoints for fetching pipeline configuration and deploying pipelines. (CDAP-16211)
  • Added ability for user to navigate to non-target dataset by selecting the header of the dataset in field level lineage. (CDAP-15894)
  • Added the ability for SparkCompute and SparkSink to record field level lineage. (CDAP-15579)
  • Added a page level error when the user navigates to an invalid pipeline via the URL. (CDAP-15061)
  • Added support for recording field level lineage in streaming pipelines. (CDAP-13643)
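
The validation change above (CDAP-16330) can be sketched conceptually: validation must inspect the raw property values, because a macro such as ${key} is only resolved at runtime and would otherwise fail checks meant for concrete values. The following is a minimal Python illustration of the idea, not CDAP's actual implementation; the ${...} macro syntax matches CDAP's, but the validation logic is hypothetical:

```python
import re

MACRO_PATTERN = re.compile(r"\$\{[^}]+\}")  # CDAP-style macro, e.g. ${db.port}

def is_macro(value: str) -> bool:
    """Return True if the raw value contains an unevaluated macro."""
    return bool(MACRO_PATTERN.search(value))

def validate_port(raw_value: str) -> list:
    """Validate a hypothetical 'port' property against its *raw* value.

    Macro values are skipped: they can only be checked after evaluation
    at runtime, so rejecting them here would be a false positive.
    """
    if is_macro(raw_value):
        return []  # defer validation until the macro is resolved
    if not raw_value.isdigit() or not (0 < int(raw_value) < 65536):
        return ["'%s' is not a valid port" % raw_value]
    return []

print(validate_port("${db.port}"))  # macro: validation deferred, no errors
print(validate_port("5432"))        # concrete and valid: no errors
print(validate_port("oops"))        # concrete and invalid: one error
```

Validating the evaluated properties instead would reject every macro-configured plugin at deploy time, which is exactly what the switch to getRawProperties avoids.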

Bug Fixes

  • Fixed schedule properties to overwrite preferences set on the application instead of the other way around. This most visibly fixed a bug where the compute profile set on a pipeline schedule or trigger would get overwritten by the profile for the pipeline. (CDAP-16816)
  • Fixed a bug where the UI overwrote the scale and precision properties in a schema with a decimal logical type if the value was 0. (CDAP-16751)
  • Fixed record schema comparison to include record name. (CDAP-16736)
  • Fixed a bug where concurrent preview runs were failing because SparkConf for the new preview runs was getting populated with the configurations from the previously started in-progress preview run. (CDAP-16725)
  • Fixed a bug in Wrangler that would cause it to go out of memory when sampling a Google Cloud Storage object that has a lot of rows. (CDAP-16724)
  • Fixed a bug that resulted in failure to update/upsert to Google BigQuery in a different project. (CDAP-16664)
  • Fixed a bug where the UI incorrectly showed "No schema available" when the output of the previous stage was a macro. (CDAP-16663)
  • Fixed a bug in File source that prevented reading files from Google Cloud Storage. (CDAP-16655)
  • Fixed the fetch run records API to honor the limit query parameter correctly. (CDAP-16614)
  • Fixed a bug that prevented a user from using parse-as-json directive in Wrangler. (CDAP-16581)
  • Fixed a bug in the PluginProperties class where internal map was modifiable. (CDAP-16538)
  • Fixed Google BigQuery sink to properly allow certain types as clustering fields. (CDAP-16526)
  • Fixed a bug to correctly update pipeline stage metrics in UI. (CDAP-16501)
  • Fixed a bug that would leave zombie processes when using the Remote Hadoop Provisioner. (CDAP-16471)
  • Fixed a bug where Wrangler database connections could show more tables than those in the configured database. (CDAP-16465)
  • Fixed a bug with LimitingInputFormat that made Database source plugin fail in preview mode. (CDAP-16453)
  • Fixed macro support for output schema in Google BigQuery source plugin. (CDAP-16425)
  • Fixed a race condition that could cause failures when running Spark programs. (CDAP-16309)
  • Fixed a bug to show master and worker memory in Google Cloud Dataproc compute profiles in GB. (CDAP-16240)
  • Fixed a bug where the failure message emitted by Spark driver was not being collected. (CDAP-16055)
  • Fixed a bug that caused errors when Wrangler's parse-as-csv with header was used when reading multiple small files. (CDAP-16002)
  • Fixed a bug that disallowed writing to an empty Google BigQuery table without any data or schema. (CDAP-15775)
  • Fixed a bug that would cause the Google BigQuery sink to fail the pipeline run if there was no data to write. (CDAP-15649)
  • Fixed a bug in the custom date range picker that prevented users from setting a custom date range that is not in the current year. (CDAP-14850)
  • Fixed a bug where users could not delete the entire column name in Wrangler. (CDAP-14190)
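
The first fix in this list (CDAP-16816) is essentially a precedence change, which can be sketched as a dictionary merge: schedule properties now override application preferences rather than the reverse. A minimal Python sketch of the corrected ordering; the property names are hypothetical, chosen only to mirror the compute-profile scenario described above:

```python
def effective_properties(app_preferences: dict, schedule_properties: dict) -> dict:
    """Merge runtime settings so schedule properties win over app preferences.

    Before the fix the merge order was reversed, so a compute profile set on
    a schedule or trigger was clobbered by the pipeline-level profile.
    """
    return {**app_preferences, **schedule_properties}

# Hypothetical property names illustrating the compute-profile scenario.
app_prefs = {"profile.name": "pipeline-profile", "retries": "3"}
sched_props = {"profile.name": "schedule-profile"}

merged = effective_properties(app_prefs, sched_props)
print(merged["profile.name"])  # schedule-profile (schedule wins)
print(merged["retries"])       # 3 (inherited from the app preferences)
```

Keys set only on the application still flow through; only keys the schedule explicitly sets take precedence.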