CDAP 6.2.0
elfenheart released this on 29 May 03:44
Summary
This release introduces a number of new features, improvements, and bug fixes to CDAP. Some of the main highlights of the release are:
- Replication
- A CDAP application that lets you easily replicate data in real time, at low latency, from transactional and operational databases into analytical data warehouses.
- Google Cloud Dataproc Runtime Improvement
- The Google Cloud Dataproc runtime now uses native Dataproc APIs for job submission instead of SSH.
- Pipeline Studio Improvements
- Added the ability to perform bulk operations (copy, delete) in the Pipeline Studio. Also added a right-click context menu for the Studio.
New Features
- Added JDBC plugin selector widget. (CDAP-16385)
- Introduced a new REST endpoint for fetching the scheduled times for multiple programs. (CDAP-16339)
- Added a new capability to start system applications with application-specific configuration during startup. (CDAP-16243)
- Added the Replication feature. (CDAP-16223)
- Added support for connecting to multiple hubs through the market.base.urls property in cdap-site.xml. (CDAP-16210)
- Added the ability to right-click on the Pipeline Studio canvas to add a Wrangler source. This allows you to add multiple Wrangler sources (source + Wrangler transform) in the same pipeline without losing context. (CDAP-16130)
- Added support for Spark 2.4. (CDAP-16107)
- Added date picker widget to allow users to specify a single date or date range in a plugin. (CDAP-15941)
- Added support for launching jobs using the Google Cloud Dataproc APIs (see the sketch after this list). (CDAP-15633)
- Added the ability to select multiple plugins and connections in the Pipeline Studio and copy or delete them in bulk. (CDAP-9014)
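For reference, the snippet below is a minimal, illustrative sketch of what job submission through the native Dataproc Jobs API looks like with the google-cloud-dataproc Java client. It is not CDAP's internal implementation; the project, region, cluster, jar location, and main class are hypothetical placeholders.

```java
import com.google.cloud.dataproc.v1.HadoopJob;
import com.google.cloud.dataproc.v1.Job;
import com.google.cloud.dataproc.v1.JobControllerClient;
import com.google.cloud.dataproc.v1.JobControllerSettings;
import com.google.cloud.dataproc.v1.JobPlacement;

public class DataprocJobSubmissionSketch {
  public static void main(String[] args) throws Exception {
    String projectId = "my-project";       // hypothetical project
    String region = "us-central1";         // hypothetical region
    String clusterName = "my-cluster";     // hypothetical Dataproc cluster

    // The Jobs API is regional, so point the client at the regional endpoint.
    JobControllerSettings settings = JobControllerSettings.newBuilder()
        .setEndpoint(region + "-dataproc.googleapis.com:443")
        .build();

    try (JobControllerClient client = JobControllerClient.create(settings)) {
      // Describe the job by its main class and jar URI instead of copying
      // artifacts to the cluster and launching the job over SSH.
      HadoopJob hadoopJob = HadoopJob.newBuilder()
          .setMainClass("com.example.Launcher")           // hypothetical main class
          .addJarFileUris("gs://my-bucket/launcher.jar")  // hypothetical jar location
          .build();

      Job job = Job.newBuilder()
          .setPlacement(JobPlacement.newBuilder().setClusterName(clusterName))
          .setHadoopJob(hadoopJob)
          .build();

      Job submitted = client.submitJob(projectId, region, job);
      System.out.println("Submitted job: " + submitted.getReference().getJobId());
    }
  }
}
```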
Improvements
- Added an option to generate scoped GoogleCredentials with the Google BigQuery and Google Drive scopes for all Google BigQuery requests (see the sketch after this list). (CDAP-16633)
- Added macro support for Format field in Google Cloud Storage plugin. (CDAP-16572)
- Added an option for the Database source to replace characters in field names. (CDAP-16525)
- Added support for copying the header of compressed files. (CDAP-16809)
- Added support for rendering large schemas (>1000 fields) in the Pipeline UI by collapsing complex schemas and lazy-loading fields in record types. (CDAP-16656)
- Made the View Raw Logs and Download Logs buttons always enabled on the log viewer page. (CDAP-16616)
- Restricted the maximum number of network tags for Dataproc VMs to 64. (CDAP-16593)
- Changed the behavior for selecting multiple nodes in the Studio: users now hold [shift] and click on the plugins (instead of holding [ctrl] and clicking). (CDAP-16586)
- Improved program startup performance by using a thread pool to start programs instead of a single thread. (CDAP-16521)
- Added an option to skip the header in files in delimited, csv, tsv, and text formats. (CDAP-16517)
- Reduced the memory footprint of StructuredRecord, which improves overall memory consumption for pipeline execution. (CDAP-16509)
- Added an API that returns the names of input stages. (CDAP-16351)
- Replaced config.getProperties with config.getRawProperties to ensure validation happens on the raw value before macros are evaluated. (CDAP-16330)
- Added macro support for Analytics plugins. (CDAP-16324)
- Reduced preview startup time by 60%. Also added a limit on the maximum number of concurrent preview runs (10 by default). (CDAP-16308)
- Added the ability to show dropped field operations from the field level lineage page. (CDAP-16249)
- For field level lineage, added the ability for users to view all fields in a cause or impact dataset (not just the related fields). (CDAP-16248)
- Unified JSON structure used by REST endpoints for fetching pipeline configuration and deploying pipelines. (CDAP-16211)
- Added the ability for users to navigate to a non-target dataset by selecting the header of the dataset in field level lineage. (CDAP-15894)
- Added the ability for SparkCompute and SparkSink to record field level lineage. (CDAP-15579)
- Added a page level error when the user navigates to an invalid pipeline via the URL. (CDAP-15061)
- Added support for recording field level lineage in streaming pipelines. (CDAP-13643)
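As an illustration of the scoped-credentials option above, the following is a minimal sketch using the google-auth-library for Java, assuming Application Default Credentials are available in the environment. It shows what credentials scoped to both BigQuery and Drive look like; it is not the plugin's exact code.

```java
import com.google.auth.oauth2.GoogleCredentials;
import java.util.Arrays;

public class ScopedCredentialsSketch {
  public static void main(String[] args) throws Exception {
    // Load Application Default Credentials and narrow them to the BigQuery
    // and Drive scopes, so BigQuery requests can also read external tables
    // backed by Google Sheets / Drive files.
    GoogleCredentials credentials = GoogleCredentials.getApplicationDefault()
        .createScoped(Arrays.asList(
            "https://www.googleapis.com/auth/bigquery",
            "https://www.googleapis.com/auth/drive"));

    credentials.refreshIfExpired();
    System.out.println("Access token: " + credentials.getAccessToken().getTokenValue());
  }
}
```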
Bug Fixes
- Fixed schedule properties to overwrite preferences set on the application instead of the other way around. This most visibly fixed a bug where the compute profile set on a pipeline schedule or trigger would get overwritten by the profile for the pipeline. (CDAP-16816)
- Fixed a bug where the UI overwrote the scale and precision properties in a schema with a decimal logical type if the value was 0. (CDAP-16751)
- Fixed record schema comparison to include record name. (CDAP-16736)
- Fixed a bug where concurrent preview runs were failing because SparkConf for the new preview runs was getting populated with the configurations from the previously started in-progress preview run. (CDAP-16725)
- Fixed a bug in Wrangler that would cause it to run out of memory when sampling a Google Cloud Storage object with a large number of rows. (CDAP-16724)
- Fixed a bug that resulted in failure to update/upsert to Google BigQuery in a different project. (CDAP-16664)
- Fixed a bug where the UI incorrectly showed "No schema available" when the output of the previous stage is a macro. (CDAP-16663)
- Fixed a bug in File source that prevented reading files from Google Cloud Storage. (CDAP-16655)
- Fixed the fetch run records API to honor the limit query parameter correctly. (CDAP-16614)
- Fixed a bug that prevented users from using the parse-as-json directive in Wrangler. (CDAP-16581)
- Fixed a bug in the PluginProperties class where the internal map was modifiable (see the sketch after this list). (CDAP-16538)
- Fixed Google BigQuery sink to properly allow certain types as clustering fields. (CDAP-16526)
- Fixed a bug to correctly update pipeline stage metrics in UI. (CDAP-16501)
- Fixed a bug that would leave zombie processes when using the Remote Hadoop Provisioner. (CDAP-16471)
- Fixed a bug where Wrangler database connections could show more tables than those in the configured database. (CDAP-16465)
- Fixed a bug with LimitingInputFormat that made Database source plugin fail in preview mode. (CDAP-16453)
- Fixed macro support for output schema in Google BigQuery source plugin. (CDAP-16425)
- Fixed a race condition that could cause failures when running a Spark program. (CDAP-16309)
- Fixed a bug to show master and worker memory in Google Cloud Dataproc compute profiles in GB. (CDAP-16240)
- Fixed a bug where the failure message emitted by Spark driver was not being collected. (CDAP-16055)
- Fixed a bug that caused errors when Wrangler's parse-as-csv with header was used when reading multiple small files. (CDAP-16002)
- Fixed a bug that disallowed writing to an empty Google BigQuery table without any data or schema. (CDAP-15775)
- Fixed a bug that would cause the Google BigQuery sink to fail the pipeline run if there was no data to write. (CDAP-15649)
- Fixed a bug in the custom date range picker that prevented users from setting a custom date range that is not in the current year. (CDAP-14850)
- Fixed a bug where users could not delete the entire column name in Wrangler. (CDAP-14190)
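To illustrate the kind of fix described for CDAP-16538, the sketch below shows the common defensive pattern of exposing an internal map as a read-only view. The PropertiesHolder class here is hypothetical and is not CDAP's actual PluginProperties implementation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a properties holder that exposes its map read-only,
// the general pattern behind the fix described in CDAP-16538.
public final class PropertiesHolder {
  private final Map<String, String> properties;

  public PropertiesHolder(Map<String, String> properties) {
    // Defensive copy so later changes by the caller cannot leak in.
    this.properties = new HashMap<>(properties);
  }

  public Map<String, String> getProperties() {
    // Read-only view: callers that try to mutate it get an
    // UnsupportedOperationException instead of silently changing shared state.
    return Collections.unmodifiableMap(properties);
  }
}
```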