CDAP 6.2.0
elfenheart released this on 29 May 03:44
Summary
This release introduces a number of new features, improvements, and bug fixes to CDAP. Some of the main highlights of the release are:
- Replication
- A CDAP application that lets you easily replicate data in real time, at low latency, from transactional and operational databases into analytical data warehouses.
- Google Cloud Dataproc Runtime Improvement
- The Google Cloud Dataproc runtime now uses native Dataproc APIs for job submission instead of SSH.
- Pipeline Studio Improvements
- Added the ability to perform bulk operations (copy, delete) in the Pipeline Studio. Also added a right-click context menu for the Studio.
New Features
- Added JDBC plugin selector widget. (CDAP-16385)
- Introduced a new REST endpoint for fetching the scheduled times for multiple programs. (CDAP-16339)
- Added a new capability to start system applications with application-specific configuration during startup. (CDAP-16243)
- Added the Replication feature. (CDAP-16223)
- Added support for connecting to multiple hubs through the market.base.urls property in cdap-site.xml. (CDAP-16210)
- Added the ability to right-click on the Pipeline Studio canvas to add a Wrangler source. This allows you to add multiple Wrangler sources (source + Wrangler transform) in the same pipeline without losing context. (CDAP-16130)
- Added support for Spark 2.4. (CDAP-16107)
- Added date picker widget to allow users to specify a single date or date range in a plugin. (CDAP-15941)
- Added support for launching jobs using the Google Cloud Dataproc APIs (see the sketch after this list). (CDAP-15633)
- Added the ability to select multiple plugins and connections in the Pipeline Studio and copy or delete them in bulk. (CDAP-9014)
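For reference, the snippet below is a minimal, illustrative sketch of what job submission through the native Dataproc Jobs API looks like with the google-cloud-dataproc Java client. It is not CDAP's internal implementation; the project, region, cluster, jar location, and main class are hypothetical placeholders.

```java
import com.google.cloud.dataproc.v1.HadoopJob;
import com.google.cloud.dataproc.v1.Job;
import com.google.cloud.dataproc.v1.JobControllerClient;
import com.google.cloud.dataproc.v1.JobControllerSettings;
import com.google.cloud.dataproc.v1.JobPlacement;

public class DataprocJobSubmissionSketch {
  public static void main(String[] args) throws Exception {
    String projectId = "my-project";       // hypothetical project
    String region = "us-central1";         // hypothetical region
    String clusterName = "my-cluster";     // hypothetical Dataproc cluster

    // The Jobs API is regional, so point the client at the regional endpoint.
    JobControllerSettings settings = JobControllerSettings.newBuilder()
        .setEndpoint(region + "-dataproc.googleapis.com:443")
        .build();

    try (JobControllerClient client = JobControllerClient.create(settings)) {
      // Describe the job by its main class and jar URI instead of copying
      // artifacts to the cluster and launching the job over SSH.
      HadoopJob hadoopJob = HadoopJob.newBuilder()
          .setMainClass("com.example.Launcher")           // hypothetical main class
          .addJarFileUris("gs://my-bucket/launcher.jar")  // hypothetical jar location
          .build();

      Job job = Job.newBuilder()
          .setPlacement(JobPlacement.newBuilder().setClusterName(clusterName))
          .setHadoopJob(hadoopJob)
          .build();

      Job submitted = client.submitJob(projectId, region, job);
      System.out.println("Submitted job: " + submitted.getReference().getJobId());
    }
  }
}
```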
Improvements
- Added an option to generate scoped GoogleCredentials with the Google BigQuery and Google Drive scopes for all Google BigQuery requests (see the sketch after this list). (CDAP-16633)
- Added macro support for Format field in Google Cloud Storage plugin. (CDAP-16572)
- Added an option for the Database source to replace characters in field names. (CDAP-16525)
- Added support for copying the header of compressed files. (CDAP-16809)
- Added support for rendering large schemas (>1000 fields) in the Pipeline UI by collapsing complex schemas and lazy-loading fields in record types. (CDAP-16656)
- Made the View Raw Logs and Download Logs buttons always enabled on the log viewer page. (CDAP-16616)
- Restricted the maximum number of network tags for Dataproc VMs to 64. (CDAP-16593)
- Changed the behavior for selecting multiple nodes in the Studio: users now hold [shift] and click on the plugins (instead of holding [ctrl] and clicking). (CDAP-16586)
- Improved program startup performance by using a thread pool to start programs instead of a single thread. (CDAP-16521)
- Added an option to skip the header in files in delimited, csv, tsv, and text formats. (CDAP-16517)
- Reduced the memory footprint of StructuredRecord, which improves overall memory consumption for pipeline execution. (CDAP-16509)
- Added an API that returns the names of input stages. (CDAP-16351)
- Replaced config.getProperties with config.getRawProperties to ensure validation happens on the raw value before macros are evaluated. (CDAP-16330)
- Added macro support for Analytics plugins. (CDAP-16324)
- Reduced preview startup time by 60%. Also added a limit on the maximum number of concurrent preview runs (10 by default). (CDAP-16308)
- Added the ability to show dropped field operations from the field level lineage page. (CDAP-16249)
- For field level lineage, added the ability for users to view all fields in a cause or impact dataset (not just the related fields). (CDAP-16248)
- Unified JSON structure used by REST endpoints for fetching pipeline configuration and deploying pipelines. (CDAP-16211)
- Added the ability for users to navigate to a non-target dataset by selecting the header of the dataset in field level lineage. (CDAP-15894)
- Added the ability for SparkCompute and SparkSink to record field level lineage. (CDAP-15579)
- Added a page level error when the user navigates to an invalid pipeline via the URL. (CDAP-15061)
- Added support for recording field level lineage in streaming pipelines. (CDAP-13643)
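As an illustration of the scoped-credentials option above, the following is a minimal sketch using the google-auth-library for Java, assuming Application Default Credentials are available in the environment. It shows what credentials scoped to both BigQuery and Drive look like; it is not the plugin's exact code.

```java
import com.google.auth.oauth2.GoogleCredentials;
import java.util.Arrays;

public class ScopedCredentialsSketch {
  public static void main(String[] args) throws Exception {
    // Load Application Default Credentials and narrow them to the BigQuery
    // and Drive scopes, so BigQuery requests can also read external tables
    // backed by Google Sheets / Drive files.
    GoogleCredentials credentials = GoogleCredentials.getApplicationDefault()
        .createScoped(Arrays.asList(
            "https://www.googleapis.com/auth/bigquery",
            "https://www.googleapis.com/auth/drive"));

    credentials.refreshIfExpired();
    System.out.println("Access token: " + credentials.getAccessToken().getTokenValue());
  }
}
```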
Bug Fixes
- Fixed schedule properties to overwrite preferences set on the application instead of the other way around. This most visibly fixed a bug where the compute profile set on a pipeline schedule or trigger would get overwritten by the profile for the pipeline. (CDAP-16816)
- Fixed a bug where the UI overwrote the scale and precision properties in a schema with a decimal logical type if the value was 0. (CDAP-16751)
- Fixed record schema comparison to include record name. (CDAP-16736)
- Fixed a bug where concurrent preview runs were failing because SparkConf for the new preview runs was getting populated with the configurations from the previously started in-progress preview run. (CDAP-16725)
- Fixed a bug in Wrangler that would cause it to run out of memory when sampling a Google Cloud Storage object with a large number of rows. (CDAP-16724)
- Fixed a bug that resulted in failure to update/upsert to Google BigQuery in a different project. (CDAP-16664)
- Fixed a bug where the UI incorrectly showed "No schema available" when the output of the previous stage is a macro. (CDAP-16663)
- Fixed a bug in File source that prevented reading files from Google Cloud Storage. (CDAP-16655)
- Fixed the fetch run records API to honor the limit query parameter correctly. (CDAP-16614)
- Fixed a bug that prevented users from using the parse-as-json directive in Wrangler. (CDAP-16581)
- Fixed a bug in the PluginProperties class where the internal map was modifiable (see the sketch after this list). (CDAP-16538)
- Fixed Google BigQuery sink to properly allow certain types as clustering fields. (CDAP-16526)
- Fixed a bug to correctly update pipeline stage metrics in UI. (CDAP-16501)
- Fixed a bug that would leave zombie processes when using the Remote Hadoop Provisioner. (CDAP-16471)
- Fixed a bug where Wrangler database connections could show more tables than those in the configured database. (CDAP-16465)
- Fixed a bug with LimitingInputFormat that made Database source plugin fail in preview mode. (CDAP-16453)
- Fixed macro support for output schema in Google BigQuery source plugin. (CDAP-16425)
- Fixed a race condition that could cause failures when running a Spark program. (CDAP-16309)
- Fixed a bug to show master and worker memory in Google Cloud Dataproc compute profiles in GB. (CDAP-16240)
- Fixed a bug where the failure message emitted by Spark driver was not being collected. (CDAP-16055)
- Fixed a bug that caused errors when Wrangler's parse-as-csv with header was used when reading multiple small files. (CDAP-16002)
- Fixed a bug that disallowed writing to an empty Google BigQuery table without any data or schema. (CDAP-15775)
- Fixed a bug that would cause the Google BigQuery sink to fail the pipeline run if there was no data to write. (CDAP-15649)
- Fixed a bug in the custom date range picker that prevented users from setting a custom date range that is not in the current year. (CDAP-14850)
- Fixed a bug where users could not delete the entire column name in Wrangler. (CDAP-14190)
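To illustrate the kind of fix described for CDAP-16538, the sketch below shows the common defensive pattern of exposing an internal map as a read-only view. The PropertiesHolder class here is hypothetical and is not CDAP's actual PluginProperties implementation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a properties holder that exposes its map read-only,
// the general pattern behind the fix described in CDAP-16538.
public final class PropertiesHolder {
  private final Map<String, String> properties;

  public PropertiesHolder(Map<String, String> properties) {
    // Defensive copy so later changes by the caller cannot leak in.
    this.properties = new HashMap<>(properties);
  }

  public Map<String, String> getProperties() {
    // Read-only view: callers that try to mutate it get an
    // UnsupportedOperationException instead of silently changing shared state.
    return Collections.unmodifiableMap(properties);
  }
}
```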