Cask Data Application Platform v3.3.0
awholegunch released this 21 Jan 02:21
New Features
- Added on-demand (dynamic) dataset instantiation through the program runtime context. (CDAP-961)
- Added lookup capability in context that can be used in existing Script, ScriptFilter and Validator transforms. (CDAP-2303)
- Added an endpoint to get a count of active queries: /v3/namespaces/<namespace-id>/data/explore/queries/count. (CDAP-3514)
- Added experimental support for running ETL Batch applications on Spark. Introduced an 'engine' setting in the configuration that defaults to 'mapreduce', but can be set to 'spark'. (CDAP-3857)
- Added support to PartitionConsumer for concurrency, plus a limit and filter on read. (CDAP-3944)
- Added support for limiting the number of concurrent schedule runs. (CDAP-3945)
- Added Java 8 support for Script transforms. (CDAP-4016)
- Added RESTful APIs to start or stop multiple programs. (CDAP-4022)
- Added CLI commands to stop, start, restart, or get status of programs in an application. (CDAP-4023)
- Added support for ETL transforms written in Python. (CDAP-4043)
- Added a new JavaScript transform that can emit records using an emitter. (CDAP-4128)
- Added the capability for MapReduce and Spark programs to localize additional resources during setup. (CDAP-4135)
- Added the ability to configure which artifact a Hydrator plugin should use. (CDAP-4228)
- Added DAGs to ETL pipelines, which will allow users to fork and merge. ETLConfig has been updated to allow representing a DAG. (CDAP-4230)
- Added AuthorizationPlugin, for pluggable authorization. (CDAP-4235)
- Added metadata support for stream views. (CDAP-4263)
- Added CLI support for metadata and lineage. (CDAP-4270)
- Added the ability to add metadata to artifacts. (CDAP-4280)
- Added RESTful APIs to set and get properties for an artifact. (CDAP-4289)
- Added support for automatically annotating CDAP entities with system metadata when they are created or updated. (CDAP-4264)
- Added an authorization plugin that uses a system dataset to manage ACLs. (CDAP-4285)
- Moved the Hydrator plugins, previously in the CDAP repository as cdap-etl-lib, into their own repository. (CDAP-4403)
- Improved Metadata Indexing and Search to support searches on words in values and tags. (CDAP-4591)
- Schema fields are stored as Metadata and are searchable. (CDAP-4592)
- Added capability in CDAP UI to display system tags. (CDAP-4658)
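The active-query count endpoint listed above (CDAP-3514) can be exercised with a few lines of Python. This is a minimal sketch under stated assumptions: the router address `http://localhost:10000`, the `default` namespace, the use of an HTTP GET, and both helper names are illustrative and not taken from these notes. The response format is not described here, so the sketch returns the body as raw text.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed router address; replace with the address of your own installation.
BASE_URL = "http://localhost:10000"


def active_query_count_url(namespace_id: str, base_url: str = BASE_URL) -> str:
    """Build the URL of the active-query count endpoint for a namespace."""
    namespace = quote(namespace_id, safe="")  # escape characters unsafe in a path
    return (
        f"{base_url.rstrip('/')}/v3/namespaces/{namespace}"
        "/data/explore/queries/count"
    )


def fetch_active_query_count(namespace_id: str, base_url: str = BASE_URL) -> str:
    """Issue a GET (assumed method) and return the raw response body as text."""
    with urlopen(active_query_count_url(namespace_id, base_url)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; call fetch_active_query_count() against a live router.
    print(active_query_count_url("default"))
```

Keeping URL construction separate from the network call makes the path easy to check without a running CDAP instance.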
Improvements
- Table datasets, and any other dataset that implements RecordWritable<StructuredRecord>, can now be written to using Hive. (CDAP-3079)
- The CDAP Router now has a configurable timeout for idle connections, with a default timeout of 15 seconds. (CDAP-3887)
- A new property, master.collect.containers.log, has been added to cdap-site.xml. It determines whether container logs are streamed back to the cdap-master process log. (Streaming has always been the default behavior.) For MapR installations, this must be turned off (set to false). (CDAP-4045)
- Added ability to retrieve the live-info for the AppFabric system service. (CDAP-4133)
- Added a method to ObjectMappedTable and ObjectStore to retrieve a specific number of splits between a start key and an end key. (CDAP-4209)
- Messages logged by Hydrator are now prefixed with the name of the stage that logged them. (CDAP-4233)
- Added support for CDH5.5. (CDAP-4301)
- Upgraded netty-http dependency in CDAP to 0.14.0. (CDAP-4392)
- Made the xmllint dependency optional and allowed setting variables to skip configuration file parsing. (CDAP-4444)
- Added schema validation for sources, transforms, and sinks that validates the schemas of the pipeline stages during deployment and reports any issues. (CDAP-4453)
- CDAP Master service will now log important configuration settings on startup. (CDAP-4518)
- Added the config setting master.startup.checks.enabled to control whether CDAP Master startup checks are run or not. (CDAP-4523)
- Improved the installation experience: during startup, the CDAP Master service now checks prerequisites such as file system permissions, the availability of components such as YARN and HBase, and resource availability, and errors out if any prerequisite fails. (CDAP-4536)
- Added a config setting 'master.collect.app.containers.log' that can be set to 'false' to disable streaming of application logs back to the CDAP Master log. (CDAP-4548)
- Added an error message for when a required field is not provided while configuring a Hydrator pipeline. (CDAP-4598)
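Several of the improvements above are controlled through settings in cdap-site.xml. The sketch below renders such settings as Hadoop-style `<property>` entries. It assumes cdap-site.xml follows the usual `<configuration>`/`<property>` layout. The helper name `site_properties_xml` is hypothetical, and the values are examples only (for instance, the MapR case from CDAP-4045), not recommended defaults.

```python
from xml.sax.saxutils import escape


def site_properties_xml(settings: dict) -> str:
    """Render settings as <property> entries inside a <configuration> element."""
    lines = ["<configuration>"]
    for name, value in settings.items():
        # Booleans are written in lowercase ("true"/"false"), as in the notes.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(text)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example values only: container log streaming turned off (as required on
    # MapR), application log streaming turned off, startup checks left on.
    print(site_properties_xml({
        "master.collect.containers.log": False,
        "master.collect.app.containers.log": False,
        "master.startup.checks.enabled": True,
    }))
```

In practice these entries would be merged into the existing cdap-site.xml rather than written as a new file.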
Bug Fixes
- Prefixed start script functions with 'cdap' to prevent namespace collisions. (CDAP-1174)
- Added a check to cause a DB (source or sink) pipeline to fail during deployment if the table (source or sink) was not found, or if an incorrect connection string was provided. (CDAP-2470)
- Fixed a bug where the TTL for datasets was incorrect; it was reduced by a factor of 1000 after an upgrade. After running the upgrade tool, please make sure the TTL values of tables are as expected. (CDAP-3345)
- Fixed an issue where the failure of a program running in a workflow fork node was causing other programs in the same fork node to remain in the RUNNING state, even after the Workflow was completed. (CDAP-3542)
- Fixed test failures in the PurchaseHistory, StreamConversion, and WikipediaPipeline example apps included in the CDAP SDK. (CDAP-3694)
- Fixed a bug where certain MapReduce metrics were not being properly emitted when using multiple outputs. (CDAP-3742)
- Fixed a problem with DBSink column names not being used to filter input record fields before writing to a DBSink. (CDAP-3761)
- Added a fix for case sensitivity handling in DBSink. (CDAP-3807)
- Fixed an issue where the regex filter for S3 Batch Source wasn't getting applied correctly. (CDAP-3815)
- Fixed an issue with stopping all dependent services when a service is stopped. (CDAP-3861)
- Fixed a bug when querying for logs of deleted program runs. (CDAP-3900)
- Fixed a problem with dataset performance degradation because of making multiple remote calls for each "get dataset" request. (CDAP-3902)
- Fixed QueryClient to work against HTTPS. (CDAP-3924)
- Fixed an issue where a stream that has a view could not be deleted cleanly. (CDAP-4000)
- Fixed an issue where socket connections to the TransactionManager were not being closed. (CDAP-4067)
- Fixed an issue that caused worker threads to go into infinite recursion while exceptions were being thrown in channel handlers. (CDAP-4092)
- Fixed a bug that prevented applications from using HBase directly. (CDAP-4112)
- Fixed a problem where running programs were marked as failed when CDAP Master switched from active to standby. (CDAP-4119)
- Fixed a problem in the CLI command used to load an artifact, where the wrong artifact name and version was used if the artifact name ends with a number. (CDAP-4240)
- Fixed a problem where plugins from another namespace were visible when creating an application using a system artifact. (CDAP-4294)
- Fixed a problem with the CLI attempting to connect to CDAP when the hostname and port were incorrect. (CDAP-4316)
- Improved error message when stream views were not found. (CDAP-4366)
- Fixed an issue where tag searches were failing for certain tags. (CDAP-4393)
- Fixed node.js version checking for the cdap.sh script in the CDAP SDK. (CDAP-4141)
- Fixed a problem that prevented MapReduce jobs from being run when the Resource Manager switches from active to standby in a Kerberos-enabled HA cluster. (CDAP-4373)
- Fixed an issue that prevents streams from being read in HA HDFS mode. (CDAP-4384)
- Fixed init scripts to print service status when stopped. (CDAP-4526)
- Added configuration 'router.bypass.auth.regex' to exempt certain URLs from authentication. (CDAP-4534)
- Fixed a problem in the init scripts that forced cdap-kafka-server, cdap-router, and cdap-auth-server to have the Hive client installed. (CDAP-4539)
- Fixed an issue where the logs and history list on a Hydrator pipeline view were not updating on new runs. (CDAP-4678)
Deprecated and Removed Features
- See the CDAP 3.3.0 Javadocs for a list of deprecated and removed APIs.
- Removed a deprecated endpoint to retrieve the status of a currently running node in a workflow. (CDAP-2481)
- Removed the deprecated builder-style Flow API. (CDAP-2943)
- Deprecated the createDataSchedule and createTimeSchedule methods in the Schedules class and removed the deprecated Schedule constructor. (CDAP-4217)
- Deprecated the Script transform. (CDAP-4128)
- Removed deprecated fluent style API for Flow configuration. The only supported API is now the configurer style. (CDAP-4251)