Cask Data Application Platform v3.3.0

awholegunch released this 21 Jan 02:21
· 579 commits to release/3.3 since this release

New Features

  • Added on-demand (dynamic) dataset instantiation through the program runtime context. (CDAP-961)
  • Added lookup capability in context that can be used in existing Script, ScriptFilter and Validator transforms. (CDAP-2303)
  • Added an endpoint to get a count of active queries: /v3/namespaces/<namespace-id>/data/explore/queries/count. (CDAP-3514)
  • Added experimental support for running ETL Batch applications on Spark. Introduced an 'engine' setting in the configuration that defaults to 'mapreduce', but can be set to 'spark'. (CDAP-3857)
  • Added support to PartitionConsumer for concurrency, plus a limit and filter on read. (CDAP-3944)
  • Added support for limiting the number of concurrent schedule runs. (CDAP-3945)
  • Added Java 8 support for Script transforms. (CDAP-4016)
  • Added RESTful APIs to start or stop multiple programs. (CDAP-4022)
  • Added CLI commands to stop, start, restart, or get status of programs in an application. (CDAP-4023)
  • Added support for ETL transforms written in Python. (CDAP-4043)
  • Added a new JavaScript transform that can emit records using an emitter. (CDAP-4128)
  • Added the capability for MapReduce and Spark programs to localize additional resources during setup. (CDAP-4135)
  • Added the ability to configure which artifact a Hydrator plugin should use. (CDAP-4228)
  • Added DAGs to ETL pipelines, which will allow users to fork and merge. ETLConfig has been updated to allow representing a DAG. (CDAP-4230)
  • Added AuthorizationPlugin, for pluggable authorization. (CDAP-4235)
  • Added metadata support for stream views. (CDAP-4263)
  • Added CLI support for metadata and lineage. (CDAP-4270)
  • Added the ability to add metadata to artifacts. (CDAP-4280)
  • Added RESTful APIs to set and get properties for an artifact. (CDAP-4289)
  • Added support for automatically annotating CDAP entities with system metadata when they are created or updated. (CDAP-4264)
  • Added an authorization plugin that uses a system dataset to manage ACLs. (CDAP-4285)
  • Moved the Hydrator plugins (cdap-etl-lib) out of the CDAP repository and into their own repository. (CDAP-4403)
  • Improved Metadata Indexing and Search to support searches on words in values and tags. (CDAP-4591)
  • Schema fields are stored as Metadata and are searchable. (CDAP-4592)
  • Added capability in CDAP UI to display system tags. (CDAP-4658)
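
As an illustration of the active-query-count endpoint added in CDAP-3514, the following minimal Python sketch fills in the <namespace-id> placeholder of the path quoted above. The base URL "http://localhost:10000" and the namespace "default" are placeholder assumptions, not values taken from these notes, and the helper name active_query_count_url is hypothetical.

```python
from urllib.parse import quote

# Path template quoted in the release note for CDAP-3514.
QUERY_COUNT_PATH = "/v3/namespaces/{namespace}/data/explore/queries/count"


def active_query_count_url(base_url: str, namespace_id: str) -> str:
    """Build the URL of the active-query-count endpoint for one namespace.

    Hypothetical helper: it only assembles the URL and sends no request.
    """
    # Percent-encode the namespace so it is safe as a single path segment.
    namespace = quote(namespace_id, safe="")
    return base_url.rstrip("/") + QUERY_COUNT_PATH.format(namespace=namespace)


if __name__ == "__main__":
    # Both arguments are placeholders; substitute the address of your
    # CDAP Router and your own namespace.
    print(active_query_count_url("http://localhost:10000", "default"))
```

The resulting URL can be passed to any HTTP client; the shape of the response body is not described in these notes, so it is left out of the sketch.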

Improvements

  • Table datasets, and any other dataset that implements RecordWritable<StructuredRecord>, can now be written to using Hive. (CDAP-3079)
  • The CDAP Router now has a configurable timeout for idle connections, with a default timeout of 15 seconds. (CDAP-3887)
  • A new property, master.collect.containers.log, has been added to cdap-site.xml. It determines whether container logs are streamed back to the cdap-master process log. (This has always been the default behavior.) For MapR installations, this must be turned off (set to false). (CDAP-4045)
  • Added ability to retrieve the live-info for the AppFabric system service. (CDAP-4133)
  • Added a method to ObjectMappedTable and ObjectStore to retrieve a specific number of splits between a start key and an end key. (CDAP-4209)
  • Messages logged by Hydrator are now prefixed with the name of the stage that logged them. (CDAP-4233)
  • Added support for CDH5.5. (CDAP-4301)
  • Upgraded netty-http dependency in CDAP to 0.14.0. (CDAP-4392)
  • Made the xmllint dependency optional and allowed setting variables to skip configuration file parsing. (CDAP-4444)
  • Added schema validation for sources, transforms, and sinks that validates the schemas of the pipeline stages during deployment and reports any issues. (CDAP-4453)
  • CDAP Master service will now log important configuration settings on startup. (CDAP-4518)
  • Added the config setting master.startup.checks.enabled to control whether CDAP Master startup checks are run or not. (CDAP-4523)
  • Improved the installation experience by adding prerequisite checks to CDAP Master service startup. The checks cover file system permissions, the availability of components such as YARN and HBase, and resource availability; startup fails with an error if any prerequisite is not met. (CDAP-4536)
  • Added a config setting 'master.collect.app.containers.log' that can be set to 'false' to disable streaming of application logs back to the CDAP Master log. (CDAP-4548)
  • Added an error message when a required field is not provided when configuring a Hydrator pipeline. (CDAP-4598)
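
To illustrate the two log-streaming settings above (master.collect.containers.log and master.collect.app.containers.log), here is a minimal Python sketch that renders them as property entries. It assumes cdap-site.xml uses the Hadoop-style <property><name>…</name><value>…</value></property> layout; the helper name site_property is hypothetical.

```python
import xml.etree.ElementTree as ET


def site_property(name: str, value: str) -> str:
    """Render one property entry as XML text.

    Assumes the Hadoop-style <property>/<name>/<value> layout.
    """
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # Setting names come from the release notes; "false" turns streaming off
    # (required for the first setting on MapR installations).
    for setting in ("master.collect.containers.log",
                    "master.collect.app.containers.log"):
        print(site_property(setting, "false"))
```

Each printed fragment would be placed inside the top-level element of cdap-site.xml by hand or by your configuration management tooling.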

Bug Fixes

  • Prefixed start script functions with 'cdap' to prevent namespace collisions. (CDAP-1174)
  • Added a check to cause a DB (source or sink) pipeline to fail during deployment if the table (source or sink) was not found, or if an incorrect connection string was provided. (CDAP-2470)
  • Fixed a bug where the TTL for datasets was incorrect; it was reduced by a factor of 1000 after an upgrade. After running the upgrade tool, please make sure the TTL values of tables are as expected. (CDAP-3345)
  • Fixed an issue where the failure of a program running in a workflow fork node was causing other programs in the same fork node to remain in the RUNNING state, even after the Workflow was completed. (CDAP-3542)
  • Fixed test failures in the PurchaseHistory, StreamConversion, and WikipediaPipeline example apps included in the CDAP SDK. (CDAP-3694)
  • Fixed a bug where certain MapReduce metrics were not being properly emitted when using multiple outputs. (CDAP-3742)
  • Fixed a problem with DBSink column names not being used to filter input record fields before writing to a DBSink. (CDAP-3761)
  • Added a fix for case sensitivity handling in DBSink. (CDAP-3807)
  • Fixed an issue where the regex filter for S3 Batch Source wasn't getting applied correctly. (CDAP-3815)
  • Fixed an issue about stopping all dependent services when a service is stopped. (CDAP-3861)
  • Fixed a bug when querying for logs of deleted program runs. (CDAP-3900)
  • Fixed a dataset performance degradation caused by multiple remote calls being made for each "get dataset" request. (CDAP-3902)
  • Fixed QueryClient to work against HTTPS. (CDAP-3924)
  • Fixed an issue where a stream that has a view could not be deleted cleanly. (CDAP-4000)
  • Fixed an issue where socket connections to the TransactionManager were not being closed. (CDAP-4067)
  • Fixed an issue that caused worker threads to go into an infinite recursion while exceptions were being thrown in channel handlers. (CDAP-4092)
  • Fixed a bug that prevented applications from using HBase directly. (CDAP-4112)
  • Fixed a problem where running programs were marked as failed when CDAP Master switched from active to standby. (CDAP-4119)
  • Fixed a problem in the CLI command used to load an artifact, where the wrong artifact name and version were used if the artifact name ended with a number. (CDAP-4240)
  • Fixed a problem where plugins from another namespace were visible when creating an application using a system artifact. (CDAP-4294)
  • Fixed a problem with the CLI attempting to connect to CDAP when the hostname and port were incorrect. (CDAP-4316)
  • Improved error message when stream views were not found. (CDAP-4366)
  • Fixed an issue where tag searches were failing for certain tags. (CDAP-4393)
  • Fixed node.js version checking for the cdap.sh script in the CDAP SDK. (CDAP-4141)
  • Fixed a problem that prevented MapReduce jobs from being run when the Resource Manager switched from active to standby in a Kerberos-enabled HA cluster. (CDAP-4373)
  • Fixed an issue that prevented streams from being read in HA HDFS mode. (CDAP-4384)
  • Fixed init scripts to print service status when stopped. (CDAP-4526)
  • Added configuration 'router.bypass.auth.regex' to exempt certain URLs from authentication. (CDAP-4534)
  • Fixed a problem in the init scripts that forced cdap-kafka-server, cdap-router, and cdap-auth-server to have the Hive client installed. (CDAP-4539)
  • Fixed an issue where the logs and history list on a Hydrator pipeline view were not updating on new runs. (CDAP-4678)
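
For the TTL fix (CDAP-3345), the notes ask you to confirm table TTL values after running the upgrade tool. The sketch below shows one way to flag a value that looks like it was shrunk by the factor of 1000 mentioned above. How the TTL values are read from a table is not covered here, the function name is hypothetical, and the use of integer division is an assumption about how the reduced value would appear.

```python
def ttl_looks_shrunk(expected_ttl: int, actual_ttl: int, factor: int = 1000) -> bool:
    """Return True if actual_ttl equals expected_ttl divided by factor.

    Illustrative check only: integer division is an assumption, and both
    values must be supplied by the caller in the same unit.
    """
    if actual_ttl == expected_ttl:
        # The value is unchanged, so nothing to flag.
        return False
    return actual_ttl == expected_ttl // factor


if __name__ == "__main__":
    # Example values are made up: a one-day TTL and a value 1000 times smaller.
    print(ttl_looks_shrunk(86_400_000, 86_400))
    print(ttl_looks_shrunk(86_400_000, 86_400_000))
```

A True result only indicates that a value matches the pattern described in the fix; the table's intended TTL should still be confirmed against its original configuration.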

Deprecated and Removed Features

  • See the CDAP 3.3.0 Javadocs for a list of deprecated and removed APIs.
  • Removed a deprecated endpoint to retrieve the status of a currently running node in a workflow. (CDAP-2481)
  • Removed the deprecated builder-style Flow API. (CDAP-2943)
  • Deprecated the createDataSchedule and createTimeSchedule methods in the Schedules class and removed the deprecated Schedule constructor. (CDAP-4217)
  • Deprecated the Script transform. (CDAP-4128)
  • Removed the deprecated fluent-style API for Flow configuration. The only supported API is now the configurer style. (CDAP-4251)