Releases: cdapio/cdap
Cask Data Application Platform v3.0.5
- Fixed a bug that prevents stream events that are already processed from being re-processed in flows (CDAP-3458)
Cask Data Application Platform v3.0.4
Cask Data Application Platform v3.1.1
This is a bug-fix release.
Bugs Fixed
- CDAP-3259 - Removed a development script accidentally included in the 3.1.0 release.
- CDAP-3321 - Fixed a problem of being unable to enable SSL on the CDAP-UI.
- CDAP-3340 - Fixed a problem with the deployment of applications and the batch loading of events to a stream when using the CDAP CLI on Windows.
- CDAP-3362 - Fixed a problem of the logback-container.xml not being copied into the master services.
- CDAP-3377 - Fixed a problem in the CDAP-UI with shrinking the browser height when working with application templates.
- CDAP-3386 - Fixed a problem with Spark classes not being found when running a Spark program through a Workflow in Distributed mode on HDP 2.2.
- CDAP-3404 - Fixed an error in the installation documentation on enabling the CDAP Explore service.
- CDAP-3405 - Fixed a problem with the third step of the Getting Started example on cask.co/get-started.
- CDAP-3408 - Fixed a problem with starting the CDAP Explore service on CDH 5.2 and 5.3.
Cask Data Application Platform v3.1.0
New Features
MapR 4.1 Support, HDP 2.2 Support, CDH 5.4 Support
- CDAP-1614 - Added HBase 1.0 support.
- CDAP-2318 - Made CDAP work on the HDP 2.2 distribution.
- CDAP-2786 - Added support to CDAP 3.1.0 for the MapR 4.1 distro.
- CDAP-2798 - Added Hive 0.14 support.
- CDAP-2801 - Added CDH 5.4 Hive 1.1 support.
- CDAP-2836 - Added support for restart of specific CDAP System Services Instances.
- CDAP-2853 - Completed certification process for MapR on CDAP.
- CDAP-2879 - Added Hive 1.0 in Standalone.
- CDAP-2881 - Added support for HDP 2.2.x.
- CDAP-2891 - Documented cdap-env.sh and settings OPTS for HDP 2.2.
- CDAP-2898 - Added Hive 1.1 in Standalone.
- CDAP-2953 - Added HiveServer2 support in a secure cluster.
Spark
- CDAP-344 - Users can now run Spark in distributed mode.
- CDAP-1993 - Added ability to manipulate the SparkConf.
- CDAP-2700 - Added the ability for Spark programs to discover CDAP services in distributed mode.
- CDAP-2701 - Spark programs are able to collect Metrics in distributed mode.
- CDAP-2703 - Users are able to collect/view logs from Spark programs in distributed mode.
- CDAP-2705 - Added examples, guides and documentation for Spark in distributed mode. Includes the LogAnalysis application, which demonstrates parallel execution of the Spark and MapReduce programs using Workflows.
- CDAP-2923 - Added support for the WorkflowToken in Spark programs.
- CDAP-2936 - Spark programs can now specify resource usage for the driver and executor processes in distributed mode.
Workflows
- CDAP-1983 - Added example application for processing and analyzing Wikipedia data using Workflows.
- CDAP-2709 - Added ability to add generic keys to the WorkflowToken.
- CDAP-2712 - Added ability to update the WorkflowToken in MapReduce and Spark programs.
- CDAP-2713 - Added ability to persist the WorkflowToken per run of the Workflow.
- CDAP-2714 - Added ability to query the WorkflowToken for the past as well as currently running Workflow runs.
- CDAP-2752 - Added ability for custom actions to access the CDAP datasets and services.
- CDAP-2894 - Added an API to retrieve the system properties (e.g. MapReduce counters in case of MapReduce program) from the WorkflowToken.
- CDAP-2923 - Added support for the WorkflowToken in Spark programs.
- CDAP-2982 - Added verification that the Workflow contains all programs/custom actions with a unique name.
Datasets
- CDAP-347 - Users can use datasets in beforeSubmit and afterFinish.
- CDAP-585 - Changed the Spark program runner to use File datasets in Spark. Spark programs can now use file-based datasets.
- CDAP-2734 - Added PartitionedFileSet support for setting/getting properties at the Partition level.
- CDAP-2746 - PartitionedFileSets now record the creation time of each partition in the metadata.
- CDAP-2747 - PartitionedFileSets now index the creation time of partitions to allow selection of partitions that were created after a given time. Introduced BatchPartitionConsumer as a way to incrementally consume new data in a PartitionedFileSet.
- CDAP-2752 - Added ability for custom actions to access the CDAP datasets and services.
- CDAP-2758 - FileSets now support existing HDFS locations. Base paths that start with “/” are treated as absolute in the file system. Previously, an absolute base path for a (Partitioned)FileSet was interpreted as relative to the namespace’s data directory. Newly created FileSets interpret absolute base paths as absolute in the file system. Introduced a new property for (Partitioned)FileSets named “data.external”. If true, the base path of the FileSet is assumed to be managed by some external process. That is, the FileSet will not attempt to create the directory, it will not delete any files when the FileSet is dropped or truncated, and it will not allow adding or deleting files or partitions. In other words, the FileSet is read-only.
- CDAP-2784 - Added support to write to PartitionedFileSet Partition metadata from MapReduce.
- CDAP-2822 - IndexedTable now supports scans on the indexed field.
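The base-path and “data.external” behavior described under CDAP-2758 can be illustrated with a small model. This is only a sketch of the semantics stated in the note, not CDAP code: the names `resolve_base_path` and `FileSetSketch`, and the example directory paths, are hypothetical and do not correspond to any CDAP API.

```python
import posixpath


def resolve_base_path(namespace_data_dir, base_path):
    """Resolve a base path as the note describes for newly created FileSets.

    A base path starting with "/" is taken as absolute in the file system;
    anything else is relative to the namespace's data directory.
    """
    if base_path.startswith("/"):
        return posixpath.normpath(base_path)
    return posixpath.normpath(posixpath.join(namespace_data_dir, base_path))


class FileSetSketch:
    """Toy in-memory stand-in for a FileSet (hypothetical, for illustration)."""

    def __init__(self, namespace_data_dir, base_path, external=False, files=None):
        self.location = resolve_base_path(namespace_data_dir, base_path)
        self.external = external          # models the "data.external" property
        self.files = set(files or ())     # files already present at the location

    def _check_writable(self, action):
        # An external FileSet is read-only: no adding or deleting files.
        if self.external:
            raise PermissionError(
                "cannot %s: %s is managed externally" % (action, self.location)
            )

    def add_file(self, name):
        self._check_writable("add file")
        self.files.add(name)

    def delete_file(self, name):
        self._check_writable("delete file")
        self.files.discard(name)

    def drop(self):
        # Dropping (or truncating) an external FileSet leaves its files alone.
        if not self.external:
            self.files.clear()
```

In this model, a FileSet created with `external=True` over an existing location keeps its files after `drop()` and rejects `add_file` and `delete_file`, which matches the read-only behavior the note describes.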
Metrics
- CDAP-2975 - Added pre-split FactTables.
- CDAP-2326 - Added better unit-test coverage for Cube dataset.
- CDAP-1853 - Metrics processor scaling no longer needs a master services restart.
- CDAP-2844 - MapReduce metrics collection no longer uses counters, and instead reports directly to Kafka.
- CDAP-2701 - Spark programs are able to collect Metrics in distributed mode.
- CDAP-2466 - Added CLI for metrics search and query.
- CDAP-2236 - New CDAP UI switched over to using newer search/query APIs.
- CDAP-1998 - Removed deprecated Context - Query param in Metrics v3 API.
Miscellaneous New Features
- CDAP-332 - Added a Restful end-point for deleting Streams.
- CDAP-1483 - QueueAdmin now uses Id.Namespace instead of simply String.
- CDAP-1584 - CDAP CLI now shows the username in the CLI prompt.
- CDAP-2139 - Removed a duplicate Table of Contents on the Documentation Search page.
- CDAP-2515 - Added a metrics client for search and query by tags.
- CDAP-2582 - Documented the licenses of the shipped CDAP-UI components.
- CDAP-2595 - Added data modelling of flows.
- CDAP-2596 - Added data modelling of MapReduce.
- CDAP-2617 - Added the capability to get logs for a given time range from CLI.
- CDAP-2618 - Simplified the Cube sink configurations.
- CDAP-2670 - Added Parquet sink with time partitioned file dataset.
- CDAP-2739 - Added S3 batch source for ETLbatch.
- CDAP-2802 - Stopped using HiveConf.ConfVars.defaultValue, to support Hive >0.13.
- CDAP-2847 - Added ability to add custom filters to FileBatchSource.
- CDAP-2893 - Custom Transform now parses log formats for ETL.
- CDAP-2913 - Provided installation method for EMR.
- CDAP-2915 - Added an SQS realtime plugin for ETL.
- CDAP-3022 - Added Cloudfront format option to LogParserTransform.
- CDAP-3032 - Documented TestConfiguration class usage in unit-test framework.
Cask Data Application Platform v3.0.3
Bug fixes
- Fix Bower dependency error (CDAP-3010)
Cask Data Application Platform v2.8.2
Bug fixes
- Fix Bower dependency error (CDAP-3010)
Cask Data Application Platform v3.0.2
Cask Data Application Platform v2.8.1
Cask Data Application Platform v3.0.1
- In the CDAP UI, mandatory parameters for Application Template creation are marked with asterisks, and if a user tries to create a template without one of those parameters, the missing parameter is highlighted (CDAP-2499).
- Added a tool (HBaseQueueDebugger) that counts consumed and unconsumed entries in a flowlet queue (CDAP-2105).
- The currently executing node of a workflow is now highlighted in the CDAP UI (CDAP-2615).
- The list of datasets and the run histories in the CDAP UI are now paginated (CDAP-2626, CDAP-2627).
- Added improvements to the CDAP UI when creating Application Templates (CDAP-2601, CDAP-2602, CDAP-2603, CDAP-2605, CDAP-2606, CDAP-2607, CDAP-2610).
- Improved the error messages returned when there are problems creating Application Templates in the CDAP UI (CDAP-2597).
- Added the Apache Flume agent flume-ng to the CDAP SDK VM (CDAP-2612).
- Added the ability to copy and paste to the CDAP SDK VM (CDAP-2611).
- Pre-downloaded the example dependencies into the CDAP SDK VM to speed building of the CDAP examples (CDAP-2613).
- Fixed a problem with the HBase store and flows with multiple queues, where one queue name is a prefix of another queue name (CDAP-1996).
- Fixed a problem with namespaces with underscores in the name crashing the Hadoop HBase region servers (CDAP-2110).
- Removed the requirement to specify the JDBC driver class property twice in the adaptor configuration for Database Sources and Sinks (CDAP-2453).
- Fixed a problem in CDAP Distributed where the status of a running program always returned as “STOPPED” when the CDAP Master was restarted (CDAP-2489).
- Fixed a problem with invalid RunRecords for Spark and MapReduce programs that are run as part of a Workflow (CDAP-2490).
- Fixed a problem with the CDAP Master not being HA (highly available) when a leadership change happens (CDAP-2495).
- Fixed a problem with upgrading of queues with the UpgradeTool (CDAP-2502).
- Fixed a problem with ObjectMappedTables not deleting missing fields when updating a row (CDAP-2523, CDAP-2524).
- Fixed a problem with a stream not being created properly when deploying an application after the default namespace was deleted (CDAP-2537).
- Fixed a problem with the Application Template Kafka Source not using the persisted offset when the Adapter is restarted (CDAP-2547).
- A problem with CDAP using its own transaction snapshot codec, leading to huge snapshot files and OutOfMemory exceptions, and transaction snapshots that can’t be read using Tephra’s tools, has been resolved by replacing the codec with Tephra’s SnapshotCodecV3 (CDAP-2563, CDAP-2946, TEPHRA-101).
- Fixed a problem with CDAP Master not being resilient in the handling of Zookeeper exceptions (CDAP-2569).
- Fixed a problem with RunRecords not being cleaned up correctly after certain exceptions (CDAP-2584).
- Fixed a problem with the CDAP Maven archetype having an incorrect CDAP version in it (CDAP-2634).
- Fixed a problem with the description of the TwitterSource not describing its output (CDAP-2648).
- Fixed a problem with the Twitter Source not handling missing fields correctly and as a consequence producing tweets (with errors) that were then not stored on disk (CDAP-2653).
- Fixed a problem with the TwitterSource not calculating the time of tweet correctly (CDAP-2656).
- Fixed a problem with the JMS Real-time Source failing to load required plugin sources (CDAP-2661).
- Fixed a problem with executing Hive queries on a distributed CDAP due to a failure to load Grok classes (CDAP-2678).
- Fixed a problem with CDAP Program jars not being cleaned up from the temporary directory (CDAP-2698).
- Fixed a problem with ProjectionTransforms not handling input data fields with null values correctly (CDAP-2719).
- Fixed a problem with the CDAP SDK running out of memory when MapReduce jobs are run repeatedly (CDAP-2743).
- Fixed a problem with not using CDAP RunIDs in the in-memory version of the CDAP SDK (CDAP-2769).
- Fixed a problem with the CDAP CLI not printing an error if it is unable to connect to a CDAP instance (CDAP-2529).
- Fixed a problem with extra whitespace in commands entered into the CDAP CLI causing errors (CDAP-2538).
- Updated the messages displayed when starting the CDAP Standalone SDK as to components and the JVM required (CDAP-2445).
- Fixed a problem with the creation of the default namespace upon starting the CDAP SDK (CDAP-2587).
- Fixed a problem with using the default namespace on the CDAP SDK Virtual Machine Image (CDAP-2500).
- Fixed a problem with the VirtualBox VM retaining a MAC address obtained from the build host (CDAP-2640).
- Fixed a problem with incorrect flow metrics showing in the CDAP UI (CDAP-2494).
- Fixed a problem in the CDAP UI with the properties in the Projection Transform being displayed inconsistently (CDAP-2525).
- Fixed a problem in the CDAP UI not automatically updating the number of flowlet instances (CDAP-2534).
- Fixed a problem in the CDAP UI with a window resize preventing clicking of the Adapter Template drop down menu (CDAP-2573).
- Fixed a problem with the CDAP UI not performing validation of mandatory parameters before the creation of an adapter (CDAP-2575).
- Fixed a problem with an incorrect version of CDAP being shown in the CDAP UI (CDAP-2586).
- Reduced the number of clicks required to navigate and perform actions within the CDAP UI (CDAP-2622, CDAP-2625).
- Fixed a problem with an additional forward-slash character in the URL causing a “page not found error” in the CDAP UI (CDAP-2624).
- Fixed a problem with the error dropdown of the CDAP UI not scrolling when it has a large number of errors (CDAP-2633).
- Fixed a problem in the CDAP UI with the Twitter Source’s consumer key secret not being treated as a password field (CDAP-2649).
- Fixed a problem with the CDAP UI attempting to create an adapter without a name (CDAP-2652).
- Fixed a problem with the CDAP UI not being able to find the ETL plugin templates on distributed CDAP (CDAP-2655).
- Fixed a problem with the CDAP UI’s System Dashboard chart having a y-axis starting at “-200” (CDAP-2699).
- Fixed a problem with the rendering of stack trace logs in the CDAP UI (CDAP-2745).
- Fixed a problem with the CDAP UI not working with secure CDAP instances, either clusters or standalone (CDAP-2770).
- Fixed a problem with the coloring of completed runs of Workflow DAGs in the CDAP UI (CDAP-2781).
- Fixed errors with the documentation examples of the ETL Plugins (CDAP-2503).
- Documented the licenses of all shipped CDAP UI components (CDAP-2582).
- Corrected issues with the building of Javadocs used on the website and removed Javadocs previously included in the SDK (CDAP-2730).
- Added a recommended version (v0.12.0) of Node.js to the documentation (CDAP-2762).
Cask Data Application Platform v2.6.3
Bug Fixes
- Replace use of CDAP transaction snapshot codecs with the Tephra versions instead (CDAP-2496).