Introduction
Cask DRE is a sophisticated if-then-else statement interpreter that runs natively on big data systems such as Spark and Hadoop. It provides an alternative computational model for transforming your data, and it lets business users specify and manage the transformations and policy enforcements themselves.
The standard imperative model, which consists of a sequence of commands, requires the development team to be constantly involved in transforming the data. DRE instead provides an easy-to-understand, business-readable DSL that uses a Production Rule System to express complex data transformations.
Transformations are written as business rules (shown below), each of which specifies a condition that must be met before a series of actions is taken on the data.
rule <rule-name> {
description "Business text for when and how the rule is applied"
when(<condition-that-has-to-be-met>)
then {
<action-to-be-taken-1>;
<action-to-be-taken-2>;
<action-to-be-taken-3>;
...
<action-to-be-taken-n>;
}
}
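To make the when/then semantics concrete, here is a minimal Python sketch of how a single rule could be evaluated against one record. It assumes a record is a dictionary that maps column names to values. `Rule`, `apply` and `uppercase` are hypothetical names used only for illustration. They are not DRE's API or implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class Rule:
    """Hypothetical in-memory model of one `rule { when(...) then {...} }` block."""
    name: str
    description: str
    condition: Callable[[Record], bool]  # the `when(...)` clause
    actions: List[Callable[[Record], None]] = field(default_factory=list)  # the `then { ... }` body

    def apply(self, record: Record) -> bool:
        """Run every action, in order, only if the condition holds.

        Returns True when the rule fired.
        """
        if not self.condition(record):
            return False
        for action in self.actions:
            action(record)
        return True


def uppercase(column: str) -> Callable[[Record], None]:
    """Hypothetical helper mirroring an `uppercase <column>` action."""
    def action(record: Record) -> None:
        record[column] = str(record[column]).upper()
    return action


rule = Rule(
    name="uppercase-gender",
    description="Uppercases gender when it is present",
    condition=lambda r: "gender" in r,  # assumed meaning of present(gender)
    actions=[uppercase("gender")],
)

row = {"gender": "m"}
fired = rule.apply(row)
print(fired, row)  # True {'gender': 'M'}
```

A record without a `gender` column leaves the condition false, so none of the actions run and the record passes through unchanged.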
For example, suppose a business user wants to mask the SSN field every time and anywhere it is present in the entire data repository. They would specify a rule as follows:
rule mask-ssn {
description "Masks all digits of the SSN except the last four"
when(present(ssn))
then {
mask-number ssn xxx-xx-####;
}
}
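The masking pattern can be read one character at a time. The sketch below makes the following assumptions: `x` hides a digit, `#` keeps a digit, every other character (such as `-`) is copied literally, and `present(ssn)` means the field exists and is not null. `mask_number` and `mask_ssn_rule` are hypothetical helpers and do not reproduce the engine's own code.

```python
from typing import Any, Dict

Record = Dict[str, Any]


def mask_number(value: str, pattern: str) -> str:
    """Hypothetical sketch of mask-number semantics (assumed, not DRE's code)."""
    digits = [ch for ch in value if ch.isdigit()]
    slots = sum(1 for p in pattern if p in "x#")
    if len(digits) != slots:
        raise ValueError(f"expected {slots} digits, got {len(digits)}")
    it = iter(digits)
    out = []
    for p in pattern:
        if p == "x":
            next(it)               # consume this digit and hide it
            out.append("x")
        elif p == "#":
            out.append(next(it))   # consume this digit and keep it
        else:
            out.append(p)          # literal separator such as '-'
    return "".join(out)


def mask_ssn_rule(record: Record) -> Record:
    """Sketch of the mask-ssn rule: when(present(ssn)) then mask-number."""
    if record.get("ssn") is not None:
        record["ssn"] = mask_number(record["ssn"], "xxx-xx-####")
    return record


print(mask_ssn_rule({"ssn": "123-45-6789"}))  # {'ssn': 'xxx-xx-6789'}
print(mask_ssn_rule({"name": "Ada"}))         # {'name': 'Ada'}
```

Because only digits are taken from the input, this sketch masks `123456789` and `123-45-6789` the same way.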
It's as simple as that!
DRE is built as a CDAP Application and a plugin, and it enables rule management through a CDAP service. Here is another rule, which routes records with low fares to the error output:
rule "remove-fare-less-than-8.06" {
description "Sends records whose fare is less than 8.06 to error"
when (fare < 8.06) then {
send-to-error true;
}
}
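`send-to-error` does not modify the record. It diverts the record away from the main output. Below is a minimal sketch of that routing for the fare rule. It assumes that records are dictionaries and that the rule's effect amounts to a partition into an output stream and an error stream. `route_low_fares` is a hypothetical name.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def route_low_fares(records: List[Record],
                    threshold: float = 8.06) -> Tuple[List[Record], List[Record]]:
    """Split records into (output, errors) as `send-to-error true` would.

    Assumptions: `fare` may arrive as a string and is converted with float();
    rows without a fare do not match the condition, so they stay in the output.
    """
    output: List[Record] = []
    errors: List[Record] = []
    for record in records:
        fare = record.get("fare")
        if fare not in (None, "") and float(fare) < threshold:
            errors.append(record)   # rule fired: send-to-error
        else:
            output.append(record)   # rule did not fire: record continues
    return output, errors


rows = [{"fare": "7.25"}, {"fare": "71.2833"}, {"fare": 8.06}, {"name": "no fare"}]
good, bad = route_low_fares(rows)
print(len(good), len(bad))  # 3 1
```

A fare of exactly 8.06 stays in the output, because the condition uses a strict `<`.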
The following is a more complete example: a rulebook that groups several rules to normalize the Titanic passenger feed.
/**
* This rulebook normalizes the processing of the Titanic file.
* The rules are applied by an inference engine using forward chaining.
* Rule firing determines the order in which the rules are applied to
* the input record.
*/
rulebook "Titanic Feed Normalization" {
version 1
meta {
description "This rule book applies transformation on the titanic feed."
source "titanic-rules.xslx"
user "joltie"
}
rule "remove-first-line" {
description "Removes first line when offset is zero"
when(present(offset) && offset == 0) then {
filter-row-if-true true;
}
}
rule "parse-as-csv" {
description "Parses body as CSV"
when(present(body)) then {
parse-as-csv body ',' false;
drop body;
set columns offset,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked;
cleanse-column-names;
}
}
rule "rename-sex-to-gender" {
description "Rename sex field to gender"
when(present(sex)) then {
rename sex gender;
}
}
rule "single-character-gender" {
description "Converts gender to single character"
when(present(gender) && gender.length() > 1) then {
cut-character gender gender 1-1;
uppercase gender;
}
}
rule "missing-age" {
description "If age is missing, send it to error"
when (!present(age)) then {
send-to-error true;
}
}
rule "remove-fare-less-than-8.06" {
description "Sends records whose fare is less than 8.06 to error"
when (fare < 8.06) then {
send-to-error true;
}
}
}
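The rulebook comment mentions forward chaining. The Python sketch below simulates it for this rulebook on one record. It makes several assumptions:

* After any rule fires, matching restarts from the first rule, and rules are tried in the order they are written.
* Each rule fires at most once per record.
* `filter-row-if-true` and `send-to-error` stop processing.
* Lower-casing stands in for `cleanse-column-names`.
* An empty CSV cell counts as not present.

This order-based conflict resolution is an assumption and is not a description of DRE's inference engine. The sketch shows how one rule's output enables another. `parse-as-csv` creates `sex`, which lets `rename-sex-to-gender` fire, and that in turn lets `single-character-gender` fire.

```python
import csv
from typing import Any, Callable, Dict, List, Optional, Tuple

Record = Dict[str, Any]
Action = Callable[[Record], Optional[str]]  # returns "filtered"/"error" to stop, else None

COLUMNS = ["offset", "PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
           "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked"]


def parse_body(r: Record) -> None:
    """parse-as-csv body ',' false; drop body; set columns ...; cleanse-column-names."""
    fields = next(csv.reader([r.pop("body")]))
    values = [r.pop("offset")] + fields
    for name, value in zip(COLUMNS, values):
        r[name.lower()] = value  # lower-casing approximates cleanse-column-names


def rename_sex(r: Record) -> None:
    r["gender"] = r.pop("sex")  # rename sex gender


def shorten_gender(r: Record) -> None:
    r["gender"] = r["gender"][:1].upper()  # cut-character 1-1, then uppercase


RULES: List[Tuple[str, Callable[[Record], bool], Action]] = [
    ("remove-first-line", lambda r: "offset" in r and int(r["offset"]) == 0,
     lambda r: "filtered"),
    ("parse-as-csv", lambda r: "body" in r, parse_body),
    ("rename-sex-to-gender", lambda r: "sex" in r, rename_sex),
    ("single-character-gender", lambda r: len(r.get("gender", "")) > 1, shorten_gender),
    ("missing-age", lambda r: r.get("age", "") == "", lambda r: "error"),
    ("remove-fare-less-than-8.06",
     lambda r: r.get("fare", "") != "" and float(r["fare"]) < 8.06,
     lambda r: "error"),
]


def run(record: Record) -> Tuple[str, List[str]]:
    """Forward-chaining sketch: after a rule fires, re-scan from the first rule."""
    fired: List[str] = []
    while True:
        for name, when, then in RULES:
            if name not in fired and when(record):
                fired.append(name)
                outcome = then(record)
                if outcome is not None:
                    return outcome, fired  # filtered or sent to error
                break  # restart matching from the top
        else:
            return "output", fired  # no rule matched: the record is done


row = {"offset": "1",
       "body": '2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",'
               'female,38,1,0,PC 17599,71.2833,C85,C'}
status, fired = run(row)
print(status, fired)  # output ['parse-as-csv', 'rename-sex-to-gender', 'single-character-gender']
print(row["gender"], row["fare"])  # F 71.2833
```

Under these assumptions, the CSV header row (offset 0) is dropped as soon as `remove-first-line` fires, before any other rule is tried.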
Cask DRE: A business rule engine for your big data needs.