layout | title | spec |
---|---|---|
page | bandicoot - specification | true |
## Keywords
Here are the keywords which are currently in use:
extend | fn | int | join | long | minus |
project | real | rename | return | select | string |
summary | time | type | union | var | void |
The time keyword currently has no meaning and is reserved for a future release.
A program is defined in a single source file. The file is evaluated from top to bottom in one pass (similar to the C language). The top-level elements of the program can be of the following types:
- relational type declarations
- relational variable declarations
- function declarations
By convention, Bandicoot source files use the .b extension.
Primitive types are scalar types and are used for attributes within relations, as well as input parameters for functions. There are four types available:
Type | Size | Description |
---|---|---|
int | 32-bit | signed integer |
long | 64-bit | signed integer |
real | 64-bit | IEEE 754 double precision |
string | 0-1024 bytes | UTF-8 encoded string |
The primitive types are referenced within this specification as PrimitiveType.
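The limits in the table above can be checked mechanically. The following is a minimal Python sketch, not part of Bandicoot itself: the helper name `fits` and the decision to measure strings by their UTF-8 encoded length are assumptions made for illustration.

```python
# Hypothetical helper: the bounds come from the table above, the names do not.
LIMITS = {
    "int": (-2**31, 2**31 - 1),    # 32-bit signed integer
    "long": (-2**63, 2**63 - 1),   # 64-bit signed integer
}
MAX_STRING_BYTES = 1024


def fits(type_name, value):
    """Return True if a Python value is representable in the given primitive type."""
    if type_name in LIMITS:
        low, high = LIMITS[type_name]
        is_integer = isinstance(value, int) and not isinstance(value, bool)
        return is_integer and low <= value <= high
    if type_name == "real":
        # Python floats are IEEE 754 doubles on common platforms.
        return isinstance(value, float)
    if type_name == "string":
        # Assumption: the 1024-byte limit applies to the UTF-8 encoded form.
        return isinstance(value, str) and len(value.encode("utf-8")) <= MAX_STRING_BYTES
    raise ValueError("unknown primitive type: " + type_name)
```

For example, `fits("int", 2**31)` is false while `fits("long", 2**31)` is true, which is the kind of case where an explicit conversion to long would be needed.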
Bandicoot is a strongly-typed language and converting a primitive expression of a given type into another type must be explicit. The current version of Bandicoot supports only conversion from one numeric type to another. There is no support for conversion between strings and numbers. The following syntax forms are supported:
(int PrimitiveExpr) (real PrimitiveExpr) (long PrimitiveExpr)
Here is the regular expression defining an identifier: [_a-zA-Z0-9]+. The maximum identifier length is 32 characters. The following names are used within this specification to refer to identifiers:
- TypeName
- AttrName
- VarName
- ParamName
- FuncName
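The identifier rule above can be expressed as a short check. This is a Python sketch for illustration only; the function name `is_identifier` is hypothetical, and the pattern is applied exactly as written, so names starting with a digit are accepted.

```python
import re

IDENTIFIER = re.compile(r"[_a-zA-Z0-9]+")  # pattern taken verbatim from the text above
MAX_LENGTH = 32


def is_identifier(name):
    """Return True if name matches the identifier pattern and the length limit."""
    return len(name) <= MAX_LENGTH and IDENTIFIER.fullmatch(name) is not None
```

With this check, `shelf_1` is accepted, while `book-title`, the empty string, and any name longer than 32 characters are rejected.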
## Relational Types
There are two ways to declare a relational type: named and inline. Named declarations give an identifier to some particular type so that it can be referenced in the code later. Inline (or anonymous) declarations are useful when the type is used only once (e.g. as an input or output function parameter).
A named type is declared as follows:
type TypeName { AttrName PrimitiveType [,] [more attributes] }
and an inline type as follows:
{ AttrName PrimitiveType [,] [more attributes] }
The relational types (both inline and named) are referenced within this specification as RelType.
## Relational Variables
Relational variables are used for keeping the program state. The system provides two types of variables:
- global variables
- local (temporary) variables
A global variable named VarName is declared as follows:
var VarName RelType ;
The relational variables are referenced within this specification as RelVar.
Functions are identified by names which must be unique across the whole program source file. A function can make complex state transformations on top of the global variables (see Transactions section).
fn FuncName ( FuncArgs ) FuncReturn { FuncBody }
FuncArgs can contain only one relational argument and several arguments of a primitive type, all separated by commas. Each argument has the following structure:
ArgName "RelType | PrimitiveType"
FuncReturn defines the result type of a function. It is either a relational type or, when the function returns no result, the keyword void:
RelType | void
The function body (FuncBody) is a list of statements evaluated from top to bottom. The statements are separated by semicolons (";"). Statements can be of three types:
- global variable assignment
VarName = RelExpr ;
- temporary variable declaration and assignment
var VarName = RelExpr ;
- return statement (only if a function declares its output type)
return RelExpr ;
A function cannot call another function. Also, only one assignment per global relational variable is possible within a function body. After the assignment, the global variable can no longer be accessed within the same function. This is a temporary limitation and you can work around it with the help of temporary variables.
## Relational Operators and Expressions
Bandicoot implements 8 relational operators which provide rich data manipulation features. Some of the operators are binary (take 2 relations as input) and some are unary (take 1 relation as input). Apart from the relational inputs, these operators usually take an additional argument specific to the operator. Every operator returns a new relation and does not modify the inputs. The language provides these operators as functions with arguments:
OperatorName (arg1) (arg2) ... (argN)
The brackets around the arguments are mandatory only if the argument is an operator with at least one argument.
Every relational variable (global or local) is an operator as well and returns the value of the variable. The operators are the main building blocks in the language. Complex relational expressions (RelExpr) can be created by nesting the relational operators to compute the desired results.
### Rename
rename ToAttrName = FromAttrName [,] [more attributes] RelExpr
This operator creates a new relation with the specified attributes renamed; the relational body (tuples) does not change.
### Project
project AttrName [,] [more attributes] RelExpr
The result contains only the attributes defined as the first argument. It can have fewer tuples than the input because duplicate values are removed.
### Extend
extend AttrName = PrimitiveExpr [,] [more attributes] RelExpr
The operator adds the attributes defined as the first argument to each tuple of the input relation. The values are computed by primitive expressions.
### Select
select BooleanExpr RelExpr
The result contains only those tuples of the input relation which match the boolean expression defined as the first argument.
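The behaviour of extend and select can be illustrated with a small Python model. This is a sketch under stated assumptions, not Bandicoot code: a relation body is modelled as a set of tuples, each tuple a frozenset of (attribute, value) pairs, and Python functions stand in for PrimitiveExpr and BooleanExpr. All names (`relation`, `extend`, `select`) are hypothetical.

```python
def relation(*tuples):
    """Model a relation body as a set of tuples (frozensets of attribute/value pairs)."""
    return {frozenset(t.items()) for t in tuples}


def extend(new_attrs, rel):
    """Add computed attributes to every tuple; new_attrs maps a name to a function of the tuple."""
    result = set()
    for t in rel:
        row = dict(t)
        row.update({name: expr(dict(t)) for name, expr in new_attrs.items()})
        result.add(frozenset(row.items()))
    return result


def select(predicate, rel):
    """Keep only the tuples for which the predicate holds."""
    return {t for t in rel if predicate(dict(t))}


items = relation(
    {"name": "pen", "price": 2, "qty": 3},
    {"name": "ink", "price": 5, "qty": 1},
)
priced = extend({"total": lambda t: t["price"] * t["qty"]}, items)
large = select(lambda t: t["total"] > 5, priced)
```

Both functions build a new set and leave `items` untouched, mirroring the rule that an operator returns a new relation and does not modify its inputs.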
### Union
union RelExpr RelExpr
or
RelExpr + RelExpr
The union operator creates a new relation consisting of the tuples of both input relations, with duplicate tuples removed. Both inputs need to have the same attributes.
### Minus
minus RelExpr RelExpr
or
RelExpr - RelExpr
Removes tuples from the first input which match tuples in the second input. The matching logic is an equality on the common attributes.
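The union and minus rules described above can be modelled in a few lines of Python. This is an illustrative sketch, not Bandicoot code: the set-of-frozensets representation and the function names are assumptions. The text does not say what minus does when the inputs share no attributes; in this model every pair of tuples then counts as a match.

```python
def relation(*tuples):
    """Model a relation body as a set of tuples (frozensets of attribute/value pairs)."""
    return {frozenset(t.items()) for t in tuples}


def heads(rel):
    """Distinct attribute-name sets found in the tuples of a relation body."""
    return {frozenset(attr for attr, _ in t) for t in rel}


def union(left, right):
    """All tuples of both inputs; the set representation removes duplicates."""
    if len(heads(left) | heads(right)) > 1:
        raise TypeError("both inputs need to have the same attributes")
    return left | right


def minus(left, right):
    """Tuples of left that match no tuple of right on the common attributes."""
    def matches(l, r):
        l, r = dict(l), dict(r)
        return all(l[a] == r[a] for a in l.keys() & r.keys())
    return {l for l in left if not any(matches(l, r) for r in right)}


orders = relation({"id": 1, "item": "pen"}, {"id": 2, "item": "ink"})
shipped = relation({"id": 2})
pending = minus(orders, shipped)
```

Here `shipped` has only the `id` attribute, so the order with `id` 2 is removed even though `shipped` carries no `item` value.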
### Natural Join
join RelExpr RelExpr
or
RelExpr * RelExpr
This operator creates a result where the tuples are combinations of matching tuples from both input relations. The matching logic is an equality on the common attributes. If there are no common attributes the result is a cartesian join (i.e. every tuple from the first input matches every tuple in the second input). All the attributes from the input relations are present in the result.
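The matching rule for the natural join, including the cartesian case, can be sketched in Python as follows. The representation (a set of frozensets of attribute/value pairs) and the function names are assumptions made for this illustration; this is a model of the described behaviour, not the Bandicoot implementation.

```python
def relation(*tuples):
    """Model a relation body as a set of tuples (frozensets of attribute/value pairs)."""
    return {frozenset(t.items()) for t in tuples}


def join(left, right):
    """Combine every pair of tuples that agree on all common attributes."""
    result = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            # With no common attributes the condition is trivially true,
            # which yields the cartesian join.
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                result.add(frozenset({**ld, **rd}.items()))
    return result


books = relation(
    {"title": "Dune", "author_id": 1},
    {"title": "Emma", "author_id": 2},
)
authors = relation({"author_id": 1, "name": "Herbert"})
written_by = join(books, authors)
```

Only the tuple with `author_id` 1 has a partner in `authors`, so `written_by` holds a single tuple carrying `title`, `author_id` and `name`.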
### Summary
#### Unary version
summary AttrName = SumFunc [,] [more attributes] RelExpr
#### Binary version
summary AttrName = SumFunc [,] [more attributes] RelExpr RelExpr
Both the unary and binary versions of the summary operator produce tuples containing summary data grouped according to the specified attributes. In the unary version, the grouping is done by a virtual relation with zero attributes and therefore the result contains up to one tuple. The binary version creates the groups according to all the attributes of the second relation.
The result type is expressed as an extension of the empty relation (unary summary) or of the rightmost relation (binary summary). Each added attribute is computed by a specified summary function (SumFunc). Here is a list of currently supported functions:
- add - sum up the values of an attribute
(add AttrName DefVal)
- avg - average of values of an attribute
(avg AttrName DefVal)
- cnt - count the number of tuples
(cnt)
- max - maximum value of an attribute
(max AttrName DefVal)
- min - minimum value of an attribute
(min AttrName DefVal)
DefVal is a constant expression. Its type should match the type of the result and of the attribute. The exception is the avg function, where the default value and the result are always real numbers. DefVal is used when the RelExpr body is empty. In the binary summary this can happen when there is no matching tuple in the left RelExpr for a tuple in the right RelExpr.
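The grouping and default-value rules of summary can be illustrated with a small Python model. This is a sketch under assumptions: relations are plain lists of dicts, the attributes of the right relation are assumed to exist in the left one, and summary functions are written as tuples such as `("add", "amount", 0)` or `("cnt",)`. In this model the unary form always yields exactly one tuple. None of these names belong to Bandicoot.

```python
def aggregate(func, values, default):
    """Apply one summary function; the default is used when there are no values."""
    if not values:
        return default
    if func == "add":
        return sum(values)
    if func == "avg":
        return sum(values) / len(values)
    if func == "max":
        return max(values)
    if func == "min":
        return min(values)
    raise ValueError("unknown summary function: " + func)


def summary(funcs, left, right=None):
    """Unary form when right is None, binary form otherwise."""
    # Unary: group by a single virtual tuple with zero attributes.
    groups = [{}] if right is None else right
    result = []
    for group in groups:
        members = [t for t in left if all(t[a] == v for a, v in group.items())]
        row = dict(group)
        for name, spec in funcs.items():
            if spec[0] == "cnt":
                row[name] = len(members)
            else:
                func, attr, default = spec
                row[name] = aggregate(func, [t[attr] for t in members], default)
        result.append(row)
    return result


sales = [
    {"shop": "north", "amount": 10},
    {"shop": "north", "amount": 30},
    {"shop": "south", "amount": 5},
]
shops = [{"shop": "north"}, {"shop": "south"}, {"shop": "west"}]
per_shop = summary({"total": ("add", "amount", 0), "n": ("cnt",)}, sales, shops)
overall = summary({"mean": ("avg", "amount", 0.0)}, sales)
```

The `west` shop has no matching tuple in `sales`, so its `total` takes the default value 0 and its count is 0.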
## Transactions
Each invocation of a function implicitly creates a transaction. All the statements within a function are part of the same transaction. There are no explicit keywords to commit or rollback a transaction. If there is an error the rollback is performed automatically and an error code is returned to the client.
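The all-or-nothing behaviour described above can be modelled in Python by running each function against a private copy of the global variables. This is a conceptual sketch only: the name `invoke`, the dict used for the program state, and the use of the exception message as the error value are assumptions, and nothing here reflects how Bandicoot implements transactions internally.

```python
import copy


def invoke(state, function):
    """Run function against a copy of the global variables.

    Returns (state_after, result, error). On error the original state is returned.
    """
    working = copy.deepcopy(state)
    try:
        result = function(working)
    except Exception as exc:
        # Rollback: none of the assignments made so far become visible.
        return state, None, str(exc)
    # Commit: all assignments become visible together.
    return working, result, None


def add_book(variables):
    variables["shelf"] = variables["shelf"] | {"Emma"}


def broken(variables):
    variables["shelf"] = set()
    raise RuntimeError("boom")


state = {"shelf": {"Dune"}}
after_add, _, add_error = invoke(state, add_book)
after_broken, _, broken_error = invoke(state, broken)
```

The failing function clears `shelf` before raising, yet the state returned by `invoke` still contains the original value, which is the effect of the automatic rollback.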
Two transactions are not allowed to modify a global variable at the same time. Therefore two functions which modify the same variable are serialized and executed one after the other. Read-only functions are executed in parallel with other read/write functions.
The isolation level is always serializable. This means that if a function reads the same variable several times, it always gets the same data, even if the variable is modified by a different function at the same time.
The Bandicoot API is based on the HTTP/1.1 protocol. The interface exposes all the functions defined in a program source file through http://server:port/FuncName URLs. The HTTP POST method must be used to invoke a function with an input parameter. Otherwise the HTTP GET is required.
Both input and output parameters are exchanged in "comma separated values" format. The tuples are delimited with the \n end-of-line character. The first line is a relational head definition in the following format:
AttrName PrimitiveType [,][more attributes]
A comma or an end-of-line character can be escaped with the \ character. Bandicoot then does not interpret the character as a delimiter and treats it as part of your data.
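A client can prepare a request body and choose the HTTP method along the lines described above. The Python sketch below only builds the request and does not send it. Several details are assumptions rather than documented behaviour: the helper names, the trailing end-of-line after the last tuple, the absence of any extra headers, and the example host, port and function names (`localhost:12345`, `Append`, `List`).

```python
import urllib.request


def escape(value):
    """Prefix commas and end-of-line characters with a backslash."""
    return str(value).replace(",", "\\,").replace("\n", "\\\n")


def encode(head, tuples):
    """Format a relation: head is a list of (AttrName, PrimitiveType), tuples a list of dicts."""
    lines = [",".join(name + " " + ptype for name, ptype in head)]
    for t in tuples:
        lines.append(",".join(escape(t[name]) for name, _ in head))
    # Assumption: the last tuple is also terminated by an end-of-line character.
    return "\n".join(lines) + "\n"


def build_request(base_url, func_name, body=None):
    """POST when the function takes an input parameter, GET otherwise."""
    url = base_url + "/" + func_name
    if body is None:
        return urllib.request.Request(url, method="GET")
    return urllib.request.Request(url, data=body.encode("utf-8"), method="POST")


body = encode(
    [("title", "string"), ("pages", "int")],
    [{"title": "Dune, Part One", "pages": 412}],
)
append_request = build_request("http://localhost:12345", "Append", body)
list_request = build_request("http://localhost:12345", "List")
```

The comma inside the title is written as `\,` in the body, so it is carried as data instead of separating two values.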