An Apache Mesos Framework for processing messages using Kafka
Brigade is an Apache Mesos framework that monitors Apache Kafka topics and performs message processing in Docker. Brigade is configured with a set of "processors". Each processor has an input topic, a Docker image that performs the processing, and an optional output topic where the processing results are stored. Brigade is humorously named after the old-time bucket brigades that were used to put out fires and move dirt. This project fills the need for a Mesos-native framework for simple message processing; for more complex workflows, Apache Storm and Apache Spark will always be better candidates.
Author's Note: Please be patient with me. This is my first Mesos Framework and I am learning the particulars of it as I go.
The Brigade General (also called the Meta Scheduler) is a top-level process that is expected to be run in Marathon. It runs a Brigade Process Scheduler for each processor in the configuration. As the configuration changes, the Brigade General adapts, starting or stopping each Brigade Process Scheduler as necessary. The Brigade General is an optional component; you can instead configure each of the Process Schedulers independently. Currently the Brigade General is controlled by a properties file. Inside the properties file there is a PROCESSOR_DIR property that can be set to either a file or a directory. The Brigade General checks the directory or file for changes every n seconds (the right interval is still an open question). It does not cache any of the jobs but reads them directly from the Marathon API.
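The change-detection loop described above could be sketched as follows. This is an illustrative sketch only (it handles the directory case, assumes processor files end in `.json`, and uses last-modified timestamps; the class and method names are not from the actual codebase):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

// Hypothetical sketch of the Brigade General's change detection:
// snapshot the last-modified times of processor files under PROCESSOR_DIR
// and report which files are new or modified since the previous check.
public class ProcessorDirWatcher {
    private Map<Path, Long> lastSeen = new HashMap<>();

    // Returns the set of processor files that changed since the last call.
    public Set<Path> changedFiles(Path processorDir) throws IOException {
        Map<Path, Long> current = new HashMap<>();
        try (DirectoryStream<Path> files =
                 Files.newDirectoryStream(processorDir, "*.json")) {
            for (Path p : files) {
                current.put(p, Files.getLastModifiedTime(p).toMillis());
            }
        }
        Set<Path> changed = new HashSet<>();
        for (Map.Entry<Path, Long> e : current.entrySet()) {
            Long previous = lastSeen.get(e.getKey());
            if (previous == null || !previous.equals(e.getValue())) {
                changed.add(e.getKey());
            }
        }
        lastSeen = current;
        return changed;
    }
}
```

A poll loop would call `changedFiles` every n seconds and start or stop the corresponding Process Schedulers for each changed file.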
The Process Scheduler is responsible for handling a single processor configuration. The Process Scheduler starts an Apache Kafka consumer to monitor an input topic. For each message on the input topic, the scheduler attempts to schedule a task in Apache Mesos, using the Netflix Fenzo library for task placement. The message and the processor configuration are sent as information in the task for the Brigade Executor to use. When the scheduler receives a TASK_FINISHED status, it retrieves the output of the Docker processor (part of the TaskStatus). The scheduler then uses an Apache Kafka producer to place the output message on the processor's "output" topic. With this approach it is very simple to string processors together into a processing chain.
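The chaining idea is just topic wiring: one processor's "output" topic is the next processor's "input" topic. A small hypothetical helper (not part of the codebase) makes the invariant concrete:

```java
import java.util.*;

// Illustrative helper: processors chain when each processor's output topic
// is the next processor's input topic. Each String[] is {input, output}.
public class ChainCheck {
    public static boolean isChain(List<String[]> processors) {
        for (int i = 0; i + 1 < processors.size(); i++) {
            String output = processors.get(i)[1];
            String nextInput = processors.get(i + 1)[0];
            if (!output.equals(nextInput)) {
                return false;
            }
        }
        return true;
    }
}
```

For example, a processor reading "raw" and writing "clean" chains into one reading "clean" and writing "enriched".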
The processor is specified in JSON. The schema for it is:
{
"name" : "PROCESSOR NAME",
"input" : "INPUT TOPIC NAME",
"output": "OUTPUT TOPIC NAME",
"error" : "ERROR TOPIC NAME"
"docker": "DOCKER IMAGE NAME",
"mem" : "MEMORY TO USE",
"cpus" : "CPUS TO USE",
"volumes": [
{ "host-path" : "PATH ON HOST", "container-path" : "PATH ON CONTAINER", "mode" : "RO" }
],
"env" : ["FOO=BAR", "A=B"]
}
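For example, a hypothetical processor (all names, image, and sizes below are purely illustrative) might look like:

```json
{
  "name"  : "log-cleaner",
  "input" : "raw-logs",
  "output": "clean-logs",
  "error" : "log-cleaner-errors",
  "docker": "example/log-cleaner:latest",
  "mem"   : "512",
  "cpus"  : "0.5",
  "volumes": [
    { "host-path" : "/var/brigade/geoip", "container-path" : "/data/geoip", "mode" : "RO" }
  ],
  "env" : ["LOG_LEVEL=INFO"]
}
```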
A sample configuration file is included. Note: the long-term plan is to use the Apache Curator library to store each processor's configuration in ZooKeeper.
The Brigade Executor is a simple executor that runs the Docker image specified for a processor. The executor expects the processor configuration to be stored as a Label in the TaskInfo proto under the key "processor-configuration", and the input message from Kafka to be stored as a Label under the key "input-message". The executor captures the standard output of the processor and returns it in the TaskStatus proto using the "Data" element.
The Docker image that is used to process a message is expected to follow a twelve-factor application approach. Most importantly, the input message is written to the container's standard input, and the container is expected to read that input and generate its output on standard output.
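The stdin/stdout contract can be sketched in plain Java (this is not the actual Brigade Executor code; it just shows the pattern of feeding a message to a child process and capturing its standard output, with the command line standing in for `docker run ...`):

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Illustrative sketch of the processor contract: write the input message
// to the process's standard input, capture its standard output as the result.
public class StdioRunner {
    public static String run(String[] command, String inputMessage)
            throws IOException, InterruptedException {
        Process proc = new ProcessBuilder(command).start();

        // Feed the input message on stdin, then close it so the process sees EOF.
        try (OutputStream in = proc.getOutputStream()) {
            in.write(inputMessage.getBytes(StandardCharsets.UTF_8));
        }

        // Drain stdout; this becomes the output message for the output topic.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream stdout = proc.getInputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = stdout.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        proc.waitFor();
        return out.toString("UTF-8");
    }
}
```

A processor whose container simply ran `cat` would behave as an identity transform under this contract.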
Current instructions. Apache Mesos and Apache Kafka must already be running.
- Build the Executor - Build the executor and place it either in the same location on all of the Mesos Agents or on a commonly shared drive.
- For each Processor - Run the "Framework" class with two arguments: the processor name and the configuration JSON file.
Gradle / Maven is not yet set up, so until then please gather the necessary dependencies manually:
- fenzo-core-0.8.6.jar
- jackson-annotations-2.7.3.jar
- jackson-core-2.7.3.jar
- jackson-databind-2.7.3.jar
- json-20160212.jar
- kafka-clients-0.9.0.1-sources.jar
- kafka-clients-0.9.0.1.jar
- mesos-0.28.0.jar
- protobuf-java-2.6.1.jar
- slf4j-1.7.20.tar.gz
- slf4j-api-1.7.20.jar
- slf4j-jdk14-1.7.20.jar
- slf4j-simple-1.7.20.jar
- jersey-bundle-1.19.1.jar
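Until the build is wired up, a build.gradle along these lines could pull most of the jars above from Maven Central. This is a sketch only; the group IDs are assumed from the usual Maven Central coordinates for these artifacts:

```groovy
// Sketch of a build.gradle matching the jar versions listed above.
apply plugin: 'java'

repositories {
    mavenCentral()
}

dependencies {
    compile 'com.netflix.fenzo:fenzo-core:0.8.6'
    compile 'com.fasterxml.jackson.core:jackson-databind:2.7.3'  // pulls annotations and core
    compile 'org.json:json:20160212'
    compile 'org.apache.kafka:kafka-clients:0.9.0.1'
    compile 'org.apache.mesos:mesos:0.28.0'
    compile 'com.google.protobuf:protobuf-java:2.6.1'
    compile 'org.slf4j:slf4j-api:1.7.20'
    compile 'org.slf4j:slf4j-jdk14:1.7.20'
    compile 'com.sun.jersey:jersey-bundle:1.19.1'
}
```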
LOTS TO DO
- Get Gradle or Maven working to pull in the dependencies
- Task Reconciliation
- Failure modes
- Explore more robust tracking model
- Calculate metrics
- Implement Health Checks
- Integrate FluentD
- Handle hard and soft constraints
- Complete docker options supported and update JSON Configuration
- Consider having the executor send the output message to Kafka
- Figure out how to run the executor as a docker
- Look at the docker remote API
- REST API to get and update configuration
  - We need one!
  - Configuration Interface (read / write)
  - Metrics / Tracking Interface