Before starting, please review the agogosml design.
- Make sure to run bash (Linux/MacOS) or WSL
- Install azure-cli
- Python 3.7
- Optional: Terraform to provision Azure resources such as AKS and EventHub
- Docker
- If you are on Linux/MacOS, please install GCC, Make, CMake and other relevant Python C Extension building tools.
The generated folder structure consists of the input reader, customer app and output writer as well as the Azure DevOps pipelines for CI/CD.
When you generate the project, it includes starter code for your application in either Python or Scala. To generate the project with a Scala application, run agogosml generate --app-base mleap. The default --app-base is simple, which scaffolds a Python application to build on.
These base applications implement lightweight HTTP services that accept POST requests from the messaging service and send the data via POST request to the output messaging service. This application is where you load a model or apply any desired transformation to the data. For instance, the Scala application loads a sample Spark model using MLeap and runs the incoming data through this model.
You can find an example application here.
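To make the request flow described above concrete, here is a minimal sketch of such a service using only the Python standard library. It is not the generated starter code: the environment variable names OUTPUT_URL and APP_PORT, the default port and URL, and the transform function are all assumptions made for illustration.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request

# Hypothetical settings: these names and defaults are assumptions for this
# sketch, not the configuration the generated scaffold actually uses.
OUTPUT_URL = os.environ.get("OUTPUT_URL", "http://localhost:8080")
APP_PORT = int(os.environ.get("APP_PORT", "5000"))


def transform(record: dict) -> dict:
    """Placeholder for model scoring or any other data transformation."""
    return {**record, "processed": True}


def forward(record: dict, url: str = OUTPUT_URL) -> int:
    """POST the record as JSON to the output service and return the HTTP status."""
    body = json.dumps(record).encode("utf-8")
    req = request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=10) as resp:
        return resp.status


class AppHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            record = json.loads(self.rfile.read(length))
            if not isinstance(record, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Covers malformed JSON, bad encoding, and non-object payloads.
            self.send_error(400, "Body must be a JSON object")
            return
        try:
            forward(transform(record))
        except OSError:
            # urllib's URLError and HTTPError are subclasses of OSError.
            self.send_error(502, "Output service unavailable")
            return
        self.send_response(200)
        self.end_headers()


def run() -> None:
    """Start the service; blocks until interrupted."""
    HTTPServer(("0.0.0.0", APP_PORT), AppHandler).serve_forever()
```

Calling run() starts the service. In a real application, transform is the place where a model would be loaded once at startup and applied to each incoming record.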
As you customize the base application that your data will run through, you should test continuously. You can find documentation and scripts to build and test this pipeline in the end-to-end testing folder, e2e/. You will build using the script dockerbuild.sh and run a docker-compose file that spins up the pipeline, along with a test generator.
This pipeline can be deployed to Azure using the Terraform plans we provide. Please refer to our documentation for details on what to create and how.
The Azure DevOps pipelines defined in the YAML files ./agogosml/azure-pipelines.yaml and ./agogosml_cli/azure-pipelines.yaml depend on a Build Pipeline Variable group that must contain the following variables:
- container_registry - this is the container registry DNS name
- subscription_endpoint_name - this is a string that is defined in Azure DevOps Project Settings - Service Connections.
See Azure DevOps Variable Groups
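As a quick sanity check before running the pipelines, the two required variables can be verified against whatever values you have collected. The sketch below is illustrative only: it does not call Azure DevOps, and the helper name missing_variables and the sample registry value are hypothetical.

```python
# The two variable names come from the list above; everything else in this
# sketch (function name, sample values) is hypothetical.
REQUIRED_VARIABLES = ("container_registry", "subscription_endpoint_name")


def missing_variables(variables: dict) -> list:
    """Return the required variable names that are absent or blank."""
    return [
        name
        for name in REQUIRED_VARIABLES
        if not str(variables.get(name) or "").strip()
    ]


# Example: only the registry has been filled in (placeholder value).
group = {"container_registry": "example.azurecr.io"}
print(missing_variables(group))
```

An empty list means both variables are present and non-blank.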
The CLI and scaffolding tool (agogosml_cli) was developed to help the data engineer scaffold a project using agogosml and to generate sample code, dependencies and configuration files. agogosml_cli also provides commands to update the dependencies of the generated scaffold to the latest agogosml version, which helps the data engineer keep their project up to date.
agogosml command [OPTIONS]
The data engineer installs the agogosml_cli and runs agogosml init to generate a manifest.json file. The data engineer will then modify the manifest.json and add their configuration files. The data engineer runs agogosml generate to generate the agogosml project. The generated scaffold will include the following files:
- .env - This file will be read by the Pipfile and contains an initial list of key= entries for you to fill out.
- manifest.json - This file is the configuration file for agogosml_cli.
- cicd-pipeline.yml - This YAML file will contain the Azure DevOps CI/CD pipeline for an agogosml project.
- data-pipeline.yml - This YAML file will contain the Azure DevOps data pipeline for an agogosml project.
- <YourApplicationName>/ - This folder is where you will develop your custom application. Within it, we provide starter code, which contains a simple data transformation app that demonstrates how to read from the InputReader and write to the OutputWriter data pipeline components. We provide either a simple Python starter project or a Scala project that loads an ML model using mleap. Specify which base you want to use by adding a flag: agogosml generate --app-base BASE.
- e2e/ - This is a directory containing end-to-end integration tests for your data pipeline. Please refer to the README.md in this folder.
- deployment/helm_chart - Helm charts for deployment to Kubernetes.
- deployment/terraform - Terraform plans to scaffold the necessary elements of your project.
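Because the generated .env file starts out with keys whose values are still empty, it can be handy to list which ones remain unfilled. The following is a minimal sketch that assumes plain KEY=value lines with optional # comments; the key names in the sample are made up and do not reflect the actual generated file.

```python
def unfilled_keys(env_text: str) -> list:
    """Return the keys in .env-style text whose value is still empty."""
    missing = []
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if not value.strip():
            missing.append(key.strip())
    return missing


# Hypothetical sample content; real key names come from the generated .env.
sample = """
# example settings
EXAMPLE_HOST=localhost
EXAMPLE_SECRET=
"""
print(unfilled_keys(sample))  # → ['EXAMPLE_SECRET']
```

To check a real file, read it first, for example with pathlib.Path(".env").read_text(), and pass the result to unfilled_keys.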
agogosml init [--force|-f] <folder>
agogosml init <folder> will generate a manifest file that contains all the configuration variables for an agogosml project. <folder> is the folder you would like to use for your agogosml project.
agogosml generate
agogosml generate <folder>
agogosml generate [--config|-c]
agogosml generate [--config|-c] <folder>
alias: agogosml g
agogosml generate will generate a scaffold of an agogosml project based on a manifest file, which is taken from the current folder, from the target folder, or from the path specified by --config.
agogosml update
agogosml update <folder>
agogosml update will update a scaffolded agogosml project. It will update the agogosml dependencies to the latest version.