Starting with version 0.2.0, Pipeline also supports managed Kubernetes clusters on Azure (AKS).

For simplicity, the steps are presented through a concrete example: hooking a Spark application into a CI/CD workflow that runs it on managed Kubernetes (AKS/Azure).

Getting Started

It is assumed that the source code of the Spark application is hosted on GitHub.

The Pipeline Control Plane, which takes care of creating a Kubernetes cluster on the desired cloud provider and executing the steps of the CI/CD flow, can be hosted on either AWS or Azure. See below for how to launch the Pipeline Control Plane on each.

General prerequisites

  1. Account on GitHub
  2. Repository on GitHub for the Spark application source code

Hosting the control plane on AWS

Hosting the Pipeline Control Plane and creating Kubernetes clusters on AWS requires:

  1. AWS account
  2. AWS EC2 key pair

Hosting the control plane on Azure

Hosting the Pipeline Control Plane and creating Kubernetes clusters on Azure requires:

  1. Azure subscription with AKS service enabled.
  2. Obtain a Client Id, Client Secret and Tenant Id for a Microsoft Azure Active Directory service principal. This information can be retrieved from the portal, but the easiest and fastest way is the Azure CLI:

```bash
$ curl -L https://aka.ms/InstallAzureCli | bash
$ exec -l $SHELL
$ az login
$ az ad sp create-for-rbac   # creates the service principal whose credentials are shown below
```

You should get something like:

```json
{
  "appId": "1234567-1234-1234-1234-1234567890ab",
  "displayName": "azure-cli-2017-08-18-19-25-59",
  "name": "http://azure-cli-2017-08-18-19-25-59",
  "password": "1234567-1234-1234-be18-1234567890ab",
  "tenant": "7654321-1234-1234-ee18-9876543210ab"
}
```
  • appId is the Azure Client Id
  • password is the Azure Client Secret
  • tenant is the Azure Tenant Id

To get your Azure Subscription Id, run:

```bash
$ az account show --query id
```
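
Putting it all together, here is a small sketch that collects all four values into shell variables. The service principal name pipeline-sp is an arbitrary example, and jq is assumed to be installed for parsing the JSON output:

```bash
# Subscription and tenant come from the logged-in account.
AZURE_SUBSCRIPTION_ID=$(az account show --query id --output tsv)
AZURE_TENANT_ID=$(az account show --query tenantId --output tsv)

# Create the service principal and capture its credentials.
SP=$(az ad sp create-for-rbac --name pipeline-sp --output json)
AZURE_CLIENT_ID=$(echo "$SP" | jq -r .appId)
AZURE_CLIENT_SECRET=$(echo "$SP" | jq -r .password)

echo "ClientId=$AZURE_CLIENT_ID TenantId=$AZURE_TENANT_ID SubscriptionId=$AZURE_SUBSCRIPTION_ID"
```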

Register the OAuth application on GitHub

Register an OAuth application on GitHub for the Pipeline CI/CD workflow.

Fill in the Authorization callback URL with a dummy value at this stage; this field will be updated with the Control Plane's IP address or DNS name once the Control Plane is up and running.

Take note of the Client ID and Client Secret as these will be required for launching the Pipeline Control Plane.

Launch Pipeline Control Plane on AWS

The easiest way to run a Pipeline Control Plane on AWS is to use a CloudFormation template.

  • Navigate to: https://eu-west-1.console.aws.amazon.com/cloudformation/home?region=eu-west-1#/stacks/new

  • Select Specify an Amazon S3 template URL and enter the URL of our template: https://s3-eu-west-1.amazonaws.com/cf-templates-grr4ysncvcdl-eu-west-1/2018026em9-new.templatee93ate9mob7

  • Fill in the following fields on the form:

    • Stack name

      • specify a name for the Control Plane deployment

    • AWS Credentials

    • Azure Credentials and Information - needed only for creating Kubernetes clusters on Azure

      • AzureClientId - see how to get Azure Client Id above
      • AzureClientSecret - see how to get Azure Client Secret above
      • AzureSubscriptionId - your Azure Subscription Id
      • AzureTenantId - see how to get the Azure Tenant Id above

    • Control Plane Instance Config

      • InstanceName - name of the EC2 instance that will host the Control Plane
      • ImageId - pick the image id from the README
      • KeyName - specify your AWS EC2 key pair

    • Banzai Pipeline Credentials

      • Pipeline API Password - specify the password for accessing the Pipeline REST API exposed by the Pipeline PaaS. Take note of the user name and password as those will be required when setting the secrets for the GitHub repositories in the CI/CD workflow.

    • Banzai-Ci Credentials

      • Orgs - comma-separated list of GitHub organizations whose members are granted access to the Banzai Cloud Pipeline CI/CD workflow
      • Github Client - GitHub OAuth Client Id
      • Github Secret - GitHub OAuth Client Secret

    • Grafana Dashboard

      • Grafana Dashboard Password - specify the password for accessing the Grafana dashboard, which comes with defaults specific to the application

    • Prometheus Dashboard

      • Prometheus Password - specify the password for accessing the Prometheus instance that collects cluster metrics

    • Advanced Pipeline Options

      • PipelineImageTag - specify 0.2.0 to use the current stable Pipeline release.

    • Slack Credentials

      • This section is optional. Complete it to receive cluster-related alerts through a Slack push notification channel.

    • Alert SMTP Credentials

      • This section is optional. Fill it in to receive cluster-related alerts through email.

  • Finish the wizard to create a Control Plane instance.

  • Take note of the PublicIP of the created stack; we refer to this as the PublicIP of the Control Plane. (The CLI sketch after this list shows one way to retrieve it.)

  • Go back to the GitHub OAuth application created earlier and modify it: set the Authorization callback URL field to http://{control_plane_public_ip}/authorize
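
If you prefer the AWS CLI to the console wizard, the stack can also be created and inspected from a shell. This is a sketch rather than the verbatim procedure: the stack name and parameter values are examples, only a few of the form's parameters are shown, and the output key for the public IP is assumed to be PublicIP:

```bash
# Create the stack from the hosted template; pass the remaining
# parameters the same way as KeyName and PipelineImageTag below.
aws cloudformation create-stack \
  --stack-name pipeline-control-plane \
  --template-url "https://s3-eu-west-1.amazonaws.com/cf-templates-grr4ysncvcdl-eu-west-1/2018026em9-new.templatee93ate9mob7" \
  --parameters ParameterKey=KeyName,ParameterValue=my-ec2-keypair \
               ParameterKey=PipelineImageTag,ParameterValue=0.2.0

# Wait for the deployment to finish, then read the stack outputs
# to find the Control Plane's public IP.
aws cloudformation wait stack-create-complete --stack-name pipeline-control-plane
aws cloudformation describe-stacks \
  --stack-name pipeline-control-plane \
  --query "Stacks[0].Outputs" --output table
```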

Launch Pipeline Control Plane on Azure

The easiest way to run a Pipeline Control Plane on Azure is to deploy it using an ARM template.

  • Navigate to: https://portal.azure.com/#create/Microsoft.Template

  • Click Build your own template in the editor, copy-paste the content of the ARM template into the editor, then click Save

  • Fill in the following fields on the form:

    • Resource group - we recommend creating a new resource group for the deployment, as this makes it easier to clean up all the resources created by the deployment later

    • Specify SSH Public Key

    • SMTP Server Address/User/Password/From

      • This section is optional. Fill it in to receive cluster-related alerts through email.

    • Slack Webhook Url/Channel

      • This section is optional. Complete it to receive cluster-related alerts through a Slack push notification channel.

    • Banzai Pipeline Credentials

      • Pipeline Password - specify the password for accessing the Pipeline REST API exposed by the Pipeline PaaS. Take note of the user name and password as those will be required when setting the secrets for the GitHub repositories in the CI/CD workflow.

    • Prometheus Dashboard

      • Prometheus Password - specify password for accessing Prometheus that collects cluster metrics

    • Grafana Dashboard

      • Grafana Dashboard Password - specify password for accessing Grafana dashboard with defaults specific to the application

    • Banzai-Ci Credentials

      • Orgs - comma-separated list of GitHub organizations whose members are granted access to the Banzai Cloud Pipeline CI/CD workflow
      • Github Client - GitHub OAuth Client Id
      • Github Secret - GitHub OAuth Client Secret

    • Azure Credentials and Information

      • Azure Client Id - see how to get Azure Client Id above
      • Azure Client Secret - see how to get Azure Client Secret above
      • Azure Subscription Id - your Azure Subscription Id
      • Azure Tenant Id - see how to get Azure Tenant Id above

  • Finish the wizard to create a Control Plane instance.

  • Open the Resource Group that was specified for the deployment.

  • Take note of the PublicIP of the deployed Control Plane; the CLI sketch below shows one way to retrieve it.
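
As with AWS, the portal wizard can be replaced by the CLI. The following is a sketch under a few assumptions: the ARM template has been saved locally as arm-template.json, the resource group name and location are examples, and your Azure CLI is recent enough for az deployment group create (older releases use az group deployment create instead):

```bash
# Create a dedicated resource group, which is easy to clean up later.
az group create --name pipeline-rg --location westeurope

# Deploy the ARM template; the CLI prompts for any parameters
# not supplied on the command line.
az deployment group create \
  --resource-group pipeline-rg \
  --template-file arm-template.json

# List the public IPs created in the resource group to find the
# Control Plane's PublicIP.
az network public-ip list --resource-group pipeline-rg \
  --query "[].ipAddress" --output tsv
```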

Define .pipeline.yml pipeline workflow configuration for your Spark application

The steps of the workflow executed by the CI/CD flow are described in the .pipeline.yml file, which must be placed in the root directory of the Spark application's source code. The file has to be pushed into the GitHub repo along with the source files of the application.

There is an example Spark application spark-pi-example that can be used for trying out the CI/CD pipeline.

Note: fork this repository into your own account for this purpose!

To set up your own Spark application for the workflow, you can start from the .pipeline.yml configuration file of spark-pi-example and customize it.

The following sections need to be modified:

  • the command for building your application

    ```yaml
    remote_build:
      ...
      original_commands:
        - mvn clean package -s settings.xml
    ```

  • the Main class of your application

    ```yaml
    run:
      ...
      spark_class: banzaicloud.SparkPi
    ```

  • the name of your application

    ```yaml
    run:
      ...
      spark_app_name: sparkpi
    ```

  • the application artifact

    This is the relative path to the jar of your Spark application, i.e. the jar generated by the build command.

    ```yaml
    run:
      ...
      spark_app_source: target/spark-pi-1.0-SNAPSHOT.jar
    ```

  • the application arguments

    ```yaml
    run:
      ...
      spark_app_args: 1000
    ```
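
Taken together, the customized fragments of spark-pi-example's .pipeline.yml would look roughly like this (the ... placeholders stand for the untouched parts of the file, exactly as in the snippets above):

```yaml
remote_build:
  ...
  original_commands:
    - mvn clean package -s settings.xml

run:
  ...
  spark_class: banzaicloud.SparkPi
  spark_app_name: sparkpi
  spark_app_source: target/spark-pi-1.0-SNAPSHOT.jar
  spark_app_args: 1000
```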

Grant access to desired GitHub organizations

Navigate to http://{control_plane_public_ip} in your web browser and grant access to the organizations that contain the GitHub repositories you want to hook into the CI/CD workflow, then click Authorize access.

All the Pipeline services may take some time to fully initialize, so the page may not load at first. Please give it some time and retry.
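
If you want to script the wait instead of refreshing by hand, a small polling loop works; this is a convenience sketch, not part of the official setup:

```bash
# Retry until the Control Plane starts answering HTTP requests.
until curl -sf -o /dev/null "http://{control_plane_public_ip}"; do
  echo "Control Plane not up yet, retrying in 15s..."
  sleep 15
done
echo "Control Plane is up."
```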

Hook repositories to CI/CD flow

Navigate to http://{control_plane_public_ip} - this brings you to the CI/CD user interface. Select Repositories from the top left menu; this lists all the repositories that Pipeline has access to. Enable the repositories you want to hook into the CI/CD flow.

CI/CD secrets

For the hooked repositories, set the following secrets:

  • plugin_endpoint - specify http://{control_plane_public_ip}/pipeline/api/v1

  • plugin_username - specify the same user name as for Banzai Pipeline Credentials

  • plugin_password - specify the same password as for Banzai Pipeline Credentials
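
A quick way to sanity-check these values before saving them is to call the endpoint with the same credentials. This assumes the API base path answers authenticated HTTP requests; it is not a documented health check, just a smoke test:

```bash
# -u sends the Banzai Pipeline credentials via HTTP basic auth;
# expect an HTTP response rather than a connection error.
curl -i -u "<plugin_username>:<plugin_password>" \
  "http://{control_plane_public_ip}/pipeline/api/v1"
```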

Submit your changes

Modify the source code of your Spark application, commit the changes, and push them to the repository on GitHub. Pipeline is notified about the commits through GitHub webhooks and triggers the flow described in the .pipeline.yml file of the watched repositories.
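
For example (assuming master is your repository's default branch):

```bash
git add .
git commit -m "Update Spark application"
git push origin master
```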

Monitor running workflows

The running CI/CD jobs can be monitored and managed at http://{control_plane_public_ip}/account/repos

To check the logs of the CI/CD workflow steps, click on the desired commit message in the UI.


Once configured, the Spark application will be built, deployed, and executed for every commit pushed to the project's repository. The progress of the workflow can be followed by clicking the small orange dot beside the commit on the GitHub UI.

Our git repos with example projects that contain pipeline workflow configurations: