
🏨 Welcome to Bettmensch.AI

bettmensch.ai logo

Bettmensch.AI is a Kubernetes native open source platform for GitOps based ML workloads that allows for tight CI and CD integrations.

[CI status badges: docker | unit tests | integration tests | platform tests]

🔀 CI

The .github/workflows directory contains all Github Actions workflow files.

Their respective state can be seen at the top of this README.

Setup

🌉 AWS Infrastructure & Kubernetes

Before you start, make sure you have the following on your machine:

  • a working terraform installation
  • a working aws CLI installation configured to your AWS account
  • a dockerhub account your-account

To provision

  • the S3 bucket for the Argo Workflows artifact repository
  • the infrastructure required by Karpenter (IAM, message queues, etc.)
  • a working EKS cluster
  • the configured Karpenter, Argo Workflows & Volcano kubernetes installations on the cluster,

run:

make platform.up

To port forward to

  • the ArgoWorkflow server running on EKS and
  • the Mlflow server running on EKS,

run:

make platform.connect

When you're done, you can tear down the stack by running

make platform.down

💻 Dashboard

To build the docker image for bettmensch.ai's custom dashboard, run:

make dashboard.build DOCKER_ACCOUNT=your-account

This will build the image and tag it locally with

  • your-account/bettmensch-ai-dashboard:3.11-<commit-sha>
  • your-account/bettmensch-ai-dashboard:3.11-latest

To push the image to the docker repository and make it accessible to the platform, run

make dashboard.push DOCKER_ACCOUNT=your-account

To run the dashboard locally, run:

make dashboard.run

See the docker directory for more details.

📚 Python SDK installation

To install the python library bettmensch_ai with torch-pipelines support, run

make sdk.install EXTRAS=torch-pipelines

from the repository's top directory.

You can now author Pipelines, submit Flows, and monitor them on both the ArgoWorkflow and the bettmensch.ai dashboards.
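
To quickly verify the installation, you can import the pipeline decorators and I/O classes used in the examples below. This is a minimal smoke test and does not require a connected cluster:

# minimal smoke test: confirm the SDK's pipeline decorators and I/O classes import cleanly
from bettmensch_ai.pipelines import as_component, as_pipeline
from bettmensch_ai.pipelines.io import InputParameter, OutputParameter

print("bettmensch_ai pipelines SDK imported successfully")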

🔧 Running tests

To run unit tests for the python library, run

make sdk.test SUITE=unit

To run integration tests for the python library, run

make sdk.test SUITE=integration

To run K8s tests for the python library (requires a running and connected bettmensch.ai platform), run

make sdk.test SUITE=k8s

Features (under active development)

💻 Dashboard

bettmensch.ai

👀 A dashboard for monitoring all workloads running on the platform.

πŸ‘ To actively manage Pipelines, Flows, please see the respective documentation of bettmensch.ai SDK.

🔀 Pipelines & Flows

Overview

bettmensch.ai comes with a python SDK for defining and executing distributed (ML) workloads by leveraging the ArgoWorkflows framework and the official hera library. In this framework, pipelines are DAGs whose nodes implement your custom logic for the respective pipeline step, with each step executed in its own container on K8s.

Examples

The io module provides the classes that handle the transfer of inputs and outputs between a workload's components.

Using InputParameter and OutputParameter for int, float or str type data:

from bettmensch_ai.pipelines.io import InputParameter, OutputParameter
from bettmensch_ai.pipelines import as_component, as_pipeline

@as_component
def add(
    a: InputParameter = 1,
    b: InputParameter = 2,
    sum: OutputParameter = None,
) -> None:

    sum.assign(a + b)

@as_pipeline("test-parameter-pipeline", "argo", True)
def a_plus_b_plus_2(a: InputParameter = 1, b: InputParameter = 2) -> None:
    a_plus_b = add(
        "a-plus-b",
        a=a,
        b=b,
    )

    a_plus_b_plus_2 = add(
        "a-plus-b-plus-2",
        a=a_plus_b.outputs["sum"],
        b=InputParameter("two", 2),
    )

test_output_dir = "."  # any local directory to export the pipeline definition to
a_plus_b_plus_2.export(test_output_dir)
a_plus_b_plus_2.register()
a_plus_b_plus_2.run(inputs={'a':3,'b':2})
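
Roughly speaking: export writes the compiled pipeline definition to a local directory, register submits the Pipeline to the Argo Workflows server on the cluster, and run triggers a Flow from the registered Pipeline with the given inputs.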

Using InputArtifact and OutputArtifact for all other types of data, leveraging AWS's S3 storage service:

from bettmensch_ai.pipelines.io import InputArtifact, OutputArtifact
from bettmensch_ai.pipelines import as_component, as_pipeline

@as_component
def convert_to_artifact(
    a_param: InputParameter,
    a_art: OutputArtifact = None,
) -> None:

    with open(a_art.path, "w") as a_art_file:
        a_art_file.write(str(a_param))

@as_component
def show_artifact(a: InputArtifact) -> None:

    with open(a.path, "r") as a_art_file:
        a_content = a_art_file.read()

    print(f"Content of input artifact a: {a_content}")

@as_pipeline("test-artifact-pipeline", "argo", True)
def parameter_to_artifact(
    a: InputParameter = "Param A",
) -> None:
    convert = convert_to_artifact(
        "convert-to-artifact",
        a_param=a,
    )

    show = show_artifact(
        "show-artifact",
        a=convert.outputs["a_art"],
    )

test_output_dir = "."  # any local directory to export the pipeline definition to
parameter_to_artifact.export(test_output_dir)
parameter_to_artifact.register()
parameter_to_artifact.run(inputs={'a':"Test value A"})

NOTE: For more examples (including torch.distributed processes running on CPU and GPU across multiple K8s nodes), see

  • the pipelines.component.examples module,
  • the pipelines.pipeline.examples module, and
  • this repository's integration test and k8s test sections.

The submitted pipelines can be viewed on the dashboard's Pipelines section:

bettmensch.ai pipelines

The executed flows can be viewed on the dashboard's Flows section:

bettmensch.ai flows

Building images

To build a

  • standard,
  • pytorch, or
  • pytorch-lightning

docker image to be used for the pipeline components, run

make component.build DOCKER_ACCOUNT=your-account COMPONENT=standard # pytorch, pytorch-lightning

This will build the image and tag it locally with

  • your-account/bettmensch-ai-standard:3.11-<commit-sha>
  • your-account/bettmensch-ai-standard:3.11-latest

To push the image to the docker repository and make it accessible to the platform, run

make component.push DOCKER_ACCOUNT=your-account COMPONENT=standard # pytorch, pytorch-lightning

By default, the components will use the

  • standard image for the Component class
  • pytorch image for the DDPComponent class

See the k8s ddp test cases for how to use the pytorch-lightning image for the DDPComponent.

How it works

The following sequence diagram illustrates how the creation, registration and running of Pipelines is supported by the infrastructure stack provisioned in the Setup section:

BettmenschAI - Sequence diagram

📚 Models

bettmensch.ai models

For the time being, we use mlflow as the default model registry, with an S3 storage backend for persisting model artifacts.

NOTE: Currently, the user needs access to the mlflow service on the K8s cluster. This is achieved by the port forwarding done by make platform.connect (see the Setup section above).
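
With that port forward in place, experiment runs can be logged against the platform's mlflow service from your local machine. Below is a minimal sketch, assuming the mlflow tracking server is forwarded to localhost:5000 (adjust the port to whatever make platform.connect actually forwards to):

# minimal sketch: log a run against the port-forwarded mlflow tracking server
# the local port 5000 is an assumption - use the port make platform.connect forwards to
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")

with mlflow.start_run(run_name="connectivity-check"):
    mlflow.log_param("example_param", 1)
    mlflow.log_metric("example_metric", 0.5)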

🚀 Servers

bettmensch.ai servers

Coming soon.

Credits

This platform makes liberal use of various great open source projects:

  • ArgoWorkflows: Kubernetes native workload orchestration. Also powers the popular Kubeflow Pipelines, which inspired the Pipelines & Flows of this project.
  • hera: Official Argo Python SDK for defining Workflow(Template)s
  • streamlit: A python library for designing interactive dashboards
  • mlflow: ML experiment tracking, model registry and serving support
