Bettmensch.AI is a Kubernetes-native, open source platform for GitOps-based ML workloads that allows for tight CI and CD integrations.
The `.github/workflows` directory contains all GitHub Actions workflow files. Their respective states can be seen at the top of this README.
Before you start, make sure you have the following on your machine (a quick sanity check is sketched after this list):

- a working `terraform` installation
- a working `aws` CLI installation configured to your AWS account
- a dockerhub account `your-account`
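For example, the following is a minimal sanity check of those prerequisites; the `docker login` call assumes your images will live on Docker Hub under `your-account`:

```bash
# verify the terraform CLI is on the PATH
terraform version

# verify the aws CLI is configured against the intended AWS account
aws sts get-caller-identity

# verify you can authenticate against your dockerhub account
docker login --username your-account
```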
To provision

- the S3 bucket for the Argo Workflows artifact repository,
- the infrastructure required by Karpenter (IAM, message queues, etc.),
- a working EKS cluster, and
- the configured Karpenter, Argo Workflows & Volcano Kubernetes installations on the cluster,

run:

```bash
make platform.up
```
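Once `make platform.up` has completed, and assuming your kubeconfig points at the new EKS cluster, a simple way to confirm that the Karpenter, Argo Workflows and Volcano installations are healthy is:

```bash
# list pods across all namespaces; the karpenter, argo workflows and
# volcano pods should be in a Running state
kubectl get pods --all-namespaces
```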
To port forward to

- the Argo Workflows server running on EKS, and
- the Mlflow server running on EKS,

run:

```bash
make platform.connect
```
When you're done, you can tear down the stack by running:

```bash
make platform.down
```
To build the bettmensch.ai custom dashboard's docker image, run:

```bash
make dashboard.build DOCKER_ACCOUNT=your-account
```

This will build the image and tag it locally with the tags below; a quick way to verify them locally follows.

- `your-account/bettmensch-ai-dashboard:3.11-<commit-sha>`
- `your-account/bettmensch-ai-dashboard:3.11-latest`
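To double-check that the tags exist locally, you can list the repository with docker:

```bash
# list the locally available dashboard image tags
docker images your-account/bettmensch-ai-dashboard
```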
To push the image to the docker repository and make it accessible to the platform, run:

```bash
make dashboard.push DOCKER_ACCOUNT=your-account
```
To run the dashboard locally, run:

```bash
make dashboard.run
```
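Assuming the dashboard is served on streamlit's default port 8501 (an assumption — the actual port may differ), you can check that it is up with:

```bash
# a plain GET against the assumed local streamlit port; a non-error exit
# code means the dashboard is serving
curl -sf http://localhost:8501 > /dev/null && echo "dashboard is up"
```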
See the `docker` directory for more details.
To install the python library `bettmensch_ai` with `torch-pipelines` support, run

```bash
make sdk.install EXTRAS=torch-pipelines
```

from the repository's top directory.
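A minimal check that the installation succeeded is importing the pipelines subpackage used in the examples below:

```bash
# should exit without an ImportError
python -c "import bettmensch_ai.pipelines"
```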
You can now start authoring `Pipeline`s, submitting `Flow`s, and monitoring them on both the Argo Workflows and the bettmensch.ai dashboards.
To run unit tests for the python library, run

```bash
make sdk.test SUITE=unit
```

To run integration tests for the python library, run

```bash
make sdk.test SUITE=integration
```

To run K8s tests for the python library (requires a running and connected bettmensch.ai platform), run

```bash
make sdk.test SUITE=k8s
```
A dashboard for monitoring all workloads running on the platform.

To actively manage `Pipeline`s and `Flow`s, please see the respective documentation of the bettmensch.ai SDK.
bettmensch.ai comes with a python SDK for defining and executing distributed (ML) workloads by leveraging the Argo Workflows framework and the official hera library. In this framework, pipelines are DAGs whose graph nodes implement your custom logic for the given pipeline step, each executed on K8s as a containerised step.

The `io` module implements the classes that handle the transfer of inputs and outputs between a workload's components.
Using `InputParameter` and `OutputParameter` for `int`, `float` or `str` type data:
```python
from bettmensch_ai.pipelines.io import InputParameter, OutputParameter
from bettmensch_ai.pipelines import as_component, as_pipeline


@as_component
def add(
    a: InputParameter = 1,
    b: InputParameter = 2,
    sum: OutputParameter = None,
) -> None:
    sum.assign(a + b)


@as_pipeline("test-parameter-pipeline", "argo", True)
def a_plus_b_plus_2(a: InputParameter = 1, b: InputParameter = 2) -> None:
    a_plus_b = add(
        "a-plus-b",
        a=a,
        b=b,
    )
    a_plus_b_plus_2 = add(
        "a-plus-b-plus-2",
        a=a_plus_b.outputs["sum"],
        b=InputParameter("two", 2),
    )


# test_output_dir: a local directory to export the workflow template to
a_plus_b_plus_2.export(test_output_dir)
a_plus_b_plus_2.register()
a_plus_b_plus_2.run(inputs={"a": 3, "b": 2})
```
Using `InputArtifact` and `OutputArtifact` for all other types of data, leveraging AWS's S3 storage service:
```python
from bettmensch_ai.pipelines.io import (
    InputArtifact,
    InputParameter,
    OutputArtifact,
)
from bettmensch_ai.pipelines import as_component, as_pipeline


@as_component
def convert_to_artifact(
    a_param: InputParameter,
    a_art: OutputArtifact = None,
) -> None:
    with open(a_art.path, "w") as a_art_file:
        a_art_file.write(str(a_param))


@as_component
def show_artifact(a: InputArtifact) -> None:
    with open(a.path, "r") as a_art_file:
        a_content = a_art_file.read()
    print(f"Content of input artifact a: {a_content}")


@as_pipeline("test-artifact-pipeline", "argo", True)
def parameter_to_artifact(
    a: InputParameter = "Param A",
) -> None:
    convert = convert_to_artifact(
        "convert-to-artifact",
        a_param=a,
    )
    show = show_artifact(
        "show-artifact",
        a=convert.outputs["a_art"],
    )


# test_output_dir: a local directory to export the workflow template to
parameter_to_artifact.export(test_output_dir)
parameter_to_artifact.register()
parameter_to_artifact.run(inputs={"a": "Test value A"})
```
NOTE: For more examples (including cross-K8s-node CPU and GPU `torch.distributed` processes), see

- the `pipelines.component.examples` module,
- the `pipelines.pipeline.examples` module, and
- this repository's integration test and k8s test sections.
The submitted pipelines can be viewed on the dashboard's `Pipelines` section:

The executed flows can be viewed on the dashboard's `Flows` section:
To build a `standard`, `pytorch`, or `pytorch-lightning` docker image to be used for the pipeline components, run

```bash
make component.build DOCKER_ACCOUNT=your-account COMPONENT=standard # pytorch, pytorch-lightning
```

This will build the image and tag it locally with

- `your-account/bettmensch-ai-standard:3.11-<commit-sha>`
- `your-account/bettmensch-ai-standard:3.11-latest`

To push the image to the docker repository and make it accessible to the platform, run

```bash
make component.push DOCKER_ACCOUNT=your-account COMPONENT=standard # pytorch, pytorch-lightning
```
By default, the components will use the

- `standard` image for the `Component` class
- `pytorch` image for the `DDPComponent` class

See the k8s `ddp` test cases for how to use the `pytorch-lightning` image for the `DDPComponent`.
The following sequence diagram illustrates how the creation, registration and running of `Pipeline`s is supported by the infrastructure stack initiated in the Setup section:
We use mlflow as the default model registry backend for the time being, with an S3 storage backend for persisting the artifacts.

NOTE: Currently, the user needs access to the mlflow service on the K8s cluster. This is achieved by the port forwarding done in `make platform.connect` (see the setup section earlier).
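As a rough check that the port forward is in place, you can hit the forwarded mlflow server locally; the port below is mlflow's conventional default of 5000 and is only an assumption — use whatever local port `make platform.connect` actually maps:

```bash
# a plain GET against the assumed local mlflow port; a non-error exit code
# means the port forward is serving the mlflow UI
curl -sf http://localhost:5000 > /dev/null && echo "mlflow is reachable"
```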
Coming soon.
This platform makes liberal use of various great open source projects:
- ArgoWorkflows: Kubernetes-native workload orchestration. Also powers the popular Kubeflow Pipelines, which inspired the `Pipelines` & `Flows` of this project.
- hera: Official Argo Python SDK for defining Workflow(Template)s
- streamlit: A python library for designing interactive dashboards
- streamlit-flow-component: A react-flow integration for streamlit
- st-pages: A nice streamlit plugin for multi-page dashboards
- mlflow: ML experiment tracking, model registry and serving support