feat(mlflow): Update Mlflow, API server, pyfunc server and batch predictor (#617)

# Description
This PR is the **second part** (2 out of 3) of a series of PRs to update
the version of Mlflow used by Merlin (to `1.26.1`*). These changes are:

1. Update the version of Mlflow used in the Merlin SDK and publish it to
PyPI
2. Update:
   - the Merlin pyfunc server and batch predictor to use the updated Merlin
SDK version released in step 1 and publish them to PyPI
   - the Merlin API server to ensure that requests sent to Mlflow reflect
the updated API endpoint contracts
   - the Mlflow image so that it's built using version `1.26.1`
3. Update the default pyfunc server and batch predictor version in the
Merlin SDK and publish its new version to PyPI

# Main Modifications
- `.github/workflows/external.yml` - Updated the CICD job to release a
new version of the Mlflow image
- `api/mlflow/mlflow.go` - Added a header in all requests sent to the
Mlflow server from the Merlin API server because it's now required
- `api/mlflow/response.go` - Changed the data type of certain fields in
Mlflow's responses sent to Merlin
- `mlflow/Dockerfile` - Updated Dockerfile of the Mlflow server with
newer dependencies
- `python/batch-predictor/requirements.txt` - Updated the requirements
file with a newer version of the SDK which requires version `1.26.1` of
Mlflow
- `python/pyfunc-server/requirements.txt` - Updated the requirements
file with a newer version of the SDK which requires version `1.26.1` of
Mlflow

# Tests
<!-- Besides the existing / updated automated tests, what specific
scenarios should be tested? Consider the backward compatibility of the
changes, whether corner cases are covered, etc. Please describe the
tests and check the ones that have been completed. Eg:
- [x] Deploying new and existing standard models
- [ ] Deploying PyFunc models
-->

# Checklist
- [x] Added PR label
- [ ] Added unit test, integration, and/or e2e tests
- [x] Tested locally
- [ ] Updated documentation
- [ ] Update Swagger spec if the PR introduces API changes
- [ ] Regenerated Golang and Python client if the PR introduces API
changes

# Release Notes
<!--
Does this PR introduce a user-facing change?
If no, just write "NONE" in the release-note block below.
If yes, a release note is required. Enter your extended release note in
the block below.
If the PR requires additional action from users switching to the new
release, include the string "action required".

For more information about release notes, see kubernetes' guide here:
http://git.k8s.io/community/contributors/guide/release-notes.md
-->

```release-note
NONE
```
deadlycoconuts authored Nov 18, 2024
1 parent b310de4 commit eefea9e
Showing 12 changed files with 27 additions and 16 deletions.
7 changes: 4 additions & 3 deletions .github/workflows/external.yml
@@ -22,9 +22,10 @@ jobs:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build and push MLflow Docker image
uses: docker/build-push-action@v5
uses: docker/build-push-action@v6
with:
context: mlflow
push: true
file: mlflow/Dockerfile
build-args: MLFLOW_VERSION=1.3.0
tags: ghcr.io/caraml-dev/mlflow:1.3.0
build-args: MLFLOW_VERSION=1.26.1
tags: ghcr.io/caraml-dev/mlflow:1.26.1
3 changes: 2 additions & 1 deletion .github/workflows/merlin.yml
@@ -400,8 +400,9 @@ jobs:
K3D_CLUSTER: merlin-cluster
LOCAL_REGISTRY_PORT: 12345
LOCAL_REGISTRY: "dev.localhost"
DOCKER_REGISTRY: "dev.localhost:12345"
INGRESS_HOST: "127.0.0.1.nip.io"
MERLIN_CHART_VERSION: 0.13.4
MERLIN_CHART_VERSION: 0.13.18
E2E_PYTHON_VERSION: "3.10.6"
K3S_VERSION: v1.26.7-k3s1
steps:
1 change: 1 addition & 0 deletions api/mlflow/mlflow.go
@@ -61,6 +61,7 @@ func (mlflow *client) doCall(req *request, resp interface{}) error {
 return err
 }

+httpReq.Header.Set("Content-Type", "application/json")
 httpResp, err := mlflow.httpClient.Do(httpReq)
 if err != nil {
 return err
4 changes: 2 additions & 2 deletions api/mlflow/response.go
@@ -39,8 +39,8 @@ type Run struct {
 type Info struct {
 RunID string `json:"run_id"`
 ExperimentID string `json:"experiment_id"`
-StartTime string `json:"start_time"`
-EndTime string `json:"end_time"`
+StartTime int `json:"start_time"`
+EndTime int `json:"end_time"`
 ArtifactURI string `json:"artifact_uri"`
 LifecycleStage string `json:"lifecycle_stage"`
 Status string `json:"status"`
8 changes: 4 additions & 4 deletions mlflow/Dockerfile
@@ -9,11 +9,11 @@ RUN apt-get update && \
 apt-get clean

 RUN pip install google-cloud-storage
-RUN pip install psycopg2-binary==2.8.6
+RUN pip install psycopg2-binary==2.9.10

-ARG BOTO3_VERSION=1.7.12
-ARG MLFLOW_VERSION=1.3.0
-ARG SQLALCHEMY_VERSION=1.4.46
+ARG BOTO3_VERSION=1.35.39
+ARG MLFLOW_VERSION=1.26.1
+ARG SQLALCHEMY_VERSION=1.4.54

 RUN pip install boto3==${BOTO3_VERSION}
 RUN pip install SQLAlchemy==${SQLALCHEMY_VERSION}
2 changes: 1 addition & 1 deletion python/batch-predictor/requirements.txt
@@ -1,6 +1,6 @@
 cloudpickle==2.0.0
 findspark==2.0.1
-merlin-sdk==0.44.0
+merlin-sdk==0.45.1
 mlflow==1.26.1
 pyarrow>=0.14.1,<=17.0.0
 pyspark==3.0.1
2 changes: 1 addition & 1 deletion python/observation-publisher/requirements.txt
@@ -255,7 +255,7 @@ six==1.16.0
 # querystring-parser
 smmap==5.0.1
 # via gitdb
-sqlalchemy==2.0.27
+sqlalchemy==1.4.54
 # via
 # alembic
 # mlflow
2 changes: 1 addition & 1 deletion python/pyfunc-server/requirements.txt
@@ -4,7 +4,7 @@ cloudpickle==2.0.0
 confluent-kafka==2.3.0
 grpcio-health-checking
 grpcio-reflection
-merlin-sdk==0.44.0
+merlin-sdk==0.45.1
 numpy>=1.8.2
 orjson>=2.6.8
 prometheus-client
4 changes: 3 additions & 1 deletion scripts/e2e/deploy-merlin.sh
@@ -28,7 +28,7 @@ install_merlin() {
 --set deployment.image.registry=${DOCKER_REGISTRY} \
 --set deployment.image.repository=merlin \
 --set deployment.image.tag=${VERSION} \
---set transformer.image=${DOCKER_REGISTRY}/merlin-transformer:${VERSION} \
+--set rendered.overrides.StandardTransformerConfig.ImageName=${DOCKER_REGISTRY}/merlin-transformer:${VERSION} \
 --set imageBuilder.dockerRegistry=${DOCKER_REGISTRY} \
 --set imageBuilder.predictionJobBaseImages."3\.7\.*".imageName=${DOCKER_REGISTRY}/merlin/merlin-pyspark-base-py37:${VERSION} \
 --set imageBuilder.predictionJobBaseImages."3\.7\.*".buildContextURI=git://github.com/caraml-dev/merlin.git#${GIT_REF} \
@@ -48,6 +48,8 @@ install_merlin() {
 --set imageBuilder.predictionJobBaseImages."3\.10\.*".mainAppPath=/home/spark/merlin-spark-app/main.py \
 --set ingress.host=merlin.caraml.${INGRESS_HOST} \
 --set mlflow.ingress.host=merlin-mlflow.caraml.${INGRESS_HOST} \
+--set mlflow.image.repository=caraml-dev/mlflow \
+--set mlflow.image.tag=1.26.1 \
 --set mlp.deployment.apiHost=http://mlp.caraml.${INGRESS_HOST}/v1 \
 --set mlp.deployment.mlflowTrackingUrl=http://merlin-mlflow.caraml.${INGRESS_HOST} \
 --set mlp.ingress.host=mlp.caraml.${INGRESS_HOST} \
2 changes: 1 addition & 1 deletion scripts/e2e/setup-and-run-e2e.sh
@@ -8,7 +8,7 @@ export LOCAL_REGISTRY=dev.localhost
 export DOCKER_REGISTRY=${LOCAL_REGISTRY}:${LOCAL_REGISTRY_PORT}
 export VERSION=test-local
 export MLP_CHART_VERSION=0.4.18
-export MERLIN_CHART_VERSION=0.11.1
+export MERLIN_CHART_VERSION=0.13.18


 # Create k3d cluster and managed registry
6 changes: 6 additions & 0 deletions scripts/e2e/setup-cluster.sh
@@ -103,6 +103,12 @@ install_knative() {
 kubectl rollout status deployment/domainmapping-webhook -n knative-serving -w --timeout=${TIMEOUT}
 kubectl rollout status deployment/webhook -n knative-serving -w --timeout=${TIMEOUT}

+# Update knative config-deployment config map to allow resolving of the local e2e image repository
+kubectl get configmap -n knative-serving config-deployment -o yaml > temp.yaml
+yq -i '.data.registries-skipping-tag-resolving |= ("\(.),${DOCKER_REGISTRY}" | envsubst)' temp.yaml
+kubectl apply -f temp.yaml
+kubectl rollout restart deployment -n knative-serving controller
+
 # Install knative-istio
 kubectl apply -f https://github.com/knative-sandbox/net-istio/releases/download/knative-v${KNATIVE_NET_ISTIO_VERSION}/net-istio.yaml

2 changes: 1 addition & 1 deletion scripts/quick_install.sh
@@ -12,7 +12,7 @@ export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -
 export DOCKER_REGISTRY=ghcr.io/caraml-dev
 export VERSION=0.33.0
 export GIT_REF=v0.33.0
-export MERLIN_CHART_VERSION=0.11.7
+export MERLIN_CHART_VERSION=0.13.18
 export OAUTH_CLIENT_ID=""

 cd e2e; ./deploy-merlin.sh $INGRESS_HOST $DOCKER_REGISTRY $VERSION $GIT_REF $MERLIN_CHART_VERSION $OAUTH_CLIENT_ID