feat: update observation publisher to use newer sdk (#509)

**What this PR does / why we need it**:
This is a follow-up to #504. It updates the observation publisher to use
the latest Merlin SDK and introduces pip-tools to pin the dependencies.

Another important change is that the `inference_schema` config field is
now a dict instead of the InferenceSchema dataclass. OmegaConf (and by
extension, Hydra) has no way to instantiate the correct subclass for a
field typed as an abstract class, so we have to perform our own
deserialization instead.
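
A minimal sketch of the deserialization pattern described above, using only the standard library. In the publisher itself the dict comes from `OmegaConf.to_container(cfg.environment.inference_schema)` and is handed to `InferenceSchema.from_dict`; the classes, field lists, registry, and `prediction_output_from_dict` helper below are hypothetical stand-ins for illustration and are not the Merlin SDK's actual implementation.

```python
from abc import ABC
from dataclasses import dataclass
from typing import Any, Dict, Type


@dataclass
class PredictionOutput(ABC):
    """Hypothetical abstract base; a config loader cannot know which subclass to build."""


@dataclass
class BinaryClassificationOutput(PredictionOutput):
    prediction_score_column: str
    actual_label_column: str
    positive_class_label: str
    negative_class_label: str
    score_threshold: float = 0.5


@dataclass
class RegressionOutput(PredictionOutput):
    # Hypothetical fields, chosen only to give the registry a second entry.
    prediction_score_column: str
    actual_score_column: str


# Registry keyed by the "output_class" discriminator stored in the config dict.
OUTPUT_CLASSES: Dict[str, Type[PredictionOutput]] = {
    cls.__name__: cls for cls in (BinaryClassificationOutput, RegressionOutput)
}


def prediction_output_from_dict(raw: Dict[str, Any]) -> PredictionOutput:
    """Pick the concrete subclass from "output_class" and build it from the rest."""
    fields = dict(raw)  # copy so the caller's dict is not mutated
    name = fields.pop("output_class")
    try:
        cls = OUTPUT_CLASSES[name]
    except KeyError:
        raise ValueError(f"Unknown output_class: {name!r}") from None
    return cls(**fields)


output = prediction_output_from_dict(
    {
        "output_class": "BinaryClassificationOutput",
        "prediction_score_column": "score",
        "actual_label_column": "actual_label",
        "positive_class_label": "positive",
        "negative_class_label": "negative",
    }
)
```

Keeping the config field as a plain dict leaves the choice of subclass to the explicit discriminator key rather than to the config loader.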

**Which issue(s) this PR fixes**:

Fixes #

**Does this PR introduce a user-facing change?**:
NONE

```release-note
NONE
```

**Checklist**

- [x] Added unit test, integration, and/or e2e tests
- [x] Tested locally
- [x] Updated documentation
- [ ] Update Swagger spec if the PR introduces API changes
- [ ] Regenerated Golang and Python clients if the PR introduces API
changes
khorshuheng authored Jan 2, 2024
1 parent 4e084a6 commit b183593
Showing 18 changed files with 621 additions and 221 deletions.
48 changes: 48 additions & 0 deletions .github/workflows/merlin.yml
@@ -186,6 +186,29 @@ jobs:
POSTGRES_PASSWORD: ${{ secrets.DB_PASSWORD }}
run: make it-test-api-ci


test-observation-publisher:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v4
id: setup-python
with:
python-version: '3.10'
- uses: actions/cache@v3
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pip-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('**/requirements.txt') }}
restore-keys: |
${{ runner.os }}-pip-${{ steps.setup-python.outputs.python-version }}-
- name: Install dependencies
working-directory: ./python/observation-publisher
run: |
make setup
- name: Unit test observation publisher
working-directory: ./python/observation-publisher
run: make test

build-ui:
runs-on: ubuntu-latest
steps:
@@ -355,6 +378,30 @@ jobs:
path: merlin-logger.${{ needs.create-version.outputs.version }}.tar
retention-days: ${{ env.ARTIFACT_RETENTION_DAYS }}

build-observation-publisher:
runs-on: ubuntu-latest
needs:
- create-version
- test-observation-publisher
env:
DOCKER_REGISTRY: ghcr.io
DOCKER_IMAGE_TAG: "ghcr.io/${{ github.repository }}/merlin-observation-publisher:${{ needs.create-version.outputs.version }}"
steps:
- uses: actions/checkout@v2
- name: Build Observation Publisher Docker
env:
OBSERVATION_PUBLISHER_IMAGE_TAG: ${{ env.DOCKER_IMAGE_TAG }}
run: make observation-publisher
working-directory: ./python
- name: Save Observation Publisher Docker
run: docker image save --output merlin-observation-publisher.${{ needs.create-version.outputs.version }}.tar ${{ env.DOCKER_IMAGE_TAG }}
- name: Publish Observation Publisher Docker Artifact
uses: actions/upload-artifact@v2
with:
name: merlin-observation-publisher.${{ needs.create-version.outputs.version }}.tar
path: merlin-observation-publisher.${{ needs.create-version.outputs.version }}.tar
retention-days: ${{ env.ARTIFACT_RETENTION_DAYS }}

e2e-test:
runs-on: ubuntu-latest
needs:
@@ -443,6 +490,7 @@ jobs:
- build-api
- build-batch-predictor-base
- build-pyfunc-server-base
- build-observation-publisher
- test-python-sdk
- e2e-test
with:
20 changes: 20 additions & 0 deletions .github/workflows/release.yml
@@ -156,3 +156,23 @@ jobs:
run: |
docker image load --input merlin-pyfunc-base.${{ inputs.version }}.tar
docker push ${{ env.DOCKER_IMAGE_TAG }}
publish-observation-publisher:
runs-on: ubuntu-latest
env:
DOCKER_IMAGE_TAG: "ghcr.io/${{ github.repository }}/merlin-observation-publisher:${{ inputs.version }}"
steps:
- name: Log in to the Container registry
uses: docker/login-action@v1
with:
registry: ${{ env.DOCKER_REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Download Observation Publisher Docker Artifact
uses: actions/download-artifact@v2
with:
name: merlin-observation-publisher.${{ inputs.version }}.tar
- name: Retag and Push Docker Image
run: |
docker image load --input merlin-observation-publisher.${{ inputs.version }}.tar
docker push ${{ env.DOCKER_IMAGE_TAG }}
4 changes: 2 additions & 2 deletions python/Makefile
@@ -1,6 +1,6 @@
IMAGE_TAG=dev
OBSERVATION_PUBLISHER_IMAGE_TAG ?= observation-publisher:dev

.PHONY: observation-publisher
observation-publisher:
@echo "Building image for observation publisher..."
@docker build -t observation-publisher:${IMAGE_TAG} -f observation-publisher/Dockerfile .
@docker build -t ${OBSERVATION_PUBLISHER_IMAGE_TAG} -f observation-publisher/Dockerfile --progress plain .
12 changes: 5 additions & 7 deletions python/observation-publisher/Dockerfile
@@ -1,14 +1,12 @@
FROM python:3.10

WORKDIR /mlobs

COPY sdk ./sdk
WORKDIR /mlobs/observation-publisher
COPY observation-publisher/requirements.txt .
COPY sdk/ ./sdk
ENV SDK_PATH=/mlobs/sdk
RUN pip install -r requirements.txt
RUN rm requirements.txt
RUN rm -rf sdk
COPY observation-publisher/conf/ ./conf
COPY observation-publisher/publisher/ ./publisher

WORKDIR /mlobs
COPY observation-publisher ./observation-publisher
WORKDIR /mlobs/observation-publisher
ENTRYPOINT ["python", "-m", "publisher"]
8 changes: 7 additions & 1 deletion python/observation-publisher/Makefile
@@ -4,6 +4,12 @@ ENVIRONMENT_CONFIG = "example-override"
setup:
@echo "Setting up environment..."
@pip install -r requirements.txt --use-pep517
@pip install -r requirements-dev.txt

.PHONY: pip-compile
pip-compile:
@echo "Compiling requirements..."
@python -m piptools compile

.PHONY: test
test:
@@ -13,4 +19,4 @@ test:
.PHONY: run
run:
@echo "Running observation publisher..."
@python -m observation_publisher +environment=${ENVIRONMENT_CONFIG}
@python -m publisher +environment=${ENVIRONMENT_CONFIG}
10 changes: 10 additions & 0 deletions python/observation-publisher/README.md
@@ -16,6 +16,16 @@ make run


## Development
### Setup
```bash
make setup
```

### Updating requirements.txt
Make changes to requirements.in, then run:
```bash
make pip-compile
```

### Run test
```bash
53 changes: 10 additions & 43 deletions python/observation-publisher/conf/environment/example-override.yaml
@@ -1,53 +1,20 @@
model_id: "test-model"
model_version: "0.1.0"
inference_schema:
# Supported model types:
# - BINARY_CLASSIFICATION
# - MULTICLASS_CLASSIFICATION
# - REGRESSION
# - RANKING
# The prediction output schema that is corresponded to the model
# type must be provided.
# Inference schema associated with the model id and version. For full documentation on the supported configuration,
# refer to the Merlin SDK. The example below is for a binary classification model.

# Example for binary classification
type: "BINARY_CLASSIFICATION"
binary_classification:
# The classification label STRING of this event.
prediction_label_column: "label"
# Optional: The likelihood of the event (Probability between 0 to 1.0).
model_prediction_output:
output_class: "BinaryClassificationOutput"
prediction_score_column: "score"
actual_label_column: "actual_label"
positive_class_label: "positive"
negative_class_label: "negative"
score_threshold: 0.5

# # Example for multiclass classification
# type: "MULTICLASS_CLASSIFICATION"
# multiclass_classification:
# # The classification label STRING of this event.
# prediction_label_column: "label"
# # Optional: The likelihood of the event (Probability between 0 to 1.0).
# prediction_score_column: "score"

# # Example for regression
# type: "REGRESSION"
# regression:
# # FLOAT64 value for the prediction value.
# prediction_score_column: "score"

# # Example for ranking
# type: "RANKING"
# ranking:
# # A group of predictions within which items are ranked..
# prediction_group_id_column: "prediction_group"
# # INT64 value for the rank of the prediction within the group.
# rank_column: "rank"

# Column name to data types mapping for feature columns. Supported types are:
# - INT64
# - FLOAT64
# - STRING
feature_types:
distance: "INT64"
transaction: "FLOAT64"
# Optional: Column name to be used for prediction id.
# If not provided, it's assumed to be prediction_id.
distance: "int64"
transaction: "float64"
prediction_id_column: "prediction_id"
observability_backend:
# Supported backend types:
12 changes: 10 additions & 2 deletions python/observation-publisher/publisher/__main__.py
@@ -1,4 +1,5 @@
import hydra
from merlin.observability.inference import InferenceSchema
from omegaconf import OmegaConf

from publisher.config import PublisherConfig
@@ -12,11 +13,18 @@ def start_consumer(cfg: PublisherConfig) -> None:
if missing_keys:
raise RuntimeError(f"Got missing keys in config:\n{missing_keys}")
prediction_log_consumer = new_consumer(cfg.environment.observation_source)
inference_schema = InferenceSchema.from_dict(
OmegaConf.to_container(cfg.environment.inference_schema)
)
observation_sink = new_observation_sink(
cfg.environment.observability_backend, cfg.environment.model
config=cfg.environment.observability_backend,
inference_schema=inference_schema,
model_id=cfg.environment.model_id,
model_version=cfg.environment.model_version,
)
prediction_log_consumer.start_polling(
observation_sink=observation_sink, model_spec=cfg.environment.model
observation_sink=observation_sink,
inference_schema=inference_schema,
)


23 changes: 16 additions & 7 deletions python/observation-publisher/publisher/config.py
@@ -1,9 +1,8 @@
from dataclasses import dataclass
from enum import Enum, unique
from enum import Enum
from typing import Optional

from hydra.core.config_store import ConfigStore
from merlin.observability.inference import InferenceSchema


@dataclass
@@ -12,20 +11,24 @@ class ArizeConfig:
space_key: str


@unique
class ObservabilityBackendType(Enum):
ARIZE = 1
ARIZE = "arize"


@dataclass
class ObservabilityBackend:
type: ObservabilityBackendType
arize_config: Optional[ArizeConfig] = None

def __post_init__(self):
if self.type == ObservabilityBackendType.ARIZE:
assert (
self.arize_config is not None
), "Arize config must be set for Arize observability backend"


@unique
class ObservationSource(Enum):
KAFKA = 1
KAFKA = "kafka"


@dataclass
@@ -43,12 +46,18 @@ class ObservationSourceConfig:
type: ObservationSource
kafka_config: Optional[KafkaConsumerConfig] = None

def __post_init__(self):
if self.type == ObservationSource.KAFKA:
assert (
self.kafka_config is not None
), "Kafka config must be set for Kafka observation source"


@dataclass
class Environment:
model_id: str
model_version: str
inference_schema: InferenceSchema
inference_schema: dict
observability_backend: ObservabilityBackend
observation_source: ObservationSourceConfig
