Skip to content

Commit

Permalink
Introducing model version revision (#471)
Browse files Browse the repository at this point in the history
<!--  Thanks for sending a pull request!  Here are some tips for you:

1. Run unit tests and ensure that they are passing
2. If your change introduces any API changes, make sure to update the
e2e tests
3. Make sure documentation is updated for your PR!

-->

**What this PR does / why we need it**:
<!-- Explain here the context and why you're making the change. What is
the problem you're trying to solve. --->

The current implementation of model version redeployment patches the
existing inference service however, if the patch fails, it will delete
the existing inference service
([ref](https://github.com/caraml-dev/merlin/blob/v0.33.0/api/cluster/controller.go#L243-L250)).

This PR reworks how we do the redeployment by introducing model version
revision which itself is a new inference service that deployed for every
redeployment. Then we will have a new Istio VirtualService that routes
the traffic to the latest revision (latest inference service). Note that
the new VirtualService adds a `Content-Type` header to the request
because we need it for graceful migration to Kserve 0.11 where it
requires the `Content-Type` header to be set, but our clients might not
using this header.

It also changes the naming convention for the deployed inference
service:

From: `{model_name}-{model_version}` to
`{model_name}-{model_version}-{model_version_revision}`

### Serverless Deployment Before

<img width="500" alt="image"
src="https://github.com/caraml-dev/merlin/assets/8122852/c95675f2-0d0b-47f6-ac80-69db63d6cf21">

### Serverless Deployment After

<img width="500" alt="image"
src="https://github.com/caraml-dev/merlin/assets/8122852/3e31ca6f-e406-4497-83c5-07ada08b1da0">

### Raw Deployment Before

<img width="500" alt="image"
src="https://github.com/caraml-dev/merlin/assets/8122852/a2b56e92-36c5-456b-84ec-a72f51bc158b">

### Raw Deployment After

<img width="500" alt="image"
src="https://github.com/caraml-dev/merlin/assets/8122852/bfc6e92d-4d1f-449c-b61c-aa89604e584f">

### Model Version Deployment History UI

<img width="500" alt="Screen Shot 2023-10-16 at 14 56 22"
src="https://github.com/caraml-dev/merlin/assets/8122852/abd933b5-344c-489a-9ab8-c2293f1f5be3">

Changes this PR introduce:

1. Add `revision_id` column to `version_endpoints` table
2. Add revision number to model service name, prefixed with `r`
3. Add model version virtual service that routes to deployed model
service with revision number to maintain backward compatibility
a. New model service with revision number's URL is something like:
my-model-name-1-r1.domain.com
b. To maintain backward compatibility, the new model version virtual
service have this URL: my-model-name-1.domain.com
4. Add Deployment History tab in the UI
5. Add Deployments API
6. Add redeploy label to deploymentCounter Prometheus metrics
7. Update e2e-test to use environment's default_deployment_config;
Increase github e2e-test cluster environment's default_deployment_config

**Which issue(s) this PR fixes**:
<!--
*Automatically closes linked issue when PR is merged.
Usage: `Fixes #<issue number>`, or `Fixes (paste link of issue)`.
-->

Fixes model version redeployment got deleted if the redeployment fails.

**Does this PR introduce a user-facing change?**:
<!--
If no, just write "NONE" in the release-note block below.
If yes, a release note is required. Enter your extended release note in
the block below.
If the PR requires additional action from users switching to the new
release, include the string "action required".

For more information about release notes, see kubernetes' guide here:
http://git.k8s.io/community/contributors/guide/release-notes.md
-->

```release-note
Add model version deployment history UI
```

**Checklist**

- [ ] Added unit test, integration, and/or e2e tests
- [ ] Tested locally
- [ ] Updated documentation
- [ ] Update Swagger spec if the PR introduce API changes
- [ ] Regenerated Golang and Python client if the PR introduce API
changes
  • Loading branch information
ariefrahmansyah authored Oct 19, 2023
1 parent 9beca08 commit a041b7e
Show file tree
Hide file tree
Showing 59 changed files with 2,373 additions and 2,857 deletions.
19 changes: 19 additions & 0 deletions api/api/deployment_api.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
package api

import (
"fmt"
"net/http"
)

type DeploymentController struct {
*AppContext
}

func (c *DeploymentController) ListDeployments(r *http.Request, vars map[string]string, _ interface{}) *Response {
deployments, err := c.DeploymentService.ListDeployments(vars["model_id"], vars["version_id"], vars["endpoint_id"])
if err != nil {
return InternalServerError(fmt.Sprintf("Error listing deployments: %v", err))
}

return Ok(deployments)
}
100 changes: 100 additions & 0 deletions api/api/deployment_api_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,100 @@
package api

import (
"fmt"
"net/http"
"testing"
"time"

"github.com/caraml-dev/merlin/models"
"github.com/caraml-dev/merlin/service/mocks"
"github.com/google/uuid"
)

func TestDeploymentController_ListDeployments(t *testing.T) {
endpointUUID := uuid.New()
endpointUUIDString := fmt.Sprint(endpointUUID)

createdUpdated := models.CreatedUpdated{
CreatedAt: time.Now(),
UpdatedAt: time.Now(),
}

testCases := []struct {
desc string
vars map[string]string
deploymentService func() *mocks.DeploymentService
expected *Response
}{
{
desc: "Should success list deployments",
vars: map[string]string{
"model_id": "model",
"version_id": "1",
"endpoint_id": endpointUUIDString,
},
deploymentService: func() *mocks.DeploymentService {
mockSvc := &mocks.DeploymentService{}
mockSvc.On("ListDeployments", "model", "1", endpointUUIDString).Return([]*models.Deployment{
{
ID: models.ID(1),
ProjectID: models.ID(1),
VersionModelID: models.ID(1),
VersionID: models.ID(1),
VersionEndpointID: endpointUUID,
Status: models.EndpointRunning,
Error: "",
CreatedUpdated: createdUpdated,
},
}, nil)
return mockSvc
},
expected: &Response{
code: http.StatusOK,
data: []*models.Deployment{
{
ID: models.ID(1),
ProjectID: models.ID(1),
VersionModelID: models.ID(1),
VersionID: models.ID(1),
VersionEndpointID: endpointUUID,
Status: models.EndpointRunning,
Error: "",
CreatedUpdated: createdUpdated,
},
},
},
},
{
desc: "Should return 500 when failed fetching list of deployments",
vars: map[string]string{
"model_id": "model",
"version_id": "1",
"endpoint_id": endpointUUIDString,
},
deploymentService: func() *mocks.DeploymentService {
mockSvc := &mocks.DeploymentService{}
mockSvc.On("ListDeployments", "model", "1", endpointUUIDString).Return(nil, fmt.Errorf("Database is down"))
return mockSvc
},
expected: &Response{
code: http.StatusInternalServerError,
data: Error{
Message: "Error listing deployments: Database is down",
},
},
},
}
for _, tC := range testCases {
t.Run(tC.desc, func(t *testing.T) {
mockSvc := tC.deploymentService()
ctl := &DeploymentController{
AppContext: &AppContext{
DeploymentService: mockSvc,
},
}
resp := ctl.ListDeployments(&http.Request{}, tC.vars, nil)
assertEqualResponses(t, tC.expected, resp)
})
}
}
5 changes: 5 additions & 0 deletions api/api/router.go
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,7 @@ type AppContext struct {
DB *gorm.DB
Enforcer enforcer.Enforcer

DeploymentService service.DeploymentService
EnvironmentService service.EnvironmentService
ProjectsService service.ProjectsService
ModelsService service.ModelsService
Expand Down Expand Up @@ -154,6 +155,7 @@ func NewRouter(appCtx AppContext) (*mux.Router, error) {
if err != nil {
return nil, err
}
deploymentController := DeploymentController{&appCtx}
environmentController := EnvironmentController{&appCtx}
projectsController := ProjectsController{&appCtx}
modelEndpointsController := ModelEndpointsController{&appCtx}
Expand Down Expand Up @@ -206,6 +208,9 @@ func NewRouter(appCtx AppContext) (*mux.Router, error) {
// To maintain backward compatibility with SDK v0.1.0
{http.MethodDelete, "/models/{model_id:[0-9]+}/versions/{version_id:[0-9]+}/endpoint", nil, endpointsController.DeleteEndpoint, "DeleteDefaultEndpoint"},

// Deployments API
{http.MethodGet, "/models/{model_id:[0-9]+}/versions/{version_id:[0-9]+}/endpoints/{endpoint_id}/deployments", nil, deploymentController.ListDeployments, "ListDeployments"},

{http.MethodGet, "/models/{model_id:[0-9]+}/versions/{version_id:[0-9]+}/endpoint/{endpoint_id}", nil, endpointsController.GetEndpoint, "GetEndpoint"},
{http.MethodPut, "/models/{model_id:[0-9]+}/versions/{version_id:[0-9]+}/endpoint/{endpoint_id}", models.VersionEndpoint{}, endpointsController.UpdateEndpoint, "UpdateEndpoint"},
{http.MethodDelete, "/models/{model_id:[0-9]+}/versions/{version_id:[0-9]+}/endpoint/{endpoint_id}", nil, endpointsController.DeleteEndpoint, "DeleteEndpoint"},
Expand Down
8 changes: 6 additions & 2 deletions api/api/version_endpoints_api.go
Original file line number Diff line number Diff line change
Expand Up @@ -379,12 +379,16 @@ func (c *EndpointsController) ListContainers(r *http.Request, vars map[string]st
if err != nil {
return NotFound(fmt.Sprintf("Version not found: %v", err))
}
endpoint, err := c.EndpointsService.FindByID(ctx, endpointID)
if err != nil {
return NotFound(fmt.Sprintf("Endpoint not found: %v", err))
}

endpoint, err := c.EndpointsService.ListContainers(ctx, model, version, endpointID)
containers, err := c.EndpointsService.ListContainers(ctx, model, version, endpoint)
if err != nil {
return InternalServerError(fmt.Sprintf("Error while getting container for endpoint: %v", err))
}
return Ok(endpoint)
return Ok(containers)
}

func validateUpdateRequest(prev *models.VersionEndpoint, new *models.VersionEndpoint) error {
Expand Down
16 changes: 14 additions & 2 deletions api/api/version_endpoints_api_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -536,7 +536,13 @@ func TestListContainers(t *testing.T) {
},
endpointService: func() *mocks.EndpointsService {
svc := &mocks.EndpointsService{}
svc.On("ListContainers", context.Background(), mock.Anything, mock.Anything, uuid).Return([]*models.Container{
svc.On("FindByID", context.Background(), uuid).Return(&models.VersionEndpoint{
ID: uuid,
VersionModelID: models.ID(1),
VersionID: models.ID(1),
RevisionID: models.ID(1),
}, nil)
svc.On("ListContainers", context.Background(), mock.Anything, mock.Anything, mock.Anything).Return([]*models.Container{
{
Name: "pod-1",
PodName: "pod-1-1",
Expand Down Expand Up @@ -656,7 +662,13 @@ func TestListContainers(t *testing.T) {
},
endpointService: func() *mocks.EndpointsService {
svc := &mocks.EndpointsService{}
svc.On("ListContainers", context.Background(), mock.Anything, mock.Anything, uuid).Return(nil, fmt.Errorf("Error creating secret: db is down"))
svc.On("FindByID", context.Background(), uuid).Return(&models.VersionEndpoint{
ID: uuid,
VersionModelID: models.ID(1),
VersionID: models.ID(1),
RevisionID: models.ID(1),
}, nil)
svc.On("ListContainers", context.Background(), mock.Anything, mock.Anything, mock.Anything).Return(nil, fmt.Errorf("Error creating secret: db is down"))
return svc
},
expected: &Response{
Expand Down
2 changes: 1 addition & 1 deletion api/cluster/container_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -88,7 +88,7 @@ func TestContainer_GetContainers(t *testing.T) {
clusterMetadata := Metadata{GcpProject: "my-gcp", ClusterName: "my-cluster"}

containerFetcher := NewContainerFetcher(v1Client, clusterMetadata)
ctl, _ := newController(knClient, kfClient, v1Client, nil, policyV1Client, config.DeploymentConfig{}, containerFetcher, nil)
ctl, _ := newController(knClient, kfClient, v1Client, nil, policyV1Client, nil, config.DeploymentConfig{}, containerFetcher, nil)
containers, err := ctl.GetContainers(context.Background(), tt.args.namespace, tt.args.labelSelector)
if !tt.wantError {
assert.NoErrorf(t, err, "expected no error got %v", err)
Expand Down
Loading

0 comments on commit a041b7e

Please sign in to comment.