Investigate integration with vetiver #163

MarkEdmondson1234 · 2022-01-07T21:41:10Z

https://vetiver.tidymodels.org/

juliasilge · 2022-01-07T22:24:47Z

To get appropriate versioning support, I imagine this will require rstudio/pins-r#572 to be implemented.

The deployment piece on its own doesn't necessarily require the model object to be stored as a pin.

MarkEdmondson1234 · 2022-01-08T09:01:40Z

A setup script is here:

library(parsnip)
library(workflows)
data(Sacramento, package = "modeldata")

rf_spec <- rand_forest(mode = "regression")
rf_form <- price ~ type + sqft + beds + baths

rf_fit <-
  workflow(rf_form, rf_spec) %>%
  fit(Sacramento)

library(vetiver)
v <- vetiver_model(rf_fit, "sacramento_rf")

root <- file.path("inst","vetiver")

library(pins)
model_board <- board_folder(file.path(root,"plumber/pins"))
model_board %>% vetiver_pin_write(v)

library(googleCloudRunner)

# the docker takes a long time to install arrow so build it first to cache
repo <- cr_buildtrigger_repo("MarkEdmondson1234/googleCloudRunner",
                             branch = "vetiver")

#cr_buildtrigger_delete("docker-vetiver")
cr_deploy_docker_trigger(repo, "vetiver",
                         location = "inst/vetiver/docker/",
                         includedFiles = "inst/vetiver/**",
                         projectId_target = "gcer-public",
                         timeout = 3600)

cr_deploy_plumber(file.path(root,"plumber"))

I changed the plumber deployment server.R to

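pr <- plumber::plumb("api.R")
# read the pinned model so it can be attached to the router
v <- vetiver::vetiver_pin_read(pins::board_folder("pins"), name = "sacramento_rf")
pr <- vetiver::vetiver_pr_predict(pr, v)
pr$run(host = "0.0.0.0", port = as.numeric(Sys.getenv("PORT")), swagger = TRUE)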

The main bottleneck at the moment is getting a Docker image with pins installed: the arrow dependency has taken 40+ minutes to install and counting, so I will look for a quicker method.

MarkEdmondson1234 · 2022-01-08T10:52:56Z

The arrow dependency timed out after 60 minutes; this needs a bigger build machine or, ideally, a pre-existing Docker image.

juliasilge · 2022-01-08T18:07:11Z

There has been some discussion of making the arrow dependency optional. You might want to check out rstudio/pins-r#537 and see if anything in there helps.

FWIW arrow isn't really needed for the model publishing use case.
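To illustrate why arrow is not needed for this use case: a fitted model can be serialized as an RDS file, which base R reads and writes without arrow. The sketch below assumes the model pin is stored as RDS (my understanding of vetiver's default, not something confirmed in this thread) and uses a plain `lm` fit on `mtcars` as a stand-in for the real model.

```r
# Stand-in model: a small linear regression on a built-in dataset
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Serialize to RDS and read it back using base R only (no arrow involved)
path <- tempfile(fileext = ".rds")
saveRDS(fit, path)
restored <- readRDS(path)

# Check that the restored model predicts the same values as the original
new_cars <- data.frame(wt = c(2.5, 3.2), hp = c(110, 150))
same_predictions <- isTRUE(
  all.equal(predict(fit, new_cars), predict(restored, new_cars))
)
same_predictions
```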

MarkEdmondson1234 · 2022-01-08T18:55:34Z

Makes sense. Yes, it seemed like a lot of installation for features that aren't used. I've left a comment there to see if there is a way around it, though, since it would be nice to have an arrow image available.

MarkEdmondson1234 · 2022-01-08T22:14:02Z

The Docker image now builds in about 20 minutes, so it is available at gcr.io/gcer-public/vetiver

I hadn't seen the plumber router itself being modified before, so I made a new script file to load it in. I think this would be fairly boilerplate, though:

# server.R
pr <- plumber::plumb("api.R")
v <- vetiver::vetiver_pin_read(pins::board_folder("pins"), name = "sacramento_rf")
pr <- vetiver::vetiver_pr_predict(pr, v, debug = TRUE)
pr$run(host = "0.0.0.0", port = as.numeric(Sys.getenv("PORT")), swagger = TRUE)
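One thing to watch when running this script outside Cloud Run: `as.numeric(Sys.getenv("PORT"))` is `NA` when `PORT` is unset. A minimal sketch of a fallback follows; `get_port()` is a hypothetical helper name, and the default of 8080 is an assumption chosen to match Cloud Run's default container port.

```r
# Read the port from the PORT environment variable (set by Cloud Run),
# falling back to a default when it is unset or not a number,
# for example when testing the API locally.
get_port <- function(default = 8080L) {
  port <- suppressWarnings(as.integer(Sys.getenv("PORT", unset = "")))
  if (is.na(port)) default else port
}

get_port()
```

The last line of server.R could then call `pr$run(host = "0.0.0.0", port = get_port(), swagger = TRUE)`.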

It's built on top of the example plumber script I have, so there are endpoints at /plot and /hello too. I think it would be nice to make a PubSub target for it.

How would vetiver work within an api.R script?
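One possible answer, as a sketch rather than a confirmed recommendation: plumber supports a `#* @plumber` annotation on a function that receives the router, so the model could be attached inside api.R itself. The snippet below generates such a file so it can be checked; the temporary output path is an assumption for illustration, while the `pins` folder and the `sacramento_rf` name are taken from the server.R above.

```r
# Contents of a hypothetical api.R that attaches the vetiver model itself
api_lines <- c(
  'library(plumber)',
  'library(vetiver)',
  '',
  '# read the pinned model once, when the API starts',
  'v <- vetiver_pin_read(pins::board_folder("pins"), name = "sacramento_rf")',
  '',
  '#* @plumber',
  'function(pr) {',
  '  # plumber passes the router in; return it with the predict endpoint added',
  '  vetiver_pr_predict(pr, v)',
  '}'
)

# Write the file to a temporary folder (swap in the real deployment folder)
api_path <- file.path(tempdir(), "api.R")
writeLines(api_lines, api_path)

# Sanity check: the generated file is syntactically valid R
invisible(parse(api_path))
```

With a block like this appended to the existing api.R, server.R would only need to plumb the file and call `pr$run()`.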

This deployed successfully with this simple Dockerfile. I guess in real life some more dependencies or renv lockfiles could be involved.

FROM gcr.io/gcer-public/vetiver
COPY ["./", "./"]
ENTRYPOINT ["Rscript", "server.R"]

An example endpoint is live at https://vetiver-ewjogewawq-ew.a.run.app/predict. It runs on Cloud Run (serverless), which can take 80 connections per instance and can scale up to millions of requests.

The endpoint runs the example from the vetiver docs:
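As rough arithmetic for that scaling claim, here is a sketch of how many instances a given number of simultaneous connections implies at 80 per instance. `instances_needed()` is a hypothetical helper, and the real ceiling depends on the service's maximum-instances setting and project quotas, which this ignores.

```r
# Instances needed to serve a given number of simultaneous connections,
# assuming each instance accepts up to `per_instance` of them at once
instances_needed <- function(concurrent, per_instance = 80) {
  ceiling(concurrent / per_instance)
}

instances_needed(c(80, 1000, 100000))
```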

library(dplyr)
data(Sacramento, package = "modeldata")
new_sac <- Sacramento %>%
  slice_sample(n = 20) %>%
  select(type, sqft, beds, baths)

endpoint <- vetiver::vetiver_endpoint("https://vetiver-ewjogewawq-ew.a.run.app/predict")
predict(endpoint, new_sac)
# A tibble: 20 x 1
     .pred
     <dbl>
 1 236325.
 2 427492.
 3 417112.
 4 258001.
 5 339775.
...

In real life you could also add a build trigger that fires on any change to the R script that fits the model, so the deployment updates as needed. If the pins integration reads the model from an outside service such as GCS, redeploying would be needed less often.

The full setup script is below:

library(parsnip)
library(workflows)
data(Sacramento, package = "modeldata")

rf_spec <- rand_forest(mode = "regression")
rf_form <- price ~ type + sqft + beds + baths

rf_fit <-
  workflow(rf_form, rf_spec) %>%
  fit(Sacramento)

library(vetiver)
v <- vetiver_model(rf_fit, "sacramento_rf")

root <- file.path("inst","vetiver")

library(pins)
model_board <- board_folder(file.path(root,"plumber/pins"))
model_board %>% vetiver_pin_write(v)

library(googleCloudRunner)

# the docker takes a long time to install arrow so build it first to cache
repo <- cr_buildtrigger_repo("MarkEdmondson1234/googleCloudRunner",
                             branch = "vetiver")

#cr_buildtrigger_delete("docker-vetiver")
cr_deploy_docker_trigger(repo, "vetiver",
                         location = "inst/vetiver/docker/",
                         includedFiles = "inst/vetiver/**",
                         projectId_target = "gcer-public",
                         timeout = 3600)

# use the vetiver docker image built above to deploy a Cloud Run instance of the model
# deploys folder with api.R, Dockerfile, pins/ and server.R contained
run <- cr_deploy_plumber(file.path(root,"plumber"), remote = "vetiver")

# on successful deployment
endpoint <- vetiver::vetiver_endpoint(paste0(run$status$url, "/predict"))
library(tidyverse)
data(Sacramento, package = "modeldata")
new_sac <- Sacramento %>%
  slice_sample(n = 20) %>%
  select(type, sqft, beds, baths)

predict(endpoint, new_sac)
# A tibble: 20 x 1
     .pred
     <dbl>
 1 236325.
 2 427492.
 3 417112.
 4 258001.
 5 339775.
...

MarkEdmondson1234 · 2022-01-08T22:18:58Z

Folder structure of working deployment here https://github.com/MarkEdmondson1234/googleCloudRunner/tree/vetiver/inst/vetiver

juliasilge · 2022-03-18T03:46:33Z

I've been working lately on generating Docker containers more, if you'd like to take a look and give any feedback. This demo might be helpful for how I am setting things up.

MarkEdmondson1234 · 2022-03-18T05:52:35Z

Thanks very much, will take a look.

MarkEdmondson1234 added a commit that referenced this issue Jan 8, 2022

start #163

f1724f4

MarkEdmondson1234 added a commit that referenced this issue Jan 8, 2022

working deployment #163

f328540

MarkEdmondson1234 mentioned this issue Jan 10, 2022

Integrations with other packages - request for help #164

Open

MarkEdmondson1234 mentioned this issue Jan 26, 2022

Global Environment Variables in R #169

Closed
