🐛 Describe the bug
The KServe CI workflow started failing recently due to a change introduced in this PR. Because of the new model parameter startup_timeout, starting the model server from any old snapshot throws a NullPointerException.
Error logs
Error trace:
2024-09-06T00:08:11,306 [INFO ] main org.pytorch.serve.ModelServer - Torchserve stopped.
java.lang.NullPointerException: Cannot invoke "com.google.gson.JsonElement.getAsInt()" because the return value of "com.google.gson.JsonObject.get(String)" is null
at org.pytorch.serve.wlm.Model.setModelState(Model.java:197)
at org.pytorch.serve.wlm.ModelManager.createModel(ModelManager.java:493)
at org.pytorch.serve.wlm.ModelManager.registerAndUpdateModel(ModelManager.java:98)
at org.pytorch.serve.snapshot.SnapshotManager.initModels(SnapshotManager.java:137)
at org.pytorch.serve.snapshot.SnapshotManager.restore(SnapshotManager.java:120)
at org.pytorch.serve.ModelServer.initModelStore(ModelServer.java:162)
at org.pytorch.serve.ModelServer.startRESTserver(ModelServer.java:398)
at org.pytorch.serve.ModelServer.startAndWait(ModelServer.java:124)
at org.pytorch.serve.ModelServer.main(ModelServer.java:105)
INFO:root:Loading mnist .. 2 of 10 tries..
INFO:root:The model mnist is not ready
INFO:root:Sleep 30 seconds for load mnist..
Installation instructions
Ran CI locally on minikube: https://github.com/pytorch/serve/blob/master/.github/workflows/kserve_cpu_tests.yml
Model Packaging
https://github.com/pytorch/serve/blob/master/kubernetes/kserve/tests/configs/mnist_v2_cpu.yaml
config.properties
From gs://kfserving-examples/models/torchserve/image_classifier/v2
Versions
Repro instructions
Ran https://github.com/pytorch/serve/blob/master/.github/workflows/kserve_cpu_tests.yml locally.
Possible Solution
Short-term: update the gs://kfserving-examples/models/torchserve/**/config.properties files to add the new parameter wherever the server starts from a snapshot.
Long-term:
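For the short-term workaround, here is a rough sketch of how an old config.properties could be patched. It rests on several assumptions that should be checked against the TorchServe version in use: the snapshot is stored as a single unescaped `model_snapshot=<JSON>` line, the JSON is nested as models → model name → version → settings, the missing key is spelled `startupTimeout`, and 120 is an acceptable default. The function name `add_startup_timeout` is made up for this example.

```python
import json

SNAPSHOT_KEY = "model_snapshot"
# Assumed key name and default value; verify both against Model.java
# in the TorchServe version you are running.
STARTUP_TIMEOUT_KEY = "startupTimeout"
DEFAULT_STARTUP_TIMEOUT = 120


def add_startup_timeout(properties_text: str) -> str:
    """Add the timeout key to every model version in the snapshot that lacks it.

    Lines other than the model_snapshot entry are passed through unchanged.
    """
    patched_lines = []
    for line in properties_text.splitlines():
        # Split on the first "=" only; the JSON value may contain "=" itself.
        key, sep, value = line.partition("=")
        if sep and key.strip() == SNAPSHOT_KEY:
            snapshot = json.loads(value)
            for versions in snapshot.get("models", {}).values():
                for settings in versions.values():
                    # setdefault leaves an existing value untouched.
                    settings.setdefault(STARTUP_TIMEOUT_KEY, DEFAULT_STARTUP_TIMEOUT)
            line = f"{SNAPSHOT_KEY}=" + json.dumps(snapshot, separators=(",", ":"))
        patched_lines.append(line)
    return "\n".join(patched_lines) + "\n"


if __name__ == "__main__":
    # Hypothetical old-style snapshot without the new parameter.
    old_config = (
        "inference_address=http://0.0.0.0:8085\n"
        'model_snapshot={"name":"startup.cfg","modelCount":1,"models":'
        '{"mnist":{"1.0":{"defaultVersion":true,"marName":"mnist.mar",'
        '"minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":10,'
        '"responseTimeout":120}}}}\n'
    )
    print(add_startup_timeout(old_config), end="")
```

Running the function a second time on its own output changes nothing, so it should be safe to apply across all the config.properties files under the bucket, including ones that already carry the key.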