
Unable to launch Triton Server with hps backend using latest HugeCTR and hugectr_backend repos #58

Closed
sezhiyanhari opened this issue Nov 20, 2023 · 4 comments



sezhiyanhari commented Nov 20, 2023

Description
I'm unable to install and run the Triton server using the HPS backend.

Triton Information
Triton v23.06

To Reproduce
Steps to reproduce the behavior.

I'm following steps (1) and (2) under the "Build the HPS Backend from Scratch" section here: https://github.com/triton-inference-server/hugectr_backend. I followed each step exactly.

I'm doing all the steps in a container built from this Dockerfile: https://github.com/NVIDIA-Merlin/Merlin/blob/main/docker/dockerfile.ctr.

After trying to launch Triton using `tritonserver --model-repository=/opt/hugectr_testing/data/test_dask/output/model_inference --backend-config=hps,ps=/opt/hugectr_testing/data/test_dask/output/model_inference/ps.json`, I get the following:

I1120 07:09:59.322961 19420 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f4fd6000000' with size 268435456
I1120 07:09:59.323460 19420 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I1120 07:09:59.333655 19420 model_lifecycle.cc:462] loading: criteo:1
I1120 07:09:59.333798 19420 model_lifecycle.cc:462] loading: criteo_nvt:1
I1120 07:09:59.370486 19420 hps.cc:62] TRITONBACKEND_Initialize: hps
I1120 07:09:59.370512 19420 hps.cc:69] Triton TRITONBACKEND API version: 1.13
I1120 07:09:59.370519 19420 hps.cc:73] 'hps' TRITONBACKEND API version: 1.15
I1120 07:09:59.370536 19420 hps.cc:150] TRITONBACKEND_Backend Finalize: HPSBackend
E1120 07:09:59.370572 19420 model_lifecycle.cc:626] failed to load 'criteo' version 1: Unsupported: Triton backend API version does not support this backend
I1120 07:09:59.370602 19420 model_lifecycle.cc:753] failed to load 'criteo'
I1120 07:09:59.521670 19436 pb_stub.cc:255]  Failed to initialize Python stub for auto-complete: ModuleNotFoundError: No module named 'hugectr'

At:
  /opt/hugectr_testing/data/test_dask/output/model_inference/criteo_nvt/1/model.py(1): <module>
  <frozen importlib._bootstrap>(241): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(883): exec_module
  <frozen importlib._bootstrap>(703): _load_unlocked
  <frozen importlib._bootstrap>(1006): _find_and_load_unlocked
  <frozen importlib._bootstrap>(1027): _find_and_load

E1120 07:09:59.529317 19420 model_lifecycle.cc:626] failed to load 'criteo_nvt' version 1: Internal: ModuleNotFoundError: No module named 'hugectr'

At:
  /opt/hugectr_testing/data/test_dask/output/model_inference/criteo_nvt/1/model.py(1): <module>
  <frozen importlib._bootstrap>(241): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(883): exec_module
  <frozen importlib._bootstrap>(703): _load_unlocked
  <frozen importlib._bootstrap>(1006): _find_and_load_unlocked
  <frozen importlib._bootstrap>(1027): _find_and_load

I1120 07:09:59.529369 19420 model_lifecycle.cc:753] failed to load 'criteo_nvt'
E1120 07:09:59.529473 19420 model_repository_manager.cc:562] Invalid argument: ensemble 'criteo_ens' depends on 'criteo' which has no loaded version. Model 'criteo' loading failed with error: version 1 is at UNAVAILABLE state: Unsupported: Triton backend API version does not support this backend;
I1120 07:09:59.529552 19420 server.cc:603] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I1120 07:09:59.529642 19420 server.cc:630] 
+---------+-------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path                                                  | Config                                                                                                                                                        |
+---------+-------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| python  | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+---------+-------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+

I1120 07:09:59.529735 19420 server.cc:673] 
+------------+---------+-------------------------------------------------------------------------------------------------+
| Model      | Version | Status                                                                                          |
+------------+---------+-------------------------------------------------------------------------------------------------+
| criteo     | 1       | UNAVAILABLE: Unsupported: Triton backend API version does not support this backend              |
| criteo_nvt | 1       | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'hugectr'                           |
|            |         |                                                                                                 |
|            |         | At:                                                                                             |
|            |         |   /opt/hugectr_testing/data/test_dask/output/model_inference/criteo_nvt/1/model.py(1): <module> |
|            |         |   <frozen importlib._bootstrap>(241): _call_with_frames_removed                                 |
|            |         |   <frozen importlib._bootstrap_external>(883): exec_module                                      |
|            |         |   <frozen importlib._bootstrap>(703): _load_unlocked                                            |
|            |         |   <frozen importlib._bootstrap>(1006): _find_and_load_unlocked                                  |
|            |         |   <frozen importlib._bootstrap>(1027): _find_and_load                                           |
+------------+---------+-------------------------------------------------------------------------------------------------+

I1120 07:09:59.575939 19420 metrics.cc:808] Collecting metrics for GPU 0: Tesla V100-SXM2-16GB
I1120 07:09:59.576319 19420 metrics.cc:701] Collecting CPU metrics
I1120 07:09:59.576525 19420 tritonserver.cc:2385] 
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                                          |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                                         |
| server_version                   | 2.35.0                                                                                                                                                                                                         |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace loggin |
|                                  | g                                                                                                                                                                                                              |
| model_repository_path[0]         | /opt/hugectr_testing/data/test_dask/output/model_inference                                                                                                                                                     |
| model_control_mode               | MODE_NONE                                                                                                                                                                                                      |
| strict_model_config              | 0                                                                                                                                                                                                              |
| rate_limit                       | OFF                                                                                                                                                                                                            |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                                      |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                                                                       |
| min_supported_compute_capability | 6.0                                                                                                                                                                                                            |
| strict_readiness                 | 1                                                                                                                                                                                                              |
| exit_timeout                     | 30                                                                                                                                                                                                             |
| cache_enabled                    | 0                                                                                                                                                                                                              |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I1120 07:09:59.576581 19420 server.cc:304] Waiting for in-flight requests to complete.
I1120 07:09:59.576592 19420 server.cc:320] Timeout 30: Found 0 model versions that have in-flight inferences
I1120 07:09:59.576617 19420 server.cc:335] All models are stopped, unloading models
I1120 07:09:59.576634 19420 server.cc:342] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models

I trained my model using this example notebook: https://github.com/NVIDIA-Merlin/Merlin/blob/main/examples/scaling-criteo/03-Training-with-HugeCTR.ipynb

However, this notebook came out before the HugeCTR backend was merged with the HPS backend. As a result, I needed to manually change a line in my config.pbtxt to go from the hugectr backend to the hps backend: `backend: "hugectr"` => `backend: "hps"`.
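
For reference, the change was a single field in config.pbtxt (a minimal sketch; all other fields from the notebook-generated config are unchanged):

```
# was: backend: "hugectr"
backend: "hps"
```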

Expected behavior
First, when building the inference version, I expect HugeCTR's Python module to be installed, but it isn't. This is strange because when I turn off -DENABLE_INFERENCE=ON and install, `import hugectr` works.
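
To illustrate, here is a sketch of the two builds I tried (build directory setup and other CMake flags omitted):

```bash
# Training build: the hugectr Python module is installed afterwards
cmake -DENABLE_INFERENCE=OFF .. && make -j && make install
python3 -c "import hugectr"   # succeeds

# Inference build: the module is missing afterwards
cmake -DENABLE_INFERENCE=ON .. && make -j && make install
python3 -c "import hugectr"   # ModuleNotFoundError: No module named 'hugectr'
```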

Second, I expect the Triton server to start and accept requests.


kthui commented Nov 20, 2023

Hi @sezhiyanhari, I am moving this issue to the Hierarchical Parameter Server Backend team so they can better address it.

@kthui kthui transferred this issue from triton-inference-server/server Nov 20, 2023
yingcanw (Contributor) commented:

@sezhiyanhari According to the reply in NVIDIA-Merlin/HugeCTR#431, the HugeCTR Triton backend implementation has been completely removed from the current repo. So there may be a misunderstanding here: it is not that the HugeCTR backend has been merged into the HPS backend, but that the HugeCTR Triton backend has been completely deprecated. That is, models trained through native HugeCTR (https://github.com/NVIDIA-Merlin/Merlin/blob/main/examples/scaling-criteo/03-Training-with-HugeCTR.ipynb) are no longer supported for deployment on the Triton Server, and we will delete the examples about HugeCTR model deployment in the Merlin repo. Sorry again for the confusion.

So if you still need to use the HugeCTR backend to deploy a native HugeCTR model, you can only use an NGC image from before 22.08. If you need to compile the HugeCTR backend from scratch, please note that the Triton REPO_TAG release versions (-DTRITON_COMMON_REPO_TAG=<rxx.yy>, -DTRITON_CORE_REPO_TAG=<rxx.yy>, -DTRITON_BACKEND_REPO_TAG=<rxx.yy>) cannot exceed the Triton release version; otherwise you will get the "Triton backend API version does not support this backend" error, as in the sketch below.
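
For instance, inside the v23.06 Triton container mentioned above, a configuration along these lines keeps the repo tags at or below the container's release (a sketch only; the other CMake options for the backend build are omitted):

```bash
# Repo tags pinned to the container's Triton release (v23.06 here);
# tags newer than the container's release trigger the API-version error.
cmake .. \
  -DTRITON_COMMON_REPO_TAG=r23.06 \
  -DTRITON_CORE_REPO_TAG=r23.06 \
  -DTRITON_BACKEND_REPO_TAG=r23.06
make install
```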

In addition, you can also deploy models trained with TF or PyTorch on the Triton server through the HPS plugins:

- HPS plugin for TensorFlow
- HPS plugin for TensorRT
- HPS plugin for Torch


sezhiyanhari commented Nov 22, 2023

@yingcanw got it, thank you for the explanation!

Do you know how I can rework this example (https://github.com/NVIDIA-Merlin/Merlin/blob/main/examples/scaling-criteo/03-Training-with-HugeCTR.ipynb) so it's compatible for inference with the HPS backend?

I'm specifically trying to train the Criteo TB model using the DLRM architecture and run inference on the trained model using HPS, in an image where HPS is compiled from source. I'd also like HugeCTR to be at the latest version possible.

yingcanw (Contributor) commented:

> Do you know how I can rework this example (https://github.com/NVIDIA-Merlin/Merlin/blob/main/examples/scaling-criteo/03-Training-with-HugeCTR.ipynb) so it's compatible for inference with the HPS backend?

We have completely stopped supporting deployment of natively trained HugeCTR models on Triton Server with the HugeCTR backend. If you insist on using the HugeCTR backend for deployment, the only way is to use an old version of the image, or to compile from scratch to get the HugeCTR Triton backend (before v23.08).
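
For example (the exact image name and tag are an assumption here; check the NGC catalog for what is actually available):

```bash
# Illustrative only: verify the repository name and available tags on NGC
docker pull nvcr.io/nvidia/merlin/merlin-hugectr:22.08
```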

> I'm specifically trying to train the Criteo TB model using the DLRM architecture and run inference on the trained model using HPS, in an image where HPS is compiled from source. I'd also like HugeCTR to be at the latest version possible.

Because the HPS inference implementation is decoupled from native HugeCTR's training implementation, you can use the latest HugeCTR code for model training. However, if you want to deploy on Triton Server, you can only compile the old version of the HugeCTR Triton backend, as described above.

Our recommended deployment solution is to use the TRT Triton backend combined with the HPS TRT plugin to deploy on the Triton server. I don't think this will hinder your goal of using native HugeCTR to train a DLRM model on the Criteo dataset and deploying it on the Triton server. Please refer to the demo_for_hugectr_trained_model.ipynb notebook.
