
Unable to launch Triton Server with hps backend using latest HugeCTR and hugectr_backend repos #58

Closed
sezhiyanhari opened this issue Nov 20, 2023 · 4 comments



sezhiyanhari commented Nov 20, 2023

Description
I'm unable to install and run the Triton server using the HPS backend.

Triton Information
Triton v23.06

To Reproduce
Steps to reproduce the behavior.

I'm following steps (1) and (2) under the "Build the HPS Backend from Scratch" section here: https://github.com/triton-inference-server/hugectr_backend. I followed each step exactly.

I'm doing all the steps in a container built from this Dockerfile: https://github.com/NVIDIA-Merlin/Merlin/blob/main/docker/dockerfile.ctr.

After trying to launch Triton using `tritonserver --model-repository=/opt/hugectr_testing/data/test_dask/output/model_inference --backend-config=hps,ps=/opt/hugectr_testing/data/test_dask/output/model_inference/ps.json`, I get the following:

I1120 07:09:59.322961 19420 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f4fd6000000' with size 268435456
I1120 07:09:59.323460 19420 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I1120 07:09:59.333655 19420 model_lifecycle.cc:462] loading: criteo:1
I1120 07:09:59.333798 19420 model_lifecycle.cc:462] loading: criteo_nvt:1
I1120 07:09:59.370486 19420 hps.cc:62] TRITONBACKEND_Initialize: hps
I1120 07:09:59.370512 19420 hps.cc:69] Triton TRITONBACKEND API version: 1.13
I1120 07:09:59.370519 19420 hps.cc:73] 'hps' TRITONBACKEND API version: 1.15
I1120 07:09:59.370536 19420 hps.cc:150] TRITONBACKEND_Backend Finalize: HPSBackend
E1120 07:09:59.370572 19420 model_lifecycle.cc:626] failed to load 'criteo' version 1: Unsupported: Triton backend API version does not support this backend
I1120 07:09:59.370602 19420 model_lifecycle.cc:753] failed to load 'criteo'
I1120 07:09:59.521670 19436 pb_stub.cc:255]  Failed to initialize Python stub for auto-complete: ModuleNotFoundError: No module named 'hugectr'

At:
  /opt/hugectr_testing/data/test_dask/output/model_inference/criteo_nvt/1/model.py(1): <module>
  <frozen importlib._bootstrap>(241): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(883): exec_module
  <frozen importlib._bootstrap>(703): _load_unlocked
  <frozen importlib._bootstrap>(1006): _find_and_load_unlocked
  <frozen importlib._bootstrap>(1027): _find_and_load

E1120 07:09:59.529317 19420 model_lifecycle.cc:626] failed to load 'criteo_nvt' version 1: Internal: ModuleNotFoundError: No module named 'hugectr'

At:
  /opt/hugectr_testing/data/test_dask/output/model_inference/criteo_nvt/1/model.py(1): <module>
  <frozen importlib._bootstrap>(241): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(883): exec_module
  <frozen importlib._bootstrap>(703): _load_unlocked
  <frozen importlib._bootstrap>(1006): _find_and_load_unlocked
  <frozen importlib._bootstrap>(1027): _find_and_load

I1120 07:09:59.529369 19420 model_lifecycle.cc:753] failed to load 'criteo_nvt'
E1120 07:09:59.529473 19420 model_repository_manager.cc:562] Invalid argument: ensemble 'criteo_ens' depends on 'criteo' which has no loaded version. Model 'criteo' loading failed with error: version 1 is at UNAVAILABLE state: Unsupported: Triton backend API version does not support this backend;
I1120 07:09:59.529552 19420 server.cc:603] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I1120 07:09:59.529642 19420 server.cc:630] 
+---------+-------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path                                                  | Config                                                                                                                                                        |
+---------+-------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| python  | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+---------+-------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+

I1120 07:09:59.529735 19420 server.cc:673] 
+------------+---------+-------------------------------------------------------------------------------------------------+
| Model      | Version | Status                                                                                          |
+------------+---------+-------------------------------------------------------------------------------------------------+
| criteo     | 1       | UNAVAILABLE: Unsupported: Triton backend API version does not support this backend              |
| criteo_nvt | 1       | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'hugectr'                           |
|            |         |                                                                                                 |
|            |         | At:                                                                                             |
|            |         |   /opt/hugectr_testing/data/test_dask/output/model_inference/criteo_nvt/1/model.py(1): <module> |
|            |         |   <frozen importlib._bootstrap>(241): _call_with_frames_removed                                 |
|            |         |   <frozen importlib._bootstrap_external>(883): exec_module                                      |
|            |         |   <frozen importlib._bootstrap>(703): _load_unlocked                                            |
|            |         |   <frozen importlib._bootstrap>(1006): _find_and_load_unlocked                                  |
|            |         |   <frozen importlib._bootstrap>(1027): _find_and_load                                           |
+------------+---------+-------------------------------------------------------------------------------------------------+

I1120 07:09:59.575939 19420 metrics.cc:808] Collecting metrics for GPU 0: Tesla V100-SXM2-16GB
I1120 07:09:59.576319 19420 metrics.cc:701] Collecting CPU metrics
I1120 07:09:59.576525 19420 tritonserver.cc:2385] 
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                                          |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                                         |
| server_version                   | 2.35.0                                                                                                                                                                                                         |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace loggin |
|                                  | g                                                                                                                                                                                                              |
| model_repository_path[0]         | /opt/hugectr_testing/data/test_dask/output/model_inference                                                                                                                                                     |
| model_control_mode               | MODE_NONE                                                                                                                                                                                                      |
| strict_model_config              | 0                                                                                                                                                                                                              |
| rate_limit                       | OFF                                                                                                                                                                                                            |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                                      |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                                                                       |
| min_supported_compute_capability | 6.0                                                                                                                                                                                                            |
| strict_readiness                 | 1                                                                                                                                                                                                              |
| exit_timeout                     | 30                                                                                                                                                                                                             |
| cache_enabled                    | 0                                                                                                                                                                                                              |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I1120 07:09:59.576581 19420 server.cc:304] Waiting for in-flight requests to complete.
I1120 07:09:59.576592 19420 server.cc:320] Timeout 30: Found 0 model versions that have in-flight inferences
I1120 07:09:59.576617 19420 server.cc:335] All models are stopped, unloading models
I1120 07:09:59.576634 19420 server.cc:342] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models

I trained my model using this example notebook: https://github.com/NVIDIA-Merlin/Merlin/blob/main/examples/scaling-criteo/03-Training-with-HugeCTR.ipynb

However, this notebook came out before the HugeCTR backend was merged with the HPS backend. As a result, I needed to manually change a line in my config.pbtxt to go from the hugectr backend to the hps backend: `backend: "hugectr"` => `backend: "hps"`.
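
For reference, the change was a single field in config.pbtxt (a minimal sketch; all other fields from the notebook-generated config are unchanged):

```
# was: backend: "hugectr"
backend: "hps"
```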

Expected behavior
First, when building the inference version, I expect HugeCTR's Python module to be installed, but it isn't. This is strange because when I turn off -DENABLE_INFERENCE=ON and install, `import hugectr` works.
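
To illustrate, here is a sketch of the two builds I tried (build directory setup and other CMake flags omitted):

```bash
# Training build: the hugectr Python module is installed afterwards
cmake -DENABLE_INFERENCE=OFF .. && make -j && make install
python3 -c "import hugectr"   # succeeds

# Inference build: the module is missing afterwards
cmake -DENABLE_INFERENCE=ON .. && make -j && make install
python3 -c "import hugectr"   # ModuleNotFoundError: No module named 'hugectr'
```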

Second, I expect the Triton server to start and accept requests.


kthui commented Nov 20, 2023

Hi @sezhiyanhari, I am moving this issue to the Hierarchical Parameter Server Backend team so they can better address it.

@kthui kthui transferred this issue from triton-inference-server/server Nov 20, 2023
yingcanw (Contributor) commented:

@sezhiyanhari According to the reply in NVIDIA-Merlin/HugeCTR#431, the HugeCTR Triton backend implementation has been completely removed from the current repo. So there may be a misunderstanding here: it is not that the HugeCTR backend has been merged into the HPS backend, but that the HugeCTR Triton backend has been completely deprecated. That is, models trained through native HugeCTR (https://github.com/NVIDIA-Merlin/Merlin/blob/main/examples/scaling-criteo/03-Training-with-HugeCTR.ipynb) are no longer supported for deployment on the Triton Server, and we will delete the examples about HugeCTR model deployment in the Merlin repo. Sorry again for the confusion.

So if you still need to use the HugeCTR backend to deploy a native HugeCTR model, you can only use an NGC image from before 22.08. If you need to compile the HugeCTR backend from scratch, please note that the Triton REPO_TAG release versions (-DTRITON_COMMON_REPO_TAG=<rxx.yy>, -DTRITON_CORE_REPO_TAG=<rxx.yy>, -DTRITON_BACKEND_REPO_TAG=<rxx.yy>) cannot exceed the Triton release version; otherwise you will get the "Triton backend API version does not support this backend" error, as in the sketch below.
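
For instance, inside the v23.06 Triton container mentioned above, a configuration along these lines keeps the repo tags at or below the container's release (a sketch only; the other CMake options for the backend build are omitted):

```bash
# Repo tags pinned to the container's Triton release (v23.06 here);
# tags newer than the container's release trigger the API-version error.
cmake .. \
  -DTRITON_COMMON_REPO_TAG=r23.06 \
  -DTRITON_CORE_REPO_TAG=r23.06 \
  -DTRITON_BACKEND_REPO_TAG=r23.06
make install
```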

In addition, you can also deploy models trained with TF or PyTorch on the Triton server through the HPS plugins:

- HPS plugin for TensorFlow
- HPS plugin for TensorRT
- HPS plugin for Torch


sezhiyanhari commented Nov 22, 2023

@yingcanw got it, thank you for the explanation!

Do you know how I can rework this example (https://github.com/NVIDIA-Merlin/Merlin/blob/main/examples/scaling-criteo/03-Training-with-HugeCTR.ipynb) so it's compatible for inference with the HPS backend?

I'm specifically trying to train the Criteo TB model using the DLRM architecture and run inference on the trained model using HPS, in an image where HPS is compiled from source. I'd also like HugeCTR to be at the latest version possible.

yingcanw (Contributor) commented:

> Do you know how I can rework this example (https://github.com/NVIDIA-Merlin/Merlin/blob/main/examples/scaling-criteo/03-Training-with-HugeCTR.ipynb) so it's compatible for inference with the HPS backend?

We have completely stopped supporting deployment of natively trained HugeCTR models on Triton Server with the HugeCTR backend. If you insist on using the HugeCTR backend for deployment, the only way is to use an old version of the image, or to compile from scratch to get the HugeCTR Triton backend (before v23.08).
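
For example (the exact image name and tag are an assumption here; check the NGC catalog for what is actually available):

```bash
# Illustrative only: verify the repository name and available tags on NGC
docker pull nvcr.io/nvidia/merlin/merlin-hugectr:22.08
```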

> I'm specifically trying to train the Criteo TB model using the DLRM architecture and run inference on the trained model using HPS, in an image where HPS is compiled from source. I'd also like HugeCTR to be at the latest version possible.

Because the HPS inference implementation is decoupled from native HugeCTR's training implementation, you can use the latest HugeCTR code for model training. However, if you want to deploy on Triton Server, you can only compile the old version of the HugeCTR Triton backend, as described above.

Our recommended deployment solution is to use the TRT Triton backend combined with the HPS TRT plugin to deploy on the Triton server. I don't think this will hinder your goal of using native HugeCTR to train a DLRM model on the Criteo dataset and deploying it on the Triton server. Please refer to the demo_for_hugectr_trained_model.ipynb notebook.
