diff --git a/examples/README.md b/examples/README.md
index de7c285ce..35a344b67 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -39,7 +39,7 @@ These notebooks demonstrate how to scale NVTabular as well as the following:
- Use multiple GPUs and nodes with NVTabular for feature engineering.
- Train recommender system models with the Merlin Models for TensorFlow.
- Train recommender system models with HugeCTR using multiple GPUs.
-- Inference with the Triton Inference Server and Merlin Models for TensorFlow or HugeCTR.
+- Inference with the Triton Inference Server and Merlin Models for TensorFlow.
### [Training and Serving with Merlin on AWS SageMaker](./sagemaker-tensorflow/)
diff --git a/examples/getting-started-movielens/04-Triton-Inference-with-HugeCTR.ipynb b/examples/getting-started-movielens/04-Triton-Inference-with-HugeCTR.ipynb
deleted file mode 100644
index 2f98e09c9..000000000
--- a/examples/getting-started-movielens/04-Triton-Inference-with-HugeCTR.ipynb
+++ /dev/null
@@ -1,559 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "code",
- "execution_count": 1,
- "id": "d813a4ce",
- "metadata": {},
- "outputs": [],
- "source": [
- "# Copyright 2021 NVIDIA Corporation. All Rights Reserved.\n",
- "#\n",
- "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
- "# you may not use this file except in compliance with the License.\n",
- "# You may obtain a copy of the License at\n",
- "#\n",
- "# http://www.apache.org/licenses/LICENSE-2.0\n",
- "#\n",
- "# Unless required by applicable law or agreed to in writing, software\n",
- "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
- "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
- "# See the License for the specific language governing permissions and\n",
- "# limitations under the License.\n",
- "# ===================================\n",
- "\n",
- "# Each user is responsible for checking the content of datasets and the\n",
- "# applicable licenses and determining if suitable for the intended use."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "260dbfff",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "# Serve Recommendations from the HugeCTR Model\n",
- "\n",
- "This notebook is created using the latest stable [merlin-hugectr](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-hugectr/tags) container. \n",
- "\n",
- "## Overview\n",
- "\n",
- "In this notebook, we show how to run inference with our trained deep learning recommender model using Triton Inference Server. In this example, we deploy the NVTabular workflow and the HugeCTR model with Triton Inference Server as an ensemble. For each request, Triton Inference Server feeds the input data through the NVTabular workflow and passes its output to the HugeCTR model."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "e0157e1c",
- "metadata": {},
- "source": [
- "## Getting Started"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "71304a10",
- "metadata": {},
- "source": [
- "We need to write configuration files with the stored model weights and model configuration.\n",
- "\n",
- "Let us first move all of our model files to a directory that we will be able to access from the scripts that we will generate."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "id": "c2efc44e",
- "metadata": {},
- "outputs": [],
- "source": [
- "import os\n",
- "import json\n",
- "\n",
- "# path to preprocessed data\n",
- "INPUT_DATA_DIR = os.environ.get(\n",
- " \"INPUT_DATA_DIR\", os.path.expanduser(\"/workspace/nvt-examples/movielens/data/\")\n",
- ")\n",
- "\n",
- "# path to saved model\n",
- "MODEL_DIR = os.path.join(INPUT_DATA_DIR, \"model/movielens_hugectr\")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "id": "6a9fbb6d",
- "metadata": {
- "tags": [
- "flake8-noqa-cell"
- ]
- },
- "outputs": [],
- "source": [
- "file_to_write = \"\"\"\n",
- "name: \"movielens_hugectr\"\n",
- "backend: \"hugectr\"\n",
- "max_batch_size: 64\n",
- "input [\n",
- " {\n",
- " name: \"DES\"\n",
- " data_type: TYPE_FP32\n",
- " dims: [ -1 ]\n",
- " },\n",
- " {\n",
- " name: \"CATCOLUMN\"\n",
- " data_type: TYPE_INT64\n",
- " dims: [ -1 ]\n",
- " },\n",
- " {\n",
- " name: \"ROWINDEX\"\n",
- " data_type: TYPE_INT32\n",
- " dims: [ -1 ]\n",
- " }\n",
- "]\n",
- "output [\n",
- " {\n",
- " name: \"OUTPUT0\"\n",
- " data_type: TYPE_FP32\n",
- " dims: [ -1 ]\n",
- " }\n",
- "]\n",
- "instance_group [\n",
- " {\n",
- " count: 1\n",
- " kind : KIND_GPU\n",
- " gpus:[0]\n",
- " }\n",
- "]\n",
- "\n",
- "parameters [\n",
- " {\n",
- " key: \"config\"\n",
- " value: { string_value: \"$MODEL_DIR/1/movielens.json\" }\n",
- " },\n",
- " {\n",
- " key: \"gpucache\"\n",
- " value: { string_value: \"true\" }\n",
- " },\n",
- " {\n",
- " key: \"hit_rate_threshold\"\n",
- " value: { string_value: \"0.8\" }\n",
- " },\n",
- " {\n",
- " key: \"gpucacheper\"\n",
- " value: { string_value: \"0.5\" }\n",
- " },\n",
- " {\n",
- " key: \"label_dim\"\n",
- " value: { string_value: \"1\" }\n",
- " },\n",
- " {\n",
- " key: \"slots\"\n",
- " value: { string_value: \"3\" }\n",
- " },\n",
- " {\n",
- " key: \"cat_feature_num\"\n",
- " value: { string_value: \"4\" }\n",
- " },\n",
- " {\n",
- " key: \"des_feature_num\"\n",
- " value: { string_value: \"0\" }\n",
- " },\n",
- " {\n",
- " key: \"max_nnz\"\n",
- " value: { string_value: \"2\" }\n",
- " },\n",
- " {\n",
- " key: \"embedding_vector_size\"\n",
- " value: { string_value: \"16\" }\n",
- " },\n",
- " {\n",
- " key: \"embeddingkey_long_type\"\n",
- " value: { string_value: \"true\" }\n",
- " }\n",
- "]\n",
- "\"\"\"\n",
- "\n",
- "with open(os.path.join(MODEL_DIR, \"config.pbtxt\"), \"w\", encoding=\"utf-8\") as f:\n",
- " f.write(file_to_write.replace(\"$MODEL_DIR\", MODEL_DIR))"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "id": "0a23cb52",
- "metadata": {
- "tags": [
- "flake8-noqa-cell"
- ]
- },
- "outputs": [],
- "source": [
- "config = json.dumps(\n",
- "{\n",
- " \"supportlonglong\": True,\n",
- " \"models\": [\n",
- " {\n",
- " \"model\": \"movielens_hugectr\",\n",
- " \"sparse_files\": [f\"{MODEL_DIR}/0_sparse_1900.model\"],\n",
- " \"dense_file\": f\"{MODEL_DIR}/_dense_1900.model\",\n",
- " \"network_file\": f\"{MODEL_DIR}/1/movielens.json\",\n",
- " \"num_of_worker_buffer_in_pool\": \"1\",\n",
- " \"num_of_refresher_buffer_in_pool\": \"1\",\n",
- " \"cache_refresh_percentage_per_iteration\": \"0.2\",\n",
- " \"deployed_device_list\": [\"0\"],\n",
- " \"max_batch_size\": \"64\",\n",
- " \"default_value_for_each_table\": [\"0.0\",\"0.0\"],\n",
- " \"hit_rate_threshold\": \"0.9\",\n",
- " \"gpucacheper\": \"0.5\",\n",
- " \"maxnum_catfeature_query_per_table_per_sample\": [\"162542\", \"56632\",\"12\"],\n",
- " \"embedding_vecsize_per_table\": [\"16\",\"16\",\"16\"],\n",
- " \"gpucache\":\"true\"\n",
- " }\n",
- " ] \n",
- "})\n",
- "\n",
- "config = json.loads(config)\n",
- "with open(os.path.join(MODEL_DIR, \"ps.json\"), \"w\", encoding=\"utf-8\") as f:\n",
- " json.dump(config, f)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "5eb3627f",
- "metadata": {},
- "source": [
- "Let's import required libraries."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "id": "f5b54092",
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "/usr/local/lib/python3.8/dist-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
- " from .autonotebook import tqdm as notebook_tqdm\n"
- ]
- }
- ],
- "source": [
- "import tritonclient.grpc as httpclient\n",
- "import cudf\n",
- "import numpy as np"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "4e4592a9",
- "metadata": {},
- "source": [
- "### Load Models on Triton Server"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "c6f50e9e",
- "metadata": {},
- "source": [
- "In the running docker container, you can start triton server with the command below:"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "bc8aa849",
- "metadata": {},
- "source": [
- "```\n",
- "tritonserver --model-repository= --backend-config=hugectr,ps=/ps.json --model-control-mode=explicit\n",
- "```"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "a0626de4",
- "metadata": {},
- "source": [
- "Because we pass the `--model-control-mode=explicit` flag, the model is not loaded at this step; we load it explicitly below."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "9b7de550",
- "metadata": {},
- "source": [
- "Note: The model-repository path is `/root/nvt-examples/movielens/data/model`. The models haven't been loaded, yet. We can request triton server to load the saved ensemble. We initialize a triton client. The path for the json file is `/root/nvt-examples/movielens/data/model/movielens_hugectr/ps.json`."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 6,
- "id": "a9d1c74a",
- "metadata": {},
- "outputs": [],
- "source": [
- "# disable warnings\n",
- "import warnings\n",
- "\n",
- "warnings.filterwarnings(\"ignore\")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 7,
- "id": "f86290af",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "client created.\n"
- ]
- },
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "/usr/local/lib/python3.8/dist-packages/tritonhttpclient/__init__.py:31: DeprecationWarning: The package `tritonhttpclient` is deprecated and will be removed in a future version. Please use instead `tritonclient.http`\n",
- " warnings.warn(\n"
- ]
- }
- ],
- "source": [
- "import tritonhttpclient\n",
- "\n",
- "try:\n",
- " triton_client = tritonhttpclient.InferenceServerClient(url=\"localhost:8000\", verbose=True)\n",
- " print(\"client created.\")\n",
- "except Exception as e:\n",
- " print(\"channel creation failed: \" + str(e))"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 8,
- "id": "a2a2bed5",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "GET /v2/health/live, headers None\n",
- "\n"
- ]
- },
- {
- "data": {
- "text/plain": [
- "True"
- ]
- },
- "execution_count": 8,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "triton_client.is_server_live()"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 9,
- "id": "dac3dd79",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "POST /v2/repository/index, headers None\n",
- "\n",
- "\n",
- "bytearray(b'[{\"name\":\"movielens_hugectr\"}]')\n"
- ]
- },
- {
- "data": {
- "text/plain": [
- "[{'name': 'movielens_hugectr'}]"
- ]
- },
- "execution_count": 9,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "triton_client.get_model_repository_index()"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "23b2df62",
- "metadata": {},
- "source": [
- "Let's load our model to Triton Server."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 10,
- "id": "2a1ec18b",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "POST /v2/repository/models/movielens_hugectr/load, headers None\n",
- "{}\n",
- "\n",
- "Loaded model 'movielens_hugectr'\n",
- "CPU times: user 3.99 ms, sys: 0 ns, total: 3.99 ms\n",
- "Wall time: 1.04 s\n"
- ]
- }
- ],
- "source": [
- "%%time\n",
- "\n",
- "triton_client.load_model(model_name=\"movielens_hugectr\")"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "eec2d617",
- "metadata": {},
- "source": [
- "Let's send a request to Inference Server and print out the response. Since in our example above we do not have continuous columns, below our only inputs are categorical columns."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 11,
- "id": "8d78ad75",
- "metadata": {},
- "outputs": [],
- "source": [
- "file_to_write = f\"\"\"\n",
- "from tritonclient.utils import *\n",
- "import tritonclient.http as httpclient\n",
- "import numpy as np\n",
- "import pandas as pd\n",
- "import sys\n",
- "\n",
- "model_name = 'movielens_hugectr'\n",
- "CATEGORICAL_COLUMNS = [\"userId\", \"movieId\", \"genres\"]\n",
- "CONTINUOUS_COLUMNS = []\n",
- "LABEL_COLUMNS = ['label']\n",
- "emb_size_array = [162542, 29434, 20]\n",
- "shift = np.insert(np.cumsum(emb_size_array), 0, 0)[:-1]\n",
- "df = pd.read_parquet('{INPUT_DATA_DIR}/valid/part_0.parquet')\n",
- "test_df = df.head(10)\n",
- "\n",
- "rp_lst = [0]\n",
- "cur = 0\n",
- "for i in range(1, 31):\n",
- " if i % 3 == 0:\n",
- " cur += 2\n",
- " rp_lst.append(cur)\n",
- " else:\n",
- " cur += 1\n",
- " rp_lst.append(cur)\n",
- "\n",
- "with httpclient.InferenceServerClient(\"localhost:8000\") as client:\n",
- " test_df.iloc[:, :2] = test_df.iloc[:, :2] + shift[:2]\n",
- " test_df.iloc[:, 2] = test_df.iloc[:, 2].apply(lambda x: [e + shift[2] for e in x])\n",
- " embedding_columns = np.array([list(np.hstack(np.hstack(test_df[CATEGORICAL_COLUMNS].values)))], dtype='int64')\n",
- " dense_features = np.array([[]], dtype='float32')\n",
- " row_ptrs = np.array([rp_lst], dtype='int32')\n",
- "\n",
- " inputs = [httpclient.InferInput(\"DES\", dense_features.shape, np_to_triton_dtype(dense_features.dtype)),\n",
- " httpclient.InferInput(\"CATCOLUMN\", embedding_columns.shape, np_to_triton_dtype(embedding_columns.dtype)),\n",
- " httpclient.InferInput(\"ROWINDEX\", row_ptrs.shape, np_to_triton_dtype(row_ptrs.dtype))]\n",
- "\n",
- " inputs[0].set_data_from_numpy(dense_features)\n",
- " inputs[1].set_data_from_numpy(embedding_columns)\n",
- " inputs[2].set_data_from_numpy(row_ptrs)\n",
- " outputs = [httpclient.InferRequestedOutput(\"OUTPUT0\")]\n",
- "\n",
- " response = client.infer(model_name, inputs, request_id=str(1), outputs=outputs)\n",
- "\n",
- " result = response.get_response()\n",
- " print(result)\n",
- " print(\"Prediction Result:\")\n",
- " print(response.as_numpy(\"OUTPUT0\"))\n",
- "\"\"\"\n",
- "\n",
- "with open(\"wdl2predict.py\", \"w\", encoding=\"utf-8\") as f:\n",
- " f.write(file_to_write)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 12,
- "id": "339340c6",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:1851: SettingWithCopyWarning: \r\n",
- "A value is trying to be set on a copy of a slice from a DataFrame.\r\n",
- "Try using .loc[row_indexer,col_indexer] = value instead\r\n",
- "\r\n",
- "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\r\n",
- " self._setitem_single_column(loc, val, pi)\r\n",
- "/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:1773: SettingWithCopyWarning: \r\n",
- "A value is trying to be set on a copy of a slice from a DataFrame.\r\n",
- "Try using .loc[row_indexer,col_indexer] = value instead\r\n",
- "\r\n",
- "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\r\n",
- " self._setitem_single_column(ilocs[0], value, pi)\r\n",
- "{'id': '1', 'model_name': 'movielens_hugectr', 'model_version': '1', 'parameters': {'NumSample': 10, 'DeviceID': 0}, 'outputs': [{'name': 'OUTPUT0', 'datatype': 'FP32', 'shape': [10], 'parameters': {'binary_data_size': 40}}]}\r\n",
- "Prediction Result:\r\n",
- "[0.5346206 0.49736455 0.2987379 0.6282493 0.7548654 0.59079504\r\n",
- " 0.55132014 0.90419775 0.47409508 0.5124942 ]\r\n"
- ]
- }
- ],
- "source": [
- "!python3 ./wdl2predict.py"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.8.10"
- },
- "merlin": {
- "containers": [
- "nvcr.io/nvidia/merlin/merlin-hugectr:latest"
- ]
- }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/examples/getting-started-movielens/README.md b/examples/getting-started-movielens/README.md
index d50ad3459..27a57452c 100644
--- a/examples/getting-started-movielens/README.md
+++ b/examples/getting-started-movielens/README.md
@@ -10,7 +10,6 @@ Most users are familiar with the dataset and we will teach the basic concepts of
- Use the Merlin Dataloader with PyTorch.
- Train a HugeCTR model.
- Serve recommendations from the Tensorflow model with the Triton Inference Server.
-- Serve recommendations from the HugeCTR model with the Triton Inference Server.
Explore the following notebooks:
@@ -20,4 +19,4 @@ Explore the following notebooks:
- [Training with PyTorch](03-Training-with-PyTorch.ipynb)
- [Training with HugeCTR](03-Training-with-HugeCTR.ipynb)
- [Serve Recommendations with Triton Inference Server (Tensorflow)](04-Triton-Inference-with-TF.ipynb)
-- [Serve Recommendations with Triton Inference Server (HugeCTR)](04-Triton-Inference-with-HugeCTR.ipynb)
+
diff --git a/examples/scaling-criteo/04-Triton-Inference-with-HugeCTR.ipynb b/examples/scaling-criteo/04-Triton-Inference-with-HugeCTR.ipynb
deleted file mode 100644
index 4d6688a80..000000000
--- a/examples/scaling-criteo/04-Triton-Inference-with-HugeCTR.ipynb
+++ /dev/null
@@ -1,639 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "code",
- "execution_count": 1,
- "metadata": {},
- "outputs": [],
- "source": [
- "# Copyright 2021 NVIDIA Corporation. All Rights Reserved.\n",
- "#\n",
- "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
- "# you may not use this file except in compliance with the License.\n",
- "# You may obtain a copy of the License at\n",
- "#\n",
- "# http://www.apache.org/licenses/LICENSE-2.0\n",
- "#\n",
- "# Unless required by applicable law or agreed to in writing, software\n",
- "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
- "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
- "# See the License for the specific language governing permissions and\n",
- "# limitations under the License.\n",
- "# ==============================================================================\n",
- "\n",
- "# Each user is responsible for checking the content of datasets and the\n",
- "# applicable licenses and determining if suitable for the intended use."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "\n",
- "\n",
- "# Scaling Criteo: Triton Inference with HugeCTR\n",
- "\n",
- "This notebook is created using the latest stable [merlin-hugectr](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-hugectr/tags) container. \n",
- "\n",
- "## Overview\n",
- "\n",
- "The last step is to deploy the ETL workflow and saved model to production. In the production setting, we want to transform the input data the same way as during training (ETL). We need to apply the same mean/std for continuous features and use the same categorical mapping to convert the categories to contiguous integers before we use the deep learning model for a prediction. Therefore, we deploy the NVTabular workflow with the HugeCTR model as an ensemble model to Triton Inference Server. The ensemble model guarantees that the same transformations are applied to the raw inputs.\n",
- "\n",
- "\n",
- "\n",
- "### Learning objectives\n",
- "\n",
- "In this notebook, we learn how to deploy our models to production:\n",
- "\n",
- "- Use **NVTabular** to generate config and model files for Triton Inference Server\n",
- "- Deploy an ensemble of NVTabular workflow and HugeCTR model\n",
- "- Send example request to Triton Inference Server"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Deploying Ensemble to Triton Inference Server\n",
- "\n",
- "First, we need to generate the Triton Inference Server configurations and save the models in the correct format. In the previous notebooks [02-ETL-with-NVTabular](./02-ETL-with-NVTabular.ipynb) and [03-Training-with-HugeCTR](./03-Training-with-HugeCTR.ipynb) we saved the NVTabular workflow and HugeCTR model to disk. We will load them."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "After training terminates, we can see that two `.model` files are generated. We need to move them inside a temporary folder, like `criteo_hugectr/1`. Let's create these folders."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.12) or chardet (3.0.4) doesn't match a supported version!\n",
- " warnings.warn(\"urllib3 ({}) or chardet ({}) doesn't match a supported \"\n"
- ]
- }
- ],
- "source": [
- "import os\n",
- "import glob\n",
- "import json\n",
- "\n",
- "import numpy as np\n",
- "import nvtabular as nvt\n",
- "import tritonclient.grpc as grpcclient\n",
- "\n",
- "from merlin.core.dispatch import get_lib\n",
- "from merlin.systems.triton import convert_df_to_triton_input\n",
- "from nvtabular.inference.triton import export_hugectr_ensemble\n",
- "\n",
- "BASE_DIR = os.environ.get(\"BASE_DIR\", \"/raid/data/criteo\")\n",
- "OUTPUT_DATA_DIR = os.environ.get(\"OUTPUT_DATA_DIR\", BASE_DIR + \"/test_dask/output\")\n",
- "original_data_path = os.environ.get(\"INPUT_FOLDER\", BASE_DIR + \"/converted/criteo\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Now we move our saved `.model` files into the `1` folder. We use only the last snapshot, taken after `9600` iterations."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "metadata": {},
- "outputs": [],
- "source": [
- "os.system(\"mv *9600.model \" + os.path.join(OUTPUT_DATA_DIR, \"criteo_hugectr/1/\"))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "We need to load the NVTabular workflow first"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "metadata": {},
- "outputs": [],
- "source": [
- "workflow = nvt.Workflow.load(os.path.join(OUTPUT_DATA_DIR, \"workflow\"))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Let's clear the directory"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 6,
- "metadata": {},
- "outputs": [],
- "source": [
- "os.system(\"rm -rf \" + os.path.join(OUTPUT_DATA_DIR, \"model_inference\"))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "#### Export artifacts\n",
- "\n",
- "Now, we can save our models for use later during the inference stage. To do so we use export_hugectr_ensemble method below. With this method, we can generate the `config.pbtxt` files automatically for each model.\n",
- "\n",
- "The script below creates an ensemble triton server model where\n",
- "- workflow is the NVTabular workflow used in preprocessing,\n",
- "- hugectr_model_path is the HugeCTR model that should be served. This path includes the model files.\n",
- "- name is the base name of the various triton models.\n",
- "- output_path is the path where the model will be saved.\n",
- "- cats are the categorical column names\n",
- "- conts are the continuous column names"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 7,
- "metadata": {},
- "outputs": [],
- "source": [
- "hugectr_params = dict()\n",
- "# Config File in the final directory for serving\n",
- "hugectr_params[\"config\"] = os.path.join(OUTPUT_DATA_DIR, \"model_inference\", \"criteo/1/criteo.json\")\n",
- "hugectr_params[\"slots\"] = 26\n",
- "hugectr_params[\"max_nnz\"] = 1\n",
- "hugectr_params[\"embedding_vector_size\"] = 128\n",
- "hugectr_params[\"n_outputs\"] = 1\n",
- "export_hugectr_ensemble(\n",
- " workflow=workflow,\n",
- " # Current directory with model weights and config file\n",
- " hugectr_model_path=os.path.join(OUTPUT_DATA_DIR, \"criteo_hugectr/1/\"),\n",
- " hugectr_params=hugectr_params,\n",
- " name=\"criteo\",\n",
- " # Base directory for serving\n",
- " output_path=os.path.join(OUTPUT_DATA_DIR, \"model_inference\"),\n",
- " label_columns=[\"label\"],\n",
- " cats=[\"C\" + str(x) for x in range(1, 27)],\n",
- " conts=[\"I\" + str(x) for x in range(1, 14)],\n",
- " max_batch_size=64,\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "We can take a look at the generated files."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 8,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "\u001b[01;34m/tmp/test_merlin_criteo_hugectr/output/criteo//model_inference\u001b[00m\r\n",
- "├── \u001b[01;34mcriteo\u001b[00m\r\n",
- "│ ├── \u001b[01;34m1\u001b[00m\r\n",
- "│ │ ├── \u001b[01;34m0_sparse_9600.model\u001b[00m\r\n",
- "│ │ │ ├── emb_vector\r\n",
- "│ │ │ ├── key\r\n",
- "│ │ │ └── slot_id\r\n",
- "│ │ ├── _dense_9600.model\r\n",
- "│ │ ├── _opt_dense_9600.model\r\n",
- "│ │ └── criteo.json\r\n",
- "│ └── config.pbtxt\r\n",
- "├── \u001b[01;34mcriteo_ens\u001b[00m\r\n",
- "│ ├── \u001b[01;34m1\u001b[00m\r\n",
- "│ └── config.pbtxt\r\n",
- "├── \u001b[01;34mcriteo_nvt\u001b[00m\r\n",
- "│ ├── \u001b[01;34m1\u001b[00m\r\n",
- "│ │ ├── \u001b[01;34m__pycache__\u001b[00m\r\n",
- "│ │ │ └── model.cpython-38.pyc\r\n",
- "│ │ ├── model.py\r\n",
- "│ │ └── \u001b[01;34mworkflow\u001b[00m\r\n",
- "│ │ ├── \u001b[01;34mcategories\u001b[00m\r\n",
- "│ │ │ ├── unique.C1.parquet\r\n",
- "│ │ │ ├── unique.C10.parquet\r\n",
- "│ │ │ ├── unique.C11.parquet\r\n",
- "│ │ │ ├── unique.C12.parquet\r\n",
- "│ │ │ ├── unique.C13.parquet\r\n",
- "│ │ │ ├── unique.C14.parquet\r\n",
- "│ │ │ ├── unique.C15.parquet\r\n",
- "│ │ │ ├── unique.C16.parquet\r\n",
- "│ │ │ ├── unique.C17.parquet\r\n",
- "│ │ │ ├── unique.C18.parquet\r\n",
- "│ │ │ ├── unique.C19.parquet\r\n",
- "│ │ │ ├── unique.C2.parquet\r\n",
- "│ │ │ ├── unique.C20.parquet\r\n",
- "│ │ │ ├── unique.C21.parquet\r\n",
- "│ │ │ ├── unique.C22.parquet\r\n",
- "│ │ │ ├── unique.C23.parquet\r\n",
- "│ │ │ ├── unique.C24.parquet\r\n",
- "│ │ │ ├── unique.C25.parquet\r\n",
- "│ │ │ ├── unique.C26.parquet\r\n",
- "│ │ │ ├── unique.C3.parquet\r\n",
- "│ │ │ ├── unique.C4.parquet\r\n",
- "│ │ │ ├── unique.C5.parquet\r\n",
- "│ │ │ ├── unique.C6.parquet\r\n",
- "│ │ │ ├── unique.C7.parquet\r\n",
- "│ │ │ ├── unique.C8.parquet\r\n",
- "│ │ │ └── unique.C9.parquet\r\n",
- "│ │ ├── metadata.json\r\n",
- "│ │ └── workflow.pkl\r\n",
- "│ └── config.pbtxt\r\n",
- "└── ps.json\r\n",
- "\r\n",
- "10 directories, 40 files\r\n"
- ]
- }
- ],
- "source": [
- "!tree $OUTPUT_DATA_DIR/model_inference"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "We need to write a configuration file with the stored model weights and model configuration."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 9,
- "metadata": {
- "tags": [
- "flake8-noqa-cell"
- ]
- },
- "outputs": [],
- "source": [
- "config = json.dumps(\n",
- "{\n",
- " \"supportlonglong\": \"true\",\n",
- " \"models\": [\n",
- " {\n",
- " \"model\": \"criteo\",\n",
- " \"sparse_files\": [os.path.join(OUTPUT_DATA_DIR, \"model_inference\", \"criteo/1/0_sparse_9600.model\")],\n",
- " \"dense_file\": os.path.join(OUTPUT_DATA_DIR, \"model_inference\", \"criteo/1/_dense_9600.model\"),\n",
- " \"network_file\": os.path.join(OUTPUT_DATA_DIR, \"model_inference\", \"criteo/1/criteo.json\"),\n",
- " \"max_batch_size\": \"64\",\n",
- " \"gpucache\": \"true\",\n",
- " \"hit_rate_threshold\": \"0.9\",\n",
- " \"gpucacheper\": \"0.5\",\n",
- " \"num_of_worker_buffer_in_pool\": \"4\",\n",
- " \"num_of_refresher_buffer_in_pool\": \"1\",\n",
- " \"cache_refresh_percentage_per_iteration\": 0.2,\n",
- " \"deployed_device_list\": [\"0\"],\n",
- " \"default_value_for_each_table\": [\"0.0\", \"0.0\"],\n",
- " \"maxnum_catfeature_query_per_table_per_sample\": [2, 26],\n",
- " \"embedding_vecsize_per_table\": [16 for x in range(26)],\n",
- " }\n",
- " ],\n",
- "}\n",
- ")\n",
- "\n",
- "config = json.loads(config)\n",
- "with open(os.path.join(OUTPUT_DATA_DIR, \"model_inference\", \"ps.json\"), \"w\", encoding=\"utf-8\") as f:\n",
- " json.dump(config, f)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "#### Start Triton Inference Server\n",
- "\n",
- "After we export the ensemble, we are ready to start the Triton Inference Server. The server is installed in the merlin-tensorflow-container. If you are not using one of our containers, then ensure it is installed in your environment. For more information, see the Triton Inference Server [documentation](https://github.com/triton-inference-server/server/blob/r22.03/README.md#documentation). \n",
- "\n",
- "You can start the server by running the following command:\n",
- "\n",
- "```shell\n",
- "tritonserver --model-repository= --backend-config=hugectr,ps=\n",
- "```\n",
- "\n",
- "For the `--model-repository` argument, specify the same value as `os.path.join(OUTPUT_DATA_DIR, \"model_inference\")` that you specified previously in `export_hugectr_ensemble` for `output_path`.\n",
- "For the `ps=` argument, specify the path to the `ps.json` file, `os.path.join(OUTPUT_DATA_DIR, \"model_inference\", \"ps.json\")`."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 10,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "/tmp/test_merlin_criteo_hugectr/output/criteo/model_inference\n",
- "/tmp/test_merlin_criteo_hugectr/output/criteo/model_inference/ps.json\n"
- ]
- }
- ],
- "source": [
- "print(os.path.join(OUTPUT_DATA_DIR, \"model_inference\"))\n",
- "print(os.path.join(OUTPUT_DATA_DIR, \"model_inference\", \"ps.json\"))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Get prediction from Triton Inference Server\n",
- "\n",
- "We have saved the models for Triton Inference Server. We started Triton Inference Server and the models are loaded. Now, we can send raw data as a request and receive the predictions.\n",
- "\n",
- "We read 3 example rows from the last parquet file from the raw data. We drop the target column, `label`, from the dataframe, as the information is not available at inference time."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 30,
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/html": [
- "<div>\n",
- "<table>\n",
- "  <thead>\n",
- "    <tr>\n",
- "      <th></th>\n",
- "      <th>C1</th>\n",
- "      <th>C2</th>\n",
- "      <th>C3</th>\n",
- "      <th>C4</th>\n",
- "      <th>C5</th>\n",
- "      <th>C6</th>\n",
- "      <th>C7</th>\n",
- "      <th>C8</th>\n",
- "      <th>C9</th>\n",
- "      <th>C10</th>\n",
- "      <th>...</th>\n",
- "      <th>I4</th>\n",
- "      <th>I5</th>\n",
- "      <th>I6</th>\n",
- "      <th>I7</th>\n",
- "      <th>I8</th>\n",
- "      <th>I9</th>\n",
- "      <th>I10</th>\n",
- "      <th>I11</th>\n",
- "      <th>I12</th>\n",
- "      <th>I13</th>\n",
- "    </tr>\n",
- "  </thead>\n",
- "  <tbody>\n",
- "    <tr>\n",
- "      <th>70000</th>\n",
- "      <td>2714039</td>\n",
- "      <td>29401</td>\n",
- "      <td>11464</td>\n",
- "      <td>1122</td>\n",
- "      <td>9355</td>\n",
- "      <td>2</td>\n",
- "      <td>6370</td>\n",
- "      <td>1010</td>\n",
- "      <td>37</td>\n",
- "      <td>1865651</td>\n",
- "      <td>...</td>\n",
- "      <td>0.208215</td>\n",
- "      <td>0.952671</td>\n",
- "      <td>0.955872</td>\n",
- "      <td>0.944922</td>\n",
- "      <td>0.139380</td>\n",
- "      <td>0.994092</td>\n",
- "      <td>0.056103</td>\n",
- "      <td>0.547473</td>\n",
- "      <td>0.709442</td>\n",
- "      <td>0.930728</td>\n",
- "    </tr>\n",
- "    <tr>\n",
- "      <th>70001</th>\n",
- "      <td>3514299</td>\n",
- "      <td>27259</td>\n",
- "      <td>8072</td>\n",
- "      <td>395</td>\n",
- "      <td>9361</td>\n",
- "      <td>1</td>\n",
- "      <td>544</td>\n",
- "      <td>862</td>\n",
- "      <td>11</td>\n",
- "      <td>3292987</td>\n",
- "      <td>...</td>\n",
- "      <td>0.171709</td>\n",
- "      <td>0.759526</td>\n",
- "      <td>0.795019</td>\n",
- "      <td>0.716366</td>\n",
- "      <td>0.134964</td>\n",
- "      <td>0.516737</td>\n",
- "      <td>0.065577</td>\n",
- "      <td>0.129782</td>\n",
- "      <td>0.471361</td>\n",
- "      <td>0.386101</td>\n",
- "    </tr>\n",
- "    <tr>\n",
- "      <th>70002</th>\n",
- "      <td>1304577</td>\n",
- "      <td>5287</td>\n",
- "      <td>7367</td>\n",
- "      <td>2033</td>\n",
- "      <td>2899</td>\n",
- "      <td>2</td>\n",
- "      <td>712</td>\n",
- "      <td>640</td>\n",
- "      <td>36</td>\n",
- "      <td>6415968</td>\n",
- "      <td>...</td>\n",
- "      <td>0.880028</td>\n",
- "      <td>0.347701</td>\n",
- "      <td>0.207892</td>\n",
- "      <td>0.753950</td>\n",
- "      <td>0.371013</td>\n",
- "      <td>0.759502</td>\n",
- "      <td>0.201477</td>\n",
- "      <td>0.192447</td>\n",
- "      <td>0.085893</td>\n",
- "      <td>0.957961</td>\n",
- "    </tr>\n",
- "  </tbody>\n",
- "</table>\n",
- "<p>3 rows × 39 columns</p>\n",
- "</div>"
- ],
- "text/plain": [
- " C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 ... \\\n",
- "70000 2714039 29401 11464 1122 9355 2 6370 1010 37 1865651 ... \n",
- "70001 3514299 27259 8072 395 9361 1 544 862 11 3292987 ... \n",
- "70002 1304577 5287 7367 2033 2899 2 712 640 36 6415968 ... \n",
- "\n",
- " I4 I5 I6 I7 I8 I9 I10 \\\n",
- "70000 0.208215 0.952671 0.955872 0.944922 0.139380 0.994092 0.056103 \n",
- "70001 0.171709 0.759526 0.795019 0.716366 0.134964 0.516737 0.065577 \n",
- "70002 0.880028 0.347701 0.207892 0.753950 0.371013 0.759502 0.201477 \n",
- "\n",
- " I11 I12 I13 \n",
- "70000 0.547473 0.709442 0.930728 \n",
- "70001 0.129782 0.471361 0.386101 \n",
- "70002 0.192447 0.085893 0.957961 \n",
- "\n",
- "[3 rows x 39 columns]"
- ]
- },
- "execution_count": 30,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
- "source": [
- "df_lib = get_lib()\n",
- "input_cols = workflow.input_schema.column_names\n",
- "# read the last raw parquet file, keeping only the workflow's input columns\n",
- "last_file = sorted(glob.glob(os.path.join(original_data_path, \"*.parquet\")))[-1]\n",
- "data = df_lib.read_parquet(last_file, columns=input_cols)\n",
- "# take three rows and drop the target column, which is unknown at inference time\n",
- "batch = data[:3]\n",
- "batch = batch[[col for col in batch.columns if col != 'label']]\n",
- "batch"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "We generate a Triton Inference Server request object. \n",
- "\n",
- "Currently, `NA` and `None` values are not supported for `int32` columns. As a workaround, we fill `NA` values with `0`. The output of the HugeCTR model is called `OUTPUT0`. Because the target column is not available at inference time, we also remove it from the input schema."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 31,
- "metadata": {},
- "outputs": [],
- "source": [
- "input_schema = workflow.input_schema.remove_col('label')\n",
- "inputs = convert_df_to_triton_input(\n",
- " input_schema, \n",
- " batch.fillna(0), \n",
- " grpcclient.InferInput\n",
- ")\n",
- "output_cols = ['OUTPUT0']\n",
- "outputs = [\n",
- " grpcclient.InferRequestedOutput(col)\n",
- " for col in output_cols\n",
- "]"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "We send the request to Triton Inference Server."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 29,
- "metadata": {},
- "outputs": [],
- "source": [
- "# send request to tritonserver\n",
- "with grpcclient.InferenceServerClient(\"localhost:8001\") as client:\n",
- " response = client.infer(\"criteo_ens\", inputs, request_id=\"1\", outputs=outputs)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "We print the predictions. Each output is the probability, predicted by our model, that the ad will be clicked."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 35,
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "OUTPUT0 [0.52164096 0.50390565 0.4957397 ] (3,)\n"
- ]
- }
- ],
- "source": [
- "# convert each requested output tensor to a NumPy array before printing\n",
- "for col in output_cols:\n",
- " print(col, response.as_numpy(col), response.as_numpy(col).shape)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Summary\n",
- "\n",
- "In this example, we deployed a recommender system pipeline as an ensemble. NVTabular first computes the features, and HugeCTR then generates predictions from the processed data. Serving both steps in one ensemble ensures that the training and production environments use the same feature engineering.\n",
- "\n",
- "## Next steps\n",
- "\n",
- "For more detail, see the [API documentation](https://nvidia-merlin.github.io/HugeCTR/stable/hugectr_user_guide.html) and [more examples](https://nvidia-merlin.github.io/HugeCTR/stable/notebooks/index.html) in the [HugeCTR repository](https://github.com/NVIDIA-Merlin/HugeCTR)."
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.8.10"
- },
- "merlin": {
- "containers": [
- "nvcr.io/nvidia/merlin/merlin-hugectr:latest"
- ]
- }
- },
- "nbformat": 4,
- "nbformat_minor": 4
-}
diff --git a/examples/scaling-criteo/README.md b/examples/scaling-criteo/README.md
index feed55e04..c76121385 100644
--- a/examples/scaling-criteo/README.md
+++ b/examples/scaling-criteo/README.md
@@ -7,7 +7,7 @@ We demonstrate how to scale NVTabular, as well as:
- Use multiple GPUs and nodes with NVTabular for feature engineering.
- Train recommender system models with the Merlin Models for TensorFlow.
- Train recommender system models with HugeCTR using multiple GPUs.
-- Inference with the Triton Inference Server and Merlin Models for TensorFlow or HugeCTR.
+- Inference with the Triton Inference Server and Merlin Models for TensorFlow.
Our recommendation is to use our latest stable [Merlin containers](https://catalog.ngc.nvidia.com/containers?filters=&orderBy=dateModifiedDESC&query=merlin) for the examples. Each notebook provides the required container.
@@ -19,8 +19,7 @@ Training and Deployment with **TensorFlow**:
- [Training with TensorFlow](03-Training-with-Merlin-Models-TensorFlow.ipynb)
- [Deploy the TensorFlow Model with Triton Inference Server](04-Triton-Inference-with-Merlin-Models-TensorFlow.ipynb)
-Training and Deployment with **HugeCTR**:
+Training with **HugeCTR**:
- [Download and Convert](01-Download-Convert.ipynb)
- [Feature Engineering with NVTabular](02-ETL-with-NVTabular.ipynb)
- [Training with HugeCTR](03-Training-with-HugeCTR.ipynb)
-- [Deploy the HugeCTR Model with Triton Inference Server](04-Triton-Inference-with-HugeCTR.ipynb)