modify the quick start
litangwei01 committed Sep 5, 2023
1 parent 5c9320e commit 19a9b53
Showing 2 changed files with 22 additions and 36 deletions.
28 changes: 10 additions & 18 deletions docs/quick_start_new_user.md
@@ -7,7 +7,7 @@ type: explainer

# Trial in 30 mins (new users)

-TorchPipe is a multi-instance pipeline parallel library that provides seamless integration between lower-level acceleration libraries (such as TensorRT and OpenCV) and RPC frameworks. It guarantees high service throughput while meeting latency requirements. This document is mainly for new users: those at an introductory stage with acceleration-related theory who know some Python syntax and can read simple code. It mainly covers the use of torchpipe for accelerating service deployment, complemented by performance and effect comparisons.
+TorchPipe is a multi-instance pipeline parallel library that provides seamless integration between lower-level acceleration libraries (such as TensorRT and OpenCV) and RPC frameworks. It guarantees high service throughput while meeting latency requirements. This document is mainly for new users: those at an introductory stage with acceleration-related theory who know some Python syntax and can read simple code. It mainly covers the use of torchpipe for accelerating service deployment, complemented by performance and effect comparisons. The complete code for this document can be found at [resnet50_thrift](https://g.hz.netease.com/deploy/torchpipe/-/blob/develop/examples/resnet50_thrift/)

## Catalogue
* [1. Basic knowledge](#1)
@@ -74,24 +74,16 @@ img = precls_trans(cv2.resize(cv2.cvtColor(img, cv2.COLOR_BGR2RGB), (224,224)))
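The definition of `precls_trans` is not shown in this excerpt; a plausible sketch, assuming torchvision's standard ToTensor plus ImageNet Normalize (consistent with the mean/std constants that appear in the TOML config further down), would be:

```py
import cv2
from torchvision import transforms

# Hypothetical reconstruction of precls_trans:
# HWC uint8 image -> normalized CHW float tensor.
precls_trans = transforms.Compose([
    transforms.ToTensor(),  # scales 0-255 uint8 to 0-1 float, channels first
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = cv2.imread("test.jpg")  # BGR, HWC
img = precls_trans(cv2.resize(cv2.cvtColor(img, cv2.COLOR_BGR2RGB), (224, 224)))
```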
3. TensorRT acceleration

```py
-def load_classifier(net, max_batch_size, fp16):
-    x = torch.ones((1, 3, 224, 224))
-    if device == 'gpu':
-        x = x.cuda()
-        net.cuda()
-    net.eval()
-    trtmodel = torch2trt(net,
-                         [x],
-                         fp16_mode=fp16,
-                         max_batch_size=max_batch_size,
-                         max_workspace_size=32 * max_batch_size)
-    del x
-    del net
-    return trtmodel
+input_shape = torch.ones((1, 3, 224, 224)).cuda()
+self.classification_engine = torch2trt(resnet50, [input_shape],
+                                       fp16_mode=self.fp16,
+                                       max_batch_size=self.cls_trt_max_batchsize,
+                                       )

```

-The overall online service deployment can be found at [main_trt.py](https://g.hz.netease.com/deploy/torchpipe/-/blob/develop/examples/resnet50/main_trt.py)
+The overall online service deployment can be found at [main_trt.py](https://g.hz.netease.com/deploy/torchpipe/-/blob/develop/examples/resnet50_thrift/main_trt.py)

:::tip
Since TensorRT is not thread-safe, when using this method for model acceleration you must take a lock (`with self.lock:`) around engine calls during service deployment.
:::
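A minimal sketch of that locking pattern (hypothetical handler and names; the real code in main_trt.py may differ):

```py
import threading

import torch

class ClassifierHandler:
    """Hypothetical Thrift handler guarding a TensorRT engine with a lock."""

    def __init__(self, engine):
        self.engine = engine          # e.g. the torch2trt model built above
        self.lock = threading.Lock()  # TensorRT contexts are not thread-safe

    def infer(self, batch: torch.Tensor) -> torch.Tensor:
        # Serialize access: only one thread may run the engine at a time.
        with self.lock:
            return self.engine(batch.cuda())
```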
@@ -111,7 +103,7 @@ From the above process, it's clear that when accelerating a single model, the fo

![](images/quick_start_new_user/torchpipe_en.png)

-We've made adjustments to the deployment of our service using TorchPipe. The overall online service deployment can be found at [main_torchpipe.py](https://g.hz.netease.com/deploy/torchpipe/-/blob/develop/examples/resnet50/main_torchpipe.py).
+We've made adjustments to the deployment of our service using TorchPipe. The overall online service deployment can be found at [main_torchpipe.py](https://g.hz.netease.com/deploy/torchpipe/-/blob/develop/examples/resnet50_thrift/main_torchpipe.py).
The core function modifications are as follows:

@@ -227,7 +219,7 @@ std="58.395, 57.120, 57.375" # 255*"0.229, 0.224, 0.225"
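The modified function body is collapsed in this view. As a hedged sketch of the dict-in/dict-out calling pattern torchpipe uses (the config path and function name here are illustrative assumptions, not the exact contents of main_torchpipe.py):

```py
import torchpipe

# Build a multi-instance pipeline from a TOML config describing the
# decode/resize/normalize/TensorRT nodes (illustrative path).
pipe = torchpipe.pipe("resnet50.toml")

def classify(jpg_bytes: bytes):
    # torchpipe consumes and produces a dict; requests arriving from
    # concurrent threads are batched across instances automatically.
    task = {torchpipe.TASK_DATA_KEY: jpg_bytes}
    pipe(task)
    return task[torchpipe.TASK_RESULT_KEY]
```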
`python client_qps.py --img_dir /your/testimg/path/ --port 8888 --request_client 20 --request_batch 1`

-The specific test code can be found at [client_qps.py](https://g.hz.netease.com/deploy/torchpipe/-/blob/develop/examples/resnet50/client_qps.py)
+The specific test code can be found at [client_qps.py](https://g.hz.netease.com/deploy/torchpipe/-/blob/develop/examples/resnet50_thrift/client_qps.py)
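A minimal sketch of what such a QPS measurement does (plain thread pool; `send_request` is a hypothetical stand-in for the Thrift call made in client_qps.py):

```py
import time
from concurrent.futures import ThreadPoolExecutor

def run_benchmark(send_request, images, clients=20):
    """Fire all requests from `clients` concurrent threads and report QPS."""
    start = time.time()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        list(pool.map(send_request, images))  # send_request(img) -> label
    elapsed = time.time() - start
    print(f"{len(images)} requests in {elapsed:.2f}s = {len(images)/elapsed:.1f} QPS")
```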

With the same Thrift service interface, testing on a machine with an NVIDIA 3080 GPU, a 36-core CPU, and a concurrency of 10, we have the following results:

@@ -7,7 +7,8 @@ type: explainer

# torchpipe Quick Start (30-min trial)

-torchpipe is a multi-instance pipeline parallel library built for industry that operates independently between lower-level acceleration libraries (such as tensorrt, opencv, torchscript) and RPC frameworks (such as thrift, gRPC), helping users save hardware resources at the deployment stage and bring products to production. This tutorial is aimed at beginner users: those at an introductory stage with acceleration-related theory who have basic Python skills and can read simple code. It mainly covers how to use torchpipe to accelerate service deployment, along with comparisons of performance and effect.
+torchpipe is a multi-instance pipeline parallel library built for industry that operates independently between lower-level acceleration libraries (such as tensorrt, opencv, torchscript) and RPC frameworks (such as thrift, gRPC), helping users save hardware resources at the deployment stage and bring products to production. This tutorial is aimed at beginner users: those at an introductory stage with acceleration-related theory who have basic Python skills and can read simple code. It mainly covers how to use torchpipe to accelerate service deployment, along with comparisons of performance and effect. The complete code for this document can be found at [resnet50_thrift](https://g.hz.netease.com/deploy/torchpipe/-/blob/develop/examples/resnet50_thrift/)



## Catalogue
@@ -29,6 +30,7 @@

We give brief explanations of some concepts involved in model deployment, which we hope will help you on a first pass through torchpipe; see [Preliminaries](./preliminaries) for details.


<a name='2'></a>

## 2. Environment installation and configuration
@@ -75,26 +77,18 @@ img = precls_trans(cv2.resize(cv2.cvtColor(img, cv2.COLOR_BGR2RGB), (224,224)))
3. TensorRT model acceleration

```py
-def load_classifier(net, max_batch_size, fp16):
-    x = torch.ones((1, 3, 224, 224))
-    if device == 'gpu':
-        x = x.cuda()
-        net.cuda()
-    net.eval()
-    trtmodel = torch2trt(net,
-                         [x],
-                         fp16_mode=fp16,
-                         max_batch_size=max_batch_size,
-                         max_workspace_size=32 * max_batch_size)
-    del x
-    del net
-    return trtmodel
+input_shape = torch.ones((1, 3, 224, 224)).cuda()
+self.classification_engine = torch2trt(resnet50, [input_shape],
+                                       fp16_mode=self.fp16,
+                                       max_batch_size=self.cls_trt_max_batchsize,
+                                       )

```



-The overall online service deployment code is available at [main_trt.py](https://g.hz.netease.com/deploy/torchpipe/-/blob/develop/examples/resnet50/main_trt.py)
+The overall online service deployment code is available at [main_trt.py](https://g.hz.netease.com/deploy/torchpipe/-/blob/develop/examples/resnet50_thrift/main_trt.py)

:::tip
Because TensorRT is not thread-safe, you need to take a lock (`with self.lock:`) during service deployment when using this method for model acceleration.
:::
@@ -113,7 +107,7 @@ def load_classifier(net, max_batch_size,fp16):

![](images/quick_start_new_user/torchpipe.png)

-We adjusted this service deployment with torchpipe; the overall online service deployment code is available at [main_torchpipe.py](https://g.hz.netease.com/deploy/torchpipe/-/blob/develop/examples/resnet50/main_torchpipe.py), and the core function changes are as follows:
+We adjusted this service deployment with torchpipe; the overall online service deployment code is available at [main_torchpipe.py](https://g.hz.netease.com/deploy/torchpipe/-/blob/develop/examples/resnet50_thrift/main_torchpipe.py), and the core function changes are as follows:

```py
# ------- main -------
```
@@ -216,7 +210,7 @@ std="58.395, 57.120, 57.375" # 255*"0.229, 0.224, 0.225"
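As an aside, the std values in the config above are just the usual ImageNet constants scaled into 0-255 pixel space, as the `255*` comment indicates; a quick check:

```py
# ImageNet std in [0,1] space, as used by torchvision's Normalize
std_01 = [0.229, 0.224, 0.225]
# Scaled into [0,255] space for operators that consume raw uint8 pixels
std_255 = [round(255 * s, 3) for s in std_01]
print(std_255)  # [58.395, 57.12, 57.375]
```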
## 4. Performance and effect comparison
`python test_tools.py --img_dir /your/testimg/path/ --port 8095 --request_client 10 --request_batch 1`
-The specific test code is available at [client_qps.py](https://g.hz.netease.com/deploy/torchpipe/-/blob/develop/examples/resnet50/client_qps.py)
+The specific test code is available at [client_qps.py](https://g.hz.netease.com/deploy/torchpipe/-/blob/develop/examples/resnet50_thrift/client_qps.py)

Using the same thrift service interface, tested on a machine with a 3080 GPU and a 36-core CPU at a concurrency of 10:

