diff --git a/docs/sphinx/vlm/vlm_offline_inference_en.rst b/docs/sphinx/vlm/vlm_offline_inference_en.rst
index 20d37a6a..10ecdb4c 100644
--- a/docs/sphinx/vlm/vlm_offline_inference_en.rst
+++ b/docs/sphinx/vlm/vlm_offline_inference_en.rst
@@ -107,15 +107,19 @@ Launching with CLI
 
 You can also opt to install dashinfer-vlm locally and use command line to launch server.
 
 1. Pull dashinfer docker image (see :ref:`docker-label`)
-2. Download and extract the TensorRT GA build
+2. Install the TensorRT Python package and download the TensorRT GA build from the NVIDIA Developer Zone.
+
+Example: TensorRT 10.6.0.26 for CUDA 12.6, Linux x86_64
 
 .. code-block:: bash
 
+   pip install tensorrt
    wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.6.0/tars/TensorRT-10.6.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz
    tar -xvzf TensorRT-10.6.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz
-   export TRT_LIBPATH=`pwd`/TensorRT-10.6.0.26
+   export LD_LIBRARY_PATH=`pwd`/TensorRT-10.6.0.26/lib
 
-3. Install ``dashinfer-vlm``: ``pip install dashinfer-vlm``.
+3. Install the dashinfer Python package from the `release <https://github.com/modelscope/dash-infer/releases>`_ page
+4. Install dashinfer-vlm: ``pip install dashinfer-vlm``.
 
 Now you can launch server with command line:

diff --git a/multimodal/Dockerfile b/multimodal/Dockerfile
index a128ffd4..81b9bbfb 100644
--- a/multimodal/Dockerfile
+++ b/multimodal/Dockerfile
@@ -6,6 +6,7 @@ RUN mkdir /root/code/
 COPY ./dashinfer_vlm /root/code/dashinfer_vlm
 COPY ./setup.py code/
 COPY ./requirements.txt /root/code/requirements.txt
+RUN python3 -m pip install https://github.com/modelscope/dash-infer/releases/download/v2.0.0-rc2/dashinfer-2.0.0rc2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
 RUN python3 -m pip install -r /root/code/requirements.txt --index-url=http://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
 RUN python3 -m pip install -e /root/code/
 
diff --git a/multimodal/requirements.txt b/multimodal/requirements.txt
index e22accce..135cf9d9 100644
--- a/multimodal/requirements.txt
+++ b/multimodal/requirements.txt
@@ -1,4 +1,3 @@
-dashinfer
 av
 numpy==1.24.3
 requests==2.32.3
@@ -12,7 +11,8 @@ shortuuid
 fastapi
 pydantic_settings
 uvicorn
-cmake==3.22.6
+cmake==3.22.6
 modelscope
 aiohttp
 onnx
+torchvision

diff --git a/multimodal/resource/dashinfer-vlm-arch.png b/multimodal/resource/dashinfer-vlm-arch.png
index a2f79f5a..82ee80d5 100644
Binary files a/multimodal/resource/dashinfer-vlm-arch.png and b/multimodal/resource/dashinfer-vlm-arch.png differ
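
The documentation hunk above splits the setup into separate TensorRT and dashinfer steps. Put together, the full local (non-Docker) install flow would look like the sketch below, a minimal outline assuming a Linux x86_64 host with CUDA 12.6 and Python 3.10 (the wheel is the v2.0.0-rc2 build referenced in the Dockerfile change); the final sanity check is an assumption, not part of the docs:

.. code-block:: bash

   # Step 2: install the TensorRT Python package, unpack the GA build,
   # and expose its shared libraries via the loader path.
   pip install tensorrt
   wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.6.0/tars/TensorRT-10.6.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz
   tar -xvzf TensorRT-10.6.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz
   export LD_LIBRARY_PATH=`pwd`/TensorRT-10.6.0.26/lib

   # Step 3: install the dashinfer runtime wheel from the GitHub release
   # (the same wheel the updated Dockerfile installs).
   pip install https://github.com/modelscope/dash-infer/releases/download/v2.0.0-rc2/dashinfer-2.0.0rc2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

   # Step 4: install dashinfer-vlm on top.
   pip install dashinfer-vlm

   # Sanity check (not in the docs): confirm the TensorRT bindings import.
   python3 -c "import tensorrt; print(tensorrt.__version__)"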
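
On the Docker side, the new RUN line installs the pinned dashinfer wheel before the requirements step, which is presumably also why the unpinned ``dashinfer`` entry drops out of requirements.txt. A hedged usage sketch: the tag name is an arbitrary example, and the build context is assumed to be the ``multimodal/`` directory, since the Dockerfile's COPY paths are relative to it.

.. code-block:: bash

   # Build the image with multimodal/ as the context, so that
   # ./dashinfer_vlm, ./setup.py, and ./requirements.txt resolve.
   docker build -t dashinfer-vlm:local multimodal/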