multimodal: update doc and model path in launcher (#48)
* update multimodal doc and requirement

* update model path

---------

Co-authored-by: Xiaotong Chen <[email protected]>
x574chen and Xiaotong Chen authored Dec 20, 2024
1 parent 97108ec commit 0174d94
Showing 3 changed files with 15 additions and 9 deletions.
18 changes: 11 additions & 7 deletions docs/sphinx/vlm/vlm_offline_inference_en.rst
@@ -97,26 +97,30 @@ You can also use OpenAI's Python client library:
},
],
}],
-stream=False,
+stream=True,
max_completion_tokens=1024,
temperature=0.1,
)
+full_response = ""
+for chunk in response:
+    full_response += chunk.choices[0].delta.content
+    print(".", end="")
+print(f"\nFull Response: \n{full_response}")
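For reference, the chunk-aggregation loop that the updated doc demonstrates with ``stream=True`` can be sketched in isolation. This is a minimal sketch: ``collect_stream`` and ``make_chunk`` are hypothetical helpers, and the mock objects only mimic the shape of the OpenAI client's streamed chunks.

```python
from types import SimpleNamespace

def collect_stream(chunks):
    """Concatenate the delta payloads of streamed chat-completion chunks."""
    full_response = ""
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta is not None:  # a final chunk's delta content can be None
            full_response += delta
    return full_response

def make_chunk(text):
    # Minimal stand-in shaped like a streamed chat-completion chunk.
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

chunks = [make_chunk("Hello"), make_chunk(", world"), make_chunk(None)]
print(collect_stream(chunks))  # Hello, world
```

Guarding against ``None`` deltas avoids a ``TypeError`` on the terminating chunk, which the doc's shorter loop does not handle.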
Launching with CLI
-------------------------
You can also opt to install dashinfer-vlm locally and use the command line to launch the server.

1. Pull dashinfer docker image (see :ref:`docker-label`)
2. Install the TensorRT Python package, and download the TensorRT GA build from the NVIDIA Developer Zone.

Example: TensorRT 10.6.0.26 for CUDA 12.6, Linux x86_64

.. code-block:: bash
pip install tensorrt
- wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.6.0/tars/TensorRT-10.6.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz
- tar -xvzf TensorRT-10.6.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz
- export LD_LIBRARY_PATH=`pwd`/TensorRT-10.6.0.26/lib
+ wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.5.0/tars/TensorRT-10.5.0.18.Linux.x86_64-gnu.cuda-12.6.tar.gz
+ tar -xvzf TensorRT-10.5.0.18.Linux.x86_64-gnu.cuda-12.6.tar.gz
+ export LD_LIBRARY_PATH=`pwd`/TensorRT-10.5.0.18/lib
3. Install dashinfer Python Package from `release <https://github.com/modelscope/dash-infer/releases>`_
4. Install dashinfer-vlm: ``pip install dashinfer-vlm``.
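After step 2, a quick way to confirm the ``export`` took effect is to check that the extracted lib directory is actually on ``LD_LIBRARY_PATH``. A hedged sketch: ``tensorrt_lib_on_path`` is a hypothetical helper, and it assumes the tarball was extracted to ``TensorRT-10.5.0.18/`` under the chosen base directory.

```python
import os

def tensorrt_lib_on_path(version="10.5.0.18", base_dir=None):
    """Return True if the extracted TensorRT lib directory is on LD_LIBRARY_PATH."""
    base_dir = base_dir or os.getcwd()
    lib_dir = os.path.join(base_dir, f"TensorRT-{version}", "lib")
    entries = os.environ.get("LD_LIBRARY_PATH", "").split(":")
    return lib_dir in entries

print(tensorrt_lib_on_path())
```

A ``False`` result usually means the ``export`` was run in a different shell, or the version in the path does not match the extracted directory.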
3 changes: 2 additions & 1 deletion multimodal/dashinfer_vlm/api_server/server.py
@@ -76,7 +76,8 @@ def init():
context.set("chat_format", chat_format)

# -----------------------Convert Model------------------------
-output_dir = "/root/.cache/as_model/" + model.split("/")[-1]
+home_dir = os.environ.get("HOME") or "/root"
+output_dir = os.path.join(home_dir, ".cache/as_model/", model.split("/")[-1])
model_name = "model"
data_type = "bfloat16"

3 changes: 2 additions & 1 deletion multimodal/requirements.txt
@@ -1,3 +1,4 @@
+tensorrt==10.5.0
av
numpy==1.24.3
requests==2.32.3
@@ -6,7 +7,7 @@ transformers>=4.45.0
cachetools>=5.4.0
six
tiktoken
-openai==1.52.2
+openai>=1.56.2
shortuuid
fastapi
pydantic_settings
