
Commit

merge main
baishihao committed Nov 20, 2024
2 parents 8990f47 + 06afb4a commit 418aa30
Showing 70 changed files with 3,490 additions and 1,320 deletions.
10 changes: 10 additions & 0 deletions docs/CN/README.md
@@ -1,7 +1,17 @@
## Build the docs

```bash
# Install lightllm

# git clone https://github.com/ModelTC/lightllm.git
# cd lightllm
pip install --no-deps .
```

```bash
# Install dependencies.

# cd docs/CN
pip install -r requirements-docs.txt

# Build the docs.
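# The build command itself is truncated in this diff; with a standard
# Sphinx Makefile layout (an assumption, not confirmed here) the step is
# typically:
make html
# The rendered pages then land in docs/CN/_build/html.
```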
5 changes: 1 addition & 4 deletions docs/CN/requirements-docs.txt
@@ -8,8 +8,5 @@ sphinxcontrib.openapi

# packages to install to build the documentation
pydantic
-f https://download.pytorch.org/whl/cpu
torch
py-cpuinfo
transformers
openai # Required by docs/source/serving/openai_compatible_server.md's vllm.entrypoints.openai.cli_args
numpy
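
The `-f https://download.pytorch.org/whl/cpu` line above is pip's `--find-links` option: it lets the `torch` entry resolve to a CPU-only wheel so the docs build stays light. An equivalent standalone command (for illustration):

```bash
# Install a CPU-only torch wheel, matching the requirements file above.
pip install torch --find-links https://download.pytorch.org/whl/cpu
```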
11 changes: 7 additions & 4 deletions docs/CN/source/getting_started/installation.rst
@@ -67,20 +67,23 @@ Lightllm is an inference framework written in pure Python, with its operators implemented in triton
$ git clone https://github.com/ModelTC/lightllm.git
$ cd lightllm
$
$ # Install lightllm's dependencies
$ pip install -r requirements.txt
$ # Install lightllm's dependencies (cuda 11.8)
$ pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu118
$
$ # This version of nccl supports torch cuda graph
$ pip install nvidia-nccl-cu12==2.20.5
$
$ # Install lightllm
$ python setup.py install
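Once the install finishes, a quick import check confirms the package is importable (a minimal sanity check, not part of the original guide):

.. code-block:: console

   $ python -c "import lightllm"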
.. note::

Lightllm has been tested on a variety of GPUs, including V100, A100, A800, 4090, and H800.
If you are using A100, A800, or similar GPUs, we recommend installing triton==2.1.0:
If you are using A100, A800, or similar GPUs, we recommend installing triton==3.0.0:

.. code-block:: console
$ pip install triton==2.1.0 --no-deps
$ pip install triton==3.0.0 --no-deps
If you are using H800, V100, or similar GPUs, we recommend installing triton-nightly:

18 changes: 6 additions & 12 deletions docs/CN/source/getting_started/quickstart.rst
@@ -17,7 +17,7 @@
1. Prepare the model files
-------------------------

The following uses `Qwen2-0.5B <https://huggingface.co/Qwen/Qwen2-0.5B>`_ to demonstrate lightllm's support for large language models.
The following uses `Llama-2-7b-chat <https://huggingface.co/meta-llama/Llama-2-7b-chat>`_ to demonstrate lightllm's support for large language models.
For ways to download the model, see: `如何快速下载huggingface模型——全方法总结 <https://zhuanlan.zhihu.com/p/663712983>`_ (a Chinese guide covering huggingface download methods).

Here is example code for downloading the model:
@@ -38,7 +38,7 @@

.. code-block:: console
$ huggingface-cli download Qwen/Qwen2-0.5B --local-dir Qwen2-0.5
$ huggingface-cli download meta-llama/Llama-2-7b-chat --local-dir Llama-2-7b-chat
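The Python snippet elided above presumably uses ``huggingface_hub``; a common equivalent of the CLI call (an illustrative sketch, not the repo's exact code):

.. code-block:: python

   from huggingface_hub import snapshot_download

   # Mirrors the huggingface-cli call above; the weights land in local_dir.
   snapshot_download("meta-llama/Llama-2-7b-chat", local_dir="Llama-2-7b-chat")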
.. tip::
The download above needs direct access to huggingface (a proxy may be required in some regions) and takes some time; you can use another download method or another supported model instead. For the latest list of supported models, see the `project homepage <https://github.com/ModelTC/lightllm>`_.
@@ -47,20 +47,14 @@
2. Launch the model service
-------------------------

After the Qwen2-0.5B model has been downloaded, deploy the API service from a terminal with the following command:
After the Llama-2-7b-chat model has been downloaded, deploy the API service from a terminal with the following command:

.. code-block:: console
$ python -m lightllm.server.api_server --model_dir ~/models/Qwen2-0.5B \
$ --host 0.0.0.0 \
$ --port 8080 \
$ --tp 1 \
$ --max_total_token_num 120000 \
$ --trust_remote_code \
$ --eos_id 151643
$ python -m lightllm.server.api_server --model_dir ~/models/Llama-2-7b-chat
.. note::
The ``--model_dir`` argument above must be changed to the actual model path on your machine. ``--eos_id 151643`` is specific to Qwen models; remove this argument for other models.
The ``--model_dir`` argument above must be changed to the actual model path on your machine.


3. (Optional) Test the model service
@@ -70,7 +64,7 @@

.. code-block:: console
$ curl http://localhost:8080/generate \
$ curl http://localhost:8000/generate \
$ -H "Content-Type: application/json" \
$ -d '{
$ "inputs": "What is AI?",
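The request body above is cut off by the diff view; a complete call in the same style (the parameter names and values are illustrative, not the original's):

.. code-block:: console

   $ curl http://localhost:8000/generate \
   $      -H "Content-Type: application/json" \
   $      -d '{
   $            "inputs": "What is AI?",
   $            "parameters": {"max_new_tokens": 17}
   $          }'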
57 changes: 19 additions & 38 deletions docs/CN/source/models/test.rst
@@ -8,20 +8,14 @@ Qwen2-0.5B

.. code-block:: console
$ python -m lightllm.server.api_server --model_dir ~/models/Qwen2-0.5B \
$ --host 0.0.0.0 \
$ --port 8080 \
$ --tp 1 \
$ --max_total_token_num 120000 \
$ --trust_remote_code \
$ --eos_id 151643
$ python -m lightllm.server.api_server --model_dir ~/models/Qwen2-0.5B --trust_remote_code
**Test the service**


.. code-block:: console
$ curl http://localhost:8080/generate \
$ curl http://localhost:8000/generate \
$ -H "Content-Type: application/json" \
$ -d '{
$ "inputs": "What is AI?",
@@ -39,13 +33,10 @@ Qwen-VL-Chat

.. code-block:: console
$ python -m lightllm.server.api_server --model_dir ~/models/Qwen-VL-Chat \
$ --host 0.0.0.0 \
$ --port 8080 \
$ --tp 1 \
$ --max_total_token_num 120000 \
$ --trust_remote_code \
$ --enable_multimodal
$ python -m lightllm.server.api_server \
$ --model_dir ~/models/Qwen-VL-Chat \
$ --trust_remote_code \
$ --enable_multimodal
**Test the service**

@@ -79,7 +70,7 @@ Qwen-VL-Chat
}
}
url = "http://127.0.0.1:8080/generate"
url = "http://127.0.0.1:8000/generate"
headers = {'Content-Type': 'application/json'}
response = requests.post(url, headers=headers, data=json.dumps(data))
return response
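The helper above is only partially visible; a self-contained sketch of the same request pattern (the payload fields are illustrative assumptions, since the original ``data`` dict is truncated):

.. code-block:: python

   import json
   import requests

   def query_generate(inputs: str, url: str = "http://127.0.0.1:8000/generate") -> requests.Response:
       # Illustrative payload; the exact multimodal fields are truncated above.
       data = {"inputs": inputs, "parameters": {"max_new_tokens": 64}}
       headers = {"Content-Type": "application/json"}
       return requests.post(url, headers=headers, data=json.dumps(data))

   response = query_generate("What is AI?")
   print(response.text)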
@@ -114,11 +105,7 @@ llama2-70b-chat

.. code-block:: console
$ python -m lightllm.server.api_server --model_dir ~/models/llama2-70b-chat \
$ --host 0.0.0.0 \
$ --port 8080 \
$ --tp 4 \
$ --max_total_token_num 120000
$ python -m lightllm.server.api_server --model_dir ~/models/llama2-70b-chat --tp 4
.. tip::

@@ -128,7 +115,7 @@

.. code-block:: console
$ curl http://localhost:8080/generate \
$ curl http://localhost:8000/generate \
$ -H "Content-Type: application/json" \
$ -d '{
$ "inputs": "What is LLM?",
@@ -146,13 +133,10 @@ internlm2-1_8b

.. code-block:: console
$ python -m lightllm.server.api_server --model_dir ~/models/internlm2-1_8b \
$ --host 0.0.0.0 \
$ --port 8080 \
$ --tp 1 \
$ --max_total_token_num 120000 \
$ --splitfuse_mode \
$ --trust_remote_code
$ python -m lightllm.server.api_server \
$ --model_dir ~/models/internlm2-1_8b \
$ --splitfuse_mode \
$ --trust_remote_code
.. tip::

@@ -163,7 +147,7 @@

.. code-block:: console
$ curl http://localhost:8080/generate \
$ curl http://localhost:8000/generate \
$ -H "Content-Type: application/json" \
$ -d '{
$ "inputs": "What is LLM?",
@@ -181,13 +165,10 @@ internlm2-1_8b-reward

.. code-block:: console
$ python -m lightllm.server.api_server --model_dir ~/models/internlm2-1_8b-reward \
$ --host 0.0.0.0 \
$ --port 8080 \
$ --tp 1 \
$ --max_total_token_num 120000 \
$ --use_reward_model \
$ --trust_remote_code
$ python -m lightllm.server.api_server \
$ --model_dir ~/models/internlm2-1_8b-reward \
$ --use_reward_model \
$ --trust_remote_code
.. tip::

@@ -203,7 +184,7 @@
query = "<|im_start|>user\nHello! What's your name?<|im_end|>\n<|im_start|>assistant\nMy name is InternLM2! A helpful AI assistant. What can I do for you?<|im_end|>\n<|reward|>"
url = "http://127.0.0.1:8080/get_score"
url = "http://127.0.0.1:8000/get_score"
headers = {'Content-Type': 'application/json'}
data = {
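The scoring payload above is truncated; a self-contained sketch of the same call (the ``"chat"`` key is a hypothetical placeholder, since the original field names are cut off):

.. code-block:: python

   import json
   import requests

   query = ("<|im_start|>user\nHello! What's your name?<|im_end|>\n"
            "<|im_start|>assistant\nMy name is InternLM2! A helpful AI assistant. "
            "What can I do for you?<|im_end|>\n<|reward|>")
   url = "http://127.0.0.1:8000/get_score"
   headers = {"Content-Type": "application/json"}
   data = {"chat": query}  # hypothetical key; the original payload is truncated
   response = requests.post(url, headers=headers, data=json.dumps(data))
   print(response.json())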