[docs] Update docs. #597

Merged: 3 commits, Oct 30, 2024
11 changes: 7 additions & 4 deletions docs/CN/source/getting_started/installation.rst
@@ -67,20 +67,23 @@ Lightllm is an inference framework developed in pure Python, with its operators implemented in Triton
$ git clone https://github.com/ModelTC/lightllm.git
$ cd lightllm
$
- $ # Install lightllm dependencies
- $ pip install -r requirements.txt
+ $ # Install lightllm dependencies (CUDA 11.8)
+ $ pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu118
  $
+ $ # This version of nccl supports torch cuda graph
+ $ pip install nvidia-nccl-cu12==2.20.5
+ $
  $ # Install lightllm
  $ python setup.py install

.. note::

Lightllm has been tested on a variety of GPUs, including the V100, A100, A800, 4090, and H800.
- If you are using GPUs such as the A100 or A800, we recommend installing triton==2.1.0:
+ If you are using GPUs such as the A100 or A800, we recommend installing triton==3.0.0:

.. code-block:: console

- $ pip install triton==2.1.0 --no-deps
+ $ pip install triton==3.0.0 --no-deps

If you are using GPUs such as the H800 or V100, we recommend installing triton-nightly:

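After installing, a quick sanity check of the environment can save debugging time later. The following is a minimal sketch (an illustrative addition, assuming the CUDA 11.8 wheels and one of the Triton pins above) that prints the versions the steps above install:

.. code-block:: python

    # Quick environment check (illustrative sketch; expected values
    # depend on which Triton pin you chose above).
    import torch
    import triton

    print("torch:", torch.__version__)            # e.g. a "+cu118" build
    print("cuda runtime:", torch.version.cuda)    # expect "11.8"
    print("cuda available:", torch.cuda.is_available())
    print("triton:", triton.__version__)          # e.g. "3.0.0" on A100/A800
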
18 changes: 6 additions & 12 deletions docs/CN/source/getting_started/quickstart.rst
@@ -17,7 +17,7 @@
1. Prepare the model file
-------------------------

- The following uses `Qwen2-0.5B <https://huggingface.co/Qwen/Qwen2-0.5B>`_ to demonstrate lightllm's support for large language models.
+ The following uses `Llama-2-7b-chat <https://huggingface.co/meta-llama/Llama-2-7b-chat>`_ to demonstrate lightllm's support for large language models.
For ways to download the model, see: `How to quickly download huggingface models: a summary of all methods <https://zhuanlan.zhihu.com/p/663712983>`_

Sample code for downloading the model:
@@ -38,7 +38,7 @@

.. code-block:: console

- $ huggingface-cli download Qwen/Qwen2-0.5B --local-dir Qwen2-0.5
+ $ huggingface-cli download meta-llama/Llama-2-7b-chat --local-dir Llama-2-7b-chat

.. tip::
The download command above requires unrestricted access to huggingface and can take some time; you can use another download method, or another supported model, instead. For the latest list of supported models, see the `project homepage <https://github.com/ModelTC/lightllm>`_ .
@@ -47,20 +47,14 @@
2. Launch the model service
-------------------------

- After downloading the Qwen2-0.5B model, deploy the API service from the terminal as follows:
+ After downloading the Llama-2-7b-chat model, deploy the API service from the terminal as follows:

.. code-block:: console

- $ python -m lightllm.server.api_server --model_dir ~/models/Qwen2-0.5B \
- $ --host 0.0.0.0 \
- $ --port 8080 \
- $ --tp 1 \
- $ --max_total_token_num 120000 \
- $ --trust_remote_code \
- $ --eos_id 151643
+ $ python -m lightllm.server.api_server --model_dir ~/models/Llama-2-7b-chat

.. note::
- The ``--model_dir`` argument above must be changed to the actual model path on your machine. ``--eos_id 151643`` is specific to Qwen models; remove this argument for other models.
+ The ``--model_dir`` argument above must be changed to the actual model path on your machine.
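
Loading the weights can take a while before the server accepts requests. The snippet below is a small convenience sketch (assuming the server listens on the default port 8000 used in the test step below) that polls until the port opens:

.. code-block:: python

    # Poll until the API server accepts TCP connections (sketch;
    # assumes the default host/port used in the test step below).
    import socket
    import time

    def wait_for_server(host="127.0.0.1", port=8000, timeout=300.0):
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            try:
                with socket.create_connection((host, port), timeout=2):
                    return True   # port open: server is up
            except OSError:
                time.sleep(2)     # still starting; retry
        return False

    print("server ready:", wait_for_server())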


3. (Optional) Test the model service
@@ -70,7 +64,7 @@

.. code-block:: console

- $ curl http://localhost:8080/generate \
+ $ curl http://localhost:8000/generate \
$ -H "Content-Type: application/json" \
$ -d '{
$ "inputs": "What is AI?",
57 changes: 19 additions & 38 deletions docs/CN/source/models/test.rst
@@ -8,20 +8,14 @@ Qwen2-0.5B

.. code-block:: console

- $ python -m lightllm.server.api_server --model_dir ~/models/Qwen2-0.5B \
- $ --host 0.0.0.0 \
- $ --port 8080 \
- $ --tp 1 \
- $ --max_total_token_num 120000 \
- $ --trust_remote_code \
- $ --eos_id 151643
+ $ python -m lightllm.server.api_server --model_dir ~/models/Qwen2-0.5B --trust_remote_code

**Test the service**


.. code-block:: console

- $ curl http://localhost:8080/generate \
+ $ curl http://localhost:8000/generate \
$ -H "Content-Type: application/json" \
$ -d '{
$ "inputs": "What is AI?",
@@ -39,13 +33,10 @@ Qwen-VL-Chat

.. code-block:: console

- $ python -m lightllm.server.api_server --model_dir ~/models/Qwen-VL-Chat \
- $ --host 0.0.0.0 \
- $ --port 8080 \
- $ --tp 1 \
- $ --max_total_token_num 120000 \
- $ --trust_remote_code \
- $ --enable_multimodal
+ $ python -m lightllm.server.api_server \
+ $ --model_dir ~/models/Qwen-VL-Chat \
+ $ --trust_remote_code \
+ $ --enable_multimodal

**Test the service**

@@ -79,7 +70,7 @@ Qwen-VL-Chat
}
}

url = "http://127.0.0.1:8080/generate"
url = "http://127.0.0.1:8000/generate"
headers = {'Content-Type': 'application/json'}
response = requests.post(url, headers=headers, data=json.dumps(data))
return response
@@ -114,11 +105,7 @@ llama2-70b-chat

.. code-block:: console

- $ python -m lightllm.server.api_server --model_dir ~/models/llama2-70b-chat \
- $ --host 0.0.0.0 \
- $ --port 8080 \
- $ --tp 4 \
- $ --max_total_token_num 120000
+ $ python -m lightllm.server.api_server --model_dir ~/models/llama2-70b-chat --tp 4

.. tip::

@@ -128,7 +115,7 @@

.. code-block:: console

- $ curl http://localhost:8080/generate \
+ $ curl http://localhost:8000/generate \
$ -H "Content-Type: application/json" \
$ -d '{
$ "inputs": "What is LLM?",
@@ -146,13 +133,10 @@ internlm2-1_8b

.. code-block:: console

- $ python -m lightllm.server.api_server --model_dir ~/models/internlm2-1_8b \
- $ --host 0.0.0.0 \
- $ --port 8080 \
- $ --tp 1 \
- $ --max_total_token_num 120000 \
- $ --splitfuse_mode \
- $ --trust_remote_code
+ $ python -m lightllm.server.api_server \
+ $ --model_dir ~/models/internlm2-1_8b \
+ $ --splitfuse_mode \
+ $ --trust_remote_code

.. tip::

@@ -163,7 +147,7 @@

.. code-block:: console

- $ curl http://localhost:8080/generate \
+ $ curl http://localhost:8000/generate \
$ -H "Content-Type: application/json" \
$ -d '{
$ "inputs": "What is LLM?",
@@ -181,13 +165,10 @@ internlm2-1_8b-reward

.. code-block:: console

- $ python -m lightllm.server.api_server --model_dir ~/models/internlm2-1_8b-reward \
- $ --host 0.0.0.0 \
- $ --port 8080 \
- $ --tp 1 \
- $ --max_total_token_num 120000 \
- $ --use_reward_model \
- $ --trust_remote_code
+ $ python -m lightllm.server.api_server \
+ $ --model_dir ~/models/internlm2-1_8b-reward \
+ $ --use_reward_model \
+ $ --trust_remote_code

.. tip::

@@ -203,7 +184,7 @@ internlm2-1_8b-reward

query = "<|im_start|>user\nHello! What's your name?<|im_end|>\n<|im_start|>assistant\nMy name is InternLM2! A helpful AI assistant. What can I do for you?<|im_end|>\n<|reward|>"

url = "http://127.0.0.1:8080/get_score"
url = "http://127.0.0.1:8000/get_score"
headers = {'Content-Type': 'application/json'}

data = {