From e58ca7794ae8fe19aa5bae86b9e96c2f9b058f06 Mon Sep 17 00:00:00 2001
From: TylunasLi <pwstudio@163.com>
Date: Fri, 19 Jul 2024 11:43:49 +0800
Subject: [PATCH 1/3] Update the supported-model list and consolidate the model conversion docs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 README.md               |   7 +-
 docs/convert_model.md   |  96 ---------------
 docs/fastllm_pytools.md |   0
 docs/llama_cookbook.md  |   4 +-
 docs/models.md          | 262 ++++++++++++++++++++++++++++++++++++++--
 5 files changed, 261 insertions(+), 108 deletions(-)
 delete mode 100644 docs/convert_model.md
 delete mode 100644 docs/fastllm_pytools.md

diff --git a/README.md b/README.md
index e14c2151..72464589 100644
--- a/README.md
+++ b/README.md
@@ -8,17 +8,18 @@ fastllm是纯c++实现，无第三方依赖的多平台高性能大模型推理
 
 部署交流QQ群： 831641348
 
-| [快速开始](#快速开始) | [模型获取](#模型获取) |
+| [快速开始](#快速开始) | [模型获取](docs/models.md) |
 
 ## 功能概述
 
 - 🚀 纯c++实现，便于跨平台移植，可以在安卓上直接编译
+- 🚀 无论ARM平台，X86平台，NVIDIA平台，速度都较快
 - 🚀 支持读取Hugging face原始模型并直接量化
 - 🚀 支持部署Openai api server
 - 🚀 支持多卡部署，支持GPU + CPU混合部署
 - 🚀 支持动态Batch，流式输出
 - 🚀 前后端分离设计，便于支持新的计算设备
-- 🚀 目前支持ChatGLM系列模型，Qwen2系列模型，各种LLAMA模型(ALPACA, VICUNA等)，BAICHUAN模型，MOSS模型，MINICPM模型等
+- 🚀 目前支持ChatGLM系列模型，Qwen系列模型，各种LLAMA模型(ALPACA, VICUNA等)，BAICHUAN模型，MOSS模型，MINICPM模型等
 
 ## 快速开始
 
@@ -66,7 +67,7 @@ python3 -m ftllm.webui -t 16 -p ~/Qwen2-7B-Instruct/ --port 8080
 
 目前模型的支持情况见: [模型列表](docs/models.md)
 
-有一些架构暂时无法直接读取Hugging face模型，可以参考 [模型转换文档](docs/convert_model.md) 转换fastllm格式的模型
+一些早期的HuggingFace模型无法直接读取，可以参考 [模型转换](docs/models.md#模型导出convert-offline) 转换fastllm格式的模型
 
 ### 运行demo程序 (c++)
 
diff --git a/docs/convert_model.md b/docs/convert_model.md
deleted file mode 100644
index 041d12c6..00000000
--- a/docs/convert_model.md
+++ /dev/null
@@ -1,96 +0,0 @@
-## 模型获取
-
-### 模型库
-
-可以在以下链接中下载已经转换好的模型
-
-[huggingface](https://huggingface.co/huangyuyang) 
-
-### 模型导出
-
-#### ChatGLM模型导出 (默认脚本导出ChatGLM2-6b模型)
-
-``` sh
-# 需要先安装ChatGLM-6B环境
-# 如果使用自己finetune的模型需要修改chatglm_export.py文件中创建tokenizer, model的代码
-cd build
-python3 tools/chatglm_export.py chatglm2-6b-fp16.flm float16 #导出float16模型
-python3 tools/chatglm_export.py chatglm2-6b-int8.flm int8 #导出int8模型
-python3 tools/chatglm_export.py chatglm2-6b-int4.flm int4 #导出int4模型
-```
-
-#### baichuan模型导出 (默认脚本导出baichuan-13b-chat模型)
-
-``` sh
-# 需要先安装baichuan环境
-# 如果使用自己finetune的模型需要修改baichuan2flm.py文件中创建tokenizer, model的代码
-# 根据所需的精度，导出相应的模型
-cd build
-python3 tools/baichuan2flm.py baichuan-13b-fp16.flm float16 #导出float16模型
-python3 tools/baichuan2flm.py baichuan-13b-int8.flm int8 #导出int8模型
-python3 tools/baichuan2flm.py baichuan-13b-int4.flm int4 #导出int4模型
-```
-
-#### baichuan2模型导出 (默认脚本导出baichuan2-7b-chat模型)
-
-``` sh
-# 需要先安装baichuan2环境
-# 如果使用自己finetune的模型需要修改baichuan2_2flm.py文件中创建tokenizer, model的代码
-# 根据所需的精度，导出相应的模型
-cd build
-python3 tools/baichuan2_2flm.py baichuan2-7b-fp16.flm float16 #导出float16模型
-python3 tools/baichuan2_2flm.py baichuan2-7b-int8.flm int8 #导出int8模型
-python3 tools/baichuan2_2flm.py baichuan2-7b-int4.flm int4 #导出int4模型
-```
-
-#### MOSS模型导出
-
-``` sh
-# 需要先安装MOSS环境
-# 如果使用自己finetune的模型需要修改moss_export.py文件中创建tokenizer, model的代码
-# 根据所需的精度，导出相应的模型
-cd build
-python3 tools/moss_export.py moss-fp16.flm float16 #导出float16模型
-python3 tools/moss_export.py moss-int8.flm int8 #导出int8模型
-python3 tools/moss_export.py moss-int4.flm int4 #导出int4模型
-```
-
-#### LLAMA系列模型导出
-``` sh
-# 修改build/tools/alpaca2flm.py程序进行导出
-# 不同llama模型使用的指令相差很大，需要参照torch2flm.py中的参数进行配置
-```
-一些模型的转换可以[参考这里的例子](docs/llama_cookbook.md)
-
-#### QWEN模型导出
-* **Qwen**
-```sh
-# 需要先安装QWen环境
-# 如果使用自己finetune的模型需要修改qwen2flm.py文件中创建tokenizer, model的代码
-# 根据所需的精度，导出相应的模型
-cd build
-python3 tools/qwen2flm.py qwen-7b-fp16.flm float16 #导出float16模型
-python3 tools/qwen2flm.py qwen-7b-int8.flm int8 #导出int8模型
-python3 tools/qwen2flm.py qwen-7b-int4.flm int4 #导出int4模型
-```
-
-* **Qwen1.5**
-
-```sh
-# 需要先安装QWen2环境（transformers >= 4.37.0）
-# 根据所需的精度，导出相应的模型
-cd build
-python3 tools/llamalike2flm.py qwen1.5-7b-fp16.flm float16 "qwen/Qwen1.5-4B-Chat" #导出wen1.5-4B-Chat float16模型
-python3 tools/llamalike2flm.py qwen1.5-7b-int8.flm int8 "qwen/Qwen1.5-7B-Chat" #导出Qwen1.5-7B-Chat int8模型
-python3 tools/llamalike2flm.py qwen1.5-7b-int4.flm int4 "qwen/Qwen1.5-14B-Chat" #导出Qwen1.5-14B-Chat int4模型
-# 最后一个参数可替换为模型路径
-```
-
-#### MINICPM模型导出
-```sh
-# 需要先安装MiniCPM环境（transformers >= 4.36.0） 
-# 默认脚本导出iniCPM-2B-dpo-fp16模型
-cd build 
-python tools/minicpm2flm.py minicpm-2b-float16.flm #导出dpo-float16模型
-./main -p minicpm-2b-float16.flm # 执行模型
-```
\ No newline at end of file
diff --git a/docs/fastllm_pytools.md b/docs/fastllm_pytools.md
deleted file mode 100644
index e69de29b..00000000
diff --git a/docs/llama_cookbook.md b/docs/llama_cookbook.md
index fc9bdcb1..d2d3bda6 100644
--- a/docs/llama_cookbook.md
+++ b/docs/llama_cookbook.md
@@ -238,7 +238,7 @@ XVERSE-13B-Chat V1 版本需要对输入做NFKC规范化，fastllm暂不支持
                      user_role="[|Human|]:", bot_role="\n[|AI|]:", history_sep="\n", dtype=dtype)
 ```
 
-## Yi
+### Yi
 
 * 01-ai/[Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat)
 
@@ -249,6 +249,8 @@ XVERSE-13B-Chat V1 版本需要对输入做NFKC规范化，fastllm暂不支持
                      user_role="<|im_start|>user\n", bot_role="<|im_end|><|im_start|>assistant\n", history_sep="<|im_end|>\n", dtype=dtype)
 ```
 
+* [SUSTech/SUS-Chat-34B](https://huggingface.co/SUSTech/SUS-Chat-34B)
+
 ### WizardCoder
 
   * [WizardCoder-Python-7B-V1.0](https://huggingface.co/WizardLM/WizardCoder-Python-7B-V1.0)
diff --git a/docs/models.md b/docs/models.md
index 7dfab9e5..4945b0d1 100644
--- a/docs/models.md
+++ b/docs/models.md
@@ -1,15 +1,261 @@
-## 支持直接读取的模型
+# 支持的模型 Supported Models
 
-#### THUDM/glm-4-9b-chat
+## 说明
 
-#### meta-llama/Meta-Llama-3-8B-Instruct
+目前Fastllm加载模型有以下几种方式。
 
-#### meta-llama/Meta-Llama-3-70B-Instruct
+* **加载后转换（两行加速模式）** (convert on-the-fly)
+    将原始模型加载为HuggingFace模型，再通过`from_hf()`方法，转换并加速，这种方法内存占用大且速度慢，目前不再推荐。
 
-#### Qwen/Qwen2-0.5B-Instruct
+* **离线转换** (convert offline)
+    将原始模型转换为.flm格式的模型，一些[模型](#flm模型库)已经转换好。
 
-#### Qwen/Qwen2-1.5B-Instruct
+* **直接读取** (load from Huggingface .safetensors)
+    直接读取HuggingFace上发布的模型，仅支持.safetensors格式的模型。
 
-#### Qwen/Qwen2-7B-Instruct
 
-#### Qwen/Qwen2-72B-Instruct
+## 支持模型一览 Model List
+
+
+* ✔ 表示支持该方式，并测试通过；
+    ✔ means supports this mode and passes the test.
+
+* ❌ 表示本应该支持该方式，但实际测试后发现本功能并不受支持，可能在后续版本修复。
+    ❌ means this method is supposed to be supported, but failed after actual testing.
+
+* √ 表示支持，但是还没有测试过
+    √ means supported, but not tested.
+
+### GLM系列
+
+|              模型  | 加载后转换 |  离线转换  |  直接读取  |
+|-----------------: |------------|------------|------------|
+| THUDM/ChatGLM-6b | [✔](#chatglm系列) | [✔](#chatglm模型导出-默认脚本导出chatglm2-6b模型) |  |
+| THUDM/ChatGLM-6b-int8 | [✔](#chatglm系列) | ❌ |  |
+| THUDM/ChatGLM-6b-int4 | [✔](#chatglm系列) | ❌ |  |
+| THUDM/ChatGLM2-6b | [✔](#chatglm系列) | [✔](#chatglm模型导出-默认脚本导出chatglm2-6b模型) |  |
+| THUDM/glm-large-chinese |  | [✔](tools/scripts/glm_export.py) | |
+| THUDM/ChatGLM2-6b-int8 | [✔](#chatglm系列) | ❌ |  |
+| THUDM/ChatGLM2-6b-int4 | [✔](#chatglm系列) | ❌ |  |
+| THUDM/ChatGLM2-6b-32k | [✔](#chatglm系列) | [✔](#chatglm模型导出-默认脚本导出chatglm2-6b模型) |  |
+| THUDM/ChatGLM3-6b | [✔](#chatglm系列) | [✔](#chatglm模型导出-默认脚本导出chatglm2-6b模型) |  |
+| THUDM/ChatGLM3-6b-32k | [✔](#chatglm系列) | [✔](#chatglm模型导出-默认脚本导出chatglm2-6b模型) |  |
+| THUDM/ChatGLM3-6b-128k | ❌ | ❌ |  |
+| THUDM/glm-4-9b-chat | [✔](#chatglm系列) | [✔](#chatglm模型导出-默认脚本导出chatglm2-6b模型) | ✔ |
+| THUDM/codegeex4-all-9b | [✔](#chatglm系列)<sup>2</sup> | [✔](#chatglm模型导出-默认脚本导出chatglm2-6b模型)<sup>2</sup> | ✔ |
+
+> 注2：需要手动设置 pre_prompt
+
+### Qwen系列
+
+|              模型  | 加载后转换 |  离线转换  |  直接读取  |
+|-------------------: |------------|------------|------------|
+| Qwen/Qwen-7B-Chat   | [✔](#其它模型) | [✔](#qwen模型导出) |  |
+| Qwen/Qwen-14B-Chat  | [✔](#其它模型) | [✔](#qwen模型导出) |  |
+| Qwen/Qwen-72B-Chat  | [✔](#其它模型) | [✔](#qwen模型导出) |  |
+| Qwen/Qwen-1_8B-Chat | [✔](#其它模型) | [✔](#qwen模型导出) |  |
+| Qwen/Qwen1.5-0.5B-Chat | [✔](#其它模型) | [✔](#qwen模型导出) | ✔<sup>3</sup> |
+| Qwen/Qwen1.5-1.8B-Chat | [✔](#其它模型) | [✔](#qwen模型导出) | ✔<sup>3</sup> |
+| Qwen/Qwen1.5-4B-Chat   | [✔](#其它模型) | [✔](#qwen模型导出) | ✔<sup>3</sup> |
+| Qwen/Qwen1.5-7B-Chat   | [✔](#其它模型) | [✔](#qwen模型导出) | ✔<sup>3</sup> |
+| Qwen/Qwen1.5-14B-Chat  | [✔](#其它模型) | [✔](#qwen模型导出) | ✔<sup>3</sup> |
+| Qwen/Qwen1.5-72B-Chat  | [✔](#其它模型) | [✔](#qwen模型导出) | ✔<sup>3</sup> |
+| Qwen/Qwen1.5-32B-Chat  | [✔](#其它模型) | [✔](#qwen模型导出) | ✔<sup>3</sup> |
+| Qwen/CodeQwen1.5-7B-Chat | [✔](#其它模型) | [✔](#qwen模型导出) | ✔ |
+| Qwen/Qwen2-0.5B-Instruct | [✔](#其它模型) | [✔](#qwen模型导出) | ✔ |
+| Qwen/Qwen2-1.5B-Instruct | [✔](#其它模型) | [✔](#qwen模型导出) | ✔ |
+| Qwen/Qwen2-7B-Instruct   | [✔](#其它模型) | [✔](#qwen模型导出) | ✔ |
+| Qwen/Qwen2-72B-Instruct  |  | [✔](#qwen模型导出) | ✔ |
+
+> 注3： 需要更新，检查 tokenizer_config.json 是否为最新版本
+
+### DeepSeek系列
+
+|                                       模型  | 加载后转换 |  离线转换  |  直接读取  |
+|-------------------------------------------: |------------|------------|------------|
+| deepseek-ai/Deepseek-Coder-1.3B-Instruct    | [✔](llama_cookbook.md#deepseek-coder) | [✔](llama_cookbook.md#deepseek-coder) | ❌<sup>4</sup> |
+| deepseek-ai/Deepseek-Coder-6.7B-Instruct    | [✔](llama_cookbook.md#deepseek-coder) | [✔](llama_cookbook.md#deepseek-coder) | ❌<sup>4</sup> |
+| deepseek-ai/Deepseek-Coder-7B-Instruct v1.5 | [✔](llama_cookbook.md#deepseek-coder) | [✔](llama_cookbook.md#deepseek-coder) | ❌<sup>4</sup> |
+| deepseek-ai/deepseek-coder-33b-instruct     | [√](llama_cookbook.md#deepseek-coder) | [√](llama_cookbook.md#deepseek-coder) | ❌<sup>4</sup> |
+| deepseek-ai/DeepSeek-V2-Chat                | √ | ✔ | √<sup>4</sup> |
+| deepseek-ai/DeepSeek-V2-Lite-Chat           | √ | ✔ | √<sup>4</sup> |
+| deepseek-ai/DeepSeek-Coder-V2-Instruct      | √ | √ | √<sup>4</sup> |
+| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | √ | √ | √<sup>4</sup> |
+
+### LLaMA类模型
+
+|              模型  | 加载后转换 |  离线转换  |  直接读取  |
+|-----------------: |------------|------------|------------|
+| meta-llama/Llama-2-7b-chat-hf | [✔](llama_cookbook.md#llama2-chat) | [✔](llama_cookbook.md#llama2-chat) |  |
+| meta-llama/Llama-2-13b-chat-hf | [✔](llama_cookbook.md#llama2-chat) | [✔](llama_cookbook.md#llama2-chat) |  |
+| codellama/CodeLlama-7b-Instruct-hf | [✔](llama_cookbook.md#llama2-chat) | [✔](llama_cookbook.md#llama2-chat) |  |
+| codellama/CodeLlama-13b-Instruct-hf | [✔](llama_cookbook.md#llama2-chat) | [✔](llama_cookbook.md#llama2-chat) |  |
+| xverse/XVERSE-13B-Chat | [✔](llama_cookbook.md#xverse) | [✔](llama_cookbook.md#xverse) |  |
+| xverse/XVERSE-7B-Chat | [✔](llama_cookbook.md#xverse) | [✔](llama_cookbook.md#xverse) |  |
+|  |  |  |  |
+| internlm/internlm-chat-7b | [✔](llama_cookbook.md#internlm书生) | [✔](llama_cookbook.md#internlm书生) |  |
+| internlm/internlm-chat-20b | [✔](llama_cookbook.md#internlm书生) | [✔](llama_cookbook.md#internlm书生) |  |
+| internlm/internlm2-chat-1_8b | [✔](llama_cookbook.md#internlm书生) | [✔](llama_cookbook.md#internlm书生) | ❌<sup>4</sup> |
+| internlm/internlm2-chat-7b | [✔](llama_cookbook.md#internlm书生) | [✔](llama_cookbook.md#internlm书生) | ❌<sup>4</sup> |
+| internlm/internlm2-chat-20b | [✔](llama_cookbook.md#internlm书生) | [✔](llama_cookbook.md#internlm书生) | ❌<sup>4</sup> |
+|  |  |  |  |
+| 01-ai/Yi-6B-Chat | [✔](llama_cookbook.md#yi) | [✔](llama_cookbook.md#yi) | ❌<sup>4</sup> |
+| 01-ai/Yi-34B-Chat | [✔](llama_cookbook.md#yi) | [✔](llama_cookbook.md#yi) | ❌<sup>4</sup> |
+| SUSTech/SUS-Chat-34B | [✔](llama_cookbook.md#llama2-chat) | [✔](llama_cookbook.md#llama2-chat) |  |
+|  |  |  |  |
+| meta-llama/Meta-Llama-3-8B-Instruct |  | [✔](tools/scripts/llama3_to_flm.py) | ✔ |
+| meta-llama/Meta-Llama-3-70B-Instruct |  | [✔](tools/scripts/llama3_to_flm.py) | ✔ |
+
+> 注4： Python ftllm用AutoTokenizer而不使用Fastllm Tokenizer可以实现加载，但是C++程序尚不支持加载该模型的Tokenizer。
+
+### 其它模型
+
+|              模型  | 加载后转换 |  离线转换  |  直接读取  |
+|-----------------: |------------|------------|------------|
+| fnlp/moss-moon-003-sft | ✔ | [✔](#moss模型导出) |  |
+| fnlp/moss-moon-003-sft-plugin | ✔ | [✔](#moss模型导出) |  |
+|  |  |  |  |
+| baichuan-inc/baichuan-13b-chat | [✔](#其它模型) | [✔](#baichuan模型导出-默认脚本导出baichuan-13b-chat模型) |  |
+| baichuan-inc/Baichuan2-7B-Chat | [✔](#其它模型) | [✔](#baichuan2模型导出-默认脚本导出baichuan2-7b-chat模型) |  |
+| baichuan-inc/baichuan2-13b-chat | [✔](#其它模型) | [✔](#baichuan2模型导出-默认脚本导出baichuan2-7b-chat模型) |  |
+|  |  |  |  |
+| openbmb/MiniCPM-2B-sft-fp16 | [✔](#其它模型) | [✔](#minicpm模型导出) |  |
+| openbmb/MiniCPM-2B-dpo-fp16 | [✔](#其它模型) | [✔](#minicpm模型导出) |  |
+
+
+### 加载后转换（两行加速模式）(convert on-the-fly)
+
+#### ChatGLM系列
+
+``` python
+# 这是原来的程序，通过huggingface接口创建模型
+from transformers import AutoTokenizer, AutoModel
+tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code = True)
+model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code = True)
+
+# 加入下面这两行，将huggingface模型转换成fastllm模型
+# 目前from_hf接口只能接受原始模型，或者ChatGLM的int4, int8量化模型，暂时不能转换其它量化模型
+from ftllm import llm
+model = llm.from_hf(model, tokenizer, dtype = "float16") # dtype支持 "float16", "int8", "int4"
+model = model.eval()
+```
+
+model支持了ChatGLM的API函数`chat()`, `stream_chat()`，因此ChatGLM的demo程序无需改动其他代码即可运行
+
+#### 其它模型
+
+``` python
+# 通过huggingface接口创建模型，参考每个模型readme.md中的加载方式
+from transformers import AutoTokenizer, AutoModelForCausalLM
+tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code = True)
+model = AutoModelForCausalLM.from_pretrained(model_name_or_path, trust_remote_code = True)
+
+# 加入下面这两行，将huggingface模型转换成fastllm模型
+# 目前from_hf接口只能接受原始模型，或者ChatGLM的int4, int8量化模型，暂时不能转换其它量化模型
+from ftllm import llm
+model = llm.from_hf(model, tokenizer, dtype = "float16") # dtype支持 "float16", "int8", "int4"
+```
+ftllm实现了兼容Transformers的`generate()`方法。
+
+
+转好的模型也可以导出到本地文件，之后可以直接读取，也可以使用fastllm cpp接口读取
+
+``` python
+model.save("model.flm")  # 导出fastllm模型
+new_model = llm.model("model.flm")  # 导入flm模型
+```
+
+### flm模型库
+
+可以在以下链接中找到一部分已经转换好的模型
+
+[huggingface](https://huggingface.co/huangyuyang) [modelscope](https://modelscope.cn/profile/huangyuyang)
+
+### 模型导出(convert offline)
+
+#### ChatGLM模型导出 (默认脚本导出ChatGLM2-6b模型)
+
+``` sh
+# 需要先安装ChatGLM-6B环境
+# 如果使用自己finetune的模型需要修改chatglm_export.py文件中创建tokenizer, model的代码
+cd build
+python3 tools/chatglm_export.py chatglm2-6b-fp16.flm float16 #导出float16模型
+python3 tools/chatglm_export.py chatglm2-6b-int8.flm int8 #导出int8模型
+python3 tools/chatglm_export.py chatglm2-6b-int4.flm int4 #导出int4模型
+```
+
+#### baichuan模型导出 (默认脚本导出baichuan-13b-chat模型)
+
+``` sh
+# 需要先安装baichuan环境
+# 如果使用自己finetune的模型需要修改baichuan2flm.py文件中创建tokenizer, model的代码
+# 根据所需的精度，导出相应的模型
+cd build
+python3 tools/baichuan2flm.py baichuan-13b-fp16.flm float16 #导出float16模型
+python3 tools/baichuan2flm.py baichuan-13b-int8.flm int8 #导出int8模型
+python3 tools/baichuan2flm.py baichuan-13b-int4.flm int4 #导出int4模型
+```
+
+#### baichuan2模型导出 (默认脚本导出baichuan2-7b-chat模型)
+
+``` sh
+# 需要先安装baichuan2环境
+# 如果使用自己finetune的模型需要修改baichuan2_2flm.py文件中创建tokenizer, model的代码
+# 根据所需的精度，导出相应的模型
+cd build
+python3 tools/baichuan2_2flm.py baichuan2-7b-fp16.flm float16 #导出float16模型
+python3 tools/baichuan2_2flm.py baichuan2-7b-int8.flm int8 #导出int8模型
+python3 tools/baichuan2_2flm.py baichuan2-7b-int4.flm int4 #导出int4模型
+```
+
+#### MOSS模型导出
+
+``` sh
+# 需要先安装MOSS环境
+# 如果使用自己finetune的模型需要修改moss_export.py文件中创建tokenizer, model的代码
+# 根据所需的精度，导出相应的模型
+cd build
+python3 tools/moss_export.py moss-fp16.flm float16 #导出float16模型
+python3 tools/moss_export.py moss-int8.flm int8 #导出int8模型
+python3 tools/moss_export.py moss-int4.flm int4 #导出int4模型
+```
+
+#### LLAMA系列模型导出
+``` sh
+# 修改build/tools/alpaca2flm.py程序进行导出
+# 不同llama模型使用的指令相差很大，需要参照torch2flm.py中的参数进行配置
+```
+一些模型的转换可以[参考这里的例子](llama_cookbook.md)
+
+#### QWEN模型导出
+* **Qwen**
+```sh
+# 需要先安装QWen环境
+# 如果使用自己finetune的模型需要修改qwen2flm.py文件中创建tokenizer, model的代码
+# 根据所需的精度，导出相应的模型
+cd build
+python3 tools/qwen2flm.py qwen-7b-fp16.flm float16 #导出float16模型
+python3 tools/qwen2flm.py qwen-7b-int8.flm int8 #导出int8模型
+python3 tools/qwen2flm.py qwen-7b-int4.flm int4 #导出int4模型
+```
+
+* **Qwen1.5**
+
+```sh
+# 需要先安装QWen2环境（transformers >= 4.37.0）
+# 根据所需的精度，导出相应的模型
+cd build
+python3 tools/llamalike2flm.py qwen1.5-4b-fp16.flm float16 "qwen/Qwen1.5-4B-Chat" #导出Qwen1.5-4B-Chat float16模型
+python3 tools/llamalike2flm.py qwen1.5-7b-int8.flm int8 "qwen/Qwen1.5-7B-Chat" #导出Qwen1.5-7B-Chat int8模型
+python3 tools/llamalike2flm.py qwen1.5-14b-int4.flm int4 "qwen/Qwen1.5-14B-Chat" #导出Qwen1.5-14B-Chat int4模型
+# 最后一个参数可替换为模型路径
+```
+
+#### MINICPM模型导出
+```sh
+# 需要先安装MiniCPM环境（transformers >= 4.36.0）
+# 默认脚本导出MiniCPM-2B-dpo-fp16模型
+cd build
+python tools/minicpm2flm.py minicpm-2b-float16.flm #导出dpo-float16模型
+./main -p minicpm-2b-float16.flm # 执行模型
+```
\ No newline at end of file

From 40a962208ec4a0ccf1f9207777a4b9b07d71f0b3 Mon Sep 17 00:00:00 2001
From: TylunasLi <pwstudio@163.com>
Date: Fri, 19 Jul 2024 14:27:28 +0800
Subject: [PATCH 2/3] Update docs to match script changes; adjust FAQ
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs/faq.md    | 20 ++++++--------------
 docs/models.md | 13 +++++++------
 2 files changed, 13 insertions(+), 20 deletions(-)

diff --git a/docs/faq.md b/docs/faq.md
index 40d1e024..fec11fd0 100755
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -27,22 +27,14 @@ cmake .. -DUSE_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=native
 
 **解决办法：**
 
-手动修改 CMakeLists.txt，根据GPU型号手动指定GPU的[Compute Capability](https://developer.nvidia.com/cuda-gpus)。如：
-
-``` diff
---- a/CMakeLists.txt
-+++ b/CMakeLists.txt
-@@ -52,7 +52,7 @@
-     #message(${CMAKE_CUDA_IMPLICIT_LINK_DIRECTORIES})
-     set(FASTLLM_CUDA_SOURCES src/devices/cuda/cudadevice.cpp src/devices/cuda/cudadevicebatch.cpp src/devices/cuda/fastllm-cuda.cu)
-     set(FASTLLM_LINKED_LIBS ${FASTLLM_LINKED_LIBS} cublas)
--    set(CMAKE_CUDA_ARCHITECTURES "native")
-+    set(CMAKE_CUDA_ARCHITECTURES 61 75 86 89)
- endif()
- 
- if (PY_API)
+根据GPU型号手动指定GPU的[Compute Capability](https://developer.nvidia.com/cuda-gpus)。如：
+
+```shell
+cmake .. -DUSE_CUDA=ON -DCUDA_ARCH="61;75;86;89"
 ```
 
+若需要支持多种GPU架构，请使用“;”分隔（如上面例子）。
+
 ### identifier "__hdiv" is undefined
 
 **现象：**
diff --git a/docs/models.md b/docs/models.md
index 4945b0d1..dbecd853 100644
--- a/docs/models.md
+++ b/docs/models.md
@@ -4,26 +4,26 @@
 
 目前Fastllm加载模型有以下几种方式。
 
-* **加载后转换（两行加速模式）** (convert on-the-fly)
+* **加载后转换（两行加速模式）** (convert on-the-fly)  
     将原始模型加载为HuggingFace模型，再通过`from_hf()`方法，转换并加速，这种方法内存占用大且速度慢，目前不再推荐。
 
-* **离线转换** (convert offline)
+* **离线转换** (convert offline)  
     将原始模型转换为.flm格式的模型，一些[模型](#flm模型库)已经转换好。
 
-* **直接读取** (load from Huggingface .safetensors)
+* **直接读取** (load from Huggingface .safetensors)  
     直接读取HuggingFace上发布的模型，仅支持.safetensors格式的模型。
 
 
 ## 支持模型一览 Model List
 
 
-* ✔ 表示支持该方式，并测试通过；
+* ✔ 表示支持该方式，并测试通过；  
     ✔ means supports this mode and passes the test.
 
-* ❌ 表示本应该支持该方式，但实际测试后发现本功能并不受支持，可能在后续版本修复。
+* ❌ 表示本应该支持该方式，但实际测试后发现本功能并不受支持，可能在后续版本修复；  
     ❌ means this method is supposed to be supported, but failed after actual testing.
 
-* √ 表示支持，但是还没有测试过
+* √ 表示支持，但是还没有测试过。  
     √ means supported, but not tested.
 
 ### GLM系列
@@ -61,6 +61,7 @@
 | Qwen/Qwen1.5-14B-Chat  | [✔](#其它模型) | [✔](#qwen模型导出) | ✔<sup>3</sup> |
 | Qwen/Qwen1.5-72B-Chat  | [✔](#其它模型) | [✔](#qwen模型导出) | ✔<sup>3</sup> |
 | Qwen/Qwen1.5-32B-Chat  | [✔](#其它模型) | [✔](#qwen模型导出) | ✔<sup>3</sup> |
+| Qwen/Qwen1.5-110B-Chat | [√](#其它模型) | [√](#qwen模型导出) | √<sup>3</sup> |
 | Qwen/CodeQwen1.5-7B-Chat | [✔](#其它模型) | [✔](#qwen模型导出) | ✔ |
 | Qwen/Qwen2-0.5B-Instruct | [✔](#其它模型) | [✔](#qwen模型导出) | ✔ |
 | Qwen/Qwen2-1.5B-Instruct | [✔](#其它模型) | [✔](#qwen模型导出) | ✔ |

From 0dc630b4f90b03cb2e640e7fd2526f216c00747d Mon Sep 17 00:00:00 2001
From: cgli <cgli@iyunwen.com>
Date: Fri, 19 Jul 2024 14:20:26 +0800
Subject: [PATCH 3/3] Fix compilation under GCC 7.x
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 include/utils/utils.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/utils/utils.h b/include/utils/utils.h
index 0721bbb6..4e258119 100644
--- a/include/utils/utils.h
+++ b/include/utils/utils.h
@@ -14,7 +14,11 @@
 #include <cstdint>
 #include <thread>
 #include <vector>
+#if defined(__GNUC__) && __GNUC__ < 8 && !defined(__clang__)
+#include <experimental/filesystem>
+#else
 #include <filesystem>
+#endif
 
 #if defined(_WIN32) or defined(_WIN64)
 #include <Windows.h>
@@ -32,7 +36,7 @@
 #endif
 #endif
 
-#if defined(_MSC_VER) && _MSC_VER <= 1900 // VS 2015
+#if (defined(_MSC_VER) && _MSC_VER <= 1900) || (defined(__GNUC__) && __GNUC__ < 8 && !defined(__clang__)) // VS 2015 or GCC < 8
     namespace fs = std::experimental::filesystem;
 #else
     namespace fs = std::filesystem;