[Question] A question about config.yaml #16
For CPU, the current mainstream inference frameworks are all fairly cumbersome to run programmatically. Let me look into it.
For commercial use a company will definitely need GPUs, but GPU memory is just too expensive. From my own testing, a model needs to be around 20B parameters to reach GPT-3.5 level, and running several models at once means going multi-card. For the price of a single 4090 you can buy an EPYC 9534 with a large amount of RAM and run a 72B model at roughly the same speed as Alibaba's official service, which solves the question of whether ordinary users can afford to run large models. DeepSeek in particular, which can stand in for GPT-4, should in theory be a great fit for CPU inference thanks to its MoE + MLA design. Keep it up! I'm just getting started and don't know where to begin learning this; I'd love to contribute code, but my skills are limited for now...
For example, if you want a llama_cpp backend, you can just implement one modeled on the other backends.
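For illustration, here is a minimal sketch of what such a CPU backend could look like, assuming the llama-cpp-python package; the `LlamaCppWorker` class name and the `generate_stream` interface are hypothetical stand-ins for whatever base class and methods gpt_server's existing backends (vllm / hf / lmdeploy) actually implement:

```python
# A minimal llama.cpp-based chat backend sketch, assuming llama-cpp-python.
# The worker class and method names are hypothetical; mirror the project's
# real backend interface when wiring this in.
from llama_cpp import Llama


class LlamaCppWorker:
    def __init__(self, model_path: str, n_ctx: int = 4096, n_threads: int = 8):
        # n_gpu_layers=0 keeps inference entirely on the CPU.
        self.llm = Llama(
            model_path=model_path,  # path to a GGUF file
            n_ctx=n_ctx,
            n_threads=n_threads,
            n_gpu_layers=0,
        )

    def generate_stream(self, messages: list[dict], **params):
        # llama-cpp-python exposes an OpenAI-style chat completion API.
        for chunk in self.llm.create_chat_completion(
            messages=messages,
            max_tokens=params.get("max_tokens", 512),
            temperature=params.get("temperature", 0.8),
            stream=True,
        ):
            delta = chunk["choices"][0]["delta"]
            if "content" in delta:
                yield delta["content"]
```

Wiring it in would then be a matter of registering a new `work_mode` value alongside vllm / hf / lmdeploy in the model worker.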
pip install for llama.cpp fails for me here.
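For what it's worth: the current PyPI package is `llama-cpp-python` (not `llamacpp`), and by default pip builds llama.cpp from source, so the install typically fails when CMake or a working C/C++ compiler is missing from the machine.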
I've been following this project for a while and testing models. I'm currently using llama.cpp on a company server, running on CPU at about 10 tokens/s, which is plenty for internal use.
Models used for the knowledge base:
Qwen2.5-14B:q4
bge-reranker-base
Dmeta-embedding-zh-small
Those are the models I'm using; could this setup be switched over to CPU? Many thanks.
Background launch: `nohup sh start.sh > gptserver.log &`
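(If you also want errors captured in the log, a common variant redirects stderr too: `nohup sh start.sh > gptserver.log 2>&1 &`.)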
```yaml
# openai_api_server
serve_args:
  host: 0.0.0.0
  port: 8082
  controller_address: http://localhost:21001
  api_keys: 111,222

# controller
controller_args:
  host: 0.0.0.0
  port: 21001
  dispatch_method: shortest_queue  # lottery / shortest_queue

# model worker
model_worker_args:
  host: 0.0.0.0
  controller_address: http://localhost:21001

models:
  qwen:
    alias: gpt-4o-mini,gpt-3.5  # aliases, e.g. gpt4,gpt3
    enable: true  # false / true
    model_name_or_path: /home/dev/model/qwen/Qwen2.5-14B-Instruct-AWQ/
    model_type: qwen  # qwen / yi / internlm
    work_mode: lmdeploy-turbomind  # vllm / hf / lmdeploy-turbomind / lmdeploy-pytorch
    device: gpu  # gpu / cpu
    workers:
      - 1

  # Embedding models
  bge-reranker-base:
    alias: reranker  # alias
    enable: true  # false / true
    model_name_or_path: /home/dev/model/BAAI/bge-reranker-base/
    model_type: embedding_infinity
    work_mode: hf
    device: gpu  # gpu / cpu
    workers:

  acge_text_embedding:
    alias: text-embedding-ada-002  # alias
    enable: true  # false / true
    model_name_or_path: /home/dev/model/DMetaSoul/Dmeta-embedding-zh-small
    model_type: embedding_infinity
    work_mode: hf
    device: gpu  # gpu / cpu
    workers:
```
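On the CPU question: the config's own comments list `cpu` as a `device` option and `hf` among the work modes, so a CPU variant of the chat-model entry might look like the sketch below. This is only a guess at intent; whether the `hf` work mode actually runs on CPU (and whether an AWQ-quantized checkpoint can, since AWQ kernels generally assume a GPU) is for the maintainer to confirm.

```yaml
models:
  qwen:
    alias: gpt-4o-mini,gpt-3.5
    enable: true
    # Hypothetical non-AWQ path: AWQ kernels usually need a GPU, so a
    # non-quantized (or GGUF) checkpoint is likely safer for CPU inference.
    model_name_or_path: /home/dev/model/qwen/Qwen2.5-14B-Instruct/
    model_type: qwen
    work_mode: hf   # lmdeploy-turbomind targets CUDA, so hf is the CPU candidate here
    device: cpu     # per the config comment: gpu / cpu
    workers:
      - 1
```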