Server/Client architecture #43
Comments
Hi @WikiLucas00, I have been meaning to document the websockets API that the dynamic agent uses. It is a general protocol, akin to lightweight RPC, that works by passing dictionaries around. If you are doing text LLM, then you could just spin up an OpenAI-compliant endpoint with MLC. If you are using a multimodal VLM or speech, then you can probably make a REST API with Flask or FastAPI tailored to your application more quickly. It is also on my todo list to do a ROS connector which provides distributed messaging. Another thing I need to do is split the plugins and Agent Studio into its own repo 🤷‍♂️ 🤣
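For the text-LLM route mentioned above, a client talking to an OpenAI-compliant endpoint could look like the sketch below. This is only an illustration, not the NanoLLM websocket protocol itself; the base URL, port, and model id are placeholders for whatever your MLC (or similar) server actually exposes.

```python
# Sketch: streaming chat completion against an OpenAI-compatible endpoint
# (e.g. one served by MLC). base_url, port, and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="Llama-3-8B-Instruct",          # placeholder model id
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,                           # stream tokens as they arrive
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```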
Thanks for your reply @dusty-nv! It would be just for text LLM (streamed and async).
@WikiLucas00 I have plans that you will just be able to spin up whatever agent/plugins you want in Agent Studio, and it will expose the webserver for interfacing with them remotely (for distributed architectures in addition to client UIs) - which technically is already done, the websocket/RPC protocol just isn't documented, and I should make lightweight Python/JavaScript client libraries for it.

For now, what I would recommend is to run the nano_llm container in dev mode (where you mount an external copy of the NanoLLM sources into the container so you can easily edit them from your host) or from your own script, and then just add a lightweight Flask or FastAPI REST server on top of that.

If you are unfamiliar with the likes of Flask, JSON, websockets, etc., I would highly recommend investing an afternoon going through a couple of tutorials and learning them, as it is very common that some new model comes out and you wanna spin up a quick server or process for it. It's easy like this: https://pythonbasics.org/flask-rest-api/
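Following that suggestion, a minimal Flask wrapper around NanoLLM might look like the sketch below. The `NanoLLM.from_pretrained()` / `generate()` calls follow the project's usual pattern, but treat the exact arguments (model name, quantization, port, and the `/generate` route) as assumptions picked for illustration.

```python
# Sketch: minimal Flask REST server wrapping a NanoLLM model, in the spirit
# of the tutorial linked above. Model name, quantization, route, and port
# are placeholders -- adapt them to your setup.
from flask import Flask, request, jsonify
from nano_llm import NanoLLM

app = Flask(__name__)

# load the model once at startup (placeholder model/quantization)
model = NanoLLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct",
                                api="mlc", quantization="q4f16_ft")

@app.route("/generate", methods=["POST"])
def generate():
    prompt = request.json.get("prompt", "")
    # non-streaming for simplicity; use streaming=True plus a generator
    # response if you want tokens pushed back incrementally
    reply = model.generate(prompt, streaming=False, max_new_tokens=256)
    return jsonify({"reply": reply})

if __name__ == "__main__":
    # bind to 0.0.0.0 so clients outside the container can reach it
    app.run(host="0.0.0.0", port=5000)
```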
Hello @dusty-nv!
I think the NanoLLM project could benefit from offering a server-client architecture, a little like what Ollama does with `ollama serve` and the Ollama client API. This way, the client for NanoLLM could easily be run outside of the Docker container, and even on another device. I know you already worked on a web GUI, but it's not exactly the same use case IMHO.
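To make the desired setup concrete, here is a rough sketch of what such a remote client could look like if the container exposed a REST endpoint like the Flask sketch above; the host address and `/generate` route are purely hypothetical, not an existing NanoLLM API.

```python
# Hypothetical client running outside the container, or on another device,
# assuming the Jetson exposes a REST endpoint like the Flask sketch above.
# The host IP, port, and /generate route are placeholders.
import requests

JETSON = "http://192.168.1.42:5000"   # placeholder address of the Jetson

resp = requests.post(f"{JETSON}/generate",
                     json={"prompt": "What is NanoLLM?"},
                     timeout=120)
print(resp.json()["reply"])
```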