Server/Client architecture #43

Open
WikiLucas00 opened this issue Sep 9, 2024 · 3 comments

@WikiLucas00

Hello @dusty-nv!
I think the NanoLLM project could benefit from offering a server-client architecture, a little like what ollama does with ollama serve and the ollama client API.
That way, the NanoLLM client could easily be run outside of the Docker container, and even on another device. I know you already worked on a web GUI, but that's not exactly the same use case IMHO.

@dusty-nv
Owner

Hi @WikiLucas00, I have been meaning to document the websockets API that the dynamic agent uses; it is a general protocol, akin to lightweight RPC, that passes dictionaries around. If you are doing text LLM, you could just spin up an OpenAI-compliant endpoint with MLC. If you are using multimodal VLM or speech, you can probably make a REST API with Flask or FastAPI tailored to your application more quickly. A ROS connector providing distributed messaging is also on my todo list, as is splitting the plugins and Agent Studio into their own repo 🤷‍♂️ 🤣
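
For reference, querying an OpenAI-compliant endpoint from Python might look like the sketch below. The base URL, port, and model name are placeholders, not anything specified in this thread; adjust them to whatever your MLC server actually serves.

```python
from openai import OpenAI

# Hypothetical local endpoint; point base_url at wherever your server listens.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",  # local servers typically ignore the key
)

# Stream tokens back as they are generated.
stream = client.chat.completions.create(
    model="Llama-2-7b-chat-hf",  # placeholder model name
    messages=[{"role": "user", "content": "Hello from the client!"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```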

@WikiLucas00
Author

Thanks for your reply @dusty-nv! It would be just for text LLM (streamed and async).
Are you talking about the MLC package on jetson-containers, or about something else still within the NanoLLM project?
NanoLLM works great in the terminal on my Jetson, but I didn't find resources for a straightforward way to serve a model in a container and send requests from a client via HTTP.
Could you please share more precise example steps to follow? It would help a lot! 🙏

@dusty-nv
Owner

@WikiLucas00 My plan is that you will just be able to spin up whatever agents/plugins you want in Agent Studio, and it will expose the webserver for interfacing with them remotely (for distributed architectures in addition to client UIs). Technically that is already done; the websocket/RPC protocol just isn't documented, and I should make lightweight Python/JavaScript client libraries for it.

For now, what I would recommend is to run the nano_llm container in dev mode (where you mount an external copy of the NanoLLM sources into the container so you can easily edit them from your host), or from your own script, and then just add a lightweight Flask or FastAPI REST server on top of that.
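
To make that concrete, a minimal Flask wrapper around NanoLLM could look something like the sketch below. The NanoLLM calls (`from_pretrained`, `generate`) and the model/quantization names are based on the project's documented usage as I recall it and should be checked against the current NanoLLM API; the route and port are arbitrary choices.

```python
from flask import Flask, request, jsonify
from nano_llm import NanoLLM

app = Flask(__name__)

# Load the model once at startup (model name and quantization are placeholders).
model = NanoLLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    api="mlc",
    quantization="q4f16_ft",
)

@app.route("/generate", methods=["POST"])
def generate():
    prompt = request.json.get("prompt", "")
    # Blocking generation for simplicity; streaming tokens back to the client
    # would need server-sent events or websockets on top of this.
    reply = model.generate(prompt, max_new_tokens=256, streaming=False)
    return jsonify({"reply": reply})

if __name__ == "__main__":
    # Bind to 0.0.0.0 so the server is reachable from outside the container.
    app.run(host="0.0.0.0", port=5000)
```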

If you are unfamiliar with the likes of Flask, JSON, websockets, etc., I would highly recommend investing an afternoon going through a couple of tutorials and learning them, as it is very common that some new model comes out and you want to spin up a quick server or process for it. It's easy, like this: https://pythonbasics.org/flask-rest-api/
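
And the client side of that sketch, runnable from the host or another device, could be as simple as the following. `<jetson-ip>` is a placeholder for the Jetson's address, and the container would need to publish port 5000 (e.g. by adding `-p 5000:5000` to its docker run command).

```python
import requests

# Send a prompt to the hypothetical /generate route from the sketch above.
resp = requests.post(
    "http://<jetson-ip>:5000/generate",
    json={"prompt": "Tell me about the Jetson Orin."},
    timeout=120,
)
print(resp.json()["reply"])
```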
