Server/Client architecture #43

Open
WikiLucas00 opened this issue Sep 9, 2024 · 3 comments

@WikiLucas00

Hello @dusty-nv!
I think the NanoLLM project could benefit from offering a server-client architecture, a little like what ollama does with ollama serve and the ollama client API.
That way, the NanoLLM client could easily be run outside of the Docker container, and even on another device. I know you already worked on a web GUI, but that's not exactly the same use case IMHO.

@dusty-nv
Owner

Hi @WikiLucas00, I have been meaning to document the websockets API that the dynamic agent uses; it is a general protocol, akin to lightweight RPC, that passes dictionaries around. If you are doing text LLM, you could just spin up an OpenAI-compliant endpoint with MLC. If you are using multimodal VLM or speech, you can probably make a REST API with Flask or FastAPI tailored to your application more quickly. A ROS connector providing distributed messaging is also on my todo list, as is splitting the plugins and Agent Studio into their own repo 🤷‍♂️ 🤣
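
For reference, querying an OpenAI-compliant endpoint from Python might look like the sketch below. The base URL, port, and model name are placeholders, not anything specified in this thread; adjust them to whatever your MLC server actually serves.

```python
from openai import OpenAI

# Hypothetical local endpoint; point base_url at wherever your server listens.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",  # local servers typically ignore the key
)

# Stream tokens back as they are generated.
stream = client.chat.completions.create(
    model="Llama-2-7b-chat-hf",  # placeholder model name
    messages=[{"role": "user", "content": "Hello from the client!"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```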

@WikiLucas00
Author

Thanks for your reply @dusty-nv! It would be just for text LLM (streamed and async).
Are you talking about the MLC package on jetson-containers, or about something else still within the NanoLLM project?
NanoLLM works great in the terminal on my Jetson, but I didn't find resources for a straightforward way to serve a model in a container and send requests from a client via HTTP.
Could you please share more precise example steps to follow? It would help a lot! 🙏

@dusty-nv
Owner

@WikiLucas00 My plan is that you will just be able to spin up whatever agents/plugins you want in Agent Studio, and it will expose the webserver for interfacing with them remotely (for distributed architectures in addition to client UIs). Technically that is already done; the websocket/RPC protocol just isn't documented, and I should make lightweight Python/JavaScript client libraries for it.

For now, what I would recommend is to run the nano_llm container in dev mode (where you mount an external copy of the NanoLLM sources into the container so you can easily edit them from your host), or from your own script, and then just add a lightweight Flask or FastAPI REST server on top of that.
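
To make that concrete, a minimal Flask wrapper around NanoLLM could look something like the sketch below. The NanoLLM calls (`from_pretrained`, `generate`) and the model/quantization names are based on the project's documented usage as I recall it and should be checked against the current NanoLLM API; the route and port are arbitrary choices.

```python
from flask import Flask, request, jsonify
from nano_llm import NanoLLM

app = Flask(__name__)

# Load the model once at startup (model name and quantization are placeholders).
model = NanoLLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    api="mlc",
    quantization="q4f16_ft",
)

@app.route("/generate", methods=["POST"])
def generate():
    prompt = request.json.get("prompt", "")
    # Blocking generation for simplicity; streaming tokens back to the client
    # would need server-sent events or websockets on top of this.
    reply = model.generate(prompt, max_new_tokens=256, streaming=False)
    return jsonify({"reply": reply})

if __name__ == "__main__":
    # Bind to 0.0.0.0 so the server is reachable from outside the container.
    app.run(host="0.0.0.0", port=5000)
```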

If you are unfamiliar with the likes of Flask, JSON, websockets, etc., I would highly recommend investing an afternoon going through a couple of tutorials and learning them, as it is very common that some new model comes out and you want to spin up a quick server or process for it. It's easy, like this: https://pythonbasics.org/flask-rest-api/
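
And the client side of that sketch, runnable from the host or another device, could be as simple as the following. `<jetson-ip>` is a placeholder for the Jetson's address, and the container would need to publish port 5000 (e.g. by adding `-p 5000:5000` to its docker run command).

```python
import requests

# Send a prompt to the hypothetical /generate route from the sketch above.
resp = requests.post(
    "http://<jetson-ip>:5000/generate",
    json={"prompt": "Tell me about the Jetson Orin."},
    timeout=120,
)
print(resp.json()["reply"])
```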
