Deploying HuggingFace model/pipeline using uvicorn-gunicorn-fastapi-docker on Google Cloud Run #328
-
Hi everybody, I am pretty new to web app development and have doubts about how to get the most out of this incredible Docker image. How should I set the number of workers/threads for Gunicorn/Uvicorn, and the specs of the underlying Cloud Run instance? I noticed that every additional worker and/or thread needs 3.5 GB of RAM. Also, memory leaks during execution, so a worker would need to be restarted every now and then. My naive guess is that I should have as many workers as vCPUs, and at least 3.5 GB of RAM times the number of workers. Is that correct? What about the number of concurrent requests? Right now, my uvicorn command in the Dockerfile looks like this:
Nonetheless, with this setting, after a while the RAM gets so saturated that the service breaks down :( Any help is more than welcome. Thank you in advance. Best
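For reference, the sizing guess in the question can be written out as a small calculation. This is only a sketch of the heuristic as stated (one worker per vCPU, roughly 3.5 GB of RAM per worker); the function name `estimate_sizing` and the `headroom_gb` parameter are illustrative assumptions, not part of any library, and the heuristic itself is the asker's guess rather than a confirmed rule.

```python
import math

# Per-worker memory footprint reported in the question (GB).
GB_PER_WORKER = 3.5


def estimate_sizing(vcpus, gb_per_worker=GB_PER_WORKER, headroom_gb=0.0):
    """Apply the heuristic: one worker per vCPU, RAM = workers * footprint.

    headroom_gb is a hypothetical extra margin for the OS and for leaks
    that accumulate between worker restarts.
    """
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    workers = vcpus
    min_ram_gb = workers * gb_per_worker + headroom_gb
    return {
        "workers": workers,
        "min_ram_gb": min_ram_gb,
        # Round up to a whole number of GB when picking an instance size.
        "ram_gb_rounded": math.ceil(min_ram_gb),
    }


if __name__ == "__main__":
    print(estimate_sizing(vcpus=2))
```

With 2 vCPUs this gives 2 workers and at least 7 GB of RAM, before any headroom for the leak described above.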
Replies: 3 comments
-
If you use If you use To decide the number of workers: N = number of threads + 1. So if you are GPU limited, that's your criterion for deciding the number of workers. What I wrote above is based on what I observed in a few tests. It might well be incorrect.
-
I would also recommend using Gunicorn instead of Uvicorn to run the app.
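One reason Gunicorn may help here is that it can recycle workers after a set number of requests, which is a common way to contain a slow memory leak. Below is a minimal sketch of a Gunicorn config file under some assumptions: the file name `gunicorn.conf.py`, the app path `main:app`, and every numeric value are illustrative and would need tuning against the real model. The setting names themselves (`bind`, `workers`, `worker_class`, `max_requests`, `max_requests_jitter`, `timeout`) are standard Gunicorn settings.

```python
# gunicorn.conf.py (hypothetical file name; all values are illustrative)
import os

# Cloud Run passes the listening port in the PORT environment variable.
bind = "0.0.0.0:" + os.environ.get("PORT", "8080")

# Each worker loads its own copy of the model, so keep this low when
# every worker needs several GB of RAM.
workers = int(os.environ.get("WEB_CONCURRENCY", "1"))

# Run the ASGI app (FastAPI) through Uvicorn's Gunicorn worker class.
worker_class = "uvicorn.workers.UvicornWorker"

# Restart a worker after it has handled this many requests, so leaked
# memory is released periodically.
max_requests = 200

# Random spread added to max_requests so workers do not all restart at once.
max_requests_jitter = 20

# Allow slow model inference before Gunicorn kills an unresponsive worker.
timeout = 120
```

It would be used as `gunicorn -c gunicorn.conf.py main:app`, where `main:app` stands in for the actual module and FastAPI instance.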
-
Now that Uvicorn supports managing workers with `--workers`, including restarting dead ones, there's no need for Gunicorn. That also means that it's much simpler to build a Docker image from scratch now; I updated the docs to explain it. Because of that, I deprecated this Docker image: https://github.com/tiangolo/uvicorn-gunicorn-fastapi-docker#-warning-you-probably-dont-need-this-docker-image
That would also probably make use cases like this simpler to deal with. 🤓
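As a rough illustration of the Uvicorn-only approach, the helper below assembles a `uvicorn` command line with `--workers`. It is a sketch under assumptions: `build_uvicorn_command` and the app path `main:app` are hypothetical names, and the numbers are placeholders. `--host`, `--port`, `--workers`, and `--limit-max-requests` are existing Uvicorn CLI options; whether a worker that exits after reaching `--limit-max-requests` gets replaced automatically depends on the Uvicorn version, so that part should be verified before relying on it for leak control.

```python
import os
from typing import List, Optional


def build_uvicorn_command(app_path: str,
                          workers: int = 1,
                          max_requests: Optional[int] = None,
                          port: Optional[int] = None) -> List[str]:
    """Return the argv list for running an ASGI app with Uvicorn workers."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # Cloud Run passes the listening port in PORT; fall back to 8080.
    resolved_port = str(port) if port is not None else os.environ.get("PORT", "8080")
    cmd = [
        "uvicorn", app_path,
        "--host", "0.0.0.0",
        "--port", resolved_port,
        "--workers", str(workers),
    ]
    if max_requests is not None:
        # Terminate a process after this many requests (assumed here to be
        # a leak mitigation; restart behaviour is version-dependent).
        cmd += ["--limit-max-requests", str(max_requests)]
    return cmd


if __name__ == "__main__":
    # "main:app" is a placeholder for the real module and FastAPI instance.
    print(" ".join(build_uvicorn_command("main:app", workers=2, port=8080)))
```

The resulting list can be used directly as the exec-form `CMD` in a Dockerfile, or passed to `subprocess.run` in an entrypoint script.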