Hi @kkarski
as you noticed, this is an issue with torch and gunicorn rather than with Flair, so I'd suggest you raise it in that issue instead.
It doesn't make sense for Flair to make recommendations on that topic, hence I am closing this issue.
Question
I am trying to operationalize a few Flair models behind a Flask/Gunicorn REST API.
I am consistently hitting the following error when loading a second model instance in Gunicorn:
Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
This happens whether the workers share a global SequenceTagger or I instantiate two SequenceTagger instances for different models within the same worker.
For example, the following code will throw the above error inside Gunicorn:
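(The original snippet is not included above; what follows is a hedged reconstruction of what such a repro typically looks like — a minimal Flask app that loads taggers at module import time. The model names "ner" and "pos" are illustrative.)

```python
from flask import Flask
from flair.data import Sentence
from flair.models import SequenceTagger

app = Flask(__name__)

# Loading at module import time means CUDA is initialized in the Gunicorn
# master process; the forked workers then inherit an already-initialized
# CUDA context, which CUDA does not allow.
ner_tagger = SequenceTagger.load("ner")
pos_tagger = SequenceTagger.load("pos")

@app.route("/ner")
def ner():
    sentence = Sentence("George Washington went to Washington.")
    ner_tagger.predict(sentence)
    return {"spans": [str(span) for span in sentence.get_spans("ner")]}
```

Served with fork-based Gunicorn workers, either the second load or the first prediction inside a worker raises the re-initialization error.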
I suspect this is not directly a Flair issue but a PyTorch one; however, it still poses a problem in my case.
I have tried the first of the following resolution suggestions without success (the second seems very involved):
https://stackoverflow.com/questions/61120314/cannot-launch-gunicorn-flask-app-with-torch-model-on-the-docker
https://stackoverflow.com/questions/72779926/gunicorn-cuda-cannot-re-initialize-cuda-in-forked-subprocess
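Both suggestions boil down to making sure CUDA is not initialized before the fork, or forcing the 'spawn' start method. The underlying mechanism can be shown with a stdlib-only sketch (no torch required): a 'fork' child inherits the parent's in-memory state as-is, while a 'spawn' child re-imports the module and starts fresh.

```python
import multiprocessing as mp

# Stand-in for module-level state such as an initialized CUDA context.
STATE = {"initialized_in": "import-time"}

def report(q):
    q.put(STATE["initialized_in"])

def child_view(method):
    # Start a child with the given start method and return what it sees in STATE.
    ctx = mp.get_context(method)
    q = ctx.Queue()
    p = ctx.Process(target=report, args=(q,))
    p.start()
    result = q.get()
    p.join()
    return result

if __name__ == "__main__":
    # Mutate state after import, the way importing torch and loading a model
    # initializes CUDA in the Gunicorn master before workers are forked.
    STATE["initialized_in"] = "parent-after-import"
    # A 'fork' child inherits the mutated state ...
    print("fork :", child_view("fork"))
    # ... while a 'spawn' child re-imports the module and sees the fresh value.
    print("spawn:", child_view("spawn"))
```

This is exactly why CUDA refuses to run in a forked child: the child inherits a context it did not create.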
Is there a different suggested method of running shared models in production to parallelize processing?
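For reference, one pattern commonly used with Gunicorn and GPU-backed models (a sketch, not an official Flair recommendation; the helper name is illustrative) is to keep module import free of any CUDA work and load each model lazily, so the first touch of CUDA happens inside the worker process, after the fork:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def get_tagger(model_name: str):
    # flair (and therefore torch/CUDA) is imported only on first call, which
    # happens inside a Gunicorn worker, i.e. after the fork from the master.
    from flair.models import SequenceTagger
    return SequenceTagger.load(model_name)
```

In the Flask views, call get_tagger("ner") instead of reading a module-level global: each worker pays the load cost once on its first request, and CUDA is never initialized in the master process, which sidesteps the fork-after-CUDA error.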