-
Hi, this is not a diffusers issue though; what you're asking is a question at a level above diffusers. You can instantiate a pipeline, keep it in memory (GPU or CPU), and just run it whenever there's a user request. What you need is to learn how to create an API that keeps a service running without terminating on each run. There are multiple libraries and multiple ways to do this, but as I stated before, this has nothing to do with diffusers. A very basic way of doing it is to load the pipeline once and loop over user input:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the pipeline a single time and keep it on the GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    model_name_or_path,
    torch_dtype=torch.float16,
).to("cuda")

while True:
    prompt = input("Enter a prompt: ")
    image = pipe(prompt=prompt).images[0]
    # save the image
```

Now if you replace the user-input part with something like an incoming request to your service, you have the basic idea. There are multiple other issues that come with this, like concurrent user calls, model management, etc. But again, diffusers is the library you use as the foundation to generate images; you'll need to build a complete web API solution on top of it.
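Purely as an illustration, here is a minimal sketch of that idea using FastAPI. The framework choice, the `/generate` endpoint, the example model id, and the lock around the GPU call are all assumptions on my part, not the only way to do it:

```python
import io
import threading

import torch
from diffusers import StableDiffusionPipeline
from fastapi import FastAPI, Response

app = FastAPI()

# Load the pipeline once when the server process starts; it stays in GPU memory
# and is reused by every incoming request.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example model id, replace with your own
    torch_dtype=torch.float16,
).to("cuda")

# Serialize access to the GPU so concurrent requests don't run on the same
# pipeline at the same time.
gpu_lock = threading.Lock()

@app.get("/generate")
def generate(prompt: str):
    with gpu_lock:
        image = pipe(prompt=prompt).images[0]
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    return Response(content=buf.getvalue(), media_type="image/png")
```

You'd run it with something like `uvicorn app:app`; the expensive `from_pretrained` call happens once at startup, and each request only pays the generation cost.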
-
I'm looking all around the internet for this:

I'm deploying an application that, for every request from the users, runs the Python script, loads the diffusers library, instantiates a pipeline (`pipeline = DiffusionPipeline.from_pretrained(...)`), and then generates an image to return to the user. However, my inference times are already fast (calling `pipeline(prompt="example_prompt", ...)` only takes about 0.3 seconds), yet every user request takes more than 40 seconds, because on every request from a different user `pipeline = DiffusionPipeline.from_pretrained(...)` needs to be loaded over and over again, making the whole inference process very slow.

So my question is this: is it possible to instantiate (or cache) a single `pipeline = DiffusionPipeline.from_pretrained(...)` so it can be shared and reused across different HTTP requests, and not have to be run for each request to my server? The closest thing that I've found is Mystic (https://docs.mystic.ai/docs/deploy-a-stable-diffusion-pipeline), which allows you to load the model into the cache of the GPU, but it's loading the whole model, and I only want to instantiate the pipeline.
Regards,