-
In my use case, I start by loading a text2image pipeline for the relevant model architecture, and then I instantiate a task-specific pipeline when needed using `AutoPipelineForXxx`. My question is about "memory management" (that is, using `.to('cuda')`, `.enable_sequential_cpu_offload()`, or `.enable_model_cpu_offload()`): should I call these on the "base" pipeline, the task-specific pipeline, or both?

During one "session", a given base pipeline can be used to instantiate SUCCESSIVELY several task-specific pipelines (successively, as in: I use the same Python variable for all the successive incarnations of the pipeline, so at any given moment only the base pipeline and ONE task-specific pipeline are referenced).

So, what's the best approach? Does it make any difference if I call these methods on one, the other, or both of these pipelines? Thanks in advance.
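For reference, here's a minimal sketch of the workflow I mean (the checkpoint name is just an example):

```python
import torch
from diffusers import (
    AutoPipelineForText2Image,
    AutoPipelineForImage2Image,
    AutoPipelineForInpainting,
)

# Base text2image pipeline; the checkpoint is just an example.
base = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

# Successive task-specific pipelines, all bound to the same variable,
# so at any moment only `base` and ONE task pipeline are referenced.
task = AutoPipelineForImage2Image.from_pipe(base)
# ... use `task` ...
task = AutoPipelineForInpainting.from_pipe(base)
# ... use `task` ...
```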
-
I believe `.enable_model_cpu_offload()` is slightly better to use than `.enable_sequential_cpu_offload()`, and depending on your VRAM it makes sense to offload a memory-intensive model to the CPU while it's not being utilized at that moment. So, for instance, if your base pipeline is idle while your task pipeline is doing stuff, offload the base pipeline until it's needed. Either way, over time you eventually run out of memory; at least that's the case for me when I reuse an existing pipeline for many iterations.
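Something like this is what I mean (just a sketch; note the caveat in the comments, since pipelines created with `from_pipe` share their weights):

```python
import torch

# Manually park the idle base pipeline in system RAM
# (only valid if the two pipelines do NOT share weights; with
# from_pipe the modules are shared, so moving one moves the other).
base.to("cpu")
torch.cuda.empty_cache()  # return the freed blocks to the driver

# ... run the task pipeline here ...

base.to("cuda")  # bring the base pipeline back when needed again
```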
-
AFAIK the model offloading is per pipeline, and the auto pipeline just transfers the different modules, so you'll still need to enable it on every pipeline you use for inference; but if you just re-use the same one, you don't need to.

To clarify the other points that were mentioned: if you run out of RAM while using this, the problem is probably in your code and not in the library. I can use this option many times and never get that error. You probably still have some references to the models, and if that's the case you will also need to do some garbage collection. If someone has a reproducible code example of this, we can test it; I'll be happy to help if it's an issue with diffusers.
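If it's a stale-reference problem, the usual cleanup looks like this (`pipe` is whatever variable still points at the pipeline):

```python
import gc
import torch

# Drop every reference, including any sub-module you pulled out
# (e.g. a variable still pointing at pipe.unet keeps it alive).
del pipe
gc.collect()
torch.cuda.empty_cache()
```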
-
From my experience (with my potato 4 GB GPU), making a new pipeline from the base pipeline does not take any resources; it is just a declaration. VRAM is taken only when you actually start using the second pipeline and its model. Also, diffusers loads models very fast: from an NVMe drive it takes several seconds to load SD 1.5, so there's no real reason to keep pipelines in memory; it's simpler to just discard them and load new ones.
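You can check that the second pipeline is just a view over the same modules (names follow the sketch in the question):

```python
from diffusers import AutoPipelineForImage2Image

img2img = AutoPipelineForImage2Image.from_pipe(base)  # `base` from the question's sketch

# from_pipe reuses the already-loaded components, so the sub-models
# are the very same objects and no extra VRAM is allocated:
assert img2img.unet is base.unet
assert img2img.vae is base.vae
```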