-
In my use case, I start by loading a text2image pipeline for the relevant model architecture, and then I instantiate a task-specific pipeline when needed using `AutoPipelineForXxx`. My question is about "memory management" (that is, using `.to('cuda')`, `.enable_sequential_cpu_offload()`, or `.enable_model_cpu_offload()`): should I call these on the "base" pipeline, the task-specific pipeline, or both?

During one "session", a given base pipeline can be used to instantiate SUCCESSIVELY several task-specific pipelines (successively, as in: I use the same Python variable for all the successive incarnations of the pipeline, so at any given moment only the base pipeline and ONE task-specific pipeline are referenced).

So, what's the best approach? Does it make any difference if I call these methods on one, the other, or both of these pipelines? Thanks in advance.
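For reference, here's a minimal sketch of the workflow I mean (the checkpoint name is just an example):

```python
import torch
from diffusers import (
    AutoPipelineForText2Image,
    AutoPipelineForImage2Image,
    AutoPipelineForInpainting,
)

# Base text2image pipeline; the checkpoint is just an example.
base = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

# Successive task-specific pipelines, all bound to the same variable,
# so at any moment only `base` and ONE task pipeline are referenced.
task = AutoPipelineForImage2Image.from_pipe(base)
# ... use `task` ...
task = AutoPipelineForInpainting.from_pipe(base)
# ... use `task` ...
```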
-
I believe `.enable_model_cpu_offload()` is slightly better to use than `.enable_sequential_cpu_offload()`, and depending on your VRAM it makes sense to offload a memory-intensive model to the CPU while it's not being utilized at that moment. So, for instance, if your base pipeline is idle while your task pipeline is doing stuff, offload the base pipeline until it's needed. Either way, over time you eventually run out of memory; at least that's the case for me when I reuse an existing pipeline for many iterations.
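Something like this is what I mean (just a sketch; note the caveat in the comments, since pipelines created with `from_pipe` share their weights):

```python
import torch

# Manually park the idle base pipeline in system RAM
# (only valid if the two pipelines do NOT share weights; with
# from_pipe the modules are shared, so moving one moves the other).
base.to("cpu")
torch.cuda.empty_cache()  # return the freed blocks to the driver

# ... run the task pipeline here ...

base.to("cuda")  # bring the base pipeline back when needed again
```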
-
AFAIK the model offloading is per pipeline, and the auto pipeline just transfers the different modules, so you'll still need to enable it on every pipeline you use for inference; but if you just re-use the same one, you don't need to.

To clarify the other points that were mentioned: if you run out of RAM while using this, the problem is probably in your code and not in the library. I can use this option many times and never get that error. You probably still have some references to the models, and if that's the case you will also need to do some garbage collection. If someone has a reproducible code example of this, we can test it; I'll be happy to help if it's an issue with diffusers.
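If it's a stale-reference problem, the usual cleanup looks like this (`pipe` is whatever variable still points at the pipeline):

```python
import gc
import torch

# Drop every reference, including any sub-module you pulled out
# (e.g. a variable still pointing at pipe.unet keeps it alive).
del pipe
gc.collect()
torch.cuda.empty_cache()
```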
-
From my experience (with my potato 4 GB GPU), making a new pipeline from the base pipeline does not take any resources; it is just a declaration. VRAM is taken only when you actually start using the second pipeline and its model. Also, diffusers loads models very fast: from an NVMe drive it takes several seconds to load SD 1.5, so there's no real reason to keep pipelines in memory; it's simpler to just discard them and load new ones.
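You can check that the second pipeline is just a view over the same modules (names follow the sketch in the question):

```python
from diffusers import AutoPipelineForImage2Image

img2img = AutoPipelineForImage2Image.from_pipe(base)  # `base` from the question's sketch

# from_pipe reuses the already-loaded components, so the sub-models
# are the very same objects and no extra VRAM is allocated:
assert img2img.unet is base.unet
assert img2img.vae is base.vae
```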