
Cannot leverage true parallelism (constant time) in batch inference; time grows linearly with batch size #85

Open
abdullah-al-munem opened this issue Dec 4, 2024 · 2 comments


@abdullah-al-munem

We've tried inference.py with --batch-size 2, 4, and 8. We expected the inference time to be constant, or at least nearly constant, across these batch sizes. Instead, we observed a linear increase in time with batch size, i.e., when we doubled the batch size, the inference time also doubled.

Is this the expected behavior? If so, why? Is it possible to leverage true parallelism (constant time) in batch inference at all using CatVTON? It would be really helpful if you could guide us through a solution to this problem.

[Screenshots: timing logs from the batch-size 2, 4, and 8 runs described above]
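For reference, a minimal timing harness along these lines reproduces the comparison (a sketch only; any setup beyond the `--batch-size` flag is assumed):

```python
import subprocess
import time

# Rough timing harness (a sketch, not the repo's tooling): runs
# inference.py at several batch sizes and reports wall time.
# Note: wall time includes model loading, so treat the numbers as a
# coarse comparison rather than pure inference latency.
for batch_size in (2, 4, 8):
    start = time.perf_counter()
    subprocess.run(
        ["python", "inference.py", "--batch-size", str(batch_size)],
        check=True,
    )
    elapsed = time.perf_counter() - start
    print(f"batch_size={batch_size}: {elapsed:.2f}s total "
          f"({elapsed / batch_size:.2f}s per sample)")
```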

@Zheng-Chong (Owner) commented Dec 4, 2024

Increasing the batch size can yield some speedup, as in your example, where 2.46 s × 2 > 4.64 s. However, this all runs on the same GPU, so the speedup will not be particularly significant. If you want processing time to stay constant as the batch size grows, you need to implement parallel processing across multiple GPUs.
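A minimal sketch of that multi-GPU approach, with one pipeline replica per device (`load_pipeline` here is a hypothetical loader, not the repo's actual API, and the pipeline is assumed to map a tensor batch to a tensor):

```python
import torch
from concurrent.futures import ThreadPoolExecutor

# Sketch of naive multi-GPU data parallelism: one pipeline replica per
# device, each handling a slice of the batch in its own thread.
# `load_pipeline` is a hypothetical loader; the repo's real API may differ.

def run_chunk(pipeline, device, chunk):
    with torch.no_grad():
        return pipeline(chunk.to(device)).cpu()

def parallel_inference(load_pipeline, batch):
    n_gpus = torch.cuda.device_count()
    devices = [f"cuda:{i}" for i in range(n_gpus)]
    replicas = [load_pipeline().to(d) for d in devices]
    chunks = batch.chunk(n_gpus)  # split along the batch dimension
    with ThreadPoolExecutor(max_workers=n_gpus) as pool:
        futures = [
            pool.submit(run_chunk, rep, dev, chk)
            for rep, dev, chk in zip(replicas, devices, chunks)
        ]
        return torch.cat([f.result() for f in futures])
```

Note that each replica holds a full copy of the weights, so VRAM usage scales with the number of GPUs.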

@abdullah-al-munem (Author)

But I noticed one thing when trying out IDM-VTON: although it uses around 19 GB of VRAM (compared to only about 8.5 GB for CatVTON) when inferencing with batch size 2, it takes almost the same time (around 18.5 seconds) as a single inference, i.e., batch size 1, on a single GPU, not multiple. Can you please guide me on whether this type of true parallelism during batch inference is achievable with CatVTON at all (on a single GPU)?
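My understanding (possibly wrong) is that this depends on whether batch size 1 already saturates the GPU: batched work is nearly free while the device is underutilized, but scales linearly once it is compute-bound. A toy experiment like the one below (assuming a CUDA device; plain batched matmuls standing in for the model) shows both regimes:

```python
import time
import torch

# Toy illustration: small matmuls (dim=256) are launch-overhead bound,
# so time stays roughly flat as batch size grows; large matmuls
# (dim=2048) saturate the GPU, so time grows roughly linearly.

def bench(batch_size, dim, iters=50):
    x = torch.randn(batch_size, dim, dim, device="cuda")
    w = torch.randn(dim, dim, device="cuda")
    for _ in range(10):  # warm-up
        _ = x @ w
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        _ = x @ w
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

for dim in (256, 2048):  # underutilized vs. saturated regime
    for bs in (1, 2, 4, 8):
        print(f"dim={dim} batch={bs}: {bench(bs, dim) * 1e3:.3f} ms")
```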
