Clear the torch cuda cache after response #301
If a user is running 2.5 with int4, it can just barely fit into 8 GB of VRAM (an extremely common VRAM size) without using any shared memory. If they then switch from sampling to beam search with the default settings, the GPU will use more than 8 GB of VRAM and spill over into the shared pool, which slows generation down dramatically.
If the user then switches back to sampling mode to regain the lost speed, VRAM still holds the leftover allocations from beam search, so generation stays slow.
This PR simply purges the CUDA cache after each response, so changing between settings no longer leaves stale allocations in VRAM that keep you stuck at reduced speed.
EDIT: Updated to only run the command if device == "cuda"
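For reference, a minimal sketch of the idea (the function and variable names here are hypothetical placeholders, not the project's actual inference code; only the `torch.cuda.empty_cache()` call and the `device == "cuda"` guard reflect the change itself):

```python
import torch

def generate_response(model, inputs, device: str = "cuda"):
    # Hypothetical generation call standing in for the real inference path.
    output = model.generate(**inputs)

    # Release cached allocator blocks left over from the previous
    # sampling/beam-search configuration, so VRAM usage doesn't stay
    # inflated (and spilled into shared memory) after a settings change.
    # Guarded so the call is skipped on CPU or other non-CUDA devices.
    if device == "cuda":
        torch.cuda.empty_cache()

    return output
```

Note that `torch.cuda.empty_cache()` only frees memory PyTorch's caching allocator is holding but not using; it doesn't touch live tensors, so it's safe to run after every response.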