
WIP: Float16 KV Cache in voicecraft.py #72

Open · wants to merge 1 commit into master

Conversation

Ph0rk0z (Contributor) commented Apr 5, 2024

Didn't appear to do anything bad. Not sure how much it helps. Give it a try. I think there are some missing torch GC calls somewhere because not all memory is always cleared. Are there other places we can use FP16? In inference it shouldn't matter, unlike training.
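Roughly, the idea looks like this (a sketch only; `past_k`/`past_v` and the function names are illustrative, not the exact code in voicecraft.py):

```python
import gc
import torch

# Sketch: `past_k`/`past_v` are illustrative names for the cached
# key/value tensors, not the exact variables in voicecraft.py.
def append_to_kv_cache(past_k, past_v, new_k, new_v):
    # Casting new entries to fp16 roughly halves the cache's VRAM footprint.
    new_k, new_v = new_k.to(torch.float16), new_v.to(torch.float16)
    if past_k is None:
        return new_k, new_v
    # Assumes (batch, heads, seq_len, head_dim) layout; concat along seq_len.
    return torch.cat([past_k, new_k], dim=2), torch.cat([past_v, new_v], dim=2)

# The "missing GC calls": without these, dead tensors can sit in
# PyTorch's caching allocator and VRAM doesn't drop after inference.
def free_cuda_memory():
    gc.collect()
    torch.cuda.empty_cache()
```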

jasonppy (Owner) commented Apr 5, 2024

Thanks!

Do you have an estimate of how much VRAM it uses after making the cache fp16?

With fp32, for the default example in the demo:

- 830M model: ~22GB with kvcache on, ~12GB with kvcache off (i.e. kvcache=0)
- 330M model: ~15GB with kvcache on, ~5GB with kvcache off

In addition, can one run the entire model/all operations in fp16?
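For reference, the two standard PyTorch routes look roughly like this (a sketch; the tiny model below is a stand-in, not VoiceCraft itself):

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 16).cuda()  # stand-in for the VoiceCraft model
x = torch.randn(1, 16, device="cuda")

# Option 1: cast every weight to half precision (what model.half() does).
# model = model.half(); x = x.half()

# Option 2: keep fp32 weights but run ops in fp16 via autocast,
# usually the numerically safer choice for inference-only use.
with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.float16):
    out = model(x)
print(out.dtype)  # torch.float16 under autocast
```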

jasonppy self-assigned this Apr 5, 2024
Ph0rk0z (Contributor, Author) commented Apr 6, 2024

The model loads at about 6GB with whisperX, but usage goes up during inference.

I tried adding model.half() in the model loading code too, but there was no difference. It could be due to the 4 batches; I think it uses less if you set it to 1 batch.
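If the growth is on the whisperX side, its batch size is the usual knob; a sketch following the whisperX README (model choice and audio file name are illustrative):

```python
import whisperx

device = "cuda"
# compute_type="float16" runs the whisper backbone in fp16;
# batch_size=1 trades speed for lower peak VRAM during transcription.
model = whisperx.load_model("base.en", device, compute_type="float16")
audio = whisperx.load_audio("example.wav")  # illustrative file name
result = model.transcribe(audio, batch_size=1)
```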

Ph0rk0z (Contributor, Author) commented Apr 6, 2024

https://files.catbox.moe/azwyj4.mov

Here is what it does on my machine. I also wonder why the CPU usage is so high.
