[Minor] Fix KV cache block size #39

dasistwo · 2024-10-15T10:20:46Z

This PR fixes the KV cache block size so that the cache reserves the amount of GPU memory it is supposed to.

When I tried to reproduce Table IV of the QServe paper, throughput saturated before the point shown in the paper, regardless of batch size. I traced this to a miscalculation of the number of KV cache blocks: the block size should be expressed in bytes, not bits.
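To illustrate why the unit matters, here is a minimal sketch of the arithmetic. It is not the code touched by this PR: the function names, the model shape, and the 8 GiB budget are hypothetical, and the sketch ignores any quantization metadata (scales, zero points) stored alongside the cache. It assumes the bug had the form "block size computed in bits, memory budget given in bytes".

```python
def kv_block_size_bytes(block_tokens, num_layers, num_kv_heads, head_dim, kv_bits):
    """Bytes needed by one KV cache block (keys and values, all layers).

    All parameters are illustrative; kv_bits is the width of one cached element.
    """
    # Factor 2 accounts for storing both K and V.
    elems = 2 * block_tokens * num_layers * num_kv_heads * head_dim
    total_bits = elems * kv_bits
    return total_bits // 8  # convert bits to bytes


def num_kv_blocks(budget_bytes, block_bytes):
    """How many whole blocks fit in the memory budget."""
    return budget_bytes // block_bytes


# Hypothetical configuration: 16 tokens per block, 32 layers,
# 8 KV heads of dimension 128, 4-bit KV cache, 8 GiB budget.
block_bytes = kv_block_size_bytes(16, 32, 8, 128, kv_bits=4)
budget = 8 * 1024**3

correct = num_kv_blocks(budget, block_bytes)
# Forgetting the division by 8 makes every block look 8x larger,
# so 8x fewer blocks are allocated from the same budget.
buggy = num_kv_blocks(budget, block_bytes * 8)
print(correct, buggy)
```

Under these assumptions the buggy count is one eighth of the correct one, which would be consistent with throughput flattening out early: the scheduler runs out of blocks long before the GPU runs out of memory.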

Using 90% of the free GPU memory for the KV cache sometimes doesn't even leave room for the hidden dimension, so I reduced the fraction to 75%.
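As a rough sketch of what the fraction change means in practice, the helper below splits a free-memory figure into a KV cache budget and the headroom left for everything else. The helper name and the 40 GiB figure are hypothetical; in a PyTorch setup the free byte count could come from `torch.cuda.mem_get_info()`, which is not called here so the example runs without a GPU.

```python
def kv_cache_budget(free_bytes, percent):
    """Split free GPU memory into (KV cache budget, remaining headroom).

    Uses integer percentages to avoid floating-point rounding on byte counts.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    budget = free_bytes * percent // 100
    return budget, free_bytes - budget


GIB = 1024**3
free = 40 * GIB  # hypothetical amount of free memory after loading weights

for percent in (90, 75):
    budget, headroom = kv_cache_budget(free, percent)
    print(f"{percent}%: cache {budget / GIB:.1f} GiB, headroom {headroom / GIB:.1f} GiB")
```

With these illustrative numbers, dropping from 90% to 75% raises the headroom from 4 GiB to 10 GiB, which is the space that activations and other temporary buffers have to fit into.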

change KV cache block size

79220de
