[V1][Minor] Set pin_memory=False for token_ids_cpu tensor (vllm-project#11581)

Signed-off-by: Woosuk Kwon <[email protected]>
Signed-off-by: xcnick <[email protected]>
WoosukKwon authored and xcnick committed Dec 31, 2024
1 parent 9ec8df8 commit 08003bf
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion vllm/v1/worker/gpu_input_batch.py
@@ -57,11 +57,13 @@ def __init__(
 
         # TODO(woosuk): This buffer could be too large if max_model_len is big.
         # Find a way to reduce the CPU memory usage.
+        # This buffer is not directly transferred to the GPU, so it does not
+        # need to be pinned.
         self.token_ids_cpu_tensor = torch.zeros(
             (max_num_reqs, max_model_len),
             device="cpu",
             dtype=torch.int32,
-            pin_memory=pin_memory,
+            pin_memory=False,
         )
         self.token_ids_cpu = self.token_ids_cpu_tensor.numpy()
         self.num_computed_tokens_cpu = np.empty(max_num_reqs, dtype=np.int32)
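For context on the change: pinned (page-locked) host memory only pays off when a tensor is the source of a host-to-device copy, where it allows asynchronous `non_blocking=True` transfers; a buffer that stays on the CPU, like `token_ids_cpu_tensor` here, gains nothing from pinning and only reserves page-locked RAM. The following is a minimal illustrative sketch, not part of the commit; the variable names and sizes are made up for the example:

```python
import torch

# A buffer that is copied to the GPU each step: pinning it lets the
# host-to-device copy run asynchronously with non_blocking=True.
staging = torch.zeros(
    1024, dtype=torch.int32, device="cpu",
    pin_memory=torch.cuda.is_available(),
)
if torch.cuda.is_available():
    on_gpu = staging.to("cuda", non_blocking=True)

# A buffer that only ever lives on the CPU (e.g. consumed through NumPy):
# pinning it would just consume page-locked memory for no benefit.
cpu_only = torch.zeros(1024, dtype=torch.int32, device="cpu", pin_memory=False)
cpu_only_np = cpu_only.numpy()  # zero-copy NumPy view, as with token_ids_cpu
```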