Add more Prometheus metrics #2764
Changes from 9 commits
```diff
@@ -1,5 +1,5 @@
 import copy
-from collections import defaultdict
+from collections import defaultdict, Counter as CollectionsCounter
 import os
 import time
 from typing import (TYPE_CHECKING, Any, Dict, Iterable, List, Optional, Tuple,
```
```diff
@@ -845,19 +845,44 @@ def _get_stats(self,
         # Iteration stats if we have scheduler output.
         num_prompt_tokens = 0
         num_generation_tokens = 0
+        num_prompt_tokens_lst = []
+        num_generation_tokens_lst = []
+        max_tokens = []
+        request_n = []
         time_to_first_tokens = []
         time_per_output_tokens = []
         time_e2e_requests = []
+        finished_reason_counter = CollectionsCounter()
         if scheduler_outputs is not None:
             prompt_run = scheduler_outputs.prompt_run
```
```diff
-            # Number of Tokens.
+            # Number of Tokens
             if prompt_run:
                 num_prompt_tokens = scheduler_outputs.num_batched_tokens
+                num_prompt_tokens_lst = [
+                    len(seq_group.prompt_token_ids)
+                    for seq_group in scheduler_outputs.scheduled_seq_groups
+                ]
             else:
                 num_generation_tokens = scheduler_outputs.num_batched_tokens
+                num_generation_tokens_lst = [
+                    seq.get_output_len()
+                    for seq_group in scheduler_outputs.scheduled_seq_groups
+                    for seq in seq_group.get_finished_seqs()
+                ]
```

**Review comment:** This is actually an incorrect calculation. So what I would suggest is that we log the length of the generation and the prompt once the `seq_group` finishes. This way we know that we are logging the full prefill length and that we are logging one item per iteration.
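The reviewer's objection can be made concrete with a toy model (plain Python, not vLLM code): under chunked prefill a request's `seq_group` is scheduled once per chunk, so appending the full prompt length on every prompt iteration records it multiple times.

```python
# Toy model of the overcounting: a 1000-token prompt prefilled in 4 chunks.
prompt_len = 1000
num_chunks = 4

# The diff appends len(seq_group.prompt_token_ids) on every prompt
# iteration, so a chunked prefill contributes the full prompt length
# once per chunk rather than once per request.
num_prompt_tokens_lst = [prompt_len for _ in range(num_chunks)]

recorded = sum(num_prompt_tokens_lst)
print(recorded)  # 4000, not the 1000 tokens actually in the prompt
```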
```diff
-            # Latency Timings.
+            # Sampling Params
+            if prompt_run:
+                max_tokens = [
+                    seq_group.sampling_params.max_tokens
+                    for seq_group in scheduler_outputs.scheduled_seq_groups
+                ]
+                request_n = [
+                    seq_group.sampling_params.n
+                    for seq_group in scheduler_outputs.scheduled_seq_groups
+                ]
+
+            # Latency Timings
             time_last_iters = []
             for seq_group in scheduler_outputs.scheduled_seq_groups:
                 # Time since last token. (n.b. updates seq_group.last_token_time)
```

**Review comment:** Same issue related to chunked prefill. If our prefill has N chunks, we will create N histogram entries.
```diff
@@ -869,15 +894,29 @@ def _get_stats(self,
             time_to_first_tokens = time_last_iters if prompt_run else []
             time_per_output_tokens = [] if prompt_run else time_last_iters
```

**Review comment:** These are also now incorrect b/c of chunked prefill.

```diff
+            # Finished Requests
+            for seq_group in scheduler_outputs.scheduled_seq_groups:
+                if not seq_group.is_finished():
+                    continue
+                finished_reason_counter += CollectionsCounter([
+                    SequenceStatus.get_finished_reason(seq.status)
+                    for seq in seq_group.get_finished_seqs()
+                ])
```

**Review comment:** It's a bit hard to follow where we have 5-6 loops that loop through the same list of `scheduled_seq_groups`.
```diff
         return Stats(
             now=now,
             num_running=num_running,
             num_swapped=num_swapped,
             num_waiting=num_waiting,
             gpu_cache_usage=gpu_cache_usage,
             cpu_cache_usage=cpu_cache_usage,
+            finished_reason_counter=finished_reason_counter,
             num_prompt_tokens=num_prompt_tokens,
             num_generation_tokens=num_generation_tokens,
+            num_prompt_tokens_lst=num_prompt_tokens_lst,
+            num_generation_tokens_lst=num_generation_tokens_lst,
+            max_tokens=max_tokens,
+            request_n=request_n,
             time_to_first_tokens=time_to_first_tokens,
             time_per_output_tokens=time_per_output_tokens,
             time_e2e_requests=time_e2e_requests,
```
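The `CollectionsCounter` accumulation used above can be illustrated in isolation. Here the string reasons stand in for the values `SequenceStatus.get_finished_reason(seq.status)` would return; the two lists are hypothetical iterations, not PR data:

```python
from collections import Counter as CollectionsCounter

# Hypothetical finished reasons observed across two scheduler iterations.
iteration_1 = ["stop", "length"]
iteration_2 = ["stop", "stop"]

finished_reason_counter = CollectionsCounter()
for reasons in (iteration_1, iteration_2):
    # += merges the per-iteration counts into the running total,
    # mirroring the accumulation over finished seq_groups in _get_stats.
    finished_reason_counter += CollectionsCounter(reasons)

print(dict(finished_reason_counter))  # {'stop': 3, 'length': 1}
```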
**Review comment:** Please move this one to be just after the `# Finished Requests` comment.

**Review comment:** Is there any reason this can't just be a list of finished reasons that we process on the `stat_logger` side?

**Reply:** I don't think so, and it would make the stats logging code in the engine a bit cleaner.

**Reply (author):** If I move `finished_reason_counter = CollectionsCounter()` after `# Finished Requests`, it might be undefined when accessed later, for the same reason as `request_n` and `request_best_of`. Please refer to that comment for details. Done.

**Reply:** Ah, in the diff view I hadn't realised this was nested; it's ok to leave as is then.
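The reviewer's alternative, passing a plain list of reasons and counting on the `stat_logger` side, might look roughly like this sketch. The `Stats` and `StatLogger` names here are simplified stand-ins for the vLLM classes, not their actual definitions:

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stats:
    # Engine side only collects the raw finished reasons each iteration.
    finished_reasons: List[str] = field(default_factory=list)


class StatLogger:
    def __init__(self) -> None:
        self.totals: Counter = Counter()

    def log(self, stats: Stats) -> None:
        # Counting happens here, keeping the engine's stats loop simpler.
        self.totals.update(stats.finished_reasons)


logger = StatLogger()
logger.log(Stats(finished_reasons=["stop", "length"]))
logger.log(Stats(finished_reasons=["stop"]))
print(logger.totals["stop"])  # 2
```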