Updating Monitoring + CW Constructs #18
Conversation
8b97aaa to 06ec42b
def get_duration_min_metric(
    self,
    name_override: Optional[str] = None,
) -> GraphMetricConfig:
    name = name_override or self.lambda_function_name
    return GraphMetricConfig(
        metric="Duration",
        statistic="Minimum",
        dimension_map=self.dimension_map,
        label=f"{name} Min",
    )
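For context, a minimal sketch of how one of these configs might be rendered as a CDK CloudWatch metric. The GraphMetricConfig stand-in, the to_cw_metric helper, the AWS/Lambda namespace default, and the 5-minute period are all assumptions for illustration, not code from this PR:

from dataclasses import dataclass
from typing import Dict

from aws_cdk import Duration, aws_cloudwatch as cw

@dataclass
class GraphMetricConfig:
    """Stand-in for the construct in this PR (fields inferred from the diff)."""
    metric: str
    statistic: str
    dimension_map: Dict[str, str]
    label: str

def to_cw_metric(config: GraphMetricConfig, namespace: str = "AWS/Lambda") -> cw.Metric:
    """Hypothetical helper: render a GraphMetricConfig as a CDK Metric."""
    return cw.Metric(
        namespace=namespace,
        metric_name=config.metric,
        statistic=config.statistic,
        dimensions_map=config.dimension_map,
        label=config.label,
        period=Duration.minutes(5),  # assumed window; see the 5-minute discussion below
    )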
I wonder if we should start with a more minimal set of metrics (maybe just successes, failures, and durations?) and then only add more if we know we really need them? Metrics like min duration don't seem the most useful.
These are the metrics shown in the Lambda monitoring dashboard, so I just replicated what is displayed there. This is already the case for the OCS graphs.
I think things like min/max in 5-minute windows help give more insight into whether there are outlier runs, but let's chat at standup.
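For illustration, the Maximum companion to the method above would be nearly identical; a hypothetical sketch, not code from this PR:

def get_duration_max_metric(
    self,
    name_override: Optional[str] = None,
) -> GraphMetricConfig:
    # Mirrors get_duration_min_metric; graphed together, the Minimum/Maximum
    # pair in each 5-minute window makes outlier runs visible.
    name = name_override or self.lambda_function_name
    return GraphMetricConfig(
        metric="Duration",
        statistic="Maximum",
        dimension_map=self.dimension_map,
        label=f"{name} Max",
    )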
label=f"{name_override or self.state_machine_name} Started", | ||
statistic="Sum", | ||
dimension_map=self.dimension_map, | ||
) |
Same question here: do we need to know the number of invocations if we're already logging completions/failures?
This metric gives us a sense of which long-running jobs have started. I added this to OCS because, like analysis jobs, the alignment jobs take a long time to run, and I think seeing the start and completion times is helpful.
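To make the pairing concrete, a hypothetical sketch of the completion-side config that would be graphed next to the Started metric; the method name and the choice of ExecutionsSucceeded (Step Functions also emits ExecutionsFailed) are assumptions, not code from this PR:

def get_executions_succeeded_metric(
    self,
    name_override: Optional[str] = None,
) -> GraphMetricConfig:
    # Plotting ExecutionsStarted and ExecutionsSucceeded on one graph
    # shows when a long-running job began and when it finished.
    return GraphMetricConfig(
        metric="ExecutionsSucceeded",
        statistic="Sum",
        dimension_map=self.dimension_map,
        label=f"{name_override or self.state_machine_name} Succeeded",
    )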
What's in this Change?
Testing