
Updating Monitoring + CW Constructs #18

Merged (1 commit) on Jul 26, 2024

Conversation

rpmcginty (Collaborator) commented Jul 25, 2024

What's in this Change?

  • cleaning up monitoring constructs

Testing

  • deploying changes to merscope pipeline

@rpmcginty rpmcginty force-pushed the feature/update-monitoring-constructs branch from 8b97aaa to 06ec42b Compare July 26, 2024 00:23
@rpmcginty rpmcginty requested a review from njmei July 26, 2024 00:24
Comment on lines +110 to +120
def get_duration_min_metric(
self,
name_override: Optional[str] = None,
) -> GraphMetricConfig:
name = name_override or self.lambda_function_name
return GraphMetricConfig(
metric="Duration",
statistic="Minimum",
dimension_map=self.dimension_map,
label=f"{name} Min",
)
Collaborator:

I wonder if we should start with a more minimal set of metrics (maybe just successes, failures, and durations?) and then add more only if we know we really need them. Metrics like minimum duration don't seem the most useful.

Collaborator Author:

These are the metrics you see in the Lambda monitoring dashboard, so I just replicated what is displayed there. This is already the case for the OCS graphs.

I think things like min/max over 5-minute windows help give more insight into whether there are outlier runs, but let's chat at standup.
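To illustrate the pattern being discussed, here is a minimal sketch of a companion max-duration config mirroring the `get_duration_min_metric` method in the diff. The `GraphMetricConfig` stand-in below uses only the field names visible in the diff; the dataclass definition itself and the `LambdaMetricsMixin` holder class are assumptions for the sake of a self-contained example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


# Minimal stand-in for the project's GraphMetricConfig; the field names
# come from the diff above, but this definition is assumed.
@dataclass
class GraphMetricConfig:
    metric: str
    statistic: str
    dimension_map: Dict[str, str]
    label: str


class LambdaMetricsMixin:
    """Hypothetical holder for the attributes the diffed method references."""

    def __init__(self, lambda_function_name: str, dimension_map: Dict[str, str]):
        self.lambda_function_name = lambda_function_name
        self.dimension_map = dimension_map

    def get_duration_max_metric(
        self,
        name_override: Optional[str] = None,
    ) -> GraphMetricConfig:
        # Mirrors get_duration_min_metric from the diff, swapping the
        # CloudWatch statistic to "Maximum" so outlier runs stand out
        # within each graph period.
        name = name_override or self.lambda_function_name
        return GraphMetricConfig(
            metric="Duration",
            statistic="Maximum",
            dimension_map=self.dimension_map,
            label=f"{name} Max",
        )


cfg = LambdaMetricsMixin("my-fn", {"FunctionName": "my-fn"}).get_duration_max_metric()
```

Plotting the paired Minimum/Maximum statistics on one graph is what makes outlier runs visible, since the Average statistic alone smooths them away.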

label=f"{name_override or self.state_machine_name} Started",
statistic="Sum",
dimension_map=self.dimension_map,
)
Collaborator:

Same question here: do we need to know the number of invocations if we are already logging completions/failures?

Collaborator Author:

This metric gives us a sense of which long-running jobs have started. I added this to OCS because, like analysis jobs, the alignment jobs take a long time to run, and I think seeing both the start and completion times is helpful.
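The truncated diff above suggests a "Started" metric for state machines. A plausible sketch of the full method is below. `ExecutionsStarted` is the CloudWatch metric Step Functions emits when an execution begins; the `GraphMetricConfig` stand-in and the `StateMachineMetricsMixin` class name are assumptions, reconstructed only from the fields visible in the diff.

```python
from dataclasses import dataclass
from typing import Dict, Optional


# Stand-in for the project's GraphMetricConfig (field names from the diff;
# the definition itself is assumed for illustration).
@dataclass
class GraphMetricConfig:
    metric: str
    statistic: str
    dimension_map: Dict[str, str]
    label: str


class StateMachineMetricsMixin:
    """Hypothetical container for the attributes the truncated diff references."""

    def __init__(self, state_machine_name: str, dimension_map: Dict[str, str]):
        self.state_machine_name = state_machine_name
        self.dimension_map = dimension_map

    def get_started_metric(
        self,
        name_override: Optional[str] = None,
    ) -> GraphMetricConfig:
        # "ExecutionsStarted" is emitted by Step Functions when an execution
        # begins; summed per graph period, it shows when long-running jobs
        # kicked off, complementing the completion/failure metrics.
        return GraphMetricConfig(
            metric="ExecutionsStarted",
            statistic="Sum",
            dimension_map=self.dimension_map,
            label=f"{name_override or self.state_machine_name} Started",
        )


cfg = StateMachineMetricsMixin(
    "alignment-sm", {"StateMachineArn": "arn:aws:states:..."}
).get_started_metric()
```

Pairing this with `ExecutionsSucceeded`/`ExecutionsFailed` on the same graph gives the start-to-completion view described in the comment above.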

@njmei njmei self-requested a review July 26, 2024 19:06
@rpmcginty rpmcginty merged commit b768105 into main Jul 26, 2024
4 checks passed
@rpmcginty rpmcginty deleted the feature/update-monitoring-constructs branch July 26, 2024 19:17