Feat (CI): Dump GH CI Stats to GCP Metrics #10338
const jobExecutionTime = (new Date(job.completed_at) - new Date(job.started_at)) / 1000;
await sendMetricsToGCP('ci_job_execution_time', jobExecutionTime, jobLabels);

// Send job status (1 for success, 0 for failure)
I'll see whether we can report a separate value for a cancelled job. Currently the status is 0 for cancelled jobs too.
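One way the cancelled case could be separated is sketched below. This is only an illustration: the helper names `jobDurationSeconds` and `jobStatusValue`, and the numeric codes 2 and -1, are assumptions and not part of this PR. It assumes the job object carries `started_at`, `completed_at`, and a `conclusion` string, as in the GitHub Actions jobs API.

```javascript
// Hypothetical mapping: the codes for 'cancelled' (2) and unknown (-1) are assumptions.
const STATUS_VALUES = { success: 1, failure: 0, cancelled: 2 };

// Duration in seconds, or null when either timestamp is missing or unparseable.
function jobDurationSeconds(job) {
  const started = Date.parse(job.started_at);
  const completed = Date.parse(job.completed_at);
  if (Number.isNaN(started) || Number.isNaN(completed)) {
    return null;
  }
  return (completed - started) / 1000;
}

// Numeric status for a job conclusion; anything not listed maps to -1.
function jobStatusValue(conclusion) {
  return Object.prototype.hasOwnProperty.call(STATUS_VALUES, conclusion)
    ? STATUS_VALUES[conclusion]
    : -1;
}
```

With a mapping like this, a dashboard query can tell failures apart from cancellations instead of treating both as 0.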
.github/workflows/dump-ci-stats.yml
Outdated
node-version: '18'

- name: Clear npm cache
  run: npm cache clean --force
This avoids corrupted or stale cache issues. I ran into one, hence clearing the cache. Also, it's a single package installation, so the impact is minor.
} catch (error) {
  console.error('Error sending metric:', error);
}
I'll probably remove this catch in the future, since the step should fail in case of an error.
But a time series issue on a single data point is sometimes fixed by an RPC retry, and it should not affect the other metrics.
So for now the catch makes sense.
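The trade-off described here (retry a flaky data point, keep other metrics unaffected, but still be able to fail the step) could be sketched roughly as follows. `sendWithRetry` and `sendAllMetrics` are hypothetical helpers, not code from this PR, and `sendFn` stands in for whatever function actually writes a metric; the attempt count and delay are arbitrary assumptions.

```javascript
// Retry a single async send a few times before giving up (hypothetical helper).
async function sendWithRetry(sendFn, { attempts = 3, delayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      return await sendFn();
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        // Linear backoff between attempts.
        await new Promise(resolve => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  throw lastError;
}

// Send every metric independently, so one failure does not block the others.
// Returns counts that let the caller decide whether to fail the step.
async function sendAllMetrics(metrics, sendFn, retryOptions) {
  const results = await Promise.allSettled(
    metrics.map(metric => sendWithRetry(() => sendFn(metric), retryOptions)),
  );
  const failed = results.filter(result => result.status === 'rejected').length;
  return { sent: results.length - failed, failed };
}
```

The caller could then set `process.exitCode = 1` when `failed > 0`, which fails the step only after every metric has had its chance to be sent.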
'Integration Tests',
'Test Golang',
'golangci-lint',
'Build release Docker Images',
'Test all Packages',
'Test Documentation',
'Manage integration check',
'after-merge.yml',
These workflows are reported to Datadog (DD) too, so I kept the same list here.
For now, we have to explicitly name the workflows we want to capture on completion.
GitHub doesn't provide a wildcard (or capture-all) option.
As a follow-up, we'll look for another approach, such as a custom hook that is called automatically at the end of each workflow and triggers our stats job.
For now, we can keep it as is so that we at least start capturing data.
node-version: '18'

- name: Install GCP Monitoring/Metrics Client
  run: yarn add @google-cloud/monitoring --ignore-workspace-root-check
Minor improvement:
We can cache this installation; "cache: yarn" can help.
I'll do it as a follow-up together with other tweaks.
closes: #XXXX
refs: #XXXX
Description
This PR adds a workflow job and a Node script that capture GitHub CI stats when CI workflows complete and dump them to GCP metrics.
Based on those metrics, we'll build a dashboard in Grafana.
This is a prerequisite for the migration from Datadog to GCP/Grafana.
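For illustration, a data point written to GCP metrics might be shaped like the sketch below. This is not the script from this PR: `buildTimeSeries` is a hypothetical helper, and the `github/` segment of the metric type and the `global` resource type are assumptions. Only the `custom.googleapis.com/` prefix and the overall time series shape follow Cloud Monitoring's custom metrics format.

```javascript
// Build one Cloud Monitoring time series entry for a custom metric (hypothetical helper).
function buildTimeSeries(projectId, metricName, value, labels = {}, nowMs = Date.now()) {
  // Metric label values must be strings, so coerce everything.
  const stringLabels = Object.fromEntries(
    Object.entries(labels).map(([key, labelValue]) => [key, String(labelValue)]),
  );
  return {
    // The 'github/' path segment is an assumption for this sketch.
    metric: { type: `custom.googleapis.com/github/${metricName}`, labels: stringLabels },
    // 'global' is an assumed monitored resource type.
    resource: { type: 'global', labels: { project_id: projectId } },
    points: [
      {
        interval: { endTime: { seconds: Math.floor(nowMs / 1000) } },
        value: { doubleValue: value },
      },
    ],
  };
}

// With @google-cloud/monitoring, an entry like this would typically be sent as:
//   const client = new monitoring.MetricServiceClient();
//   await client.createTimeSeries({
//     name: client.projectPath(projectId),
//     timeSeries: [buildTimeSeries(projectId, 'ci_job_execution_time', 150, jobLabels)],
//   });
```

Keeping the payload construction in a pure function makes it easy to unit test without GCP credentials.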
Successful CI link:
https://github.com/Muneeb147/agoric-sdk/actions/runs/11550905067/job/32146827437
Screenshot:
Demo Clip of Metrics:
ci-metrics-demo.mov
Security Considerations
Scaling Considerations
Documentation Considerations
Testing Considerations
Upgrade Considerations