Feat (CI): Dump GH CI Stats to GCP Metrics #10338

Merged
merged 1 commit into from
Oct 31, 2024

Conversation

Muneeb147
Contributor

@Muneeb147 Muneeb147 commented Oct 25, 2024

closes: #XXXX
refs: #XXXX

Description

This PR adds a workflow job and a Node script that capture GitHub CI stats when CI workflows complete and dump them to GCP metrics.
Based on those metrics, we'll build a dashboard in Grafana.
This is a prerequisite for the migration from Datadog to GCP/Grafana.
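
For illustration, a minimal sketch of how one datapoint could be shaped before being handed to the `@google-cloud/monitoring` client. The helper name `buildTimeSeries`, the `custom.googleapis.com/github/` prefix, and the `global` resource type are assumptions made for this example and are not taken from the script in this PR.

```javascript
// Hypothetical helper (not from the PR): builds one time-series entry
// for a custom metric. Prefix and resource type are assumptions.
function buildTimeSeries(metricName, value, labels, projectId, nowMs = Date.now()) {
  return {
    metric: {
      type: `custom.googleapis.com/github/${metricName}`,
      labels, // label values must be strings
    },
    resource: { type: 'global', labels: { project_id: projectId } },
    points: [
      {
        // A gauge point only needs an end time, expressed in whole seconds.
        interval: { endTime: { seconds: Math.floor(nowMs / 1000) } },
        value: { doubleValue: value },
      },
    ],
  };
}

// Usage with the real client (not executed here):
//   const monitoring = require('@google-cloud/monitoring');
//   const client = new monitoring.MetricServiceClient();
//   await client.createTimeSeries({
//     name: client.projectPath(projectId),
//     timeSeries: [buildTimeSeries('ci_job_execution_time', 42, { job: 'lint' }, projectId)],
//   });
```

Keeping the payload builder separate from the RPC call makes the shape easy to check without GCP credentials.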

Successful CI link:
https://github.com/Muneeb147/agoric-sdk/actions/runs/11550905067/job/32146827437

Screenshot:
Screenshot 2024-10-28 at 4 39 08 PM

Demo Clip of Metrics:

ci-metrics-demo.mov

Security Considerations

Scaling Considerations

Documentation Considerations

Testing Considerations

Upgrade Considerations

@Muneeb147 Muneeb147 requested a review from a team as a code owner October 25, 2024 09:46
@Muneeb147 Muneeb147 added the force:integration Force integration tests to run on PR label Oct 25, 2024
cloudflare-workers-and-pages bot commented Oct 25, 2024

Deploying agoric-sdk with Cloudflare Pages

Latest commit: 7202c10
Status: ✅  Deploy successful!
Preview URL: https://f46e0b49.agoric-sdk.pages.dev
Branch Preview URL: https://muneeb-capture-gh-ci-stats.agoric-sdk.pages.dev

@Muneeb147 Muneeb147 removed the request for review from AgoricTriage October 25, 2024 09:55
@Muneeb147 Muneeb147 marked this pull request as draft October 25, 2024 12:00
const jobExecutionTime = (new Date(job.completed_at) - new Date(job.started_at)) / 1000;
await sendMetricsToGCP('ci_job_execution_time', jobExecutionTime, jobLabels);

// Send job status (1 for success, 0 for failure)
Contributor Author

Will check whether we can report a separate value for a cancelled job. Currently a cancelled job is also reported as 0.
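
One way this could look, as a rough sketch: map the job's `conclusion` field (as returned by the GitHub Actions jobs API) to a number through a lookup table. The function name `jobStatusValue` and the value 2 for cancelled jobs are assumptions for illustration only, not what this PR ships.

```javascript
// Hypothetical mapping from a job's `conclusion` to a numeric status.
// The value chosen for 'cancelled' is an assumption, not the PR's behaviour.
const STATUS_BY_CONCLUSION = new Map([
  ['success', 1],
  ['failure', 0],
  ['cancelled', 2],
]);

function jobStatusValue(conclusion) {
  // Any conclusion not listed above (e.g. 'skipped', 'timed_out')
  // falls back to 0, which matches the current 1/0 reporting.
  return STATUS_BY_CONCLUSION.get(conclusion) ?? 0;
}
```

A dashboard query could then distinguish cancelled runs from real failures by filtering on the value, or the conclusion could be sent as a label instead.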

node-version: '18'

- name: Clear npm cache
run: npm cache clean --force
Contributor Author

This avoids corrupted or stale cache issues. I ran into such an issue, hence clearing the cache. Also, it's a single package installation, so there is no major impact.

@Muneeb147 Muneeb147 changed the title Chore (CI): Dump GH CI Stats to GCP Metrics Feat (CI): Dump GH CI Stats to GCP Metrics Oct 28, 2024
Comment on lines +47 to +49
} catch (error) {
console.error('Error sending metric:', error);
}
Contributor Author

I'll probably remove this catch in the future, since the step should fail when an error occurs.
But sometimes a time-series issue on a single datapoint is fixed on RPC retry, and it should not affect other metrics.
So for now the catch makes sense.
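
A possible middle ground, sketched under assumptions: retry each datapoint a few times, send all metrics independently, and let the caller decide whether any remaining failure should fail the step. The names `withRetry` and `sendAll`, the attempt count, and the delay are hypothetical and not part of this PR.

```javascript
// Hypothetical retry wrapper; attempt count and delay are assumptions.
async function withRetry(fn, { attempts = 3, delayMs = 500 } = {}) {
  let lastError;
  for (let i = 1; i <= attempts; i += 1) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (i < attempts) {
        // Wait before the next attempt; no delay after the final one.
        await new Promise(resolve => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Runs every task with retries. One rejected datapoint does not block
// the others; the summary reports how many succeeded and failed.
async function sendAll(tasks, options) {
  const results = await Promise.allSettled(
    tasks.map(task => withRetry(task, options)),
  );
  const failed = results.filter(r => r.status === 'rejected').length;
  return { sent: results.length - failed, failed };
}
```

Each task would be a function wrapping one `sendMetricsToGCP(...)` call. The caller could set `process.exitCode = 1` when `failed` is greater than zero, so the step still fails after retries are exhausted.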

Comment on lines +7 to +14
'Integration Tests',
'Test Golang',
'golangci-lint',
'Build release Docker Images',
'Test all Packages',
'Test Documentation',
'Manage integration check',
'after-merge.yml',
Contributor Author

These are reported to DD (Datadog) too, so I kept the same list here.

Contributor Author

@Muneeb147 Muneeb147 Oct 28, 2024

For now, we have to explicitly name the workflows we want to capture on completion.
GitHub doesn't provide a wildcard (or capture-all) option.

As a follow-up, we'll figure out another way, where a custom hook is called automatically at the end of each workflow and triggers our stats job.

For now, we can keep it as is so that we at least start capturing data.

@Muneeb147 Muneeb147 marked this pull request as ready for review October 28, 2024 11:51
@Muneeb147 Muneeb147 self-assigned this Oct 28, 2024
@Muneeb147 Muneeb147 removed the force:integration Force integration tests to run on PR label Oct 28, 2024
node-version: '18'

- name: Install GCP Monitoring/Metrics Client
run: yarn add @google-cloud/monitoring --ignore-workspace-root-check
Contributor Author

@Muneeb147 Muneeb147 Oct 31, 2024

Minor improvement:
We can cache it; "cache: yarn" can help.
Will do it as a follow-up together with other tweaks.

@Muneeb147 Muneeb147 force-pushed the muneeb/capture-gh-ci-stats branch from 9b1b527 to 7202c10 Compare October 31, 2024 11:56
@Muneeb147 Muneeb147 added the automerge:rebase Automatically rebase updates, then merge label Oct 31, 2024
@mergify mergify bot merged commit a6b3352 into master Oct 31, 2024
90 checks passed
@mergify mergify bot deleted the muneeb/capture-gh-ci-stats branch October 31, 2024 23:27
Labels
automerge:rebase Automatically rebase updates, then merge
2 participants