[catalog] Write a script that scrapes the GH traffic API #53

Open
alexeagle opened this issue Aug 9, 2022 · 8 comments
Labels
bcr Bazel Central Registry bounty-1000USD A contributor who completes this will be rewarded $1000

Comments

@alexeagle
Contributor

alexeagle commented Aug 9, 2022

A Googler with a GitHub auth token could run this script on a regular cadence and hand the data dump to the SIG, so that we get relative numbers.

I emailed the team:
"
Maybe obvious, but the Bazel team doesn't actually have to do any work here if you are willing to share a GitHub access token that has the needed permissions across the bazelbuild org. This is what blocks an outside party from gathering numbers:

% curl -H "Accept: application/vnd.github.v3+json" -H "Authorization: token $TOKEN" https://api.github.com/repos/bazelbuild/rules_python/traffic/views
{
  "count": 10476,
  "uniques": 1405,
  "views": [
    {
      "timestamp": "2022-06-03T00:00:00Z",
      "count": 218,
      "uniques": 54
    },
...
% curl -H "Accept: application/vnd.github.v3+json" -H "Authorization: token $TOKEN" https://api.github.com/repos/bazelbuild/rules_apple/traffic/views
{
  "message": "Must have push access to repository",
  "documentation_url": "https://docs.github.com/rest/reference/repos#get-page-views"
}
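
For anyone picking this up, here is a rough Python sketch of the same request as the two curl calls above, plus a helper that tells a successful response apart from the "Must have push access" error. The function names (`build_views_request`, `summarize_views`, `fetch_views`) are hypothetical and not from bazel_metrics; only the URL, headers, and JSON fields come from the output shown above.

```python
import json
import urllib.error
import urllib.request

API_ROOT = "https://api.github.com"


def build_views_request(owner: str, repo: str, token: str) -> urllib.request.Request:
    """Build the same GET request as the curl commands above."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/traffic/views"
    headers = {
        "Accept": "application/vnd.github.v3+json",
        "Authorization": f"token {token}",
    }
    return urllib.request.Request(url, headers=headers)


def summarize_views(payload: dict) -> dict:
    """Reduce a traffic/views response body to its totals, or to the error message."""
    if "message" in payload:
        # For example: "Must have push access to repository"
        return {"ok": False, "error": payload["message"]}
    return {
        "ok": True,
        "count": payload["count"],
        "uniques": payload["uniques"],
        "days": len(payload.get("views", [])),
    }


def fetch_views(owner: str, repo: str, token: str) -> dict:
    """Send the request; needs network access and a sufficiently privileged token."""
    req = build_views_request(owner, repo, token)
    try:
        with urllib.request.urlopen(req) as resp:
            return summarize_views(json.load(resp))
    except urllib.error.HTTPError as err:
        # GitHub sends a JSON error body along with the 4xx status code.
        body = json.load(err)
        return {"ok": False, "error": body.get("message", str(err))}
```

Calling `fetch_views("bazelbuild", "rules_apple", token)` with an outside contributor's token would presumably come back with `ok` set to `False` and the push-access message, which is the blocker described here.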

@meteorcloudy indicated willingness to accept a PR on the bazelbuild/bazel_metrics repo via email:

Sounds good, maybe you can send a PR to bazel_metrics to add the script? We can then decide to either manually run it or set up a pipeline to do it.

So this ticket is to create such a script, along with a process (it can just be a scheduled reminder email) for someone at Google to run it. The script should publish the data to a place we can ingest (maybe a GH Gist or something similarly simple).
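
One thing a periodic run has to deal with: the traffic endpoints only report roughly the last 14 days, so each run's output needs to be merged into what was collected before. A minimal sketch of that merge step, assuming the response shape shown above; `merge_views` and `dump_history` are hypothetical names, and where the resulting JSON gets published (Gist or elsewhere) is left open.

```python
import json


def merge_views(history: dict, payload: dict) -> dict:
    """Fold one traffic/views response into a {timestamp: {count, uniques}} history."""
    merged = dict(history)
    for day in payload.get("views", []):
        # A later run overwrites the same day, since the most recent
        # day may still have been partial when it was first fetched.
        merged[day["timestamp"]] = {
            "count": day["count"],
            "uniques": day["uniques"],
        }
    return merged


def dump_history(histories: dict) -> str:
    """Serialize {repo: history} with stable key order so successive dumps diff cleanly."""
    return json.dumps(histories, indent=2, sort_keys=True)
```

As long as the script runs more often than every 14 days, no day is lost, and rerunning it early is harmless because days are keyed by timestamp.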

@alexeagle alexeagle added the bounty-1000USD A contributor who completes this will be rewarded $1000 label Aug 9, 2022
@cgrindel cgrindel added the bcr Bazel Central Registry label Aug 9, 2022
@cgrindel cgrindel moved this to Todo in Rules SIG Tracker Aug 9, 2022
@cgrindel cgrindel moved this from Todo to Blocked in Rules SIG Tracker Aug 9, 2022
@aherrmann
Member

IIUC GitHub Apps can access the traffic/views endpoint if they have read permissions on the repo:

GET /repos/:owner/:repo/traffic/views (:read)

Perhaps a GitHub App could be a good way to set this up. Each repo to be listed in the catalog could install the app, and the app could periodically query the traffic endpoint and send the data wherever it's needed.

As a simpler alternative, I tried running the query in a GH action, but it looks like the automatic GITHUB_TOKEN is insufficient for that API endpoint.
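
To illustrate the GitHub App route, here is a rough Python sketch of the step where an app exchanges its JWT for a per-installation access token through the REST endpoint `POST /app/installations/{installation_id}/access_tokens`. Creating the JWT itself (signed with the app's private key) is left out because it needs a crypto library, so `app_jwt` and `installation_id` are assumed inputs, and the function names are hypothetical.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_installation_token_request(installation_id: int, app_jwt: str) -> urllib.request.Request:
    """Build the POST that trades an app JWT for an installation access token."""
    url = f"{API_ROOT}/app/installations/{installation_id}/access_tokens"
    headers = {
        "Accept": "application/vnd.github+json",
        # app_jwt is assumed to be a valid JWT signed with the app's private key.
        "Authorization": f"Bearer {app_jwt}",
    }
    return urllib.request.Request(url, headers=headers, method="POST")


def token_from_response(body: bytes) -> str:
    """Extract the short-lived installation token from the JSON response body."""
    return json.loads(body)["token"]
```

The returned token would then go into the `Authorization` header of the traffic/views request in place of a personal access token, scoped to whichever repos installed the app.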

@alexeagle
Contributor Author

That's a good idea. We've been working with the Google team on permissions for another GitHub App (publishing new ruleset releases to BCR), so I think this can reuse a lot of work from @kormide.

@alexeagle
Contributor Author

@ashi009 this might be a place to start.

@ashi009

ashi009 commented Feb 5, 2023

I believe the best approach is to build a GitHub App for this, so that we no longer need a personal access token to access the endpoint. Instead, we can grant permission to the app, which will definitely make secops happy.

@alexeagle
Contributor Author

Yes, and it might also let us handle "registration": installing the app would be enough to get your ruleset added to our catalog, instead of needing to send a separate PR.

@ashi009

ashi009 commented Feb 5, 2023 via email

@ashi009

ashi009 commented Feb 6, 2023

I just finished a POC GitHub App in Go. The traffic API needs only read-only access to the admin and meta permissions to work. We can talk about this more after I send the PR.

@alexeagle
Contributor Author

I think @kormide will be a good code reviewer for that.

@ashi009 ashi009 mentioned this issue Feb 7, 2023
10 tasks
Projects
Status: Needs Assignment
4 participants