CLI: Ability to check health of all projects for support users #3461

Closed
AndrewRTsao opened this issue Nov 10, 2023 · 3 comments · Fixed by #3725
Labels: Area:CLI, Team:Platform (Platform Working Group), Type:Feature (New feature request)

Comments

@AndrewRTsao
Contributor

Currently, when a project on Rill Cloud is failing and we get alerted, it can be difficult for support to quickly determine whether the issue is internal or customer-impacting. Diagnosis is a multi-step process (see this thread):

  • We have to find the project ID
  • We have to assume the user and check the status

As we scale (or if there are multiple alerts), this requires the support user to iterate over instances one by one. We should provide a rill sudo project check health <pattern> command that would allow the support user to quickly list and check the health of all projects matching some predefined pattern. For example, we could do something like:

  • rill sudo project check health all to print out the health of all projects
  • rill sudo project check health <name_of_org> to print out the health of all projects belonging to a certain org
  • rill sudo project check health <name_of_user> to print out the health of all projects belonging to a certain user
  • rill sudo project check health <domain> to print out the health of all projects by email domain (though this may already be covered by the org filter)
  • rill sudo project check health <status_type> to quickly print out the project status by certain "status type", e.g. error / failing, warnings, healthy (nice-to-have but not sure if reasonable to implement)
  • Ability to support other creative search patterns, such as chaining conditions (search by org / user AND status type)
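
To make the chained-condition idea concrete, here is a minimal sketch in Go of how such a filter could combine org, user, domain, and status conditions with AND semantics. Everything in it is an assumption for illustration: the ProjectHealth and Filter types, their field names, and the status values are hypothetical and do not reflect Rill's actual admin API or the CLI's real flags.

```go
package main

import (
	"fmt"
	"strings"
)

// ProjectHealth is a hypothetical record for illustration only;
// the real admin API may expose different fields.
type ProjectHealth struct {
	Org        string
	Project    string
	OwnerEmail string
	Status     string // assumed values: "healthy", "warning", "error"
}

// Filter holds optional search conditions. An empty field matches everything.
type Filter struct {
	Org    string // organization name
	User   string // full email address of the project owner
	Domain string // email domain, e.g. "acme.com"
	Status string // status type
}

// Matches reports whether p satisfies every non-empty condition in f (logical AND).
func (f Filter) Matches(p ProjectHealth) bool {
	if f.Org != "" && !strings.EqualFold(f.Org, p.Org) {
		return false
	}
	if f.User != "" && !strings.EqualFold(f.User, p.OwnerEmail) {
		return false
	}
	if f.Domain != "" {
		suffix := "@" + strings.ToLower(f.Domain)
		if !strings.HasSuffix(strings.ToLower(p.OwnerEmail), suffix) {
			return false
		}
	}
	if f.Status != "" && !strings.EqualFold(f.Status, p.Status) {
		return false
	}
	return true
}

// Select returns the projects that match the filter, preserving input order.
func Select(all []ProjectHealth, f Filter) []ProjectHealth {
	var out []ProjectHealth
	for _, p := range all {
		if f.Matches(p) {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	// Made-up sample data.
	projects := []ProjectHealth{
		{"acme", "sales", "ann@acme.com", "healthy"},
		{"acme", "ops", "bob@acme.com", "error"},
		{"globex", "web", "eve@globex.io", "error"},
	}
	// Chained condition: org AND status type.
	for _, p := range Select(projects, Filter{Org: "acme", Status: "error"}) {
		fmt.Printf("%s/%s\t%s\t%s\n", p.Org, p.Project, p.OwnerEmail, p.Status)
	}
}
```

With the sample data above, the org + status filter selects only acme/ops. An empty Filter matches every project, which would correspond to the "all" case.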

These utility commands would be extremely beneficial from both a devops / infra and a support perspective.

@AndrewRTsao AndrewRTsao added Type:Feature New feature request Team:Platform Platform Working Group labels Nov 10, 2023
@AndrewRTsao
Contributor Author

FYI @cohenscottr per our earlier conversation.

Tagging @himadrisingh in case he has any suggestions / thoughts to add on utility commands he'd find helpful in this area. Tagging @begelundmuller and @nishantmonu51 for visibility.

@begelundmuller
Contributor

This seems like a great idea.

In addition to checking status fields, it would also be nice to make at least one analytical API request (or potentially even run our K6 load test, though that might slow it down too much). Certain errors that break dashboards only surface at request time (e.g. a dimension value that's larger than the allowed response size). Alternatively, we might consider running a health check as part of reconcile as well, to capture such errors in a status field.
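
As a rough sketch of the status-plus-request idea, the check could first consult the stored status and only then issue a single probe request, treating a probe failure as unhealthy even when the status looks fine. The names here (Probe, CheckHealth, the "ok" status value, and the error message) are hypothetical and only illustrate the control flow, not Rill's actual runtime API.

```go
package main

import (
	"errors"
	"fmt"
)

// Probe is a hypothetical stand-in for one analytical API request
// against a project. It returns nil if the request succeeds.
type Probe func() error

// errTooLarge simulates an error that only surfaces at request time.
var errTooLarge = errors.New("response exceeds allowed size")

// CheckHealth combines the stored status with a live probe.
// It returns whether the project looks healthy and, if not, a reason.
// The "ok" status value is an assumption for this sketch.
func CheckHealth(status string, probe Probe) (bool, string) {
	// A bad stored status is reported without issuing a request.
	if status != "ok" {
		return false, "status is " + status
	}
	// The status looks fine, so confirm with one real request.
	if err := probe(); err != nil {
		return false, "probe failed: " + err.Error()
	}
	return true, ""
}

func main() {
	failing := func() error { return errTooLarge }
	healthy, reason := CheckHealth("ok", failing)
	fmt.Println(healthy, reason)
}
```

Running the probe only when the status is already fine keeps the check cheap for projects that are known to be broken, at the cost of not collecting request-time errors for them.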

@begelundmuller
Contributor

Just to link the conversations – this PR comment has more details on how to resolve this issue: #3725 (review)


4 participants