CLI: Ability to check health of all projects for support users #3461

AndrewRTsao · 2023-11-10T23:59:40Z

Currently, when a project on Rill Cloud is failing and we get alerted, it can be difficult for support to quickly diagnose whether the issue is internal or customer-impacting. It's a multi-step process (see this thread):

We have to find the project ID
We have to assume the user and check the status

As we scale (or when multiple alerts fire at once), this forces the support user to iterate over instances one by one. We should provide a rill sudo project check health <pattern> command that lets the support user quickly list and check the health of all projects matching some predefined pattern. For example, we could do something like:

rill sudo project check health all to print out the health of all projects
rill sudo project check health <name_of_org> to print out the health of all projects belonging to a certain org
rill sudo project check health <name_of_user> to print out the health of all projects belonging to a certain user
rill sudo project check health <domain> to print out the health of all projects by email domain (or perhaps covered by org)
rill sudo project check health <status_type> to quickly print out the project status by certain "status type", e.g. error / failing, warnings, healthy (nice-to-have but not sure if reasonable to implement)
Ability to support other creative search patterns, such as chaining conditions (search by org / user AND status type)
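To make the chained conditions concrete, here is a minimal Go sketch of how the filtering could work. Everything in it is an assumption for illustration: the Project, Filter and Health types and the FilterProjects function are hypothetical and are not taken from the Rill codebase or its admin API. Empty filter fields match everything, and any fields that are set are combined with AND.

```go
package main

import "strings"

// Health is a hypothetical coarse status type for a project.
type Health string

const (
	Healthy Health = "healthy"
	Warning Health = "warning"
	Failing Health = "failing"
)

// Project is a hypothetical, simplified project record.
// The real admin API types will differ.
type Project struct {
	Org        string
	Name       string
	OwnerEmail string
	Health     Health
}

// Filter holds optional search conditions.
// Empty fields match every project; non-empty fields are ANDed together.
type Filter struct {
	Org    string // organization name
	User   string // full email address of the project owner
	Domain string // email domain, e.g. "example.com"
	Health Health // status type
}

// Matches reports whether p satisfies every condition set on f.
func (f Filter) Matches(p Project) bool {
	if f.Org != "" && !strings.EqualFold(f.Org, p.Org) {
		return false
	}
	if f.User != "" && !strings.EqualFold(f.User, p.OwnerEmail) {
		return false
	}
	if f.Domain != "" {
		// Compare only the part of the owner email after the "@".
		_, domain, ok := strings.Cut(p.OwnerEmail, "@")
		if !ok || !strings.EqualFold(f.Domain, domain) {
			return false
		}
	}
	if f.Health != "" && f.Health != p.Health {
		return false
	}
	return true
}

// FilterProjects returns the projects that match f, preserving input order.
func FilterProjects(projects []Project, f Filter) []Project {
	var out []Project
	for _, p := range projects {
		if f.Matches(p) {
			out = append(out, p)
		}
	}
	return out
}
```

With this shape, "all failing projects in an org" would be FilterProjects(projects, Filter{Org: "acme", Health: Failing}), and the all case is simply an empty Filter{}.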

A utility command like this would be extremely beneficial from both a devops / infra and a support perspective.


AndrewRTsao · 2023-11-11T00:00:37Z

FYI @cohenscottr per our earlier conversation.

Tagging @himadrisingh in case he has any suggestions or thoughts to add on utility commands he'd find helpful in this area. Tagging @begelundmuller and @nishantmonu51 for visibility.

begelundmuller · 2023-11-12T10:10:00Z

This seems like a great idea.

In addition to checking status fields, it would also be nice to make at least one analytical API request (or potentially even run our K6 load test, though that might slow it down too much), since certain errors that break dashboards only surface at request time (e.g. a dimension value that's larger than the allowed response size). Alternatively, we might consider having a health check run as part of reconcile as well to capture such errors in a status field.

begelundmuller · 2024-01-08T09:58:38Z

Just to link the conversations – this PR comment has more details on how to resolve this issue: #3725 (review)

AndrewRTsao added the Type:Feature and Team:Platform labels Nov 10, 2023

nishantmonu51 added the Area:CLI label Nov 18, 2023

nishantmonu51 assigned rakeshsharma14317 Nov 18, 2023

rakeshsharma14317 linked a pull request Dec 19, 2023 that will close this issue

CLI: ability to check health of all projects for support users #3725

Merged

begelundmuller closed this as completed in #3725 Jan 15, 2024
