Better visibility of stored files and their shard status #133

Open
scottyeager opened this issue Nov 23, 2024 · 3 comments

@scottyeager

Currently there is no way to query the list of stored files, and no way to see the health of an individual file in terms of how many of its shards are stored in healthy backends. This makes it difficult to assess whether the system is in a degraded state. It's especially relevant when recovering from a backend failure, to be able to check whether all files have been rebuilt onto the newly supplied backends. A list of stored files would also help with general inspection of the system, without needing to run lots of check commands and separately keep track of which files have been removed from local storage.

So I'm thinking of something like this:

  1. A list command that lists the stored files
  2. Some way of outputting the number of shards present for a given file in live backends (this could be part of list or check or both)
  3. At least one Prometheus metric that helps to understand whether the files are, overall, in a degraded state or not (do they have expected shards available, if not do they at least have minimal shards available)
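
To make items 2 and 3 a bit more concrete, here is a minimal sketch of how per-file shard counts could be turned into a health state and an overall summary. Everything named here (`FileShards`, `classify`, `summarize`, the state names, the example paths) is hypothetical and not part of the existing tool. It only assumes that each file has an expected shard count and a minimal shard count, as described above.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical state names, ordered from best to worst.
STATES = ("healthy", "degraded", "below_minimal")


@dataclass(frozen=True)
class FileShards:
    """Hypothetical per-file record; the field names are illustrative only."""
    path: str
    expected: int  # shards the file should have
    minimal: int   # fewest shards that still count as "minimal shards available"
    live: int      # shards currently stored in healthy backends


def classify(f: FileShards) -> str:
    """Map a file's live shard count to one of the three states."""
    if f.live >= f.expected:
        return "healthy"
    if f.live >= f.minimal:
        return "degraded"
    return "below_minimal"


def summarize(files) -> dict:
    """Count files per state; these totals are what a metric could expose."""
    counts = Counter(classify(f) for f in files)
    return {state: counts.get(state, 0) for state in STATES}


# Made-up example data.
files = [
    FileShards("/data/a", expected=5, minimal=3, live=5),
    FileShards("/data/b", expected=5, minimal=3, live=4),
    FileShards("/data/c", expected=5, minimal=3, live=2),
]

# Per-file view, roughly what a list or check command might print.
for f in files:
    print(f"{f.path}\t{f.live}/{f.expected}\t{classify(f)}")

# Aggregate view, roughly what a single gauge with a state label might carry.
print(summarize(files))
```

The per-file loop is the kind of output a list or check command could give, while the summary is small enough to export as a metric regardless of how many files are stored.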
@iwanbk
Member

iwanbk commented Nov 25, 2024

> This makes it difficult to assess whether the system is in a degraded state.

While I agree that this is something that should be improved, I don't think that assessing it by listing all stored files is a good idea, for these reasons:

  • The list could be very long
  • Human eyes are not a reliable tool for checking such a long list of stored files

I think exposing the repair/rebuild queue would be enough.

@scottyeager
Author

I can do without the list command, though I do think it would be handy for both human and machine consumption under different circumstances.

Exposing info on the repair queue would be fine. One thing I think is important though is that there's a way to get at the info both from CLI and via Prometheus.
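
As a rough illustration of serving both consumers from one source, the sketch below renders a single repair queue snapshot as plain text for a CLI and as Prometheus text exposition format. The `QueueSnapshot` type, its three fields, and the `repair_queue_files` metric name are all assumptions made up for this example, not existing names in the project.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class QueueSnapshot:
    """Hypothetical view of the repair queue; field names are illustrative."""
    pending: int
    in_progress: int
    failed: int


def render_cli(snapshot: QueueSnapshot) -> str:
    """Human-readable output: one 'name value' pair per line."""
    return "\n".join(
        f"{name:<12}{value}" for name, value in asdict(snapshot).items()
    )


def render_prometheus(snapshot: QueueSnapshot,
                      metric: str = "repair_queue_files") -> str:
    """Prometheus text exposition format: one gauge with a 'state' label."""
    lines = [
        f"# HELP {metric} Files in the repair queue by state.",
        f"# TYPE {metric} gauge",
    ]
    for state, value in asdict(snapshot).items():
        lines.append(f'{metric}{{state="{state}"}} {value}')
    return "\n".join(lines) + "\n"


# Made-up numbers, for illustration only.
snap = QueueSnapshot(pending=3, in_progress=1, failed=0)
print(render_cli(snap))
print(render_prometheus(snap), end="")
```

Because both renderers read the same snapshot, the CLI and the metrics endpoint cannot disagree about the queue state.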

@iwanbk
Member

iwanbk commented Nov 25, 2024

> One thing I think is important though is that there's a way to get at the info both from CLI and via Prometheus.

Yes, I fully agree with this.
