Repair namespace #686

Open
matthiasdeblock opened this issue Mar 28, 2017 · 10 comments
Comments

@matthiasdeblock

matthiasdeblock commented Mar 28, 2017

Is it possible to implement an option to repair a single namespace? Or an option to ask the maintenance agent to start repairing all namespaces with a disk safety below < n >?

Via the healthcheck we get an error message about the disk safety when a namespace is at disk safety 0. We get a warning message once the disk safety is greater than 0 and smaller than the maximum disk safety.
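
For illustration, the healthcheck rule and the requested "below n" selection could be sketched as follows. This is only a sketch of the behaviour described in this comment, not the actual healthcheck or maintenance code; the function names `classify_disk_safety` and `namespaces_below` and their parameters are hypothetical.

```python
def classify_disk_safety(disk_safety: int, max_disk_safety: int) -> str:
    """Map a namespace's disk safety to a healthcheck level (hypothetical helper)."""
    if disk_safety <= 0:
        return "error"    # no safety margin left
    if disk_safety < max_disk_safety:
        return "warning"  # degraded, but still above 0
    return "ok"


def namespaces_below(safety_by_namespace: dict, n: int) -> list:
    """Return the namespaces whose disk safety is below n, sorted by name."""
    return sorted(ns for ns, safety in safety_by_namespace.items() if safety < n)
```

With such a helper, "repair all namespaces below disk safety n" would amount to running a repair over the result of `namespaces_below(...)`.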

@matthiasdeblock matthiasdeblock changed the title Repair single namespace Repair namespace Mar 28, 2017
@wimpers

wimpers commented Mar 28, 2017

@domsj doesn't the maintenance agent automatically repair the objects with the lowest disk safety first?

@domsj
Contributor

domsj commented Mar 28, 2017

@wimpers yes it does so, within the context of a namespace, for all namespaces in parallel.
It doesn't prioritize repair for what are globally (across all namespaces) the objects with lowest disk safety.
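
To make the difference concrete, here is a small sketch comparing the two orderings. It is an illustration only and not how alba implements maintenance: objects are modelled as hypothetical `(namespace_id, object_name, disk_safety)` tuples, and "all namespaces in parallel" is approximated by taking one object per namespace per round.

```python
from itertools import zip_longest


def per_namespace_order(objects):
    """Lowest safety first within each namespace, namespaces interleaved."""
    by_namespace = {}
    for obj in objects:
        by_namespace.setdefault(obj[0], []).append(obj)
    # One queue per namespace, each sorted by disk safety.
    queues = [sorted(queue, key=lambda o: o[2])
              for _, queue in sorted(by_namespace.items())]
    # Round-robin over the queues as a stand-in for parallel progress.
    return [obj for round_ in zip_longest(*queues)
            for obj in round_ if obj is not None]


def global_order(objects):
    """Lowest safety first across all namespaces."""
    return sorted(objects, key=lambda o: o[2])
```

In the per-namespace ordering an object with safety 2 in one namespace can be repaired before an object with safety 0 in another, which is exactly what a global ordering would avoid.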

@wimpers

wimpers commented Jun 8, 2017

Please implement a CLI command to repair a certain namespace. I don't think this should be part of the maintenance process.

@wimpers wimpers modified the milestones: I, H Jun 8, 2017
@domsj
Contributor

domsj commented Jun 8, 2017

Just to be sure: you think data safety is not a main concern for alba, and that this responsibility should be shifted towards the user (or the framework)?

@wimpers

wimpers commented Jun 8, 2017

Let me clarify,

- The maintenance agent should be adjusted so that it repairs the objects with the lowest safety globally first.
- The maintenance agent should not provide functionality to fix a certain namespace.
- A CLI command needs to be provided so that admins can fix a certain namespace.

@domsj does that clear things up ...

@toolslive
Member

Some things are so important you don't want to leave them to admins.
If a namespace is more important than another namespace, then its preset should reflect that.
If there are 2 objects in subawesome shape, it's their safety that tells you which one should get priority, and not its namespace_id.

@domsj
Contributor

domsj commented Jun 8, 2017

@wimpers ok, so you want 2 changes. so split up the ticket?

regarding the first change (repair globally weakest object first):
Can you settle for "a maintenance process repairs the weakest object that it is responsible for"?
Currently maintenance work is sharded over the available maintenance processes, based on the modulo of the namespace_id.
(I don't see yet how to efficiently implement repairing the globally weakest object in combination with sharding the maintenance work, hence the proposed alternative.)
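
As a rough illustration of the proposed alternative, the sketch below assigns namespaces to maintenance processes by `namespace_id` modulo the number of processes and lets each process pick the weakest object in its own shard. The exact assignment formula, the function names and the `(namespace_id, object_name, disk_safety)` tuples are assumptions made for the example, not alba's actual code.

```python
def responsible_process(namespace_id: int, num_processes: int) -> int:
    """Assumed sharding rule: namespace_id modulo the number of processes."""
    return namespace_id % num_processes


def weakest_for_process(objects, process_index: int, num_processes: int):
    """Return the lowest-safety object this process is responsible for, or None."""
    mine = [obj for obj in objects
            if responsible_process(obj[0], num_processes) == process_index]
    return min(mine, key=lambda o: o[2]) if mine else None
```

Each process only needs to look at its own namespaces, so no coordination between processes is required; the trade-off is that the globally weakest object is only repaired first by the process that owns its namespace.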

regarding the second change (repair a certain namespace using the cli):
what is the point of having this? especially if we would implement the first change?
(I think this one isn't a lot of work though)

@wimpers

wimpers commented Jun 9, 2017

> Can you settle for "a maintenance process repairs the weakest object that it is responsible for"?

Yes, let's do that in this ticket.

@wimpers

wimpers commented Jun 9, 2017

> regarding the second change (repair a certain namespace using the cli):
> what is the point of having this? especially if we would implement the first change?

I can imagine a case where Ops want to fix a certain namespace because it has higher importance (remember that we don't allow moving vDisks between vPools, and the importance may change). @jeroenmaelbrancke @matthiasdeblock @jtorreke please advise.

@wimpers wimpers modified the milestones: I, H Jun 29, 2017
@wimpers wimpers removed their assignment Oct 19, 2017
@wimpers wimpers modified the milestones: I, J Oct 19, 2017
@matthiasdeblock
Author

Possible case

Last week, because we were adding a huge number of ASDs to a certain backend, we had to stop the maintenance agent for that backend to avoid the potentially high load from the rebalance.
Because an ASD VM went down (broken SD card), a namespace lost a fragment and ended up in an 'only 15 of the 16 fragments available' state.
If we get into a state where we have to stop the maintenance agent for a certain backend, it could come in handy to repair just a single namespace by hand. Such an option would deploy a single maintenance process just to repair that namespace.

@wimpers wimpers removed this from the J milestone Mar 6, 2018
5 participants