fix: Log error without falling over for failed recovery #2049

Merged
msfstef merged 1 commit into main from msfstef/clean_unrecoverable_shape
Nov 26, 2024

msfstef
Contributor

@msfstef msfstef commented Nov 26, 2024

We have seen some issues related to lucaong/cubdb#71 come up during shape recovery, and a failure there currently causes problems for Electric as a whole.

My first thought was to delete the shape entirely if its recovery fails, but perhaps as an initial workaround we might just want to log the error and continue recovery without the problematic shape?
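
A minimal sketch of the "log and continue" idea, written in Python purely for illustration (it is not the code in this PR). The names `recover_all` and `recover_shape` are hypothetical, and the sketch assumes recovery of a single shape signals failure by raising an exception:

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger("shape_recovery")


def recover_all(
    shape_handles: Iterable[str],
    recover_shape: Callable[[str], None],
) -> tuple[list[str], list[str]]:
    """Try to recover every shape; log and skip the ones that fail.

    `recover_shape` is a hypothetical callback that raises on failure.
    Returns a pair of lists: (recovered handles, skipped handles).
    """
    recovered: list[str] = []
    skipped: list[str] = []
    for handle in shape_handles:
        try:
            recover_shape(handle)
        except Exception:
            # Log the error with its traceback, leave any on-disk data
            # untouched, and carry on with the remaining shapes.
            logger.exception("Failed to recover shape %s, skipping it", handle)
            skipped.append(handle)
        else:
            recovered.append(handle)
    return recovered, skipped
```

With this shape of loop, one unrecoverable shape no longer aborts recovery of the others; the skipped handles are simply not registered, so they are treated as nonexistent until the next recovery attempt.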

@KyleAMathews
Contributor

If we don't "recover" the shape — what happens if someone requests it?

@msfstef
Contributor Author

msfstef commented Nov 26, 2024

@KyleAMathews as far as Electric is concerned, the shape does not exist. It just leaves the data on disk rather than cleaning it up (and thus will try to recover it again on a subsequent deploy).

So effectively it abandons the shape but leaves its data on disk, either for manual cleanup or, if we figure out the issue and the data turns out not to be corrupted, so that it can be restored.

@KyleAMathews
Contributor

@msfstef cool, then leaving it seems fine. Presumably we'll either figure out a fix or just move off CubDB, making the problem moot.

@msfstef msfstef merged commit e815b91 into main Nov 26, 2024
26 checks passed
@msfstef msfstef deleted the msfstef/clean_unrecoverable_shape branch November 26, 2024 17:02
msfstef added a commit that referenced this pull request Nov 27, 2024
Followup to #2049

We are currently leaving folders behind when deleting shapes. They take
up a little bit of space and could eventually become an issue. This PR
takes care of removing everything.

Additionally, when a shape fails to recover because its consumer cannot
be started at all (see CubDB issue), the shape data is removed using a
new method called `unsafe_cleanup!`, which simply `rm -rf`s the shape
directory. It is not safe to use while the shape is active, as it can
cause various bugs, but it works well for cleaning up.
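
As a rough illustration of what an `rm -rf`-style cleanup amounts to, here is a small Python sketch (not the Elixir `unsafe_cleanup!` from the follow-up commit). The function name `unsafe_cleanup`, the `storage_root` parameter, and the one-directory-per-shape layout are all assumptions made for the example:

```python
import shutil
from pathlib import Path


def unsafe_cleanup(storage_root: Path, shape_handle: str) -> bool:
    """Delete a shape's whole directory tree, similar to `rm -rf`.

    Assumes (hypothetically) that each shape lives in
    `storage_root / shape_handle`. Returns True if a directory was
    removed and False if there was nothing to remove.
    """
    # Refuse handles that are not a single plain path component, so the
    # call can never delete the storage root or anything outside it.
    if shape_handle in ("", ".", "..") or Path(shape_handle).name != shape_handle:
        raise ValueError(f"invalid shape handle: {shape_handle!r}")

    shape_dir = storage_root / shape_handle
    if not shape_dir.is_dir():
        return False
    # No coordination with readers or writers: only safe when nothing
    # is using the shape, e.g. after its consumer failed to start.
    shutil.rmtree(shape_dir)
    return True
```

The "unsafe" part is that nothing here checks whether the shape is in use; deleting files under an active reader or writer is exactly the kind of thing that can cause the bugs mentioned above.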
3 participants