Node crashed due to DB corruption #808

Closed
adrianromanbc opened this issue Dec 13, 2019 · 5 comments

@adrianromanbc

Node address: 7ceb849f04694fc1a4c35bc099ef66b2bb7f753a56e23177643bda10eef02873
Saved DB : http://elrond.community/internal_key_0.tar.gz

INFO [2019-12-13 08:04:15] starting node version=v1.0.61-0-ge6dbedbc/go1.13.3/linux-amd64 pid=15203
INFO [2019-12-13 08:04:15] start time formatted=Wed Dec 11 17:00:00 CET 2019 seconds=1576080000
INFO [2019-12-13 08:04:15] shard info started in shard=3
WARN [2019-12-13 08:04:16] No views for current node
panic: leveldb: internal key "", len=0: invalid length [recovered]
panic: leveldb: internal key "", len=0: invalid length [recovered]
panic: leveldb: internal key "", len=0: invalid length

@iulianpascalau
Contributor

We have managed to get a similar corrupted DB. This needs investigation.

@chainum

chainum commented Dec 16, 2019

@iulianpascalau - do you know when your DBs got corrupted?

Not entirely sure if it's related to my attack, but sharing anyways.

I ran my spam tool the following days/time:

  • 11th of December - around 15:30 UTC - 15:55 UTC (initial test to test the code - seems quite a few nodes started to struggle because of it)
  • 12th of December - around 12:30 UTC - 13:00 UTC (testing the code again after massive refactoring - plenty of reports on Elrond Validators on Telegram about high node memory usage). Started it again around 15:00 UTC for around 10 minutes since people asked about it in the Telegram chat - more reports of high memory usage and node restarts.
  • 13th of December - started the tool again during my attack slot time around 09:25 UTC and ran it until 10:00 UTC (stopped by request of Lucian). Started the tool again around 10:50 UTC and ran it until 11:35 UTC when Lucian asked me to stop it again.

The corrupted DB issue could potentially be related to the spam I did in #799 since nodes were automatically killed by the OS/OOM-reaped. Such unclean shutdowns can lead to file corruption.

It might be worthwhile to compare the timestamps of when your internal node DBs started to get corrupted with the times I ran my tool (plus roughly 1 hour after I stopped it, to account for delayed tx propagation).

@adrianromanbc
Author

The DB got corrupted on Friday, 13th of December, around 06:30 UTC.
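
The timestamp comparison suggested earlier in the thread can be sketched as a small interval check. This is only an illustration: the windows are the approximate times reported above, the 10-minute length of the second run on the 12th comes from the "around 10 minutes" remark, and the helper names (`utc`, `overlapping_windows`) are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def utc(day, hour, minute):
    """Build a UTC timestamp in December 2019."""
    return datetime(2019, 12, day, hour, minute, tzinfo=timezone.utc)


# Approximate spam windows as reported in this thread (all times UTC).
SPAM_WINDOWS = [
    (utc(11, 15, 30), utc(11, 15, 55)),
    (utc(12, 12, 30), utc(12, 13, 0)),
    (utc(12, 15, 0), utc(12, 15, 10)),
    (utc(13, 9, 25), utc(13, 10, 0)),
    (utc(13, 10, 50), utc(13, 11, 35)),
]

# Allowance for delayed transaction propagation after each run (about 1 hour).
PROPAGATION_SLACK = timedelta(hours=1)


def overlapping_windows(event, windows=SPAM_WINDOWS, slack=PROPAGATION_SLACK):
    """Return the windows whose [start, end + slack] interval contains event."""
    return [(start, end) for start, end in windows if start <= event <= end + slack]


if __name__ == "__main__":
    # Corruption time reported in this comment.
    corruption_time = utc(13, 6, 30)
    for start, end in overlapping_windows(corruption_time):
        print("overlaps run", start.isoformat(), "to", end.isoformat())
```

Since all the reported times are approximate, a larger `slack` may be a more realistic choice when checking other nodes.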

@iulianpascalau
Contributor

OK, thanks for the info. We suspected a forced shutdown to be the trigger of the DB corruption. The DB you saved will be used to test whether we can always check for an error when opening a DB and, if possible, make LevelDB rebuild its indexes and recover as much data as possible.
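
The open-then-recover idea described here could look roughly like the sketch below. It is not the node's actual code: `CorruptionError`, `open_db`, and `repair_db` are hypothetical stand-ins for whatever the storage layer provides. If the node uses goleveldb (an assumption based on the panic message), the counterparts would presumably be `leveldb.OpenFile` and `leveldb.RecoverFile`. The sketch only covers corruption reported when opening; a panic raised later at runtime, as in the log above, would need separate handling.

```python
class CorruptionError(Exception):
    """Hypothetical error type signalling a corrupted database."""


def open_with_recovery(path, open_db, repair_db):
    """Open the DB at path; on corruption, attempt one repair and reopen.

    open_db(path) returns a handle or raises CorruptionError.
    repair_db(path) tries to rebuild the DB files in place.
    Both callables are hypothetical and supplied by the caller.

    Returns (handle, repaired), where repaired says whether a repair ran.
    """
    try:
        return open_db(path), False
    except CorruptionError:
        pass  # fall through to the repair attempt
    repair_db(path)
    # If the DB is still corrupted, the error from this second open propagates.
    return open_db(path), True


if __name__ == "__main__":
    # Fake storage that starts out corrupted and is fixed by one repair.
    state = {"corrupted": True}

    def fake_open(path):
        if state["corrupted"]:
            raise CorruptionError(path)
        return "handle:" + path

    def fake_repair(path):
        state["corrupted"] = False

    print(open_with_recovery("db/example", fake_open, fake_repair))
```

Limiting the repair to a single attempt keeps a DB that cannot be recovered from looping forever; the caller still sees the original kind of error and can decide to resync from scratch.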

@iulianpascalau
Contributor

Fixed; see PR #1425 (the node will detect a corrupted DB and try to fix it at runtime).
