Node crashed due to DB corruption #808

adrianromanbc · 2019-12-13T08:15:06Z

Node address: 7ceb849f04694fc1a4c35bc099ef66b2bb7f753a56e23177643bda10eef02873
Saved DB : http://elrond.community/internal_key_0.tar.gz

INFO [2019-12-13 08:04:15] starting node version=v1.0.61-0-ge6dbedbc/go1.13.3/linux-amd64 pid=15203
INFO [2019-12-13 08:04:15] start time formatted=Wed Dec 11 17:00:00 CET 2019 seconds=1576080000
INFO [2019-12-13 08:04:15] shard info started in shard=3
WARN [2019-12-13 08:04:16] No views for current node
panic: leveldb: internal key "", len=0: invalid length [recovered]
panic: leveldb: internal key "", len=0: invalid length [recovered]
panic: leveldb: internal key "", len=0: invalid length

iulianpascalau · 2019-12-16T06:58:25Z

We have managed to get a similar corrupted DB. This needs investigation.

chainum · 2019-12-16T08:39:42Z

@iulianpascalau - do you know when your DBs got corrupted?

Not entirely sure if it's related to my attack, but sharing anyways.

I ran my spam tool the following days/time:

11th of December - around 15:30 UTC - 15:55 UTC (initial test to test the code - seems quite a few nodes started to struggle because of it)
12th of December - around 12:30 UTC - 13:00 UTC (testing the code again after massive refactoring - plenty of reports on Elrond Validators on Telegram about high node memory usage). Started it again around 15:00 UTC for around 10 minutes since people asked about it in the Telegram chat - more reports of high memory usage and node restarts.
13th of December - started the tool again during my attack slot time around 09:25 UTC and ran it until 10:00 UTC (stopped by request of Lucian). Started the tool again around 10:50 UTC and ran it until 11:35 UTC when Lucian asked me to stop it again.

The corrupted DB issue could potentially be related to the spam I did in #799 since nodes were automatically killed by the OS/OOM-reaped. Such unclean shutdowns can lead to file corruption.

Might be worthwhile comparing the timestamps of when your internal node DBs started to get corrupted with the times I ran my tool (+ ~1 hour after I stopped it, because of delayed tx propagation).

adrianromanbc · 2019-12-16T10:12:19Z

The DB got corrupted on Friday, 13th of December, around 06:30 UTC.

iulianpascalau · 2019-12-23T08:07:17Z

OK, thanks for the info. We suspected a forced shutdown to be the trigger of the DB corruption. The DB you saved will be used to test whether we can always check for an error when we try to open a DB and, if possible, make LevelDB rebuild its indexes and recover as much as possible.

iulianpascalau · 2020-05-15T10:53:36Z

Fixed, see PR #1425 (the node will detect a corrupted DB and try to fix it at runtime).

iulianpascalau self-assigned this Dec 16, 2019

iulianpascalau added the type:bug, type:needs-investigation and P1 labels Dec 16, 2019

iulianpascalau closed this as completed May 15, 2020
