Mempool usage becomes 100% over time #73

Open
HKalbasi opened this issue Nov 19, 2024 · 7 comments
Labels
documentation Improvements or additions to documentation


@HKalbasi
Contributor

I'm replaying a ~5 GB pcap file on a NIC, using the basic example modified to use the online configuration. In that setup, mempool usage goes up by roughly 14% after each iteration of the replay, so after 7 or 8 replays it reaches 100% and Retina drops everything.

I'm using a mempool with capacity 655350, so it should take about 1.3 GB of RAM. I can increase the mempool capacity, but that only delays the failure.

I can easily reproduce the problem, so I can add logs or monitoring configs to Retina, or analyze the pcap file and look for things such as unfinished connections.

@thegwan
Contributor

thegwan commented Nov 19, 2024

Two questions:

  1. Is your pcap generated by capturing a live traffic stream, or built up from individual well-formed connections? If it is the former, I would expect many of the connections to have been cut off before finishing.
  2. When replaying, are you modifying the 4-tuple of each connection on each replay or is it simply a loop of the same pcap?

@HKalbasi
Contributor Author

Yes, the pcap file is a capture and contains incomplete connections, and the replay is a naive loop of the same pcap with no modifications. Do the cut-off connections remain in RAM indefinitely? And what happens when a single connection carries a huge amount of data, like a big download? Does it stay in RAM until it finishes?

@thegwan
Contributor

thegwan commented Nov 20, 2024

> Do the cut-off connections remain in RAM indefinitely?

In your config file, there should be a section that specifies timeouts for incomplete connections. I believe the current default TCP inactivity timeout is 300 seconds.
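
A minimal sketch of what tightening that might look like in the TOML config (the section and key names here are assumptions based on this thread, not copied from the shipped defaults):

```toml
# Hypothetical excerpt; verify the exact section/key names and units against
# the shipped config.toml.
[conntrack]
# Evict idle/incomplete TCP connections after 60 seconds instead of the
# ~300-second default mentioned above, so their state is released sooner.
tcp_inactivity_timeout = 60
```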

> And what happens when a single connection carries a huge amount of data, like a big download? Does it stay in RAM until it finishes?

This depends on what datatype you are subscribed to. If the subscription requires that you return the actual data, then yes, that data will sit in memory until either the connection terminates naturally or the callback is invoked. If you are only tracking, say, connection 4-tuples, then the application-layer data will not stay in memory.
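
For illustration, a metadata-only subscription in the style of the v0.1 basic example might look like the sketch below; the type and API details are assumptions, not something confirmed in this thread.

```rust
// Hypothetical sketch, modeled on the v0.1-style Retina basic example.
use retina_core::config::default_config;
use retina_core::subscription::Connection; // connection records: 4-tuple, timing, counters
use retina_core::Runtime;
use retina_filtergen::filter;

#[filter("tcp")]
fn main() {
    let config = default_config();
    // Only lightweight per-connection state reaches this callback;
    // application-layer payloads are never buffered for this datatype.
    let callback = |conn: Connection| {
        drop(conn);
    };
    let mut runtime = Runtime::new(config, filter, callback).unwrap();
    runtime.run();
}
```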

Can you say more about how you have modified the basic example? Are you storing packets, connection records, or something else?

@HKalbasi
Contributor Author

HKalbasi commented Nov 20, 2024

I just changed the config part. That is, I removed the line:

let config = default_config();

and replaced it with something like

let config = load_config("config.toml");
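
In context, the change amounts to something like the following sketch; apart from the two config lines above, the surrounding code is assumed from the stock v0.1-style basic example, not quoted from the actual modification.

```rust
// Sketch of the modified basic example (v0.1-style API; details assumed).
use retina_core::config::load_config;
use retina_core::subscription::TlsHandshake;
use retina_core::Runtime;
use retina_filtergen::filter;

#[filter("tls")]
fn main() {
    // Was: let config = default_config();
    // Now the online (NIC) settings, mempool capacity, and conntrack
    // parameters come from a TOML file.
    let config = load_config("config.toml");
    let callback = |tls: TlsHandshake| {
        println!("{:?}", tls);
    };
    let mut runtime = Runtime::new(config, filter, callback).unwrap();
    runtime.run();
}
```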

@thearossman added the documentation (Improvements or additions to documentation) label on Nov 21, 2024
@thearossman
Collaborator

One thing to add -- there would also be spikes in mempool utilization if there are a number of out-of-order packets that need to be stored for later TCP reassembly. The lightweight stream reassembly (section 5.2) doesn't release mbufs back to the DPDK mempool until gaps are filled or the connection times out.

In addition to toggling the TCP inactivity timeout, could you try reducing the max_out_of_order parameter in the config file to see if that helps? It's set at 500 by default.
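
For example (the section name is an assumption; max_out_of_order is the parameter named above):

```toml
# Hypothetical excerpt; only max_out_of_order is named in this thread,
# the surrounding section name is assumed.
[conntrack]
# Buffer at most 50 out-of-order segments per connection (default: 500)
# before giving up on reassembly for that connection.
max_out_of_order = 50
```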

Flagging this as a documentation need for now. (It may require a fix in the future.)

@HKalbasi
Contributor Author

Reducing the max_out_of_order parameter to 50 cut the memory usage of the first iteration roughly in half, and usage becomes almost constant after 4 or 5 iterations. So now I have a sense of the problem; I hope it doesn't happen in a real-world scenario, and if it does, I can reduce mempool usage by lowering the timeout and the max_out_of_order count.

So I can control the memory usage by tuning the config. Is it possible to see how many connections are dropped/failed due to these settings?

@thearossman
Collaborator

thearossman commented Nov 21, 2024

You could enable warn-level logs to get a sense of the number of overflows that cause a connection to be dropped:

log::warn!("Out-of-order buffer overflow");
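
If the binary initializes a `log`-compatible logger (env_logger is one common choice; whether the stock example already wires this up is an assumption here), those warnings become visible once the filter is set to warn or lower:

```rust
// Minimal sketch: make log::warn! output (e.g., "Out-of-order buffer overflow")
// visible without needing RUST_LOG set.
fn main() {
    env_logger::Builder::new()
        .filter_level(log::LevelFilter::Warn)
        .init();

    // ... build the config and Runtime as in the basic example ...
}
```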
