Quorum queue memory usage, and sensible defaults for node resource requests #1554

tkoft · 2024-02-06T18:58:45Z

tkoft
Feb 6, 2024

Describe the bug

We occasionally find in our GKE rabbitmq cluster (3 nodes) that memory alarms will get triggered, and causing issues with publishes. Digging through docs, I've learned that:

Quorum queues can use up to 512mb memory for write ahead logs
We don't specify resource requests for our cluster, so by default the rabbitmq cluster operator just requests 2Gi memory for each node
RabbitMQ sets a high-watermark threshold of 40% of available memory (655 MiB), and once that's surpassed an alarm is triggered that stops new publishes

It seems like for a cluster using these defaults, a memory alarm will certainly be triggered at some point, even with just one quorum queue? There are reports of some folks seeing this issue unless they lower the default WAL size limit.

I was also unclear on what happens when these alarms are set. The docs say that publishes are blocked, but is that to the offending node only or to the whole cluster? I did find in AWS MQ docs this:

In cluster deployments, queues might experience paused synchronization of messages between replicas on different nodes. Paused queue syncs prevent consumption of messages from queues and must be addressed separately while resolving the memory alarm.

So a couple questions:

Do memory alarms indeed block publishes to all nodes? Or just the one?
If so, why is this the case? Is there any way to mitigate the risk of the entire cluster being paused from one node triggering an alarm?
If not, can a memory alarm on one node cause quorum queue replicas on other nodes to stop accepting publishes too like what's describe in AWS docs?

Regardless, it seems to me that more sensible defaults could be configured here.

To Reproduce

Steps to reproduce the behavior:

Deploy a cluster using RabbitMQ Cluster Kubernetes Operator
Publish and consume a quorum queue on the instance
Observe memory usage increases on a node until memory alarm is set
From memory use reporting on the node, observe that quorum queue tables are what's growing and causing the alarm
Publishes start getting blocked, even though two other nodes are under the high-watermark

Expected behavior
By default, I expect quorum queue WAL size threshold and cluster operator's memory requests to work with each other so memory alarm's aren't triggered by normal usage of quorum queues.

Screenshots

Version and environment information

RabbitMQ: Cluster operator default (3.10.2)
RabbitMQ Cluster Operator: 2.1.0
Kubernetes: 1.26.6-gke.1700
Cloud provider or hardware configuration: GKE autopilot

lukebakken · 2024-02-06T21:56:24Z

lukebakken
Feb 6, 2024
Maintainer

This discussion may be informative - #1537 (reply in thread)

cc @mkuratczyk

0 replies

mkuratczyk · 2024-02-08T08:40:20Z

mkuratczyk
Feb 8, 2024
Maintainer

Indeed, I can reproduce this with the default configuration and just 1 publisher, 1 QQ and 1 consumer:

So I agree that we should change something here. The options are:

Increase the amount of memory assigned to the pods by default - simple, but I recall some people already complaining that the default is relatively high
Change the vm_memory_high_watermark value in the default configuration generated by the Operator;
Change the default vm_memory_high_watermark in RabbitMQ

Personally, I think it's time we changed vm_memory_high_watermark in RabbitMQ. That 40% might have been justified when it was introduced but I really don't think it is these days, with all the advancements in Erlang, the fact that in 3.12+ we don't keep messages in memory in either QQs nor classic queues and so on. I guess RabbitMQ 4.0 could be a good moment to set it to a higher value. However, this is something that will need quite a bit of discussion and 4.0 is months away.

As for the alarms - they block all publishers, since there is no coupling between publishers and queues. With quorum queues, the queue is usually on all nodes anyway, but even with a classic queue - if the queue is on node-0 and that node runs out of memory, we have to block all publishers, because a publisher connected to node-1 may publish a message to a queue on node-0.

I think the whole alarm mechanism is something that will need to be reconsidered to be honest. Again, the assumptions were correct years ago but things change. When classic queues were the only game in town, things were relatively simple - block publishers, allow consumers to consume some of the messages, that releases the memory (or disk) and we can continue. However, since then:

classic queues no longer keep messages in memory, if a node runs out of memory, quite likely consuming messages won't help much, if at all
quorum queues use a bit of memory per-message (even though the payload of the message is not stored in memory) but they also write consumption operations to the WAL and therefore consuming messages also requires some memory (and disk).
Streams are very different again.

So I think the ultimate solution is a change to the default alarm threshold in RabbitMQ. We could introduce some other mitigations sooner in the Operator, but I'm not 100% sold on whether that's worth it. For example - you mentioned you have a 3-node cluster, that's already a difference compared to the default single-node deployment. The same document that says 3-nodes should be used in prod, also talks about memory considerations: https://rabbitmq.com/production-checklist.html#resource-limits-ram

I'm open to counter-arguments but I'm not sure about the value of introducing changes that will be relevant for just a few months and only for a small subset of users. And if we feel like we need a solution/workaround soon, I'd go with a slightly higher pod memory limit that we'll revert back to 2GB later on. Having the default configuration subtly different between Operator deployments and other deployments is confusing for users (who find generic docs) and us, when people report issues.

3 replies

Zerpet Feb 8, 2024
Maintainer

I agree to what's been said here. I wanted to share this from the documentation, just to add some context regarding why the default is 40%.

Erlang's garbage collector can, in the worst case, cause double the amount of memory to be used (by default, 80% of RAM).

I'm not sure if that's applicable to modern Erlang versions. As Michal said, a lot has changed since the memory guide was written.

@tkoft a quick workaround for your issue would be to increase vm_memory_high_watermark.relative to 0.66. That's a magic number I remember from my experience in RabbitMQ support some years ago. The number makes sense, because you assume that Erlang garbage collector will use, at most, 50% of the allocated memory (that is, 33% of the total Pod memory limit).

There's one hidden safeguard built-in by the Operator: we configure RabbitMQ to assume it has 80% of the Pod memory limit. This was to prevent Erlang VM from allocating memory too close, or even over, the Pod memory limit, because that would cause an OOM killed Pod.

tkoft Feb 8, 2024
Author

Is there an option 4 to reduce the default rabbitmq raft.wal_max_size_bytes also? Or is 512mb generally a good number for quorum queues?

mkuratczyk Feb 8, 2024
Maintainer

It is something that can be discussed as well. You can totally do this on your system right away. As for the default change - I guess 0.5GB will remain "the default default" but we can consider lowering it when the available memory is "low". So the default would be something like "0.5GB, unless the watermark is less than 1GB, in which case the default raft.wal_max_size_bytes is 128MB" (just an example, the actual values could be different). I guess we don't want to have too many options here but the current default for most and a lower value for more memory-constrained systems seems reasonable to me.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Quorum queue memory usage, and sensible defaults for node resource requests #1554

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 2 comments 3 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

Select a reply

Quorum queue memory usage, and sensible defaults for node resource requests #1554

tkoft Feb 6, 2024

Describe the bug

To Reproduce

Screenshots

Version and environment information

Replies: 2 comments · 3 replies

lukebakken Feb 6, 2024 Maintainer

mkuratczyk Feb 8, 2024 Maintainer

Zerpet Feb 8, 2024 Maintainer

tkoft Feb 8, 2024 Author

mkuratczyk Feb 8, 2024 Maintainer

tkoft
Feb 6, 2024

Replies: 2 comments 3 replies

lukebakken
Feb 6, 2024
Maintainer

mkuratczyk
Feb 8, 2024
Maintainer

Zerpet Feb 8, 2024
Maintainer

tkoft Feb 8, 2024
Author

mkuratczyk Feb 8, 2024
Maintainer