Records are missing when second rebalance happens. #1051
Comments
One more observation:
I think that it may be related:
I think I found a potential cause. I've noticed that the issues above appear when some of the partitions were paused before the rebalance. In Kafka 3.2.0 there is a dedicated INFO message that logs them:
Those partitions were then reassigned, but if we take a look at the example above:
Fetching for those partitions after the rebalance hasn't been started from
What I think happened is that the missing records were dropped too eagerly during revocation, as part of the callback defined in KafkaConsumerActor.
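The following is a hypothetical toy model of the failure mode suggested above. It is not FS2-Kafka's KafkaConsumerActor, and every name in it (Record, RevokeModel, Buffer, onRevoke, lost) is illustrative. The model makes one assumption: the Kafka client's fetch position for a partition the consumer keeps has already advanced past the records sitting in the local buffer. This can happen, for example, when a partition is not actually revoked. If the revoke callback discards those buffered records anyway, nothing re-fetches them. A partition that is truly revoked and later reassigned would instead start again from its committed offset.

```scala
// Hypothetical toy model of the suspected failure mode.
// NOT FS2-Kafka's KafkaConsumerActor; all names here are illustrative.
final case class Record(partition: Int, offset: Long)

object RevokeModel {
  // Records fetched by the client but not yet emitted downstream, per partition.
  // Assumption: the client's fetch position for each partition is already past
  // these offsets, so they are not fetched again unless the position is reset
  // (e.g. to the committed offset on a fresh assignment).
  type Buffer = Map[Int, Vector[Record]]

  // Revoke callback that discards buffered records for every partition in `dropFor`.
  def onRevoke(buffer: Buffer, dropFor: Set[Int]): Buffer =
    buffer -- dropFor

  // Buffered records that disappeared for partitions the consumer keeps
  // (their positions are not reset), i.e. records that are never delivered.
  def lost(before: Buffer, after: Buffer, retained: Set[Int]): Vector[Record] =
    retained.toVector.sorted.flatMap { p =>
      val had  = before.getOrElse(p, Vector.empty[Record])
      val kept = after.getOrElse(p, Vector.empty[Record])
      had.filterNot(kept.contains)
    }
}
```

Consider the case where partition 0 is really revoked and partition 1 is retained. Calling onRevoke with Set(0, 1) loses partition 1's buffered records. Dropping only Set(0) loses nothing.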
@MrKustra94 Nice research, thank you! Also, I wonder why we need this
I think it was introduced to improve overall fairness and throughput for slow consumers.
I know this issue is quite old now, but I think we're facing the same issue. We also see lots of FS2 Kafka logs similar to the OP's about pausing/revoking, etc.
Hey,
during deployments we have noticed a strange issue with record consumption.
Versions
FS2-Kafka version = 3.0.0-M7 (also observed with M4)
Kafka client version = 3.2.0
Background
Let's assume that we are working with a topic called "topic", which has 100 partitions. Our consumer is running as a single pod (pod-1) and is consuming all 100 partitions. During a Kubernetes rolling deployment another instance is created; let's call it pod-2. The first rebalance is triggered, leaving pod-1 and pod-2 consuming 50 partitions each. Let's assume that:
- pod-1 is consuming partitions 0, 1, 2, 3, ..., 49
- pod-2 is consuming partitions 50, 51, 52, 53, ..., 99
After a few seconds, pod-1 is shut down and pod-2 is the only working pod. A second rebalance is triggered and pod-2 now consumes all partitions. What we have noticed is that for partitions that were previously assigned, some messages are not consumed right after the second reassignment.
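One way to spot the symptom above in a test consumer is to log each consumed (partition, offset) pair and flag forward jumps. The sketch below is hypothetical: GapDetector and Gap are illustrative names, not part of FS2-Kafka. It assumes that offsets within a partition are contiguous. That assumption does not hold with transactional producers (control records occupy offsets) or compacted topics, where gaps can be legitimate. Re-deliveries, where an offset is at or below the last one seen, are treated as normal at-least-once behaviour and are not reported.

```scala
// Hypothetical helper for spotting skipped offsets per partition.
// Assumes offsets within a partition are contiguous (no transactions,
// no log compaction); otherwise gaps can be legitimate.
object GapDetector {
  final case class Gap(partition: Int, expected: Long, seen: Long)

  // Walks consumed (partition, offset) pairs in delivery order and reports
  // every forward jump past the next expected offset on the same partition.
  def gaps(consumed: Seq[(Int, Long)]): Vector[Gap] = {
    val (_, found) = consumed.foldLeft((Map.empty[Int, Long], Vector.empty[Gap])) {
      case ((last, acc), (p, off)) =>
        val next = last.get(p) match {
          case Some(prev) if off > prev + 1 => acc :+ Gap(p, prev + 1, off)
          case _                            => acc // first record or re-delivery
        }
        (last.updated(p, off), next)
    }
    found
  }
}
```

If partition 0 delivers offsets 0, 1 and then 4, the detector reports a single Gap(0, 2, 4), meaning offsets 2 and 3 were never seen.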
More examples below.
Code
This issue is really hard to reproduce. It happens very rarely.
Example output of the issue