If Processor stopped, should not put records any more #10

blling · 2018-08-02T01:27:57Z

If the Processor is stopped, BlockingQueueConsumer should not put record into Processor at all.
https://github.com/datanerds-io/verteiler/blob/develop/verteiler/src/main/java/io/datanerds/verteiler/BlockingQueueConsumer.java#L113

The text was updated successfully, but these errors were encountered:

larsp · 2018-08-02T14:34:24Z

Hello @blling, thanks for raising the issue!

There is certainly room for improvement in the synchronization during reassignments. 😄 Your issue made me look into it in more detail.
Could you elaborate on the side effects and problem you are seeing? Any duplicate data processing? Wrong offset commits? How large is the queue/buffer for each processor?

A Processor is usually stopped when a partition is revoked. The Partitions will be revoked once all data via poll has been consumed. (See https://kafka.apache.org/20/javadoc/org/apache/kafka/clients/consumer/ConsumerRebalanceListener.html#onPartitionsRevoked-java.util.Collection-) After assignment of partitions corresponding processors are created, so all received messages (via poll) should find their respective processor.

In case a Processor stops as a result of an exception during processing, the processor queue will fill up and block, no new messages are processed and therefore no offsets for that partition should be altered.

Thanks

Lars

blling · 2018-08-03T08:33:32Z

@larsp I have made some failure test (#11 ) for discussion.

I think:

a. handleMessageWithExceptionTest fail is because the processor queue fill up and finally blocked ConsumerRecordRelay thread at here https://github.com/datanerds-io/verteiler/blob/develop/verteiler/src/main/java/io/datanerds/verteiler/ConsumerRecordRelay.java#L38, so ConsumerRecordRelay thread could not reach here https://github.com/datanerds-io/verteiler/blob/develop/verteiler/src/main/java/io/datanerds/verteiler/ConsumerRecordRelay.java#L48, then kafka consumer of ConsumerRecordRelay will not be closed and kafka will not do rebanlace. Finally the message is dead in the blocked queue.

b. stopOneOfTwoConsumerTest is something same as reassignmentTest, but i do not know why it failed. Maybe it is something about Wrong offset commits?

larsp · 2018-08-03T12:52:18Z

Thanks for providing the tests. I will look into it 👍

larsp · 2018-08-03T15:53:16Z

Thanks again for providing the tests @blling

a.

Usually I would expect the processing code to be stable, so that it is not throwing any unchecked exceptions. If such exception happens it results in an error log which ideally should trigger an alert in your application monitoring setup.

I would argue that this exception is probably something lower level from the file system or the database, thus one could assume that other consumers/ processors are running into the same situation and eventually run into the same problem.

Let me think about it for a while and see if stopping the consumer on processing errors is something which should be added.

b.

In the reassignmentTest you can see that in the assertion of the counter value is >= the number of messages. In your test you expect it to be equal. In order to make sure that the count is equal, the producer and consumer need to be configured to support exactly once delivery semantics (https://kafka.apache.org/documentation/#upgrade_11_exactly_once_semantics)

The test will pass once "exactly once" has been enabled:
Producer:
enable.idempotence=true

Consumer:
isolation.level=read_committed

Since (in my experience) exactly once comes with a performance penalty I would try to make sure my message stream and processing is idempotent: Consuming a message multiple times will lead to the same result as consuming it once.

Hope that helped

Thanks

Lars

blling · 2018-08-04T05:10:50Z

@larsp thanks for the details, it useful for me.
I pushed a fix for a in #11 , you can review if necessary :)

blling mentioned this issue Aug 3, 2018

Some failure test #11

Open

blling closed this as completed Aug 13, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

If Processor stopped, should not put records any more #10

If Processor stopped, should not put records any more #10

blling commented Aug 2, 2018 •

edited

Loading

larsp commented Aug 2, 2018 •

edited

Loading

blling commented Aug 3, 2018 •

edited

Loading

larsp commented Aug 3, 2018

larsp commented Aug 3, 2018 •

edited

Loading

blling commented Aug 4, 2018

If Processor stopped, should not put records any more #10

If Processor stopped, should not put records any more #10

Comments

blling commented Aug 2, 2018 • edited Loading

larsp commented Aug 2, 2018 • edited Loading

blling commented Aug 3, 2018 • edited Loading

larsp commented Aug 3, 2018

larsp commented Aug 3, 2018 • edited Loading

blling commented Aug 4, 2018

blling commented Aug 2, 2018 •

edited

Loading

larsp commented Aug 2, 2018 •

edited

Loading

blling commented Aug 3, 2018 •

edited

Loading

larsp commented Aug 3, 2018 •

edited

Loading