Dead letter table #233
base: main
Conversation
- implements dead letter table handling as a config option
I have more comments, but I'm struggling to get past some of the limitations of the current approach, like the fixed schema. I have a different take on the problem that I would strongly like us to consider (see the sketch after this list):
- Exceptions happening within SinkTask.put would be captured by a user-configured WriteExceptionHandler and handled there however the user wants (write to a dead-letter table, write to Kafka, log it, whatever the user wants).
- For Converter/SMT exceptions (i.e. things before SinkTask.put), users should configure the connector with iceberg.tables.dynamic-enabled and an iceberg.tables.route-field, and write an exception-handling SMT that points failed records at the dead-letter table.
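Roughly, the kind of pluggable handler suggested here could look like the following sketch. The method names and signatures are assumptions for illustration, not the PR's final WriteExceptionHandler API.

```java
import java.util.Map;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTaskContext;

// Sketch of a pluggable handler for exceptions raised inside SinkTask.put.
// Names and signatures are illustrative assumptions, not the PR's actual interface.
public interface WriteExceptionHandlerSketch {

  // Called once when the task starts, before any records are handled.
  void initialize(SinkTaskContext context, Map<String, String> props);

  // Given the failing record and the exception, either return a (possibly rewritten)
  // record to write elsewhere (e.g. a dead-letter table), return null to drop it,
  // or rethrow to fail the task.
  SinkRecord handle(SinkRecord record, Exception exception) throws Exception;
}
```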
- substantially reworks the PR
- error transform and connector are connected via FailedRecordFactory
- users can plug in their own schema shape for the failed records
- users can dispatch to whatever dead letter table they want
- bunch of classes moved around
- pluggable WriteExceptionHandler introduced to catch failures
- lots of code updated with custom WriteExceptions meant to be caught by the WriteExceptionHandler
- works with or without the ErrorTransform in play
Substantial re-work. Still need to add tests for
This has largely been addressed in the latest update; I appreciate the feedback and discussions we've had.
Need to add a config for the third mode. Look at
@@ -322,6 +325,114 @@ See above for creating the table
}
```

## Dead Letter Table
Example config with Dead Letter will be very useful
Docs can always be improved. I'll try to take another stab at this. This is a big feature with a somewhat clunky config API due to config visibility rules in Kafka Connect, so more docs/examples certainly help.
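For reference, here is a rough sketch of what such a config could look like, written as Java connector properties for illustration. The SMT converter key and the route-field value are placeholders (assumptions), not the final option names.

```java
import java.util.HashMap;
import java.util.Map;

public class DeadLetterConfigExample {
  public static void main(String[] args) {
    // Illustrative connector properties; dead-letter-specific keys are hypothetical placeholders.
    Map<String, String> props = new HashMap<>();
    props.put("connector.class", "io.tabular.iceberg.connect.IcebergSinkConnector");
    props.put("topics", "events");
    // Raw bytes pass through Kafka Connect; the real converters run inside the ErrorTransform SMT.
    props.put("key.converter", "org.apache.kafka.connect.converters.ByteArrayConverter");
    props.put("value.converter", "org.apache.kafka.connect.converters.ByteArrayConverter");
    props.put("transforms", "error");
    props.put("transforms.error.type", "io.tabular.iceberg.connect.transforms.ErrorTransform");
    // Hypothetical key: how the SMT might be told which converter to apply to the value bytes.
    props.put("transforms.error.value.converter", "io.confluent.connect.avro.AvroConverter");
    // Route failed records dynamically, e.g. to a dedicated dead-letter table.
    props.put("iceberg.tables.dynamic-enabled", "true");
    props.put("iceberg.tables.route-field", "route"); // hypothetical route field name
    System.out.println(props);
  }
}
```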
Super useful functionality, thanks!
public static final String KEY_HEADER = "t_original_key";
public static final String VALUE_HEADER = "t_original_value";
I believe you're de-risking the chance of a collision with an existing header by prefixing with t_. More out of curiosity, what does the t_ stand for? And I'm wondering if we can do a little better.
Also, we could de-risk collisions more by adding a single header t_original_record, which is a Struct with this kind of structure (pseudo-code), instead of adding 3 separate headers:
Struct {
OPTIONAL_BYTES_SCHEMA key,
OPTIONAL_BYTES_SCHEMA value,
OPTIONAL_ARRAY_HEADER_SCHEMA headers,
}
nit: I would also name the header something specific to iceberg-kafka-connect, IDK, something along the lines of kafka.connect.iceberg.error.transform.original.record or something (obviously this is too long but you get the idea).
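For illustration, a single struct-valued header could be built with Kafka Connect's Struct API along these lines. The header name and field layout here are assumptions mirroring the pseudo-code above, not what the PR ships.

```java
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.sink.SinkRecord;

// Illustrative only: one struct-valued header holding the original key/value/header bytes.
public class OriginalRecordHeaderSketch {

  static final Schema HEADER_ELEMENT_SCHEMA =
      SchemaBuilder.struct()
          .field("key", Schema.STRING_SCHEMA)
          .field("value", Schema.OPTIONAL_BYTES_SCHEMA)
          .optional()
          .build();

  static final Schema ORIGINAL_RECORD_SCHEMA =
      SchemaBuilder.struct()
          .field("key", Schema.OPTIONAL_BYTES_SCHEMA)
          .field("value", Schema.OPTIONAL_BYTES_SCHEMA)
          .field("headers", SchemaBuilder.array(HEADER_ELEMENT_SCHEMA).optional().build())
          .optional()
          .build();

  // Attach the combined header to a record under a single, connector-specific name.
  static SinkRecord withOriginalRecordHeader(SinkRecord record, byte[] keyBytes, byte[] valueBytes) {
    Struct original = new Struct(ORIGINAL_RECORD_SCHEMA)
        .put("key", keyBytes)
        .put("value", valueBytes);
    record.headers().add("iceberg.error.transform.original.record", original, ORIGINAL_RECORD_SCHEMA);
    return record;
  }
}
```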
You are correct about de-risking collisions. I chose t for tabular.
I chose t for tabular.
🤦 should have figured that one out ....
implementation libs.iceberg.core
implementation libs.iceberg.common
implementation libs.iceberg.guava
You don't need any of the actual iceberg functionality in this module?
The only thing you do need is this import :D
import org.apache.iceberg.relocated.com.google.common.collect.Lists;
List<Struct> headers = Lists.newArrayList();
Which IMO you can just replace with this safely.
import java.util.ArrayList;
@SuppressWarnings("RegexpSingleline")
List<Struct> headers = new ArrayList<>();
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTaskContext;

public class DefaultWriteExceptionHandler implements WriteExceptionHandler {
That's concerning, because that means other users defining their own WriteExceptionHandler implementations can't reference those exceptions either?
- simplify original bytes location
- clean up unused methods, values
- simplify write exception interface
- move failed record handler initialization to be write exception handler specific
- clean up error transform converters
- close SMT and converters
Hi, any ETA for this feature?
Thank you for implementing this valuable feature! It will solve many issues like the ones below.
Hi, do you have an ETA for when this PR will be merged?
This would be much appreciated, as we are dealing with this and skipping the problematic offset leads to loss of data. 😞
Implements an (optional) dead letter table where failed messages go to a dedicated Iceberg table. This functionality aims to improve and simplify the error handling provided by Kafka Connect. Kafka Connect's dead letter queue only handles deserialization and SMT failures, and it writes to another Kafka topic, where additional engineering effort is required to inspect and recover messages. With this PR, errors are written to a dedicated Iceberg table where messages can be inspected and recovered using tools users may be more comfortable with (Spark, etc.). The table row contains everything required to convert a row back into a Kafka ProducerRecord; however, the functionality to do this is engine-specific and not provided in this PR.

This PR aims to minimize stream processing exceptions from imperfect producers by writing to the dead letter table rather than failing constantly and causing rebalances inside the Kafka cluster, which can negatively affect other jobs.
It is comprised of two components:
- an ErrorTransform SMT
- changes to Worker.java/IcebergWriterFactory.java to catch issues around table creation, schema parsing, and Iceberg record conversion

Not all errors result in conversion to records for the Dead Letter Table. For example, network/connection errors thrown during table operations with the underlying Catalog will still fail the connector (as a form of retry when the connector is restarted).
This is opt-in. Users can decide not to use it, use the Kafka Connect DLQ, ignore errors, or fail on all errors, just as with the previous functionality.
Error Transform

Kafka Connect's value, key, and header converters must be ByteArrayConverters. The desired converters (AvroConverter, JsonConverter, etc.) are supplied to the ErrorTransform SMT along with any other SMTs.

- On success, a Map<String, Object> is constructed that contains the deserialized and transformed SinkRecord as well as the original key, value, and header bytes of the message.
- On failure, a SinkRecord wrapping a Struct is created containing failure metadata such as Kafka metadata, the exception, the stack trace, and the original key, value, and header bytes.
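As a rough illustration of the success path described above, the SMT could bundle the transformed record with the original bytes so a later failure can still be rerouted. The payload key names below are assumptions, not the SMT's actual keys.

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.connect.sink.SinkRecord;

// Illustrative only: pair the deserialized/transformed record with the raw bytes of
// the original message. Key names are assumed for the sketch.
public class ErrorTransformPayloadSketch {

  public static Map<String, Object> successPayload(
      SinkRecord transformed, byte[] keyBytes, byte[] valueBytes, byte[] headerBytes) {
    Map<String, Object> payload = new HashMap<>();
    payload.put("transformed", transformed);      // assumed key name
    payload.put("original_key", keyBytes);        // assumed key name
    payload.put("original_value", valueBytes);    // assumed key name
    payload.put("original_headers", headerBytes); // assumed key name
    return payload;
  }
}
```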
Changes to Worker.java/IcebergWriterFactory.java

When configured to use the Dead Letter Table, the connector expects messages to be in the shape of the data produced by the ErrorTransform SMT. Failed records from the SMT will be written to the Dead Letter Table. Successfully transformed SinkRecords will attempt the normal connector flow; if a record fails for non-transient reasons, the original key, value, and header bytes in the specially transformed record are used to construct a SinkRecord for the Dead Letter Table with the required Kafka and error metadata before being written via the normal table fanout.
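A minimal sketch of that catch-and-reroute flow, assuming hypothetical writeToTable/writeToDeadLetterTable helpers (the PR's actual wiring goes through its handler and factory classes):

```java
import org.apache.kafka.connect.sink.SinkRecord;

// Hypothetical sketch of the catch-and-reroute flow; not the PR's actual code.
class DeadLetterRoutingSketch {

  void save(SinkRecord record) {
    try {
      writeToTable(record);                // normal connector flow
    } catch (RuntimeException e) {
      if (isTransient(e)) {
        throw e;                           // fail the task so Kafka Connect retries
      }
      writeToDeadLetterTable(record, e);   // reroute with error metadata
    }
  }

  // The helpers below are placeholders for illustration only.
  void writeToTable(SinkRecord record) { /* ... */ }

  void writeToDeadLetterTable(SinkRecord record, Exception error) { /* ... */ }

  boolean isTransient(Exception e) {
    return false; // e.g. network/catalog connectivity issues would return true
  }
}
```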
Limitations

This is the first PR. Additional work is required for some advanced Converters such as the AvroConverter, where fine-grained exception handling needs to be implemented to differentiate between real Avro errors (e.g. the message is not valid Avro bytes, or the message does not have an entry in the Schema Registry) and network-related Avro exceptions (e.g. contacting the Schema Registry times out). This is planned in future PRs. In the interim, an ErrorHandler class is exposed as a config option for both converters and SMTs and can be extended by users to implement the required error handling (and rethrowing) for advanced Converters, custom Converters, etc.
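As a rough illustration of the kind of extension described above, a handler might rethrow failures that look transient (e.g. Schema Registry connectivity) and turn genuine data errors into failed records. The class and method shapes below are assumptions, not the actual ErrorHandler API.

```java
import java.net.SocketTimeoutException;
import org.apache.kafka.connect.sink.SinkRecord;

// Hypothetical handler sketch; the PR's real extension point may differ.
class AvroAwareExceptionHandlerSketch {

  // Return a record destined for the dead letter table, or rethrow if the
  // failure looks transient (e.g. Schema Registry connectivity problems).
  SinkRecord handle(SinkRecord original, Exception error) {
    if (isTransient(error)) {
      throw new RuntimeException("transient failure, let the task retry", error);
    }
    return toFailedRecord(original, error); // e.g. via a FailedRecordFactory
  }

  private boolean isTransient(Exception error) {
    Throwable cause = error;
    while (cause != null) {
      if (cause instanceof SocketTimeoutException) {
        return true;
      }
      cause = cause.getCause();
    }
    return false;
  }

  private SinkRecord toFailedRecord(SinkRecord original, Exception error) {
    // Placeholder: build a record carrying Kafka metadata, the exception,
    // the stack trace, and the original bytes.
    return original;
  }
}
```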