Kafka Replicator Source & Quix Environment Source (#448)
1 parent 102494e · commit d09e7fa · 20 changed files with 1,104 additions and 46 deletions.
@@ -0,0 +1,42 @@
# Kafka Replicator Source

A source that reads data from a Kafka topic and produces it to another Kafka topic. The two topics can be located on different Kafka clusters.

This source supports exactly-once guarantees.

## How to use the Kafka Replicator Source

To use a Kafka Replicator source, you need to create an instance of `KafkaReplicatorSource` and pass it to the `app.dataframe()` method.
```python
from quixstreams import Application
from quixstreams.sources import KafkaReplicatorSource


def main():
    app = Application()
    source = KafkaReplicatorSource(
        name="my-source",
        app_config=app.config,
        topic="source-topic",
        broker_address="source-broker-address",
    )

    sdf = app.dataframe(source=source)
    sdf.print(metadata=True)

    app.run(sdf)


if __name__ == "__main__":
    main()
```
## Topic

The Kafka Replicator source only deals with bytes. It reads the remote keys and values as bytes and produces them directly as bytes.
You can configure the key and value deserializers used by the Streaming DataFrame with the `key_deserializer` and `value_deserializer` parameters.
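Because the source replicates raw bytes, any decoding happens downstream in the dataframe. As an illustrative sketch (plain Python, not the quixstreams API), a JSON value that was replicated as bytes can be decoded the way a JSON value deserializer would:

```python
import json

# A value as the source replicates it: raw bytes, untouched.
raw_value = b'{"temperature": 21.5}'

# Downstream, a JSON value deserializer turns the bytes into a dict.
value = json.loads(raw_value)
print(value["temperature"])  # 21.5
```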

## Consumer group

The Kafka Replicator consumer group is the source name prefixed with `source-`. Changing the name resets the source state, and the data will be re-replicated according to the configured `auto_offset_reset`. The consumer group is independent of the application consumer group, so changing the application consumer group will not reset the source.

For more information about consumer groups, [see the glossary](https://quix.io/docs/kb/glossary.html?h=consumer+group#consumer-group).
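The naming rule above can be sketched in one line (a hypothetical helper, shown only to illustrate the prefixing described in these docs, not a quixstreams function):

```python
def source_consumer_group(source_name: str) -> str:
    # The source consumer group is the source name prefixed by "source-".
    return f"source-{source_name}"


print(source_consumer_group("my-source"))  # source-my-source
```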
@@ -0,0 +1,34 @@
# Quix Environment Source

A specialised [Kafka Source](kafka-source.md) that simplifies copying data from a Quix environment.

## How to use the Quix Environment Source

To use a Quix Environment source, you need to create an instance of `QuixEnvironmentSource` and pass it to the `app.dataframe()` method.
```python
from quixstreams import Application
from quixstreams.sources import QuixEnvironmentSource


def main():
    app = Application()
    source = QuixEnvironmentSource(
        name="my-source",
        app_config=app.config,
        topic="source-topic",
        quix_sdk_token="quix-sdk-token",
        quix_workspace_id="quix-workspace-id",
    )

    sdf = app.dataframe(source=source)
    sdf.print(metadata=True)

    app.run(sdf)


if __name__ == "__main__":
    main()
```
## Token

The Quix Environment Source requires the SDK token of the source environment. See [SDK tokens](../../../develop/authentication/streaming-token.md) for more information.
@@ -1,2 +1,2 @@
-from .checkpoint import Checkpoint
+from .checkpoint import Checkpoint, BaseCheckpoint
 from .exceptions import InvalidStoredOffset
@@ -0,0 +1,2 @@
from .kafka import *
from .quix import *
@@ -0,0 +1,88 @@
from typing import List

from confluent_kafka import TopicPartition, KafkaException

from quixstreams.checkpointing import BaseCheckpoint
from quixstreams.checkpointing.exceptions import (
    CheckpointProducerTimeout,
    CheckpointConsumerCommitError,
)
from quixstreams.models.topics import Topic
from quixstreams.rowconsumer import Consumer
from quixstreams.rowproducer import RowProducer


class Checkpoint(BaseCheckpoint):
    """
    Checkpoint implementation used by the KafkaReplicatorSource
    """

    def __init__(
        self,
        producer: RowProducer,
        producer_topic: Topic,
        consumer: Consumer,
        commit_interval: float,
        commit_every: int = 0,
        flush_timeout: float = 10,
        exactly_once: bool = False,
    ):
        super().__init__(commit_interval, commit_every)

        self._producer = producer
        self._producer_topic = producer_topic
        self._consumer = consumer
        self._flush_timeout = flush_timeout
        self._exactly_once = exactly_once

        if self._exactly_once:
            self._producer.begin_transaction()

    def close(self):
        """
        Perform cleanup (when the checkpoint is empty) instead of committing.
        Needed for exactly-once, as Kafka transactions are timeboxed.
        """
        if self._exactly_once:
            self._producer.abort_transaction()

    def commit(self):
        """
        Commit the checkpoint.

        This method will:
         1. Flush the producer to ensure everything is delivered.
         2. Commit topic offsets.
        """
        unproduced_msg_count = self._producer.flush(self._flush_timeout)
        if unproduced_msg_count > 0:
            raise CheckpointProducerTimeout(
                f"'{unproduced_msg_count}' messages failed to be produced before the producer flush timeout"
            )

        offsets = [
            TopicPartition(
                topic=self._producer_topic.name,
                partition=partition,
                offset=offset + 1,
            )
            for (_, partition), offset in self._tp_offsets.items()
        ]
        self._tp_offsets = {}

        try:
            self._commit(offsets=offsets)
        except KafkaException as e:
            raise CheckpointConsumerCommitError(e.args[0]) from None

    def _commit(self, offsets: List[TopicPartition]):
        if self._exactly_once:
            self._producer.commit_transaction(
                offsets, self._consumer.consumer_group_metadata()
            )
        else:
            partitions = self._consumer.commit(offsets=offsets, asynchronous=False)
            for partition in partitions:
                if partition.error:
                    raise CheckpointConsumerCommitError(partition.error)
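Note the `offset + 1` in the `commit()` method above: Kafka consumer commits point at the *next* offset to consume, not the last one processed. A minimal self-contained sketch of that bookkeeping (plain Python, no Kafka client needed; the helper name is illustrative):

```python
def offsets_to_commit(tp_offsets):
    """Given {(topic, partition): last_processed_offset}, return the offsets
    to commit. Kafka commits the position of the *next* message to read,
    so each committed offset is the last processed offset + 1."""
    return {tp: offset + 1 for tp, offset in tp_offsets.items()}


processed = {("source-topic", 0): 41, ("source-topic", 1): 7}
print(offsets_to_commit(processed))
# {('source-topic', 0): 42, ('source-topic', 1): 8}
```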