updated word count, more standardization/cleanup
tim-quix committed Dec 6, 2024
1 parent 73b1f72 commit ce2f0d9
Showing 8 changed files with 145 additions and 92 deletions.
2 changes: 1 addition & 1 deletion docs/tutorials/README.md
@@ -13,7 +13,7 @@ Besides a Kafka instance to connect to, most of them likely fit the pattern of:

`python ./path/to/producer.py`

- `python ./path/to/application.py`
+ `python ./path/to/tutorial_app.py`


## Running Kafka Locally
6 changes: 4 additions & 2 deletions docs/tutorials/anomaly-detection/tutorial.md
@@ -24,7 +24,9 @@ When this occurs, we want to send alerts as soon as possible so appropriate action…

## Our Example

- We will use a `Source` to generate mock temperature events for 3 machines (MACHINE_IDs '0', '1', or '2'); ID's 0 and 1 are functioning normally, 2 is malfunctioning (overheating).
+ We will use a [Quix Streams `Source`](../../connectors/sources/README.md) to generate mock temperature events for
+ 3 machines (MACHINE_IDs '0', '1', or '2'); ID's 0 and 1 are functioning normally,
+ 2 is malfunctioning (overheating).

These events will be processed by our new Anomaly Detector `Application`.
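For intuition, a minimal sketch of what such a `Source` might look like (the class name, value fields, and temperature ranges below are illustrative assumptions, not the tutorial's actual code):

```python
# Hypothetical mock-temperature Source; names and ranges are assumptions.
import random
import time

from quixstreams.sources import Source


class TemperatureGenerator(Source):
    def run(self):
        # Custom Sources implement run(), looping while the Source is active.
        while self.running:
            machine_id = random.choice(["0", "1", "2"])
            # Machine "2" runs hot to simulate the overheating malfunction.
            temp = random.randint(90, 110) if machine_id == "2" else random.randint(45, 65)
            event = self.serialize(key=machine_id, value={"Temperature_C": temp})
            self.produce(key=event.key, value=event.value)
            time.sleep(0.5)
```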

@@ -53,7 +55,7 @@ This approach is desirable since temperatures fluctuate quickly; it enables more…
- ***All other links provided are completely optional***.
- They are great ways to learn more about various concepts if you need them!

- 2. This tutorial uses a Quix Streams [`Source`](../../connectors/sources/README.md) rather than a Kafka [`Topic`]() to ingest data.
+ 2. This tutorial uses a [`Source`](../../connectors/sources/README.md) rather than a Kafka [`Topic`]() to ingest data.
- `Source` connectors enable reading data from a non-Kafka origin (typically to get it into Kafka).
- This approach saves users from having to run a [producer](../../producer.md) alongside the `Application`.
- A `Source` is easily replaced with an actual Kafka topic (just pass a `Topic` instead of a `Source`); see the sketch below.
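As a rough sketch of that swap (assuming `app`, `my_source`, and `my_topic` are defined elsewhere):

```python
# Feed the Application from a Source (a non-Kafka origin)...
sdf = app.dataframe(source=my_source)

# ...or from an actual Kafka topic; the rest of the pipeline is unchanged.
sdf = app.dataframe(topic=my_topic)
```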
10 changes: 5 additions & 5 deletions docs/tutorials/purchase-filtering/tutorial.md
@@ -30,8 +30,8 @@ necessary information downstream.

## Our Example

- We will use a `Source` to generate some mock purchase data to be processed by our
- new Purchase Filtering `Application`.
+ We will use a [Quix Streams `Source`](../../connectors/sources/README.md) to generate some mock purchase data to be
+ processed by our new Purchase Filtering `Application`.



@@ -62,7 +62,7 @@ as if it were a dataframe.

## Generating Purchase Data

- Our [**>>> Purchase Filtering Application <<<**](application.py) uses a `Source` called
+ Our [**>>> Purchase Filtering Application <<<**](tutorial_app.py) uses a `Source` called
`PurchaseGenerator` that generates a small static set of "purchases", which are simply
dictionaries with various info about what was purchased by a customer during their visit.

Expand Down Expand Up @@ -100,7 +100,7 @@ kafka_value: {


Now let's go over the `setup_and_run_application()` portion of
- our [**>>> Purchase Filtering Application <<<**](application.py) in detail!
+ our [**>>> Purchase Filtering Application <<<**](tutorial_app.py) in detail!



@@ -349,7 +349,7 @@ One thing to keep in mind is that Quix Streams does not log/print any message…
operations by default.

To get visual outputs around message processing, you can either:
- - use [recommended way of printing/logging stuff](../../processing.md#debugging)
+ - use [recommended way of printing/logging with SDF](../../processing.md#debugging)

- use `DEBUG` mode via `Application(loglevel="DEBUG")`
- WARNING: you should NOT run your applications in `DEBUG` mode in production.
16 changes: 12 additions & 4 deletions docs/tutorials/websocket-source/tutorial.md
@@ -16,21 +16,26 @@ the data into Kafka first.
## Our Example
This example showcases:

- 1. Extending the Quix Streams `Source` class to read from the Coinbase Websocket API.
+ 1. Extending the [Quix Streams `Source`](../../connectors/sources/README.md) class to read from the Coinbase Websocket API.
2. Using the new extension (`CoinbaseSource`).

## Before Getting Started

- - You will see links scattered throughout this tutorial.
+ 1. You will see links scattered throughout this tutorial.
- Tutorial code links are marked **>>> LIKE THIS <<<** .
- ***All other links provided are completely optional***.
- They are great ways to learn more about various concepts if you need them!

+ 2. This tutorial uses a [`Source`](../../connectors/sources/README.md) rather than a Kafka [`Topic`]() to ingest data.
+ - `Source` connectors enable reading data from a non-Kafka origin (typically to get it into Kafka).
+ - This approach saves users from having to run a [producer](../../producer.md) alongside the `Application`.
+ - A `Source` is easily replaced with an actual Kafka topic (just pass a `Topic` instead of a `Source`).


## Creating `CoinbaseSource`

- Let's take a look at [**>>> CoinbaseSource <<<**](tutorial_app.py) in detail to
- understand what modifications to `Source` were necessary.
+ First, let's take a detailed look at `CoinbaseSource` in our [**>>> Coinbase Application <<<**](tutorial_app.py)
+ to understand what modifications to `Source` were necessary.

> [!NOTE]
> Check out the [custom Source docs](../../connectors/sources/custom-sources.md)
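As a rough sketch of the general shape such a subclass takes (the websocket URL, subscribe payload, and field names here are assumptions, not necessarily the tutorial's exact code):

```python
# Hypothetical websocket-reading Source; payload and fields are assumptions.
import json

from quixstreams.sources import Source
from websocket import create_connection  # pip install websocket-client


class CoinbaseSource(Source):
    def __init__(self, name, url, product_ids):
        super().__init__(name=name)
        self._url = url
        self._product_ids = product_ids

    def run(self):
        # Open the websocket and subscribe to the desired products.
        ws = create_connection(self._url)
        ws.send(json.dumps({
            "type": "subscribe",
            "channels": [{"name": "ticker", "product_ids": self._product_ids}],
        }))
        while self.running:
            record = json.loads(ws.recv())
            # Key each message by its product ID.
            event = self.serialize(key=record.get("product_id"), value=record)
            self.produce(key=event.key, value=event.value)
```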
@@ -84,6 +89,9 @@ Now that `CoinbaseSource` exists, we can ingest raw data from Coinbase.

Of course, each user will have their own desired product IDs and transformations to apply.

+ Now let's go over the `setup_and_run_application()` portion of
+ our [**>>> Coinbase Application <<<**](tutorial_app.py) in detail!

### Define the Source

First, set up a `CoinbaseSource` with our desired `product_ids`.
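Assuming the sketch above, instantiation might look something like this (the name, URL, and product IDs are purely illustrative):

```python
# Hypothetical values; swap in your own products of interest.
coinbase_source = CoinbaseSource(
    name="coinbase-source",
    url="wss://ws-feed.exchange.coinbase.com",
    product_ids=["ETH-BTC"],
)
```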
31 changes: 0 additions & 31 deletions docs/tutorials/word-count/application.py

This file was deleted.

31 changes: 0 additions & 31 deletions docs/tutorials/word-count/producer.py

This file was deleted.

74 changes: 56 additions & 18 deletions docs/tutorials/word-count/tutorial.md
@@ -4,11 +4,12 @@ We will build a simple word counter, which is a great introduction to Quix Streams…

You'll learn how to:

- - Create a topic
+ - Ingest a non-Kafka data source
- Do simple event alterations
- Generate multiple events from a single event
- Filter any undesired events
- - Produce events, with new Kafka keys, to a topic
+ - Create a Kafka topic
+ - Produce results, with new Kafka keys, to a topic



@@ -26,17 +27,17 @@ the counts of each individually downstream for further processing.

## Our Example

- We will use a simple producer to generate text to be processed by our
- new Word Counter application.
+ We will use a [Quix Streams `Source`](../../connectors/sources/README.md) to generate text to be processed by our
+ new Word Counter `Application`.

- NOTE: Our example uses JSON formatting for Kafka message values.
+ NOTE: Our example uses `JSON` formatting for Kafka message values.



## Event Expansion

The most important concept we want to explore here is how you can "expand" a single
- event into multiple new ones.
+ event/message into multiple new ones.

More than that: each new event generated via expansion is processed individually
through the remainder of your pipeline, allowing you to write ALL your operations
@@ -46,20 +47,27 @@ NOTE: Expanding often includes adjusting outgoing Kafka keys as well, so we additionally
showcase that.



## Before Getting Started

- - You will see links scattered throughout this tutorial.
+ 1. You will see links scattered throughout this tutorial.
- Tutorial code links are marked **>>> LIKE THIS <<<** .
- ***All other links provided are completely optional***.
- They are great ways to learn more about various concepts if you need them!

+ 2. This tutorial uses a [`Source`](../../connectors/sources/README.md) rather than a Kafka [`Topic`]() to ingest data.
+ - `Source` connectors enable reading data from a non-Kafka origin (typically to get it into Kafka).
+ - This approach saves users from having to run a [producer](../../producer.md) alongside the `Application`.
+ - A `Source` is easily replaced with an actual Kafka topic (just pass a `Topic` instead of a `Source`).



## Generating Text Data

- We have a [**>>> Review Producer <<<**](producer.py) that generates a static set of "reviews",
- which are simply strings, where the key is the product name.
+ Our [**>>> Word Counter Application <<<**](tutorial_app.py) uses a `Source` called `ReviewGenerator`
+ that generates a static set of "reviews", which are simply strings, where the key is the product name.

- The Kafka message looks like:
+ The incoming Kafka messages look like:

```python
# ... (message example collapsed in this diff)
```

@@ -71,7 +79,10 @@

## Word Counter Application

- Now let's go over our [**>>> Word Counter Application <<<**](application.py) line-by-line!
+ Now let's go over the `setup_and_run_application()` portion of
+ our [**>>> Word Counter Application <<<**](tutorial_app.py) in detail!



### Create Application

@@ -90,6 +101,8 @@ First, create the [Quix Streams Application](../../configuration.md), which is o…

NOTE: Once you are more familiar with Kafka, we recommend [learning more about auto_offset_reset](https://www.quix.io/blog/kafka-auto-offset-reset-use-cases-and-pitfalls).
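The collapsed block above is roughly of this shape (the broker address and consumer group name are assumptions):

```python
from quixstreams import Application

# Assumes a locally running broker; the consumer group name is illustrative.
app = Application(
    broker_address="localhost:9092",
    consumer_group="word_counter",
    auto_offset_reset="earliest",
)
```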



### Define Topics

```python
# ... (code collapsed in this diff)
```

@@ -104,6 +117,8 @@ They each return [`Topic`](../../api-reference/topics.md) objects, used later on
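The collapsed definitions are roughly of this shape (the input topic name is an assumption; `product_review_word_counts` is the output topic named later in this tutorial):

```python
# Input and output topics, with JSON (de)serialization for message values.
input_topic = app.topic(name="product_reviews", value_deserializer="json")
output_topic = app.topic(name="product_review_word_counts", value_serializer="json")
```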
NOTE: The topics will automatically be created for you in Kafka when you run the application, should they not already exist.




### The StreamingDataFrame (SDF)

```python
# ... (code collapsed in this diff)
```

@@ -118,6 +133,8 @@ After initializing, we continue re-assigning to the same `sdf` variable as we add…

(Also, notice that we pass our input `Topic` from the previous step to it.)
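As a sketch, that initialization looks roughly like:

```python
# Initialize a StreamingDataFrame that consumes from the input Topic.
sdf = app.dataframe(topic=input_topic)
```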



### Tokenizing Text

```python
# ... (code collapsed in this diff)
```

@@ -155,6 +172,8 @@ NOTE: Two VERY important and related points around the `expand=True` argument:
2. Our `F` returns a `list` (or a non-dict iterable of some kind), hence the "expand"!
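A rough sketch of such an expanding operation (this simple tokenizer is an assumption, not the tutorial's exact `F`):

```python
# Each review string becomes a list of (word, count) pairs; with
# expand=True, every pair is processed downstream as its own event.
def tokenize_and_count(text):
    words = text.lower().split()
    return [(word, words.count(word)) for word in set(words)]


sdf = sdf.apply(tokenize_and_count, expand=True)
```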




### Filtering Expanded Results

```python
# ... (code collapsed in this diff)
```

@@ -181,6 +200,8 @@ With this filter applied, our "and" event is removed:
`>>> [('bob', 1), ('likes', 2), ('bananas', 1), ('frank', 1), ('apples', 1)]`
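A sketch of what that filter could look like (assuming we drop only the word "and"):

```python
# word_count is a (word, count) pair; keep everything except "and".
sdf = sdf.filter(lambda word_count: word_count[0] != "and")
```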




### Producing Events With New Keys

```python
# ... (code collapsed in this diff)
```

@@ -209,22 +230,39 @@ In the end we would produce 5 messages in total, like so:

NOTE: This is how you would see the values in the Kafka topic `product_review_word_counts`.
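A sketch of that producing step, re-keying each event by its word as the example output above implies:

```python
# Produce each (word, count) pair to the output topic, using the word
# itself as the new Kafka message key.
sdf = sdf.to_topic(output_topic, key=lambda word_count: word_count[0])
```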




## Try it yourself!

### 1. Run Kafka
First, have a running Kafka cluster.

- To conveniently follow along with this tutorial, just [run this simple one-liner](../README.md#running-kafka-locally).
+ To easily run a broker locally with Docker, just [run this simple one-liner](../README.md#running-kafka-locally).

- ### 2. Install Quix Streams
- In your python environment, run `pip install quixstreams`
+ ### 2. Download files
+ - [tutorial_app.py](tutorial_app.py)

- ### 3. Run the Producer and Application
- Just call `python producer.py` and `python application.py` in separate windows.
+ ### 3. Install Quix Streams
+ In your desired python environment, execute: `pip install quixstreams`

+ ### 4. Run the application
+ In your desired python environment, execute: `python tutorial_app.py`.

+ ### 5. Check out the results!

+ ...but wait, I don't see any `Application` processing output... Is it working???

+ One thing to keep in mind is that Quix Streams does not log/print any message processing
+ operations by default.

+ To get visual outputs around message processing, you can either (see the sketch after this list):
+ - use the [recommended way of printing/logging with SDF](../../processing.md#debugging)
+ - use `DEBUG` mode via `Application(loglevel="DEBUG")`
+   - WARNING: you should NOT run your applications in `DEBUG` mode in production.
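A rough sketch of those two options (placement before `app.run()` is assumed):

```python
# Option 1: print each record as it flows through the pipeline;
# update() passes the record through unchanged.
sdf = sdf.update(lambda row: print(row))

# Option 2: verbose logging; NOT for production use.
app = Application(loglevel="DEBUG")
```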

- ### 4. Check out the results!

Look at all those counted words, beautiful!


## Related topics - Data Aggregation
