Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

APEXMALHAR-2427 Kinesis Input Operator documentation. #564

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

deepak-narkhede
Copy link
Contributor

No description provided.

@deepak-narkhede
Copy link
Contributor Author

@chaithu14 Please review.


### Pre-requisites

This operator uses the AWS kinesis java sdk version 1.9.10.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a pre-requisite ? You may remove the "pre-requisites" and add this information after Maven Aritifiact

### Introduction

The kinesis input operator consumes data from the partitions of a Kinesis shard(s) for processing in Apex.
The operator is fault-tolerant, scalable and supports input from multiple shards of AWS kinesis streams in a single operator instance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A suggestion : you may highlight "AWS Kinesis Streams" and provide link to https://aws.amazon.com/kinesis/streams/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

![AbstractKinesisInput.png](images/kinesisInput/classdiagram.png)

#### Configuration properties

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a table with columns "Parameter" , Type, Description, Mandatory , Default Value etc
This will make it readable and consistent with other operator documentation. You may refer to KafkaInputOperator documentation

#### Abstract Methods

`T getTuple(Record record)`: Converts the Kinesis Stream record(s) to tuple.
`void emitTuple(Pair<String, Record> data)`: This method emits tuples extracted from AWS kinesis streams.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be on another line. Do verify it by viewing the doc in mkdocs on chrome


- ***streamName*** - String[]
- Mandatory Parameter.
- Specifies the name of the stream from where the records to be accessed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

records are accessed / records are to be accessed

- ***strategy*** - PartitionStrategy
- Operator supports two types of partitioning strategies, `ONE_TO_ONE` and `MANY_TO_ONE`.

`ONE_TO_ONE`: If this is enabled, the AppMaster creates one input operator instance per kinesis shard partition. So the number of shard partitions equals the number of operator instances.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

number of operator instances equals the number of shard partitions

Default Value = 30 Seconds.

- ***repartitionCheckInterval*** - Long
- Interval specified in milliseconds. This value specifies the minimum interval between two stat checks.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stat ?

Default value = 1024.

- ***windowDataManager*** - WindowDataManager
- If set to a value other than the default, such as `FSWindowDataManager`, specifies that the operator will process the same set of messages in a window before and after a failure. This is important but it comes with higher cost because at the end of each window the operator needs to persist some state with respect to that window.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be rephrased. The description should convey following .

  1. What is the purpose of this property ?
  2. How does the default work ?
  3. What are my choices if i want to specify any other windowDataManager and how each of those choices will affect the behaviour?

returned by processStats(...) method, the application master invokes
definePartitions(...) on the logical instance which is also the
partitioner instance. Dynamic partition can be disabled by setting the
parameter repartitionInterval value to a negative value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

highlight repartitionInterval

}
}
```
Below is the configuration for “KINESIS-STREAM-NAME” Kinesis stream name:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not quite understand the meaning of this line. Are you referring to configuration for the app ?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants