Documentation updates for DataPrepper 2.8 (#7135)
* Documentation updates for DataPrepper 2.8

Signed-off-by: Kondaka <[email protected]>

* Fixed vale errors

Signed-off-by: Kondaka <[email protected]>

* Addressed review comments

Signed-off-by: Kondaka <[email protected]>

* Update _data-prepper/common-use-cases/codec-processor-combinations.md

Co-authored-by: Melissa Vagi <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>

* Update _data-prepper/common-use-cases/codec-processor-combinations.md

Co-authored-by: Melissa Vagi <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>

* Update _data-prepper/pipelines/configuration/processors/key-value.md

Co-authored-by: Melissa Vagi <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>

* Update _data-prepper/pipelines/configuration/processors/aggregate.md

Co-authored-by: Melissa Vagi <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>

* Update _data-prepper/pipelines/configuration/processors/key-value.md

Co-authored-by: Melissa Vagi <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>

* Update _data-prepper/pipelines/expression-syntax.md

Co-authored-by: Melissa Vagi <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>

* Update _data-prepper/pipelines/configuration/processors/aggregate.md

Co-authored-by: Melissa Vagi <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>

* Update _data-prepper/pipelines/configuration/processors/write_json.md

Co-authored-by: Melissa Vagi <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>

* Update _data-prepper/pipelines/expression-syntax.md

Co-authored-by: Melissa Vagi <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>

* Update _data-prepper/pipelines/expression-syntax.md

Co-authored-by: Melissa Vagi <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>

* Update _data-prepper/pipelines/expression-syntax.md

Co-authored-by: Melissa Vagi <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>

* Update _data-prepper/pipelines/expression-syntax.md

Co-authored-by: Melissa Vagi <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>

* Update _data-prepper/pipelines/configuration/processors/write_json.md

Co-authored-by: Melissa Vagi <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>

* Update _data-prepper/pipelines/configuration/processors/write_json.md

Co-authored-by: Melissa Vagi <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>

* Update _data-prepper/pipelines/configuration/processors/aggregate.md

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Melissa Vagi <[email protected]>

* Update _data-prepper/pipelines/configuration/processors/key-value.md

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Melissa Vagi <[email protected]>

* Update _data-prepper/pipelines/configuration/processors/write_json.md

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Melissa Vagi <[email protected]>

* Update _data-prepper/pipelines/configuration/processors/write_json.md

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Melissa Vagi <[email protected]>

* Update _data-prepper/pipelines/configuration/processors/aggregate.md

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Melissa Vagi <[email protected]>

* Update _data-prepper/pipelines/configuration/processors/key-value.md

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Melissa Vagi <[email protected]>

* Update _data-prepper/pipelines/configuration/sources/s3.md

Signed-off-by: Melissa Vagi <[email protected]>

* Update _data-prepper/pipelines/configuration/processors/write_json.md

Signed-off-by: Melissa Vagi <[email protected]>

* Update _data-prepper/pipelines/configuration/processors/aggregate.md

Signed-off-by: Melissa Vagi <[email protected]>

---------

Signed-off-by: Kondaka <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>
Signed-off-by: Melissa Vagi <[email protected]>
Co-authored-by: Melissa Vagi <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
3 people authored May 24, 2024
1 parent 8f754dd commit 2941ecf
Showing 6 changed files with 45 additions and 4 deletions.
@@ -45,3 +45,6 @@ The [`newline` codec]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/config

[Apache Avro] helps streamline streaming data pipelines. It is most efficient when used with the [`avro` codec]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sinks/s3#avro-codec) inside an `s3` sink.

## `event_json`

The `event_json` output codec converts event data and metadata into JSON format to send to a sink, such as an S3 sink. The `event_json` input codec reads the event and its metadata to create an event in Data Prepper.
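As a minimal sketch, the `event_json` codec might appear in an `s3` sink as follows; the pipeline and bucket names are illustrative, and required settings such as AWS credentials are omitted:

```yaml
event-pipeline:
  sink:
    - s3:
        bucket: example-bucket        # illustrative bucket name
        codec:
          event_json:                 # serialize event data and metadata as JSON
```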
@@ -20,6 +20,7 @@ Option | Required | Type | Description
identification_keys | Yes | List | An unordered list of keys by which to group events. Events with the same values for these keys are put into the same group. If an event does not contain one of the `identification_keys`, then the value of that key is considered to be equal to `null`. At least one key is required (for example, `["sourceIp", "destinationIp", "port"]`).
action | Yes | AggregateAction | The action to be performed on each group. One of the [available aggregate actions](#available-aggregate-actions) must be provided, or you can create custom aggregate actions. `remove_duplicates` and `put_all` are the available actions. For more information, see [Creating New Aggregate Actions](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/aggregate-processor#creating-new-aggregate-actions).
group_duration | No | String | The amount of time that a group should exist before it is concluded automatically. Supports ISO 8601 duration strings (for example, `"PT20.345S"` or `"PT15M"`) as well as simple notation for seconds (`"60s"`) and milliseconds (`"1500ms"`). Default is `180s`.
local_mode | No | Boolean | When `local_mode` is set to `true`, the aggregation is performed locally on each Data Prepper node instead of forwarding events to a specific node based on the `identification_keys` using a hash function. Default is `false`.
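As an illustrative fragment (source and sink omitted), the options above might be combined as follows:

```yaml
aggregate-pipeline:
  processor:
    - aggregate:
        identification_keys: ["sourceIp", "destinationIp", "port"]
        action:
          remove_duplicates:      # one of the available aggregate actions
        group_duration: "60s"     # conclude groups after 60 seconds
        local_mode: true          # aggregate on each node instead of forwarding by hash
```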

## Available aggregate actions

@@ -176,4 +177,4 @@ The `aggregate` processor includes the following custom metrics.

**Gauge**

* `currentAggregateGroups`: The current number of groups. This gauge decreases when a group concludes and increases when an event initiates the creation of a new group.
* `currentAggregateGroups`: This gauge represents the current number of active aggregate groups. It decreases when an aggregate group is completed and its results are emitted and increases when a new event initiates the creation of a new aggregate group.
@@ -33,6 +33,8 @@ You can use the `key_value` processor to parse the specified field into key-valu
| recursive | Specifies whether to recursively obtain additional key-value pairs from values. The extra key-value pairs are stored as subkeys of the root key. Default is `false`. The levels of recursive parsing must be defined by different brackets for each level: `[]`, `()`, and `<>`, in this order. Any other configurations specified are applied only to the outermost keys. <br />When `recursive` is `true`: <br /> `remove_brackets` cannot also be `true`;<br />`skip_duplicate_values` will always be `true`; <br />`whitespace` will always be `"strict"`. | If `recursive` is `true`, `{"item1=[item1-subitem1=item1-subitem1-value&item1-subitem2=(item1-subitem2-subitem2A=item1-subitem2-subitem2A-value&item1-subitem2-subitem2B=item1-subitem2-subitem2B-value)]&item2=item2-value"}` will parse into `{"item1": {"item1-subitem1": "item1-subitem1-value", "item1-subitem2": {"item1-subitem2-subitem2A": "item1-subitem2-subitem2A-value", "item1-subitem2-subitem2B": "item1-subitem2-subitem2B-value"}}}`. |
| overwrite_if_destination_exists | Specifies whether to overwrite existing fields if there are key conflicts when writing parsed fields to the event. Default is `true`. | If `overwrite_if_destination_exists` is `true` and destination is `null`, `{"key1": "old_value", "message": "key1=new_value"}` will parse into `{"key1": "new_value", "message": "key1=new_value"}`. |
| tags_on_failure | When a `kv` operation causes a runtime exception within the processor, the operation is safely stopped without crashing the processor, and the event is tagged with the provided tags. | If `tags_on_failure` is set to `["keyvalueprocessor_failure"]`, `{"tags": ["keyvalueprocessor_failure"]}` will be added to the event's metadata in the event of a runtime exception. |
| value_grouping | Specifies whether to group values using predefined value grouping delimiters: `{...}`, `[...]`, `<...>`, `(...)`, `"..."`, `'...'`, `http://...` (terminated by a space), and `https://...` (terminated by a space). If this flag is enabled, then the content between the delimiters is considered to be one entity and is not parsed for key-value pairs. Default is `false`. | If `value_grouping` is `true`, then `{"key1=[a=b,c=d]&key2=value2"}` parses to `{"key1": "[a=b,c=d]", "key2": "value2"}`. |
| drop_keys_with_no_value | Specifies whether keys should be dropped if they have a null value. Default is `false`. | If `drop_keys_with_no_value` is set to `true`, then `{"key1=value1&key2"}` parses to `{"key1": "value1"}`. |
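As a hedged sketch, the two new options might be enabled together as follows (the `source` field name is illustrative):

```yaml
processor:
  - key_value:
      source: "message"
      value_grouping: true            # keep bracketed content as a single value
      drop_keys_with_no_value: true   # drop keys that parse to a null value
```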



@@ -42,4 +44,4 @@ Content will be added to this section.
## Metrics
Content will be added to this section. --->
Content will be added to this section. --->
18 changes: 18 additions & 0 deletions _data-prepper/pipelines/configuration/processors/write_json.md
@@ -0,0 +1,18 @@
---
layout: default
title: write_json
parent: Processors
grand_parent: Pipelines
nav_order: 56
---

# write_json


The `write_json` processor converts an object in an event into a JSON string. You can customize the processor to choose the source and target field names.

| Option | Description | Example |
| :--- | :--- | :--- |
| source | Mandatory field that specifies the name of the field in the event containing the message or object to be parsed. | If `source` is set to `"message"` and the input is `{"message": {"key1":"value1", "key2":{"key3":"value3"}}}`, then the `write_json` processor generates `{"message": "{\"key1\":\"value1\",\"key2\":{\"key3\":\"value3\"}}"}`. |
| target | An optional field that specifies the name of the field in which the resulting JSON string should be stored. If `target` is not specified, then the `source` field is used. |
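A minimal configuration sketch (the `target` field name is illustrative):

```yaml
processor:
  - write_json:
      source: "message"           # field containing the object to serialize
      target: "message_string"    # illustrative name for the JSON string output
```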

2 changes: 1 addition & 1 deletion _data-prepper/pipelines/configuration/sources/s3.md
@@ -104,7 +104,7 @@ Option | Required | Type | Description
`s3_select` | No | [s3_select](#s3_select) | The Amazon S3 Select configuration.
`scan` | No | [scan](#scan) | The S3 scan configuration.
`delete_s3_objects_on_read` | No | Boolean | When `true`, the S3 scan attempts to delete S3 objects after all events from the S3 object are successfully acknowledged by all sinks. `acknowledgments` should be enabled when deleting S3 objects. Default is `false`.
`workers` | No | Integer | Configures the number of worker threads that the source uses to read data from S3. Leaving this value at the default unless your S3 objects are less than 1MB. Performance may decrease for larger S3 objects. This setting only affects SQS-based sources. Default is `1`.
`workers` | No | Integer | Configures the number of worker threads that the source uses to read data from S3. Leave this value as the default unless your S3 objects are less than 1 MB in size. Performance may decrease for larger S3 objects. This setting affects SQS-based sources and S3-Scan sources. Default is `1`.
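As a sketch, raising `workers` for a queue of many small objects might look like the following; the queue URL is illustrative, and other required source settings are omitted:

```yaml
source:
  s3:
    workers: 4      # only helpful when S3 objects are under 1 MB
    sqs:
      queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"  # illustrative
```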



19 changes: 18 additions & 1 deletion _data-prepper/pipelines/expression-syntax.md
@@ -114,7 +114,23 @@ null != <JSON Pointer>
null != /response
```

#### Conditional expression
## Type check operator

The type check operator tests whether the value referenced by a JSON Pointer is of a certain data type.

### Syntax
```
<JSON Pointer> typeof <DataType>
```
Supported data types are `integer`, `long`, `boolean`, `double`, `string`, `map`, and `array`.

#### Example
```
/response typeof integer
/message typeof string
```
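As an illustrative sketch, a `typeof` expression can serve as the condition for a pipeline route (the route and sink names are hypothetical):

```yaml
route:
  - numeric-response: '/response typeof integer'
sink:
  - stdout:
      routes:
        - numeric-response
```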

### Conditional expression

A conditional expression is used to chain together multiple expressions and/or values.

@@ -218,6 +234,7 @@ White space is **required** surrounding set initializers, priority expressions,
| `==`, `!=` | Equality operators | No | `/status == 200`<br>`/status_code==200` | |
| `and`, `or`, `not` | Conditional operators | Yes | `/a<300 and /b>200` | `/b<300and/b>200` |
| `,` | Set value delimiter | No | `/a in {200, 202}`<br>`/a in {200,202}`<br>`/a in {200 , 202}` | `/a in {200,}` |
| `typeof` | Type check operator | Yes | `/a typeof integer`<br>`/a typeof long`<br>`/a typeof string`<br> `/a typeof double`<br> `/a typeof boolean`<br>`/a typeof map`<br>`/a typeof array` |`/a typeof /b`<br>`/a typeof 2` |


## Functions
