out_opentelemetry: enhancements for log body and attributes handling (fix #8359) #8491
The mp_accessor API allows manipulating a msgpack object buffer by adding or removing keys through a list of record accessor patterns. By default, mp_accessor iterates through the whole list of record accessors when it runs, but there are cases where it would be ideal to enable or disable only some of them. This patch extends the mp_accessor API so that specific record accessors can be activated or deactivated on demand.

Signed-off-by: Eduardo Silva <[email protected]>
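To make the idea concrete, here is a minimal Python sketch of a pattern list whose entries can be switched on and off before a record is processed. It is only a toy model: `AccessorList`, `set_active` and `remove_matching` are hypothetical names, plain top-level keys stand in for record accessor patterns, and none of this is the actual mp_accessor C API.

```python
class AccessorList:
    """Toy stand-in for a list of record accessors (NOT the mp_accessor C API)."""

    def __init__(self, keys):
        # Every accessor starts out active, mirroring the default of applying all of them.
        self._active = {key: True for key in keys}

    def set_active(self, key, active):
        """Enable or disable a single accessor on demand."""
        if key not in self._active:
            raise KeyError(key)
        self._active[key] = active

    def remove_matching(self, record):
        """Return a copy of record without the keys named by active accessors."""
        return {k: v for k, v in record.items() if not self._active.get(k, False)}


accessors = AccessorList(["name", "lastname"])
record = {"name": "bill", "lastname": "gates", "log": "hello"}

# With every accessor active, both listed keys are removed.
all_applied = accessors.remove_matching(record)

# After deactivating one accessor, only the other key is removed.
accessors.set_active("lastname", False)
only_name = accessors.remove_matching(record)
```

The point of the per-entry flag is that a caller can keep one shared pattern list and decide per record which entries take part, instead of rebuilding the list each time.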
…ix #8359)

The following patch fixes and enhances the OpenTelemetry output connector when handling log records.

In the Fluent Bit world, we deal with tons of unstructured log records that come from a variety of sources, or simply from raw text files. When converting those lines to structured messages, there was no option to define what would become the log body and the log attributes; everything was packaged inside the log body by default.

This patch enhances the previous behavior by allowing the following:

- log body: optionally define multiple record accessor patterns that try to match a key or sub-key from the record structure. For the first matched key, its value is used as the body content. If no match exists, the whole record is set inside the body.
- log attributes: if the log record contains native metadata, the keys are added as OpenTelemetry Attributes. If the log body was populated by using a record accessor pattern as described above, the remaining keys that were not used are added as attributes.

To achieve the new behavior, the configuration needs to use the new configuration property called 'logs_body_key', which can be used to define multiple record accessor patterns, e.g.:

    pipeline:
      inputs:
        - name: dummy
          metadata: '{"meta": "data"}'
          dummy: '{"name": "bill", "lastname": "gates", "log": {"abc": {"def":123}, "saveme": true}}'
      outputs:
        - name: opentelemetry
          match: '*'
          host: localhost
          port: 4318
          logs_body_key: $name
          logs_body_key: $log['abc']['def']

In the example above, the dummy input plugin generates a record with metadata. On the output side, the plugin looks up $name and then $log['abc']['def'], in that order. $name matches, so 'bill' becomes the value of the body and the remaining content becomes attributes.

Here is the output of the vanilla OTel Collector when inspecting the content it receives:

    Body: Str(bill)
    Attributes:
         -> meta: Str(data)
         -> lastname: Str(gates)
         -> log: Map({"abc":{"def":123},"saveme":true})

Signed-off-by: Eduardo Silva <[email protected]>
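The selection rule described in this commit message (first matching pattern wins, the rest of the record becomes attributes, and the whole record is the body when nothing matches) can be modelled in a few lines of Python. This is a hedged sketch, not the plugin's C implementation: `parse_pattern`, `lookup` and `split_body_and_attributes` are hypothetical helpers, only the `$key['sub']['sub']` pattern form used above is handled, and removing a matched nested key from its parent map is an assumption about how "remaining keys" is interpreted.

```python
import copy
import re

_MISSING = object()  # sentinel so that None/False values still count as matches


def parse_pattern(pattern):
    """Split a pattern such as $log['abc']['def'] into ['log', 'abc', 'def']."""
    m = re.fullmatch(r"\$([A-Za-z0-9_.\-]+)((?:\['[^']*'\])*)", pattern)
    if m is None:
        raise ValueError(f"unsupported pattern: {pattern}")
    return [m.group(1)] + re.findall(r"\['([^']*)'\]", m.group(2))


def lookup(record, path):
    """Walk nested dicts along path; return _MISSING if any step is absent."""
    node = record
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def split_body_and_attributes(record, metadata, body_keys):
    """Return (body, attributes) following the first-match rule."""
    attributes = dict(metadata)  # native metadata always becomes attributes
    for pattern in body_keys:
        path = parse_pattern(pattern)
        value = lookup(record, path)
        if value is _MISSING:
            continue  # try the next pattern, in configuration order
        remaining = copy.deepcopy(record)
        parent = lookup(remaining, path[:-1])
        del parent[path[-1]]  # assumption: the matched key is dropped from the rest
        attributes.update(remaining)
        return value, attributes
    return record, attributes  # no pattern matched: whole record is the body


record = {"name": "bill", "lastname": "gates",
          "log": {"abc": {"def": 123}, "saveme": True}}
body, attributes = split_body_and_attributes(
    record, {"meta": "data"}, ["$name", "$log['abc']['def']"])
```

For the dummy record from the example, `body` is `"bill"` and `attributes` holds `meta`, `lastname` and `log`, which lines up with the Collector output quoted above.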
In my understanding, first |
@edsiper Thank you.
I think it may affect user request #8205.
Once |
I tested the following configuration, and it enriches the Attributes field. Following
receivers:
  otlp:
    protocols:
      http:
        endpoint: localhost:6969
exporters:
  file:
    path: 2_out.json
service:
  telemetry:
    logs:
      level: debug
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [file]

Output of otel-collector

{
"resourceLogs": [
{
"resource": {},
"scopeLogs": [
{
"scope": {},
"logRecords": [
{
"timeUnixNano": "1708134916475903956",
"body": {
"stringValue": "bill"
},
"attributes": [
{
"key": "ObservedTimestamp",
"value": {
"intValue": "1234"
}
},
{
"key": "Timestamp",
"value": {
"intValue": "3456"
}
},
{
"key": "SeverityText",
"value": {
"stringValue": "WARN"
}
},
{
"key": "SeverityNumber",
"value": {
"intValue": "5"
}
},
{
"key": "TraceFlags",
"value": {
"intValue": "13"
}
},
{
"key": "Attributes",
"value": {
"kvlistValue": {
"values": [
{
"key": "log.file.name",
"value": {
"stringValue": "a.log"
}
},
{
"key": "test",
"value": {
"stringValue": "val"
}
}
]
}
}
},
{
"key": "lastname",
"value": {
"stringValue": "gates"
}
},
{
"key": "log",
"value": {
"kvlistValue": {
"values": [
{
"key": "abc",
"value": {
"kvlistValue": {
"values": [
{
"key": "def",
"value": {
"intValue": "123"
}
}
]
}
}
},
{
"key": "saveme",
"value": {
"boolValue": true
}
}
]
}
}
}
],
"traceId": "",
"spanId": ""
}
]
}
]
}
]
} |
Thank you for working on this. I was able to compile this branch locally and test it. Here are my results. I started OpenTelemetry and Fluent Bit with the following respective configurations.

otel-config.yaml

---
receivers:
  otlp:
    protocols:
      grpc:
      http:
        endpoint: "0.0.0.0:4318"
processors:
  batch:
exporters:
  file:
    path: otel-output.json
service:
  telemetry:
    logs:
      level: debug
  pipelines:
    logs:
      receivers:
        - otlp
      processors:
        - batch
      exporters:
        - file

fluent-bit-config.yaml

---
pipeline:
  inputs:
    - name: dummy
      tag: dummy
      metadata: |
        {
          "resource-attr": "resource-attr-val-1"
        }
      # This log record is taken directly from OpenTelemetry's Examples page.
      # https://opentelemetry.io/docs/specs/otel/protocol/file-exporter/#examples
      dummy: |
        {
          "severityNumber": 9,
          "severityText": "Info",
          "name": "logA",
          "message": "This is a log message",
          "app": "server",
          "instance_num": 1,
          "traceId": "08040201000000000000000000000000",
          "spanId": "0102040800000000"
        }
  outputs:
    - name: opentelemetry
      match: "*"
      host: localhost
      port: 4318
      logs_body_key: $message
    # For live debugging on the Fluent Bit side.
    - name: stdout
      match: "*"
      format: json

Here's what I received in OpenTelemetry. This is a much better experience than before. However, it's still not exactly what I would expect when shipping logs to an OpenTelemetry endpoint.

actual-logs.json

{
"resourceLogs": [
{
"resource": {},
"scopeLogs": [
{
"scope": {},
"logRecords": [
{
"timeUnixNano": "1708201530553920000",
"body": {
"stringValue": "This is a log message"
},
"attributes": [
{
"key": "resource-attr",
"value": {
"stringValue": "resource-attr-val-1"
}
},
{
"key": "severityNumber",
"value": {
"intValue": "9"
}
},
{
"key": "severityText",
"value": {
"stringValue": "Info"
}
},
{
"key": "name",
"value": {
"stringValue": "logA"
}
},
{
"key": "app",
"value": {
"stringValue": "server"
}
},
{
"key": "instance_num",
"value": {
"intValue": "1"
}
},
{
"key": "traceId",
"value": {
"stringValue": "08040201000000000000000000000000"
}
},
{
"key": "spanId",
"value": {
"stringValue": "0102040800000000"
}
}
],
"traceId": "",
"spanId": ""
}
]
}
]
}
]
}

Instead, I would have expected to receive the following where the

expected-logs.json

{
"resourceLogs": [
{
"resource": {
"attributes": [
{
"key": "resource-attr",
"value": {
"stringValue": "resource-attr-val-1"
}
}
]
},
"scopeLogs": [
{
"scope": {},
"logRecords": [
{
"timeUnixNano": "1708201530553920000",
"severityNumber": 9,
"severityText": "Info",
"name": "logA",
"body": {
"stringValue": "This is a log message"
},
"attributes": [
{
"key": "app",
"value": {
"stringValue": "server"
}
},
{
"key": "instance_num",
"value": {
"intValue": "1"
}
}
],
"traceId": "08040201000000000000000000000000",
"spanId": "0102040800000000"
}
]
}
]
}
]
}

I believe we need to answer the following questions before the Fluent Bit
|
@sudomateo thanks for the feedback. Regarding other Log model fields that are not mapped yet, @nokute78 is working on #8475 to support those dynamically. The concept is as follows, let's use
Through #8475 the expectation is to do that for all the missing keys that are part of the Log data model.

Regarding the questions:

1. How does Fluent Bit's event metadata relate to OpenTelemetry's Logs Data Model?

Fluent Bit is schema-less, meaning that it can adapt to any type of schema; we provide configurations like the one we are working on to solve specific use cases. In logging, we have to deal with tons of data that is structured but doesn't follow a schema, while OTel data does. We just need to finish up these mapping details.

2. How does a Fluent Bit operator populate top-level fields (i.e., TraceId, SpanId, SeverityNumber, etc.) in OpenTelemetry's Logs Data Model? This pull request is focused on solving populating the Body and Attributes fields, but what about the rest?

This PR, #8491, only focuses on fixing how to populate the body and attributes when the incoming data does not come from OTel instrumentation. The rest will be implemented in #8475; note that we have other PRs open to handle OTel on the input side, while here we are focusing on the output.

For this PR, I will make some tweaks so that adding the remaining keys that were not used in the body as attributes is optional, or so that specific attributes can be selected from the record.

Again, thanks for the feedback on this; to get it right we need this type of testing and interaction :) |
…: false)

When logs_body_key was added, it added all the unmatched keys as attributes by default, but there are many cases where this is not desired. To keep this flexible for the user, this patch adds a new config option called 'logs_body_key_attributes', which takes a boolean value. When it is enabled, the remaining keys are added as attributes; by default it is disabled.

Signed-off-by: Eduardo Silva <[email protected]>
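As a rough illustration of what the new boolean changes, the Python sketch below builds the attribute map with the option off (the default) and on. `build_attributes` is a hypothetical helper, only a top-level matched key is handled, and the assumption that native metadata is added as attributes regardless of the flag follows the description of the earlier commit; this is not the plugin's actual code.

```python
def build_attributes(record, metadata, matched_key, logs_body_key_attributes=False):
    """Model of attribute population once a body key has matched.

    matched_key is the top-level key whose value became the log body.
    """
    # Assumption: native metadata keys are added as attributes in both modes.
    attributes = dict(metadata)
    if logs_body_key_attributes:
        # Only when the option is enabled do the unused record keys follow.
        attributes.update({k: v for k, v in record.items() if k != matched_key})
    return attributes


record = {"message": "This is a log message", "app": "server", "instance_num": 1}
metadata = {"resource-attr": "resource-attr-val-1"}

default_attrs = build_attributes(record, metadata, "message")
enabled_attrs = build_attributes(record, metadata, "message",
                                 logs_body_key_attributes=True)
```

With the default, `default_attrs` contains only the metadata key; with the option enabled, `enabled_attrs` also carries `app` and `instance_num`, while `message` stays out because it was consumed as the body.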
@nokute78 I have added a new option called |
To move the OTel fixes forward ASAP, I am merging this PR, and we will continue iterating on the others to fill the end-user gaps. |
Thank you for working on this @edsiper! I agree with your assessment to merge this and continue iterating in the other pull requests. |
@edsiper This is good, but it still does not address the main concerns of my ticket (#8359 (comment)). The traceid and spanid are still not being set. It seems you're saying the rest will be fixed as part of #8475? However, it seems that PR has gone stale. |
May I know what your pod YAML looks like? I always get a permission issue when I try to write to a file from an AKS cluster. Thanks! |
If you look at my changes, they address your concerns and the rest of the fields. |
The following patch fixes and enhances the OpenTelemetry output connector when handling log records.
In the Fluent Bit world, we deal with tons of unstructured log records that come from a variety of sources, or simply from raw text files. When converting those lines to structured messages, there was no option to define what would become the log body and the log attributes; everything was packaged inside the log body by default.
This patch enhances the previous behavior by allowing the following:
- log body: optionally define multiple record accessor patterns that try to match a key or sub-key from the record structure. For the first matched key, its value is used as the body content. If no match exists, the whole record is set inside the body.
- log attributes: if the log record contains native metadata, the keys are added as OpenTelemetry Attributes. If the log body was populated by using a record accessor pattern as described above, the remaining keys that were not used can optionally be added as attributes as well if the new option logs_body_key_attributes is set to true (default: false).
To achieve the new behavior, the configuration needs to use the new configuration property called logs_body_key, which can be used to define multiple record accessor patterns, e.g.:
In the example above, the dummy input plugin generates a record with metadata. On the output side, the plugin looks up $name and then $log['abc']['def'], in that order. $name matches, so bill becomes the value of the body and the remaining content becomes attributes.
Here is the output of the vanilla OTel Collector when inspecting the content it receives:
updates logs_body_key_attributes.
Fluent Bit is licensed under Apache 2.0; by submitting this pull request I understand that this code will be released under the terms of that license.