-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add nesting depth limit to SpanFormatter #16
base: airbnb-main
Are you sure you want to change the base?
Conversation
@@ -21,6 +21,7 @@ public class SpanFormatter { | |||
|
|||
public static final String DEFAULT_LOG_MESSAGE_TYPE = "INFO"; | |||
public static final String DEFAULT_INDEX_NAME = "unknown"; | |||
public static final int DEPTH_LIMIT = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add a TODO for now to move it to schema config.
List<Trace.KeyValue> tags = new ArrayList<>(); | ||
if (value instanceof Map) { | ||
// todo - consider adding a depth param to prevent excessively nested fields | ||
if (depthLimit <= 0) { | ||
return tags; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of returning empty list of tags, can we instead return a string blob that is un-indexed? Since logging systems are often used as debugging systems, it would be ideal to ingest data where possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
argh. Yeah. We'll need to do that now, but it'll be sort of broken. I'd assumed that the _source field would retain the original json, but this change here: https://github.com/slackhq/astra/pull/816/files#diff-3e3f1e6e9032267dd01c35b0f82f031bd633e1f08355002149316079bbe5711b broken that. It got rid of the code that reified the original document in favor of flattening everything. I'm not sure it was an intended because it isn't mentioned in the PR and there are no tests that cover the change.
Kinda like this:
Before:
{"a":{"b":1}}
=> {"a.b":1,"_source":{"a":{"b":1}}}
After:
{"a":{"b":1}}
=> {"a.b":1,"_source":{"a.b":1}}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. Maybe not exactly that PR, but certainly from that series of changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we instead return a string blob that is un-indexed?
Yes. I'll mark it as BINARY, which will cause it to be unindexed but still be injected into the document correctly.
Summary
Currently the preprocessor will generate fields for all the nested keys in a given document even if they are highly nested. This can cause problems if there are many keys or keys that represent ids with a large cardinality. The index node's equivalent code has a limit on how deep to nest, but the preprocessor's code path just had a TODO.
This patch adds a depth limit so that fields that are too deeply nested are handled differently.
Currently it just ignores them, but it could do something else.
Rather than deciding on a place to put the depth limit in config, I hard coded it in a constant, but I'm thinking it'd make sense to add it to the schema config.
Requirements