Add nesting depth limit to SpanFormatter #16

Open · baroquebobcat wants to merge 7 commits into airbnb-main
Conversation

baroquebobcat

Summary

Currently the preprocessor will generate fields for all of the nested keys in a given document, even if they are deeply nested. This can cause problems if there are many keys, or if keys represent IDs with high cardinality. The index node's equivalent code has a limit on how deep it will nest, but the preprocessor's code path just had a TODO.

This patch adds a depth limit so that fields that are too deeply nested are handled differently.

Currently it just ignores them, but it could do something else.

Rather than deciding on a place to put the depth limit in config, I hard-coded it in a constant, but I'm thinking it'd make sense to add it to the schema config.
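
For illustration, here is a minimal, self-contained sketch of the recursion shape this patch adds. The flattener helper below is hypothetical (the real method in SpanFormatter builds Trace.KeyValue tags), but the idea is the same: each level of nesting consumes one unit of the remaining depth budget, and subtrees deeper than the limit are dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a depth-limited flattener; the real SpanFormatter
// produces Trace.KeyValue tags, but the recursion shape is the same.
public class DepthLimitedFlattener {
  // Hard-coded for now; the PR discussion suggests moving this to schema config later.
  public static final int DEPTH_LIMIT = 3;

  public static List<String> flatten(String prefix, Object value, int depthLimit) {
    List<String> tags = new ArrayList<>();
    if (value instanceof Map) {
      // Too deep: skip the whole subtree (the current behaviour of the patch).
      if (depthLimit <= 0) {
        return tags;
      }
      for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
        // Each level of nesting consumes one unit of the remaining depth budget.
        tags.addAll(flatten(prefix + "." + entry.getKey(), entry.getValue(), depthLimit - 1));
      }
    } else {
      tags.add(prefix + "=" + value);
    }
    return tags;
  }

  public static void main(String[] args) {
    Map<String, Object> doc = Map.of("a", Map.of("b", Map.of("c", Map.of("d", 1))));
    // "a.b.c.d" is four levels deep, so with DEPTH_LIMIT = 3 it is dropped.
    System.out.println(flatten("doc", doc, DEPTH_LIMIT));
  }
}
```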

Requirements

@@ -21,6 +21,7 @@ public class SpanFormatter {

   public static final String DEFAULT_LOG_MESSAGE_TYPE = "INFO";
   public static final String DEFAULT_INDEX_NAME = "unknown";
+  public static final int DEPTH_LIMIT = 3;

We can add a TODO for now to move it to schema config.
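
Not part of this patch, but for illustration, a hedged sketch of what moving the limit into schema config might look like. The schema-config lookup shown here is hypothetical and not an existing Astra API.

```java
import java.util.Map;

class DepthLimitConfig {
  // Hypothetical: read the limit from a schema-config map if present,
  // otherwise fall back to the hard-coded SpanFormatter.DEPTH_LIMIT default.
  static int resolveDepthLimit(Map<String, String> schemaConfig) {
    String configured = schemaConfig.get("depthLimit");
    return configured != null ? Integer.parseInt(configured) : 3; // SpanFormatter.DEPTH_LIMIT
  }

  public static void main(String[] args) {
    System.out.println(resolveDepthLimit(Map.of("depthLimit", "5"))); // 5
    System.out.println(resolveDepthLimit(Map.of()));                  // 3
  }
}
```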

  List<Trace.KeyValue> tags = new ArrayList<>();
  if (value instanceof Map) {
    // todo - consider adding a depth param to prevent excessively nested fields
    if (depthLimit <= 0) {
      return tags;

Instead of returning an empty list of tags, can we instead return a string blob that is un-indexed? Since logging systems are often used as debugging systems, it would be ideal to ingest the data where possible.

baroquebobcat (Author)

Argh, yeah. We'll need to do that now, but it'll be sort of broken. I'd assumed that the _source field would retain the original JSON, but this change broke that: https://github.com/slackhq/astra/pull/816/files#diff-3e3f1e6e9032267dd01c35b0f82f031bd633e1f08355002149316079bbe5711b. It got rid of the code that reified the original document in favor of flattening everything. I'm not sure it was intended, because it isn't mentioned in the PR and there are no tests that cover the change.

Kinda like this:
Before:
{"a":{"b":1}} => {"a.b":1,"_source":{"a":{"b":1}}}
After:
{"a":{"b":1}} => {"a.b":1,"_source":{"a.b":1}}

baroquebobcat (Author)

Hm. Maybe not exactly that PR, but certainly from that series of changes.

baroquebobcat (Author)

> can we instead return a string blob that is un-indexed?

Yes. I'll mark it as BINARY, which will cause it to be un-indexed but still be injected into the document correctly.
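
For illustration, a hedged sketch of that fallback: serialize the too-deep subtree to JSON and attach it as a BINARY-typed tag. The Trace.KeyValue builder and field names below follow a Jaeger-style proto and may not match the exact names in this codebase.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.protobuf.ByteString;
import java.util.Map;

class DeepSubtreeFallback {
  private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();

  // Sketch only: serialize the too-deeply-nested subtree to a JSON string and
  // store it as a BINARY-typed tag so it is not indexed but still ends up in
  // the ingested document. Builder/field names are assumed, not verified.
  static Trace.KeyValue binaryBlobTag(String key, Map<?, ?> subtree) throws Exception {
    String json = OBJECT_MAPPER.writeValueAsString(subtree);
    return Trace.KeyValue.newBuilder()
        .setKey(key)
        .setVType(Trace.ValueType.BINARY)
        .setVBinary(ByteString.copyFromUtf8(json))
        .build();
  }
}
```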
