Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add nesting depth limit to SpanFormatter #16

Open
wants to merge 7 commits into
base: airbnb-main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 13 additions & 2 deletions astra/src/main/java/com/slack/astra/writer/SpanFormatter.java
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ public class SpanFormatter {

public static final String DEFAULT_LOG_MESSAGE_TYPE = "INFO";
public static final String DEFAULT_INDEX_NAME = "unknown";
public static final int DEPTH_LIMIT = 3;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add a TODO for now to move it to schema config.


public static Timestamp parseDate(String dateStr, Schema.SchemaFieldType type) {
Instant instant;
Expand Down Expand Up @@ -148,14 +149,24 @@ public static List<Trace.KeyValue> convertKVtoProto(
@VisibleForTesting
public static List<Trace.KeyValue> convertKVtoProtoDefault(
String key, Object value, Schema.IngestSchema schema) {
return convertKVtoProtoDefault(key, value, schema, DEPTH_LIMIT - 1);
}

@VisibleForTesting
public static List<Trace.KeyValue> convertKVtoProtoDefault(
String key, Object value, Schema.IngestSchema schema, int depthLimit) {
List<Trace.KeyValue> tags = new ArrayList<>();
if (value instanceof Map) {
// todo - consider adding a depth param to prevent excessively nested fields
if (depthLimit <= 0) {
return tags;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of returning empty list of tags, can we instead return a string blob that is un-indexed? Since logging systems are often used as debugging systems, it would be ideal to ingest data where possible.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

argh. Yeah. We'll need to do that now, but it'll be sort of broken. I'd assumed that the _source field would retain the original json, but this change here: https://github.com/slackhq/astra/pull/816/files#diff-3e3f1e6e9032267dd01c35b0f82f031bd633e1f08355002149316079bbe5711b broken that. It got rid of the code that reified the original document in favor of flattening everything. I'm not sure it was an intended because it isn't mentioned in the PR and there are no tests that cover the change.

Kinda like this:
Before:
{"a":{"b":1}} => {"a.b":1,"_source":{"a":{"b":1}}}
After:
{"a":{"b":1}} => {"a.b":1,"_source":{"a.b":1}}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. Maybe not exactly that PR, but certainly from that series of changes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we instead return a string blob that is un-indexed?
Yes. I'll mark it as BINARY, which will cause it to be unindexed but still be injected into the document correctly.

}

((Map<?, ?>) value)
.forEach(
(key1, value1) -> {
List<Trace.KeyValue> nestedValues =
convertKVtoProtoDefault(String.format("%s.%s", key, key1), value1, schema);
convertKVtoProtoDefault(
String.format("%s.%s", key, key1), value1, schema, depthLimit - 1);
tags.addAll(nestedValues);
});
} else if (value instanceof String || value instanceof List) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import org.apache.commons.io.FileUtils;
import org.apache.lucene.document.Document;
Expand Down Expand Up @@ -157,6 +158,44 @@ public void canParseSimpleMapValues() throws Exception {
.isEqualTo("subsubvalue1");
}

@Test
@SuppressWarnings("OptionalGetWithoutIsPresent")
public void stopsAtDepthN() throws Exception {

Function<List<String>, Map<String, Object>> nestedMap =
(keyPath) -> {
Object value = "value";
for (String k : keyPath.reversed()) {
value = Map.of(k, value);
}
return (Map<String, Object>) value;
};
Map<String, Object> head = nestedMap.apply(List.of("a", "b", "c", "d", "e", "f"));
List<Trace.KeyValue> list;
list = SpanFormatter.convertKVtoProtoDefault("init", head, schema);
assertThat(list.size()).isEqualTo(0);

list = SpanFormatter.convertKVtoProtoDefault("init", head, schema, 1);
assertThat(list.size()).isEqualTo(0);

list = SpanFormatter.convertKVtoProtoDefault("init", head, schema, 2);
assertThat(list.size()).isEqualTo(0);

list = SpanFormatter.convertKVtoProtoDefault("init", head, schema, 6);
assertThat(list.size()).isEqualTo(1);
assertThat(list.getFirst().getKey()).isEqualTo("init.a.b.c.d.e.f");

list =
SpanFormatter.convertKVtoProtoDefault("init", nestedMap.apply(List.of("a", "b")), schema);
assertThat(list.size()).isEqualTo(1);
assertThat(list.getFirst().getKey()).isEqualTo("init.a.b");

list =
SpanFormatter.convertKVtoProtoDefault(
"init", nestedMap.apply(List.of("a", "b", "c")), schema);
assertThat(list.size()).isEqualTo(0);
}

@Test
public void parseIndexRequestWithNullValues() throws Exception {
final File schemaFile =
Expand Down
Loading