Add error.stack_trace #562

felixbarny · 2019-09-16T13:23:15Z

Based on discussions in #154 and elastic/ecs-logging-java#25.

One missing piece when using ECS for logging is that there's no dedicated field for the stack trace.

We have discussed having a schema for a parsed version of the stack trace. But I don't think that's needed to get started and for some users, that may not even be desired as it would be less readable in the Discover tab in Kibana.

We can always add a schema for a parsed variant of a stack trace later and use the key error.stack_trace_parsed or similar for that. But at the moment we don't really have a use-case of what to do with a parsed stack trace.

That's why this PR just adds a dedicated field for the plain-text stack trace to the error schema.

With this addition, we can do the following mapping:
error.code: exception class name
error.message: exception message
error.stack_trace: plain-text exception stack trace

/cc @pgomulka

felixbarny · 2019-09-16T13:23:22Z

Question: would it be allowed schema-wise if that field was an array? I think it would, right? One use-case for that is to put the different lines of a stack trace in different array elements so that they are human-readable: https://github.com/elastic/beats/pull/9988/files#diff-a639ebd9050d80e47bd50d53210aab49R57

One challenge with that approach is that Filebeat does not really like multi-line JSON. But there's a way to make it work:

filebeat.inputs:
  - type: log
    paths: test.log.json
    # we can't use the normal json parsing as they require the json to be a single line only
    # json.keys_under_root: true
    # json.message_key: message
    multiline.pattern: '^{'
    multiline.negate: true
    multiline.match: after
processors:
  - decode_json_fields:
      fields: message
      max_depth: 1
      target: ""
      overwrite_keys: true
  # in case error.stack_trace can't be an array, join the error.stack_trace_lines to a string
  - script:
      when:
        has_fields: ['error.stack_trace_lines']
      lang: javascript
      id: flatten_stack_trace_lines
      source: >
        function process(event) {
            event.Put("error.stack_trace", event.Get("error.stack_trace_lines").join("\n"));
            event.Delete("error.stack_trace_lines");
        }

felixbarny · 2019-09-16T13:39:04Z

When we get this in I think it makes sense to update the default layout of the Logs UI to include the stack trace.

/cc @weltenwort

depends on elastic/ecs#562

mustafacagataytulun · 2019-09-18T05:27:20Z

@felixbarny I'm not the reviewer but here is my two cents: I haven't think or used stack trace as an array in my entire career, and also haven't require it to be an array. Moreover, stack trace is a single string field in most of programming languages. It might add a little processing overhead to split it line by line into an array.

depends on elastic/ecs#562

felixbarny · 2019-09-18T13:38:54Z

I should have provided more context.

The array is solely used to make the logs more readable. Having the whole stacktrace in one JSON string forces the contents to be in a single line, which is not very readable to say the least.

Compare that to this variant where each line in a stack trace is rendered as a different array element in a new line:

{"@timestamp":"2019-09-17T13:16:48.038Z", "log.level":"ERROR", "message":"Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is java.lang.RuntimeException: Expected: controller used to showcase what happens when an exception is thrown] with root cause", "process.thread.name":"http-nio-8080-exec-1","log.logger":"org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/].[dispatcherServlet]","log.origin":{"file":"DirectJDKLog.java","function":"log","line":175},"error.code":"java.lang.RuntimeException","error.message":"Expected: controller used to showcase what happens when an exception is thrown","error.stack_trace":[
	"java.lang.RuntimeException: Expected: controller used to showcase what happens when an exception is thrown",
	"\tat org.springframework.samples.petclinic.system.CrashController.triggerException(CrashController.java:33)",
	"\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)",
	"\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)",
	"\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)",
	"\tat java.lang.reflect.Method.invoke(Method.java:498)",
	"\tat org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)",
	"\tat java.lang.Thread.run(Thread.java:748)"]}

That makes the JSON logs actually human-readable so that there shouldn't be a need to have plain-text logs as a fallback. Even during development, this format should be good enough, at least for my taste.

With the Filebeat configuration I have pasted above you can also convert the array back to a regular string. But if ECS supports a value to either be a scalar or array value (as Elasticsearch/Lucene does), joining the array to a single string would be optional.

It might add a little processing overhead to split it line by line into an array.

It takes some processing overhead to merge the array back to a string but rendering as an array does not cause overhead. At least not with my implementation here 🙂:
https://github.com/elastic/java-ecs-logging/blob/33ae6d4eeaabb84b904ecfaad80b965a7c02bb47/ecs-logging-core/src/main/java/co/elastic/logging/EcsJsonSerializer.java#L192-L208

ruflin · 2019-09-18T14:12:44Z

schemas/error.yml

@@ -28,3 +28,11 @@
      type: keyword
      description: >
        Error code describing the error.
+
+    - name: stack_trace
+      level: core


Let's make this extended.

ruflin · 2019-09-18T14:15:54Z

@felixbarny You opened unintentionally an other can of worms with this. Overall, ++ on the proposal and yes every field can also be array in lucene. So lets get to the other problem: What does actually error mean in ECS? Is it the error from the event like a stack trace or error in the sense of APM or is it an error that can happen during processing (for example from the ingest pipeline). Unfortunately we use it for both at the moment. As long as only 1 error happens, all is fine, but if we start mixing the two in one event, things become awkward.

felixbarny · 2019-09-18T15:34:01Z

@webmat already hinted that it probably makes sense to store pipeline errors in a pipeline namespace. I agree with that. The pipeline should have a reserved namespace so that it does not conflict or otherwise interfere with the rest of the ECS spec.

For the time being, I think we can live with the fact that error.message is also used if there are processing errors. Luckily, in that case, it adds a message instead of overriding it, making error.message an array. It's awkward but it's not the end of the world.

An error can be an APM error or an error logged by a logger. Or even both. In the Java agent, we are creating APM errors when logger.error("message", exception); is called. We then annotate the log with the error.id used by APM so that the log line can be correlated with the APM error.

mustafacagataytulun · 2019-09-19T06:15:28Z

Having the whole stacktrace in one JSON string forces the contents to be in a single line, which is not very readable to say the least.

Actually, it is readable in Kibana because line endings (\r\n) are included in stack trace (at least when using C#). Kibana elegantly shows them line-by-line. However, as I look at your example, @felixbarny, you refer the readability of raw JSON. Fair enough. :)

It takes some processing overhead to merge the array back to a string but rendering as an array does not cause overhead. At least not with my implementation here 🙂:
https://github.com/elastic/java-ecs-logging/pull/28/files#diff-02b493ca1092b92c519dc566f2ee10eeR192-R208

I was actually referring to the conversion of the single string of stack trace into an array line-by-line, because programming languages generally gives you the whole stack trace as a single string.

Here is another consideration (it comes to my mind when I read @ruflin's comment): Semantically speaking, making stack_trace field an array may be interpreted as each element of the array is a different stack trace. But it can also be my misinterpretation of course. :)

ruflin

I'm good with merging this as is. I think all the other discussions we can follow up. Also if array or not: In Lucene everything is an array anyways, so it's more a question of documentation.

webmat · 2019-09-23T14:42:50Z

The error field set should only encapsulate the metadata of the error, such as the stack trace. So I agree with this addition.

webmat · 2019-09-23T14:48:30Z

As for the array v string discussion: I think it would make sense keep this as the original string for now. If we want an array representation in Elasticsearch, I think we need to merge this idea with that of the structured / parsed exception details.

From ECS point of view, the log readability discussion is unrelated, IMO. You're free to do what you want in your log file format, as long as it looks like a single block of text when it's saved to Elasticsearch.

In other words, if you need readable JSON logs (do these exist? ;-) ), feel free to split into an array in there, but then re-join them to a block of text prior to indexing.

weltenwort · 2019-09-23T15:24:20Z

Even though every field can be an array in Elasticsearch there's still a semantic difference between "line1\nline2" and ["line1", "line2"] in terms of analysis and querying, right? The former might work better with various Elasticsearch features like custom tokenization or proximity, slop and other span-based queries.

felixbarny · 2019-09-24T14:36:37Z

I think this is actually pretty readable and probably good enough so that most people don't need fallback plain-text logs anymore.

{"@timestamp":"2019-08-06T12:09:12.375Z", "log.level": "INFO", "message":"Tomcat started on port(s): 8080 (http) with context path ''", "service.name":"spring-petclinic","process.thread.name":"restartedMain","log.logger":"org.springframework.boot.web.embedded.tomcat.TomcatWebServer"}
{"@timestamp":"2019-08-06T12:09:12.379Z", "log.level": "INFO", "message":"Started PetClinicApplication in 7.095 seconds (JVM running for 9.082)", "service.name":"spring-petclinic","process.thread.name":"restartedMain","log.logger":"org.springframework.samples.petclinic.PetClinicApplication"}
{"@timestamp":"2019-08-06T14:08:40.199Z", "log.level":"DEBUG", "message":"init find form", "service.name":"spring-petclinic","process.thread.name":"http-nio-8080-exec-8","log.logger":"org.springframework.samples.petclinic.owner.OwnerController","transaction.id":"28b7fb8d5aba51f1","trace.id":"2869b25b5469590610fea49ac04af7da"}
{"@timestamp":"2019-09-17T13:16:48.038Z", "log.level":"ERROR", "message":"Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is java.lang.RuntimeException: Expected: controller used to showcase what happens when an exception is thrown] with root cause", "process.thread.name":"http-nio-8080-exec-1","log.logger":"org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/].[dispatcherServlet]","log.origin":{"file.name":"DirectJDKLog.java","function":"log","file.line":175},"error.type":"java.lang.RuntimeException","error.message":"Expected: controller used to showcase what happens when an exception is thrown","error.stack_trace":[
	"java.lang.RuntimeException: Expected: controller used to showcase what happens when an exception is thrown",
	"\tat org.springframework.samples.petclinic.system.CrashController.triggerException(CrashController.java:33)",
	"\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)",
	"\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)",
	"\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)",
	"\tat java.lang.reflect.Method.invoke(Method.java:498)",
	"\tat org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)",
	"\tat java.lang.Thread.run(Thread.java:748)"]}

The benefit of not parsing out the different elements of a stack trace is that IDEs would still link to the source files, as they would with normal stack traces, because they recognize the structure.

Of course, having plain-text logs as a fallback may not really be needed to begin with but I think it does help people to transition to centralized logging.

feel free to split into an array in there, but then re-join them to a block of text prior to indexing.

I can live with that but I'd also like to discuss options on how to make that simpler in Filebeat. At the moment, Filebeat does not really like multi-line JSONs. Also, a built-in processor which transforms an array to a string would be cool in order to make that use-case more of a first-class citizen. Will take this to the beats repo.

Personally, I think it makes sense to normalize it to be a scalar string just from a data consistency standpoint. When looking at a log event in discover, I'd much rather see error.stack_trace as opposed to error.stack_trace.0, error.stack_trace.1, ...

@pgomulka you raised some concerns about that, wdyt?

there's still a semantic difference between "line1\nline2" and ["line1", "line2"] in terms of analysis and querying, right?

In terms of analyzing and querying it would be the same in this case as error.stack_trace is not indexed at all because the field can be quite large.

ecs/schemas/error.yml

Line 42 in 58d5bc6

index: false

pgomulka · 2019-09-24T17:56:34Z

my main general concern (not only to error field but other fields too) is readability of these logs.
I like the way it looks now in your example - Compact and human readable.

I am worried however if ECS will not diverge into more machine oriented protocol. This might not be terribly bad if we assume that there always will have to be a process in the middle with a rich logic to transform human readable logs into ECS schema.
If however ECS will be a default schema for JSON logs (either Elasticsearch server or any other app) then we verbosity and readability might become an issue.
As pointed out earlier, JSON format already makes reading logs harder, it might put people off from trying to make their app's logs ECS compliant. My impression is that this is the goal in elastic ecosystem at least?

webmat · 2019-09-25T13:02:59Z

Elasticsearch supports plaintext and NDJSON log output, correct?

NDJSON logs are by definition a single JSON object per line. And regardless of the semantics of whether we strive for NDJSON there, I think JSON logs should be optimized for programmatic intake in general.

If a user wants human readability they should pick the plaintext log format.

Finally, if a developer needs the JSON logs to be more human-readable, they are free to set compact=false in their log4j configs, and have the whole thing prettified for human-readable JSON events.

In short, I think the JSON log outputs we offer should be NDJSON.

pgomulka · 2019-09-25T13:21:01Z

@webmat at the moment Elasticsearch supports both plaintext and multiline JSON logs (error stacktraces being multiline like in examples by @felixbarny ). The plan at the moment was to remove plaintext logs at some point and keep only JSON logs. This is already done for elasticsearch in docker.

If we maintain both plaintext and JSON we might skip the readability of JSON logs at all. However keeping both forces us to maintain duplicated configs and would make it hard to populate all the information in both logs. ES doesn't generate all too much logs, but disk space usage might be an issue to consider too.
I feel like this is open for a discussion though .

In regards to compact=false I don't think we would need this. If any application decides to use JSON logs as the only output, it would need them to be readable.
If it would maintain both plaintext and JSON, it might keep plaintext readable and JSON as machine intake easy as possible.

Add error.stack_trace

a752dca

felixbarny requested a review from webmat September 16, 2019 13:23

Update PR number in changelog

acd0c48

felixbarny mentioned this pull request Sep 16, 2019

Using ECS to track the source of errors or exceptions #154

Open

felixbarny mentioned this pull request Sep 16, 2019

Formatting of stacktraces elastic/ecs-logging-java#25

Closed

felixbarny added a commit to felixbarny/java-ecs-logging that referenced this pull request Sep 16, 2019

Serialize stack traces in error.stack_trace

06393d6

depends on elastic/ecs#562

felixbarny added a commit to felixbarny/java-ecs-logging that referenced this pull request Sep 16, 2019

Serialize stack traces in error.stack_trace

1607cd4

depends on elastic/ecs#562

felixbarny mentioned this pull request Sep 16, 2019

Serialize stack traces in error.stack_trace elastic/ecs-logging-java#28

Merged

felixbarny added a commit to felixbarny/java-ecs-logging that referenced this pull request Sep 18, 2019

Serialize stack traces in error.stack_trace

3bce698

depends on elastic/ecs#562

ruflin reviewed Sep 18, 2019

View reviewed changes

Make error.stack_trace extended

4e61af8

ruflin approved these changes Sep 19, 2019

View reviewed changes

felixbarny merged commit 4109312 into elastic:master Sep 19, 2019

felixbarny deleted the stack-trace branch September 19, 2019 09:27

felixbarny mentioned this pull request Sep 20, 2019

[Logs UI] Support error.stack_trace elastic/kibana#46211

Closed

felixbarny mentioned this pull request Sep 25, 2019

New event fields to be able to store info from log4j, log4net, nlog, ... frequently used log frameworks #97

Closed

felixbarny mentioned this pull request Jul 29, 2020

Support for 'log.origin.module' #785

Closed

Mpdreamz mentioned this pull request Feb 10, 2023

Support serializing exceptions like Serilog.Sinks.Elasticsearch elastic/ecs-dotnet#158

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add error.stack_trace #562

Add error.stack_trace #562

felixbarny commented Sep 16, 2019

felixbarny commented Sep 16, 2019

felixbarny commented Sep 16, 2019

mustafacagataytulun commented Sep 18, 2019

felixbarny commented Sep 18, 2019 •

edited

Loading

ruflin Sep 18, 2019

ruflin commented Sep 18, 2019

felixbarny commented Sep 18, 2019

mustafacagataytulun commented Sep 19, 2019

ruflin left a comment

webmat commented Sep 23, 2019

webmat commented Sep 23, 2019

weltenwort commented Sep 23, 2019

felixbarny commented Sep 24, 2019 •

edited

Loading

pgomulka commented Sep 24, 2019

webmat commented Sep 25, 2019 •

edited

Loading

pgomulka commented Sep 25, 2019

Add error.stack_trace #562

Add error.stack_trace #562

Conversation

felixbarny commented Sep 16, 2019

felixbarny commented Sep 16, 2019

felixbarny commented Sep 16, 2019

mustafacagataytulun commented Sep 18, 2019

felixbarny commented Sep 18, 2019 • edited Loading

ruflin Sep 18, 2019

Choose a reason for hiding this comment

ruflin commented Sep 18, 2019

felixbarny commented Sep 18, 2019

mustafacagataytulun commented Sep 19, 2019

ruflin left a comment

Choose a reason for hiding this comment

webmat commented Sep 23, 2019

webmat commented Sep 23, 2019

weltenwort commented Sep 23, 2019

felixbarny commented Sep 24, 2019 • edited Loading

pgomulka commented Sep 24, 2019

webmat commented Sep 25, 2019 • edited Loading

pgomulka commented Sep 25, 2019

felixbarny commented Sep 18, 2019 •

edited

Loading

felixbarny commented Sep 24, 2019 •

edited

Loading

webmat commented Sep 25, 2019 •

edited

Loading