
JSON logging can't be ingested into Elasticsearch #7610

Open
jimmyjones2 opened this issue Jul 7, 2017 · 6 comments

Comments

@jimmyjones2

Using Logstash 5.4.3:

echo '{"foo":1}' | logstash-5.4.3/bin/logstash -f abc.conf --log.format json
echo '{"foo":"bar"}' | logstash-5.4.3/bin/logstash -f abc.conf --log.format json

with the following config:

input {
  stdin {
    codec => json
  }
}
output {
  elasticsearch {
  }
}

this produces the following JSON log:

{"level":"WARN","loggerName":"logstash.outputs.elasticsearch","timeMillis":1499431239669,"thread":"[main]>worker2","logEvent":{"message":"Could not index event to Elasticsearch.","status":400,"action":["index",{"_id":null,"_index":"logstash-2017.07.07","_type":"logs","_routing":null},{"metaClass":{"metaClass":{"metaClass":{"action":"[\"index\", {:_id=>nil, :_index=>\"logstash-2017.07.07\", :_type=>\"logs\", :_routing=>nil}, 2017-07-07T12:40:39.591Z localhost.localdomain %{message}]","response":{"index":{"_index":"logstash-2017.07.07","_type":"logs","_id":"AV0dEP_riOfCotFy8tMk","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse [foo]","caused_by":{"type":"number_format_exception","reason":"For input string: \"bar\""}}}}}}}}]}}

This can't be ingested into Elasticsearch, because the `action` array mixes types (a string followed by an object), which Elasticsearch doesn't support:

"action":["index",{"_id":null,"

Given the best place for all logs is Elasticsearch, this makes me sad!

@andrewvc
Contributor

andrewvc commented Jul 7, 2017

@jimmyjones2 this is an excellent point! We should provide some good mappings for putting LS logs into ES. I think once the modules feature gets merged we can do this.

It's not, I would say, a bug that logs can't automatically be put into Elasticsearch with the default settings, though it certainly is inconvenient. We tend not to recommend that arbitrary JSON be put into ES with the default mappings for this very reason: type clashes.

The best practice here is to create an ES mapping with dynamic: false (documented here) and explicitly declare the fields of known type ('level', 'loggerName', 'timeMillis', 'thread', 'logEvent.message'). Additionally, it makes sense to index the raw JSON into an _all-style text field for searching the full message.
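For illustration, a strict mapping along those lines might look like this. This is only a sketch: the index name `logstash-logs` is an assumption, the field names are taken from the log line above, and the request body targets ES 5.x, where mapping types like `logs` still exist.

```json
PUT logstash-logs
{
  "mappings": {
    "logs": {
      "dynamic": false,
      "properties": {
        "level":      { "type": "keyword" },
        "loggerName": { "type": "keyword" },
        "timeMillis": { "type": "date", "format": "epoch_millis" },
        "thread":     { "type": "keyword" },
        "logEvent": {
          "properties": {
            "message": { "type": "text" }
          }
        }
      }
    }
  }
}
```

With `dynamic: false`, the problematic `logEvent.action` array is stored in `_source` but never mapped, so the mixed-type array no longer causes a mapper_parsing_exception.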

The reason is that logs can be very arbitrary, and we don't want to bloat our mappings with arbitrary keys in ES.

Does this make sense?

@jimmyjones2
Author

@andrewvc Makes sense, thanks!

@anton-johansson

anton-johansson commented Nov 20, 2018

Any development on this? I just ran into it when trying to push service logs (logs from the ELK stack itself) into Elasticsearch. It's just an unending loop of error logs from Logstash.

A Logstash warning appears and should be sent to Elasticsearch. It can't be, due to the problem this issue describes, so a new error message is logged. The same thing happens to that message, and so on. :(

I'll have to revert to plain logs for now.

@dramshaw-zymergen

+1

@slmingol

Can we get a status update on this? It was reported 3+ years ago, and I haven't seen any movement on it since.

@jeffspahr

I reread this issue today and realized it wasn't clearly settled that this is a Logstash bug.

It's not, I would say, a bug that logs can't automatically be put into elasticsearch with the default settings, though it certainly is inconvenient.

Elastic recommends that you do ingest JSON directly, and supports this across the rest of its stack by enabling JSON logs in Elasticsearch, Beats, and Kibana. Logstash is the only component of the Elastic Stack that writes JSON logs that can't be natively ingested into Elasticsearch. Here's one example of Elastic recommending this approach for general applications:
https://www.elastic.co/blog/structured-logging-filebeat

We tend not to recommend that arbitrary JSON be put into ES with the default mappings for this very reason, type clashes.

The issue here isn't that different apps are writing different types to the same field, which is definitely something an Elastic Stack operator needs to deal with. The issue is that Logstash, and only Logstash, is writing arrays with mixed types, or changing the data type of a field (switching back and forth between objects and keywords). If any other app did this, we'd ask it to fix the issue in the app. Components of the Elastic Stack should be held to at least as high a standard on practicing what Elastic preaches around structured logs.

The reason is that logs can be very arbitrary, and we don't want to bloat our mappings with arbitrary keys in ES.

This is solved by using a schema like ECS, which is how Elasticsearch solved this problem in 8.x. It would make sense for Logstash and other products in the Elastic Stack to take the same approach.
elastic/elasticsearch#46119
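As a hypothetical sketch of what that could look like (Logstash does not currently emit this; the field names follow ECS conventions), the warning from the original report might be rendered as:

```json
{
  "@timestamp": "2017-07-07T12:40:39.669Z",
  "log.level": "WARN",
  "log.logger": "logstash.outputs.elasticsearch",
  "process.thread.name": "[main]>worker2",
  "message": "Could not index event to Elasticsearch.",
  "error.type": "mapper_parsing_exception",
  "error.message": "failed to parse [foo]"
}
```

Every field has a single, fixed type, so a standard ECS index template can ingest it without mapping conflicts.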

This issue would be more appropriately tagged as a bug than as an enhancement.

8 participants