zipkin-dependencies (storage: ES) exception -> java.lang.OutOfMemoryError: Java heap space #143
From one day onwards, my zipkin-dependencies job (storage: ES) fails and outputs logs like the attached. How do I resolve it?
My zipkin-server version: 2.12.9.
Both zipkin-dependencies 2.1.0 and 2.3.1 throw these exceptions.
(My heap memory config: -Xmx6g -Xms6g, which I think is enough.)
exception.log
<https://github.com/openzipkin/zipkin-dependencies/files/3470765/exception.log>

Comments
Someone with more Spark experience could mention what is likely to be usable by Spark for jobs and the best ways to profile. For example, data is copied a couple of times, so without knowing the size of your data it is hard to tell. You can check the elasticsearch-hadoop forum for tips, as this is a straightforward job using their library. I suspect someone will suggest not using a single JVM when processing a lot of data. In that case it is probably wise to come prepared with how much data is in the daily index and which daily index still works; e.g. you can always reprocess days to find out which one was the breaking point.
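To come prepared with those numbers, a quick way is to list the daily indexes with their sizes. A minimal sketch, assuming a local Elasticsearch on port 9200 and the default zipkin* index naming (both are assumptions; adjust to your cluster):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ListZipkinIndexSizes {
  public static void main(String[] args) throws Exception {
    // _cat/indices prints one line per index; h= picks columns, s= sorts.
    String url = "http://localhost:9200/_cat/indices/zipkin*"
        + "?v&h=index,docs.count,store.size&s=index";
    HttpClient http = HttpClient.newHttpClient();
    HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
    System.out.println(http.send(req, HttpResponse.BodyHandlers.ofString()).body());
  }
}
```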
Hello, I have the same problem; my zipkin index size is ~9-11 GB. Must I set the heap larger than the index size?

Not in my experience with the zipkin-dependencies job, but I'm not a Spark expert either.

FWIW we're running zipkin-dependencies with JDK 8 and the default heap; the biggest index size we've seen for now is 2.5 GB and it passed fine. Perhaps the complexity / size of the trace or span data plays a role?

Coming back to this, we ingested about 8.5 GB of span data for a day recently, and even with a heap of 12 GB I cannot get it processed; it always OOMs. Obviously (right?) the heap dump contains mostly the trace data, so analysing it is pointless. I started digging into the depths of Spark tuning and discovered there's a whole world of optimizations possible: https://spark.apache.org/docs/latest/tuning.html#determining-memory-consumption . I will try to get to the bottom of this and see what options there are to make this go through.
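That tuning page suggests measuring what your objects actually occupy on the heap before adjusting anything. A minimal sketch of its SizeEstimator approach, assuming you can get hold of a representative object graph (the spans variable here is a stand-in, not part of the actual job):

```java
import java.util.List;
import org.apache.spark.util.SizeEstimator;

public class MeasureSpanMemory {
  public static void main(String[] args) {
    // Stand-in for a representative batch of decoded span objects.
    List<String> spans = List.of("span-1", "span-2", "span-3");
    // SizeEstimator walks the object graph and estimates its heap footprint.
    System.out.println("~" + SizeEstimator.estimate(spans) + " bytes on heap");
  }
}
```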
Yeah, I am surprised that it needs to buffer in memory... doesn't sound very streaming to me.
I forget the status of the Kafka alternative. It would be nice to have something that can work in standalone mode and do aggregation without buffering so much; ideally the only thing that needs to be buffered is one trace at a time.
cc @jeqo
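To illustrate the trace-at-a-time idea (a hypothetical sketch, not the actual job or the Kafka alternative): only the running link counts survive between traces, so memory stays bounded by the largest single trace rather than by the whole day.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal span shape, just for this sketch.
record Span(String traceId, String id, String parentId, String serviceName) {}

class TraceAtATimeLinker {
  // "parent->child" service pair -> call count; the only long-lived state.
  final Map<String, Long> linkCounts = new HashMap<>();

  // Feed one complete trace, derive its service links, then drop it.
  void addTrace(List<Span> trace) {
    Map<String, String> serviceBySpanId = new HashMap<>();
    for (Span s : trace) serviceBySpanId.put(s.id(), s.serviceName());
    for (Span s : trace) {
      String parent = serviceBySpanId.get(s.parentId());
      if (parent != null && !parent.equals(s.serviceName())) {
        linkCounts.merge(parent + "->" + s.serviceName(), 1L, Long::sum);
      }
    }
  }
}
```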
spark-streaming is a different thing (https://spark.apache.org/docs/latest/streaming-programming-guide.html); that is not what this job is doing (but maybe it should or could).
Hooking up jconsole shows that in order to analyze about 5.5 GB of trace data, you need up to 10 GB of memory:
[image: zipkin-deps-default-5 5G] <https://user-images.githubusercontent.com/193792/83441073-3a3d8e00-a446-11ea-8d18-8a9531e97be6.png>
Toying around with the kryo serializer as recommended here <https://spark.apache.org/docs/latest/tuning.html#data-serialization> did not improve things greatly:
[image: zipkin-deps-kryo-5 5G] <https://user-images.githubusercontent.com/193792/83441221-74a72b00-a446-11ea-81a8-ce19b8b20110.png>
I am going to try this week with different index sizes and see if the 2x rule in terms of heap holds. We could then document it as a recommendation. Still, I can imagine that 10 GB of trace data is not all that big; many sites will have a lot more...
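For reference, the kryo experiment above boils down to a couple of SparkConf properties from the tuning guide. A sketch, assuming you build the conf yourself (the property names are standard Spark options; the buffer value is illustrative):

```java
import org.apache.spark.SparkConf;

public class KryoTuning {
  static SparkConf kryoConf() {
    return new SparkConf()
        .setAppName("zipkin-dependencies")
        // Swap the default Java serializer for kryo, per the tuning guide.
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        // Raise the per-object buffer cap for large span payloads (illustrative).
        .set("spark.kryoserializer.buffer.max", "512m");
  }
}
```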
It is crazy to me that so much memory is needed. It hints that manually scrolling the data could be far better in the case of no cluster.
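By "manually scrolling" I'd read: page through one day's spans with the Elasticsearch scroll API and aggregate trace by trace, instead of handing the whole day to Spark. A rough sketch, assuming a local cluster and the zipkin:span-YYYY-MM-dd index naming; real code would use a JSON library rather than the parsing glossed over in the comments:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ScrollOneDay {
  public static void main(String[] args) throws Exception {
    HttpClient http = HttpClient.newHttpClient();
    // Open a scroll context; each page returns up to 1000 hits plus a _scroll_id.
    HttpRequest first = HttpRequest.newBuilder(
            URI.create("http://localhost:9200/zipkin:span-2019-08-06/_search?scroll=1m"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString("{\"size\":1000,\"sort\":[\"_doc\"]}"))
        .build();
    String page = http.send(first, HttpResponse.BodyHandlers.ofString()).body();
    // Next pages: POST {"scroll":"1m","scroll_id":"..."} to /_search/scroll
    // until hits come back empty. Per page, group hits by traceId and feed
    // complete traces to an aggregator, so memory is bounded by page size.
    System.out.println(page.substring(0, Math.min(300, page.length())));
  }
}
```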