zipkin-dependencies (storage: ES) exception -> java.lang.OutOfMemoryError: Java heap space #143
From one day onwards, my zipkin-dependencies job (storage: ES) fails and outputs logs like the attached. How do I resolve it?
My zipkin-server version: 2.12.9.
Both zipkin-dependencies 2.1.0 and 2.3.1 throw these exceptions.
(My heap memory config: -Xmx6g -Xms6g, which I think is enough.)
exception.log
<https://github.com/openzipkin/zipkin-dependencies/files/3470765/exception.log>

Comments
Someone with more Spark experience could mention what is likely to be usable by Spark for jobs and the best ways to profile. For example, data is copied a couple of times, so without knowing the size of your data it is hard to tell. You can check the elasticsearch-hadoop forum for tips, as this is a straightforward job using their library. I suspect someone will suggest not using a single JVM when processing a lot of data. In that case it is probably wise to come prepared with how much data is in the daily index and which daily index still works; e.g. you can always reprocess days to find out which one was the breaking point.
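To come prepared with those numbers, a quick way is to list the daily indexes with their sizes. A minimal sketch, assuming a local Elasticsearch on port 9200 and the default zipkin* index naming (both are assumptions; adjust to your cluster):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ListZipkinIndexSizes {
  public static void main(String[] args) throws Exception {
    // _cat/indices prints one line per index; h= picks columns, s= sorts.
    String url = "http://localhost:9200/_cat/indices/zipkin*"
        + "?v&h=index,docs.count,store.size&s=index";
    HttpClient http = HttpClient.newHttpClient();
    HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
    System.out.println(http.send(req, HttpResponse.BodyHandlers.ofString()).body());
  }
}
```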
Hello, I have the same problem; my zipkin index size is ~9-11 GB. Must I set the heap larger than the index size?

Not in my experience with the zipkin-dependencies job, but I'm not a Spark expert either.

FWIW we're running zipkin-dependencies with JDK 8 and the default heap; the biggest index size we've seen for now is 2.5 GB and it passed fine. Perhaps the complexity / size of the trace or span data plays a role?

Coming back to this, we ingested about 8.5 GB of span data for a day recently, and even with a heap of 12 GB I cannot get it processed; it always OOMs. Obviously (right?) the heap dump contains mostly the trace data, so analysing it is pointless. I started digging into the depths of Spark tuning and discovered there's a whole world of optimizations possible: https://spark.apache.org/docs/latest/tuning.html#determining-memory-consumption . I will try to get to the bottom of this and see what options there are to make this go through.
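That tuning page suggests measuring what your objects actually occupy on the heap before adjusting anything. A minimal sketch of its SizeEstimator approach, assuming you can get hold of a representative object graph (the spans variable here is a stand-in, not part of the actual job):

```java
import java.util.List;
import org.apache.spark.util.SizeEstimator;

public class MeasureSpanMemory {
  public static void main(String[] args) {
    // Stand-in for a representative batch of decoded span objects.
    List<String> spans = List.of("span-1", "span-2", "span-3");
    // SizeEstimator walks the object graph and estimates its heap footprint.
    System.out.println("~" + SizeEstimator.estimate(spans) + " bytes on heap");
  }
}
```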
Yeah, I am surprised that it needs to buffer in memory... doesn't sound very streaming to me.
I forget the status of the Kafka alternative. It would be nice to have something that can work in standalone mode and do aggregation without buffering so much; ideally the only thing that needs to be buffered is one trace at a time.
cc @jeqo
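To illustrate the trace-at-a-time idea (a hypothetical sketch, not the actual job or the Kafka alternative): only the running link counts survive between traces, so memory stays bounded by the largest single trace rather than by the whole day.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal span shape, just for this sketch.
record Span(String traceId, String id, String parentId, String serviceName) {}

class TraceAtATimeLinker {
  // "parent->child" service pair -> call count; the only long-lived state.
  final Map<String, Long> linkCounts = new HashMap<>();

  // Feed one complete trace, derive its service links, then drop it.
  void addTrace(List<Span> trace) {
    Map<String, String> serviceBySpanId = new HashMap<>();
    for (Span s : trace) serviceBySpanId.put(s.id(), s.serviceName());
    for (Span s : trace) {
      String parent = serviceBySpanId.get(s.parentId());
      if (parent != null && !parent.equals(s.serviceName())) {
        linkCounts.merge(parent + "->" + s.serviceName(), 1L, Long::sum);
      }
    }
  }
}
```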
spark-streaming is a different thing (https://spark.apache.org/docs/latest/streaming-programming-guide.html); that is not what this job is doing (but maybe it should or could).
Hooking up jconsole shows that in order to analyze about 5.5 GB of trace data, you need up to 10 GB of memory:
[image: zipkin-deps-default-5 5G] <https://user-images.githubusercontent.com/193792/83441073-3a3d8e00-a446-11ea-8d18-8a9531e97be6.png>
Toying around with the kryo serializer as recommended here <https://spark.apache.org/docs/latest/tuning.html#data-serialization> did not improve things greatly:
[image: zipkin-deps-kryo-5 5G] <https://user-images.githubusercontent.com/193792/83441221-74a72b00-a446-11ea-81a8-ce19b8b20110.png>
I am going to try this week with different index sizes and see if the 2x rule in terms of heap holds. We could then document it as a recommendation. Still, I can imagine that 10 GB of trace data is not all that big; many sites will have a lot more...
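For reference, the kryo experiment above boils down to a couple of SparkConf properties from the tuning guide. A sketch, assuming you build the conf yourself (the property names are standard Spark options; the buffer value is illustrative):

```java
import org.apache.spark.SparkConf;

public class KryoTuning {
  static SparkConf kryoConf() {
    return new SparkConf()
        .setAppName("zipkin-dependencies")
        // Swap the default Java serializer for kryo, per the tuning guide.
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        // Raise the per-object buffer cap for large span payloads (illustrative).
        .set("spark.kryoserializer.buffer.max", "512m");
  }
}
```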
It is crazy to me that so much memory is needed. It hints that manually scrolling the data could be far better in the case of no cluster.
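By "manually scrolling" I'd read: page through one day's spans with the Elasticsearch scroll API and aggregate trace by trace, instead of handing the whole day to Spark. A rough sketch, assuming a local cluster and the zipkin:span-YYYY-MM-dd index naming; real code would use a JSON library rather than the parsing glossed over in the comments:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ScrollOneDay {
  public static void main(String[] args) throws Exception {
    HttpClient http = HttpClient.newHttpClient();
    // Open a scroll context; each page returns up to 1000 hits plus a _scroll_id.
    HttpRequest first = HttpRequest.newBuilder(
            URI.create("http://localhost:9200/zipkin:span-2019-08-06/_search?scroll=1m"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString("{\"size\":1000,\"sort\":[\"_doc\"]}"))
        .build();
    String page = http.send(first, HttpResponse.BodyHandlers.ofString()).body();
    // Next pages: POST {"scroll":"1m","scroll_id":"..."} to /_search/scroll
    // until hits come back empty. Per page, group hits by traceId and feed
    // complete traces to an aggregator, so memory is bounded by page size.
    System.out.println(page.substring(0, Math.min(300, page.length())));
  }
}
```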