Skip to content
This repository has been archived by the owner on May 23, 2023. It is now read-only.

Supporting Linux Perf events with Opentracing #36

Open
hkshaw1990 opened this issue Jan 31, 2017 · 5 comments
Open

Supporting Linux Perf events with Opentracing #36

hkshaw1990 opened this issue Jan 31, 2017 · 5 comments

Comments

@hkshaw1990
Copy link

This issue is to discuss a proposal on adding perf events support within a span. Perf events (https://perf.wiki.kernel.org/index.php/Tutorial#Events) are platform side counters which can provide valuable insights on what is happening inside a span and can help us in identifying resource bottlenecks.

We would like to hear your thoughts and suggestions on the same.

We did a small proof-of-concept implementation with opentracing-go and zipkin-go-opentracing to show how the eventual integration may look.

Opentracing-go api will have a PerfString event as an addition to the span creation function. And, the backend Zipkin collector will be responsible for starting, reading, logging and closing the required counter.

To try out the opentracing-go with perfevents, this commit can be used :
platform-tracing/opentracing-go@9391ff1
And, to try out zipkin with perfevents, this commit can be used :
platform-tracing/zipkin-go-opentracing@1f9fb2a

A "perfevents" library has been added to zipkin-go-opentracing which assists in creating, maintaining and reading the perf events like cpu-cycles, instructions, cache-misses etc. An example usage in a sample application:
var eventName opentracing.PerfString
eventName = "cpu-cycles"
parent := tracer.StartSpan("Parent", eventName)

And, we can see the output as shared in the attachment.
zipkin_github

@bhs
Copy link
Contributor

bhs commented Jan 31, 2017

This is fun stuff, thank you for bringing it up.

@hkshaw1990 I think the crux of this is finding a way to record only the performance data that's relevant when the associated Span is "active" or in the foreground. This reminds me of the https://github.com/opentracing-contrib/java-spanmanager work that @sjoerdtalsma did... the tricky thing is that a Span is not necessarily "active" just because it hasn't been finish()ed yet. Have you thought through this aspect of things yet?

@hkshaw1990
Copy link
Author

Hi @bensigelman, thanks for referring to this point. I agree, that a span is not necessarily active even if it hasn't finished. But, the way the perf events are opened is to associate these counters with the (self) process. And a span is running in the context of a process. So, when a span is inactive, probably due to some I/O call, n/w call etc, then, the process is inactive too. So, the counters won't count for those inactive periods. The active/inactive period can be inferred by looking at the cpu-cycles consumed by the span over a given period of time. I hope that addresses your concerns.

@hkshaw1990
Copy link
Author

Hi @bensigelman to add to the above comment, I want to add something more here. There are multiple ways on how we can start a perf event within a span.

One is to create the event for a process (as mentioned in the above comment), wherein, the event will be responsible to count only for the process that is being monitored. Whenever, the process schedules out of the cpu, the event stops counting.

There are other useful perf events which can be used to monitor the system, and is not necessarily attached to a single process. These kind of events need root access to be created. What this implies is they will need a different way to be created than the above approach. We will need to look into this to find out a way to monitor these events too.

With that said, the provided commits and git branches in comment no.1 implement the first approach of opening events for the self process (which works as it is, we don't need any special access permissions).

Also, note that there is a certain overhead in creating and reading these events. One simple example that we tried over multiple runs with a simple span with and without monitoring a perf event :
Time spent in the span without perf event ~ 8 us (microseconds)
Time spent in the span with a perf event ~ 17 us (microseconds).

Thoughts/Suggestions?

@bhs
Copy link
Contributor

bhs commented Feb 3, 2017

@hkshaw1990 sorry for the delay, I'm too busy these days :-/

For languages that support/encourage concurrency (Go, Java, C++, arguably Node.js), process-level measurements are better than nothing but not as helpful as stats for the current thread / goroutine / etc.

As for actually implementing this: it seems as though you were imagining this would be turned on or off on a per-Span basis. Another approach might be to enable it for the entire Tracer, perhaps by creating a Tracer adapter that wraps an existing implementation and adds these tags to all Spans, or perhaps to all Spans that take longer than X µs. I might suggest that instead, as users would presumably be more interested in these numbers for Spans that took too long.

@hkshaw1990
Copy link
Author

As discussed with @bensigelman (please point out if I missed something) on gitter, here is a brief summary :
The perfevents are started at the beginning of the span (if specified) and closed at the end of the span. This gives us the data for that span. However, another span (or spans) can be started on the same machine by the root span (for which some perfevents are created). This will cause the perfevents to also count for the new spans and account that events data in the parent span. This will be consistent with the time (latency) taken by the span but it doesn't give a precise perfevents data for that span (since, there might be double counting).
Another suggestion from @bensigelman was to log the perfevents data only when a span crosses a certain threshold latency. That way the users won't be overwhelmed by the amount of data they log. Rather, they can restrict it to spans which cause real delays.

The above suggestions are being planned to be taken care of in the subsequent versions. Have sent some basic PRs to support perfevents to these repos :
opentracing/opentracing-go#135
openzipkin-contrib/zipkin-go-opentracing#50
A new repo has been created (thanks to @bensigelman) to track the perfevents package :
https://github.com/opentracing-contrib/perfevents and here is the first PR to support basic hardware events :
opentracing-contrib/perfevents#1

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants