Add support for perfevents and an Observer API #135
Conversation
Perf events can provide additional insights into what's happening within a span. Although the latency of a span can help us identify the bottlenecks within a trace, perf events can complement that by identifying where the bottleneck was and which platform resource was busy for that span. This patch is in addition to the perf events support in zipkin-go-opentracing. Signed-off-by: Hemant Kumar <[email protected]>
Issue in reference: opentracing/specification#36
tracer.go
Outdated
// Perfevent is the platform metric's name(s) or an event name
// supported by perfevents package
// (https://github.com/opentracing-contrib/perfevents/go)
Perfevent string
- nit: PerfEvent
- it's not clear what the tracer is expected to do when receiving this option in the StartOptions. What is the significance of the string value? What is the domain of accepted values? Should it be an enum?
- instead of extending the StartOption API, can this instead be passed as one of the Tags? We already have precedent for that, the sampling.priority tag.
Hi @yurishkuro, thanks for the review.
- Sure, will change it to PerfEvent.
- Yeah, the comments should have been clearer. PerfEvent is a list of perf events in the form of a comma-separated string (e.g., "cache-misses,cpu-cycles,instructions"). The tracer is expected to start the events in PerfEvent using the "perfevents" package (https://github.com/opentracing-contrib/perfevents/go). On span Finish(), the events must be closed. The domain of accepted values is the set of events supported by the "perfevents" package. Having it as an enum may pose challenges, since we are also planning to support dynamic events and other metrics taken from the kernel in the future. (See the sketch after this list for how a tracer might act on the value.)
- I agree. Let me check this out and, if it proves to be sufficient, will update this PR.
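A rough sketch of that lifecycle, assuming a span wrapper owns the counters; the eventDescriptor/openEvent/readAndClose helpers below are hypothetical stand-ins for the perfevents package API, which isn't reproduced here:

```go
package perfspan

import (
	"strings"

	opentracing "github.com/opentracing/opentracing-go"
)

// eventDescriptor, openEvent, and readAndClose are hypothetical stand-ins for
// the perfevents package API; they exist only to make this sketch compile.
type eventDescriptor struct{ name string }

func openEvent(name string) eventDescriptor { return eventDescriptor{name: name} }

func readAndClose(d eventDescriptor) uint64 { return 0 }

// PerfSpan owns the perf event descriptors for the lifetime of one span.
type PerfSpan struct {
	opentracing.Span
	descriptors []eventDescriptor
}

// StartPerfSpan opens one counter per comma-separated event name,
// e.g. "cache-misses,cpu-cycles,instructions".
func StartPerfSpan(tracer opentracing.Tracer, operation, events string) *PerfSpan {
	ps := &PerfSpan{Span: tracer.StartSpan(operation)}
	for _, name := range strings.Split(events, ",") {
		ps.descriptors = append(ps.descriptors, openEvent(name))
	}
	return ps
}

// Finish reads the counters, logs their values on the span, closes them,
// and then finishes the underlying span.
func (ps *PerfSpan) Finish() {
	for _, d := range ps.descriptors {
		ps.Span.LogKV(d.name, readAndClose(d))
	}
	ps.Span.Finish()
}
```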
Signed-off-by: Hemant Kumar <[email protected]>
Hi @yurishkuro
// package (https://github.com/opentracing-contrib/perfevents/go) . On span
// Finish(), the events must be closed.

PerfEvent = stringTagName("perfevents")
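For reference, a caller-side sketch of how the tag variant might be used, assuming the PerfEvent tag from this diff lands in the ext package (it is not part of released opentracing-go):

```go
package example

import (
	opentracing "github.com/opentracing/opentracing-go"
	"github.com/opentracing/opentracing-go/ext"
)

// traceWithPerfEvents requests perf counters for a span via the tag proposed
// in this PR; ext.PerfEvent only exists on this branch.
func traceWithPerfEvents(tracer opentracing.Tracer) {
	span := tracer.StartSpan("compute-checksum")
	defer span.Finish()

	// The tracer (or an observer watching SetTag) is expected to open these
	// counters here and attach their values when the span finishes.
	ext.PerfEvent.Set(span, "cache-misses,cpu-cycles,instructions")
}
```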
would it perhaps make more sense to create a Tracer that wraps a given "real" Tracer and provides the perfevents for each span therein?
@bhs it might be useful to think about a "tracer plugin" architecture. For example, when I tried a POC of emitting metrics via a decorator tracer (jaegertracing/jaeger-client-go#84), it worked but was quite inefficient, so I decided to go with an observer pattern, which means the implementation is now going to be specific to our tracer (even though the observer API itself isn't - jaegertracing/jaeger-client-go#94). The perf events monitoring seems to be in a similar category and could probably be done simply as an observer.
@yurishkuro I think that makes plenty of sense. I would like to avoid things in opentracing-contrib that require action by Tracer implementors... it just seems unrealistic / unsustainable.
More generally I see a benefit (due to intra-process propagation) from separating OT API and SPI interfaces. It looks good on paper, but I think it's tricky to do right with sampling and noop impls taken into account.
Sorry for this tangent, @hkshaw1990 ... it's for a good cause :)
@hkshaw1990 could you take a look at jaegertracing/jaeger-client-go#94 w.r.t. using a similar API to drive perf event collection?
@yurishkuro and @bhs, while I do like the idea of having an observer interface, I will have to see how the perfevents will fit in there. Will investigate jaegertracing/jaeger-client-go#94 and get back to you on this.
@yurishkuro I went through the PRs (jaegertracing/jaeger-client-go#94 and jaegertracing/jaeger-client-go#103) regarding an "observer" interface. I did a small prototype implementation for the perfevents package as well: defining the Observer and SpanObserver API in opentracing-go, an observer implementation in the perfevents package, and using zipkin to register the observers. While this model worked great for starting the perf events (OnSetTag), reading the values and closing the event descriptors (OnFinish), and maintaining the count (in the SpanObserver struct), we ran into issues while logging these event values for a span. The reason is that we couldn't access the values in the (client) tracer or in a sample app which creates the perfevents observer. The perf event values are per-span. Could you please point out if we missed anything obvious?
I see, your use case is slightly different from what I created the observer for, i.e. I used it for emitting metrics, something that's triggered by span events but does not have to come back to the span as data. I think if the OnStartSpan() method would pass the actual span to the observer, then you can store it in the SpanObserver created for that span (circular ref - hope Go can manage) and write to that span from OnFinish().
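A minimal sketch of that shape, assuming OnStartSpan receives the live span; the method signatures here are illustrative only and changed during review:

```go
package perfobserver

import (
	opentracing "github.com/opentracing/opentracing-go"
)

// observer creates one spanObserver per started span.
type observer struct{}

func (o *observer) OnStartSpan(sp opentracing.Span, operationName string,
	options opentracing.StartSpanOptions) *spanObserver {
	// Keeping a reference to the span creates a span<->observer cycle,
	// which Go's garbage collector handles fine.
	return &spanObserver{span: sp, values: make(map[string]uint64)}
}

type spanObserver struct {
	span   opentracing.Span
	values map[string]uint64 // perf counter values collected for this span
}

func (so *spanObserver) OnFinish(options opentracing.FinishOptions) {
	// Write the collected counter values back onto the span itself.
	for name, v := range so.values {
		so.span.LogKV(name, v)
	}
}
```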
Thanks @yurishkuro, sending the actual span to the OnStartSpan() method works for this case. I have updated this PR to add a new Observer API in opentracing-go which is largely based on your work (jaegertracing/jaeger-client-go#94). We will need the previous commit which adds a PerfEvent tag though, to identify the user-specified perf events through tags. Let me know what you think.
Metrics are useful for gaining insights into a distributed application. But there can be a lot of metrics in different domains. Adding such metrics to any one (client) tracer breaks cross-platform compatibility. This commit adds a new observer and span observer API which defines a standard interface to add such metrics. To expose the metrics, one would have to implement the observer interface. The (client) tracers would just have to install callbacks for methods such as StartSpan, SetOperationName, SetTag and Finish. The registered callbacks would then be called on the span events if an observer is created. This is based on the work done by Yuri here: jaegertracing/jaeger-client-go#94
Signed-off-by: Yuri Shkuro <[email protected]>
Signed-off-by: Hemant Kumar <[email protected]>
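For orientation, a consolidated sketch of the interfaces this commit describes, assembled from the diff fragments quoted below; the exact OnStartSpan signature is an assumption and changed during review:

```go
package opentracing

// Observer is created once per tracer and produces a SpanObserver for every
// span the tracer starts. (Sketch only; the OnStartSpan signature is assumed.)
type Observer interface {
	OnStartSpan(operationName string, options StartSpanOptions) SpanObserver
}

// SpanObserver receives callbacks for the events of a single span; the tracer
// is expected to invoke them from the corresponding Span methods.
type SpanObserver interface {
	OnSetOperationName(operationName string)
	OnSetTag(key string, value interface{})
	OnFinish(options FinishOptions)
}
```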
observer.go
Outdated
OnFinish(options FinishOptions)
}

// observer is a dispatcher to other observers
I think this and below are implementation details that shouldn't be in opentracing-go (especially since they are private and as such unusable anyway).
We need more people to look at the observer API before committing it. @bhs you mentioned a separate SPI, any thoughts? At the moment this observer API is somewhat standalone, i.e. it can only be plugged into the tracer via a vendor-specific initialization process (e.g. in Jaeger via tracer options).
@yurishkuro thanks for the review. I have added a commit which removes the implementation details from opentracing-go. Thanks for pointing this out.
Signed-off-by: Hemant Kumar <[email protected]>
observer.go
Outdated
@@ -0,0 +1,15 @@
package opentracing

// Observer can be registered with the Tracer to receive notifications
- Sorry for the insane delay on this PR -- mea culpa.
- "Registered" how?
Hi @bhs, I have added a commit explaining, in the comments for the observer API, how it's registered.
Thanks for the review!
@hkshaw1990 I think the comments are a bit too specific. I would keep it simpler:
// Observer can be registered with the Tracer to receive notifications about new Spans.
// Tracers are not required to support the Observer API. The actual registration of the
// observer is implementation specific, e.g. it might look like this:
//
// observer := myobserver.New()
// tracer := client.NewTracer(..., client.WithObserver(observer))
@bhs I prefer not to introduce an official API for registering the observer, because if they are passed to the Tracer constructor the internal data structure can be immutable and not require a mutex for access.
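A sketch of that constructor-option style, under the assumption of a hypothetical client package, Option type, and WithObserver name; nothing here is a real tracer's API:

```go
// Package client is purely illustrative; each tracer would define its own
// option type, but the pattern keeps the observer list immutable after
// construction, so no mutex is needed on the hot path.
package client

// Observer is the interface proposed in this PR (body elided to keep the
// sketch standalone).
type Observer interface{}

type tracer struct {
	observers []Observer // populated once in NewTracer, never mutated after
}

// Option configures the tracer at construction time.
type Option func(*tracer)

// WithObserver registers an observer before the tracer is handed out.
func WithObserver(obs Observer) Option {
	return func(t *tracer) { t.observers = append(t.observers, obs) }
}

// NewTracer applies the options; the rest of the tracer setup is elided.
func NewTracer(opts ...Option) *tracer {
	t := &tracer{}
	for _, opt := range opts {
		opt(t)
	}
	return t
}
```

Usage would then match the doc comment above, e.g. tracer := client.NewTracer(client.WithObserver(observer)).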
@yurishkuro that makes perfect sense; all I really care about is that the expectations are well-documented.
@hkshaw1990 I would feel a little more comfortable about this PR if there was some sort of clear disclaimer about the Observer being an alpha API that's subject to change.
@yurishkuro I have added your suggestions and made the comments a bit more generic.
@bhs I have added a "Note" about the Observer API being in alpha.
// Callback called from opentracing.Span.SetOperationName()
OnSetOperationName(operationName string)
// Callback called from opentracing.Span.SetTag()
OnSetTag(key string, value interface{})
I'm wondering if it would be better and easier to pass the operation name and set of tags into OnFinish (and eliminate these first two methods). That way the observer doesn't need to have its own copy of the Tag map (and the associated mutex, etc).
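A sketch of that alternative, with a hypothetical FinishedSpanInfo carrier standing in for whatever the final parameter shape would be:

```go
package observer

import "time"

// FinishedSpanInfo is a hypothetical carrier for the data suggested above;
// neither the name nor the fields come from the PR itself.
type FinishedSpanInfo struct {
	OperationName string
	Tags          map[string]interface{}
	FinishTime    time.Time
}

// SpanObserver in this variant only hears about the completed span, so it
// needs no per-event bookkeeping (and no mutex) of its own.
type SpanObserver interface {
	OnFinish(info FinishedSpanInfo)
}
```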
It would require the tracer implementation to keep track of all tags, which is not always the case, for example when the trace is unsampled, or when the tracer interprets and/or remaps some of the tags.
@yurishkuro yes, but since this feature is something Tracers opt-in to, I thought that still might make the most sense... if they have a different "noop" (sic) impl depending on whether there's an observer, so be it.
Otherwise the observer must do the potentially wasteful accounting.
it's a tradeoff, I suppose. If OnFinish() requires all this extra data, now the tracer must do potentially wasteful accounting, even if none of the observers actually need the tags. It also makes the assumption that observers are only interested in tags at the end (e.g. the sampling.priority tag is clearly time-dependent).
/shrug
... I am primarily thinking about the OT->metrics use case where I'm assuming the tags are of interest. I do see your point that it's a tradeoff and that there are use cases for either.
@hkshaw1990 in terms of unblocking your own progress, I guess you could move the Observer interface into whatever Tracer you want to modify to depend on Observer. That way you wouldn't get blocked behind these sorts of tradeoffs where we lack information.
In parallel I'd love to see work on coupling OpenTracing and Prometheus (for example), and that would give us another caller to help refine the right interface. At the moment this feels like we're designing in a bit of a vacuum.
> In parallel I'd love to see work on coupling OpenTracing and Prometheus (for example), and that would give us another caller to help refine the right interface. At the moment this feels like we're designing in a bit of a vacuum.

@bhs do you mean to use Prometheus (P) just for metrics? In our shared library jaeger-lib we have an adapter from go-kit's metrics to the jaeger-client metrics API, specifically with the intention of allowing implementations like P to be pluggable. Thus my POC for metrics could be made to work with P fairly easily; I might spike it in the near future (what stopped me so far was the learning curve of setting up P). But specifically to your point of whether the Observer is a sufficient interface for that - yes, it is, since I already have it working with other metrics implementations (e.g. expvar, in the hotrod example).
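As a concrete illustration of the metrics use case, here is a sketch of a SpanObserver that counts finished spans per operation using only the standard library's expvar (mentioned above); the method set follows this PR's proposed interface:

```go
package metricsobserver

import (
	"expvar"

	opentracing "github.com/opentracing/opentracing-go"
)

// finishedSpans maps operation name -> count; expvar.Map.Add is safe for
// concurrent use and creates missing keys on demand.
var finishedSpans = expvar.NewMap("spans_finished")

// spanObserver counts finished spans per operation name.
type spanObserver struct {
	operationName string
}

// OnSetOperationName remembers the latest operation name for this span.
func (o *spanObserver) OnSetOperationName(operationName string) {
	o.operationName = operationName
}

// OnSetTag is a no-op for this metrics-only observer.
func (o *spanObserver) OnSetTag(key string, value interface{}) {}

// OnFinish bumps the per-operation counter.
func (o *spanObserver) OnFinish(options opentracing.FinishOptions) {
	finishedSpans.Add(o.operationName, 1)
}
```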
@yurishkuro sorry for the delay.
I meant to use Prometheus (p8s???) for metrics only, yes. That was just an example, though. go-kit's metrics facade is mostly fine, too...
I realize this PR may seem a bit stalled... My remaining concern is just that we don't know whether to do a per-SetTag (etc) observer callback, or a single callback at Finish time. I still lean towards the latter based on my understanding of the potential use-cases.
@tedsuo you have been thinking about prometheus stuff due to kubecon talks, etc... I wanted to cc you here since this is very relevant.
yep let's get more opinions.
If we did the tags-on-finish approach, we'd need to force yet another implementation detail on tracers - the tags would have to be passed in either a map or an array of some structs. Our tracers internally keep tags in a list, not a map.
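To make that tradeoff concrete, a sketch of the two shapes being weighed; the Tag struct name and both interface names are assumptions for illustration:

```go
package observer

// Tag is how list-based tracers tend to store tags internally; the struct
// name here is an assumption, not part of the proposal.
type Tag struct {
	Key   string
	Value interface{}
}

// Passing the internal list directly preserves ordering and repeated keys...
type listFinishObserver interface {
	OnFinish(operationName string, tags []Tag)
}

// ...whereas a map forces a conversion at finish time and collapses repeated
// keys, which is the implementation detail being pushed onto tracers in the
// tags-on-finish approach.
type mapFinishObserver interface {
	OnFinish(operationName string, tags map[string]interface{})
}
```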
How do we proceed with this? In order to keep API compatibility, perhaps it would make sense to add other events like logs, so that the observer fully reflects the span life cycle. @bhs regarding the discussion about getting the tags one at a time or all at the end, perhaps those should simply be two different types of observers, and the tracer implementation may choose which one to support. Wdyt?
// which should call the registered (observer) callbacks.
type SpanObserver interface {
// Callback called from opentracing.Span.SetOperationName()
OnSetOperationName(operationName string)
Note that the operation name could be set multiple times throughout the life cycle of a span. You can only know the operation name for sure in OnFinish. In my Java-based approach for something similar, the operationName is a parameter of SpanEventListener#onFinish.
@felixbarny hi, I am not too sure about making OnSetOperationName() mandatory for the observers, but @yurishkuro might be a better person to answer this, since jaeger-client-go's RPC metrics has a callback registered for this event. However, if it's just about logging/setting the operation name for a span, I do agree with you. But if an observer wants to do something critical based on the operation name, then this should probably remain as is?
In Go that seems reasonable since interfaces don't even need to be part of the main OT package. I'm still skeptical about the interpretation of tags as a multimap (rather than "non-multi"-map, I mean), but we don't need to get into that here. :)
Moving this PR to opentracing-contrib/go-observer (opentracing-contrib/go-observer#1) and closing this.