Add a page illustrating common instrumentation use cases #34
@@ -0,0 +1,89 @@

```javascript
// https://github.com/ghiculescu/jekyll-table-of-contents
(function($){
  $.fn.toc = function(options) {
    var defaults = {
      noBackToTopLinks: false,
      title: '<i>Jump to...</i>',
      minimumHeaders: 3,
      headers: 'h1, h2, h3, h4, h5, h6',
      listType: 'ol', // values: [ol|ul]
      showEffect: 'show', // values: [show|slideDown|fadeIn|none]
      showSpeed: 'slow' // set to 0 to deactivate effect
    },
    settings = $.extend(defaults, options);

    function fixedEncodeURIComponent(str) {
      return encodeURIComponent(str).replace(/[!'()*]/g, function(c) {
        return '%' + c.charCodeAt(0).toString(16);
      });
    }

    var headers = $(settings.headers).filter(function() {
      // keep only headers that have an ID; when missing, derive one
      // from the name attribute of the preceding anchor element
      var previousSiblingName = $(this).prev().attr("name");
      if (!this.id && previousSiblingName) {
        this.id = previousSiblingName.replace(/\./g, "-");
      }
      return this.id;
    }), output = $(this);
    if (!headers.length || headers.length < settings.minimumHeaders || !output.length) {
      $(this).hide();
      return;
    }

    if (0 === settings.showSpeed) {
      settings.showEffect = 'none';
    }

    var render = {
      show: function() { output.hide().html(html).show(settings.showSpeed); },
      slideDown: function() { output.hide().html(html).slideDown(settings.showSpeed); },
      fadeIn: function() { output.hide().html(html).fadeIn(settings.showSpeed); },
      none: function() { output.html(html); }
    };

    var get_level = function(ele) { return parseInt(ele.nodeName.replace("H", ""), 10); };
    var highest_level = headers.map(function(_, ele) { return get_level(ele); }).get().sort()[0];
    var return_to_top = '<i class="icon-arrow-up back-to-top"> </i>';

    var level = get_level(headers[0]),
        this_level,
        html = settings.title + " <" + settings.listType + ">";
    headers.on('click', function() {
      if (!settings.noBackToTopLinks) {
        window.location.hash = this.id;
      }
    })
    .addClass('clickable-header')
    .each(function(_, header) {
      this_level = get_level(header);
      if (!settings.noBackToTopLinks && this_level === highest_level) {
        $(header).addClass('top-level-header').after(return_to_top);
      }
      if (this_level === level) { // same level as before; same indenting
        html += "<li><a href='#" + fixedEncodeURIComponent(header.id) + "'>" + header.innerHTML + "</a>";
      } else if (this_level <= level) { // higher level than before; end the parent list
        for (var i = this_level; i < level; i++) {
          html += "</li></" + settings.listType + ">";
        }
        html += "<li><a href='#" + fixedEncodeURIComponent(header.id) + "'>" + header.innerHTML + "</a>";
      } else if (this_level > level) { // lower level than before; nest a new list
        for (var i = this_level; i > level; i--) {
          html += "<" + settings.listType + "><li>";
        }
        html += "<a href='#" + fixedEncodeURIComponent(header.id) + "'>" + header.innerHTML + "</a>";
      }
      level = this_level; // update for the next one
    });
    html += "</" + settings.listType + ">";
    if (!settings.noBackToTopLinks) {
      $(document).on('click', '.back-to-top', function() {
        $(window).scrollTop(0);
        window.location.hash = '';
      });
    }

    render[settings.showEffect]();
  };
})(jQuery);
```
@@ -0,0 +1,184 @@
---
layout: page
title: Instrumentation for Common Use Cases
---
<div id="toc"></div>

This page illustrates common use cases that developers need to handle when instrumenting their applications and libraries with the OpenTracing API.
### Tracing a Function

```python
def top_level_function():
    span1 = tracer.start_trace('top_level_function')
    try:
        ...  # business logic
    finally:
        span1.finish()
```
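The try/finally pattern can be packaged up once if the span type supports Python's context-manager protocol — an assumption here, not something the OpenTracing API guarantees. A minimal sketch with a stub span:

```python
# Sketch only: StubSpan is a hypothetical span whose __exit__
# guarantees finish() runs even if the business logic raises.
class StubSpan(object):
    def __init__(self, operation_name):
        self.operation_name = operation_name
        self.finished = False

    def finish(self):
        self.finished = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.finish()
        return False  # do not swallow exceptions


def top_level_function():
    with StubSpan('top_level_function') as span1:
        pass  # business logic
    return span1
```

With a real tracer this might read `with tracer.start_trace('top_level_function') as span1:`, assuming the implementation opts into the protocol.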
As a follow-up, suppose that as part of the business logic above we call another `function2` that we also want to trace. In order to attach that function to the ongoing trace, we need a way to access `span1`. We discuss how this can be done later; for now, let's assume we have a helper function `get_current_span` for that:

```python
def function2():
    span2 = get_current_span().start_child('function2') \
        if get_current_span() else None
    try:
        ...  # business logic
    finally:
        if span2:
            span2.finish()
```
We assume that, for whatever reason, the developer does not want to start a new trace in this function if one hasn't already been started by the caller, so we account for `get_current_span` potentially returning `None`.

These two examples are intentionally naive. Usually developers will not want to pollute their business functions directly with tracing code, but will use other means, such as a [function decorator in Python](https://github.com/uber-common/opentracing-python-instrumentation/blob/master/opentracing_instrumentation/local_span.py#L59):
```python
@traced_function
def top_level_function():
    ...  # business logic
```
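One way such a `traced_function` decorator could be implemented is sketched below; the stub span and the module-level current-span slot are stand-ins for a real OpenTracing implementation and a real request-context mechanism:

```python
import functools

_current_span = None  # stand-in for real request-context storage


class StubSpan(object):
    def __init__(self, operation_name):
        self.operation_name = operation_name
        self.finished = False

    def finish(self):
        self.finished = True


def get_current_span():
    return _current_span


def traced_function(func):
    """Wrap func in a span named after it, finishing the span on exit."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        global _current_span
        parent, _current_span = _current_span, StubSpan(func.__name__)
        try:
            return func(*args, **kwargs)
        finally:
            _current_span.finish()
            _current_span = parent  # restore the caller's span
    return wrapper


@traced_function
def top_level_function():
    return get_current_span().operation_name
```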
### Tracing Server Endpoints

When a server wants to trace the execution of a request, it generally needs to go through these steps:

1. Attempt to deserialize the Trace Context from the incoming request, in case the trace has already been started by the client.
1. Depending on the outcome, either start a new trace or join the existing trace using the deserialized Trace Context. This operation starts a new Span.
1. Store the newly created span in some _request context_ that is propagated throughout the application, either by application code or by the RPC framework.
> **Review comment:** Might be worth adding "4. Finally, close the trace using `span.finish()`".
>
> **Reply:** @dankosaur good catch, updated.
1. Finally, close the span using `span.finish()` when the server has finished processing the request.
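The four steps above can be sketched end to end. Everything here is a stand-in: `StubTracer` fakes the deserialize/start/join calls, and the header name and request-context dict are invented for illustration:

```python
class StubSpan:
    def __init__(self, operation_name, parent=None):
        self.operation_name = operation_name
        self.parent = parent
        self.tags = {}
        self.finished = False

    def set_tag(self, key, value):
        self.tags[key] = value

    def finish(self):
        self.finished = True


class StubTracer:
    # Step 1 helper: a real tracer would reconstruct a Trace Context
    # from well-known headers; here a single made-up header stands in.
    def trace_context_from_text(self, headers):
        return headers.get('X-Trace-Id')

    def start_trace(self, operation_name):
        return StubSpan(operation_name)

    def join_trace(self, operation_name, parent_trace_context):
        return StubSpan(operation_name, parent=parent_trace_context)


def handle_request(tracer, headers, operation):
    context = tracer.trace_context_from_text(headers)        # step 1
    if context is None:
        span = tracer.start_trace(operation_name=operation)  # step 2: new trace
    else:
        span = tracer.join_trace(operation_name=operation,   # step 2: join
                                 parent_trace_context=context)
    request_context = {'span': span}                         # step 3
    # ... application logic would run here with request_context ...
    span.finish()                                            # step 4
    return span
```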
#### Deserializing Trace Context from Incoming Request
Let's assume that we have an HTTP server, and the Trace Context is propagated from the client via HTTP headers, accessible via `request.headers`:
```python
context = tracer.trace_context_from_text(
    trace_context_id=request.headers,
    trace_attributes=request.headers
)
```

> **Review comment:** The problem I have with this API design is really nothing more than the use of the words serialisation and deserialisation. While serialisation is obviously a consequence of this behind the scenes, it's not actually what the user cares about here, and therefore it's not intuitive API. What this is really about is transforming the state from the protocol's object to the tracing context. I would say that the serialisation is the writing of bytes or characters behind the scenes into streams/buffers. Maybe we should be looking at how other frameworks and SPIs handle such transformer objects. (This discussion also exists in https://github.com/opentracing/opentracing-java/pull/7/files#r50617464.)
>
> **Review comment:** Seems this doc is now out of date, or soon will be, due to opentracing/opentracing-go#40.
>
> **Reply:** It is the policy of this office not to comment on ongoing investigations :-) I wanted these examples to reflect the current API and to be used as a benchmark for proposals to change the API.

Here we set both arguments of the decoding method to the `headers` map. The Tracer object knows which headers it needs to read in order to reconstruct the Trace Context, which comprises a trace context ID and trace attributes.
#### Starting a Server-Side Span

The `context` object above can be `None` if the Tracer did not find the relevant headers in the incoming request: either the client did not send them, or the client is running with a different tracer implementation (e.g., Zipkin vs. AppDash). In this case the server needs to start a brand new trace; otherwise it should join the existing trace:
```python
if context is None:
    span = tracer.start_trace(operation_name=operation)
else:
    span = tracer.join_trace(operation_name=operation,
                             parent_trace_context=context)
span.set_tag('http.method', request.method)
span.set_tag('http.url', request.full_url)
```

> **Review comment:** Should we add a convenience …
>
> **Reply:** I agree; this was raised in https://github.com/opentracing/opentracing-java/pull/7/files#r50501336. Having the following API would be more intuitive and make for less intrusive instrumentation code. Whether the method is called `startTrace(..)`, `trace(..)`, or `createSpan(..)` is more on the bikeshed side of things. My vote would be on `createSpan(..)`, since that's what it does; implicit in it is that it returns a Span object. Whether a trace is started or joined is simply a consequence of the action.
The `operation` above refers to the name the server wants to give to the Span. For example, if the HTTP request was a POST against `/save_user/123`, the operation name can be set to `post:/save_user/`. The OpenTracing API does not dictate how applications name their spans.
The `set_tag` calls are examples of recording additional information in the span about the request.
#### In-Process Request Context Propagation

Request context propagation refers to an application's ability to associate a certain _context_ with the incoming request such that this context is accessible in all other layers of the application within the same process. It can be used to provide application layers with access to request-specific values such as the identity of the end user, authorization tokens, and the request's deadline. It can also be used for transporting the current tracing Span.
The methods of request context propagation are outside the scope of the OpenTracing API, but they are worth mentioning here to better understand the following sections. There are two commonly used methods of propagation:
1. Explicit propagation - the application code is structured to pass around a certain context object. For example, see https://blog.golang.org/context.

> **Review comment:** nit: "For example (code snippet), then link" — that makes lazy readers understand more.
1. Implicit propagation - the context is stored in platform-specific storage that allows it to be retrieved from any place in the application. Often used by RPC frameworks, utilizing mechanisms such as thread-local or continuation-local storage, or even global variables (in the case of single-threaded processes).

> **Review comment:** The reader of this doc could be an API or an SPI person. For example, the latter will likely build implicit propagation with an explicit context. Not sure the best way to say this, but this made me think we shouldn't imply these are different solutions. Explicit or implicit here is mainly how you get a handle on a context, e.g. via passing it around, via dependency injection, or statically. Maybe library vs. framework is really the heart of this topic.
>
> **Reply:** In my mind the main difference is that explicit propagation is done by the end user, and the business application needs to have all its methods accept a context object. With implicit propagation the application does not need to change, because propagation is done by "magic" in a platform-specific way. Implicit propagation always has a performance penalty, while explicit propagation leaks an infrastructure concern into the application code. And in some languages, like Go, implicit propagation is simply not possible (horrible hacks of identifying a goroutine by the concatenation of its stack trace notwithstanding). Open to suggestions (or PRs) on how to clarify all of this.
>
> **Review comment:** Based on what you said, I think the only thing that could clarify better than what you have already is to switch the order: implicit first, then in the explicit section note that in some languages, like Go, implicit is not possible. That focuses a little better, as I think I was reacting more to the nuances of implicit things vs. the idea that in some cases you can't do implicit (even if you want to).
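For example, implicit propagation via thread-local storage might look like this (the helper names are hypothetical, not part of the OpenTracing API):

```python
import threading

_request_context = threading.local()


def set_current_span(span):
    # Typically called by the RPC framework when request handling begins.
    _request_context.span = span


def get_current_span():
    # Any layer of the application can reach the active span without
    # it being threaded through every function signature.
    return getattr(_request_context, 'span', None)
```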
### Tracing Client Calls

When an application acts as an RPC client, it is expected to start a new tracing Span before making an outgoing request, and to serialize the new span's Trace Context into that request. The following example shows how it can be done for an HTTP request.
```python
def traced_request(request, operation, http_client):
    # retrieve the current span from the propagated request context
    parent_span = get_current_span()

    # start a new child span, or a brand new trace if there is no parent
    if parent_span is None:
        span = tracer.start_trace(operation_name=operation)
    else:
        span = parent_span.start_child(operation_name=operation)
    span.set_tag('server.http.url', request.full_url)

    # encode the Trace Context into the HTTP request headers
    h_ctx, h_attr = tracer.trace_context_to_text(
        trace_context=span.trace_context)

    for key, value in h_ctx.iteritems():
        request.add_header(key, value)
    if h_attr:
        for key, value in h_attr.iteritems():
            request.add_header(key, value)

    # define a callback where we can finish the span
    def on_done(future):
        if future.exception():
            span.log_event_with_payload('exception', future.exception())
        else:
            span.set_tag('http.status_code', future.result().status_code)
        span.finish()

    try:
        future = http_client.execute(request)
        future.add_done_callback(on_done)
        return future
    except Exception as e:
        span.log_event_with_payload('exception', e)
        span.finish()
        raise
```

> **Review comment:** I believe the API can do this as a one-liner. This also coincidentally removes the need for …
>
> **Reply:** +1
>
> **Reply:** Also +1 to combining `start_trace` and `start_child`.
>
> **Review comment:** ugh.. s/server.http.url/http.url/ please. I understand why you are saying this, but e.g. in Zipkin, if there's actually a clash, it is tagged with an endpoint, which compensates. On single-endpoint spans, this won't clash. This is likely to be read by people and I'd hate to make them stop and think about this part too much, or worse, use it.
>
> **Review comment:** This section will need a complete rewrite I think, but I get where it is at.
>
> **Review comment:** I know there is still some debate on the …
>
> **Reply:** +1
>
> **Reply:** I'm in favour of … It's a nice clean simple API that supports the different use-cases we've seen. We need to iron out the plumbing API before trying to add porcelain.
* The `get_current_span()` function is not a part of the OpenTracing API. It is meant to represent some utility method of retrieving the current span from the current request context, propagated implicitly (as is often the case in Python).
* The encoding function `trace_context_to_text` returns two maps: one representing the encoding of the trace context identity, the other the trace context attributes. We copy both maps into the HTTP request headers.
* We assume the HTTP client is asynchronous, so it returns a Future, and we need to add an on-completion callback to be able to finish the current child span.
* If the HTTP client returns a future with an exception, we log the exception to the span with the `log_event_with_payload` method.
* Because the HTTP client may throw an exception even before returning a Future, we use a try/catch block to ensure that we finish the span in all circumstances.

> **Review comment:** Maybe add "… to ensure it is reported and avoid leaking resources." That is commonly the reason, but it might not be obvious.
### Using Distributed Context Propagation

The client and server examples above took care of propagating the Trace Context on the wire, which includes any Trace Attributes. The client may use the Trace Attributes to pass additional data to the server and to any other downstream server it might call.
```python
# client side
span.trace_context.set_attribute('auth-token', '.....')

# server side (one or more levels down from the client)
token = span.trace_context.get_attribute('auth-token')
```

> **Review comment:** This does scare me. The OpenTracing API is going to get abused on this front. Whether Distributed Context Propagation was going to be used by the application in this manner was the question I was asking in opentracing/opentracing-java#7 (comment). The example from that comment was: taking the correlation id out of the span_context (what's still called trace_context here) and putting it into the logging context (what log4j-1 calls the MDC and what log4j-2 calls the thread context) so that systems like Zipkin and Kibana can be used in combination. I would rather see the logging example used here instead of "auth-token". The logging example keeps the suggestion of how to use the context propagation at the infrastructure/systems level, rather than it-can-be-used-for-whatever-you-like in your application code.
>
> **Review comment:** I think the example of auth-token is an unnecessarily scary one. Also, this assumes the primary context propagation API is OpenTracing, when in reality many frameworks have one to do things like this directly. I agree we should use an example more likely for a user to call, or at least one not inviting discussions about security (by choosing a different field).
>
> **Reply:** auth-token is something that was explicitly raised as a use case for distributed context propagation. On that, and on the scope of propagation functions in OT, please see opentracing/opentracing-go#33.
>
> **Reply:** Commented there.
### Logging Events

We have already used `log_event_with_payload` in the client span use case. Events can be logged without a payload, and not just where the span is being created or finished. For example, the application may record a cache-miss event in the middle of execution, as long as it can get access to the current span from the request context:

> **Review comment:** Add ~ "These events also have a timestamp associated with them, which distinguishes them from a tag applying to the entire span"?
>
> **Reply:** Done, and I also added a whole use case (unsupported at the moment) about recording externally captured timestamps.
>
> **Review comment:** Where's the example of recording externally captured timestamps?
```python
span = get_current_span()
span.log_event('cache-miss')
```

> **Review comment:** sweet. cc @marcingrzejszczak
The tracer automatically records a timestamp of the event, in contrast with tags, which apply to the entire span. In future versions of the API it will be possible to associate an externally provided timestamp with the event; see, e.g., [pull request #38](https://github.com/opentracing/opentracing-go/pull/38).
### Recording Spans With External Timestamps

There are scenarios where it is impractical to incorporate an OpenTracing-compatible tracer into a service, for various reasons. For example, a user may have a log file of what is essentially span data coming from a black-box process (e.g. HAProxy). In order to get the data into an OpenTracing-compatible system, the API needs a way to record spans with externally captured timestamps. The current version of the API does not have that capability: see [issue #20](https://github.com/opentracing/opentracing.github.io/issues/20).
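To illustrate the shape of the data such an API would have to accept, here is a sketch that parses a hypothetical log line into span-like fields; the log layout and field names are made up for illustration — the essential parts are an operation name, an externally captured start timestamp, and a duration:

```python
def parse_log_line(line):
    """Parse a made-up '<epoch-start> <duration-ms> <operation>' record."""
    start, duration_ms, operation = line.split(' ', 2)
    return {
        'operation_name': operation,
        'start_time': float(start),               # externally captured start
        'duration': float(duration_ms) / 1000.0,  # seconds
    }
```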
### Setting "Debug" Mode

Most tracing systems apply sampling to minimize the amount of trace data sent to the system. Sometimes developers want a way to ensure that a particular trace will be recorded (sampled) by the tracing system, e.g. by including a special parameter in the HTTP request, like `debug=true`. The OpenTracing API does not have any insight into the sampling techniques used by the implementation, so there is no explicit API to force it. However, implementations are advised to recognize the `debug` Trace Attribute and take measures to record the span. In order to pass this attribute to tracing systems that rely on pre-trace sampling, the following approach can be used:
```python
if request.get('debug'):
    trace_context = tracer.new_root_trace_context()
    trace_context.set_attribute('debug', True)
    span = tracer.start_span_with_context(
        operation_name=operation,
        trace_context=trace_context)
```

> **Review comment:** I suggest changing this to the sampled flag, not debug. The latter is very rarely implemented, and quite complex compared to the former. cc @kristofa @marcingrzejszczak. This topic is simultaneously quite important and rarely implemented the same way. The debug flag is complex and only differs from the sampled flag insofar as it is passed to the collection tier. In many cases, the collection tier is accepting 100%, so this complexity isn't warranted. As such, Brave doesn't even implement this. If it were, you could imagine that collectors could choose to not honor debug anyway. In Brave, … In Finagle, …
>
> **Reply:** My intention here was to provide a way for application code to mark the trace as "debug" (whatever that means) in a vendor-agnostic way. Of course, the primary meaning I wanted was to ensure the trace is sampled, but the OpenTracing API itself never talks about sampling anywhere, leaving it as an implementation detail of the underlying tracing system. So I am reluctant to use the sampling reference as the standard tag name. BTW, in our internal Zipkin-compatible implementation the debug flag is used to raise both the DEBUG and SAMPLED bits in the Flags: the former to persist in storage and potentially display differently in the UI, the latter to override the sampling decision. @bensigelman do you have thoughts on ^^^?
>
> **Review comment:** Thanks for clarifying your intent. Makes sense to me.
>
> **Review comment:** DEBUG isn't intuitive for me either. I understand SAMPLED isn't either. What about FORCE or FORCE_TRACING? At any rate, it seems we are choosing words that aren't defined in OT as an …
>
> **Reply:** Ironically, the proposed API changes in PR 40 make the currently documented approach invalid anyway, so we can mark it as "open unsolved". Once PR 40 is approved, we'll need to find a new approach, which is likely not to involve using tags to force the sampling, but rather a method parameter. "Force tracing" sgtm.
>
> **Reply:** Yeah, let's defer this part of the decision. Another option that is more flexible for end-tracers would be something like a "Trace Priority" flag or level.
> **Review comment:** Why doesn't this use opentracing-javascript?! :)