-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEATURE] Promote OTel hooks from contrib to in-the-box #175
Comments
I fully endorse moving the metrics, traces and logs into this repo. The consumers later have the chance to include or exclude the data that we collect. If this is accepted, I can work on moving it here. |
1 - @open-feature/sdk-dotnet-approvers & @open-feature/sdk-dotnet-maintainers: if this makes sense, I'll abandon open-feature/dotnet-sdk-contrib#122 and move the OTEL stuff here. Anyone against? 2 - What are the steps to deprecate the library in case of approval? |
Hooks were introduced to OpenFeature to support use cases such as telemetry without bloating the SDK with dependencies that not everyone needs. In this case, Microsoft has added native support for OTel within the runtime, so dependency bloat isn't a concern. I think it's worth looking into. In terms of the implementation, are you proposing that the hook is bundled with the rest of the SDK or would OTel be automatically configured within the SDK without requiring a hook at all? Bundling the hook seems like the most straightforward approach to me. The automatic approach is compelling but I think it would introduce additional challenges we would have to consider. What do you think @askpt @austindrenski @toddbaert? |
@beeme1mr wrote #175 (comment):
When I opened this issue last week, I was just proposing the minimally invasive approach of lifting the hooks as-is into the SDK, but with another week of using the SDK in production, I'm finding the lack of complete telemetry from the SDK to be a real peeve. I'm personally finding it challenging to answer questions about perf, overhead, etc., related to the impact of using the OpenFeature SDK on a range of luke-warn to scorchingly-hot paths. So, I'm in favor of fully instrumenting the OpenFeature SDK without hooks:
There's prior art for doing this without taking any direct dependencies from the MassTransit project (see: MassTransit/MassTransit@7e9ec24, open-telemetry/opentelemetry-dotnet-contrib#788), but while impressive that they were able to do this (and laudable from the as-few-dependencies-as-possible-please perspective), I would prefer that we just accept a direct dependency on My reasoning for this is that, given the I'm open to hearing quibbles with this^ opinion, but as stated elsewhere, I have confidence in the The downside to going the MassTransit route is two-fold:
I'm not saying that either of these^ are outright blockers. The MassTransit folks have shown that its more than possible to do this in a way that produces excellent telemetry. But as a consumer of MassTransit, I often wish they would've just taken the dependency on So, I'm open to hearing other opinions, but I'm just highly skeptical that avoiding (Note: none of what I've written here is in anyway advocating for a batteries-included, opinionated approach to adding dependencies. I just view tldr; my vote is to instrument the SDK directly without hooks (leaving hooks to contribs and users) |
Could you elaborate on this a bit more? What would you like to capture that can't be collected by a hook? It's also worth noting that a provider can include hooks, so that could be a nice way to ensure consistency across implementations. Just to be clear, I'm not opposed to including native OTel support. I actually think it may be a good idea since the runtime has such good support for it. I just want to make sure we consider the impact and alternatives before committing to the change. |
@beeme1mr wrote #175 (comment):
Note Disclosure up-front: There are a couple of camps out there w.r.t. how much trace telemetry is too much trace telemetry, but I tend to land toward the right-hand tail of it. My rough rule-of-thumb is If you take this too far (e.g. as I've done on occasion), you end up with a real-world app where a root trace might contain dozens of thousands of child spans. Personally, I love having that much data (especially when the app is a background service, not serving direct user traffic, and I can live with a little tracing overhead in exchange for deep, exhaustive performance telemetry). But this has many, many downsides, from the sheer cost of collecting that much data, to the potential runtime overhead, to the fact that major telemetry SaaS providers (e.g. Datadog) have a tendency to crash when loading a trace with more than Obviously, this^ would be going wayyyy overboard for a library like the OpenFeature SDK, and that's not what I'm suggesting. (Among other things, I'm only happy with that much telemetry when I'm in control of it, i.e., if a third-party library produced that much telemetry on its own, I'd be peeved.) At the same time, in order to get to a place where I, as an app owner, can choose to collect an excessive amount of telemetry, I need the libraries that I consume to produce at least some traces of their own. So my ideal outcome from this issue would be for the OpenFeature SDK to produce at least the following traces:
the effect being something like this (where the number of traces we produce is 3 + the number of before hooks + the number of after hooks:
and potentially a few more spans if we make them sufficiently opt-in (e.g. feature flags for feature flags telemetry, lol): get_boolean_details |-------------------------------------|
+calculate_context |-|
+calculate_hooks |-|
trigger_before_hooks |--------|
trigger_before_hook |--|
trigger_before_hook |--|
trigger_before_hook |--|
evaluate |----------------|
trigger_after_hooks |--------|
trigger_after_hook |--|
trigger_after_hook |--|
trigger_after_hook |--| What I'm looking to get out of this is to set consistent telemetry expectations for consumers of the OpenFeature SDK, irrespective of what an individual contrib package chooses to do on their own. For example, I want to be able to add a new contrib hooks package, and if it causes perf regressions, I want the base OpenFeature SDK to be sufficiently instrumented that I can determine that even if the contrib package provides none of their own telemetry. This example^ is where my idea about the If we handle starting those spans here in the SDK, then a hook provider (whether from the contrib repo, or a one-off implemented in the consumer app) can add tags, record exceptions, and set the status on the ambient activity (i.e. I'm not entirely sure this^ will read as convincing/easy-consensus as I think it will look once we put together some draft code.
I (hopefully) answered this above, but just to bottom-line it: tldr; I'm proposing that the OpenFeature SDK should itself wrap hooks in telemetry spans so that consumers can observe the impact of:
|
I have been exploring better the current otel-contrib implementation.
P.S.: I think the point 3 is the same as @austindrenski pointed out but he types way faster than me 🤣 |
Hey @askpt @austindrenski, thanks for your input. I can see how that could be valuable in some situations, but it would likely be overkill most of the time. If we went that route, I would recommend not capturing at that granularity by default. To provide a bit more context behind the current implementation, please check out the feedback we received when initially adding feature flag semantics to the OTel spec. The TL;DR is the OTel team felt that a feature flag evaluation should have a negligible impact on performance in most situations. In OpenFeature, some providers may perform a remote evaluation, but those calls can typically be captured using out-of-the-box instrumentation (e.g. HTTP, gRPC). |
Absolutely. At a minimum, any telemetry we build into the library at this layer will be gated behind the usual OTel opt-in, i.e., consumer apps will need to explicitly call So by default, adding a package reference for Taking it a step further, I described two groups (i.e. var builder = WebApplication.CreateBuilder(args);
builder.Services.AddOpenTelemetry()
.WithMetrics(static builder => builder
// exporters
.AddOtlpExporter()
// instrumentation
.AddAspNetCoreInstrumentation()
.AddHttpClientInstrumentation()
.AddOpenFeature())
.WithTracing(static builder => builder
// exporters
.AddOtlpExporter()
// instrumentation
.AddAspNetCoreInstrumentation(static options => options.RecordException = true)
.AddHttpClientInstrumentation(static options => options.RecordException = true)
.AddOpenFeature(static options =>
{
options.IncludeContexts = true;
options.IncludeFlags = true;
options.IncludeHooks = true;
})); I want granular telemetry, but I'm incredibly sensitive to my position being an outlier/very much related to the specific workloads I'm building/maintaining. So very committed to finding the right balance of out-of-the-box telemetry support, without burdening consumers with unwanted noise.
That's very interesting to hear, will review the earlier discussion, thanks for the xref! |
I think this is worth exploring a bit more. I really like the idea. The only downside I see is that a provider would no longer be able to ship with OpenTelemetry support pre-configured, unless I'm missing something. Perhaps there's an elegant way to support that use case too. |
Requirements
Following open-feature/dotnet-sdk-contrib#132, the
OpenFeature.Contrib.Hooks.Otel
package now only depends onOpenFeature
andOpenTelemetry.Api
, the latter of which in turn has just a single dependency onSystem.Diagnostics.DiagnosticSource
.I'm proposing that we obsolete the
OpenFeature.Contrib.Hooks.Otel
package and incorporate its contained code into the mainOpenFeature
project to provide first-class telemetry support in line with other popular projects in the .NET ecosystem.Is this a good idea?
From
open-telemetry/opentelemetry-dotnet
(emphasis added):From
open-telemetry/opentelemetry-dotnet
(emphasis added):What if we think
OpenTelemetry.Api
is just too scary?I personally have a lot of faith in the
open-telemetry/opentelemetry-dotnet
team, and thus place a lot of trust in their promise to maintainOpenTelemetry.Api
as a lightweight abstractions package, but for good measure, I do want to call out that we could accomplish this proposed in-the-box'ing without taking a dependency onOpenTelemetry.Api
.To do this we would instead take a direct dependency on
System.Diagnostics.DiagnosticSource
, and then reimplement-the-wheel forActivityExtensions.RecordException(...)
,ActivityExtensions.SetStatus(...)
, etc.Again, I'm not endorsing this, but it's an option (e.g. among many other reasons, it would be exceptionally peevish if our impl drifts from the upstream impl and exceptions recorded by another library have different formats in Datadog/Dynatrace/etc).
See: open-feature/dotnet-sdk-contrib#132
The text was updated successfully, but these errors were encountered: