Allowance for using Stored Operation IDs for Semantic conventions for GraphQL Server #1011

magicmark · 2023-07-10T15:20:17Z

What are you trying to achieve?

As an alternative to using the GraphQL query name as the span name, I'd like to use its hash (stored operation ID) instead.

Context

We're looking at aligning our GraphQL infra with the following convention:

https://opentelemetry.io/docs/specs/otel/trace/semantic_conventions/instrumentation/graphql/

However, we use document IDs (a sha256 of the query) to uniquely identify queries across our infra. Our org does not allow arbitrary queries at all - every graphql request is known ahead of time, and the query will live somewhere in our codebase.

Why does this matter in the context of the span name?

Query names are not unique (different queries can be defined with the same name across different platforms, or even older/multiple versions of the same query) leading to confusion when trying to find the query in source code.

My concern is this:

Imagine a situation where one of our oncall engineers (who is unfamiliar with the GQL stack) is trying to diagnose a slow or failing operation - they might reasonably use the operation name (query name) as the lookup key when trying to find the query in our codebase to debug further. As mentioned above, this is very susceptible to finding the wrong thing.

"Just add the operation ID as a tag/attribute"

While it is possible to do this, there's still a footgun here in that folks have to know to click into the sapn to copy the operation ID instead of the span name.

Generally we try and build a path for folks to guide them to do the right thing by default - and the "right thing by default" here would be putting the operation name as the span name imo.

The ask

For GraphQL Servers that respect operation IDs only, the span name guidance could allow for using this instead?

FAQs

What about batching

This is actually a different but related point: the first span to our GQL server will be an HTTP request containing multiple queries

https://www.apollographql.com/blog/apollo-client/performance/query-batching

In a case where a server supports batching, I would propose this (where the span name is inside the square brackets)

-[ POST /graphql ]
----[ some-long-sha-256-hash-foo ]
----[ some-long-sha-256-hash-bar ]
----[ some-long-sha-256-hash-baz ]

What about Automatic Persisted Queries?

Apollo support Automatic Persisted Queries (APQ) which is a sort of hybrid. The server supports arbitrary queries, with the query name being the correct and only way to identify them. But subsequent requests have their raw query bodies substituted for IDs instead (which the server has stored in a lookup table). My understanding is that in this setup, IDs are not intended to be used as globally unique identifiers - and more similar to a cache key instead.

tl;dr servers using Apollo APQ should keep the status quo.

The text was updated successfully, but these errors were encountered:

maraisr · 2023-07-17T23:23:02Z

What a great start to the discussion, thanks! ❤️

I'd like to offer you my perspective and say yeah, I think it is entirely reasonable to include the query hash in the span name, though with a preference to the OperationName.

Much like with REST, and including the resource name as the span name, eg PUT /my/resource. The equivalent for GraphQL could either be the OperationName or the query identifier.

Both could work, as in to put the query name in the span name, with its id as an attribute or the inverse.

[ 123ms ] |--------------|    * POST /graphql
[ 123ms ]     |----|          * 79c2f6ee6f496da71e2262cf786b6c13
[ 123ms ]        |----|       * Query.ProductCard

Though can say, I do prefer the query name as the span name, so a quick glance I know what is going on.

benjie · 2023-07-22T05:04:36Z

Just a thought, but why not both? OperationName.hash has the advantage of being human readable (mostly) whilst also fixing the uniqueness problem.

trask · 2023-07-24T02:44:19Z

@open-telemetry/technical-committee can you transfer this issue to https://github.com/open-telemetry/semantic-conventions? thx

Oberon00 · 2023-07-24T10:13:18Z

I am not sure this needs to be moved, as it seems to be a general question about the span name concept. The span name is not supposed to uniquely identify a code location (although in principle, it can). If you want that, there are code.* attributes

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#span

The span name concisely identifies the work represented by the Span, for example, an RPC method name, a function name, or the name of a subtask or stage within a larger computation. The span name SHOULD be the most general string that identifies a (statistically) interesting class of Spans, rather than individual Span instances while still being human-readable. That is, "get_user" is a reasonable name, while "get_user/314159", where "314159" is a user ID, is not a good name due to its high cardinality. Generality SHOULD be prioritized over human-readability.

SonjaChevre · 2023-07-26T06:33:40Z

hey @magicmark! just fyi I have created an issue to suggest working on better GraphQL semantic conventions - and linked your interesting use case: #182

Feel free to add feedback in there as well!

github-actions bot assigned carlosalberto Jul 10, 2023

SonjaChevre mentioned this issue Jul 26, 2023

Improve GraphQL semantic conventions #182

Open

trask transferred this issue from open-telemetry/opentelemetry-specification May 7, 2024

github-actions bot assigned reyang May 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allowance for using Stored Operation IDs for Semantic conventions for GraphQL Server #1011

Allowance for using Stored Operation IDs for Semantic conventions for GraphQL Server #1011

magicmark commented Jul 10, 2023 •

edited

Loading

maraisr commented Jul 17, 2023

benjie commented Jul 22, 2023

trask commented Jul 24, 2023

Oberon00 commented Jul 24, 2023 •

edited

Loading

SonjaChevre commented Jul 26, 2023

Allowance for using Stored Operation IDs for Semantic conventions for GraphQL Server #1011

Allowance for using Stored Operation IDs for Semantic conventions for GraphQL Server #1011

Comments

magicmark commented Jul 10, 2023 • edited Loading

The ask

FAQs

maraisr commented Jul 17, 2023

benjie commented Jul 22, 2023

trask commented Jul 24, 2023

Oberon00 commented Jul 24, 2023 • edited Loading

SonjaChevre commented Jul 26, 2023

magicmark commented Jul 10, 2023 •

edited

Loading

Oberon00 commented Jul 24, 2023 •

edited

Loading