feat(instrumentation-fetch)! Passthrough original request to `applyCustomAttributes` #5281

chancancode · 2024-12-18T00:56:23Z

Note

This PR currently contains an extra commit from #5268. I will rebase once that is merged.

Which problem is this PR solving?

Currently, the browser fetch() instrumentation unconditionally clones the response object for every request, in order to preserve the ability for the applyCustomAttributes to consume the response body. This cloned response doesn't get consumed/released until the body has been streamed and read entirely, which forces the browser to buffer and retain the entire response body, creating unnecessary memory pressure on large or long-running response streams. In extreme cases, this is effectively a memory leak and can cause the browser tab to crash.

Fixes #4888

Short description of the changes

The offending code was added in #2497. This commit effectively reverts that commit and breaks the feature. This is certainly a breaking change, and we can discuss the mitigations below.

Type of change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

Unit tests

Checklist:

Followed the style guidelines of this project
Unit tests have been added
Documentation has been updated

Background – the browser `fetch()` API

When fetch() was added to the browser, they made some deliberate design choices/improvements:

Everything is async/promise-based (as opposed to event-based in XHR).
Breakdown the request/response lifecycle/phases more granularly (as opposed to a single onreadystatechange hook/event).
Work better/more efficiently with large/long-running response streams, reduce unnecessary buffering.

Concretely:

fetch() returns a promise that resolves with a Response object, which can happen as soon as the headers are fully received.
There are a bunch of conveniences on the Response object for working with the body, all of which are async. In addition, those APIs are methods that returns the data, rather than exposing the body as a property on the response itself. This allows the browser to stream the response in the background, and have more liberty in how the implement/fulfill those APIs. For example, in await response.json(), a browser could choose to use a streaming JSON parser and discard/move the underlying bytes as they arrive, reducing the need for buffering/copying bytes unnecessarily.
To facilitate the above, no matter which ones of the APIs are used to consume the response body, it can only be consumed once per Response object. This is necessary to achieve the goals in the previous bullet point – if you are allowed to call await response.json() followed by await response.text() sometime later, the optimization in (2) is no longer possible, and the browser must buffer the response bytes for the lifetime of the response object, and each call to those APIs will have to copy the underlying data.
The Response.clone() API provides a way to opt-in to buffering/copying the response body in cases where there are multiple consumers. This must be called before the response body has been read.

Background – the original feature request #2497

As mentioned above, the current instrumentation code unconditionally call .clone() on every response:

opentelemetry-js/experimental/packages/opentelemetry-instrumentation-fetch/src/fetch.ts

Line 375 in 8ab52d5

const resClone4Hook = response.clone();

It then holds onto the cloned response object, until the response body has been fully streamed/read:

opentelemetry-js/experimental/packages/opentelemetry-instrumentation-fetch/src/fetch.ts

Line 383 in 8ab52d5

endSpanOnSuccess(span, resClone4Hook);

This was introduced way back in #2497. Effectively, this code "reserves the right" to read the response body in applyCustomAttributesOnSpan(), which doesn't get called until the entire response is available, forcing the browser to buffer and hold onto the underlying bytes in the meantime.

Note

There are actually two .clone()s here. resClone predates #2497, and was added in #2203 to ensure the span timing covers the whole lifecycle of the request/response, rather than just up to the point where the headers are sent. This .clone() is still problematic in that it forces the browser to copy the bytes as there are now more than one consumer, but since it uses the streaming-based API which discard the read bytes as soon as they are become available, it at least shouldn't contribute to the memory pressure problem which is the focus of this PR. We should probably still revisit that to see if there are better ways to accomplish this by hooking into resource timing API, but it's nontrivial and out-of-scope here. Removing the second clone alone should be sufficient to address the memory pressure problem reported in #4888.

This behavior/"feature" is fundamentally unsound, as it creates unnecessary memory pressure for all fetch() requests. This may be negligible in small responses, but when streaming a large response body this can be especially problematic. Worse, in case of long-running response streams, this is effectively a memory leak. At the extremes, as reported in #4888, this can cause the browser tab to run against its allowed memory limit and crash.

What now?

I don't see how we can reasonably support this feature without running into the fundamental problem described above. To that end, and considering that this is still an experimental package, I suggest that we drop it the feature and send it back to the drawing board.

The original feature request was too sparse on details. It was unclear what it was trying to accomplish – specifically why would it be useful to be able to read the response body in applyCustomAttributes(). I could speculate on the use cases, but I also think the design of the feature is itself flawed – applyCustomAttributes() is meant to be a synchronous hook, but by design, doing anything with the response body necessitates async operations. This means code like the test added in #2497 is also fundamentally broken, and probably only working due to this internal delay, which is certainly unintentional/unreliable. Future refactors (e.g. the fix for #5122) could very well remove the need for this delay and the code will subtly break.

So, I think the best course of action is to open an issue representing the breakage/feature request, and make a note in the CHANGELOG pointing to the issue, so we can evaluate what, if any, the replacement APIs should look like.

codecov · 2024-12-18T01:00:12Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.63%. Comparing base (90afa28) to head (ae44a47).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5281      +/-   ##
==========================================
+ Coverage   94.62%   94.63%   +0.01%     
==========================================
  Files         323      323              
  Lines        8068     8068              
  Branches     1637     1637              
==========================================
+ Hits         7634     7635       +1     
+ Misses        434      433       -1

see 1 file with indirect coverage changes

chancancode · 2024-12-18T01:04:58Z

Tests are failing because this PR currently targets next, and #5277 hasn't landed in that branch yet.

dyladan · 2024-12-18T17:40:57Z

Feature was originally contributed by @echoontheway. There isn't a description of the original use case in the PR https://github.com/open-telemetry/opentelemetry-js/pull/2497/files

It has been raised several times in the past that this feature causes memory thrash and leaks. In my opinion, the motivating use case would have to be very convincing in order to keep this feature when it has caused so many issues.

echoontheway · 2024-12-19T04:19:01Z

It has been raised several times in the past that this feature causes memory thrash and leaks. In my opinion, the motivating use case would have to be very convincing in order to keep this feature when it has caused so many issues.

the PR was raised then to log the response of fetch api to review the data for business maintainers.

i think it's ok to remove the feature for the moment considering memory pressure for long-running response streams.

Refactor `fetch()` tests for improved clearity, type safety and realism (relative to the real-world `fetch` behavior). See the inline comments on PR open-telemetry#5268 for a detailed explaination of the changes.

chancancode · 2024-12-20T22:04:06Z

Next steps (for me):

Wait for [instrumentation-fetch] refactor fetch() tests for clarity, type safety and realism #5268 to be merged
Rebase
Open a new feature-request issue for the removed functionality: [instrumentation-fetch] Support adding attributes to http spans based on the response body #5293
Link to the issue from the CHANGELOG entry

…CustomAttributes` hook Previously, the fetch instrumentation code unconditionally clones every `fetch()` response in order to preserve the ability for the `applyCustomAttributes` hook to consume the response body. This is fundamentally unsound, as it forces the browser to buffer and retain the response body until it is fully received and read, which crates unnecessary memory pressure on large or long-running response streams. In extreme cases, this is effectively a memory leak and can cause the browser tab to crash. Fixes open-telemetry#4888

chancancode requested a review from a team as a code owner December 18, 2024 00:56

chancancode changed the title ~~feat(instrumentation-fetch!) [Discussion] Passthrough original request to applyCustomAttributes~~ feat(instrumentation-fetch)! Passthrough original request to applyCustomAttributes Dec 18, 2024

chancancode force-pushed the fix-4888 branch from 541dbe5 to 4b9a12e Compare December 19, 2024 02:51

chancancode changed the base branch from next to main December 19, 2024 02:53

[instrumentation-fetch] refactor fetch tests for clearity/safety

5ee5d8f

Refactor `fetch()` tests for improved clearity, type safety and realism (relative to the real-world `fetch` behavior). See the inline comments on PR open-telemetry#5268 for a detailed explaination of the changes.

chancancode mentioned this pull request Dec 21, 2024

[instrumentation-fetch] Support adding attributes to http spans based on the response body #5293

Open

chancancode force-pushed the fix-4888 branch from 4b9a12e to ae44a47 Compare December 21, 2024 01:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(instrumentation-fetch)! Passthrough original request to `applyCustomAttributes` #5281

feat(instrumentation-fetch)! Passthrough original request to `applyCustomAttributes` #5281

chancancode commented Dec 18, 2024 •

edited

Loading

codecov bot commented Dec 18, 2024 •

edited

Loading

chancancode commented Dec 18, 2024

dyladan commented Dec 18, 2024

echoontheway commented Dec 19, 2024

chancancode commented Dec 20, 2024 •

edited

Loading

feat(instrumentation-fetch)! Passthrough original request to applyCustomAttributes #5281

Are you sure you want to change the base?

feat(instrumentation-fetch)! Passthrough original request to applyCustomAttributes #5281

Conversation

chancancode commented Dec 18, 2024 • edited Loading

Which problem is this PR solving?

Short description of the changes

Type of change

How Has This Been Tested?

Checklist:

Background – the browser fetch() API

Background – the original feature request #2497

What now?

codecov bot commented Dec 18, 2024 • edited Loading

Codecov Report

chancancode commented Dec 18, 2024

dyladan commented Dec 18, 2024

echoontheway commented Dec 19, 2024

chancancode commented Dec 20, 2024 • edited Loading

feat(instrumentation-fetch)! Passthrough original request to `applyCustomAttributes` #5281

feat(instrumentation-fetch)! Passthrough original request to `applyCustomAttributes` #5281

chancancode commented Dec 18, 2024 •

edited

Loading

Background – the browser `fetch()` API

codecov bot commented Dec 18, 2024 •

edited

Loading

chancancode commented Dec 20, 2024 •

edited

Loading