Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(instrumentation-fetch)! Passthrough original request to applyCustomAttributes #5281

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

chancancode
Copy link
Contributor

@chancancode chancancode commented Dec 18, 2024

Note

This PR currently contains an extra commit from #5268. I will rebase once that is merged.

Which problem is this PR solving?

Currently, the browser fetch() instrumentation unconditionally clones the response object for every request, in order to preserve the ability for the applyCustomAttributes to consume the response body. This cloned response doesn't get consumed/released until the body has been streamed and read entirely, which forces the browser to buffer and retain the entire response body, creating unnecessary memory pressure on large or long-running response streams. In extreme cases, this is effectively a memory leak and can cause the browser tab to crash.

Fixes #4888

Short description of the changes

The offending code was added in #2497. This commit effectively reverts that commit and breaks the feature. This is certainly a breaking change, and we can discuss the mitigations below.

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • Unit tests

Checklist:

  • Followed the style guidelines of this project
  • Unit tests have been added
  • Documentation has been updated

Background – the browser fetch() API

When fetch() was added to the browser, they made some deliberate design choices/improvements:

  1. Everything is async/promise-based (as opposed to event-based in XHR).
  2. Breakdown the request/response lifecycle/phases more granularly (as opposed to a single onreadystatechange hook/event).
  3. Work better/more efficiently with large/long-running response streams, reduce unnecessary buffering.

Concretely:

  1. fetch() returns a promise that resolves with a Response object, which can happen as soon as the headers are fully received.
  2. There are a bunch of conveniences on the Response object for working with the body, all of which are async. In addition, those APIs are methods that returns the data, rather than exposing the body as a property on the response itself. This allows the browser to stream the response in the background, and have more liberty in how the implement/fulfill those APIs. For example, in await response.json(), a browser could choose to use a streaming JSON parser and discard/move the underlying bytes as they arrive, reducing the need for buffering/copying bytes unnecessarily.
  3. To facilitate the above, no matter which ones of the APIs are used to consume the response body, it can only be consumed once per Response object. This is necessary to achieve the goals in the previous bullet point – if you are allowed to call await response.json() followed by await response.text() sometime later, the optimization in (2) is no longer possible, and the browser must buffer the response bytes for the lifetime of the response object, and each call to those APIs will have to copy the underlying data.
  4. The Response.clone() API provides a way to opt-in to buffering/copying the response body in cases where there are multiple consumers. This must be called before the response body has been read.

Background – the original feature request #2497

As mentioned above, the current instrumentation code unconditionally call .clone() on every response:

It then holds onto the cloned response object, until the response body has been fully streamed/read:

This was introduced way back in #2497. Effectively, this code "reserves the right" to read the response body in applyCustomAttributesOnSpan(), which doesn't get called until the entire response is available, forcing the browser to buffer and hold onto the underlying bytes in the meantime.

Note

There are actually two .clone()s here. resClone predates #2497, and was added in #2203 to ensure the span timing covers the whole lifecycle of the request/response, rather than just up to the point where the headers are sent. This .clone() is still problematic in that it forces the browser to copy the bytes as there are now more than one consumer, but since it uses the streaming-based API which discard the read bytes as soon as they are become available, it at least shouldn't contribute to the memory pressure problem which is the focus of this PR. We should probably still revisit that to see if there are better ways to accomplish this by hooking into resource timing API, but it's nontrivial and out-of-scope here. Removing the second clone alone should be sufficient to address the memory pressure problem reported in #4888.

This behavior/"feature" is fundamentally unsound, as it creates unnecessary memory pressure for all fetch() requests. This may be negligible in small responses, but when streaming a large response body this can be especially problematic. Worse, in case of long-running response streams, this is effectively a memory leak. At the extremes, as reported in #4888, this can cause the browser tab to run against its allowed memory limit and crash.

What now?

I don't see how we can reasonably support this feature without running into the fundamental problem described above. To that end, and considering that this is still an experimental package, I suggest that we drop it the feature and send it back to the drawing board.

The original feature request was too sparse on details. It was unclear what it was trying to accomplish – specifically why would it be useful to be able to read the response body in applyCustomAttributes(). I could speculate on the use cases, but I also think the design of the feature is itself flawed – applyCustomAttributes() is meant to be a synchronous hook, but by design, doing anything with the response body necessitates async operations. This means code like the test added in #2497 is also fundamentally broken, and probably only working due to this internal delay, which is certainly unintentional/unreliable. Future refactors (e.g. the fix for #5122) could very well remove the need for this delay and the code will subtly break.

So, I think the best course of action is to open an issue representing the breakage/feature request, and make a note in the CHANGELOG pointing to the issue, so we can evaluate what, if any, the replacement APIs should look like.

@chancancode chancancode requested a review from a team as a code owner December 18, 2024 00:56
Copy link

codecov bot commented Dec 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.63%. Comparing base (90afa28) to head (ae44a47).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5281      +/-   ##
==========================================
+ Coverage   94.62%   94.63%   +0.01%     
==========================================
  Files         323      323              
  Lines        8068     8068              
  Branches     1637     1637              
==========================================
+ Hits         7634     7635       +1     
+ Misses        434      433       -1     

see 1 file with indirect coverage changes

@chancancode
Copy link
Contributor Author

Tests are failing because this PR currently targets next, and #5277 hasn't landed in that branch yet.

@chancancode chancancode changed the title feat(instrumentation-fetch!) [Discussion] Passthrough original request to applyCustomAttributes feat(instrumentation-fetch)! Passthrough original request to applyCustomAttributes Dec 18, 2024
@dyladan
Copy link
Member

dyladan commented Dec 18, 2024

Feature was originally contributed by @echoontheway. There isn't a description of the original use case in the PR https://github.com/open-telemetry/opentelemetry-js/pull/2497/files

It has been raised several times in the past that this feature causes memory thrash and leaks. In my opinion, the motivating use case would have to be very convincing in order to keep this feature when it has caused so many issues.

@chancancode chancancode changed the base branch from next to main December 19, 2024 02:53
@echoontheway
Copy link
Contributor

It has been raised several times in the past that this feature causes memory thrash and leaks. In my opinion, the motivating use case would have to be very convincing in order to keep this feature when it has caused so many issues.

the PR was raised then to log the response of fetch api to review the data for business maintainers.
pic
i think it's ok to remove the feature for the moment considering memory pressure for long-running response streams.

Refactor `fetch()` tests for improved clearity, type safety and
realism (relative to the real-world `fetch` behavior).

See the inline comments on PR open-telemetry#5268 for a detailed explaination of
the changes.
@chancancode
Copy link
Contributor Author

chancancode commented Dec 20, 2024

Next steps (for me):

…CustomAttributes` hook

Previously, the fetch instrumentation code unconditionally clones
every `fetch()` response in order to preserve the ability for the
`applyCustomAttributes` hook to consume the response body. This is
fundamentally unsound, as it forces the browser to buffer and
retain the response body until it is fully received and read, which
crates unnecessary memory pressure on large or long-running response
streams. In extreme cases, this is effectively a memory leak and can
cause the browser tab to crash.

Fixes open-telemetry#4888
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Memory leak in the fetch instrumentation when used against an infinite fetch request resulting in memory leak
3 participants