Memory spikes when using plugins with @graphql-hive/gateway@^1.0.8 #2

Closed
jaffemd opened this issue Oct 16, 2024 · 4 comments
Labels
question Further information is requested

@jaffemd

jaffemd commented Oct 16, 2024

We recently upgraded from graphql-mesh v0 to @graphql-mesh/compose-cli v1 + Hive Gateway as recommended by the migration guide.

Here are our relevant dependencies and versions:

"dependencies": {
  "@envelop/core": "^5.0.1",
  "@graphql-hive/gateway": "^1.0.8",
  "@graphql-mesh/compose-cli": "^1.0.2",
  "@graphql-mesh/supergraph": "^0.8.6",
  "graphql": "^16.8.1",
  "graphql-yoga": "^5.6.0"
}

Here's our config:

import { defineConfig } from "@graphql-hive/gateway";
import plugins from "@app/plugins/init";

// Assumed definition: the original post references oneYearInMs without showing it.
const oneYearInMs = 365 * 24 * 60 * 60 * 1000;

export const gatewayConfig = defineConfig({
  supergraph: "supergraph.graphql",
  port: 8000,
  plugins: () => [...plugins],
  executionCancellation: true,
  upstreamCancellation: true,
  pollingInterval: oneYearInMs,
});

Before v1, memory usage was flat and stable. After upgrading to Hive Gateway, we immediately observed unstable memory utilization.

image-20241014-135005

Zooming in, every 15 to 30 minutes, there is a sharp spike in memory.

image-20241014-134758

Our only clue was this release note in mesh v0.98.7 that referenced memory leaks from plugins.

We're using a mix of homegrown plugins that perform various functions, such as Datadog tracing, and vendor plugins from graphql-armor. We ran a short experiment with all of them turned off and didn't observe any memory spikes:
Screenshot 2024-10-16 at 10 30 50 AM

To rule out the contents of our plugins as the cause, we created a bare-bones empty plugin to see whether the memory spike would still occur with just that, and it did. Using the config below, with just a plugin that hooks into onFetch and onExecute, we still saw a memory spike after around 20 minutes.

export const gatewayConfig = defineConfig({
  supergraph: "supergraph.graphql",
  port: 8000,
  plugins: () => [
    {
      onFetch: () => {},
      onExecute: () => {},
    }
  ],
  executionCancellation: true,
  upstreamCancellation: true,
  pollingInterval: oneYearInMs,
});
Screenshot 2024-10-16 at 10 31 43 AM

This made it seem clear to us that there must be some memory leakage going on within the plugin infrastructure that could be similar to the issue referenced in the graphql-mesh release notes.
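
One way to make spikes like these easier to compare across experiments is to sample memory on a timer and flag sudden jumps. The following is a minimal sketch and not something taken from the gateway itself: `sampleRssBytes`, `findSpikes`, the window size, and the 1.5x threshold are all hypothetical choices, and the sample numbers are made up. Only `process.memoryUsage()` is a real Node.js API.

```typescript
// Hypothetical helpers for spotting memory spikes in a series of samples.
// Nothing here is part of Hive Gateway; names and thresholds are assumptions.

/** Returns the current resident set size in bytes, or 0 outside Node.js. */
function sampleRssBytes(): number {
  // Accessed via globalThis so the snippet type-checks without @types/node.
  const proc = (globalThis as any).process;
  return proc && typeof proc.memoryUsage === "function"
    ? proc.memoryUsage().rss
    : 0;
}

/**
 * Returns the indices of samples that exceed `factor` times the average
 * of the preceding `window` samples.
 */
function findSpikes(samples: number[], window = 5, factor = 1.5): number[] {
  const spikes: number[] = [];
  samples.forEach((value, i) => {
    if (i < window) return; // not enough history yet
    const previous = samples.slice(i - window, i);
    const baseline = previous.reduce((sum, v) => sum + v, 0) / window;
    if (value > baseline * factor) {
      spikes.push(i);
    }
  });
  return spikes;
}

// Example with made-up numbers (in MB): a stable plateau with one jump.
const exampleSamples = [300, 305, 298, 302, 301, 299, 640, 310, 303];
const exampleSpikes = findSpikes(exampleSamples);
```

In a running process, `sampleRssBytes()` could be called from a `setInterval` and the results pushed into the array passed to `findSpikes`, which makes it possible to compare runs with and without plugins using the same criterion.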

@enisdenjo
Member

Hey there! Thanks for reporting. For me to debug this in depth, I'd need to know a bit more about your test environment and create a benchmark that replicates the behaviour in order to pinpoint the issue.

Can you tell me:

  1. Which Node version are you using?
  2. Does the traffic change for the 15-20min spike, what is happening during that time?
  3. Is there consistently a spike every 15-20mins or at random times? Also during low traffic?
  4. How are you performing the test? Constant VUS over time or are you using some sort of real traffic?

@jaffemd
Author

jaffemd commented Oct 17, 2024

@enisdenjo Thank you for the quick reply!

  1. Which Node version are you using?

We're running the gateway on a docker container in a kubernetes pod. The docker container is running node 20.14.0.

  2. Does the traffic change for the 15-20min spike, what is happening during that time?
  3. Is there consistently a spike every 15-20mins or at random times? Also during low traffic?
  4. How are you performing the test? Constant VUS over time or are you using some sort of real traffic?

This is with real traffic. It's overall constant, but we do get generally lower traffic overnight. The spikes are consistently every 15-20 minutes, but not exactly.
Screenshot 2024-10-17 at 8 20 53 AM

Screenshot 2024-10-17 at 8 18 43 AM

@dotansimha dotansimha transferred this issue from graphql-hive/console Oct 21, 2024
@enisdenjo
Member

We're running the gateway on a docker container in a kubernetes pod. The docker container is running node 20.14.0.

We had some fights with Node and memory spikes in the past. I'm wondering whether that's the case here too. Can we start by updating Node in the container to the upcoming LTS (starting tomorrow), v22.10.0?

It's overall constant, but we do get generally lower traffic overnight. The spikes are consistently every 15-20 minutes, but not exactly.

Do the spikes also occur during lower traffic?

@jaffemd
Author

jaffemd commented Oct 21, 2024

Upgrading node to 22.10.0 fixed our issue. Thank you!
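
For anyone who wants to guard against accidentally running on an older runtime, a startup check along these lines is one option. This is only a sketch: `parseVersion` and `isAtLeast` are hypothetical helper names, and using 22.10.0 as the minimum simply mirrors the version mentioned in this thread. `process.versions.node` is the only real API used.

```typescript
// Hypothetical startup guard; function names are illustrative only.

/** Parses "22.10.0" (optionally prefixed with "v") into [22, 10, 0]. */
function parseVersion(version: string): number[] {
  return version
    .replace(/^v/, "")
    .split(".")
    .map((part) => parseInt(part, 10) || 0);
}

/** Returns true when `actual` is greater than or equal to `minimum`. */
function isAtLeast(actual: string, minimum: string): boolean {
  const a = parseVersion(actual);
  const m = parseVersion(minimum);
  for (let i = 0; i < 3; i++) {
    const left = a[i] ?? 0;
    const right = m[i] ?? 0;
    if (left !== right) return left > right;
  }
  return true; // all three components are equal
}

// Read the runtime version via globalThis so this type-checks without @types/node.
const runningVersion: string =
  (globalThis as any).process?.versions?.node ?? "0.0.0";

// Assumption: 22.10.0 is the minimum, as suggested in this thread.
if (!isAtLeast(runningVersion, "22.10.0")) {
  console.warn(`Node ${runningVersion} is older than 22.10.0`);
}
```

The check only warns instead of exiting, so it can be added without changing how the process starts.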

@jaffemd jaffemd closed this as completed Oct 21, 2024
@enisdenjo enisdenjo added the question Further information is requested label Nov 11, 2024