How to troubleshoot memory issues in etcd #82

iamnst19 · 2024-06-26T09:05:07Z

Hi, I would like to know how to troubleshoot memory issues in etcd and how to mitigate them.

Quentin-M · 2024-06-26T09:13:48Z

Hey!

Like you said, you'd be looking at etcd itself, as the operator's own memory usage is going to be minimal. It's best to refer to etcd's repository, docs, and code. Etcd is started as an embedded server within the etcd-cloud-operator process though, so it may at first look as if the operator is the one taking up memory.
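Because etcd runs embedded in the operator process, the Go runtime's memory statistics for that process cover both. As a minimal sketch (the `memSample` type and `sampleMemory` helper are hypothetical names, not part of the operator), something like this could be logged periodically to see how the heap moves around a snapshot:

```go
package main

import (
	"fmt"
	"runtime"
)

// memSample holds a few Go runtime memory figures, in MiB.
// Hypothetical helper type for illustration only.
type memSample struct {
	HeapAllocMiB float64 // bytes of live heap objects
	HeapInuseMiB float64 // bytes in in-use heap spans
	SysMiB       float64 // total bytes obtained from the OS by the Go runtime
}

// sampleMemory reads the runtime statistics for the whole process,
// which includes any embedded server running in it.
func sampleMemory() memSample {
	var m runtime.MemStats
	runtime.ReadMemStats(&m) // briefly stops the world, so avoid calling it in a tight loop
	const mib = 1024 * 1024
	return memSample{
		HeapAllocMiB: float64(m.HeapAlloc) / mib,
		HeapInuseMiB: float64(m.HeapInuse) / mib,
		SysMiB:       float64(m.Sys) / mib,
	}
}

func main() {
	s := sampleMemory()
	fmt.Printf("heap_alloc=%.1fMiB heap_inuse=%.1fMiB sys=%.1fMiB\n",
		s.HeapAllocMiB, s.HeapInuseMiB, s.SysMiB)
}
```

These figures only reflect memory managed by the Go runtime. Memory-mapped files, such as the database file of the storage backend, count toward the process RSS but would not show up here, so the numbers may differ from what a container or node-level chart reports.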

iamnst19 · 2024-07-10T18:21:21Z

I think the memory spike is due to the S3 backup. How do I disable it? Also, how and where do I need to add profiling (https://github.com/google/pprof) to check the memory profile?

Quentin-M · 2024-07-10T22:49:14Z

The snapshot provider streams the data from etcd to the snapshot destination, so I'd think it'd be OK if everything is implemented correctly, unless etcd itself has a memory spike as part of the save somehow. Do you have a memory chart?

Disabling S3 snapshots is not recommended as this will cripple your ability to do disaster recovery, unless you enable the file backup provider with a separate and reliable storage to use. By default, the operator requires a snapshot provider.

To enable pprof, you'd want to inject it in the main here behind a command-line flag:

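import (
  "net/http"
  _ "net/http/pprof" // registers the /debug/pprof handlers on http.DefaultServeMux
)

// flagPprof is assumed to be a *string holding the listen address, e.g. "localhost:6060".
if flagPprof != nil && len(*flagPprof) > 0 {
  go func() {
    zap.S().Infof("enabling pprof on %s", *flagPprof)
    // ListenAndServe lives in net/http, not net/http/pprof; it only returns on error.
    if err := http.ListenAndServe(*flagPprof, nil); err != nil {
      zap.S().Errorf("pprof server stopped: %v", err)
    }
  }()
}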

iamnst19 · 2024-07-11T05:49:28Z

The baseline has shifted and memory keeps growing, and I can see that these spikes happen during the backup to S3. Can I adjust this configuration so that the backup is less aggressive?

snapshot:
    provider: s3 # This should be configured to S3 in any real environments.
    interval: 30m
    ttl: 24h

Maybe increase the interval or reduce the TTL? If so, what would be good values here?

iamnst19 · 2024-07-12T10:32:01Z

Ideally, this backup should run during off-peak hours. How can I schedule it to run once a week during off-peak hours?

iamnst19 · 2024-07-23T19:51:36Z

Can you please help here?
