Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cgroups v2 support to resourcemanager #1744

Merged
merged 11 commits into from
Sep 22, 2023

Conversation

Tristan-Wilson
Copy link
Member

If the --node.resource-mgmt.mem-limit-percent command line option is set, now Nitro will auto-detect whether memory stats can be checked using cgroups v2 or v1, instead of only supporting v1.

Testing done

Same testing was done as in #1738 after resetting my system to use cgroups v2:

sudo grubby --update-kernel=ALL --remove-args="systemd.unified_cgroup_hierarchy"

Additionally confirmed that cgroups v2 was correctly detected by checking log messages:

sequencer_1                    | INFO [07-07|23:46:49.218] Cgroups v2 detected, enabling memory limit RPC throttling 

@Tristan-Wilson Tristan-Wilson requested a review from anodar July 8, 2023 00:00
@cla-bot cla-bot bot added the s Automatically added by the CLA bot if the creator of a PR is registered as having signed the CLA. label Jul 8, 2023
if err != nil {
return err
}
_, err = fmt.Fprintf(statsFile, `total_cache 1029980160
if _, err = fmt.Fprintf(statsFile, `total_cache 1029980160
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

_, err = ....
return err

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

inactiveRe: regexp.MustCompile(`total_inactive_file (\d+)`),
}

c := newCgroupsMemoryLimitChecker(testFiles, 95)
_, err := c.isLimitExceeded()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: inline. Same below

if _, err := ... ; err == nil {
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

_, err := c.isLimitExceeded()
if err == nil {
t.Error("Should fail open if can't read files")
}

err = updateFakeCgroupv1Files(c, 1000, 1000, 51)
err = updateFakeCgroupFiles(c, 1000, 1000, 51)
if err != nil {
t.Error(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When errors are not nil, there's probably no use of continuing the test, so this should be Fatal instead.

Same below/up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I had used the wrong one by mistake.

return err
}
return nil
}

func TestCgroupsv1MemoryLimit(t *testing.T) {
func TestCgroupsMemoryLimit(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test basically has duplicate code for 3 subtests. You can just describe each as a test-case and de-duplicate the code. (Sorry overlooked on the first review).

https://go.dev/blog/subtests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@Tristan-Wilson Tristan-Wilson requested a review from anodar July 17, 2023 21:47
// it creates a trivialLimitChecker that does no checks.
// can check system limits. Currently Cgroups V1 and V2 are supported.
// If no supported mechanism is discovered, it logs an error and
// fails open, ie it creates a trivialLimitChecker that does no checks.
func newLimitChecker(conf *Config) limitChecker {
Copy link
Contributor

@anodar anodar Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I've mentioned in original review where this was introduced, please return concrete type here, as returning interfaces are not idiomatic in Go. You can change this function to:

func newLimitChecker(conf *Config) (cgroupsMemoryLimitChecker, error) {
       if c := newCgroupsMemoryLimitChecker(cgroupsV1MemoryFiles, conf.MemoryLimitPercent); isSupported(c) {
		return c, nil
	}
      if c := newCgroupsMemoryLimitChecker(cgroupsV2MemoryFiles, conf.MemoryLimitPercent); isSupported(c) {
            return c, nil
      }
      return nil, errNotSupported
}

Then in the call site you can call this, check with error.Is(err, errNotSupported) to instantiate trivialLimitChecker{} when needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored it as suggested.

_, err := c.isLimitExceeded()
if err == nil {
t.Error("Should fail open if can't read files")
func TestCgroupsFailIfCantOpen(t *testing.T) {
Copy link
Contributor

@anodar anodar Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func TestCgroupsv1MemoryLimit(t *testing.T) {

  for _, tc := range []struct {
        desc string
        inactive int
        want bool
} {
    {  
       desc: "limit should be exceeded", 
      inactive: 51,
      want: true,

    },
    {  
       desc: "limit should not be exceeded", 
      inactive: 51,
      want: false,

    },
} {
   t.Run(tc.desc, func(t *testing.T) {
         dir := makeCgroupsTestDir(t.TempDir())
         if err := updateFakeCgroupFiles(c, 1000, 1000, tc.inactive); err != nil {
      		t.Fatal("Updating cgroup files: %v", err)
         }
         exceeded, err := c.isLimitExceeded()
         if err != nil {
            t.Fatalf("Checking if limit exceeded: %v", err)
         }
         if exceeded != tc.want {
            t.Errorf("isLimitExceeded() = %t, want %t", exceeded, tc.want)
         }
	}
   })
}
}

You can either also have TestCgroupsFailIfCantOpen as another case in the test, or leave as is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll use this pattern more often. Took your code largely as suggested.

}
if exceeded {
t.Error("Expected under limit")
t.Fatal("Expected under limit")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't matter too much in this case, but once you change this to being table-driven tests, this should be Errorf.

https://google.github.io/styleguide/go/decisions#keep-going

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's interesting to know. I had always followed the opposite pattern (fail fast), but I'll try to use this where it makes sense.

anodar
anodar previously approved these changes Aug 10, 2023
Copy link
Contributor

@anodar anodar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the comments.

return newHttpServer(srv, newLimitChecker(conf)), nil
var c limitChecker
var err error
c, err = newCgroupsMemoryLimitCheckerIfSupported(conf)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: any reason why not just c,err := ... ; and drop declaration of those variables (two lines above this one) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to have c be of the limitChecker interface type so I can assign the trivialLimitChecker to it. I moved the declaration err into the same line as the limit checker creation.

func newLimitChecker(conf *Config) limitChecker {
// newCgroupsMemoryLimitCheckerIfSupported attempts to auto-discover whether
// Cgroups V1 or V2 is supported for checking system memory limits.
func newCgroupsMemoryLimitCheckerIfSupported(conf *Config) (*cgroupsMemoryLimitChecker, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: go tends to encourage shorter names. You can drop "IfSupported" suffix and just add in the comment above the function that this returns errNotSupported if it's not supported.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what to rename it to since newCgroupsMemoryLimitChecker is already the factory method for the CgroupsMemoryLimitChecker and this version handles auto-detecting cgroups v1 or v2.

testFiles := makeCgroupsTestDir(t.TempDir())
c := newCgroupsMemoryLimitChecker(testFiles, 95)
var err error
if _, err = c.isLimitExceeded(); err == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: same here, you don't need to define err before, you can just do

if _, err := ... ; err == nil {
}

It's more concise and limits the scope of err variable (less error prone in general that way for consequent error checks)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@Tristan-Wilson Tristan-Wilson merged commit affaf25 into master Sep 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
s Automatically added by the CLA bot if the creator of a PR is registered as having signed the CLA.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants