Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos: enrich systemd.nspawn.* options #74316

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

tekeri
Copy link
Contributor

@tekeri tekeri commented Nov 27, 2019

Motivation for this change

Add a practical systemd-nspawn configuration options.

Possibly resolves #51076 and #40367 .

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nix-review --run "nix-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.
Notify maintainers

cc @c0bw3b @volth @arianvp @Mic92

@ofborg ofborg bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` labels Nov 27, 2019
@tekeri tekeri force-pushed the enrich-systemd-nspawn branch from 2bd368e to 3415eb0 Compare November 27, 2019 10:08
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Nov 27, 2019
@Ma27 Ma27 self-requested a review November 27, 2019 11:05
Copy link
Member

@andir andir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good. Could you make sure we have a test for this feature?

@tekeri
Copy link
Contributor Author

tekeri commented Nov 28, 2019

I'm new to NixOS tests, so it takes a while.

@tekeri tekeri force-pushed the enrich-systemd-nspawn branch from 3415eb0 to 08abb1d Compare December 27, 2019 06:43
@ofborg ofborg bot added 2.status: merge conflict This PR has merge conflicts with the target branch and removed 2.status: merge conflict This PR has merge conflicts with the target branch labels Dec 27, 2019
@tekeri tekeri force-pushed the enrich-systemd-nspawn branch from f1b0fa3 to 23ab76d Compare January 14, 2020 08:29
@tekeri tekeri force-pushed the enrich-systemd-nspawn branch from 23ab76d to f425b77 Compare January 21, 2020 09:46
@tekeri
Copy link
Contributor Author

tekeri commented Jan 21, 2020

@andir test is ready

@ofborg ofborg bot added 10.rebuild-linux: 1-10 and removed 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Jan 21, 2020
@tekeri tekeri requested a review from Ma27 January 22, 2020 04:48
nixos/tests/systemd-nspawn-options.nix Outdated Show resolved Hide resolved
nixos/tests/systemd-nspawn-options.nix Outdated Show resolved Hide resolved
nixos/modules/system/boot/systemd-nspawn.nix Outdated Show resolved Hide resolved
@tekeri
Copy link
Contributor Author

tekeri commented Jan 23, 2020

Thanks, I'll fix them.

@tekeri tekeri force-pushed the enrich-systemd-nspawn branch from f425b77 to 1007deb Compare February 6, 2020 09:43
@tekeri tekeri requested a review from Ma27 February 17, 2020 07:00
Copy link
Member

@Ma27 Ma27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your work on this and your patience! I have found a few more notes, but apart from that, the change seems pretty fine. Will test it as soon as I have time to :)

nixos/modules/system/boot/systemd-nspawn.nix Outdated Show resolved Hide resolved
nixos/modules/system/boot/systemd-nspawn.nix Outdated Show resolved Hide resolved
nixos/modules/system/boot/systemd-nspawn.nix Outdated Show resolved Hide resolved
nixos/modules/system/boot/systemd-nspawn.nix Outdated Show resolved Hide resolved
nixos/modules/system/boot/systemd-nspawn.nix Outdated Show resolved Hide resolved
nixos/modules/system/boot/systemd-nspawn.nix Outdated Show resolved Hide resolved
nixos/modules/system/boot/systemd-nspawn.nix Outdated Show resolved Hide resolved
nixos/tests/systemd-nspawn-options.nix Show resolved Hide resolved
@flokli
Copy link
Contributor

flokli commented Jul 11, 2020

@tekeri this needs another rebase.

@Ma27 did you get a chance to test this in the meantime?

@Ma27
Copy link
Member

Ma27 commented Jul 11, 2020

@flokli I'm sorry, no. But added it on my todolist for tomorrow 😅

@Ma27 Ma27 self-assigned this Jul 12, 2020
@tekeri
Copy link
Contributor Author

tekeri commented Jul 13, 2020

pushed, while github is down

@Ma27 Ma27 force-pushed the enrich-systemd-nspawn branch from 8d61e3d to 701dec4 Compare July 13, 2020 21:14
default = "";
description = ''
Optional mount options. See
<link xlink:href="url">https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html#Mount%20Options</link>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't the href attribute contain a URL

Copy link
Contributor Author

@tekeri tekeri Jul 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you tell me how to make a link to the 'Mount options' section in the manual?

systemd.services."systemd-nspawn@".serviceConfig.ExecStart = [
"" # deliberately empty. signals systemd to override the ExecStart
# Only difference between upstream is that we do not pass the -U flag
"${config.systemd.package}/bin/systemd-nspawn --quiet --keep-unit --boot --link-journal=try-guest --network-veth --settings=override --machine=%i"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@flokli do you know if the systemd bug which made this necessary is fixed? Also, I'd prefer to set disableUnprivilegedNspawn to true by default since it's been a valid workaround for us.

]
);
default = [];
example = [ "/tmp" { path = "/var/tmp"; mountOption = ""; } ];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use literalExample for examples that contain a list or an attr-path for better formatting in the manual.

};
in

import ./make-test-python.nix ({ pkgs, ...} : {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know why nscd inside the container breaks:

machine # [   13.949141] systemd-nspawn[786]: [  OK  ] Reached target Basic System.
machine # [   13.953850] systemd-nspawn[786]:          Starting Name Service Cache Daemon...
machine # [   13.960114] systemd-nspawn[786]:          Starting resolvconf update...
machine # [   13.982618] systemd-nspawn[786]: [FAILED] Failed to start Name Service Cache Daemon.
machine # [   13.984544] systemd-nspawn[786]: See 'systemctl status nscd.service' for details.
machine # [   13.988963] systemd-nspawn[786]: [  OK  ] Reached target Host and Network Name Lookups.
machine # [   13.992789] systemd-nspawn[786]: [  OK  ] Reached target User and Group Name Lookups.
machine # [   14.002572] systemd-nspawn[786]:          Starting Login Service...
machine # [   14.054966] systemd-nspawn[786]: [FAILED] Failed to start Login Service.
machine # [   14.057025] systemd-nspawn[786]: See 'systemctl status systemd-logind.service' for details.
machine # [   14.061889] systemd-nspawn[786]: [  OK  ] Stopped Login Service.
machine # [   14.068607] systemd-nspawn[786]:          Starting Login Service...
machine # [   14.094932] systemd-nspawn[786]: [  OK  ] Stopped Name Service Cache Daemon.
machine # [   14.101640] systemd-nspawn[786]:          Starting Name Service Cache Daemon...
machine # [   14.129406] systemd-nspawn[786]: [FAILED] Failed to start Name Service Cache Daemon.
machine # [   14.131745] systemd-nspawn[786]: See 'systemctl status nscd.service' for details.
machine # [   14.154355] systemd-nspawn[786]: [FAILED] Failed to start Login Service.
machine # [   14.156893] systemd-nspawn[786]: See 'systemctl status systemd-logind.service' for details.
machine # [   14.163836] systemd-nspawn[786]: [  OK  ] Stopped Login Service.
machine # [   14.178079] systemd-nspawn[786]:          Starting Login Service...
machine # [   14.243987] systemd-nspawn[786]: [  OK  ] Stopped Name Service Cache Daemon.
machine # [   14.249399] systemd-nspawn[786]:          Starting Name Service Cache Daemon...
machine # [   14.258114] systemd-nspawn[786]: [FAILED] Failed to start Login Service.
machine # [   14.261136] systemd-nspawn[786]: See 'systemctl status systemd-logind.service' for details.

# assert "Overlay=+/usr::/usr" in conf

# Checks filesConfig, networkConfig
assert "ReadOnly=false" in conf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really think that we need this. As soon as some upstream options change we have to fix this test accordingly. Those kind of assertions don't belong in an integration test IMHO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually an unit test to check whether the boolean value is rendered as intended. Also, these lines gives an idea of how the options works.
No problem, however, to remove those assertions.

"touch /var/lib/machines/test-container/etc/os-release",
"machinectl start test-container",
"machinectl terminate test-container",
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a more detailed test to make sure that the machine actually works? Some ideas:

  • Wait at least for multi-user.target.
  • Check if networking is fine.
  • Check if bind-mounts and restrictions do work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: This changeset is intended to be used with nixops-container backend. NixOS/nixops#1226
I'm internally using it and It works over 1 year with networking and bind-mounts.

@ryantm ryantm added 2.status: merge conflict This PR has merge conflicts with the target branch and removed 2.status: merge conflict This PR has merge conflicts with the target branch labels Oct 3, 2020
@tekeri
Copy link
Contributor Author

tekeri commented Oct 28, 2020

I'll fix the conflict and add tests when I have spare time.

@bqv
Copy link
Contributor

bqv commented Nov 25, 2020

Interest

@hmenke
Copy link
Member

hmenke commented Dec 30, 2020

I haven't reviewed this PR in detail but I'm strongly in favor of a refactor of systemd.nspawn. Right now it's barely usable. I had to add some boilerplate which is nowhere to be found in the documentation to make the container boot at all.

{
  # Boilerplate
  systemd.targets.machines.enable = true;
  systemd.services."systemd-nspawn@archlinux" = {
    enable = true;
    wantedBy = [ "machines.target" ];
  };

  # Actual container
  systemd.nspawn."archlinux" = {
    enable = true; # does this even do anything?
    execConfig = {
      Boot = true;
    };
  };
}

But even with these options, the container still refuses to shut down properly on reboot and I'm stuck at the dreaded

[    **] A stop job is running for systemd-n…@archlinux.service (7s / 1min 30s)

In the journal it leaves

Dec 30 19:21:48 nixos systemd[1]: Stopping [email protected]...
Dec 30 19:21:48 nixos systemd-nspawn[499]: archlinux login: Trying to halt container. Send SIGTERM again to trigger immed>
Dec 30 19:21:48 nixos systemd-nspawn[499]: systemd 245.5-2-arch running in system mode. (+PAM +AUDIT -SELINUX -IMA -APPAR>
Dec 30 19:21:48 nixos systemd-nspawn[499]: Detected virtualization systemd-nspawn.
Dec 30 19:21:48 nixos systemd-nspawn[499]: Detected architecture x86-64.
Dec 30 19:23:18 nixos systemd[1]: [email protected]: State 'stop-sigterm' timed out. Killing.

@arianvp
Copy link
Member

arianvp commented Dec 31, 2020 via email

@arianvp
Copy link
Member

arianvp commented Dec 31, 2020

Let me rephrase:

I think having systemd.nspawn.<name>.enable imply systemd.services.'systemd-nspawn@<name>'.enable makes sense. We should add that.

the explicit wantedBy is needed because NixOS doesn't handle the upstream [Install] sections in unit files

So we should also add that.

That you needed

 systemd.targets.machines.enable = true;

surprises me though. enable = true is the default for all systemd units.

@hmenke
Copy link
Member

hmenke commented Dec 31, 2020

@arianvp Indeed, I just checked again and systemd.targets.machines.enable = true; is not necessary. However, there is another problem with the systemd integration of NixOS. By specifying

{
  systemd.services."systemd-nspawn@archlinux".enable = true;
}

NixOS does not instantiate the [email protected] unit but instead creates a completely new unit named [email protected] which of course doesn't inherit anything from the uninstantiated unit. So this way of enabling the container is clearly wrong. To see this better, simply compare and contrast the output of systemctl cat [email protected] with systemctl cat [email protected] (or have a look at https://github.com/systemd/systemd/blob/master/units/systemd-nspawn%40.service.in). This is quite problematic because now essential options for the proper working of the container are missing.

I've opened a new issue for that: #108054

@tekeri
Copy link
Contributor Author

tekeri commented Jan 4, 2021

@hmenke I think systemd-nspawn@$machine.service is necessary for integration with machinectl start $machine.
That is currently implemented as a call of systemctl start systemd-nspawn@$machine.service and no way to override the behavior.

@stale
Copy link

stale bot commented Jul 4, 2021

I marked this as stale due to inactivity. → More info

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jul 4, 2021
@wegank wegank marked this pull request as draft March 20, 2024 15:08
@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Mar 20, 2024
@wegank wegank added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Mar 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.status: merge conflict This PR has merge conflicts with the target branch 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10
Projects
None yet
9 participants