This repository has been archived by the owner on Oct 26, 2023. It is now read-only.

Commit

Added unprivileged port mapping support within the host network using proot

Signed-off-by: Max Goltzsche <[email protected]>
mgoltzsche committed Oct 28, 2018
1 parent ad34d7a commit e4d1722
Showing 15 changed files with 233 additions and 129 deletions.
69 changes: 57 additions & 12 deletions README.md
@@ -2,7 +2,7 @@ ctnr [![Build Status](https://travis-ci.org/mgoltzsche/ctnr.svg?branch=master)](
=

ctnr is a CLI built on top of [runc](https://github.com/opencontainers/runc)
to manage and build OCI images as well as containers.
to manage and build OCI images as well as containers on Linux.
ctnr aims to ease system container creation and execution as unprivileged user.
Also ctnr is a tool to experiment with runc features.

@@ -26,10 +26,11 @@ Concerning accessibility, usability and security a rootless container engine has
- **Containers can be run by unprivileged users.**
_Required in restrictive environments and useful for graphical applications._
- **Container images can be built in almost every Linux environment.**
_More flexibility in unprivileged CI/CD builds - nesting unprivileged containers still doesn't work (see experiments below)._
_More flexibility in unprivileged builds - nesting containers is still limited (see [experiments](experiments.md))._
- **A higher degree and more flexible level of security.**
_Less likely for an attacker to gain root access when run as unprivileged user._
_User/group-based container access control._
_Separation of responsibilities._


### Limitations & challenges
@@ -39,8 +40,8 @@ Container execution as unprivileged user is limited:

**Container networks cannot be configured.**
As a result in a restrictive environment without root access only the host network can be used.
As a workaround ports could be mapped to higher free ranges on the host network and back using [PRoot](https://github.com/rootless-containers/PRoot)*.
Alternatively a daemon process could manage networks for unprivileged users.
As a workaround ports can be mapped on the host network using [PRoot](https://github.com/rootless-containers/PRoot)*.
Alternatively a daemon process could manage networks for unprivileged users (TBD).

@AkihiroSuda


mgoltzsche Nov 1, 2018

Author Owner

@AkihiroSuda
I skimmed it but have not tried it yet. I'll definitely look into it.
As far as I understand, assigning a new IP address to a tun/tap device still requires root privileges?!
Also, mapping ports on the host network efficiently still requires root privileges (at least to configure iptables using the portmap CNI plugin)?!
Given these assumptions, and since CNI is a common interface whose default plugins require root privileges, I am thinking about such a daemon process run by root.
However, if it is somehow possible to assign an IP address to a tun/tap interface as an unprivileged user, that would become the default solution (as a CNI plugin) in order to provide high accessibility.
I'll try out slirp4netns...


AkihiroSuda Nov 2, 2018

Contributor

You don't need root because it uses userns.


mgoltzsche Nov 3, 2018

Author Owner

I looked into it and tried it out - awesome!!! It will be integrated - ideally as a CNI plugin!
I see it would even allow port mapping on the host using socat or similar.
That would also be a way to let two netns/pods communicate, although not as efficiently as within a pod.


AkihiroSuda Nov 3, 2018

Contributor

You can create sub-netns and run the unmodified CNI bridge plugin under slirp4netns.

So pods can communicate without extra overhead if you create them under a single slirp4netns.


mgoltzsche Nov 3, 2018

Author Owner

Even better. I'll look into it...


mgoltzsche Nov 7, 2018

Author Owner

I created the slirp-cni-plugin repo and tried to reimplement the slirp4netns functionality in Go using nsenter.

Unfortunately it provides no connectivity yet, although an IP is already assigned. It seems I still need to trigger another kernel feature to get the full user-mode network functionality?! I need to dig into slirp4netns in more detail...

I understand now that the slirp4netns process must keep running to hold the tap device handle, since the device is gone otherwise. Is there a way for the tap device to remain intact after the plugin process has terminated?


AkihiroSuda Nov 8, 2018

Contributor

Your implementation seems to lack slirp (a full TCP/IP stack, not just nsenter), and it needs to keep running.


mgoltzsche Nov 11, 2018

Author Owner

Oops, I see there is more to it. The process actually copies ethernet frames around - another rather important reason why it needs to keep running.
I'll let slirp-cni-plugin use slirp4netns as is...


mgoltzsche Nov 18, 2018

Author Owner

You're right: the bridge also works in a user namespace! Of course I now want to implement this pattern in a more usable way, and I would be glad if you could have a look at my considerations.



**Inside the container a process' or file's user cannot be changed.**
@@ -56,8 +57,18 @@ For more details see Aleksa Sarai's [summary](https://rootlesscontaine.rs/) of t
\* _[PRoot](https://github.com/rootless-containers/PRoot) is a binary that hooks its child processes' kernel-space system calls using `ptrace` to simulate them in the user-space. This is more reliable but slower than hooking libc calls using `LD_PRELOAD` as [fakechroot](https://github.com/dex4er/fakechroot) does it._


## Build
## Installation
Download the binary:
```
wget -O ctnr https://github.com/mgoltzsche/ctnr/releases/download/v0.7.0-alpha/ctnr.linux-amd64 &&
chmod +x ctnr &&
sudo mv ctnr /usr/local/bin/
```
If you need [PRoot](https://github.com/rootless-containers/PRoot) or [CNI plugins](https://github.com/containernetworking/plugins)
you can build them by calling `make proot cni-plugins` within this repository's directory.


## Build
Build the binary `dist/bin/ctnr` as well as `dist/bin/cni-plugins` on a Linux machine with git, make and docker:
```
git clone https://github.com/mgoltzsche/ctnr.git
@@ -141,6 +152,46 @@ $ ctnr run example/cowsay hello from container
```


### Port mapping
ctnr supports port mapping using the `-p, --publish` option.
Unprivileged users can use the `--proot` option in addition.

#### Port mapping as root using a contained CNI network
When a container is run as root in a contained network (`--network default`, default as root)
the [portmap CNI plugin](https://github.com/containernetworking/plugins/tree/master/plugins/meta/portmap)
is used to map ports from a specified IP or the host network to the container.

Map the container network's port 80 to port 8080 on the host:
```
$ sudo ctnr run -p 8080:80 docker://alpine:3.8 nc -l -p 80 -e echo hello from container
```
Connectivity test on the host on another shell:
```
$ nc 127.0.0.1 8080
hello from container
```

#### Port mapping as unprivileged user using proot
Unprivileged users can enable the `--proot` option to map ports
within the host network namespace on a syscall level.

Map `bind`/`connect` syscalls with port 80 to port 8080:
```
$ ctnr run --proot -p 8080:80 docker://alpine:3.8 nc -l -p 80 -e echo hello from container
```
You can now also run another container using the same port as long as you don't
map it on the same host port (proot maps it to a random free port and back within the container):
```
$ ctnr run --proot docker://alpine:3.8 /bin/sh -c 'nc -l -p 80 -e echo hello & sleep 1; timeout -t 1 nc 127.0.0.1 80'
hello
```
Connectivity test on the host on another shell:
```
$ nc 127.0.0.1 8080
hello from container
```


## OCI specs and this implementation

An *[OCI image](https://github.com/opencontainers/image-spec/tree/v1.0.0)* provides a base [configuration](https://github.com/opencontainers/image-spec/blob/v1.0.0/config.md) and file system to create an OCI bundle from. The file system consists of a list of layers represented by tar files each containing the diff to its predecessor.
@@ -183,19 +234,13 @@ to either use an external runc binary or use libcontainer (no runtime dependenci
- system.Context aware processes, unpacking/packing images
- improved multi-user support (store per user group, file permissions, lock location)
- CLI integration tests
- rootless networking (using proot port mapping or a network daemon run by root)
- advanced rootless networking (using a network daemon run by root)
- separate OCI CNI network hook binary
- support starting a rootless container with a user other than 0 (using proot)
- health check
- improved Docker Compose support
- service discovery integration (hook / DNS; consul, etcd)
- detached mode
- systemd integration (cgroup, startup notification)
- **1.0 release**
- advanced logging
- support additional read-only image stores


## Experiments

[Experiments with nested containers](experiments.md)
16 changes: 8 additions & 8 deletions bundle/bundlebuilder.go → bundle/builder/bundlebuilder.go
@@ -1,24 +1,24 @@
package bundle
package builder

import (
"os"
"path/filepath"

"github.com/cyphar/filepath-securejoin"
"github.com/mgoltzsche/ctnr/pkg/generate"
"github.com/mgoltzsche/ctnr/bundle"
"github.com/openSUSE/umoci/pkg/fseval"
"github.com/pkg/errors"
)

type BundleBuilder struct {
id string
*generate.SpecBuilder
image BundleImage
*SpecBuilder
image bundle.BundleImage
managedFiles map[string]bool
}

func Builder(id string) *BundleBuilder {
specgen := generate.NewSpecBuilder()
specgen := NewSpecBuilder()
specgen.SetRootPath("rootfs")
b := &BundleBuilder{"", &specgen, nil, map[string]bool{}}
b.SetID(id)
@@ -31,10 +31,10 @@ func (b *BundleBuilder) SetID(id string) {
}
b.id = id
b.SetHostname(id)
b.AddAnnotation(ANNOTATION_BUNDLE_ID, id)
b.AddAnnotation(bundle.ANNOTATION_BUNDLE_ID, id)
}

func (b *BundleBuilder) SetImage(image BundleImage) {
func (b *BundleBuilder) SetImage(image bundle.BundleImage) {
b.ApplyImage(image.Config())
b.image = image
}
@@ -48,7 +48,7 @@ func (b *BundleBuilder) AddBindMountConfig(path string) {
b.AddBindMount(filepath.Join("mount", path), path, opts)
}

func (b *BundleBuilder) Build(bundle *LockedBundle) (err error) {
func (b *BundleBuilder) Build(bundle *bundle.LockedBundle) (err error) {
// Prepare rootfs
if err = bundle.UpdateRootfs(b.image); err != nil {
return errors.Wrap(err, "build bundle")
3 changes: 1 addition & 2 deletions pkg/generate/hook.go → bundle/builder/hook.go
@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.

package generate
package builder

import (
"encoding/json"
@@ -178,7 +178,6 @@ func (b *HookBuilder) Build(spec *generate.Generator) (err error) {
}

// Add hook args metadata as annotation to parse it when it should be modified
// TODO: better parse hook args directly by using same code the hook uses
j, err := json.Marshal(b.hook)
if err != nil {
return
81 changes: 63 additions & 18 deletions pkg/generate/generate.go → bundle/builder/specbuilder.go
@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.

package generate
package builder

import (
"os"
@@ -33,11 +33,17 @@ import (

type SpecBuilder struct {
generate.Generator
entrypoint []string
cmd []string
user idutils.User
prootPath string
rootless bool
entrypoint []string
cmd []string
user idutils.User
customSeccomp bool
proot *prootOptions
rootless bool
}

type prootOptions struct {
Path string
Ports []string
}

func NewSpecBuilder() SpecBuilder {
@@ -110,6 +116,7 @@ func (b *SpecBuilder) SetLinuxSeccompUnconfined() {
profile.DefaultAction = rspecs.ActAllow
profile.Syscalls = nil
spec.Linux.Seccomp = profile
b.customSeccomp = true
}

func (b *SpecBuilder) SetLinuxSeccomp(profile *rspecs.LinuxSeccomp) {
@@ -118,6 +125,7 @@ func (b *SpecBuilder) SetLinuxSeccomp(profile *rspecs.LinuxSeccomp) {
spec.Linux = &rspecs.Linux{}
}
spec.Linux.Seccomp = profile
b.customSeccomp = true
}

func (b *SpecBuilder) AddExposedPorts(ports []string) {
@@ -149,26 +157,32 @@ func (b *SpecBuilder) AddExposedPorts(ports []string) {
}

func (b *SpecBuilder) SetPRootPath(prootPath string) {
b.prootPath = prootPath
if b.proot == nil {
b.proot = &prootOptions{}
}
b.proot.Path = prootPath
// This has been derived from https://github.com/AkihiroSuda/runrootless/blob/b9a7df0120a7fee15c0223fd0fbc8c3885edd9b3/bundle/spec.go
b.AddTmpfsMount("/dev/proot", []string{"exec", "mode=755", "size=32256k"})
b.AddBindMount(prootPath, "/dev/proot/proot", []string{"bind", "ro"})
b.AddProcessEnv("PROOT_TMP_DIR", "/dev/proot")
b.AddProcessEnv("PROOT_NO_SECCOMP", "1")
b.AddProcessCapability("CAP_" + capability.CAP_SYS_PTRACE.String())
b.applyEntrypoint()
b.SetLinuxSeccompDefault()
}

func (b *SpecBuilder) AddPRootPortMapping(published, target string) {
if b.proot == nil {
b.proot = &prootOptions{}
}
b.proot.Ports = append(b.proot.Ports, published+":"+target)
}

func (b *SpecBuilder) SetProcessEntrypoint(v []string) {
b.entrypoint = v
b.cmd = nil
b.applyEntrypoint()
}

func (b *SpecBuilder) SetProcessCmd(v []string) {
b.cmd = v
b.applyEntrypoint()
}

func (b *SpecBuilder) applyEntrypoint() {
@@ -184,8 +198,18 @@ func (b *SpecBuilder) applyEntrypoint() {
} else {
args = []string{}
}
if b.prootPath != "" {
args = append([]string{"/dev/proot/proot", "-0"}, args...)
if b.proot != nil {
prootArgs := []string{"/dev/proot/proot", "--kill-on-exit", "-n"}
user := b.user.String()
if user == "0:0" {
prootArgs = append(prootArgs, "-0")
} else {
prootArgs = append(prootArgs, "-i", b.user.String())
}
for _, port := range b.proot.Ports {
prootArgs = append(prootArgs, "-p", port)
}
args = append(prootArgs, args...)
}
b.SetProcessArgs(args)
}
@@ -248,30 +272,51 @@ func (b *SpecBuilder) ApplyImage(img *ispecs.Image) {

// Returns the generated spec with resolved user/group names
func (b *SpecBuilder) Spec(rootfs string) (spec *rspecs.Spec, err error) {
// Resolve user name
usr, err := b.user.Resolve(rootfs)
if err != nil {
return
}
if b.rootless && (usr.Uid != 0 || usr.Gid != 0) {
return nil, errors.Errorf("rootless containers support UID/GID 0 only but %q provided", b.user.String())
}
b.user = usr.User()
if usr.Uid > 1<<32 {
return nil, errors.Errorf("uid %d exceeds range", usr.Uid)
}
if usr.Gid > 1<<32 {
return nil, errors.Errorf("gid %d exceeds range", usr.Gid)
}

// Check uid/gid constraints and proot support
if b.proot != nil {
if b.proot.Path == "" {
return nil, errors.New("proot user or port mappings specified but no proot path provided")
}
usr = idutils.UserIds{} // use 0 in native mapping
} else if b.rootless && (usr.Uid != 0 || usr.Gid != 0) {
return nil, errors.Errorf("rootless container: only user 0:0 supported but %s provided. hint: enable proot as a workaround", b.user.String())
}

// Apply entrypoint/command (using proot)
b.applyEntrypoint()

// Apply process uid/gid
b.SetProcessUID(uint32(usr.Uid))
b.SetProcessGID(uint32(usr.Gid))
// TODO: set additional gids
sp := b.Generator.Spec()

// Apply native process uid/gid mapping
if b.rootless {
b.ClearLinuxUIDMappings()
b.ClearLinuxGIDMappings()
b.AddLinuxUIDMapping(uint32(os.Geteuid()), uint32(usr.Uid), 1)
b.AddLinuxGIDMapping(uint32(os.Getegid()), uint32(usr.Gid), 1)
}
return sp, nil

// Generate default seccomp profile
if !b.customSeccomp {
b.SetLinuxSeccompDefault()
}

return b.Generator.Spec(), nil
}

func containsNamespace(ns rspecs.LinuxNamespaceType, l []rspecs.LinuxNamespace) bool {
