Switching from Docker to Podman

Some notes about using Podman instead of Docker, on Linux. This has been tested on Podman v3.4.7.

Docker networking unexpectedly exposes container ports to the local network. This motivated me to try Podman instead.

Update 2023-08-18: added warning about security impact of exposing the Podman API over TCP.

Motivation

Docker

The standard Docker installation on Linux is implemented as a system-wide daemon running as root which exposes an HTTP-based API. By default, this API is available on a Unix socket (/var/run/docker.sock) which is only reachable by members of a Unix group (the docker group): all the members of this group can create, start and stop Docker containers. The Docker CLI tools (e.g. docker, docker-compose) work by talking to the Docker daemon through this API.

Docker process running as root:

root         863  0.0  0.2 1532664 16128 ?       Ssl  09:02   0:04 /usr/sbin/dockerd -H fd://

Docker socket, reachable from the docker group:

srw-rw---- 1 root docker 0  6 juil. 09:02 /var/run/docker.sock

Docker can be used to access arbitrary paths on the host filesystem and to bind reserved ports on the host. As a consequence, every user who can talk to the Docker service should be considered as powerful as root.
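Since membership in the docker group is effectively root access, it can be worth checking who is in it. The following is a minimal sketch, assuming a system where getent is available. The helper name group_members is made up for this example, and users whose primary group is docker would not show up in this list:

```shell
# Print the members of a Unix group, one per line.
# $1: a group database entry in the "name:password:gid:member1,member2" format,
#     as printed by `getent group NAME`.
group_members() {
    # Field 4 holds the comma-separated member list; it may be empty.
    printf '%s\n' "$1" | cut -d: -f4 | tr ',' '\n' | sed '/^$/d'
}

# List the users who can talk to the root Docker daemon.
# Prints nothing if the docker group does not exist.
entry=$(getent group docker || true)
group_members "$entry"
```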

Example: using Docker for arbitrary access to the host filesystem

Arbitrary access to the host filesystem can for example be achieved by mounting the host filesystem in the container and executing some payload as root in the container:

jdoe@host$ docker run -it --rm -v /:/opt debian
root@guest# touch /opt/aa
root@guest# exit
jdoe@host$ ls -l /aa
-rw-r--r-- 1 root root 0  Jul 6 23:09

Example: using Docker to listen on an arbitrary port on the host

jdoe@host$ docker run -it --rm -p 80:80 debian

Moreover, all the ports on the container can be reached from the local network by default even if they are not exposed 😱.

Podman

Podman is compatible with Docker but it does not require a daemon and runs as the user instead of root.

Moreover, non-exposed network ports should not be accessible from outside the container (at least with slirp4netns, the networking model used when Podman is not running as root).

Rootless Docker

Docker now has a rootless mode. In this mode, the user launches a Docker daemon (running as the user) which listens on $XDG_RUNTIME_DIR/docker.sock (/run/user/$UID/docker.sock) by default. This should fix most of the security issues related to the root Docker daemon.

Unlike Podman, however, rootless Docker still relies on a daemon.

Installing and using Podman

On Debian, we can install Podman with:

apt install podman containers-storage crun

The command line interface (CLI) is designed to be compatible with docker:

podman run -it --rm docker.io/library/debian

In order to be able to search from the command line, we might need to specify the registries to use:

echo 'unqualified-search-registries=["docker.io"]' >> /etc/containers/registries.conf
podman search debian
INDEX NAME DESCRIPTION STARS OFFICIAL AUTOMATED
docker.io docker.io/library/ubuntu Ubuntu is a Debian-based Linux operating sys... 14541 [OK]
docker.io docker.io/library/debian Debian is a Linux distribution that's compos... 4362 [OK]
docker.io docker.io/library/neurodebian NeuroDebian provides neuroscience research s... 91 [OK]
docker.io docker.io/bitnami/debian-base-buildpack Debian base compilation image 2 [OK]
docker.io docker.io/mirantis/debian-build-ubuntu-xenial 0
docker.io docker.io/mirantis/debian-build-ubuntu-trusty 0
docker.io docker.io/osrf/debian_arm64 Debian arm64 Base Images 1
docker.io docker.io/rancher/debianconsole 1
... ... ... ... .... ....
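The echo command above modifies the system-wide configuration and therefore needs root. As far as I know from containers-registries.conf(5), a per-user file at $HOME/.config/containers/registries.conf is used instead of /etc/containers/registries.conf when it exists. A minimal sketch under that assumption (the function name is made up for this example):

```shell
# Make docker.io the registry used for unqualified names, for one user only.
# $1: configuration directory (normally "$HOME/.config/containers").
# Assumes the file has no [[registry]] tables yet: a key appended after
# such a table would be parsed as part of that table.
set_user_search_registry() {
    conf="$1/registries.conf"
    mkdir -p "$1"
    # Do nothing if the key is already set: TOML does not allow duplicate keys.
    if [ -f "$conf" ] && grep -q '^unqualified-search-registries' "$conf"; then
        return 0
    fi
    echo 'unqualified-search-registries=["docker.io"]' >> "$conf"
}

# Usage:
#   set_user_search_registry "$HOME/.config/containers"
```

Taking the directory as a parameter keeps the function easy to try out on a scratch directory before touching the real configuration.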

Warning: qualified image name

It is recommended to use fully qualified image names, i.e. “docker.io/library/debian” instead of “debian”. It is probably a good idea to do this when working with Docker as well.
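To illustrate what "fully qualified" means, here is a small sketch which expands short names following the usual Docker Hub convention: a name without a slash lives under docker.io/library, and a first component containing a dot or a colon (or equal to localhost) is treated as a registry host. The function name qualify_image is made up, and this is not how Podman itself resolves short names (it relies on its registries configuration):

```shell
# Expand a short image name following the Docker Hub naming convention.
qualify_image() {
    name="$1"
    case "$name" in
        */*) first="${name%%/*}" ;;   # text before the first slash
        *)   echo "docker.io/library/$name"; return ;;
    esac
    case "$first" in
        *.*|*:*|localhost) echo "$name" ;;           # already starts with a registry host
        *)                 echo "docker.io/$name" ;; # user/repository on Docker Hub
    esac
}

qualify_image debian                 # prints docker.io/library/debian
qualify_image osrf/debian_arm64      # prints docker.io/osrf/debian_arm64
qualify_image quay.io/podman/hello   # printed unchanged
```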

Docker-compose support

Docker tools (such as docker and docker-compose) work by communicating with the Docker daemon using its API. Podman implements the Docker API as well, which makes it possible to use existing tools such as docker-compose on top of Podman.

A user Podman daemon must be run in order to expose this API:

podman system service --time=0

This creates a Unix socket listening on $XDG_RUNTIME_DIR/podman/podman.sock (/run/user/$UID/podman/podman.sock).

We can now use Docker tools such as the docker CLI command or docker-compose by asking them to talk to this socket instead of the Docker one:

export "DOCKER_HOST=unix://$XDG_RUNTIME_DIR/podman/podman.sock"
docker image ls      # => list images known by Podman
docker-compose build # => build using Podman
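When XDG_RUNTIME_DIR is not set (for example in some non-login sessions), the export above produces a wrong path. The following sketch falls back to /run/user/UID in that case and only sets DOCKER_HOST when the socket actually exists. The function name podman_docker_host is made up for this example:

```shell
# Print the DOCKER_HOST value pointing at the rootless Podman socket.
podman_docker_host() {
    # Fall back to /run/user/UID when XDG_RUNTIME_DIR is not set.
    echo "unix://${XDG_RUNTIME_DIR:-/run/user/$(id -u)}/podman/podman.sock"
}

# Only point the Docker tools at Podman if the socket is there.
sock_url=$(podman_docker_host)
if [ -S "${sock_url#unix://}" ]; then
    export DOCKER_HOST="$sock_url"
else
    echo "Podman socket not found; is 'podman system service' running?" >&2
fi
```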

Warning: listening on TCP 💣

Serving the Podman API over a TCP socket instead of a Unix socket may expose Podman to SSRF, CSRF and DNS-rebinding attacks leading to arbitrary code execution on the host system! 😱

Usage with other tools

If you are using a tool which communicates with the (rootless) Docker daemon using a Unix socket but does not support customizing the socket path, you can make it work using a symlink:

ln -s ${XDG_RUNTIME_DIR:-/run/user/$(id -u)}/podman/podman.sock ${XDG_RUNTIME_DIR:-/run/user/$(id -u)}/docker.sock

A simpler solution is to make the podman daemon listen on the path of the rootless Docker daemon:

podman system service --time=0 unix://${XDG_RUNTIME_DIR:-/run/user/$(id -u)}/docker.sock

Pod support

In contrast to Docker, Podman supports Kubernetes-style pods as well. Pods are groups of containers designed to be executed together and which share the same network stack (i.e. they can communicate with each other using localhost).

Let us run two containers in the same pod:

podman run -it --rm --pod new:foo docker.io/library/debian
podman run -it --rm --pod foo docker.io/library/debian

We can compare the namespaces[1] of the host and of the two containers:

$ ls -l /proc/self/ns # Host
total 0
lrwxrwxrwx 1 john john 0 Oct 13  22:45 cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx 1 john john 0 Oct 13  22:45 ipc -> 'ipc:[4026531839]'
lrwxrwxrwx 1 john john 0 Oct 13  22:45 mnt -> 'mnt:[4026531841]'
lrwxrwxrwx 1 john john 0 Oct 13  22:45 net -> 'net:[4026531840]'
lrwxrwxrwx 1 john john 0 Oct 13  22:45 pid -> 'pid:[4026531836]'
lrwxrwxrwx 1 john john 0 Oct 13  22:45 pid_for_children -> 'pid:[4026531836]'
lrwxrwxrwx 1 john john 0 Oct 13  22:45 time -> 'time:[4026531834]'
lrwxrwxrwx 1 john john 0 Oct 13  22:45 time_for_children -> 'time:[4026531834]'
lrwxrwxrwx 1 john john 0 Oct 13  22:45 user -> 'user:[4026531837]'
lrwxrwxrwx 1 john john 0 Oct 13  22:45 uts -> 'uts:[4026531838]'
$ ls -l /proc/37689/ns # First container
total 0
lrwxrwxrwx 1 john john 0 Oct 13  22:45 cgroup -> 'cgroup:[4026533611]'
lrwxrwxrwx 1 john john 0 Oct 13  22:45 ipc -> 'ipc:[4026533606]'
lrwxrwxrwx 1 john john 0 Oct 13  22:45 mnt -> 'mnt:[4026533609]'
lrwxrwxrwx 1 john john 0 Oct 13  22:45 net -> 'net:[4026533486]'
lrwxrwxrwx 1 john john 0 Oct 13  22:45 pid -> 'pid:[4026533610]'
lrwxrwxrwx 1 john john 0 Oct 13  22:46 pid_for_children -> 'pid:[4026533610]'
lrwxrwxrwx 1 john john 0 Oct 13  22:45 time -> 'time:[4026531834]'
lrwxrwxrwx 1 john john 0 Oct 13  22:46 time_for_children -> 'time:[4026531834]'
lrwxrwxrwx 1 john john 0 Oct 13  22:45 user -> 'user:[4026533488]'
lrwxrwxrwx 1 john john 0 Oct 13  22:45 uts -> 'uts:[4026533605]'
$ ls -l /proc/37779/ns # Second container, same pod
total 0
lrwxrwxrwx 1 john john 0 Oct 13  22:45 cgroup -> 'cgroup:[4026533614]'
lrwxrwxrwx 1 john john 0 Oct 13  22:45 ipc -> 'ipc:[4026533606]'
lrwxrwxrwx 1 john john 0 Oct 13  22:45 mnt -> 'mnt:[4026533612]'
lrwxrwxrwx 1 john john 0 Oct 13  22:45 net -> 'net:[4026533486]'
lrwxrwxrwx 1 john john 0 Oct 13  22:45 pid -> 'pid:[4026533613]'
lrwxrwxrwx 1 john john 0 Oct 13  22:46 pid_for_children -> 'pid:[4026533613]'
lrwxrwxrwx 1 john john 0 Oct 13  22:45 time -> 'time:[4026531834]'
lrwxrwxrwx 1 john john 0 Oct 13  22:46 time_for_children -> 'time:[4026531834]'
lrwxrwxrwx 1 john john 0 Oct 13  22:45 user -> 'user:[4026533488]'
lrwxrwxrwx 1 john john 0 Oct 13  22:45 uts -> 'uts:[4026533605]'
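Comparing the link targets by eye is tedious. Two processes are in the same namespace when the corresponding links point to the same target, so a small helper can do the comparison. This is a minimal sketch; the function name shared_namespaces is made up for this example:

```shell
# Print the namespace types that two processes have in common.
# $1, $2: PIDs (reading the namespaces of another user's process may
#         require additional privileges).
shared_namespaces() {
    for link in /proc/"$1"/ns/*; do
        ns=${link##*/}
        a=$(readlink "$link") || continue
        b=$(readlink "/proc/$2/ns/$ns") || continue
        # Same link target (e.g. 'net:[4026533486]') means same namespace.
        [ "$a" = "$b" ] && echo "$ns"
    done
    return 0
}

# A process shares every namespace with itself:
shared_namespaces "$$" "$$"
```

For the two container processes listed above (PIDs 37689 and 37779), this would print ipc, net, time, time_for_children, user and uts.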

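We see that (by default) the two containers of the pod share the same ipc, net, user and uts namespaces, but each has its own cgroup, mnt and pid namespaces. Both use the time namespace of the host.

Note: this may change depending on the version. The latest man page of podman pod create (look at the --share option) claims that the default is to share the ipc, net and uts namespaces (not the user namespace).

In other words, the containers in a pod share the same network stack (including abstract Unix sockets[2]), IPC objects and hostname, but each has its own view of the filesystem and its own process tree.

Example of containers in a pod communicating through an abstract Unix socket: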

podman pod create --name foo
podman run -it --rm --pod foo docker.io/library/debian sh -c \
  'apt update && apt -y install socat && socat ABSTRACT-LISTEN:foo STDIO'
podman run -it --rm --pod foo docker.io/library/debian sh -c \
  'apt update && apt -y install socat && echo Hello | socat ABSTRACT-CONNECT:foo STDIO'

Extra tips

Override certificate configuration

Overriding the recognized CA certificates (for searching/downloading images) can be done by setting either the SSL_CERT_DIR or the SSL_CERT_FILE environment variable:

export SSL_CERT_DIR=~/certs
export SSL_CERT_FILE=~/proxy.pem

This can be useful if you are behind a (corporate) intercepting proxy.

journald logging

Journald logging can be enabled with --log-driver journald. When this is used, the standard output and standard error of the container are forwarded to journald (on the host).

Example:

podman run --name TEST --log-driver journald -d --rm docker.io/library/debian sh -c 'echo MESSAGE >&2'

This yields the following entry in journalctl -o json-pretty:

{
        "_TRANSPORT" : "journal",
        "_AUDIT_LOGINUID" : "1000",
        "_UID" : "1000",
        "_PID" : "36491",
        "_CMDLINE" : "/usr/bin/conmon --api-version 1 -c 123222964b6ac659bb028a7d877f76b903bdc3f4702577106de81a5341ab8ad4 -u 123222964b6ac659bb028a7d877f76b903bdc3f4702577106de81a5341ab8ad4 -r /usr/bin/crun -b /home/johndoe/.local/share/containers/storage/overlay-containers/123222964b6ac659bb028a7d877f76b903bdc3f4702577106de81a5341ab8ad4/userdata -p /run/user/1000/containers/overlay-containers/123222964b6ac659bb028a7d877f76b903bdc3f4702577106de81a5341ab8ad4/userdata/pidfile -n TEST --exit-dir /run/user/1000/libpod/tmp/exits --socket-dir-path /run/user/1000/libpod/tmp/socket -s -l journald --log-level warning --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/run/user/1000/containers/overlay-containers/123222964b6ac659bb028a7d877f76b903bdc3f4702577106de81a5341ab8ad4/userdata/oci-log --conmon-pidfile /run/user/1000/containers/overlay-containers/123222964b6ac659bb028a7d877f76b903bdc3f4702577106de81a5341ab8ad4/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /home/johndoe/.local/share/containers/storage --exit-command-arg --runroot --exit-command-arg /run/user/1000/containers --exit-command-arg --log-level --exit-command-arg warning --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/user/1000/libpod/tmp --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mount_program=/usr/bin/fuse-overlayfs --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg 123222964b6ac659bb028a7d877f76b903bdc3f4702577106de81a5341ab8ad4",
        "_SYSTEMD_UNIT" : "user@1000.service",
        "CODE_FILE" : "src/ctr_logging.c",
        "_SYSTEMD_USER_SLICE" : "user.slice",
        "_GID" : "1000",
        "PRIORITY" : "3",
        "_HOSTNAME" : "marvin",
        "_SOURCE_REALTIME_TIMESTAMP" : "1657233303306001",
        "MESSAGE" : "MESSAGE\n",
        "_CAP_EFFECTIVE" : "1ffffffffff",
        "CODE_FUNC" : "write_journald",
        "CONTAINER_ID" : "123222964b6a",
        "_SYSTEMD_SLICE" : "user-1000.slice",
        "_SYSTEMD_CGROUP" : "/user.slice/user-1000.slice/user@1000.service/user.slice/libpod-conmon-123222964b6ac659bb028a7d877f76b903bdc3f4702577106de81a5341ab8ad4.scope",
        "__MONOTONIC_TIMESTAMP" : "29765813544",
        "CONTAINER_ID_FULL" : "123222964b6ac659bb028a7d877f76b903bdc3f4702577106de81a5341ab8ad4",
        "CODE_LINE" : "264",
        "_MACHINE_ID" : "5a2d11534a4e4b2c8d021e881079158b",
        "_SYSTEMD_USER_UNIT" : "libpod-conmon-123222964b6ac659bb028a7d877f76b903bdc3f4702577106de81a5341ab8ad4.scope",
        "_SYSTEMD_OWNER_UID" : "1000",
        "_BOOT_ID" : "368a369994444fd081ae83804422cb59",
        "_SYSTEMD_INVOCATION_ID" : "d3c7fd17996f4e94a45f454c03e3fed9",
        "_COMM" : "conmon",
        "SYSLOG_IDENTIFIER" : "conmon",
        "_SELINUX_CONTEXT" : "unconfined\n",
        "CONTAINER_NAME" : "TEST",
        "__REALTIME_TIMESTAMP" : "1657233303306014",
        "_EXE" : "/usr/bin/conmon",
        "__CURSOR" : "s=15d56d36cf6f451ebba550bad0b0322b;i=21e3a;b=368a369994444fd081ae83804422cb59;m=6ee2e4528;t=5e33eb25a7f1e;x=7242824be9cd0867",
        "_AUDIT_SESSION" : "2"
}

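Some notes: the message was written to standard error and is logged with PRIORITY 3 (err); the CONTAINER_NAME, CONTAINER_ID and CONTAINER_ID_FULL fields identify the container; the entry is emitted by conmon (see _COMM and SYSLOG_IDENTIFIER), not by the containerized process itself.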

References


  1. Linux namespaces are the central feature used by most container implementations. They allow different processes to have separate virtual filesystems (VFS), network stacks, etc.

  2. The network namespace includes the network stack (network interfaces, IP, etc.) as well as Unix abstract sockets. Path-based Unix sockets are associated with the filesystem and are thus related to the mnt namespace.