Switching from Docker to Podman
Published:
Updated:
Some notes about using Podman instead of Docker, on Linux. This has been tested on Podman v3.4.7.
The unexpected exposure of container ports to the local network caused by Docker networking motivated me to try Podman instead.
Update 2023-08-18: added warning about security impact of exposing the Podman API over TCP.
Update 2024-03-30: added information about Quadlet support.
Motivation
Docker
The standard Docker installation on Linux is implemented as a system-wide daemon running as root, which exposes an HTTP-based API. By default, this API is available on a Unix socket (/var/run/docker.sock) which is only reachable by members of a Unix group (the docker group): all the members of this group can create, launch and stop Docker containers. The Docker CLI tools (e.g. docker, docker-compose) work by talking to the Docker daemon through this API.
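Because the CLI tools are only HTTP clients of this API, the API can also be queried directly. The following is a minimal sketch that assumes curl 7.40 or later, which added `--unix-socket`. `docker_api_get` is a hypothetical helper name. `/version` and `/containers/json` are Docker Engine API endpoints.

```shell
#!/bin/sh
# Hypothetical helper: send a GET request to the Docker Engine API over its Unix socket.
# Usage: docker_api_get [SOCKET] [PATH]
docker_api_get() {
  sock=${1:-/var/run/docker.sock}
  path=${2:-/version}
  if [ ! -S "$sock" ]; then
    echo "docker_api_get: no Unix socket at $sock" >&2
    return 1
  fi
  # With --unix-socket, the host name in the URL is not used for the connection.
  curl --silent --show-error --unix-socket "$sock" "http://localhost$path"
}

# Usage (as a member of the docker group):
#   docker_api_get /var/run/docker.sock /version
#   docker_api_get /var/run/docker.sock /containers/json   # running containers, as JSON
```

Anyone who can write to this socket can also send the requests that create and start containers.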
Docker process running as root:
root 863 0.0 0.2 1532664 16128 ? Ssl 09:02 0:04 /usr/sbin/dockerd -H fd://
Docker socket, reachable from the docker group:
srw-rw---- 1 root docker 0 6 juil. 09:02 /var/run/docker.sock
Docker can be used to access arbitrary paths on the host filesystem or to bind reserved ports on the host. As a consequence, all the users who can talk to the Docker service should be considered as powerful as root.
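To find out which users are in that position on a given machine, one can list the members of the docker group. The following is a small sketch; `docker_group_members` is a hypothetical name. It only lists supplementary members recorded in the group database. It does not list users whose primary group is docker, and root can use the socket anyway.

```shell
#!/bin/sh
# Hypothetical helper: print the supplementary members of a group, one per line.
# getent prints "name:password:GID:user1,user2"; the 4th field holds the members.
docker_group_members() {
  getent group "${1:-docker}" | cut -d: -f4 | tr ',' '\n'
}

# Usage:
#   docker_group_members          # members of the docker group
```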
Example: using Docker for arbitrary access to the host filesystem
Arbitrary access to the host filesystem can for example be achieved by mounting the host filesystem in the container and executing some payload as root in the container:
jdoe@host$ docker run -it --rm -v /:/opt debian
root@guest# touch /opt/aa
root@guest# exit
jdoe@host$ ls -l /aa
-rw-r--r-- 1 root root 0 Jul 6 23:09 /aa
Example: using Docker to listen on an arbitrary port on the host
jdoe@host$ docker run -it --rm -p 80:80 debian
Moreover, by default, all the ports of the container can be reached from the local network, even those that are not exposed 😱.
Podman
Podman is compatible with Docker, but it does not require a daemon and runs as the user instead of root:
- It does not use a daemon. This means that it does not consume any memory when it is not used, which is quite interesting on a development machine. However, a daemon can be run on demand to provide compatibility with tools which rely on the Docker API.
- It does not run as root but as the user. This means that it should not be possible to use it to get root-level access on the host, such as accessing files the user does not have access to or binding reserved ports on the host (see the limitations of rootless Podman).
Moreover, non-exposed network ports should not be reachable from outside of the container (at least with slirp4netns, the networking mode used when Podman is not running as root).
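Whether a rootless container can publish a low port such as 80 depends on the net.ipv4.ip_unprivileged_port_start sysctl, which defaults to 1024. The following is a minimal sketch for checking it. The function name is hypothetical. The fallback to 1024 assumes an older kernel that does not have this sysctl.

```shell
#!/bin/sh
# Hypothetical helper: print the lowest port an unprivileged process may bind.
unprivileged_port_start() {
  f=/proc/sys/net/ipv4/ip_unprivileged_port_start
  if [ -r "$f" ]; then
    cat "$f"
  else
    # Without this sysctl, ports below 1024 are reserved for privileged processes.
    echo 1024
  fi
}

# Usage: publish a high host port instead when port 80 is reserved, e.g.
#   [ "$(unprivileged_port_start)" -le 80 ] || echo "use -p 8080:80 instead of -p 80:80"
```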
Rootless Docker
Docker now has a rootless mode. In this mode, the user launches a Docker daemon (running as the user) which listens on $XDG_RUNTIME_DIR/docker.sock (/run/user/$UID/docker.sock) by default. This should fix most of the security issues related to the root Docker daemon.
However, unlike Podman, this mode is not daemon-less.
Installing and using Podman
On Debian, we can install Podman with:
apt install podman containers-storage crun
The command line interface (CLI) is designed to be compatible with the docker one:
podman run -it --rm docker.io/library/debian
To search from the command line, we may need to specify which registries to use for unqualified image names (as root):
echo 'unqualified-search-registries=["docker.io"]' >> /etc/containers/registries.conf
podman search debian
INDEX | NAME | DESCRIPTION | STARS | OFFICIAL | AUTOMATED |
---|---|---|---|---|---|
docker.io | docker.io/library/ubuntu | Ubuntu is a Debian-based Linux operating sys... | 14541 | [OK] | |
docker.io | docker.io/library/debian | Debian is a Linux distribution that's compos... | 4362 | [OK] | |
docker.io | docker.io/library/neurodebian | NeuroDebian provides neuroscience research s... | 91 | [OK] | |
docker.io | docker.io/bitnami/debian-base-buildpack | Debian base compilation image | 2 | [OK] | |
docker.io | docker.io/mirantis/debian-build-ubuntu-xenial | | 0 | | |
docker.io | docker.io/mirantis/debian-build-ubuntu-trusty | | 0 | | |
docker.io | docker.io/osrf/debian_arm64 | Debian arm64 Base Images | 1 | | |
docker.io | docker.io/rancher/debianconsole | | 1 | | |
... | ... | ... | ... | ... | ... |
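A note on the echo command above: in TOML, a key written after a table header such as [[registry]] belongs to that table. Appending to a registries.conf that already declares registries may therefore not set unqualified-search-registries at the top level. The hypothetical helper below writes the line at the top of the file instead, and does nothing if the key is already present. Run it as root for the system-wide file.

```shell
#!/bin/sh
# Hypothetical helper: set unqualified-search-registries at the top of a registries.conf.
# Usage: set_search_registries [FILE]
set_search_registries() {
  conf=${1:-/etc/containers/registries.conf}
  line='unqualified-search-registries = ["docker.io"]'
  if grep -q '^[[:space:]]*unqualified-search-registries' "$conf" 2>/dev/null; then
    echo "unqualified-search-registries already set in $conf" >&2
    return 0
  fi
  tmp=$(mktemp) || return 1
  {
    printf '%s\n' "$line"
    if [ -f "$conf" ]; then cat "$conf"; fi
  } > "$tmp" && cat "$tmp" > "$conf"   # rewrite in place to keep the file's permissions
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```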
Warning: qualified image names
It is recommended to use fully qualified image names, i.e. “docker.io/library/debian” instead of “debian”. It is probably a good idea to do this when working with Docker as well.
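Docker normalizes short names itself (debian becomes docker.io/library/debian). Podman instead resolves them through its configuration, such as unqualified-search-registries. To illustrate what the fully qualified name looks like, here is a simplified sketch of Docker's rule. `qualify_image` is a hypothetical helper. The real parser (see "How are Docker image names parsed?" in the references) handles more edge cases.

```shell
#!/bin/sh
# Hypothetical helper: expand a short image name the way Docker does (simplified).
# The first path component is treated as a registry host if it contains "." or ":"
# or is "localhost"; otherwise docker.io is assumed, plus library/ for
# single-component names (official images).
qualify_image() {
  name=$1
  case $name in
    */*)
      first=${name%%/*}
      case $first in
        *.*|*:*|localhost) printf '%s\n' "$name" ;;
        *) printf 'docker.io/%s\n' "$name" ;;
      esac
      ;;
    *) printf 'docker.io/library/%s\n' "$name" ;;
  esac
}

# Usage:
#   qualify_image debian                          # docker.io/library/debian
#   qualify_image bitnami/debian-base-buildpack   # docker.io/bitnami/debian-base-buildpack
#   qualify_image quay.io/podman/stable           # quay.io/podman/stable
```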
Using Docker-compose
If you need support for Docker-compose (i.e. the Compose specification), you can use the podman-compose CLI tool instead of Docker-compose:
podman-compose build
Alternatively, you can use the Docker-compose CLI tool with Podman by having Podman serve a Docker-compatible API. The Docker daemon exposes an HTTP-based API (over a Unix socket, /run/docker.sock or ${XDG_RUNTIME_DIR}/docker.sock): this API is used by docker, docker-compose, etc. Podman is daemon-less and does not need an API. However, Podman can expose a Docker-compatible API, which makes it possible to use existing tools such as Docker-compose on top of Podman.
To expose this API, a Podman service must be run as the user:
podman system service --time=0
This creates a Unix socket listening on $XDG_RUNTIME_DIR/podman/podman.sock (usually /run/user/$UID/podman/podman.sock).
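Instead of keeping podman system service running in a terminal, the API can be started on demand through systemd socket activation (see "Podman socket activation" in the references). This assumes your package ships the podman.socket user unit. The small helper below (a hypothetical name) computes the socket path, falling back to /run/user/UID when XDG_RUNTIME_DIR is unset.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the user's Podman API socket.
podman_socket_path() {
  printf '%s/podman/podman.sock\n' "${XDG_RUNTIME_DIR:-/run/user/$(id -u)}"
}

# Usage (assumes the podman.socket user unit is installed):
#   systemctl --user enable --now podman.socket
#   export DOCKER_HOST="unix://$(podman_socket_path)"
#   curl --unix-socket "$(podman_socket_path)" http://localhost/_ping
```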
We can now use Docker tools such as the docker CLI command or docker-compose by asking them to talk to this socket instead of the Docker one:
export "DOCKER_HOST=unix://$XDG_RUNTIME_DIR/podman/podman.sock"
docker image ls # => list images known by Podman
docker-compose build # => build using Podman
docker-compose has a hard-coded call to the docker binary in order to build images. Therefore, if your docker-compose.yml file needs to build Docker images, you will have to create a docker symlink pointing to podman:
# Assumes ~/.bin is in your PATH:
ln -s "$(command -v podman)" ~/.bin/docker
Tip: fixing name resolution problems
If your compose services have problems resolving each other's hostnames, see "Can't resolve hostname of other service in docker-compose.yml".
In my case, this configuration helped (in ~/.config/containers/containers.conf; the netavark backend requires Podman 4.0 or later):
[network]
network_backend = "netavark"
Warning: listening on TCP 💣
Serving the Podman API over a TCP socket instead of a Unix socket may expose Podman to SSRF, CSRF and DNS-rebinding attacks leading to arbitrary code execution on the host system! 😱
Usage with other tools
If you are using a tool which communicates with the (rootless) Docker daemon over a Unix socket but does not support customizing the socket path, you can make it work using a symlink:
ln -s ${XDG_RUNTIME_DIR:-/run/user/$(id -u)}/podman/podman.sock ${XDG_RUNTIME_DIR:-/run/user/$(id -u)}/docker.sock
A simpler solution is to make the Podman service listen on the socket path of the rootless Docker daemon:
podman system service --time=0 unix://${XDG_RUNTIME_DIR:-/run/user/$(id -u)}/docker.sock
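The symlink variant can be wrapped in a small hypothetical helper. The helper refuses to overwrite a real socket (for instance one created by an actual rootless Docker daemon), and it can be re-run safely. /run/user/$UID is usually a tmpfs, so the link typically has to be recreated after a reboot.

```shell
#!/bin/sh
# Hypothetical helper: make DIR/docker.sock a symlink to DIR/podman/podman.sock.
# Usage: link_docker_sock [DIR]   (DIR defaults to the user's runtime directory)
link_docker_sock() {
  dir=${1:-${XDG_RUNTIME_DIR:-/run/user/$(id -u)}}
  target="$dir/podman/podman.sock"
  link="$dir/docker.sock"
  # Do not clobber an existing socket or file that is not a symlink.
  if [ -e "$link" ] && [ ! -L "$link" ]; then
    echo "link_docker_sock: $link exists and is not a symlink" >&2
    return 1
  fi
  # -f replaces a previous symlink, -n avoids following it.
  ln -sfn "$target" "$link"
}
```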
Quadlet
Since v4.4, Podman provides integration with systemd in order to support declarative container-based services (Quadlet). The following Podman unit files are supported:
- container units (.container);
- pod units (.pod);
- kube units (.kube);
- network units (.network);
- volume units (.volume);
- image units (.image).
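As an illustration, a minimal rootless .container unit could be written as follows. The helper name, the unit name hello, the Debian image and the sleep command are all illustrative assumptions. ~/.config/containers/systemd is the per-user directory where Quadlet looks for unit files. Quadlet is then expected to generate a hello.service unit from hello.container.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal Quadlet .container unit.
# Usage: write_quadlet NAME IMAGE [DIR]
write_quadlet() {
  name=$1
  image=$2
  dir=${3:-${XDG_CONFIG_HOME:-$HOME/.config}/containers/systemd}
  mkdir -p "$dir" || return 1
  cat > "$dir/$name.container" <<EOF
[Unit]
Description=$name container managed by Quadlet

[Container]
Image=$image
ContainerName=$name
Exec=sleep infinity

[Install]
WantedBy=default.target
EOF
}

# Usage:
#   write_quadlet hello docker.io/library/debian
#   systemctl --user daemon-reload        # runs the Quadlet generator
#   systemctl --user start hello.service
```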
Pod support
In contrast to Docker, Podman also supports Kubernetes-style pods. Pods are groups of containers designed to be executed together which share the same network stack (i.e. they can communicate with each other using localhost).
Let us run two containers in the same pod:
podman run -it --rm --pod new:foo docker.io/library/debian
podman run -it --rm --pod foo docker.io/library/debian
We can compare the namespaces[1] of the host and of the two containers:
$ ls -l /proc/self/ns # Host
total 0
lrwxrwxrwx 1 john john 0 Oct 13 22:45 cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx 1 john john 0 Oct 13 22:45 ipc -> 'ipc:[4026531839]'
lrwxrwxrwx 1 john john 0 Oct 13 22:45 mnt -> 'mnt:[4026531841]'
lrwxrwxrwx 1 john john 0 Oct 13 22:45 net -> 'net:[4026531840]'
lrwxrwxrwx 1 john john 0 Oct 13 22:45 pid -> 'pid:[4026531836]'
lrwxrwxrwx 1 john john 0 Oct 13 22:45 pid_for_children -> 'pid:[4026531836]'
lrwxrwxrwx 1 john john 0 Oct 13 22:45 time -> 'time:[4026531834]'
lrwxrwxrwx 1 john john 0 Oct 13 22:45 time_for_children -> 'time:[4026531834]'
lrwxrwxrwx 1 john john 0 Oct 13 22:45 user -> 'user:[4026531837]'
lrwxrwxrwx 1 john john 0 Oct 13 22:45 uts -> 'uts:[4026531838]'
$ ls -l /proc/37689/ns # First container
total 0
lrwxrwxrwx 1 john john 0 Oct 13 22:45 cgroup -> 'cgroup:[4026533611]'
lrwxrwxrwx 1 john john 0 Oct 13 22:45 ipc -> 'ipc:[4026533606]'
lrwxrwxrwx 1 john john 0 Oct 13 22:45 mnt -> 'mnt:[4026533609]'
lrwxrwxrwx 1 john john 0 Oct 13 22:45 net -> 'net:[4026533486]'
lrwxrwxrwx 1 john john 0 Oct 13 22:45 pid -> 'pid:[4026533610]'
lrwxrwxrwx 1 john john 0 Oct 13 22:46 pid_for_children -> 'pid:[4026533610]'
lrwxrwxrwx 1 john john 0 Oct 13 22:45 time -> 'time:[4026531834]'
lrwxrwxrwx 1 john john 0 Oct 13 22:46 time_for_children -> 'time:[4026531834]'
lrwxrwxrwx 1 john john 0 Oct 13 22:45 user -> 'user:[4026533488]'
lrwxrwxrwx 1 john john 0 Oct 13 22:45 uts -> 'uts:[4026533605]'
$ ls -l /proc/37779/ns # Second container, same pod
total 0
lrwxrwxrwx 1 john john 0 Oct 13 22:45 cgroup -> 'cgroup:[4026533614]'
lrwxrwxrwx 1 john john 0 Oct 13 22:45 ipc -> 'ipc:[4026533606]'
lrwxrwxrwx 1 john john 0 Oct 13 22:45 mnt -> 'mnt:[4026533612]'
lrwxrwxrwx 1 john john 0 Oct 13 22:45 net -> 'net:[4026533486]'
lrwxrwxrwx 1 john john 0 Oct 13 22:45 pid -> 'pid:[4026533613]'
lrwxrwxrwx 1 john john 0 Oct 13 22:46 pid_for_children -> 'pid:[4026533613]'
lrwxrwxrwx 1 john john 0 Oct 13 22:45 time -> 'time:[4026531834]'
lrwxrwxrwx 1 john john 0 Oct 13 22:46 time_for_children -> 'time:[4026531834]'
lrwxrwxrwx 1 john john 0 Oct 13 22:45 user -> 'user:[4026533488]'
lrwxrwxrwx 1 john john 0 Oct 13 22:45 uts -> 'uts:[4026533605]'
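The PIDs above are the host PIDs of the containers' main processes; podman inspect --format '{{.State.Pid}}' CONTAINER prints this value. The comparison itself can be automated with a small helper (hypothetical name) that reports, for each namespace type, whether two processes share it.

```shell
#!/bin/sh
# Hypothetical helper: compare the namespaces of two processes given by host PID.
# Usage: compare_ns PID1 PID2
compare_ns() {
  for ns in cgroup ipc mnt net pid time user uts; do
    # Skip namespace types the kernel does not expose (e.g. time before Linux 5.6).
    a=$(readlink "/proc/$1/ns/$ns" 2>/dev/null) || continue
    b=$(readlink "/proc/$2/ns/$ns" 2>/dev/null) || continue
    if [ "$a" = "$b" ]; then
      echo "$ns: shared"
    else
      echo "$ns: different"
    fi
  done
}

# Usage (c1 and c2 are hypothetical names given with --name to the two containers):
#   compare_ns "$(podman inspect --format '{{.State.Pid}}' c1)" \
#              "$(podman inspect --format '{{.State.Pid}}' c2)"
```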
We see that (by default):
- containers in the same pod have different mnt (obviously), cgroup and pid namespaces;
- containers in the same pod share the same ipc, net[2], user and uts namespaces;
- the time namespace is shared with the host.
Note: this may change depending on the version. The latest man page of podman pod create (see the --share option) states that the default is to share the ipc, net and uts namespaces (not the user namespace).
In other words, the containers in a pod:
- have the same network stack and can communicate with each other using localhost sockets or abstract Unix sockets;
- do not share any files (by default) and so cannot communicate with each other using path-based (non-abstract) Unix sockets;
- do not see each other's processes.
Example of two containers in a pod communicating through an abstract Unix socket:
podman pod create --name foo
podman run -it --rm --pod foo docker.io/library/debian sh -c \
'apt update && apt -y install socat && socat ABSTRACT-LISTEN:foo STDIO'
podman run -it --rm --pod foo docker.io/library/debian sh -c \
'apt update && apt -y install socat && echo Hello | socat ABSTRACT-CONNECT:foo STDIO'
Conclusion
- The rootless mode can be interesting from a security point of view. It is now available with Docker as well.
- The daemon-less design integrates better with daemon supervisors such as systemd.
- A daemon can optionally be used to provide a Docker-compatible API.
- Quadlet provides declarative containers, pods, Kube workloads, etc. for running Podman workloads as systemd units.
- Builtin support for Pods.
Extra tips
Override certificate configuration
Overriding the recognized CA certificates (for searching/downloading images) can be done by setting either the SSL_CERT_DIR or the SSL_CERT_FILE environment variable:
export SSL_CERT_DIR=~/certs
export SSL_CERT_FILE=~/proxy.pem
This can be useful if you are behind a (corporate) intercepting proxy.
journald logging
Journald logging can be enabled with --log-driver journald. When this is used, the container standard output and error are forwarded to journald (on the host).
Example:
podman run --name TEST --log-driver journald -d --rm docker.io/library/debian sh -c 'echo MESSAGE >&2'
This yields the following entry in journalctl -o json-pretty:
{
"_TRANSPORT" : "journal",
"_AUDIT_LOGINUID" : "1000",
"_UID" : "1000",
"_PID" : "36491",
"_CMDLINE" : "/usr/bin/conmon --api-version 1 -c 123222964b6ac659bb028a7d877f76b903bdc3f4702577106de81a5341ab8ad4 -u 123222964b6ac659bb028a7d877f76b903bdc3f4702577106de81a5341ab8ad4 -r /usr/bin/crun -b /home/johndoe/.local/share/containers/storage/overlay-containers/123222964b6ac659bb028a7d877f76b903bdc3f4702577106de81a5341ab8ad4/userdata -p /run/user/1000/containers/overlay-containers/123222964b6ac659bb028a7d877f76b903bdc3f4702577106de81a5341ab8ad4/userdata/pidfile -n TEST --exit-dir /run/user/1000/libpod/tmp/exits --socket-dir-path /run/user/1000/libpod/tmp/socket -s -l journald --log-level warning --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/run/user/1000/containers/overlay-containers/123222964b6ac659bb028a7d877f76b903bdc3f4702577106de81a5341ab8ad4/userdata/oci-log --conmon-pidfile /run/user/1000/containers/overlay-containers/123222964b6ac659bb028a7d877f76b903bdc3f4702577106de81a5341ab8ad4/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /home/johndoe/.local/share/containers/storage --exit-command-arg --runroot --exit-command-arg /run/user/1000/containers --exit-command-arg --log-level --exit-command-arg warning --exit-command-arg --cgroup-manager --exit-command-arg systemd --exit-command-arg --tmpdir --exit-command-arg /run/user/1000/libpod/tmp --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mount_program=/usr/bin/fuse-overlayfs --exit-command-arg --events-backend --exit-command-arg journald --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg 123222964b6ac659bb028a7d877f76b903bdc3f4702577106de81a5341ab8ad4",
"_SYSTEMD_UNIT" : "user@1000.service",
"CODE_FILE" : "src/ctr_logging.c",
"_SYSTEMD_USER_SLICE" : "user.slice",
"_GID" : "1000",
"PRIORITY" : "3",
"_HOSTNAME" : "marvin",
"_SOURCE_REALTIME_TIMESTAMP" : "1657233303306001",
"MESSAGE" : "MESSAGE\n",
"_CAP_EFFECTIVE" : "1ffffffffff",
"CODE_FUNC" : "write_journald",
"CONTAINER_ID" : "123222964b6a",
"_SYSTEMD_SLICE" : "user-1000.slice",
"_SYSTEMD_CGROUP" : "/user.slice/user-1000.slice/user@1000.service/user.slice/libpod-conmon-123222964b6ac659bb028a7d877f76b903bdc3f4702577106de81a5341ab8ad4.scope",
"__MONOTONIC_TIMESTAMP" : "29765813544",
"CONTAINER_ID_FULL" : "123222964b6ac659bb028a7d877f76b903bdc3f4702577106de81a5341ab8ad4",
"CODE_LINE" : "264",
"_MACHINE_ID" : "5a2d11534a4e4b2c8d021e881079158b",
"_SYSTEMD_USER_UNIT" : "libpod-conmon-123222964b6ac659bb028a7d877f76b903bdc3f4702577106de81a5341ab8ad4.scope",
"_SYSTEMD_OWNER_UID" : "1000",
"_BOOT_ID" : "368a369994444fd081ae83804422cb59",
"_SYSTEMD_INVOCATION_ID" : "d3c7fd17996f4e94a45f454c03e3fed9",
"_COMM" : "conmon",
"SYSLOG_IDENTIFIER" : "conmon",
"_SELINUX_CONTEXT" : "unconfined\n",
"CONTAINER_NAME" : "TEST",
"__REALTIME_TIMESTAMP" : "1657233303306014",
"_EXE" : "/usr/bin/conmon",
"__CURSOR" : "s=15d56d36cf6f451ebba550bad0b0322b;i=21e3a;b=368a369994444fd081ae83804422cb59;m=6ee2e4528;t=5e33eb25a7f1e;x=7242824be9cd0867",
"_AUDIT_SESSION" : "2"
}
Some notes:
- CONTAINER_NAME gives the container name;
- CONTAINER_ID_FULL gives the container ID;
- logging is associated with the conmon program (_EXE, SYSLOG_IDENTIFIER, _CMDLINE, _COMM).
References
- Permissive forwarding rule leads to unintentional exposure of containers to external hosts
- Run the Docker daemon as a non-root user (Rootless mode)
- Podman documentation
- How are Docker image names parsed?
- Referencing Docker Images
- Shortcomings of Rootless Podman
- namespaces(7) — Linux manual page
- Podman and Buildah for Docker users
- Podman socket activation
Linux namespaces are the central feature used by most container implementations. They allow different processes to have separate views of the filesystem (mount points), network stacks, etc. ↩︎
The network namespace includes the network stack (network interfaces, IP addresses, etc.) as well as abstract Unix sockets. Path-based Unix sockets are associated with the filesystem and are thus related to the mnt namespace. ↩︎