TUN/TAP interface (on Linux)

Some notes about using the TUN/TAP interface, especially on Linux.

Introduction

TUN/TAP is an operating-system interface for creating network interfaces managed by userspace. This is usually used to implement userspace Virtual Private Networks[1] (VPNs), for example with OpenVPN, OpenSSH (Tunnel configuration or -w argument), l2tpns, etc. This interface is exposed through the /dev/net/tun device file and related ioctls (TUNSETIFF, etc.).

+--------------------------------------------+
| Processes                                  |
+--------------------------------------------+
  ↕ Socket interface
+---------------------------------------------+
| Network Stack (kernel)                      |<--+
+---------------------------------------------+   |
  ↕ Eth. frame     ↕ Eth. frame    ↕ IP packet    |
+--------------+ +-------------+ +------------+   |
| enp2s0       | | tap0        | | tun0       |   |
+--------------+ +-------------+ +------------+   |
  ↑ Eth. frame     ↕ Eth. frame¹   ↕ IP packet¹   |
+--------------+ +-------------+ +------------+   |
| Driver       | | Process     | | Process    |   |
+--------------+ +-------------+ +------------+   |
  ↕ Eth. frame²    ↑               ↑              |
+--------------+   +---------------+--------------+
| Eth. Adapter  |  (encapsulated packets)
+--------------+
  ↕ Eth. frame
+--------------+
| Eth. Network |
+--------------+

Columns, from left to right: physical netdev, Ethernet VPN (TAP), IP VPN (TUN).

¹: via /dev/net/tun ²: over PCI Express for example

TUN/TAP interfaces vs. normal network interfaces

Basic Usage

This C code snippet creates a virtual network interface tun0 controlled by the program:

int fd = open("/dev/net/tun", O_RDWR);
struct ifreq ifr;
memset(&ifr, 0, sizeof(ifr));
ifr.ifr_flags = IFF_TUN | IFF_NO_PI;
strncpy(ifr.ifr_name, "tun0", IFNAMSIZ);
ioctl(fd, TUNSETIFF, &ifr);

Note: I've omitted error handling for brevity in the code snippets. In real programs, you should check the result of each operation.

The program first opens the /dev/net/tun file: each file descriptor obtained this way can be used by the process to communicate with one virtual network device.

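Then, we use the TUNSETIFF ioctl() to configure the network interface associated with this file descriptor: ifr_flags selects the type of interface (IFF_TUN here) together with options such as IFF_NO_PI, and ifr_name gives the requested interface name.

After this, the program can communicate with the virtual network interface using the file descriptor: read() returns a packet that the network stack sent to the interface, and write() hands a packet to the network stack as if it had been received on the interface.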

When the program closes the last file descriptor associated with the TUN/TAP interface, the system destroys the interface. We can prevent this by making the interface persistent with the TUNSETPERSIST ioctl().

TUN vs. TAP

There are two types of virtual network interfaces managed by /dev/net/tun:

TUN interfaces (L3)

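TUN interfaces (IFF_TUN) transport layer 3 (L3) Protocol Data Units (PDUs), in practice IPv4 and IPv6 packets. Each read() returns one IP packet and each write() must supply one. No layer 2 header (Ethernet, etc.) is involved, which is why these interfaces are POINTOPOINT interfaces[2].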

If we have set IFF_NO_PI in the ifr_flags field, the version of the IP protocol (IPv4 or IPv6) is deduced from the IP version number in the packet. If IFF_NO_PI is not set, 4 bytes (struct tun_pi) are prepended to each packet (see below for details).

TAP interfaces (L2)

TAP interfaces (IFF_TAP) transport layer 2 (L2) PDUs, by default Ethernet frames. Each read() returns one frame and each write() must supply one.

By default, the TAP interface is an Ethernet interface but we can change the type of device with the TUNSETLINK ioctl(). For example, we can request a virtual ATM network interface with:

ioctl(fd, TUNSETLINK, ARPHRD_ATM);

which gives:

5: tap0:  mtu 1500 qdisc noop state DOWN group default qlen 500
    link/atm 12:07:d9:23:19:7d brd ff:ff:ff:ff:ff:ff

I am not sure this is really useful, however, as it seems to only really work with Ethernet interfaces.

We can set the hardware address (for Ethernet interfaces, the MAC address) with SIOCSIFHWADDR (see man netdevice):

struct ifreq ifr;
memset(&ifr, 0, sizeof(ifr));
ifr.ifr_hwaddr.sa_family = ARPHRD_ETHER;
// We need a placeholder socket for this:
int sock = socket(PF_INET, SOCK_STREAM, 0);
// iface_name must fit in IFNAMSIZ - 1 bytes (the struct was zeroed above):
strncpy(ifr.ifr_name, iface_name, IFNAMSIZ - 1);
memcpy(ifr.ifr_hwaddr.sa_data, hwaddr, hwaddr_size);
ioctl(sock, SIOCSIFHWADDR, &ifr);
close(sock);

Advanced considerations

Packet Information

If IFF_NO_PI is not used, each packet read from or written to the file descriptor is prefixed with 4 bytes:

struct tun_pi {
  __u16  flags;
  __be16 proto;
};

The proto field is an EtherType. This is not very useful for TAP interfaces as far as I know. For TUN interfaces, this is usually ETH_P_IP or ETH_P_IPV6. If IFF_NO_PI is used, the IP version of each packet sent by the process is derived from the first byte of the packet (for TUN interfaces).

Currently, the only flag defined is TUN_PKT_STRIP which is set by the kernel to signal the userspace program that the packet was truncated because the buffer was too small.
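As a rough sketch of how a program might handle this prefix when IFF_NO_PI is not set: the two helpers below (pi_for_ip_packet and split_pi, both hypothetical names) build the struct tun_pi for an outgoing IP packet and separate the prefix from the payload of an incoming buffer. This assumes a TUN interface carrying only IPv4 and IPv6.

```c
#include <stddef.h>
#include <string.h>
#include <arpa/inet.h>       // htons, ntohs
#include <linux/if_ether.h>  // ETH_P_IP, ETH_P_IPV6
#include <linux/if_tun.h>    // struct tun_pi, TUN_PKT_STRIP

// Fill `pi` for an IP packet about to be written to a TUN file descriptor
// opened without IFF_NO_PI. Returns 0 on success, -1 for a non-IP packet.
int pi_for_ip_packet(const unsigned char *packet, size_t len, struct tun_pi *pi)
{
  if (len < 1)
    return -1;
  memset(pi, 0, sizeof(*pi));
  // The IP version is the high nibble of the first byte.
  switch (packet[0] >> 4) {
  case 4:
    pi->proto = htons(ETH_P_IP);
    return 0;
  case 6:
    pi->proto = htons(ETH_P_IPV6);
    return 0;
  default:
    return -1;
  }
}

// Split a buffer returned by read() into its tun_pi prefix and its payload.
// Returns a pointer to the payload, or NULL if the buffer is too short.
const unsigned char *split_pi(const unsigned char *buffer, size_t len,
                              struct tun_pi *pi, size_t *payload_len)
{
  if (len < sizeof(*pi))
    return NULL;
  memcpy(pi, buffer, sizeof(*pi));  // copy instead of casting the pointer
  *payload_len = len - sizeof(*pi);
  return buffer + sizeof(*pi);
}
```

After split_pi(), a caller could test pi.flags & TUN_PKT_STRIP to detect a truncated packet and compare ntohs(pi.proto) with ETH_P_IP or ETH_P_IPV6 to choose a parser.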

Persistent

The TUNSETPERSIST ioctl can be used to make the TUN/TAP interface persistent. In this mode, the interface won't be destroyed when the last process closes the associated /dev/net/tun file descriptor.

ioctl(tap_fd, TUNSETPERSIST, 1);

There is a race condition when setting TUNSETPERSIST. Instead, the IFF_TUN_EXCL flag can be used to ensure we have a new interface: if this flag is used and the interface already exists, we get an EBUSY error.

It is possible to create a persistent TUN/TAP interface using the ip tuntap command:

sudo ip tuntap add dev foo0 mode tun

User and group

It is possible to assign a persistent interface to a given user in order to give a non-root user access to a TUN/TAP interface:

ioctl(tap_fd, TUNSETPERSIST, 1);
ioctl(tap_fd, TUNSETOWNER, owner);

Or to a whole group:

ioctl(tap_fd, TUNSETPERSIST, 1);
ioctl(tap_fd, TUNSETGROUP, group);

We can use ip tuntap for this as well:

sudo ip tuntap add dev foo0 mode tun user john
sudo ip tuntap add dev foo1 mode tun group doe

Examples

TUN Reader

This simple program creates a TUN interface. Every packet sent to this interface is printed to the standard error (stderr) after parsing some fields:

Includes
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <stdio.h>
#include <netdb.h>

#include <netinet/in.h> // IPPROTO_*
#include <net/if.h> // ifreq
#include <linux/if_tun.h> // IFF_TUN, IFF_NO_PI

#include <sys/ioctl.h>
Utilities
#define BUFFLEN (4 * 1024)

const char HEX[] = {
  '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
  'a', 'b', 'c', 'd', 'e', 'f',
};

void hex(char* source, char* dest, ssize_t count)
{
  for (ssize_t i = 0; i < count; ++i) {
    unsigned char data = source[i];
    dest[2 * i] = HEX[data >> 4];
    dest[2 * i + 1] = HEX[data & 15];
  }
  dest[2 * count] = '\0';
}
Dumping packets
int has_ports(int protocol)
{
  switch(protocol) {
  case IPPROTO_UDP:
  case IPPROTO_UDPLITE:
  case IPPROTO_TCP:
    return 1;
  default:
    return 0;
  }
}

void dump_ports(int protocol, int count, const char* buffer)
{
  if (!has_ports(protocol))
    return;
  if (count < 4)
    return;
  uint16_t source_port;
  uint16_t dest_port;
  // Ports are stored in network byte order:
  memcpy(&source_port, buffer, 2);
  source_port = ntohs(source_port);
  memcpy(&dest_port, buffer + 2, 2);
  dest_port = ntohs(dest_port);
  fprintf(stderr, " sport=%u, dport=%u\n", (unsigned) source_port, (unsigned) dest_port);
}

void dump_packet_ipv4(int count, char* buffer)
{
  if (count < 20) {
    fprintf(stderr, "IPv4 packet too short\n");
    return;
  }

  char buffer2[2*BUFFLEN + 1];
  hex(buffer, buffer2, count);

  int protocol = (unsigned char) buffer[9];
  struct protoent* protocol_entry = getprotobynumber(protocol);

  unsigned ttl = (unsigned char) buffer[8];

  fprintf(stderr, "IPv4: src=%u.%u.%u.%u dst=%u.%u.%u.%u proto=%u(%s) ttl=%u\n",
    (unsigned char) buffer[12], (unsigned char) buffer[13], (unsigned char) buffer[14], (unsigned char) buffer[15],
    (unsigned char) buffer[16], (unsigned char) buffer[17], (unsigned char) buffer[18], (unsigned char) buffer[19],
    (unsigned) protocol,
    protocol_entry == NULL ? "?" : protocol_entry->p_name, ttl
    );
  dump_ports(protocol, count - 20, buffer + 20);
  fprintf(stderr, " HEX: %s\n", buffer2);
}

void dump_packet_ipv6(int count, char* buffer)
{
  if (count < 40) {
    fprintf(stderr, "IPv6 packet too short\n");
    return;
  }

  char buffer2[2*BUFFLEN + 1];
  hex(buffer, buffer2, count);

  int protocol = (unsigned char) buffer[6];
  struct protoent* protocol_entry = getprotobynumber(protocol);

  char source_address[33];
  char destination_address[33];

  hex(buffer + 8, source_address, 16);
  hex(buffer + 24, destination_address, 16);

  int hop_limit = (unsigned char) buffer[7];

  fprintf(stderr, "IPv6: src=%s dst=%s proto=%u(%s) hop_limit=%i\n",
    source_address, destination_address,
    (unsigned) protocol,
    protocol_entry == NULL ? "?" : protocol_entry->p_name,
    hop_limit);
  dump_ports(protocol, count - 40, buffer + 40);
  fprintf(stderr, " HEX: %s\n", buffer2);
}

void dump_packet(int count, char* buffer)
{
  unsigned char version = ((unsigned char) buffer[0]) >> 4;
  if (version == 4) {
    dump_packet_ipv4(count, buffer);
  } else if (version == 6) {
    dump_packet_ipv6(count, buffer); 
  } else {
    fprintf(stderr, "Unknown packet version\n");
  }
}
int main(int argc, char** argv)
{
  if (argc != 2)
    return 1;
  const char* device_name = argv[1];
  if (strlen(device_name) + 1 > IFNAMSIZ)
    return 1;

  // Request a TUN device:
  int fd = open("/dev/net/tun", O_RDWR);
  if (fd == -1)
    return 1;
  struct ifreq ifr;
  memset(&ifr, 0, sizeof(ifr));
  ifr.ifr_flags = IFF_TUN | IFF_NO_PI;
  strncpy(ifr.ifr_name, device_name, IFNAMSIZ);
  int res = ioctl(fd, TUNSETIFF, &ifr);
  if (res == -1)
    return 1;

  char buffer[BUFFLEN];
  while (1) {
    // Read an IP packet:
    ssize_t count = read(fd, buffer, BUFFLEN);
    if (count < 0)
      return 1;
    dump_packet(count, buffer);
  }

  return 0;
}

Notice how read() on the file descriptor is used by the userspace program to get a packet sent by the network stack to the network interface. If we wanted packets to come out of the interface, we would have to write() them to the file descriptor.
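To illustrate the write() direction, here is a hedged sketch which turns an IPv4 ICMP echo request into the matching echo reply, in place. The function names (inet_checksum, make_echo_reply) are made up for this example, and the code assumes an unfragmented packet whose length matches the number of bytes returned by read().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Internet checksum (RFC 1071) over `len` bytes.
uint16_t inet_checksum(const unsigned char *data, size_t len)
{
  uint32_t sum = 0;
  for (size_t i = 0; i + 1 < len; i += 2)
    sum += (uint32_t) ((data[i] << 8) | data[i + 1]);
  if (len % 2 == 1)
    sum += (uint32_t) (data[len - 1] << 8);
  while (sum >> 16)
    sum = (sum & 0xffff) + (sum >> 16);
  return (uint16_t) ~sum;
}

// Turn an IPv4 ICMP echo request into an echo reply, in place.
// Returns 0 on success, -1 if the packet is not an echo request.
int make_echo_reply(unsigned char *packet, size_t len)
{
  if (len < 20 || (packet[0] >> 4) != 4)
    return -1;
  size_t ihl = (size_t) (packet[0] & 0x0f) * 4;
  if (ihl < 20 || len < ihl + 8 || packet[9] != 1 /* ICMP */)
    return -1;
  unsigned char *icmp = packet + ihl;
  if (icmp[0] != 8 /* echo request */)
    return -1;

  // Swap the source and destination addresses. The IPv4 header checksum is
  // a sum of 16-bit words, so swapping two aligned fields keeps it valid.
  unsigned char tmp[4];
  memcpy(tmp, packet + 12, 4);
  memcpy(packet + 12, packet + 16, 4);
  memcpy(packet + 16, tmp, 4);

  // An echo reply has type 0; recompute the ICMP checksum afterwards.
  icmp[0] = 0;
  icmp[2] = 0;
  icmp[3] = 0;
  uint16_t sum = inet_checksum(icmp, len - ihl);
  icmp[2] = (unsigned char) (sum >> 8);
  icmp[3] = (unsigned char) (sum & 0xff);
  return 0;
}
```

In the read loop of the program above, calling make_echo_reply() on each packet and passing those for which it returns 0 to write(fd, buffer, count) should be enough for a ping on the link to receive answers.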

The program needs to be run as root:

sudo ./tun_reader tun0

The interface is initially down and without an IP address:

8: tun0: <POINTOPOINT,MULTICAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 500
    link/none

Now we need to configure an IP address and bring the interface up before we can use it:

sudo ip addr add 203.0.113.1/24 dev tun0 &&
sudo ip link set up dev tun0
8: tun0:  mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 500
    link/none 
    inet 203.0.113.1/24 scope global tun0
       valid_lft forever preferred_lft forever
    inet6 fe80::807f:6e64:ba6e:ee26/64 scope link stable-privacy 
       valid_lft forever preferred_lft forever

Alternatively, we could have configured the interface from the program:

We get this kind of output:

IPv6: src=fe80000000000000807f6e64ba6eee26 dst=ff020000000000000000000000000002 proto=58(ipv6-icmp) hop_limit=255
 HEX: 6000000000083afffe80000000000000807f6e64ba6eee26ff0200000000000000000000000000028500e5bd00000000
IPv6: src=fe80000000000000807f6e64ba6eee26 dst=ff020000000000000000000000000016 proto=0(ip) hop_limit=1
 HEX: 6000000000240001fe80000000000000807f6e64ba6eee26ff0200000000000000000000000000163a000502000001008f00d88d0000000104000000ff020000000000000000000000010003
IPv4: src=203.0.113.1 dst=224.0.0.22 proto=2(igmp) ttl=1
 HEX: 46c00028000040000102c7f7cb007101e0000016940400002200f9010000000104000000e00000fc
IPv4: src=203.0.113.1 dst=224.0.0.252 proto=17(udp) ttl=255
 sport=5355, dport=5355
 HEX: 45000034e8e40000ff11b5d5cb007101e00000fc14eb14eb00205ec50cca00000001000000000000066d617276696e0000ff0001
IPv6: src=fe80000000000000807f6e64ba6eee26 dst=ff020000000000000000000000010003 proto=17(udp) hop_limit=255
 sport=5355, dport=5355
 HEX: 600516ee001f11fffe80000000000000807f6e64ba6eee26ff02000000000000000000000001000314eb14eb001f75abedf000000001000000000000056c6f63616c0000060001
IPv6: src=fe80000000000000807f6e64ba6eee26 dst=ff020000000000000000000000010003 proto=17(udp) hop_limit=255
 sport=5355, dport=5355
 HEX: 600516ee002011fffe80000000000000807f6e64ba6eee26ff02000000000000000000000001000314eb14eb00204b0fa87d00000001000000000000066d617276696e0000ff0001
IPv4: src=203.0.113.1 dst=224.0.0.252 proto=17(udp) ttl=255
 sport=5355, dport=5355
 HEX: 45000033e8e50000ff11b5d5cb007101e00000fc14eb14eb001f1237c96700000001000000000000056c6f63616c0000060001

We easily recognize from the hex dump that these are IP packets: they start with the IP version (6 or 4) in the first hexadecimal digit.

Dissecting with ScaPy

For more details, we can dissect IP packets from Python using ScaPy:

from scapy.all import *
IP(bytes.fromhex("45000033e8e50000ff11b5d5cb007101e00000fc14eb14eb001f1237c96700000001000000000000056c6f63616c0000060001")).display()
###[ IP ]### 
  version   = 4
  ihl       = 5
  tos       = 0x0
  len       = 51
  id        = 59621
  flags     = 
  frag      = 0
  ttl       = 255
  proto     = udp
  chksum    = 0xb5d5
  src       = 203.0.113.1
  dst       = 224.0.0.252
  \options   \
###[ UDP ]### 
     sport     = hostmon
     dport     = hostmon
     len       = 31
     chksum    = 0x1237
###[ Link Local Multicast Node Resolution - Query ]### 
        id        = 51559
        qr        = 0
        opcode    = QUERY
        c         = 0
        tc        = 0
        z         = 0
        rcode     = ok
        qdcount   = 1
        ancount   = 0
        nscount   = 0
        arcount   = 0
        \qd        \
         |###[ DNS Question Record ]### 
         |  qname     = 'local.'
         |  qtype     = SOA
         |  qclass    = IN
        an        = None
        ns        = None
        ar        = None

Now we can try to ping an address on the link:

ping 203.0.113.2

Which gives the packets:

IPv4: src=203.0.113.1 dst=203.0.113.2 proto=icmp ttl=64
 HEX: 45000054ceab40004001f3f8cb007101cb00710208008d0b23420001f6ab9e5e00000000edd3060000000000101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637
IPv4: src=203.0.113.1 dst=203.0.113.2 proto=icmp ttl=64
 HEX: 45000054cf8340004001f320cb007101cb00710208007f9823420002f7ab9e5e00000000f945070000000000101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f3031323334353637
Dissecting with ScaPy

The first packet is dissected by ScaPy as:

###[ IP ]### 
  version   = 4
  ihl       = 5
  tos       = 0x0
  len       = 84
  id        = 52907
  flags     = DF
  frag      = 0
  ttl       = 64
  proto     = icmp
  chksum    = 0xf3f8
  src       = 203.0.113.1
  dst       = 203.0.113.2
  \options   \
###[ ICMP ]### 
     type      = echo-request
     code      = 0
     chksum    = 0x8d0b
     id        = 0x2342
     seq       = 0x1
###[ Raw ]### 
        load      = '\xf6\xab\x9e^\x00\x00\x00\x00\xed\xd3\x06\x00\x00\x00\x00\x00\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./01234567'

TAP Reader

Here is a simple program that creates a TAP interface and dumps the Ethernet frames it receives from the kernel in hexadecimal form:

Includes
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <stdio.h>
#include <netdb.h>

#include <net/if.h> // ifreq
#include <linux/if_tun.h> // IFF_TAP, IFF_NO_PI
#include <linux/if_arp.h>

#include <sys/ioctl.h>
Utilities
#define BUFFLEN (4 * 1024)

const char HEX[] = {
  '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
  'a', 'b', 'c', 'd', 'e', 'f',
};

void hex(char* source, char* dest, ssize_t count)
{
  for (ssize_t i = 0; i < count; ++i) {
    unsigned char data = source[i];
    dest[2 * i] = HEX[data >> 4];
    dest[2 * i + 1] = HEX[data & 15];
  }
  dest[2 * count] = '\0';
}
int main(int argc, char** argv)
{
  if (argc != 2)
    return 1;
  const char* device_name = argv[1];
  if (strlen(device_name) + 1 > IFNAMSIZ)
    return 1;

  // Request a TAP device:
  int fd = open("/dev/net/tun", O_RDWR);
  if (fd == -1)
    return 1;
  struct ifreq ifr;
  memset(&ifr, 0, sizeof(ifr));
  ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
  strncpy(ifr.ifr_name, device_name, IFNAMSIZ);
  int res = ioctl(fd, TUNSETIFF, &ifr);
  if (res == -1)
    return 1;

  char buffer[BUFFLEN];
  char buffer2[2*BUFFLEN + 1];
  while (1) {

    // Read a frame:
    ssize_t count = read(fd, buffer, BUFFLEN);
    if (count < 0)
      return 1;

    // Dump frame:
    hex(buffer, buffer2, count);
    fprintf(stderr, "%s\n", buffer2);
  }

  return 0;
}

Let's run it:

sudo ./tap_reader tap0

We now have a TAP interface:

9: tap0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 7a:40:61:dc:b9:5b brd ff:ff:ff:ff:ff:ff

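We can notice several differences compared to the TUN example: the interface has a link-layer address (link/ether instead of link/none), and its flags are BROADCAST,MULTICAST instead of POINTOPOINT,MULTICAST,NOARP.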

Let's bring the interface up:

sudo ip addr add 203.0.113.1/24 dev tap0
sudo ip link set up dev tap0
9: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 1000
    link/ether 7a:40:61:dc:b9:5b brd ff:ff:ff:ff:ff:ff
    inet 203.0.113.1/24 scope global tap0
       valid_lft forever preferred_lft forever
    inet6 fe80::7840:61ff:fedc:b95b/64 scope link tentative 
       valid_lft forever preferred_lft forever

Output:

01005e0000167a4061dcb95b080046c00030000040000102c7efcb007101e000001694040000220014050000000204000000e00000fc04000000e00000fb
3333000000167a4061dcb95b86dd600000000024000100000000000000000000000000000000ff0200000000000000000000000000163a000502000001008f00b5520000000104000000ff0200000000000000000001ffdcb95b
01005e0000fb7a4061dcb95b08004500004996344000ff11c871cb007101e00000fb14e914e90035b739000000000002000000000000055f69707073045f746370056c6f63616c00000c0001045f697070c012000c0001
01005e0000fb7a4061dcb95b08004500007696354000ff11c843cb007101e00000fb14e914e90062336400000000000200000002000001310331313301300332303307696e2d6164647204617270610000ff0001066d617276696e056c6f63616c0000ff0001c02a00010001000000780004cb007101c00c000c0001000000780002c02a

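We now have Ethernet frames instead of raw IP packets coming in and out of the virtual interface: each frame starts with a 14-byte Ethernet header (destination MAC address, source MAC address, EtherType) followed by the payload.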
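As a small sketch, the Ethernet header of such a frame could be decoded in C like this. The struct eth_header type and both function names are made up for this example (they are not kernel definitions), and VLAN tags are ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Hypothetical decoded form of the 14-byte Ethernet header.
struct eth_header {
  unsigned char dst[6];
  unsigned char src[6];
  uint16_t ethertype;  // in host byte order
};

// Decode the Ethernet header at the start of `frame`.
// Returns a pointer to the payload, or NULL if the frame is too short.
const unsigned char *parse_ethernet(const unsigned char *frame, size_t len,
                                    struct eth_header *out)
{
  if (len < 14)
    return NULL;
  memcpy(out->dst, frame, 6);
  memcpy(out->src, frame + 6, 6);
  // The EtherType is transmitted in network byte order (big endian).
  out->ethertype = (uint16_t) ((frame[12] << 8) | frame[13]);
  return frame + 14;
}

// Format a MAC address as aa:bb:cc:dd:ee:ff (17 characters and a NUL).
void format_mac(const unsigned char mac[6], char out[18])
{
  snprintf(out, 18, "%02x:%02x:%02x:%02x:%02x:%02x",
           mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}
```

For the first frame of the output above, this gives the destination 01:00:5e:00:00:16, the source 7a:40:61:dc:b9:5b and the EtherType 0x0800 (IPv4).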

Dissecting with ScaPy

We can dissect Ethernet frames with ScaPy:

from scapy.all import *
Ether(bytes.fromhex("01005e0000167a4061dcb95b080046c00030000040000102c7efcb007101e000001694040000220014050000000204000000e00000fc04000000e00000fb")).display()
###[ Ethernet ]### 
  dst       = 01:00:5e:00:00:16
  src       = 7a:40:61:dc:b9:5b
  type      = IPv4
###[ IP ]### 
     version   = 4
     ihl       = 6
     tos       = 0xc0
     len       = 48
     id        = 0
     flags     = DF
     frag      = 0
     ttl       = 1
     proto     = igmp
     chksum    = 0xc7ef
     src       = 203.0.113.1
     dst       = 224.0.0.22
     \options   \
      |###[ IP Option Router Alert ]### 
      |  copy_flag = 1
      |  optclass  = control
      |  option    = router_alert
      |  length    = 4
      |  alert     = router_shall_examine_packet
###[ Raw ]### 
        load      = '"\x00\x14\x05\x00\x00\x00\x02\x04\x00\x00\x00\xe0\x00\x00\xfc\x04\x00\x00\x00\xe0\x00\x00\xfb'

Now we can try to ping an address on the link:

ping 203.0.113.2

Which gives:

ffffffffffff7a4061dcb95b080600010800060400017a4061dcb95bcb007101000000000000cb007102
ffffffffffff7a4061dcb95b080600010800060400017a4061dcb95bcb007101000000000000cb007102
ffffffffffff7a4061dcb95b080600010800060400017a4061dcb95bcb007101000000000000cb007102
Dissecting with ScaPy

The frames are decoded by ScaPy as:

###[ Ethernet ]### 
  dst       = ff:ff:ff:ff:ff:ff
  src       = 7a:40:61:dc:b9:5b
  type      = ARP
###[ ARP ]### 
     hwtype    = 0x1
     ptype     = IPv4
     hwlen     = 6
     plen      = 4
     op        = who-has
     hwsrc     = 7a:40:61:dc:b9:5b
     psrc      = 203.0.113.1
     hwdst     = 00:00:00:00:00:00
     pdst      = 203.0.113.2

References


  1. However, not all VPN software is based on TUN/TAP. For example, WireGuard is implemented in the kernel and has its own dedicated network interface type. Many tunnels based on PPP use pppd, which uses pppX network interfaces. These are managed through /dev/ppp and related ioctls. ↩︎

  2. When using an Ethernet or Wifi interface, several different machines can possibly be reached directly through this interface: as a consequence, L2 addressing is needed to specify to which host on the LAN we are sending a given frame. In contrast, only a single machine is directly reachable through POINTOPOINT network interfaces and there is thus no L2 addressing. TUN interfaces only transport L3 (IP) packets: they do not have L2 addressing and are thus POINTOPOINT interfaces. ↩︎