/dev/posts/

Better isolation for SimGridMC

Published:

Updated:

In an attempt to simplify the development around the SimGrid model-checker, we were thinking about moving the model-checker out in a different process. Another different approach would be to use a dynamic-linker isolation of the different components of the process. Here is a summary of the goals, problems and design issues surrounding these topics.

Table of content

Current state

SimGrid architecture

The design if the SimGrid simulator is based on the design of a operating system.

In a typical OS, we have a kernel managing a global state and a several userspace processes running on top of the kernel. The kernel schedules the execution of the different processes (and their threads) on the available CPUs. The kernel provides an API to the processes made of several system calls.

Process Process Process Process System calls OS kernel

SimGrid simulated a distributed system: it simulates a network and let the different processes of the simulated system use this simulated network. Each simulated process runs on top of the SimGrid kernel. The SimGrid kernel schedules the execution of the different processes on the available OS threads. The SimGrid kernel provides an API to the processes made of several simulation calls.

Process Process Process Process Simulation calls SimGrid kernel

In order to reduce the cost of context switching between the different processes, in the current implementation of SimGrid all the simulated processes and the SimGrid kernel are in the same OS process: there is no MMU-enforced separation of memory between the simulated processes but they are expected to only communicate between each other using only the means provided by the SimGrid kernel (the simulation calls) and should not share mutable memory.

Process Process Process Process Simulation calls SimGrid kernel System calls OS kernel

The SimGrid kernel has a dedicated stack and each simulated process has its own stack: cooperative multitasking (fibers, ucontext) is used to switch between the different contexts (SimGrid kernel/process) and is used by the SimGrid kernel to schedule the execution of the different processes.

The same (libc) heap is shared between the SimGrid kernel and the simulated processes.

SimGridMC architecture

The SimGrid model-checker is a dynamic analysis component for SimGrid. It explores the different possible interleavings of execution of the simulated processes (depending on the execution of their transitions i.e. the different possible orderings of their communications).

In order to do this, the MC saves at each node of the graph of the possible executions the state of the system:

Those states are then used to:

In the current implementation, the model-checker lives in the same process as the main SimGrid process (the SimGrid kernel and the processes):

Process Process Process Process Simulation calls SimGrid kernel Model-checker

Multiple heaps

However, the model-checker needs to maintain its own state: the state of the model-checker must not be saved, compared and restored with the rest of the state.

In order to do this, the state of the model-checker is maintained in a second heap:

This is implemented by overriding the malloc(), free() and friends in order to support multiple heap. A global variable is used to choose the current working heap:

// Simplified code
xbt_mheap_t __mmalloc_current_heap = NULL;

void *malloc(size_t n)
{
  return mmalloc(__mmalloc_current_heap, n);
}

void free(void *ptr)
{
  return mfree(__mmalloc_current_heap, ptr);
}

Limitation of the approach

The current implementation is complicated and not easy to understand and maintain:

A first motivation for modifying the architecture of SimGridMC, is to incraase the maintainability of the SimGridMC codebase.

Another related goal is to simplify the debugging experience (of the simulated application, the SimGrid kernel and the model-checker). For example, the current version of SimGridMC does not work under valgrind. A solution which would provide a more powerful debugging experience would be a valuable tool for the SimGridMC devs but more importantly for the users of SimGridMC.

Process-based isolation

For all these reasons, we would like to move the model-checker in a separate process: a model-checker process maintains the model-checker state and control the execution of a model-checked process.

Process Process Process Process Simulation calls SimGrid kernel Model-checking interface Model-checker

Memory snapshot/restoration

The snapshoting/restoration of the model-checked process memory can be done using /proc/${pid}/mem or process_vm_readv() and process_vm_writev().

As long as the OS threads are living on stacks which are not managed by the state snapshot/restoration mechanism, they will not be affected: we must take care that the OS threads switch to unmanaged stacks when we are doing the state snapshots/restorations.

Another solution would be to use ptrace() with PTRACE_GETREGSET and PTRACE_SETREGSET in order to snapshot/restore the registers of each thread but we would like to avoid this in order to be able to use ptrace() for debugging or other purposes.

File descriptors restoration

Linux does not provide a way to change the file descriptors of another process: the restoration of the file descriptors must be done in the taret OS process and cannot be done from the model-checker process. Cooperation of the model-checked process is needed for the file descriptors restoration.

We could abuse ptrace()-based syscall rewriting techniques or some sort of parasite injection in order to achieve this.

Dynamic-linker based isolation

Another idea would be to create a custom dynamic linker with namespace support in order to be able to link multiple instances of the same library and provide isolation between different parts of the process.

This could be used to:

Prior art in DCE with dlmopen()

It turns out that DCE (Direct Code Execution) already uses a similar approach to load multiples application instances along with Linux kernel implementations (and its network stack) on top of the NS3 packet level network simulator in the same process: the applications and Linux kernel are compiled as shared objects, the latter forming a Library OS liblinux.so shared object and loaded multiple times in the same process alongside with the NS3 instance.

Among several alternative strategies, DCE uses the dlmopen() function. This is a variant of dlopen() originating from SunOS/Solaris and implemented on the GNU userland which allows to load dynamic libraries in separated namespaces:

An alternative implementation of the ld.so dynamic linker, elf-loader, is used which provides additional features:

More information about dlmopen() can be found in old version of Sun Linkers and Libraries Guide.

A custom dynamic loader/linker on top of libc

However, I was envisioning something slightly different: instead of writing a replacement of ld.so (using raw system calls), I was thinking about building the custom dynamic linker on top of libc and libdl in order to be able to use libc (malloc()), libdl and libelf instead of using the raw system calls.

Impact on debuggability

In a split process design, the model-checker could be a quite standard application avoiding weird hacks (such as introspection with /proc/self/maps and DWARF, snapshoting/restoration of the state with memcpy(), custom mmalloc() implementation with multiple heaps). Once a relevant trajectory of the model-checked application has been identified, it could be replayed outside of the model-checker and debugged in this simpler mode.

However, having a single process could lead to a better debugging experience: by being able to combines breakpoints in the model-checker, the SimGrid kernel and the simulated application with conditions spanning all those components.

At the same time, using multiple dynamic-linking namespaces could make the debugging experience more complicated. I am not sure how well it is supported by the different available debugging tools. The DCE tools seems to show that it is reasonably well supported by GDB and valgrind.

Conclusion

So we have two possible directions:

The first solution provides a better isolation of the model-checker. The second solution is closer to the current implementation and should have better performances by avoiding the context switches and IPC in favour of direct memory access and function calls. Moreover, the dynamic-linker-based isolation could be reused for other parts of the projects (such as the isolation of the simulated MPI processes).

It is not clear which solution would provide the better debugging experience for the user and which solution would be better for the maintainability of SimGridMC.

Appendix: dlmopen() quick demo

This simple program creates three new namespaces and loads libpthread in those namespaces:

#define _GNU_SOURCE
#include <dlfcn.h>

#include <unistd.h>

int main(int argc, const char** argv)
{
  size_t i;
  for (i=0; i!=3; ++i) {
    void* x = dlmopen(LM_ID_NEWLM, "libpthread.so.0", RTLD_NOW);
    if (!x)
      return 1;
  }
  while(1) sleep(200000);
  return 0;
}

We see that libpthread is loaded thrice. Each instance has its own libc instance as well (and a fourth one is loaded for the main program):

00400000-00401000 r-xp 00000000 08:06 7603474                            /home/myself/temp/a.out
00600000-00601000 rw-p 00000000 08:06 7603474                            /home/myself/temp/a.out
0173a000-0175b000 rw-p 00000000 00:00 0                                  [heap]
7fca7ac7d000-7fca7ae1c000 r-xp 00000000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so
7fca7ae1c000-7fca7b01c000 ---p 0019f000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so
7fca7b01c000-7fca7b020000 r--p 0019f000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so
7fca7b020000-7fca7b022000 rw-p 001a3000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so
7fca7b022000-7fca7b026000 rw-p 00000000 00:00 0
7fca7b026000-7fca7b03e000 r-xp 00000000 08:01 2625992                    /lib/x86_64-linux-gnu/libpthread-2.19.so
7fca7b03e000-7fca7b23d000 ---p 00018000 08:01 2625992                    /lib/x86_64-linux-gnu/libpthread-2.19.so
7fca7b23d000-7fca7b23e000 r--p 00017000 08:01 2625992                    /lib/x86_64-linux-gnu/libpthread-2.19.so
7fca7b23e000-7fca7b23f000 rw-p 00018000 08:01 2625992                    /lib/x86_64-linux-gnu/libpthread-2.19.so
7fca7b23f000-7fca7b243000 rw-p 00000000 00:00 0
7fca7b243000-7fca7b3e2000 r-xp 00000000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so
7fca7b3e2000-7fca7b5e2000 ---p 0019f000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so
7fca7b5e2000-7fca7b5e6000 r--p 0019f000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so
7fca7b5e6000-7fca7b5e8000 rw-p 001a3000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so
7fca7b5e8000-7fca7b5ec000 rw-p 00000000 00:00 0
7fca7b5ec000-7fca7b604000 r-xp 00000000 08:01 2625992                    /lib/x86_64-linux-gnu/libpthread-2.19.so
7fca7b604000-7fca7b803000 ---p 00018000 08:01 2625992                    /lib/x86_64-linux-gnu/libpthread-2.19.so
7fca7b803000-7fca7b804000 r--p 00017000 08:01 2625992                    /lib/x86_64-linux-gnu/libpthread-2.19.so
7fca7b804000-7fca7b805000 rw-p 00018000 08:01 2625992                    /lib/x86_64-linux-gnu/libpthread-2.19.so
7fca7b805000-7fca7b809000 rw-p 00000000 00:00 0
7fca7b809000-7fca7b9a8000 r-xp 00000000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so
7fca7b9a8000-7fca7bba8000 ---p 0019f000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so
7fca7bba8000-7fca7bbac000 r--p 0019f000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so
7fca7bbac000-7fca7bbae000 rw-p 001a3000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so
7fca7bbae000-7fca7bbb2000 rw-p 00000000 00:00 0
7fca7bbb2000-7fca7bbca000 r-xp 00000000 08:01 2625992                    /lib/x86_64-linux-gnu/libpthread-2.19.so
7fca7bbca000-7fca7bdc9000 ---p 00018000 08:01 2625992                    /lib/x86_64-linux-gnu/libpthread-2.19.so
7fca7bdc9000-7fca7bdca000 r--p 00017000 08:01 2625992                    /lib/x86_64-linux-gnu/libpthread-2.19.so
7fca7bdca000-7fca7bdcb000 rw-p 00018000 08:01 2625992                    /lib/x86_64-linux-gnu/libpthread-2.19.so
7fca7bdcb000-7fca7bdcf000 rw-p 00000000 00:00 0
7fca7bdcf000-7fca7bf6e000 r-xp 00000000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so
7fca7bf6e000-7fca7c16e000 ---p 0019f000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so
7fca7c16e000-7fca7c172000 r--p 0019f000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so
7fca7c172000-7fca7c174000 rw-p 001a3000 08:01 2626010                    /lib/x86_64-linux-gnu/libc-2.19.so
7fca7c174000-7fca7c178000 rw-p 00000000 00:00 0
7fca7c178000-7fca7c17b000 r-xp 00000000 08:01 2626017                    /lib/x86_64-linux-gnu/libdl-2.19.so
7fca7c17b000-7fca7c37a000 ---p 00003000 08:01 2626017                    /lib/x86_64-linux-gnu/libdl-2.19.so
7fca7c37a000-7fca7c37b000 r--p 00002000 08:01 2626017                    /lib/x86_64-linux-gnu/libdl-2.19.so
7fca7c37b000-7fca7c37c000 rw-p 00003000 08:01 2626017                    /lib/x86_64-linux-gnu/libdl-2.19.so
7fca7c37c000-7fca7c39c000 r-xp 00000000 08:01 2625993                    /lib/x86_64-linux-gnu/ld-2.19.so
7fca7c568000-7fca7c56b000 rw-p 00000000 00:00 0
7fca7c59a000-7fca7c59c000 rw-p 00000000 00:00 0
7fca7c59c000-7fca7c59d000 r--p 00020000 08:01 2625993                    /lib/x86_64-linux-gnu/ld-2.19.so
7fca7c59d000-7fca7c59e000 rw-p 00021000 08:01 2625993                    /lib/x86_64-linux-gnu/ld-2.19.so
7fca7c59e000-7fca7c59f000 rw-p 00000000 00:00 0
7fffa8481000-7fffa84a2000 rw-p 00000000 00:00 0                          [stack]
7fffa85f5000-7fffa85f7000 r-xp 00000000 00:00 0                          [vdso]
7fffa85f7000-7fffa85f9000 r--p 00000000 00:00 0                          [vvar]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

The new namespaces are probably not fully functional in this state: there are probably conflicts to solve in the different instances. For example, each libc probably tries to manage the same heap with sbrk().