SimGridMC: The Big Split (and Cleanup)

Published: Sep 1 2015

Updated: Sep 1 2015

In my previous SimGrid post, I talked about different solutions for a better isolation between the model-checked application and the model-checker. We chose to avoid the (hackery) solution based multiple dynamic-linker namespaces in the same process and use a more conventional process-based isolation.

Table of content

Table of content
Motivation
- Multiple heaps
- Goals and solutions
Splitting the model-checker and the simulator
Cleanup
Conversion to C++
Performance
Conclusion

Motivation

In the previous version of the SimGridMC, the model-checker was running in the same process as the main SimGrid application. We had in the same process:

all the simulated processes (containing the local state of each process);
the SimGrid simulator (containing the shared/global state such as the state of the communications);
the model-checker (containing the state of the exploration in the execution graph of the simulated application) which had to checkpoint/restore the state of the other components (but not its own state).

Multiple heaps

In order to do this, the SimGridMC process was using two different malloc()-heaps in the same process in order to separate:

the state of the simulated application (processes states and global state);
the state of the model-checker.

The model-checker code had a lot of code to select which heap had to be active (and used by malloc() and friends) at a given point of the code.

This is an example of a function with a lot of heap management calls (the lines managing the heap swapping are commented with <*>):

void MC_pre_modelcheck_safety()
{

  int mc_mem_set = (mmalloc_get_current_heap() == mc_heap);  // <*>

  mc_state_t initial_state = NULL;
  smx_process_t process;

  /* Create the initial state and push it into the exploration stack */
  if (!mc_mem_set)                                           // <*>
    MC_SET_MC_HEAP;                                          // <*>

  if (_sg_mc_visited > 0)
    visited_states = xbt_dynar_new(sizeof(mc_visited_state_t),
      visited_state_free_voidp);

  initial_state = MC_state_new();

  MC_SET_STD_HEAP;                                           // <*>

  /* Wait for requests (schedules processes) */
  MC_wait_for_requests();

  MC_SET_MC_HEAP;                                            // <*>

  /* Get an enabled process and insert it in the interleave set
     of the initial state */
  xbt_swag_foreach(process, simix_global->process_list) {
    if (MC_process_is_enabled(process)) {
      MC_state_interleave_process(initial_state, process);
      if (mc_reduce_kind != e_mc_reduce_none)
        break;
    }
  }

  xbt_fifo_unshift(mc_stack, initial_state);

  if (!mc_mem_set)                                           // <*>
    MC_SET_STD_HEAP;                                         // <*>
}

The heap management code was cumbersome and difficult to maintain: it was necessary to known which function had to be called in each context, which function was selecting the correct heap and select the current heap accordingly. It was moreover necessary to known which data was allocated in which heap. Failing to use the correct heap could lead to errors such as:

error in free() because the memory was not malloc()-ed with the current heap;
some model-checked application data failing to be snapshoted and restored correctly because it was in the model-checker heap;
some model-checker data changing on snapshot restores because it was in the model-checked heap;
the model-checker failing to find state equality because some of its data was included in the application state (because it was allocated in the application heap).

Goals and solutions

While this design was interesting for the performance of the model-checker, it was quite difficult to maintain and understand. We wanted to create a new version of the model-checker which would be simpler to understand and maintain:

the coexistence of two heaps was making it more difficult to maintain the code;
the coexistence of two heaps and the snapshotting/restoration logic was making it impossible or difficult to use debugging tools such as Valgrind and even GDB.

In order to avoid the coexistence of the two heaps we envisioned two possible solutions:

process-based isolation by moving the model-checker in its own process;
keeping them in the same process but using separate symbol namespaces at the dynamic linker level.

While the dynamic-linker based solution is quite interesting and would provide better performance by avoiding context switches (and who doesn't want to write their own dynamic linker?), it would probably be difficult to achieve and would probably not make the code easier to understand.

We chose to use the much more standard solution of using different processes which is conceptually much simpler and provides a better isolation between the model-checker and the model-checked application. With this design, the model-checker is a quite standard process: all debugging tools can be used without any problem (Valgrind, GDB) on the model-checker process. The model-checked process is not completely standard as we are constantly overwriting its state but we can still ptrace it and use a debugger.

Update (2016-04-01): the model-checker now ptraces the model-checked application (for various reasons) and it is not possible to debug the model-checked application anymore. However, we have a feature to replay an execution of the model-checked application outside of the model-checker.

Splitting the model-checker and the simulator

In this new design, the model-checker process behaves somehow like a debugger for the simulated (model-checked) application by monitoring and controlling its execution. The model-checker process is responsible for:

snapshotting and restoring the SimGrid process (the state of simulated application);
telling the simulated application which transition should be executed in the next iteration (this is similar to the stepping functionality of a debugger);
maintain the state of the MC (snapshots, execution graph);
comparing states.

The simulated application is responsible for:

maintaining its state;
executing the simulation logic (handling the simulation calls);
executing the simulated application logic (the processes);
reporting events of interest (nodes in the execution graph) to the model-checker and waiting for further instructions.

Two mechanisms are used to implement the interaction between the model-checker process and the model-checked application:

communication between the two processes using a UNIX socket;
direct access by the model-checker process to the model-checked process memory using /proc/$pid/mem (this is used for snapshot/restore and in order to look at the state of the model-checked application).

Since Linux 3.2, it is possible to read from and write to another process virtual memory without ptrace()-ing it: I took care not to use ptrace() in order to be able to use it from another purpose (a process can only be ptraced by a single process at a time):

The user can use a debugger on the model-checked application. This is quite useful for the MC code maintainer and might be useful as well for the user in order to understand how the application behaves.
In the future, we might use the model-checker with simterpose.

The split has been done in two phases:

In the first phase, the split process mode was implemented but the single-process mode was still still present and enabled by default. This allowed to detect regressions with the single-process mode and compare both modes of operations. The resulting code was quite ugly because it had to handle both modes of operations.
When the split process mode was complete and working correctly, the single-process mode was removed and a lot of cleanup could be done.

Explicit communications

The model-checker process and the model-checked process application communicate with each other over a UNIX datagram socket. This socket is created by the model-checker and passed to the child model-checked process.

This is used in the initialisation:

to tell the model-checker to ignore a part of the heap of the model-checked memory;
to tell the model-checker where the stacks are;
to define the location of a variable in the model-checked address-space which will be used in liveness properties (checked in the model-checker process);

This is used in runtime to control the execution of the model-checked application:

by telling the model-checker that the model-checked has finished executing code and is waiting for the model-checker to tell him to continue;
by telling the model-checked application that it should continue by executing a given transition;
by telling the model-checker that an assertion (a safety property) has failed.

The (simplified) client-loop looks like this:

void MC_client_main_loop(void)
{
  while (1) {
    message_type message;
    size_t = receive_message(&message);
    switch(message.type()) {

    // Executes a simcall:
    case MC_MESSAGE_SIMCALL_HANDLE:
      execute_transition(message.transition());
      send_message(MC_MESSAGE_WAITING);
      break;

    // Execute application code until a visible simcall is reached:
    case MC_MESSAGE_CONTINUE:
      execute_application_code();
      send_message(MC_MESSAGE_WAITING);
      break;

    // [...] (Other messages here)
    }    
  }
}

Each model-checking algorithm (safety, liveness, communication determinism) is implemented as model-checker side code which triggers execution of model-checked-side transitions with:

// Execute a simcall (MC_MESSAGE_SIMCALL_HANDLE):
MC_simcall_handle(req, value);

// Execute simulated application code (MC_MESSAGE_CONTINUE):
MC_wait_for_requests();

The communication determinism algorithm needs to see the result of some simcalls before triggering the application code:

MC_simcall_handle(req, value);
MC_handle_comm_pattern(call, req, value, communication_pattern, 0);
MC_wait_for_requests();

Snapshot/restore

Snapshot and restoration is handled by reading/writing the model-checked process memory with /proc/$pid/memory. During this operation, the model-checked process is waiting for messages on a special stack dedicated to the simulator (which is not managed by the snapshotting logic). During this time, the model-checked application is not supposed to be accessing the simulated application memory. When this is finished, the model-checker wakes up the simulated application with the MC_MESSAGE_SIMCALL_HANDLE and MC_MESSAGE_CONTINUE.

Peeking at the state of the model-checked application

The model-checker needs to read some of the state of the simulator (state of the communications, name of the processes and so on). Currently this is handled quite brutally by reading the data directly in the structures of the model-checked process (following linked-list items, arrays elements, etc. from the remote process):

// Read the hostname from the MCed process:
process->read_bytes(&host_copy, sizeof(host_copy), remote(p->host));
int len = host_copy.key_len + 1;
char hostname[len];
process->read_bytes(hostname, len, remote(host_copy.key));
info->hostname = mc_model_checker->get_host_name(hostname);

This is quite ugly and should probably be replaced by some more structured way to share this information in the future.

Impact on the user interface

We now have a simgrid-mc executable for the model-checker process. It must be called explicitly by the user in order to use the model-checker (similarly to gdb or other debugging tools):

# Running the raw application:
./bugged1

# Running the application in GDB:
gdb --args ./bugged1

# Running the application in valgrind:
valgrind ./bugged1

# Running the application in SimgridMC:
simgrid-mc ./bugged1

For SMPI applications, the --wrapper argument of smpirun must be used:

# Running the raw application:
smpirun \
  -hostfile hostfile -platform platform.xml \
  --cfg=maxmin/precision:1e-9 --cfg=network/model:SMPI \
  --cfg=network/TCP_gamma:4194304 \
  -np 4 --cfg=smpi/send_is_detached_thres:0 --cfg=smpi/coll_selector:mpich \
  --cfg=contexts/factory:ucontext --cfg=contexts/stack_size:4 \
  ./dup

# Running the application in GDB:
smpirun -wrapper "gdb --args" \
  -hostfile hostfile -platform platform.xml \
  --cfg=maxmin/precision:1e-9 --cfg=network/model:SMPI \
  --cfg=network/TCP_gamma:4194304 \
  -np 4 --cfg=smpi/send_is_detached_thres:0 --cfg=smpi/coll_selector:mpich \
  --cfg=contexts/factory:ucontext --cfg=contexts/stack_size:4 \
  ./dup

# Running the application in valgrind:
smpirun -wrapper "valgrind" \
  -hostfile hostfile -platform platform.xml \
  --cfg=maxmin/precision:1e-9 --cfg=network/model:SMPI \
  --cfg=network/TCP_gamma:4194304 \
  -np 4 --cfg=smpi/send_is_detached_thres:0 --cfg=smpi/coll_selector:mpich \
  --cfg=contexts/factory:ucontext --cfg=contexts/stack_size:4 \
  ./dup

# Running the application in SimgridMC:
smpirun -wrapper "simgrid-mc" \
  -hostfile hostfile -platform platform.xml \
  --cfg=maxmin/precision:1e-9 --cfg=network/model:SMPI \
  --cfg=network/TCP_gamma:4194304 \
  -np 4 --cfg=smpi/send_is_detached_thres:0 --cfg=smpi/coll_selector:mpich \
  --cfg=contexts/factory:ucontext --cfg=contexts/stack_size:4 \
  ./dup

Under the hood, simgrid-mc sets a few environment variables for its child process:

SIMGRID_MC in order to enable the model-checker support in the child/simulated-application process (this triggers the usage of the custom heap for example);
SIMGRID_MC_SOCKET_FD contains the number of the file descriptor used to pass the UNIX datagram socket;
LD_BIND_NOW in order to avoid lazy relocations.

Cleanup

After implementing the separate mode, the single process mode has been removed in order to have a cleaner code. In order to have the two mode of operations coexist, many functions were checking the mode operation and the behaviour was changing depending on the mode. Most of this code has been removed and is now much simpler.

The code managing the two heaps is now useless and has been completely removed. We are still using our custom heap implementation in the model-checked application however: we are using its internal representation to track the different allocations in the heap; it is used as well in order to clear the bytes of an allocation before giving it to the application. The model-checked application however is a quite standard application and uses the standard system heap implementation (or could use another implementation) which is expected to have better performance than our implementation.

Currently, it is not quite clear which part of the API are intended to be used by the model-checked process, which part are to be used by the model-checker process and which parts can be used by both parts. Some effort has been used to separate the different parts of the API (by moving them in different header files) but this is still an ongoing process. In the future, we might want to have a better organisation using different header files, namespaces and possibly different shared-objects for the different parts of the API.

A longer term goal, would be to have a nice API for the model-checker which could easily be used by the users to write their own model-checker algorithms (in their own executables). We might even want to export a Lua based binding to write the model-checker algorithms.

Conversion to C++

In parallel, the model-checker code has been ported to C++ and a part of it has been rewritten in a more idiomatic C++:

classes with methods, destructors, constructors, private fields, inheritance (some part of the code was already in object-oriented C);
template/STL-based containers instead of void*/XBT-based containers;
RAII (Resource Acquisition Is Initialization);
C++ strings;
simgrid::mc namespace.

All the MC code has been converted to C++ but the conversion to idiomatic C++ is still ongoing: some parts of the code are still using C idioms.

Performance

This first version is quite slower than the previous one. It was expected that the new implementation would be slower than the previous one because it uses cross-process communications and the old version had been heavily optimised. However this might be optimised in the future in order to minimise the overhead of cross process synchronisations.

Conclusion

This is a first step towards a cleaner and simpler SimGridMC. The heap juggling code has been removed. Instead however, we have some code which reads directly in the data structures of the other process: this code is not so nice and not so maintainable and we will probably want to find a better way to do this.

Some things still need to be done:

file descriptor snapshot/restoration (this must be implemented in the model-checked application as we cannot manipulate the file descriptors of another process);
more cleanup;
more C++-ification;
better separation of the different parts of the API;
better access to the model-checked process state (avoid remote memory access to its internal data structures);
better argument processing;
better documentation of the code.