/dev/posts/

SimGridMC: The Big Split (and Cleanup)

Published:

Updated:

In my previous SimGrid post, I talked about different solutions for a better isolation between the model-checked application and the model-checker. We chose to avoid the (hackery) solution based multiple dynamic-linker namespaces in the same process and use a more conventional process-based isolation.

Table of content

Motivation

In the previous version of the SimGridMC, the model-checker was running in the same process as the main SimGrid application. We had in the same process:

Multiple heaps

In order to do this, the SimGridMC process was using two different malloc()-heaps in the same process in order to separate:

  1. the state of the simulated application (processes states and global state);

  2. the state of the model-checker.

The model-checker code had a lot of code to select which heap had to be active (and used by malloc() and friends) at a given point of the code.

This is an example of a function with a lot of heap management calls (the lines managing the heap swapping are commented with <*>):

void MC_pre_modelcheck_safety()
{

  int mc_mem_set = (mmalloc_get_current_heap() == mc_heap);  // <*>

  mc_state_t initial_state = NULL;
  smx_process_t process;

  /* Create the initial state and push it into the exploration stack */
  if (!mc_mem_set)                                           // <*>
    MC_SET_MC_HEAP;                                          // <*>

  if (_sg_mc_visited > 0)
    visited_states = xbt_dynar_new(sizeof(mc_visited_state_t),
      visited_state_free_voidp);

  initial_state = MC_state_new();

  MC_SET_STD_HEAP;                                           // <*>

  /* Wait for requests (schedules processes) */
  MC_wait_for_requests();

  MC_SET_MC_HEAP;                                            // <*>

  /* Get an enabled process and insert it in the interleave set
     of the initial state */
  xbt_swag_foreach(process, simix_global->process_list) {
    if (MC_process_is_enabled(process)) {
      MC_state_interleave_process(initial_state, process);
      if (mc_reduce_kind != e_mc_reduce_none)
        break;
    }
  }

  xbt_fifo_unshift(mc_stack, initial_state);

  if (!mc_mem_set)                                           // <*>
    MC_SET_STD_HEAP;                                         // <*>
}

The heap management code was cumbersome and difficult to maintain: it was necessary to known which function had to be called in each context, which function was selecting the correct heap and select the current heap accordingly. It was moreover necessary to known which data was allocated in which heap. Failing to use the correct heap could lead to errors such as:

Goals and solutions

While this design was interesting for the performance of the model-checker, it was quite difficult to maintain and understand. We wanted to create a new version of the model-checker which would be simpler to understand and maintain:

In order to avoid the coexistence of the two heaps we envisioned two possible solutions:

While the dynamic-linker based solution is quite interesting and would provide better performance by avoiding context switches (and who doesn't want to write their own dynamic linker?), it would probably be difficult to achieve and would probably not make the code easier to understand.

We chose to use the much more standard solution of using different processes which is conceptually much simpler and provides a better isolation between the model-checker and the model-checked application. With this design, the model-checker is a quite standard process: all debugging tools can be used without any problem (Valgrind, GDB) on the model-checker process. The model-checked process is not completely standard as we are constantly overwriting its state but we can still ptrace it and use a debugger.

Update (2016-04-01): the model-checker now ptraces the model-checked application (for various reasons) and it is not possible to debug the model-checked application anymore. However, we have a feature to replay an execution of the model-checked application outside of the model-checker.

Splitting the model-checker and the simulator

In this new design, the model-checker process behaves somehow like a debugger for the simulated (model-checked) application by monitoring and controlling its execution. The model-checker process is responsible for:

The simulated application is responsible for:

Two mechanisms are used to implement the interaction between the model-checker process and the model-checked application:

Since Linux 3.2, it is possible to read from and write to another process virtual memory without ptrace()-ing it: I took care not to use ptrace() in order to be able to use it from another purpose (a process can only be ptraced by a single process at a time):

The split has been done in two phases:

  1. In the first phase, the split process mode was implemented but the single-process mode was still still present and enabled by default. This allowed to detect regressions with the single-process mode and compare both modes of operations. The resulting code was quite ugly because it had to handle both modes of operations.
  2. When the split process mode was complete and working correctly, the single-process mode was removed and a lot of cleanup could be done.

Explicit communications

The model-checker process and the model-checked process application communicate with each other over a UNIX datagram socket. This socket is created by the model-checker and passed to the child model-checked process.

This is used in the initialisation:

This is used in runtime to control the execution of the model-checked application:

The (simplified) client-loop looks like this:

void MC_client_main_loop(void)
{
  while (1) {
    message_type message;
    size_t = receive_message(&message);
    switch(message.type()) {

    // Executes a simcall:
    case MC_MESSAGE_SIMCALL_HANDLE:
      execute_transition(message.transition());
      send_message(MC_MESSAGE_WAITING);
      break;

    // Execute application code until a visible simcall is reached:
    case MC_MESSAGE_CONTINUE:
      execute_application_code();
      send_message(MC_MESSAGE_WAITING);
      break;

    // [...] (Other messages here)
    }    
  }
}

Each model-checking algorithm (safety, liveness, communication determinism) is implemented as model-checker side code which triggers execution of model-checked-side transitions with:

// Execute a simcall (MC_MESSAGE_SIMCALL_HANDLE):
MC_simcall_handle(req, value);

// Execute simulated application code (MC_MESSAGE_CONTINUE):
MC_wait_for_requests();

The communication determinism algorithm needs to see the result of some simcalls before triggering the application code:

MC_simcall_handle(req, value);
MC_handle_comm_pattern(call, req, value, communication_pattern, 0);
MC_wait_for_requests();

Snapshot/restore

Snapshot and restoration is handled by reading/writing the model-checked process memory with /proc/$pid/memory. During this operation, the model-checked process is waiting for messages on a special stack dedicated to the simulator (which is not managed by the snapshotting logic). During this time, the model-checked application is not supposed to be accessing the simulated application memory. When this is finished, the model-checker wakes up the simulated application with the MC_MESSAGE_SIMCALL_HANDLE and MC_MESSAGE_CONTINUE.

Peeking at the state of the model-checked application

The model-checker needs to read some of the state of the simulator (state of the communications, name of the processes and so on). Currently this is handled quite brutally by reading the data directly in the structures of the model-checked process (following linked-list items, arrays elements, etc. from the remote process):

// Read the hostname from the MCed process:
process->read_bytes(&host_copy, sizeof(host_copy), remote(p->host));
int len = host_copy.key_len + 1;
char hostname[len];
process->read_bytes(hostname, len, remote(host_copy.key));
info->hostname = mc_model_checker->get_host_name(hostname);

This is quite ugly and should probably be replaced by some more structured way to share this information in the future.

Impact on the user interface

We now have a simgrid-mc executable for the model-checker process. It must be called explicitly by the user in order to use the model-checker (similarly to gdb or other debugging tools):

# Running the raw application:
./bugged1

# Running the application in GDB:
gdb --args ./bugged1

# Running the application in valgrind:
valgrind ./bugged1

# Running the application in SimgridMC:
simgrid-mc ./bugged1

For SMPI applications, the --wrapper argument of smpirun must be used:

# Running the raw application:
smpirun \
  -hostfile hostfile -platform platform.xml \
  --cfg=maxmin/precision:1e-9 --cfg=network/model:SMPI \
  --cfg=network/TCP_gamma:4194304 \
  -np 4 --cfg=smpi/send_is_detached_thres:0 --cfg=smpi/coll_selector:mpich \
  --cfg=contexts/factory:ucontext --cfg=contexts/stack_size:4 \
  ./dup

# Running the application in GDB:
smpirun -wrapper "gdb --args" \
  -hostfile hostfile -platform platform.xml \
  --cfg=maxmin/precision:1e-9 --cfg=network/model:SMPI \
  --cfg=network/TCP_gamma:4194304 \
  -np 4 --cfg=smpi/send_is_detached_thres:0 --cfg=smpi/coll_selector:mpich \
  --cfg=contexts/factory:ucontext --cfg=contexts/stack_size:4 \
  ./dup

# Running the application in valgrind:
smpirun -wrapper "valgrind" \
  -hostfile hostfile -platform platform.xml \
  --cfg=maxmin/precision:1e-9 --cfg=network/model:SMPI \
  --cfg=network/TCP_gamma:4194304 \
  -np 4 --cfg=smpi/send_is_detached_thres:0 --cfg=smpi/coll_selector:mpich \
  --cfg=contexts/factory:ucontext --cfg=contexts/stack_size:4 \
  ./dup

# Running the application in SimgridMC:
smpirun -wrapper "simgrid-mc" \
  -hostfile hostfile -platform platform.xml \
  --cfg=maxmin/precision:1e-9 --cfg=network/model:SMPI \
  --cfg=network/TCP_gamma:4194304 \
  -np 4 --cfg=smpi/send_is_detached_thres:0 --cfg=smpi/coll_selector:mpich \
  --cfg=contexts/factory:ucontext --cfg=contexts/stack_size:4 \
  ./dup

Under the hood, simgrid-mc sets a a few environment variable for its child process:

Cleanup

After implementing the separate mode, the single process mode has been removed in order to have a cleaner code. In order to have the two mode of operations coexist, many functions were checking the mode operation and the behaviour was changing depending on the mode. Most of this code has been removed and is now much simpler.

The code managing the two heaps is now useless and has been completely removed. We are still using our custom heap implementation in the model-checked application however: we are using its internal representation to track the different allocations in the heap; it is used as well in order to clear the bytes of an allocation before giving it to the application. The model-checked application however is a quite standard application and uses the standard system heap implementation (or could use another implementation) which is expected to have better performance than our implementation.

Currently, it is not quite clear which part of the API are intended to be used by the model-checked process, which part are to be used by the model-checker process and which parts can be used by both parts. Some effort has been used to separate the different parts of the API (by moving them in different header files) but this is is still an ongoing process. In the future, we might want to have a better organisation using different header files, namespaces and possibly different shared-objects for the different parts of the API.

A longer term goal, would be to have a nice API for the model-checker which could easily be used by the users to write their own model-checker algorithms (in their own executables). We might even want to export a Lua based binding to write the model-checker algorithms.

Conversion to C++

In parallel, the model-checker code has been ported to C++ and a part of it has been rewritten in a more idiomatic C++:

All the MC code has been converted to C++ but the conversion to idiomatic C++ is still ongoing: some parts of the code are still using C idioms.

Performance

This first version is quite slower than the previous one. It was expected that the new implementation would be slower than the previous one because it uses cross-process communications and the old version had been heavily optimised. However this might be optimised in the future in order to minimise the overhead of cross process synchronisations.

Conclusion

This is a first step towards a cleaner and simpler SimGridMC. The heap juggling code has been removed. Instead however, we have some code which reads directly in the data structures of the other process: this code is not so nice and not so maintainable and we will probably want to find a better way to do this.

Some things still need to be done: