Results on same-page-merging snapshots
In the previous episode, I talked about the implementation of a same-page-merging page store. On top of this, we can build same-page-merging snapshots for the SimGrid model checker.
Implementation
SimGrid agnostic layer
The next layer, on top of the page store, is generic logic for saving and restoring a contiguous area of memory pages:
/** @brief Take a per-page snapshot of a region
*
* @param data The start of the region (must be at the beginning of a page)
* @param page_count Number of pages of the region
* @param pagemap Linux kernel pagemap values for this region (or NULL)
* @param reference_pages Snapshot page numbers of the previous mc_softdirty_reset() (or NULL)
* @return Snapshot page numbers of this new snapshot
*/
size_t* mc_take_page_snapshot_region(
void* data, size_t page_count,
uint64_t* pagemap, size_t* reference_pages);
/** @brief Restore a snapshot of a region
*
* If possible, the restoration will be incremental
* (the unmodified pages will not be touched).
*
* @param start_addr Address of the first page where we have to restore the page
* @param page_count Number of pages of the region
* @param pagenos Array of page indices from the global page store
* @param pagemap Linux kernel pagemap values for this region (or NULL)
* @param reference_pagenos Snapshot page numbers of the previous mc_softdirty_reset() (or NULL)
*/
void mc_restore_page_snapshot_region(
void* start_addr, size_t page_count,
size_t* pagenos,
uint64_t* pagemap, size_t* reference_pagenos);
/** @brief Free the pages of a snapshot region in the page store
*/
void mc_free_page_snapshot_region(
size_t* pagenos, size_t page_count);
/** @brief Reset the soft-dirty bits
*
* This is done after checkpointing and after checkpoint restoration
* (if per-page checkpointing is used) in order to know which pages were
* modified.
*
* See https://www.kernel.org/doc/Documentation/vm/soft-dirty.txt
* */
void mc_softdirty_reset();
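To illustrate how the pagemap and reference_pages parameters might interact, here is a minimal sketch of a per-page snapshot on top of a toy page store. Everything prefixed with toy_ is hypothetical and is not the SimGrid implementation: the store is a fixed-size array deduplicated by linear search, and reference counting is left out. The only assumption taken from the Linux kernel documentation is that bit 55 of a pagemap entry is the soft-dirty bit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096
#define TOY_STORE_CAPACITY 64
#define TOY_SOFT_DIRTY (UINT64_C(1) << 55) /* soft-dirty bit of a pagemap entry */

/* Hypothetical toy page store: a flat array, deduplicated by linear search. */
static unsigned char toy_store[TOY_STORE_CAPACITY][TOY_PAGE_SIZE];
static size_t toy_store_used = 0;

/* Store a page and return its index; an identical page is stored only once. */
static size_t toy_store_page(const void* page)
{
  for (size_t i = 0; i < toy_store_used; ++i)
    if (memcmp(toy_store[i], page, TOY_PAGE_SIZE) == 0)
      return i;
  assert(toy_store_used < TOY_STORE_CAPACITY); /* the toy store cannot grow */
  memcpy(toy_store[toy_store_used], page, TOY_PAGE_SIZE);
  return toy_store_used++;
}

/* Take a per-page snapshot of a region. The caller frees the returned array.
 * If both pagemap and reference_pages are given, soft-clean pages reuse the
 * page number of the reference snapshot without being read at all. */
static size_t* toy_take_page_snapshot_region(
  const void* data, size_t page_count,
  const uint64_t* pagemap, const size_t* reference_pages)
{
  size_t* pagenos = malloc(page_count * sizeof(size_t));
  if (pagenos == NULL)
    return NULL;
  for (size_t i = 0; i < page_count; ++i) {
    const unsigned char* page = (const unsigned char*) data + i * TOY_PAGE_SIZE;
    if (pagemap && reference_pages && !(pagemap[i] & TOY_SOFT_DIRTY))
      pagenos[i] = reference_pages[i];   /* soft-clean: nothing to compare */
    else
      pagenos[i] = toy_store_page(page); /* dirty or unknown: store or merge */
  }
  return pagenos;
}
```

In this sketch, two identical pages of a region end up with the same page number, which is where the memory saving comes from.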
SimGrid snapshot layer
The next layer is SimGrid-specific and handles part of the snapshotting logic:
- resetting the soft-dirty bits by calling mc_softdirty_reset() after taking a snapshot or restoring a snapshot;
- generating SimGrid data-structures;
- etc.
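For readers unfamiliar with the soft-dirty mechanism, the following sketch shows how a process might reset and query its own soft-dirty bits. It follows the kernel documentation linked above (writing "4" to /proc/self/clear_refs clears the bits, and bit 55 of a /proc/self/pagemap entry reports them). The toy_ function names are hypothetical and this is not the code of mc_softdirty_reset(); on a kernel without soft-dirty support the calls are expected to fail or to report nothing useful.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Clear the soft-dirty bits of the calling process.
 * Returns 0 on success and -1 on failure. */
static int toy_softdirty_reset(void)
{
  int fd = open("/proc/self/clear_refs", O_WRONLY);
  if (fd < 0)
    return -1;
  ssize_t written = write(fd, "4", 1); /* "4" means: clear soft-dirty bits */
  close(fd);
  return written == 1 ? 0 : -1;
}

/* Extract the soft-dirty bit (bit 55) from a pagemap entry. */
static int toy_pagemap_is_soft_dirty(uint64_t entry)
{
  return (int) ((entry >> 55) & 1);
}

/* Read the pagemap entry of the page containing addr.
 * Returns 0 on success and -1 on failure. */
static int toy_read_pagemap(const void* addr, uint64_t* entry)
{
  long page_size = sysconf(_SC_PAGESIZE);
  assert(page_size > 0);
  int fd = open("/proc/self/pagemap", O_RDONLY);
  if (fd < 0)
    return -1;
  /* pagemap holds one 64-bit entry per virtual page */
  off_t offset = (off_t) (((uintptr_t) addr / (uintptr_t) page_size) * sizeof(uint64_t));
  ssize_t n = pread(fd, entry, sizeof *entry, offset);
  close(fd);
  return n == (ssize_t) sizeof *entry ? 0 : -1;
}
```

A page written after toy_softdirty_reset() would then be reported as soft-dirty by toy_pagemap_is_soft_dirty(), while untouched pages stay soft-clean.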
State comparison layer
The most invasive part of this modification in the SimGrid codebase is the logic to read data from the snapshots. Without this feature, a simple offset was applied to find the base of a variable in the snapshot. Now, a software MMU algorithm must be applied: each address is split into a page index and an offset within the page, and the page index is translated to the location of the corresponding page in the page store. A variable can now be split across different non-contiguous memory pages. The whole logic of reading from snapshots had to be modified to handle this.
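A minimal sketch of such a software MMU read is given below. The structure and function names are hypothetical and the layout is much simpler than the real SimGrid data structures; the sketch assumes 4 KiB pages and a store addressed as an array of pages. The point is only to show the shift and mask translation and the chunked copy needed when a variable straddles a page boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_BITS 12
#define TOY_PAGE_SIZE ((size_t) 1 << TOY_PAGE_BITS)

/* Hypothetical per-page snapshot of a region. */
struct toy_region_snapshot {
  uintptr_t start_addr;   /* page-aligned address of the region in the process */
  size_t page_count;      /* number of pages of the region */
  const size_t* pagenos;  /* page i of the region is store[pagenos[i]] */
  unsigned char (*store)[TOY_PAGE_SIZE]; /* the page store */
};

/* Copy the size bytes that were at address addr when the snapshot was taken.
 * Consecutive pages of the region may be anywhere in the store, so the copy
 * is done page by page. */
static void toy_snapshot_read(const struct toy_region_snapshot* snap,
                              uintptr_t addr, void* buffer, size_t size)
{
  unsigned char* out = buffer;
  uintptr_t offset = addr - snap->start_addr;
  while (size > 0) {
    size_t page = offset >> TOY_PAGE_BITS;          /* page index in the region */
    size_t in_page = offset & (TOY_PAGE_SIZE - 1);  /* offset inside the page */
    size_t chunk = TOY_PAGE_SIZE - in_page;         /* bytes left in this page */
    if (chunk > size)
      chunk = size;
    assert(page < snap->page_count);
    memcpy(out, snap->store[snap->pagenos[page]] + in_page, chunk);
    out += chunk;
    offset += chunk;
    size -= chunk;
  }
}
```

The shift and mask only work because the page size is a power of 2, which is why the same constraint applies to any other chunk size.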
Results
These results were obtained with the following command:
# COMMAND: sendrecv2, mprobe or sendall
# SPARSE, SOFTDIRTY: yes or no
cd teshsuite/smpi/mpich3-test/pt2pt/
export TIME="clock:%e user:%U sys:%S swapped:%W exitval:%x max:%Mk"
setarch x86_64 -R time smpirun \
  -hostfile ../hostfile \
  -platform $(find ../../../.. -name small_platform_with_routers.xml) \
  --cfg=maxmin/precision:1e-9 \
  --cfg=network/model:SMPI \
  --cfg=network/TCP_gamma:4194304 \
  -np 4 \
  --cfg=model-check:1 \
  --cfg=smpi/send_is_detached_thres:0 \
  --cfg=smpi/coll_selector:mpich \
  --cfg=contexts/factory:ucontext \
  --cfg=model-check/max_depth:100000 \
  --cfg=model-check/reduction:none \
  --cfg=model-check/visited:100000 \
  --cfg=contexts/stack_size:4 \
  --cfg=model-check/sparse-checkpoint:$SPARSE \
  --cfg=model-check/soft-dirty:$SOFTDIRTY \
  $COMMAND
They were run on a laptop with a quad-core Intel® Core™ i7-3687U CPU @ 2.10GHz and 8GiB of RAM. Note that the memory reported is the RSS and does not include swapped-out memory.
sendrecv2
In this example, we observe a reduction of more than 80% of the memory consumption (from about 3.2 GiB to about 0.5 GiB of maximum RSS) for a slight slowdown. Using soft-dirty tracking does not have a positive impact on performance: some time is gained in user land by avoiding comparing memory pages, but at least as much time is spent in kernel space tracking the soft-clean/soft-dirty pages.
Type | clock | user | system | Max. RSS (KiB) |
---|---|---|---|---|
Simple snapshot | 9.96s | 9.16s | 0.78s | 3 332 788 |
Same-page-merging snapshot w/o soft-dirty tracking | 10.02s | 9.82s | 0.19s | 540 420 |
Same-page-merging snapshot with soft-dirty tracking | 10.70s | 8.86s | 1.80s | 540 936 |
mprobe
Similar results here:
Type | clock | user | system | Max. RSS (KiB) |
---|---|---|---|---|
Simple snapshot | 13.41s | 13.00s | 0.40s | 1 692 492 |
Same-page-merging snapshot w/o soft-dirty tracking | 14.12s | 13.89s | 0.14s | 414 916 |
Same-page-merging snapshot with soft-dirty tracking | 14.44s | 13.16s | 1.25s | 415 028 |
sendflood
In this example, without the same-page-merging snapshot, the process runs out of RAM and starts swapping (the RSS does not include the swapped-out memory). In this case, using the same-page-merging snapshot is faster because the process does not swap. Using soft-dirty tracking does not have a beneficial impact in this case either: a lot of time is lost marking the pages as soft-dirty/soft-clean.
Type | clock | user | system | Max. RSS (KiB) |
---|---|---|---|---|
Simple snapshot | 73.31s | 56.34s | 5.26s | 7 213 956 |
Same-page-merging snapshot w/o soft-dirty tracking | 59.12s | 56.87s | 2.22s | 1 570 312 |
Same-page-merging snapshot with soft-dirty tracking | 82.74s | 53.71s | 29.06s | 1 609 048 |
Conclusion
This approach achieves a large reduction of the memory consumption without a significant impact on performance. With this technique, we should be able to handle bigger application problems and save more states of the application. These tests were run on applications where a lot of pages change between snapshots. On applications where many pages are not modified, the reduction of memory consumption should be much bigger.
Soft-dirty tracking does not seem to be very efficient in our tests. It might be useful if the application is swapping, by avoiding swapping when taking a snapshot. This feature will probably be disabled by default and might be removed in the future.
It should be possible to increase the efficiency of the method by increasing page sharing:
- by setting to 0 the bytes of the heap which are not used (for example in free());
- by setting to 0 the unused part of the stacks (and using a reference to a zero page instead);
- by segregating data which do not change at the same time in different pages (in the SimGrid code);
- by using compression and/or some delta encoding when reaching the limit of available RAM.
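As an illustration of the first two ideas, the sketch below detects pages made only of zero bytes so that a snapshot could reference a single shared zero page instead of storing them. The names are hypothetical, and the idea that the page store reserves page number 0 for the zero page is an assumption made for this example, not something SimGrid is known to do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096
#define TOY_ZERO_PAGENO ((size_t) 0) /* assumption: page 0 of the store is all zeros */

/* Return 1 if the page contains only zero bytes, 0 otherwise. */
static int toy_page_is_zero(const unsigned char* page)
{
  for (size_t i = 0; i < TOY_PAGE_SIZE; ++i)
    if (page[i] != 0)
      return 0;
  return 1;
}

/* Mark the pages of a region which can be represented by the shared zero
 * page: is_zero[i] is set to 1 for them and to 0 for the others.
 * Returns the number of pages which do not need to be stored. */
static size_t toy_mark_zero_pages(const void* data, size_t page_count,
                                  unsigned char* is_zero)
{
  assert(data != NULL || page_count == 0);
  size_t zero_pages = 0;
  for (size_t i = 0; i < page_count; ++i) {
    const unsigned char* page = (const unsigned char*) data + i * TOY_PAGE_SIZE;
    is_zero[i] = (unsigned char) toy_page_is_zero(page);
    zero_pages += is_zero[i];
  }
  return zero_pages;
}
```

A region whose unused part still contains stale data yields no zero pages; once the unused part is cleared, every page of that part maps to TOY_ZERO_PAGENO and is shared between all snapshots.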
It should be possible to speed up the process:
- by scanning the heap metadata to avoid saving and restoring pages which are known to be unused;
- by avoiding saving and restoring the unused pages of the stacks (and using a reference to a zero page instead).
We used the granularity of the memory page but it is not strictly
necessary. We might use a finer granularity in order to increase the
sharing between snapshots. The granularity (the size of the chunks)
should be regular and a power of 2 (in order to be able to apply the
MMU algorithm). However, the memory overhead would be greater (index
of the