Page store for the Simgrid model checker



The first (lower) layer of the per-page snapshot mechanism is a page store: its responsibility is to store immutable, shareable, reference-counted memory pages independently of the snapshotting logic. Snapshot management and representation, as well as soft-dirty tracking, will be handled in a higher layer.

Data structure

class s_mc_pages_store {

  typedef uint64_t hash_type;
  typedef boost::unordered_set<size_t> page_set_type;
  typedef boost::unordered_map<hash_type, page_set_type> pages_map_type;

  void* memory_;
  size_t capacity_;
  size_t top_index_;
  std::vector<uint64_t> page_counts_;
  std::vector<size_t> free_pages_;
  pages_map_type hash_index_;

  // [... Methods]

};

In this initial version, the page store is made of:

  • A pointer (memory_) to a (currently anonymous) mmap()ed memory region holding the memory pages (the address of the first page).

  • The number of pages mapped in virtual memory (capacity_). Once all those pages are used, we need to expand the page store with mremap().

  • A reference count for each memory page (page_counts_). Each time a snapshot references a page, the counter is incremented. When a snapshot is freed, the reference count is decremented. When the reference count of a page reaches 0, it is added to a list of available pages (free_pages_).

  • A list of free pages (free_pages_) which can be reused. This avoids having to scan the reference count list to find a free page.

  • When we are expanding the memory map, we do not want to add thousands of pages to the free_pages_ list and remove them just afterwards. The top_index_ field is an index after which all pages are free and are not in the free_pages_ list.

  • When we are adding a page, we need to check if a page with the same content is already in the page store in order to reuse it. For this reason, we maintain an index (hash_index_) mapping the hash of a page to the list of page indices with this hash. We use a fast (non-cryptographic) hash so there may be conflicts: we must be able to store multiple indices for the same hash.

We want to keep this memory region (*memory_) aligned on the memory pages (so that we might be able to create non-linear memory mappings on those pages in the future) and be able to expand it without copying the data (there will be a lot of pages here): we will be able to efficiently expand the memory mapping using mremap(), moving it to another virtual address if necessary.

void* new_memory = mremap(this->memory_, this->capacity_ << xbt_pagebits,
                          newsize << xbt_pagebits, MREMAP_MAYMOVE);
if (new_memory == MAP_FAILED) {
  xbt_die("Could not mremap snapshot pages.");
}
this->capacity_ = newsize;
this->memory_ = new_memory;
this->page_counts_.resize(newsize, 0);
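The initial mapping itself can be created with an anonymous mmap() in the constructor. The sketch below is a self-contained illustration, not SimGrid's actual code: the pages_store struct and the hard-coded pagebits constant are stand-ins for s_mc_pages_store and xbt_pagebits.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for xbt_pagebits: 4 KiB pages.
static const size_t pagebits = 12;

struct pages_store {
  void* memory_;                      // first page of the mapping
  size_t capacity_;                   // number of pages mapped
  std::vector<uint64_t> page_counts_; // per-page reference counts

  explicit pages_store(size_t size)
      : capacity_(size), page_counts_(size, 0) {
    // Anonymous, page-aligned mapping holding `size` pages; mremap()
    // can later grow it without copying the data.
    memory_ = mmap(NULL, size << pagebits, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (memory_ == MAP_FAILED) {
      perror("mmap");
      abort();
    }
  }

  ~pages_store() {
    munmap(memory_, capacity_ << pagebits);
  }
};
```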

Because this memory mapping may move around in the virtual address space, we only store the index of each page in the snapshots; the page is always looked up by going through memory_:

const void* s_mc_pages_store::get_page(size_t pageno) const {
  // The parentheses matter: << binds less tightly than +.
  return (char*) this->memory_ + (pageno << pagebits);
}

API

class s_mc_pages_store {
  // [...]

public: // Ctor and dtor
  explicit s_mc_pages_store(size_t size);
  ~s_mc_pages_store();

public: // API

  void unref_page(size_t pageno);
  void ref_page(size_t pageno);
  size_t store_page(void* page);
  const void* get_page(size_t pageno) const;

private:
  size_t alloc_page();

};
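To show the API in action, here is a condensed, self-contained toy version of the whole store: it keeps the same four public methods but uses plain heap allocation and std::unordered_* instead of mmap() and Boost, and an inline djb2-style hash instead of mc_hash_page. It is purely illustrative, not the real implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <unordered_map>
#include <unordered_set>
#include <vector>

static const size_t page_size = 4096;

class toy_pages_store {
  std::vector<char*> pages_;          // page storage (heap instead of mmap)
  std::vector<uint64_t> page_counts_; // per-page reference counts
  std::vector<size_t> free_pages_;    // recyclable page indices
  std::unordered_map<uint64_t, std::unordered_set<size_t>> hash_index_;

  static uint64_t hash_page(const void* data) {
    // djb2 over the page bytes (illustrative stand-in for mc_hash_page).
    uint64_t hash = 5381;
    const unsigned char* p = (const unsigned char*) data;
    for (size_t i = 0; i < page_size; ++i)
      hash = hash * 33 + p[i];
    return hash;
  }

  size_t alloc_page() {
    if (!free_pages_.empty()) {       // reuse a recycled page first
      size_t pageno = free_pages_.back();
      free_pages_.pop_back();
      return pageno;
    }
    pages_.push_back((char*) malloc(page_size));
    page_counts_.push_back(0);
    return pages_.size() - 1;
  }

public:
  ~toy_pages_store() {
    for (char* p : pages_) free(p);
  }
  const void* get_page(size_t pageno) const { return pages_[pageno]; }
  void ref_page(size_t pageno) { ++page_counts_[pageno]; }
  void unref_page(size_t pageno) {
    if (--page_counts_[pageno] == 0) {
      free_pages_.push_back(pageno);
      hash_index_[hash_page(pages_[pageno])].erase(pageno);
    }
  }
  size_t store_page(const void* page) {
    uint64_t hash = hash_page(page);
    auto& page_set = hash_index_[hash];
    for (size_t pageno : page_set) {  // deduplicate identical pages
      if (memcmp(page, get_page(pageno), page_size) == 0) {
        ++page_counts_[pageno];
        return pageno;
      }
    }
    size_t pageno = alloc_page();
    memcpy(pages_[pageno], page, page_size);
    page_set.insert(pageno);
    ++page_counts_[pageno];
    return pageno;
  }
};
```

Storing two pages with identical content yields the same index (and a reference count of 2), which is the whole point of the deduplicating store.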

get_page()

get_page() returns a pointer to the memory of a page from its index.

const void* s_mc_pages_store::get_page(size_t pageno) const {
  // The parentheses matter: << binds less tightly than +.
  return (char*) this->memory_ + (pageno << pagebits);
}

store_page()

store_page() stores a page in the page store and returns the index of the stored page.

size_t s_mc_pages_store::store_page(void* page)
{

First, we check if a page with the same content is already in the page store:

  1. compute the hash of the page;
  2. find pages with the same hash using hash_index_;
  3. memcmp() those pages with the one we are inserting to find a page with the same content.

  uint64_t hash = mc_hash_page(page);
  page_set_type& page_set = this->hash_index_[hash];
  BOOST_FOREACH (size_t pageno, page_set) {
    const void* snapshot_page = this->get_page(pageno);
    if (memcmp(page, snapshot_page, xbt_pagesize) == 0) {

If a page with the same content is already in the page store it is reused and its reference count is incremented.

      page_counts_[pageno]++;
      return pageno;
    }
  }

Otherwise, a new page is allocated in the page store and the content of the page is memcpy()-ed to this new page.

  size_t pageno = this->alloc_page();
  void* snapshot_page = (void*) this->get_page(pageno);
  memcpy(snapshot_page, page, xbt_pagesize);
  page_set.insert(pageno);
  page_counts_[pageno]++;
  return pageno;
}
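The private alloc_page() helper is not shown above, but from the free_pages_/top_index_ description its policy can be sketched in isolation. The struct below is a hypothetical stand-in for the relevant fields, and the capacity doubling is a placeholder for what would really be an mremap() call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal state for the allocation policy: recycled pages first, then
// fresh pages past top_index_ (which are free by invariant and never
// appear in free_pages_).
struct alloc_state {
  size_t capacity_ = 0;   // pages currently mapped
  size_t top_index_ = 0;  // pages below this may be in use
  std::vector<size_t> free_pages_;

  size_t alloc_page() {
    if (!free_pages_.empty()) {        // reuse a recycled page
      size_t pageno = free_pages_.back();
      free_pages_.pop_back();
      return pageno;
    }
    if (top_index_ == capacity_)       // would mremap() here
      capacity_ = capacity_ ? 2 * capacity_ : 1;
    return top_index_++;               // hand out a fresh page
  }
};
```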

ref_page()

This method is used to increment the reference count of a page when we know that its content is the same as that of a page already in the page store.

This will be the case if a page is soft-clean: we know that it has not changed since the previous snapshot/restoration, so we can avoid hashing the page and comparing it byte-per-byte with candidates.

void s_mc_pages_store::ref_page(size_t pageno) {
  ++this->page_counts_[pageno];
}

unref_page()

Decrement the reference count of this page. Used when a snapshot is destroyed.

If the reference count reaches zero, the page is recycled: it is added to the free_pages_ list and removed from the hash_index_. In the current implementation, we need to hash the page in order to find it in the index.

void s_mc_pages_store::unref_page(size_t pageno) {
  if ((--this->page_counts_[pageno]) == 0) {
    this->free_pages_.push_back(pageno);
    void* page = ((char*)this->memory_ + (pageno << pagebits));
    uint64_t hash = mc_hash_page(page);
    this->hash_index_[hash].erase(pageno);
  }
}

Tweaks and improvements

Which hashing algorithm?

Currently the code uses djb2, but other hashes such as MurmurHash or CityHash would probably be better.
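For reference, djb2 is only a few lines; the function below is a generic sketch of the algorithm, not SimGrid's mc_hash_page (whose exact form I have not reproduced here).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// djb2: start from 5381 and fold each byte in with hash * 33 + byte.
// Fast and simple, but collisions are expected, hence the
// hash -> set-of-pages index and the memcmp() confirmation step.
uint64_t djb2_hash(const void* data, size_t size) {
  const unsigned char* bytes = (const unsigned char*) data;
  uint64_t hash = 5381;
  for (size_t i = 0; i < size; ++i)
    hash = hash * 33 + bytes[i];
  return hash;
}
```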

Shared memory page store

It is very easy to use a file (shared memory, a filesystem file, a block device) instead of anonymous memory for the page store: could we use this to parallelise the model checker across different processes or even machines?
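Switching to shared memory would mostly mean changing how the region is mapped. A hedged sketch using POSIX shared memory (the object name "/mc_page_store" is hypothetical, and error handling is minimal):

```cpp
#include <cstddef>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Back the store with a POSIX shared memory object instead of an
// anonymous mapping, so other processes could map the same pages.
void* map_shared_store(const char* name, size_t bytes, int* fd_out) {
  int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
  if (fd == -1)
    return MAP_FAILED;
  if (ftruncate(fd, bytes) == -1) {
    close(fd);
    return MAP_FAILED;
  }
  // MAP_SHARED instead of MAP_PRIVATE | MAP_ANONYMOUS: writes become
  // visible to any other process mapping the same object.
  void* memory = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
  *fd_out = fd;
  return memory;
}
```

The rest of the page store would use the returned pointer exactly as it uses the anonymous mapping today (mremap() growth included, on Linux).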
