Blog post: Using model-based testing on a Mirage filesystem implementation

gasche · May 17, 2025, 8:07am

I am attending the MirageOS retreat in Marrakesh, with in particular:

Mindy @yomimono, who implemented a small (but not so simple) file system for Mirage, Chamelon
Armaël @armael, who is interested in testing effectful, concurrent and/or distributed systems

Armaël and myself decided to try to use model-based testing on Chamelon.

Note: this is a long blog post, and you should feel free to skip at any point. In particular, do skip the code sections if you don’t care about the details, they are here to make things precise but not strictly necessary.

Background

Model-based testing

Model-based testing is a variant of property-based testing where instead of testing specific properties of a candidate implementation (“if you read twice in a row, you get the same result”), you implement a full reference implementation of the system in the most naive, simple, correct way possible, and you test that the reference implementations and the candidate implementation agree. I have used model-based testing on the Store library before, and I have found it to be surprisingly effective: on Store, it instantly finds all the bugs.

We decided to use the Monolith library for model-based testing, which Armaël and I both have experience with. Another library that can express model-based testing is Jan Midtgaard @jmid’s qcheck-stm library. (I haven’t used qcheck-stm yet and I hope to try it as well eventually. My understanding is that it is a bit more low-level: less obvious to use at first, but offers more opportunities for control test generation than Monolith.)

Chamelon

Chamelon has lib code that implements the low-level data-structures and operations of the file system – a simplified version of another filesystem called littlefs (see DESIGN.md for a deep dive). It then exposes to MirageOS programs it in two different ways:

Fs, which implements a traditional file-system interface, as a functor parametrized over the “block device” operations writing to the underlying disk
Kv, which exposes a simpler key-value store interface (where keys are filesystem paths: /foo/bar/baz).

The “block device” signature offered by Mirage looks like this (we had to provide a mock implementation of it):

type t
(** The type representing the internal state of the block device *)

val read: t -> int64 -> Cstruct.t list -> (unit, error) result Lwt.t
(** [read device sector_start buffers] reads data starting at
    [sector_start] from the block device into [buffers]. [...] *)

val write: t -> int64 -> Cstruct.t list ->
  (unit, write_error) result Lwt.t
(** [write device sector_start buffers] writes data from [buffers]
    onto the block device starting at [sector_start]. [...] *)
    
(* ... *)

The key-value interface exposed by Chamelon, which we tested, looks like this:

type t
(** The type representing the internal state of the key-value store. *)

val get: t -> key -> (string, error) result Lwt.t
val set: t -> key -> string -> (unit, write_error) result Lwt.t

(* ... *)

Difficulties: non-determinism, Lwt

My simple mental model of model-based testing is that the test engine will generate a “test plan”, a sequence of API functions to call. It will then execute this test plan on the two systems independently – the reference implementation and the candidate implementation (the system under test). At each point it checks that the results of both implementations agree – or it reports a bug.

This simple mental picture does not completely work with Chamelon for two reasons:

Chamelon is designed to be robust to crashes of the underlying block device, and we would like to be able to tell that by injecting crashes. But the place where the crash happen cannot be decided ahead of time: each call of Chamelon to the underlying block device is a possible crash point, but we don’t know how many calls a given operation will make, and we cannot guess the number in advance in our test plan.
Chamelon and the underlying block devices use Lwt, so their function results are wrapped in a Lwt.t. Monolith does not natively support testing Lwt computations, and it is not obvious how to do it: one may think of generating a Lwt computations that runs the two implementations in lockstep on a given test plan, but this sounds a bit tricky. We also ideally would like to change the way Lwt sub-computations are scheduled during testing, to be deterministic – following a fixed seed given as part of the test plan.

Non-determinism

For problem (1) (non-determinism), we eventually understood that we are not checking a symmmetric property that “the two systems agree”, but a simulation property: the reference implementation can simulate the candidate implementation. For each operation in the test plan, the engine should first run the candidate implementation, possibly inject a crash during one of its internal steps, keep track of the result and whether a crash happened, and ask the reference implementation: can you also produce this outcome? In our case, if there was no crash we run the operation on the reference implementation as before, and if there was a crash we simply discard the operation – the candidate implementation should be crash-tolerant, so that it can restart after a crash as if nothing happened.

We thought that we may have to re-implement our own model-testing engine to do this, but in fact there seems to be support in Monolith to do this: Testing nondeterministic functions. In fact, we decided to actually implement crash injection later, so we don’t know for sure how well that works.

Lwt

We want the behavior of the candidate implementation to be a deterministic function from the test plan. It can use randomness, but then the seed should be part of the test plan, for reproducibility of failures. We didn’t know how to test a component using Lwt and remain deterministic.

One idea was to simply port Chamelon away from Lwt: we have to implement a mock version of the Mirage block device, we can do it without Lwt, and then we can remove the monad from the Chamelon code on top of that. But this is work, and conceptually very invasive! Ideally we want to be able to test code as it is written, not rewrite it first.

Romain @dinosaure pointed us in a much better direction: the Mirage runtime actually has its own run : 'a Lwt.t -> 'a implementation, and it is very simple! We can do one as well for testing. The Mirage implementation is in mirage-solor5/lib/main.ml; it implements a Lwt scheduler on top of an underlying select/epoll-like abstraction provided by the solo5 runtime. The reason it is so simple is that the Mirage Lwt code does not use the various blocking primitives provided by the Lwt_unix module. The scheduler can assume that the only thing Lwt threads can block on are condition variables that it itself provided. We can make the same assumption: the Chamelon code does not use fancy Lwt primitives, it calls the Block module (its functor argument) which returns Lwt computations and then composes the result in simple ways.

So we wrote our own Lwt scheduler, and it’s very simple!

module Test_lwt = struct
  (* the state of the Lwt scheduler *) 
  type sched_state = {
    mutable waitlist : unit Lwt_condition.t list;
    rng : Random.State.t;
  }
  
  (* to be called by user programs (our mockup implementation
     of Block) to yield the current computation. *)
  let yield st =
    let cond = Lwt_condition.create () in
    st.waitlist <- cond :: st.waitlist;
    Lwt_condition.wait cond

  let no_waiter st =
    st.waitlist = []

  (* to restart computation, pick a waiter at random
     and remove it from the waitlist. *)
  let pick_waiter st =
    let len = List.length st.waitlist in
    if len = 0 then failwith "Test_lwt.run: starved";
    let i = Random.State.int st.rng len in
    let w = List.nth st.waitlist i in
    (* very naive code! but it does not matter in practice. *)
    let rest = List.filteri (fun j _ -> j <> i) st.waitlist in
    st.waitlist <- rest;
    w

  let rec run st (t : 'a Lwt.t) : 'a =
    Lwt.wakeup_paused (); (* this is for computations that called [Lwt.pause] *)
    match Lwt.poll t with
    | Some v ->
      assert (no_waiter st);
      v
    | None ->
      (* pick a random thread that has yielded,
         and restart it *)
      let w = pick_waiter st in
      Lwt_condition.broadcast w ();
      run st t
end

The test plan will choose a seed for the random generator, and this makes our Test_lwt.run function deterministic.

Actually testing it

We decided to use small integers for directory and file names, to make generating test plans easier. So ([0; 1], 5) for example is a “path”, and it gets mapped to the key "/d0/d1/f5" in the filesystem.

Reference implementation

Our reference implementation is dead simple:

type file = int
type dir = int
type path = dir list * file
type value = string

module Reference = struct
  module Key = struct
    type t = path
    let compare = Stdlib.compare
  end
  module KV = Map.Make(Key)
  type init = unit
  type t = value KV.t ref

  let empty (() : init) = ref KV.empty

  let exists kv path =
    KV.mem path !kv

  let get kv path =
    KV.find_opt path !kv

  let set kv path v =
    kv := KV.add path v !kv

  let remove kv path =
    kv := KV.remove path !kv
end

(We only support a few operations, we could add more over time.)

Mock Block device

Remember that Chamelon is parametrized over the implementation of the block device. This is very convenient for us, we can provide our own mock block device. It is implemented as simply a mutable byte sequence, and we insert calls to Test_lwt.yield to create the opportunity for interesting Lwt interleavings.

type block_t = Bytes.t
let sector_size = 512

module type Block = (Mirage_block.S with type t = block_t)
module Block_mock (Yield : sig val yield : unit -> unit Lwt.t end): Block = struct
  type t = block_t

  (* ... *)

  let pos offset i =
    (Int64.to_int offset + i) * sector_size

  let in_bounds buf i =
    0 <= i && i < Bytes.length buf

  let read buf offset sectors =
    let rec loop i sectors =
      (* TODO: crash in addition to yield? *)
      let* () = Yield.yield () in
      match sectors with
      | [] -> Lwt.return (Ok ())
      | sector :: rem ->
        if not (in_bounds buf (pos offset (i + 1)))
        then Lwt.return (Error `Out_of_bounds)
        else
          Lwt_result.both
            (let open Cstruct in
             blit_from_bytes
               buf (pos offset i)
               sector 0 sector_size;
             Lwt.return (Ok ())
            )
            (loop (i + 1) rem)
          |> Lwt_result.map ignore
    in
    loop 0 sectors

  let write buf offset sectors =
    (* ... very similar to read *)
end

Candidate implementation

Finally, our Candidate module is a wrapper around the Chamelon Kv implementation, which calls our Lwt scheduler after each operation to return something at a base type.

(* a pair of a key-value store implementation and one key-value store state *)
module type KV = sig
  include Mirage_kv.RW
  val state : t
end

module Candidate = struct
  type init = { seed : int } 
  type t = {
    kv : (module KV);
    sched : Test_lwt.sched_state;
    disk : block_t;
  }

  let disk_size = 2048000

  (* run our Lwt scheduler, fail on error *)
  let run ~msg ~pp sched lwt =
    match Test_lwt.run sched lwt with
    | Ok v -> v
    | Error err ->
      Format.kasprintf failwith
        "TODO: Error in %s: %a"
        msg
        pp err        

  (* create an empty mock block device;
     this needs to initialize the RNG state from
     the provided seed and setup the Lwt scheduler. *)
  let empty { seed } =
    let rng = Random.State.make [|seed|] in
    let sched = { Test_lwt.waitlist = []; rng } in
    let module Yield = struct
      let yield () = Test_lwt.yield sched
    end in
    let module Block = Block_mock(Yield) in
    let buf = Bytes.create disk_size in
    let module KV = struct
      include Kv.Make(Block)(Pclock)
      let state =
        (* initialize the Chamelon state and store it *)
        let program_block_size = 32 in
        let lwt = format ~program_block_size buf in
        run ~msg:"Candidate.empty(format)"
          ~pp:pp_write_error sched lwt;
        let lwt = connect ~program_block_size buf in
        run ~msg:"Candidate.empty(connect)"
          ~pp:pp_error sched lwt
    end in
    {
      kv = (module KV : KV);
      sched;
      disk = buf;
    }

  let exists t path =
    let module KV = (val t.kv) in
    KV.exists KV.state (key_of_path path)
    |> Lwt.map (function
      | Ok v -> Ok v
      | Error (`Not_found _) -> Ok None
      | Error e -> Error e
    )
    |> run ~pp:KV.pp_error ~msg:"Candidate.exists" t.sched
    |> Option.is_some

  let remove t path =
    let module KV = (val t.kv) in
    KV.remove KV.state (key_of_path path)
    |> run ~pp:KV.pp_write_error ~msg:"Candidate.remove" t.sched

  let get t path =
    let module KV = (val t.kv) in
    KV.get KV.state (key_of_path path)
    |> Lwt.map (function
      | Ok v -> Ok (Some v)
      | Error (`Not_found _) -> Ok None
      | Error e -> Error e
    )
    |> run ~pp:KV.pp_error ~msg:"Candidate.get" t.sched

  let set t path v =
    let module KV = (val t.kv) in
    KV.set KV.state (key_of_path path) v
    |> run ~pp:KV.pp_write_error ~msg:"Candidate.set" t.sched
end

Note: the code is a bit tricky with functors, first-class modules, etc. It comes from the fact that the block module has to receive the Lwt scheduler state from somewhere, and we decided to make it a functor. @armael kepts pleading for a global mutable variable instead, and I tried to stick to less global abstractions as long as we managed to do it. Maybe we should have given up earlier and went for the simple option.

At this point we could write a trivial program that writes a key and reads it back, and it helped us debug our code:

let () =
  let fs = Candidate.empty { seed = 42 } in
  let k = ([0; 1], 2) in
  Candidate.set fs k "toto";
  print_endline (match Candidate.get fs k with
    | None -> "Not found"
    | Some v -> v)

Monolith

The Monolith part of the code looks like this:

open Monolith

let kv =
  declare_abstract_type ()

let arbitrary_dir : dir Gen.gen =
  Gen.lt 2

let arbitrary_file : file Gen.gen =
  Gen.lt 4

let arbitrary_seed : int Gen.gen =
  Gen.lt 4

(* pick an arbitrary key *)
let arbitrary_key : path Gen.gen = fun () ->
  let dirs = Gen.list (Gen.lt 3) arbitrary_dir () in
  let file = arbitrary_file () in
  (dirs, file)

(* pick a key that is already in the store *)
let present_key (kv : R.t) () =
  let m = !kv in
  let n = R.KV.cardinal m in
  let i = Gen.int n () in
  let key, _value = List.nth (R.KV.bindings m) i in
  key

(* mix of already-present keys and arbitrary keys *)
let key m () =
  if Gen.bool() then
    arbitrary_key ()
  else
    present_key m ()

(* some boilerplate *)
let key m =
  easily_constructible
    (key m)
    Print.(pair (list int) int)

(* We want our values to be large strings that fit in several 'sectors' *)
let value_gen : string Gen.gen =
  let fragments = List.init 12 (fun i ->
    let init size = String.init size (fun pos -> char_of_int (pos mod 256)) in
    [|
      init (1 lsl i - 1);
      init (1 lsl i);
      init (1 lsl i + 1);
    |]
  ) |> Array.concat
  in
  fun () ->
    let nb_fragments = 1 + Gen.lt 5 () in
    List.init nb_fragments (fun _ -> fragments.(Gen.lt (Array.length fragments) ()))
    |> String.concat ""

let value = (* weird Monolith boilerplate *)
  ifpol
    (easily_constructible
       value_gen
       Print.string)
    (deconstructible
      ~equal:(String.equal, constant "String.equal")
      Print.string
    )


(* Declare the operations that Monolith should use to generate test plans. *)

let () =
  let spec = kv ^>> fun m -> key m ^> bool in
  declare "exists" spec R.exists C.exists;

  let spec = kv ^>> fun m -> key m ^> value ^> unit in
  declare "set" spec R.set C.set;

  let spec = kv ^>> fun m -> key m ^> option value in
  declare "get" spec R.get C.get;

  ()

(* Start the engine! *)
let () =
  (* the fuel is the number of operations in the test plan *)
  let fuel = 10 in
  main ~prologue:(fun () ->
    (* at the beginning of each test plan,
       generate a single empty key-value store from a fixed seed. *)
    let seed = arbitrary_seed () in
    dprintf "Seed: %d\n%!" seed;
    let spec = kv in
    declare "empty" spec
      (R.empty ())
      (C.empty { seed })
  ) fuel

A bug

We ran the program (dune exec fuzz/momo.exe). When everything goes well, the output looks like this:

 11K tests run so far (  5K/s overall,   5K/s now) (fuel = 10).
 15K tests run so far (  4K/s overall,   4K/s now) (fuel = 10).
 19K tests run so far (  4K/s overall,   4K/s now) (fuel = 10).
 23K tests run so far (  4K/s overall,   4K/s now) (fuel = 10).
 27K tests run so far (  4K/s overall,   4K/s now) (fuel = 10).
 30K tests run so far (  4K/s overall,   3K/s now) (fuel = 10).
 ...

But after a dozen seconds or so, it found a bug. It printed the following report:

(* @10: Failure in an observation: candidate and reference disagree. *)
          #require "monolith";;
          module Sup = Monolith.Support;;
Seed: 0
(* @01 *) let x0 = empty;;
(* @02 *) let () =
            set x0 ([  ], 3)
            "\000\001\002\003\004\005\006\007... !\"#$%&'()*+,-./0123456789:;<=>\000\001\002\003\004\005\006\007...\030\031";;
(* @03 *) let x1 = empty;;
(* @04 *) let _ = exists x0 ([  ], 3);;
(* @05 *) let () =
            set x0 ([  ], 2)
            "\000\001\002\003\004\005\006\007...0123456789:;<=>\000\000\001\002...\015\016";;
(* @06 *) let _ = exists x0 ([  ], 3);;
(* @07 *) let () =
            set x1 ([  ], 2)
            "\000\001\002\003\004\005\006\007...\254\255\000\001\002\003...\254...\253\254\255\000";;
(* @08 *) let (Some _) = get x0 ([  ], 3);;
(* @09 *) let () =
            set x0 ([  ], 2)
            "\000\001\002\003\004\005\006\007...\006\007\b";;
(* @10 *) let (Some observed) = get x0 ([  ], 2);;
          assert
          (
            String.equal observed
            "\000\001\002\003\004\005\006\007...\006\007\b"
          );;
          (* candidate finds "\000\001\002\003\004\005\006\007...\254\255\000\001\002\003...\254...\253\254\255\000" *)

The string literals shown are very long (I cut them out for readability), and some of the operations are in fact unnecessary.

Note: this is the trace that Monolith produces directly, and it is also valid OCaml code – after removing some stuff. So we can include the trace in our code to reproduce the bug following the test plan, which is very useful.

We did a bit of manual minimization and ended up with the following:

module _ = struct
[@@@warning "-a"]
open Candidate
let empty = empty { seed = 1 }
(* @02 *) let x1 = empty;;
(* @03 *) let () = set x1 ([ 0; 0 ], 2) (String.make 67 ' ');;
(* @04 *) let () = set x1 ([ 0; 0 ], 0) (String.make 126 ' ');;
(* @05 *) let () = set x1 ([ 0; 0 ], 0) "\001";;
(* @07 *) let () = set x1 ([ 0; 0 ], 0) "\000";;
(* @09 *) let (Some observed) = get x1 ([ 0; 0 ], 0);;
          Printf.printf "%S\n%!" observed;;
          assert (String.equal observed "\000");;
          (* candidate finds "\001" *)
end

The strings (String.make 67 ' ') and (String.make 126 ' ') are minimal: if you pass strings of size 66 or 125 there, the bug disappears.

Just to be sure, we also implemented a variant of the Candidate implementation that does not use our mock block device, but a standard in-memory block device from the Mirage libraries. It also does not use our custom Lwt scheduler, but instead the standard Lwt runn function. The bug could go away in these conditions if it depends on a specific Lwt scheduling order that we found; but in fact this bug did not.

module _ = struct
[@@@warning "-a"]
open Serious_candidate
let ( let* ) = Lwt.Infix.( >>= )
let () = Lwt_main.run @@
  let* empty = empty () in
(* @02 *) let x1 = empty in
(* @03 *) let* () = set x1 ([ 0; 0 ], 2) (String.make 67 ' ') in
(* @04 *) let* () = set x1 ([ 0; 0 ], 0) (String.make 126 ' ') in
(* @05 *) let* () = set x1 ([ 0; 0 ], 0) "\001" in
(* @07 *) let* () = set x1 ([ 0; 0 ], 0) "\000" in
(* @09 *) let* (Some observed) = get x1 ([ 0; 0 ], 0) in
          Printf.printf "%S\n%!" observed;
          assert (String.equal observed "\000");
          Lwt.return ()
          (* candidate finds "\001" *)
end

Our bug report on Chamelon has the full code in case you are curious. Note that our goal is not really to report and fix bugs in Chamelon – that particular filesystem is not actively used by Mirage projects and not actively maintained – but rather to find out how to do model-based testing on effectful, non-deterministic systems.

Note: thanks are due to @Rucikir for proofreading this post.

yomimono · May 17, 2025, 11:11am

Earlier in the retreat, @gasche gave a short presentation and demo of monolith. I was very impressed by the power of interrogating the code via the tests - “if I make this change, what breaks?” I’ve been able to do this with other testing frameworks to an extent, but it’s always been a way clunkier process and often didn’t have enough exploratory power, so I would only find “nearby” problems.

Chamelon is unmaintained largely because it’s a complicated piece of software that’s hard to revisit after some time away. In order to address issues of any significance, I need to take a lot of time to put the littlefs spec back into my working memory, along with the details of the implementation for Chamelon.

I was just about to start thinking “maybe tools like monolith can help me understand complicated old code and be more confident in changing it, but that seems like a lot of work and I’m not sure how I would do it”, and then @gasche and @armael did it for me! I’m very grateful for their work, both in the particular case of finding this bug in Chamelon and for the more general work of showing how this kind of approach can work for the kind of software I write (and sometimes abandon, but aspire to revisit).

reynir · May 17, 2025, 12:33pm

Thank you for this nice article! What struck me is you don’t try to model littlefs, and instead use a much simpler mirage-kv model! Which I guess can also be used to test other mirage-kv implementations (file systems). I think that is very interesting and a nice observation.

Topic		Replies	Views
[ANN] qcheck-lin and qcheck-stm 0.1.0 Community multicore , testing , qcheck	4	994	December 12, 2022
[ANN] qcheck-lin and qcheck-stm 0.2 Community multicore , testing , qcheck	5	926	April 10, 2025
[ANN] QCheck 0.6: property based testing Community announce , testing	0	835	May 30, 2017
[ANN] Testo 0.1.0 - a new testing framework for OCaml Ecosystem announce , testing	3	417	December 1, 2024
Introducing Model_quickcheck. Quickcheck for stateful, imperative code Learning	0	1072	March 12, 2020