I am attending the MirageOS retreat in Marrakesh, with in particular:
- Mindy @yomimono, who implemented a small (but not so simple) file system for Mirage, Chamelon
- Armaël @armael, who is interested in testing effectful, concurrent and/or distributed systems
Armaël and myself decided to try to use model-based testing on Chamelon.
Note: this is a long blog post, and you should feel free to skip at any point. In particular, do skip the code sections if you don’t care about the details, they are here to make things precise but not strictly necessary.
Background
Model-based testing
Model-based testing is a variant of property-based testing where instead of testing specific properties of a candidate implementation (“if you read twice in a row, you get the same result”), you implement a full reference implementation of the system in the most naive, simple, correct way possible, and you test that the reference implementations and the candidate implementation agree. I have used model-based testing on the Store library before, and I have found it to be surprisingly effective: on Store, it instantly finds all the bugs.
We decided to use the Monolith library for model-based testing, which Armaël and I both have experience with. Another library that can express model-based testing is Jan Midtgaard @jmid’s qcheck-stm library. (I haven’t used qcheck-stm
yet and I hope to try it as well eventually. My understanding is that it is a bit more low-level: less obvious to use at first, but offers more opportunities for control test generation than Monolith.)
Chamelon
Chamelon has lib code that implements the low-level data-structures and operations of the file system – a simplified version of another filesystem called littlefs (see DESIGN.md for a deep dive). It then exposes to MirageOS programs it in two different ways:
- Fs, which implements a traditional file-system interface, as a functor parametrized over the “block device” operations writing to the underlying disk
- Kv, which exposes a simpler key-value store interface (where keys are filesystem paths:
/foo/bar/baz
).
The “block device” signature offered by Mirage looks like this (we had to provide a mock implementation of it):
type t
(** The type representing the internal state of the block device *)
val read: t -> int64 -> Cstruct.t list -> (unit, error) result Lwt.t
(** [read device sector_start buffers] reads data starting at
[sector_start] from the block device into [buffers]. [...] *)
val write: t -> int64 -> Cstruct.t list ->
(unit, write_error) result Lwt.t
(** [write device sector_start buffers] writes data from [buffers]
onto the block device starting at [sector_start]. [...] *)
(* ... *)
The key-value interface exposed by Chamelon, which we tested, looks like this:
type t
(** The type representing the internal state of the key-value store. *)
val get: t -> key -> (string, error) result Lwt.t
val set: t -> key -> string -> (unit, write_error) result Lwt.t
(* ... *)
Difficulties: non-determinism, Lwt
My simple mental model of model-based testing is that the test engine will generate a “test plan”, a sequence of API functions to call. It will then execute this test plan on the two systems independently – the reference implementation and the candidate implementation (the system under test). At each point it checks that the results of both implementations agree – or it reports a bug.
This simple mental picture does not completely work with Chamelon for two reasons:
-
Chamelon is designed to be robust to crashes of the underlying block device, and we would like to be able to tell that by injecting crashes. But the place where the crash happen cannot be decided ahead of time: each call of Chamelon to the underlying block device is a possible crash point, but we don’t know how many calls a given operation will make, and we cannot guess the number in advance in our test plan.
-
Chamelon and the underlying block devices use Lwt, so their function results are wrapped in a
Lwt.t
. Monolith does not natively support testing Lwt computations, and it is not obvious how to do it: one may think of generating a Lwt computations that runs the two implementations in lockstep on a given test plan, but this sounds a bit tricky. We also ideally would like to change the way Lwt sub-computations are scheduled during testing, to be deterministic – following a fixed seed given as part of the test plan.
Non-determinism
For problem (1) (non-determinism), we eventually understood that we are not checking a symmmetric property that “the two systems agree”, but a simulation property: the reference implementation can simulate the candidate implementation. For each operation in the test plan, the engine should first run the candidate implementation, possibly inject a crash during one of its internal steps, keep track of the result and whether a crash happened, and ask the reference implementation: can you also produce this outcome? In our case, if there was no crash we run the operation on the reference implementation as before, and if there was a crash we simply discard the operation – the candidate implementation should be crash-tolerant, so that it can restart after a crash as if nothing happened.
We thought that we may have to re-implement our own model-testing engine to do this, but in fact there seems to be support in Monolith to do this: Testing nondeterministic functions. In fact, we decided to actually implement crash injection later, so we don’t know for sure how well that works.
Lwt
We want the behavior of the candidate implementation to be a deterministic function from the test plan. It can use randomness, but then the seed should be part of the test plan, for reproducibility of failures. We didn’t know how to test a component using Lwt
and remain deterministic.
One idea was to simply port Chamelon away from Lwt
: we have to implement a mock version of the Mirage block device, we can do it without Lwt
, and then we can remove the monad from the Chamelon code on top of that. But this is work, and conceptually very invasive! Ideally we want to be able to test code as it is written, not rewrite it first.
Romain @dinosaure pointed us in a much better direction: the Mirage runtime actually has its own run : 'a Lwt.t -> 'a
implementation, and it is very simple! We can do one as well for testing. The Mirage implementation is in mirage-solor5/lib/main.ml; it implements a Lwt scheduler on top of an underlying select/epoll-like abstraction provided by the solo5 runtime. The reason it is so simple is that the Mirage Lwt code does not use the various blocking primitives provided by the Lwt_unix module. The scheduler can assume that the only thing Lwt threads can block on are condition variables that it itself provided. We can make the same assumption: the Chamelon code does not use fancy Lwt primitives, it calls the Block
module (its functor argument) which returns Lwt computations and then composes the result in simple ways.
So we wrote our own Lwt scheduler, and it’s very simple!
module Test_lwt = struct
(* the state of the Lwt scheduler *)
type sched_state = {
mutable waitlist : unit Lwt_condition.t list;
rng : Random.State.t;
}
(* to be called by user programs (our mockup implementation
of Block) to yield the current computation. *)
let yield st =
let cond = Lwt_condition.create () in
st.waitlist <- cond :: st.waitlist;
Lwt_condition.wait cond
let no_waiter st =
st.waitlist = []
(* to restart computation, pick a waiter at random
and remove it from the waitlist. *)
let pick_waiter st =
let len = List.length st.waitlist in
if len = 0 then failwith "Test_lwt.run: starved";
let i = Random.State.int st.rng len in
let w = List.nth st.waitlist i in
(* very naive code! but it does not matter in practice. *)
let rest = List.filteri (fun j _ -> j <> i) st.waitlist in
st.waitlist <- rest;
w
let rec run st (t : 'a Lwt.t) : 'a =
Lwt.wakeup_paused (); (* this is for computations that called [Lwt.pause] *)
match Lwt.poll t with
| Some v ->
assert (no_waiter st);
v
| None ->
(* pick a random thread that has yielded,
and restart it *)
let w = pick_waiter st in
Lwt_condition.broadcast w ();
run st t
end
The test plan will choose a seed for the random generator, and this makes our Test_lwt.run
function deterministic.
Actually testing it
We decided to use small integers for directory and file names, to make generating test plans easier. So ([0; 1], 5)
for example is a “path”, and it gets mapped to the key "/d0/d1/f5"
in the filesystem.
Reference implementation
Our reference implementation is dead simple:
type file = int
type dir = int
type path = dir list * file
type value = string
module Reference = struct
module Key = struct
type t = path
let compare = Stdlib.compare
end
module KV = Map.Make(Key)
type init = unit
type t = value KV.t ref
let empty (() : init) = ref KV.empty
let exists kv path =
KV.mem path !kv
let get kv path =
KV.find_opt path !kv
let set kv path v =
kv := KV.add path v !kv
let remove kv path =
kv := KV.remove path !kv
end
(We only support a few operations, we could add more over time.)
Mock Block device
Remember that Chamelon is parametrized over the implementation of the block device. This is very convenient for us, we can provide our own mock block device. It is implemented as simply a mutable byte sequence, and we insert calls to Test_lwt.yield
to create the opportunity for interesting Lwt interleavings.
type block_t = Bytes.t
let sector_size = 512
module type Block = (Mirage_block.S with type t = block_t)
module Block_mock (Yield : sig val yield : unit -> unit Lwt.t end): Block = struct
type t = block_t
(* ... *)
let pos offset i =
(Int64.to_int offset + i) * sector_size
let in_bounds buf i =
0 <= i && i < Bytes.length buf
let read buf offset sectors =
let rec loop i sectors =
(* TODO: crash in addition to yield? *)
let* () = Yield.yield () in
match sectors with
| [] -> Lwt.return (Ok ())
| sector :: rem ->
if not (in_bounds buf (pos offset (i + 1)))
then Lwt.return (Error `Out_of_bounds)
else
Lwt_result.both
(let open Cstruct in
blit_from_bytes
buf (pos offset i)
sector 0 sector_size;
Lwt.return (Ok ())
)
(loop (i + 1) rem)
|> Lwt_result.map ignore
in
loop 0 sectors
let write buf offset sectors =
(* ... very similar to read *)
end
Candidate implementation
Finally, our Candidate
module is a wrapper around the Chamelon Kv implementation, which calls our Lwt scheduler after each operation to return something at a base type.
(* a pair of a key-value store implementation and one key-value store state *)
module type KV = sig
include Mirage_kv.RW
val state : t
end
module Candidate = struct
type init = { seed : int }
type t = {
kv : (module KV);
sched : Test_lwt.sched_state;
disk : block_t;
}
let disk_size = 2048000
(* run our Lwt scheduler, fail on error *)
let run ~msg ~pp sched lwt =
match Test_lwt.run sched lwt with
| Ok v -> v
| Error err ->
Format.kasprintf failwith
"TODO: Error in %s: %a"
msg
pp err
(* create an empty mock block device;
this needs to initialize the RNG state from
the provided seed and setup the Lwt scheduler. *)
let empty { seed } =
let rng = Random.State.make [|seed|] in
let sched = { Test_lwt.waitlist = []; rng } in
let module Yield = struct
let yield () = Test_lwt.yield sched
end in
let module Block = Block_mock(Yield) in
let buf = Bytes.create disk_size in
let module KV = struct
include Kv.Make(Block)(Pclock)
let state =
(* initialize the Chamelon state and store it *)
let program_block_size = 32 in
let lwt = format ~program_block_size buf in
run ~msg:"Candidate.empty(format)"
~pp:pp_write_error sched lwt;
let lwt = connect ~program_block_size buf in
run ~msg:"Candidate.empty(connect)"
~pp:pp_error sched lwt
end in
{
kv = (module KV : KV);
sched;
disk = buf;
}
let exists t path =
let module KV = (val t.kv) in
KV.exists KV.state (key_of_path path)
|> Lwt.map (function
| Ok v -> Ok v
| Error (`Not_found _) -> Ok None
| Error e -> Error e
)
|> run ~pp:KV.pp_error ~msg:"Candidate.exists" t.sched
|> Option.is_some
let remove t path =
let module KV = (val t.kv) in
KV.remove KV.state (key_of_path path)
|> run ~pp:KV.pp_write_error ~msg:"Candidate.remove" t.sched
let get t path =
let module KV = (val t.kv) in
KV.get KV.state (key_of_path path)
|> Lwt.map (function
| Ok v -> Ok (Some v)
| Error (`Not_found _) -> Ok None
| Error e -> Error e
)
|> run ~pp:KV.pp_error ~msg:"Candidate.get" t.sched
let set t path v =
let module KV = (val t.kv) in
KV.set KV.state (key_of_path path) v
|> run ~pp:KV.pp_write_error ~msg:"Candidate.set" t.sched
end
Note: the code is a bit tricky with functors, first-class modules, etc. It comes from the fact that the block module has to receive the Lwt scheduler state from somewhere, and we decided to make it a functor. @armael kepts pleading for a global mutable variable instead, and I tried to stick to less global abstractions as long as we managed to do it. Maybe we should have given up earlier and went for the simple option.
At this point we could write a trivial program that writes a key and reads it back, and it helped us debug our code:
let () =
let fs = Candidate.empty { seed = 42 } in
let k = ([0; 1], 2) in
Candidate.set fs k "toto";
print_endline (match Candidate.get fs k with
| None -> "Not found"
| Some v -> v)
Monolith
The Monolith part of the code looks like this:
open Monolith
let kv =
declare_abstract_type ()
let arbitrary_dir : dir Gen.gen =
Gen.lt 2
let arbitrary_file : file Gen.gen =
Gen.lt 4
let arbitrary_seed : int Gen.gen =
Gen.lt 4
(* pick an arbitrary key *)
let arbitrary_key : path Gen.gen = fun () ->
let dirs = Gen.list (Gen.lt 3) arbitrary_dir () in
let file = arbitrary_file () in
(dirs, file)
(* pick a key that is already in the store *)
let present_key (kv : R.t) () =
let m = !kv in
let n = R.KV.cardinal m in
let i = Gen.int n () in
let key, _value = List.nth (R.KV.bindings m) i in
key
(* mix of already-present keys and arbitrary keys *)
let key m () =
if Gen.bool() then
arbitrary_key ()
else
present_key m ()
(* some boilerplate *)
let key m =
easily_constructible
(key m)
Print.(pair (list int) int)
(* We want our values to be large strings that fit in several 'sectors' *)
let value_gen : string Gen.gen =
let fragments = List.init 12 (fun i ->
let init size = String.init size (fun pos -> char_of_int (pos mod 256)) in
[|
init (1 lsl i - 1);
init (1 lsl i);
init (1 lsl i + 1);
|]
) |> Array.concat
in
fun () ->
let nb_fragments = 1 + Gen.lt 5 () in
List.init nb_fragments (fun _ -> fragments.(Gen.lt (Array.length fragments) ()))
|> String.concat ""
let value = (* weird Monolith boilerplate *)
ifpol
(easily_constructible
value_gen
Print.string)
(deconstructible
~equal:(String.equal, constant "String.equal")
Print.string
)
(* Declare the operations that Monolith should use to generate test plans. *)
let () =
let spec = kv ^>> fun m -> key m ^> bool in
declare "exists" spec R.exists C.exists;
let spec = kv ^>> fun m -> key m ^> value ^> unit in
declare "set" spec R.set C.set;
let spec = kv ^>> fun m -> key m ^> option value in
declare "get" spec R.get C.get;
()
(* Start the engine! *)
let () =
(* the fuel is the number of operations in the test plan *)
let fuel = 10 in
main ~prologue:(fun () ->
(* at the beginning of each test plan,
generate a single empty key-value store from a fixed seed. *)
let seed = arbitrary_seed () in
dprintf "Seed: %d\n%!" seed;
let spec = kv in
declare "empty" spec
(R.empty ())
(C.empty { seed })
) fuel
A bug
We ran the program (dune exec fuzz/momo.exe
). When everything goes well, the output looks like this:
11K tests run so far ( 5K/s overall, 5K/s now) (fuel = 10).
15K tests run so far ( 4K/s overall, 4K/s now) (fuel = 10).
19K tests run so far ( 4K/s overall, 4K/s now) (fuel = 10).
23K tests run so far ( 4K/s overall, 4K/s now) (fuel = 10).
27K tests run so far ( 4K/s overall, 4K/s now) (fuel = 10).
30K tests run so far ( 4K/s overall, 3K/s now) (fuel = 10).
...
But after a dozen seconds or so, it found a bug. It printed the following report:
(* @10: Failure in an observation: candidate and reference disagree. *)
#require "monolith";;
module Sup = Monolith.Support;;
Seed: 0
(* @01 *) let x0 = empty;;
(* @02 *) let () =
set x0 ([ ], 3)
"\000\001\002\003\004\005\006\007... !\"#$%&'()*+,-./0123456789:;<=>\000\001\002\003\004\005\006\007...\030\031";;
(* @03 *) let x1 = empty;;
(* @04 *) let _ = exists x0 ([ ], 3);;
(* @05 *) let () =
set x0 ([ ], 2)
"\000\001\002\003\004\005\006\007...0123456789:;<=>\000\000\001\002...\015\016";;
(* @06 *) let _ = exists x0 ([ ], 3);;
(* @07 *) let () =
set x1 ([ ], 2)
"\000\001\002\003\004\005\006\007...\254\255\000\001\002\003...\254...\253\254\255\000";;
(* @08 *) let (Some _) = get x0 ([ ], 3);;
(* @09 *) let () =
set x0 ([ ], 2)
"\000\001\002\003\004\005\006\007...\006\007\b";;
(* @10 *) let (Some observed) = get x0 ([ ], 2);;
assert
(
String.equal observed
"\000\001\002\003\004\005\006\007...\006\007\b"
);;
(* candidate finds "\000\001\002\003\004\005\006\007...\254\255\000\001\002\003...\254...\253\254\255\000" *)
The string literals shown are very long (I cut them out for readability), and some of the operations are in fact unnecessary.
Note: this is the trace that Monolith produces directly, and it is also valid OCaml code – after removing some stuff. So we can include the trace in our code to reproduce the bug following the test plan, which is very useful.
We did a bit of manual minimization and ended up with the following:
module _ = struct
[@@@warning "-a"]
open Candidate
let empty = empty { seed = 1 }
(* @02 *) let x1 = empty;;
(* @03 *) let () = set x1 ([ 0; 0 ], 2) (String.make 67 ' ');;
(* @04 *) let () = set x1 ([ 0; 0 ], 0) (String.make 126 ' ');;
(* @05 *) let () = set x1 ([ 0; 0 ], 0) "\001";;
(* @07 *) let () = set x1 ([ 0; 0 ], 0) "\000";;
(* @09 *) let (Some observed) = get x1 ([ 0; 0 ], 0);;
Printf.printf "%S\n%!" observed;;
assert (String.equal observed "\000");;
(* candidate finds "\001" *)
end
The strings (String.make 67 ' ')
and (String.make 126 ' ')
are minimal: if you pass strings of size 66 or 125 there, the bug disappears.
Just to be sure, we also implemented a variant of the Candidate
implementation that does not use our mock block device, but a standard in-memory block device from the Mirage libraries. It also does not use our custom Lwt scheduler, but instead the standard Lwt runn function. The bug could go away in these conditions if it depends on a specific Lwt scheduling order that we found; but in fact this bug did not.
module _ = struct
[@@@warning "-a"]
open Serious_candidate
let ( let* ) = Lwt.Infix.( >>= )
let () = Lwt_main.run @@
let* empty = empty () in
(* @02 *) let x1 = empty in
(* @03 *) let* () = set x1 ([ 0; 0 ], 2) (String.make 67 ' ') in
(* @04 *) let* () = set x1 ([ 0; 0 ], 0) (String.make 126 ' ') in
(* @05 *) let* () = set x1 ([ 0; 0 ], 0) "\001" in
(* @07 *) let* () = set x1 ([ 0; 0 ], 0) "\000" in
(* @09 *) let* (Some observed) = get x1 ([ 0; 0 ], 0) in
Printf.printf "%S\n%!" observed;
assert (String.equal observed "\000");
Lwt.return ()
(* candidate finds "\001" *)
end
Our bug report on Chamelon has the full code in case you are curious. Note that our goal is not really to report and fix bugs in Chamelon – that particular filesystem is not actively used by Mirage projects and not actively maintained – but rather to find out how to do model-based testing on effectful, non-deterministic systems.
Note: thanks are due to @Rucikir for proofreading this post.