[ANN] First announcement of Bechamel

Bechamel, an agnostic micro-benchmarking tool

I’m glad to announce the release of Bechamel 0.2.0. Bechamel is a framework for micro-benchmarking. As a MirageOS project, the core library does not depend on Unix syscalls (hence the term “agnostic”). It provides:

  • an extensible way to record metrics
  • different views of results

Indeed, we know that it can be difficult to make a béchamel sauce. Pouring the milk while adding the flour and mixing everything together requires at least three hands. Observing the operation is difficult and can therefore, given our abilities, interfere with the expected result.

This is why Bechamel exists. It lets you make this mixture and ensures that the results are more or less correct. It runs the desired function in a restricted, controlled context in order to remove interference. A touch of machine learning lets us determine the true value of metrics such as time, words allocated in the minor heap, or more exotic metrics such as those available via the Linux kernel.

Finally, the presentation of the results matters as much as the presentation of your lasagne. Thus, Bechamel offers several ways to present the results, depending on what you want. We can offer you:

  • An interface in your terminal
  • A Web 3.0 page which is a full report of your experiment

You can see an example of this report here.

Extensibility of metrics

Depending on your runtime context, you can get a few metrics from the kernel. For instance, Linux comes with the perf tool, which is able to record metrics such as:

  • the cpu-clock: this reports the CPU clock, a high-resolution per-CPU timer.
  • the page-faults: this reports the number of page faults
  • etc.

They are available via the bechamel-perf package, which can be linked with your benchmark. You can see a simple example in the distribution: sqrt.ml
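As a rough sketch of how this plugs in: the perf counters are exposed as additional instances that you add to the list of instances used by your benchmark (see the example in the next section). The value names below are an assumption on my part; the authoritative ones are in sqrt.ml:

(* Assumed names -- check sqrt.ml for the real ones. *)
let instances =
  Instance.[ minor_allocated; major_allocated; monotonic_clock ]
  @ Bechamel_perf.Instance.[ cpu_clock; page_faults ]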

The HTML output

The HTML + JavaScript report is pretty simple to generate. Let’s say that you have:

let benchmark () : (Bechamel_js.ols_result * Bechamel_js.raws) =
  (* Ordinary least squares, with the number of runs as predictor; r_square
     requests the coefficient of determination. *)
  let ols = Analyze.ols ~bootstrap:0 ~r_square:true ~predictors:Measure.[| run |] in
  (* The metrics recorded for each sample. *)
  let instances = Instance.[ minor_allocated; major_allocated; monotonic_clock ] in
  (* At most 2000 samples, a 0.5s time quota, GC stabilization, and data for
     the KDE shown in the report. *)
  let cfg =
    Benchmark.cfg ~limit:2000 ~stabilize:true ~quota:(Time.second 0.5)
      ~kde:(Some 1000) () in
  let raw_results =
    Benchmark.all cfg instances
      (Test.make_grouped ~name:"factorial" ~fmt:"%s %s" [ test0; test1 ]) in
  let results = List.map (fun instance -> Analyze.all ols instance raw_results) instances in
  let results = Analyze.merge ols instances results in
  (results, raw_results)
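The snippet above leaves out the opens and the definition of the two tests, test0 and test1. As a rough sketch of how they could look - the factorial implementations, names and arguments here are illustrative, not those of the fact.ml example from the distribution - Test.make_indexed produces one test per integer argument, which is what the compare function further down relies on when it parses the trailing integer:

let rec fact_rec n = if n <= 1 then 1 else n * fact_rec (n - 1)

let fact_fold n = List.fold_left ( * ) 1 (List.init n (fun i -> i + 1))

(* One test per argument; the generated names look like "fact_rec 10". *)
let test0 =
  Test.make_indexed ~name:"fact_rec" ~fmt:"%s %d" ~args:[ 10; 50; 100 ]
    (fun n -> Staged.stage (fun () -> fact_rec n))

let test1 =
  Test.make_indexed ~name:"fact_fold" ~fmt:"%s %d" ~args:[ 10; 50; 100 ]
    (fun n -> Staged.stage (fun () -> fact_fold n))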

You just need to “emit” the results in JSON format:

let compare k0 k1 =
  (* Order the entries by the integer at the end of the test name
     (e.g. "factorial <test> <n>"). *)
  let a = ref 0 and b = ref 0 in
  Scanf.sscanf k0 "%s %s %d" (fun _ _ a' -> a := a');
  Scanf.sscanf k1 "%s %s %d" (fun _ _ b' -> b := b');
  !a - !b

let nothing _ = Ok ()

let () =
  let results = benchmark () in
  let results =
    let open Bechamel_js in
    (* Emit the results as JSON on stdout: the number of runs on the x-axis,
       the monotonic clock on the y-axis. *)
    emit ~dst:(Channel stdout) nothing ~compare ~x_label:Measure.run
      ~y_label:(Measure.label Instance.monotonic_clock)
      results in
  match results with Ok () -> () | Error (`Msg err) -> invalid_arg err

And a simple dune setup is enough to generate the HTML + JavaScript page via bechamel-html:

(executable
 (name fact)
 (modules fact)
 (public_name bechamel-js.examples.fact)
 (package bechamel-js)
 (libraries bechamel bechamel-js))

(rule
 (targets fact.json)
 (action
  (with-stdout-to
   %{targets}
   (run ./fact.exe))))

(rule
 (targets fact.html)
 (mode promote)
 (action
  (system "%{bin:bechamel-html} < %{dep:fact.json} > %{targets}")))

You can see a full example here.

Kernel Density Estimation

The report can show the histogram and/or the KDE (kernel density estimation) of the distribution of times, to check whether it is a normal distribution and to make sure that the set of arguments given to our function covers all possibilities.
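Presumably this is what the ~kde:(Some 1000) argument of Benchmark.cfg in the example above controls - how much data is kept for this part of the report - but that reading is based only on the parameter name.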

Resources

Micro-benchmarks can be useful to check assumptions about syscalls, but syscalls can require some resources. In that situation, Bechamel allows the user to define an allocation function which is executed before the benchmark.

This resource will be used by your test and then released at the end of the benchmark. This makes it possible, for instance, to record metrics for io_uring.
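As a sketch of what this can look like, here is a hypothetical test that opens a file descriptor before the benchmark and closes it afterwards. The constructor name and signature used below (Test.make_with_resource with a kind such as Test.uniq, ~allocate and ~free) are an assumption on my part; check the bechamel documentation for the real API:

(* Assumed API -- the real signature may differ. *)
let test_read =
  Test.make_with_resource ~name:"read /dev/zero" Test.uniq
    ~allocate:(fun () -> Unix.openfile "/dev/zero" Unix.[ O_RDONLY ] 0)
    ~free:Unix.close
    (Staged.stage @@ fun fd ->
       let buf = Bytes.create 4096 in
       ignore (Unix.read fd buf 0 (Bytes.length buf)))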

Micro-benchmark, disclaimer

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil – Donald Knuth

Micro-benchmarks should not be an argument to micro-optimize parts of your code. Indeed, Bechamel mostly wants to report some observable values while avoiding the Schrödinger’s cat case (where the tool affects the results through the observation itself).

Bechamel wants to help the developer check some assumptions, but it should not be an argument to claim that your implementation is faster than another one - at best, it helps you along the way.


Does anyone else want to see a @dinosaure cooking channel on watch.OCaml.org next?


a self-operated (peertube) video store, wonderful!


Congrats on the impressive work. Being in a benchmark sauce myself, I will probably try it out shortly.

I’m not very familiar with microbenchmarks. Can I benchmark the runtime of a function that takes several hundred milliseconds with it? And will Bechamel automatically detect and remove GC pauses?

Also, is there a duration that is too long to be analyzed by micro-benchmarking? Is 100 ms too long?

Can I benchmark the runtime of a function that takes several hundred milliseconds with it?

It depends :slight_smile:. bechamel wants to provide metrics about really small functions which take very little time. It’s hard to figure out the real time spent by such functions, and this is why bechamel runs your function multiple times and analyses the results to finally return a more accurate result, such as: this syscall takes 0.5ms. For instance, fact.ml shows results in nanoseconds. We can see in this report that the longest call is the recursive factorial with the argument 100, which takes 13ms.

It’s probably fine to test a function which takes several hundred milliseconds - however, you are probably reaching the border between micro-benchmarking and macro-benchmarking :slight_smile:. The best thing is to look at the r-square, which is the coefficient of determination from the OLS analysis. If its value is above 0.95, you can be confident about the reported result.

And will Bechamel automatically detect and remove GC pauses?

bechamel prepares a runtime context which should not hit a GC pause. But that is not strictly guaranteed; it depends on your function. stabilize prepares the GC and tries to clean up everything unneeded before the benchmark. But if your function allocates a lot, the only way to execute it multiple times without a GC pause is to enlarge your minor heap.
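For reference, one way to enlarge the minor heap from the benchmark program itself is the standard Gc module (the size below, expressed in words, is only an example); the same setting can also be passed through the OCAMLRUNPARAM environment variable:

(* Ask for a larger minor heap so that an allocating function can run many
   times before a minor collection is triggered. *)
let () = Gc.set { (Gc.get ()) with Gc.minor_heap_size = 16 * 1024 * 1024 }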

Then, if your test requires a “resource” which must be allocated before and released after the benchmark, bechamel provides such an API. It will prepare your resource, stabilize the GC if you ask, run the benchmark, and finally release the resource at the end.

Also, is there a duration that is too long to be analyzed by micro-benchmarking? Is 100 ms too long?

That depends on how long you want to wait :slight_smile:! Again, you can restrict the whole benchmark and say that you don’t want to wait more than 1 minute, for instance. You can also limit how many samples are generated. Most of these parameters are available via the cfg type.
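For example, reusing only the parameters already shown in the benchmark function above, a configuration that stops after at most one minute of sampling or 2000 samples could look like this:

let cfg =
  (* limit bounds the number of samples, quota bounds the time spent sampling. *)
  Benchmark.cfg ~limit:2000 ~stabilize:true ~quota:(Time.second 60.0) ()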

More generally, and this is what I said in the conclusion, the goal of bechamel is to give some accurate metrics. Don’t be afraid to conclude that bechamel does not really help you. If we really think about micro-benchmarks, metrics and execution contexts, we find many arguments to say that micro-benchmarking is not true/fair/reliable :slight_smile:


Thank you for the clarifications. It’s very nice to use. The ability to use perf metrics is very handy.