[ANN] Sandmark nightly now supports latency profiling

Sandmark nightly now monitors the tail latency of sequential and parallel applications. This is made possible by new features in OCaml 5.

Click to see the Sequential latency benchmark run

Click here to see the Parallel latency benchmark run

Instrumented runtime of the past

Sandmark used to support monitoring GC latencies using the instrumented runtime that was present in OCaml 4, but this feature was disabled because of breaking changes in Sandmark when moving from OCaml 4 to OCaml 5. The instrumented runtime wrote its events to a file and had a noticeable impact on program speed. As a result, the instrumentation had to be enabled explicitly, with a compile-time flag that linked the instrumented runtime with the application in place of the default runtime. The instrumented runtime was used to generate the graphs in the ICFP paper, Retrofitting Parallelism onto OCaml (Fig 10 and Fig 12). However, given its cost, it was seen as a tool only for GC hackers doing performance debugging.

Latency profiling through olly

OCaml 5 supports Runtime Events, a new feature that enables continuous monitoring of production applications. The key differences from the earlier instrumented runtime approach are:

  1. Instead of a file, the events are now written to a shared in-memory ring. The events may be read out by an external process from this ring.
  2. Some of the frequent (and therefore expensive) probes are omitted to keep the cost low. The expensive probes are still available using the instrumented runtime.

Due to this design, every OCaml 5 program may be continuously monitored for performance, not just the ones compiled with the instrumented runtime. On top of the runtime events feature, we have built olly, an observability tool for OCaml programs. olly can extract traces of GC events that can be viewed in Perfetto, and can also produce a short report on GC behaviour, including tail latency profiles.
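
To make "tail latency profile" concrete, here is a minimal sketch that computes nearest-rank percentiles over a list of pause durations. It is only an illustration of the idea and is not how olly computes its report; the `percentile` helper and the pause values (in microseconds) are made up for the example.

```shell
#!/bin/sh
# percentile P: print the nearest-rank P-th percentile of the numbers
# read from stdin (one per line). Hypothetical helper, not part of olly.
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }                      # values arrive sorted ascending
    END {
      if (NR == 0) exit 1               # no samples, nothing to report
      r = p * NR / 100
      rank = int(r)
      if (rank < r) rank++              # round up: nearest-rank method
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# Made-up GC pause durations in microseconds (assumption, not real data).
pauses="120 85 3100 97 110 240 90 101 95 15000"

for p in 50 90 99 100; do
  printf 'p%s: %s us\n' "$p" "$(printf '%s\n' $pauses | percentile "$p")"
done
```

With these sample values the median is 101 us while the 99th percentile is 15000 us, which is the kind of gap a tail latency profile is meant to expose and an average would hide.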

The Sandmark team has now replaced the old latency profiling feature, which was built on the OCaml 4 instrumented runtime, with one that uses olly to generate the profiles. (See Sandmark PR here). Now, the OCaml compiler is continuously monitored not only for speed and memory usage, but also for latency.

Call for action

If you are interested in profiling and analysing the performance of your development branch of the OCaml compiler, please submit the branch through Sandmark Nightly Config.


olly is super appreciated! I’ve been searching for a tool like this for OCaml for a while. But which tools are used to profile time and space usage?

For time, Sandmark uses the time command. For example,

$ /usr/bin/time -v ocamlopt.opt 
	Command being timed: "ocamlopt.opt"
	User time (seconds): 0.00
	System time (seconds): 0.00
	Percent of CPU this job got: 100%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.01
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 13612
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 1458
	Voluntary context switches: 1
	Involuntary context switches: 1
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
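
As a rough sketch of how individual numbers can be pulled out of such a report, the snippet below extracts fields by their label. The `time_field` helper is hypothetical and is not Sandmark's actual collation code; the sample report is a subset of the lines shown above.

```shell
#!/bin/sh
# time_field LABEL: read a GNU `time -v` report on stdin and print the
# value of the line that starts with LABEL. Hypothetical helper name.
time_field() {
  awk -v label="$1" '
    {
      sub(/^[ \t]+/, "")                 # drop leading indentation
      prefix = label ": "
      if (index($0, prefix) == 1)        # the label must start the line
        print substr($0, length(prefix) + 1)
    }'
}

# Sample report: a subset of the output shown above.
report=$(cat <<'EOF'
    Command being timed: "ocamlopt.opt"
    User time (seconds): 0.00
    System time (seconds): 0.00
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.01
    Maximum resident set size (kbytes): 13612
EOF
)

printf '%s\n' "$report" | time_field "User time (seconds)"
printf '%s\n' "$report" | time_field "Maximum resident set size (kbytes)"
```

On a real run, GNU time's `-o FILE` option writes the report to a file, which can then be fed to the helper on stdin.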

Metrics related to memory usage are drawn from the GC statistics that programs emit through a built-in feature of the OCaml runtime. Sandmark runs, for example,

$ OCAMLRUNPARAM="v=0x400" ocamlopt.opt
allocated_words: 123142
minor_words: 121896
promoted_words: 0
major_words: 1246
minor_collections: 0
major_collections: 0
heap_words: 126976
heap_chunks: 1
top_heap_words: 126976
compactions: 0
forced_major_collections: 0

and collates and reports the metrics.
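
One way the collation step could look, as a sketch only: turn the `name: value` lines into a CSV header row and a data row. The `gc_stats_to_csv` helper is hypothetical and is not taken from Sandmark. It assumes the statistics have already been captured as text, and the sample is a subset of the lines shown above.

```shell
#!/bin/sh
# gc_stats_to_csv: read "name: value" GC statistics on stdin and print
# a CSV header line followed by one data line. Hypothetical helper.
gc_stats_to_csv() {
  awk -F': ' '
    /^[a-z_]+: [0-9]+$/ {                # keep only well-formed stat lines
      names  = names  sep $1
      values = values sep $2
      sep = ","
    }
    END { print names; print values }'
}

# Sample statistics: a subset of the output shown above.
stats=$(cat <<'EOF'
allocated_words: 123142
minor_words: 121896
promoted_words: 0
major_words: 1246
EOF
)

printf '%s\n' "$stats" | gc_stats_to_csv
```

Emitting one row per benchmark run in this shape makes it straightforward to concatenate runs and compare them across compiler branches.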

Note that Sandmark reports metrics for the whole program run and not at, say, the function level. If you are interested in finer-grained profiling, use:

  • Time → perf
    • perf record <executable> <args> and perf report is a good start.
  • Space → statmemprof / memtrace
    • Note that statmemprof is not yet supported on OCaml 5. Tarides is working hard to restore it.
  • Latency → magictrace, meio, ??