[ANN] runtime_events_tools 0.5.0

I’m happy to announce the release of runtime_events_tools.0.5.0.

OCaml 5.0 introduced a new ring-buffer based tracing system with low overheads. This made most instrumentation available in the standard runtime and added the ability to keeping tracing on by default. The OCaml runtime uses this tracing system to track GC events. OCaml 5.1 went further to include support for custom events.

Runtime events tools through olly, provides functionality to grok the data provided by the runtime tracing system.

Olly has two modes; trace , and gc-stats

olly trace

$ olly trace example.trace example.exe

Records runtime traces in fuchsia and json formats. The trace files can be visualised with ui.perfetto. or json trace with chrome://tracing.

Here’s a sample trace rendered in perfetto.

olly gc-stats

Provides information about GC time and latencies.

$ olly gc-stats "binarytrees5_multicore.exe 2 20"

Execution times:
Wall time (s):	2.61
CPU time (s):	4.63
GC time (s):	2.93
GC overhead (% of CPU time):	63.18%

GC time per domain (s):
Domain0: 	1.59
Domain1: 	1.34

GC latency profile:
#[Mean (ms):	0.76,	 Stddev (ms):	1.56]
#[Min (ms):	0.00,	 max (ms):	15.68]

Percentile 	 Latency (ms)
25.0000 	 0.00
50.0000 	 0.01
60.0000 	 0.04
70.0000 	 0.28
75.0000 	 0.66
80.0000 	 1.40
85.0000 	 2.44
90.0000 	 3.16
95.0000 	 3.62
96.0000 	 3.79
97.0000 	 4.06
98.0000 	 4.73
99.0000 	 6.29
99.9000 	 13.59
99.9900 	 15.68
99.9990 	 15.68
99.9999 	 15.68
100.0000 	 15.68

New features in this release:

  • Support for fuchsia format: Stores the trace in binary format, making the trace files 4x smaller in size, on an average.
  • gc-stats mode: In addition to latency percentiles, GC stats provide more insights, such as GC time and GC time spent per domain.
  • Custom events support: Not only can you trace GC events, but now you can also trace your own events


Trace for a recursive fibonacci function

Note that if you see non-terminating events in your traces, you might want to include this compiler patch – Add a closing event for when `EV_MAJOR_EPHE_MARK` is complete by Sudha247 · Pull Request #12583 · ocaml/ocaml · GitHub.

16 Likes

The custom events support is great!

I’ve used it to generate OpenTelemetry traces with very low overhead (record begin/end and timestamp events using runtime events, and generate/submit the actual opentelemetry traces from another process).
This also works well when writing a load generator for a benchmark tool: you can generate and send the W3C Context header without influencing the benchmark itself too much (with some care it is possible to do this without allocating at all in the fast-path of the load generator), and record precise events with low overhead (e.g. when packets got sent, and received).

I’d have 2 suggestions:

  • currently I do my own clock synchronization by defining a custom event that records a Ptime.t on startup. This is very useful when doing distributed tracing across processes and across hosts. Having some first class support for absolute timestamps might be useful.

  • make the time-source user overridable. E.g. one may want to use rdtsc or rdtscp (with an appropriate fence instruction as needed), CLOCK_MONOTONIC may in some situations be very slow and result in a syscall each time (e.g. inside a VM with the Xen clocksource). Of course I can define my own custom event to record that but I can’t disable the internal caml_time_counter call.

BTW I want to avoid (memory allocation) overhead when serializing the custom event, and there are very few readily available solutions for that, except for Marshal. In general I avoid using Marshal due to type-safety concerns, but as long as you build the tracer and the traced program from the exact same library it should work. Is there a better way?

5 Likes

Hey @edwin,

It’s great to know custom events support is working well for you!

These are useful suggestions. May I suggest making issues on the runtime events Github repo so that we don’t forget these?