OCaml 5.0 introduced a new ring-buffer-based tracing system with low overhead. This made most instrumentation available in the standard runtime and made it possible to keep tracing on by default. The OCaml runtime uses this tracing system to track GC events. OCaml 5.1 went further and added support for custom events.
The runtime events tools, through olly, provide functionality to grok the data produced by the runtime tracing system.
Olly has two modes: trace and gc-stats.
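To make the custom events mentioned above concrete, here is a minimal sketch of registering and writing two user events with the `Runtime_events` library from OCaml 5.1, then reading them back in the same process. The event names, the tags `Request_span` and `Request_size`, and the helpers `handle_request` and `last_size_seen` are invented for illustration. A real consumer would normally run in a separate process, as olly does.

```ocaml
(* Link against the runtime_events library that ships with OCaml >= 5.1. *)

(* Tags let a consumer tell our events apart; these names are hypothetical. *)
type Runtime_events.User.tag += Request_span | Request_size

let request_span =
  Runtime_events.User.register "example.request" Request_span
    Runtime_events.Type.span

let request_size =
  Runtime_events.User.register "example.request.size" Request_size
    Runtime_events.Type.int

(* Emit a begin/end pair around the work, plus one integer payload. *)
let handle_request payload =
  Runtime_events.User.write request_span Runtime_events.Type.Begin;
  Runtime_events.User.write request_size (String.length payload);
  (* ... the real work would go here ... *)
  Runtime_events.User.write request_span Runtime_events.Type.End

(* Read this process's own ring buffer and return the last size recorded. *)
let last_size_seen () =
  let last = ref (-1) in
  let on_int _ring_id _timestamp event value =
    match Runtime_events.User.tag event with
    | Request_size -> last := value
    | _ -> ()
  in
  let callbacks =
    Runtime_events.Callbacks.create ()
    |> Runtime_events.Callbacks.add_user_event Runtime_events.Type.int on_int
  in
  (* [None] means "the current process". *)
  let cursor = Runtime_events.create_cursor None in
  handle_request "hello";
  ignore (Runtime_events.read_poll cursor callbacks None);
  Runtime_events.free_cursor cursor;
  !last

let () =
  (* Tracing must be started before a cursor can be created. *)
  Runtime_events.start ();
  Printf.printf "last request size: %d\n" (last_size_seen ())
```

The built-in `span` and `int` payload types need no custom serialization, which keeps the write path cheap.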
olly trace
$ olly trace example.trace example.exe
Records runtime traces in Fuchsia and JSON formats. The trace files can be visualised with the Perfetto UI, or the JSON trace with chrome://tracing.
I’ve used it to generate OpenTelemetry traces with very low overhead: record begin/end and timestamp events using runtime events, then generate and submit the actual OpenTelemetry traces from another process.
This also works well when writing a load generator for a benchmark tool: you can generate and send the W3C Trace Context header without influencing the benchmark itself too much (with some care it is possible to do this without allocating at all in the fast path of the load generator), and record precise events with low overhead (e.g. when packets were sent and received).
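As an illustration of the "without allocating in the fast path" idea, here is a sketch that writes a W3C `traceparent` value (version, 32 hex digits of trace id, 16 hex digits of parent id, 2 hex digits of flags) into a buffer allocated once up front. The function names and the choice to pass the trace id as two `int64` halves are assumptions made for this example. Whether the native compiler keeps the `int64` arithmetic unboxed has not been verified here.

```ocaml
let hex_digits = "0123456789abcdef"

(* Length of "00-" + 32 hex + "-" + 16 hex + "-" + 2 hex. *)
let traceparent_len = 55

(* Write the 16 lowercase hex digits of [v] into [buf] starting at [pos]. *)
let put_hex64 buf pos v =
  for i = 0 to 15 do
    let shift = (15 - i) * 4 in
    let nibble = Int64.to_int (Int64.shift_right_logical v shift) land 0xf in
    Bytes.set buf (pos + i) hex_digits.[nibble]
  done

(* Fill a caller-owned buffer of at least [traceparent_len] bytes in place. *)
let fill_traceparent buf ~trace_hi ~trace_lo ~parent_id ~sampled =
  Bytes.set buf 0 '0';
  Bytes.set buf 1 '0';
  Bytes.set buf 2 '-';
  put_hex64 buf 3 trace_hi;
  put_hex64 buf 19 trace_lo;
  Bytes.set buf 35 '-';
  put_hex64 buf 36 parent_id;
  Bytes.set buf 52 '-';
  Bytes.set buf 53 '0';
  Bytes.set buf 54 (if sampled then '1' else '0')

(* Allocated once at startup and reused for every request. *)
let header_buf = Bytes.create traceparent_len

let () =
  fill_traceparent header_buf
    ~trace_hi:0x4bf92f3577b34da6L ~trace_lo:0xa3ce929d0e0e4736L
    ~parent_id:0x00f067aa0ba902b7L ~sampled:true;
  print_endline (Bytes.to_string header_buf)
```

The `Bytes.to_string` call in the demo does allocate. A real load generator would write `header_buf` straight into its output buffer instead.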
I’d have 2 suggestions:
Currently I do my own clock synchronization by defining a custom event that records a Ptime.t on startup. This is very useful when doing distributed tracing across processes and across hosts. Having some first-class support for absolute timestamps might be useful.
Make the time source user-overridable. E.g. one may want to use rdtsc or rdtscp (with an appropriate fence instruction as needed), since CLOCK_MONOTONIC may in some situations be very slow and result in a syscall each time (e.g. inside a VM with the Xen clocksource). Of course I can define my own custom event to record that, but I can’t disable the internal caml_time_counter call.
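The clock synchronization from the first suggestion can be sketched on the consumer side as follows. The startup event gives one pair of readings, the event's own runtime timestamp and the wall-clock time carried in its payload, and every later timestamp is shifted by the same offset. The `anchor` record and its field names are hypothetical, and the sketch assumes both values are in nanoseconds.

```ocaml
(* One (monotonic, wall-clock) pair captured by the startup event.
   Both fields are assumed to be in nanoseconds. *)
type anchor = {
  mono_ns : int64;  (* runtime-events timestamp of the startup event *)
  wall_ns : int64;  (* wall-clock time carried in that event's payload *)
}

(* Convert a later runtime-events timestamp to absolute wall-clock time. *)
let to_wall_ns anchor ts_ns =
  Int64.add anchor.wall_ns (Int64.sub ts_ns anchor.mono_ns)

let () =
  let anchor = { mono_ns = 1_000L; wall_ns = 1_700_000_000_000_000_000L } in
  Printf.printf "%Ld\n" (to_wall_ns anchor 1_500L)
```

This does not correct for drift between the two clocks or for skew between hosts. It only places one process's events on an absolute timeline.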
BTW I want to avoid (memory allocation) overhead when serializing the custom event, and there are very few readily available solutions for that, except for Marshal. In general I avoid using Marshal due to type-safety concerns, but as long as you build the tracer and the traced program from the exact same library it should work. Is there a better way?
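One possible alternative to Marshal, sketched under assumptions, is a hand-written fixed-layout codec. As far as I understand the OCaml 5.1 API, `Runtime_events.Type.register` takes an `~encode` function of type `bytes -> 'a -> int` (returning the number of bytes written) and a `~decode` function of type `bytes -> int -> 'a`. The functions below have those shapes. The `packet_sent` record and its layout are invented for this example, and the block uses only the standard library so it can be checked on its own.

```ocaml
(* Hypothetical payload for a load generator: when a packet was sent. *)
type packet_sent = { sent_ns : int64; seq : int }

(* Two little-endian 64-bit fields at fixed offsets. *)
let payload_size = 16

(* Write the payload into the caller's buffer; return the bytes used. *)
let encode buf { sent_ns; seq } =
  Bytes.set_int64_le buf 0 sent_ns;
  Bytes.set_int64_le buf 8 (Int64.of_int seq);
  payload_size

(* Rebuild the record on the consumer side, rejecting unexpected sizes. *)
let decode buf len =
  if len <> payload_size then invalid_arg "packet_sent: unexpected length";
  { sent_ns = Bytes.get_int64_le buf 0;
    seq = Int64.to_int (Bytes.get_int64_le buf 8) }

(* With the runtime_events library linked, registration would presumably be:
   let packet_sent_type = Runtime_events.Type.register ~encode ~decode *)

let () =
  let buf = Bytes.create 64 in
  let n = encode buf { sent_ns = 123_456_789L; seq = 42 } in
  let v = decode buf n in
  Printf.printf "%Ld %d\n" v.sent_ns v.seq
```

Unlike Marshal, the layout is explicit, so the tracer and the traced program only need to agree on these 16 bytes rather than on the exact same type definitions. The length check is the only validation the decoder performs.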