[ANN] trace 0.11

Dear all, I’m delighted to announce the release of trace 0.11. This is a major release and hopefully the last one before 1.0.

trace is a lightweight foundation for instrumentation, a bit like Rust’s tracing. It provides a protocol between, on the one hand, libraries and applications that are instrumented and, on the other hand, a collector that decides what to do with the resulting data. My hope is that projects (especially libraries) can adopt trace without fear because of the tiny footprint and high flexibility, the same way logs is used in many places. Existing collectors can produce chrome format traces, fuchsia traces, plug into tracy, or into opentelemetry; a bridge to Runtime_events is planned[1].

API docs for the main library

brief example

A simple example program from the readme:

let (let@) = (@@)

let run () =
  Trace.set_process_name "main";
  Trace.set_thread_name "t1";

  let n = ref 0 in

  for _i = 1 to 50 do
    let@ _sp = Trace.with_span ~__FILE__ ~__LINE__ "outer.loop" in
    for _j = 2 to 5 do
      incr n;
      let@ _sp = Trace.with_span ~__FILE__ ~__LINE__ "inner.loop" in
      Trace.messagef (fun k -> k "hello %d %d" _i _j);
      Trace.message "world";
      Trace.counter_int "n" !n;
    done
  done

let () =
  (* here we setup the collector *)
  let@ () = Trace_tef.with_setup ~out:(`File "trace.json") () in
  run ()

If we run the program with TRACE=1 to enable this particular collector, we get a trace file in trace.json (but with actual timestamps):

[{"pid":2,"name":"process_name","ph":"M","args": {"name":"main"}},
{"pid":2,"tid": 3,"name":"thread_name","ph":"M","args": {"name":"t1"}},
{"pid":2,"cat":"","tid": 3,"ts": 2.00,"name":"hello 1 2","ph":"I"},
{"pid":2,"cat":"","tid": 3,"ts": 3.00,"name":"world","ph":"I"},
{"pid":2,"tid":3,"ts":4.00,"name":"c","ph":"C","args": {"n":1}},
…

Opening it in https://ui.perfetto.dev we get something like this:

screenshot of perfetto UI

what’s new in 0.11

0.11 contains major changes, almost all of which are breaking on the collector side. Instrumented programs should be mostly unaffected, aside from many deprecation warnings.

The core change is that Trace.span is now an open sum type rather than int64. This means less global state and fewer tables needed: collectors can pick exactly what data gets carried from the enter_span site into the exit_span site, if any. In turn, collectors get simpler and faster. The notion of “manual” span is now dead (a simple alias to normal spans) and all related functions are deprecated. 1.0 will not have this notion at all.
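To illustrate why an open sum type removes the need for a side table, here is a minimal sketch. It does not use the real Trace API: the `span` type, `Timed_span`, `Foreign_span`, `enter_span`, `exit_span` and the fake clock are all hypothetical stand-ins. With an integer id, a collector would have to look up the start time in a table at exit; here the span value carries it directly.

```ocaml
(* Hypothetical stand-in for an open span type; NOT the real Trace definition. *)
type span = ..

(* A collector extends the type with exactly the data it needs at exit time. *)
type span += Timed_span of string * int64 (* name, start timestamp *)

(* A span that some other (hypothetical) collector might have created. *)
type span += Foreign_span

(* Fake monotonic clock, so the example is deterministic. *)
let clock = ref 0L

let now_ns () : int64 =
  clock := Int64.add !clock 10L;
  !clock

(* Entering a span just allocates a value; no global table is updated. *)
let enter_span (name : string) : span = Timed_span (name, now_ns ())

(* Exiting reads the data back from the span itself. *)
let exit_span (sp : span) : (string * int64) option =
  match sp with
  | Timed_span (name, start_ns) -> Some (name, Int64.sub (now_ns ()) start_ns)
  | _ -> None (* not one of ours: ignore it *)
```

The catch-all case is what makes the type open: a collector only recognizes the constructors it defined itself.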

In addition, collectors are now a bag of callbacks plus a state, rather than a first-class module. trace.subscriber has been removed because the notion of subscriber is subsumed by the notion of collector (now more easily composable). The TEF and fuchsia collectors are now simpler and free of global state.
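The "callbacks plus a state" shape can be sketched roughly as follows. This is an illustration under assumptions, not the actual definitions in trace 0.11: the `callbacks` record, the `Collector` constructor, `buffer_collector` and `tee` are names invented for this example. It shows a collector that owns its state (no globals) and how two collectors can be composed into one.

```ocaml
(* Hypothetical shape of a collector; NOT the real Trace types. *)
type 'st callbacks = {
  message : 'st -> string -> unit;
  counter_int : 'st -> string -> int -> unit;
}

(* A collector packs a state with callbacks that know how to use it. *)
type collector = Collector : 'st * 'st callbacks -> collector

let message (c : collector) (msg : string) : unit =
  match c with Collector (st, cb) -> cb.message st msg

let counter_int (c : collector) (name : string) (v : int) : unit =
  match c with Collector (st, cb) -> cb.counter_int st name v

(* A collector that records events in a buffer it owns, plus a way to read it. *)
let buffer_collector () : collector * (unit -> string list) =
  let events = ref [] in
  let cb = {
    message = (fun st msg -> st := ("msg " ^ msg) :: !st);
    counter_int =
      (fun st name v -> st := Printf.sprintf "counter %s=%d" name v :: !st);
  } in
  (Collector (events, cb), fun () -> List.rev !events)

(* Composition: forward every event to two collectors. *)
let tee (a : collector) (b : collector) : collector =
  let cb = {
    message = (fun (a, b) msg -> message a msg; message b msg);
    counter_int =
      (fun (a, b) name v -> counter_int a name v; counter_int b name v);
  } in
  Collector ((a, b), cb)
```

Because the state is an ordinary value stored next to the callbacks, two instances of the same collector do not interfere with each other.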

user_data is now a polymorphic variant too, for ease of use. Metrics are an open sum type, and the previous int and float cases are just provided as constructors of this type. Dependencies on thread-local-storage and hmap are now entirely gone.

organization note

Note: the project has moved from my gh account (c-cube) to a dedicated organization ocaml-tracing for telemetry and tracing projects. Other projects such as opentelemetry have also migrated there.


  1. trace is more flexible than Runtime_events and works on OCaml 4, but of course it should be possible to have both interoperate! ↩︎


Thanks @c-cube! This will indeed come in handy: the more libraries adopt this, the more data we can collect. I will try to use it in my project, a web framework; hopefully it doesn’t affect performance much.

Do you have an integration with EIO events? I will take a closer look at how to integrate the two (see the EIO trace example).

Overall, thanks for creating this.

Thank you! Overhead for a span per HTTP request should really be fine; ultimately it depends on which collector you install (no collector = basically 0 overhead).

There is no integration with Eio. I personally think Eio should use trace instead of hardcoding the runtime events, but I have very little hope of this actually happening :-).

Maybe the way to go will be to define a set of Runtime_events types that map nicely to the trace collector, and install a collector emitting these events. This way the event buffer would contain both sources of data.

let@ is an interesting let binding for the (@@) operator that I have not come across before. Is there any chance you could desugar it so that I can more fully understand what this idiom is doing?

Edit: No matter, I have worked it out. Quite interesting.

Yes, it’s a fun idiom. It’s a bit similar to Gleam’s use construct in a way. I’ve found it immensely useful for all sorts of bracketed function calls (mutex enter/exit, span enter/exit, IO resource acquire/release, etc.).
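For readers who have not seen the idiom, here is a minimal sketch of the desugaring. It uses only the standard library (OCaml 4.08 or later); `with_logged` is a hypothetical bracketed helper written for this example and is not part of trace. `let@ x = f in body` is rewritten by the compiler to `( let@ ) f (fun x -> body)`, and since `( let@ )` is `( @@ )`, that is just `f (fun x -> body)`.

```ocaml
(* [let@] is a binding operator defined as plain function application. *)
let ( let@ ) = ( @@ )

(* Hypothetical bracketed helper: records entry and exit around [f]. *)
let with_logged (events : string list ref) (name : string) (f : unit -> 'a) : 'a =
  events := ("enter " ^ name) :: !events;
  Fun.protect ~finally:(fun () -> events := ("exit " ^ name) :: !events) f

(* With the binding operator: the rest of the block is the callback. *)
let sugared (events : string list ref) : int =
  let@ () = with_logged events "a" in
  let@ () = with_logged events "b" in
  events := "body" :: !events;
  42

(* The same function written with explicit callbacks. *)
let desugared (events : string list ref) : int =
  with_logged events "a" (fun () ->
      with_logged events "b" (fun () ->
          events := "body" :: !events;
          42))
```

Both versions enter "a", then "b", run the body, and exit in reverse order; the sugared form simply avoids the growing indentation.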
