Spooky control flow with Format.kasprintf & friends

cemerick · May 1, 2023, 2:32pm

Recently I’ve needed to update our handling of logs to emit “structured” log messages, i.e. JSON-formatted objects with one field being the primary string log message. I’m using @dbuenzli’s logs library, so I looked to write a log reporter that assembled a Yojson value containing (in part) the formatted log message that would otherwise just be rendered mostly as-is on stderr or whatever.

Along the way, I ran into what looks to me like some truly spooky behaviour. I thought at first I must have been hitting a bug in logs, but my minimal repro appears to demonstrate that the problem might actually be either in Format or perhaps even the compiler.

let indirect =
  { Logs.report =
      (fun src level ~over k msgf ->
        msgf
        @@ fun ?header ?tags fmt ->
        let msg = ref "INIT" in
        let k =
          Format.kasprintf
            (fun s ->
(* A *)       print_endline @@ "in k, got msg: " ^ s; 
              msg := s;
              k ())
            fmt
        in
        (* in real usage, the log messages aren't going to stdout/stderr *)
(* B *) print_endline @@ "Log message: " ^ !msg;
        over ();
        k)
  }

let () =
  Logs.set_level ~all:true @@ Some Logs.Debug;
  Logs.set_reporter indirect;
  Logs.info (fun m -> m "No format annotations");
  print_endline "------------------------";
  Logs.info (fun m -> m "this won't %s" "print")

It seems obvious that the print on line *A* should always, always happen before the print on line *B*. (I’m eschewing Format usage in this example outside of the actual log handling just to avoid complicating the interaction with that module in any way.) Unfortunately, that’s not the case:

$ ocamlc -I `ocamlfind query logs` logs.cma  -o loggingtest loggingtest.ml && ./loggingtest
in k, got msg: No format annotations
Log message: No format annotations
------------------------
Log message: INIT
in k, got msg: this won't print

As you can see, the formatted message that includes any sort of annotation provokes some kind of bizarre out-of-order execution that results in the actual captured log message string effectively going missing.

FWIW:

I’ve tried variations of this that use e.g. kfprintf with both the provided str_formatter and a Buffer-backed formatter; though spooky actions happen in those scenarios as well, I think the above example demonstrates the problem most tangibly (using the k fun to either alter the ref or not before the later log message “output” @ *B*)
Outcomes are the same with OCaml 4.14 or 5.0 (I’m currently using 5 at the moment, but thought to check to see if the problem was somehow domain-related)

What in the world is going on here?

sim642 · May 1, 2023, 2:56pm

Why is that?
To me the obvious thing would be the other way around because let k, in which *A* is, only redefines the continuation but normally does not apply it (because it only gives the format argument and none of the actual values). Then *B* executes and actually prints something. Finally the returned k gets applied somewhere with the actual values.

I suspect the argumentless case is the odd one out here since it doesn’t require any extra kasprintf arguments after the format, so the entire printing is done and its continuation function is called immediately, giving unit. Only in that case is *A* printed before *B*.

nojb · May 1, 2023, 2:59pm

This is not the case. In fact:

does not cause the invocation of the first argument (fun s -> ...) in general. That function will be called only once all the arguments needed by the formatting string fmt have been passed.

Cheers,
Nicolas

cemerick · May 1, 2023, 3:03pm

I’m happy to be shown to not understand the semantics of Format.k*printf, but surely execution order of the program shouldn’t change depending on the format strings used?

cemerick · May 1, 2023, 3:04pm

Except all arguments needed by the formatting string are provided:

Logs.info (fun m -> m "this won't %s" "print")

dbuenzli · May 1, 2023, 3:12pm

You need to flush the pretty printer. See the “@.” here (it’s also a good idea to wrap the whole thing in a box).

cemerick · May 1, 2023, 3:15pm

Here’s an example that uses kfprintf:

let indirect =
  { Logs.report =
      (fun src level ~over k msgf ->
        msgf
        @@ fun ?header ?tags fmt ->
        let k = Format.(kfprintf (fun _ -> k ()) str_formatter fmt) in
        let msg = Format.flush_str_formatter () in
        (* in real usage, the log messages aren't going to stdout/stderr *)
        print_endline @@ "Log message: " ^ msg;
        over ();
        k)
  }

let () =
  Logs.set_level ~all:true @@ Some Logs.Debug;
  Logs.set_reporter indirect;
  Logs.info (fun m -> m "No format annotations");
  for i = 1 to 5 do
    Logs.info (fun m -> m "message %d" i)
  done;
  Logs.info (fun m -> m "All done")

Outputs:

$ ocamlc -I `ocamlfind query logs` logs.cma  -o loggingtest src/exe/test.ml && ./loggingtest
Log message: No format annotations
Log message: 
Log message: message 1
Log message: message 2
Log message: message 3
Log message: message 4
Log message: message 5All done

cemerick · May 1, 2023, 3:16pm

Unfortunately, no. Updating the example to add a newline/flush ((fmt ^^ "@.") going into kasprintf) yields the same result. (Notably, kasprintf already includes an implicit flush in the k that it hands to make_printf, but it never gets invoked…maybe due to a lack of a End_of_format in the compiled format?)

dbuenzli · May 1, 2023, 3:18pm

These bits need to be in the continuation see the example I linked to. The Format.kasprintf should be the last thing you call in the reporter.

cemerick · May 1, 2023, 3:24pm

Perhaps that’s the case in order to satisfy logs’ semantics re: k (in which case emitting JSON logs may well not be practical with the logs library, since one needs to obtain the concrete log message string before constructing and emitting the JSON object?), but I don’t think it’s relevant to whether Format works consistently or not. Note the equivalent odd behaviour of the kfprintf example.

dbuenzli · May 1, 2023, 3:34pm

Your control flow is botched, you are calling flush_str_formatter before the format occurs.

I’m not compiling the stuff but basically you want to write something like:

let buf = Buffer.create 255 
let myppf = Format.formatter_of_buffer buf

let report src level ~over k msgf =
  let k _ =
    let msg = Buffer.contents buf in 
    print_endline ("Log message: " ^ msg);
    Buffer.reset buf; over (); k ()
  in 
  msgf @@ fun ?header ?tags fmt ->
  Format.kfprintf k myppf ("@[" ^^ fmt ^^ "@]@.")

cemerick · May 1, 2023, 3:39pm

Well, I’d honestly never considered doing all of my “extra” work inside the continuation provided to e.g. kfprintf. Putting a pin in that for the moment though, how in the world is the flush in this snippet happening before formatting?

let k = Format.(kfprintf (fun _ -> k ()) str_formatter fmt) in
let msg = Format.flush_str_formatter () in ...

SkySkimmer · May 1, 2023, 3:40pm

Here is a simplified code with only stdlib dependencies which has the same result:

type ('a, 'b) msgf = (('a, Stdlib.Format.formatter, unit, 'b) Stdlib.format4 -> 'a) -> 'b

type reporter = { report : 'a. ('a, unit) msgf -> unit; }

let the_reporter =
  { report =
      (fun msgf ->
        msgf
        @@ fun fmt ->
        let msg = ref "INIT" in
        let k =
          Format.kasprintf
            (fun s ->
(* A *)       print_endline @@ "in k, got msg: " ^ s;
              msg := s;)
            fmt
        in
(* B *) print_endline @@ "Log message: " ^ !msg;
        k)
  }

type 'a log = ('a, unit) msgf -> unit

let info : 'a. 'a log = fun msgf -> the_reporter.report msgf

let () =
  info (fun m -> m "No format annotations");
  print_endline "------------------------";
  info (fun m -> m "this won't %s" "print")

Now let’s evaluate info (fun m -> m "this won't %s" "print") step by step:

info (fun m -> m "this won't %s" "print")

the_report.report (fun m -> m "this won't %s" "print")

(fun m -> m "this won't %s" "print") @@ fun fmt -> 
  let msg = ref "INIT" in
  let k =
    Format.kasprintf
      (fun s ->
         print_endline @@ "in k, got msg: " ^ s;
         msg := s;)
      fmt
  in
  print_endline @@ "Log message: " ^ !msg;
  k


(  let msg = ref "INIT" in
  let k =
    Format.kasprintf
      (fun s ->
         print_endline @@ "in k, got msg: " ^ s;
         msg := s;)
      "this won't %s"
  in
  print_endline @@ "Log message: " ^ !msg;
  k
) "print"


msg is ref INIT
---------------
( let k : string -> unit =
    Format.kasprintf
      (fun s ->
         print_endline @@ "in k, got msg: " ^ s;
         msg := s;)
      "this won't %s"
  in
  print_endline @@ "Log message: " ^ !msg;
  k
) "print"

The next step is to bind k to the partially applied function, then print Log message: INIT, then apply k to "print"

cemerick · May 1, 2023, 3:46pm

So I just lifted all my JSON-specific bits in my real app code into the continuation provided to kfprintf, and no longer observe any blank formatted messages or out-of-order execution.

That’s nice, though very unsettling, as I obviously understand Format even less than I thought, and so much so that what looks like obviously sequential code…sometimes isn’t?

dbuenzli · May 1, 2023, 3:46pm

cemerick:

Putting a pin in that for the moment though, how in the world is the flush in this snippet happening before formatting?
let k = Format.(kfprintf (fun _ -> k ()) str_formatter fmt) in
let msg = Format.flush_str_formatter () in ...

Your k value is a function here (that you return at the end of the function) it’s not being called. Then Format.flush_str_formatter () is called before you return k and thus before it gets called.

Btw. happy about hearing your thoughts on structured logging and Logs in Implement support for Message Templates · Issue #47 · dbuenzli/logs · GitHub

cemerick · May 1, 2023, 4:05pm

Just to clarify the different “k”'s, let’s work off of this:

let kv = Format.(kfprintf (fun _ -> k ()) str_formatter fmt) in ...

k is provided by the logs library, and its return value is intended to be opaque, there to support flexible lwt compat. It’s definitely always called in all of the examples I’ve provided; you’re not talking about invoking kv, are you?

Looking at Format.make_printf, the continuation passed there (an anon closure in my examples, not the logs-provided k) should always be called before kfprintf returns, presuming there’s an End_of_format signal in the format string. I simply don’t understand how any mechanism could cause the format continuation to be called after kfprintf exits.

Yeah, I saw that before embarking on this particular enhancement. Once I put this to bed (leaving myself either unsettled re: the semantics or not, I guess), I’ll register some thoughts on that issue.

cemerick · May 1, 2023, 4:11pm

Whew, I just got it. I’ll summarize for posterity/onlookers in a minute or two.

dbuenzli · May 1, 2023, 4:12pm

I am. The kv you built here is a closure, more precisely a function that is still expecting the arguments fmt needs.

Once these arguments will be supplied the whole kfprintf will be invoked to do the formatting and only then the (fun _ -> k ()) function you provided as a continuation to kfprintf will be invoked. In our example the actual formatting done by kv definitively happens after the flush_str_formatter (), when the actual arguments to format according to fmt have been provided.

cemerick · May 1, 2023, 7:03pm

This is one of those misunderstandings that, when resolved, results in an instant, Richter-scale-shattering facepalm. It’s difficult and honestly a little painful now to do a post-mortem of what that misunderstanding was, but I’ll try here.

(Thanks to @dbuenzli for his patient replies, and @nojb, @sim642, and @SkySkimmer, who were trying to tell me what was wrong, but I just wasn’t hearing them.)

The signature of logs’ ('a, 'b) msgf type is

('a, Stdlib.Format.formatter, unit, 'b) Stdlib.format4 -> 'a) -> 'b

(I’ll be ignoring the optional ?tags and ?header arguments throughout here). At no point did I think that 'b here would or could be a function; in fact, I thought it could only ever be unit or unit Lwt.t, just based on the expected return types of Logs.msg and Logs_lwt.msg. Thus, even given simple usage of logs, e.g.

Logs.info (fun m -> m "message %d" i)

I thought m (the msgf here) was strictly accepting only a single argument. With that in mind, I assumed that there was some compiler magic that would take up a format string and any subsequent arguments and bundle them into a format4 in a way similar to e.g. dprintf "message %d" i, basically a format-specific fexpr. (This looks like a pretty perilous jump of reasoning in hindsight, but in my defense, there is a lot of compiler magic around format strings, and this “implicit dprintf-ing” of the format string + arguments seems plausible and reasonable, if one misunderstands that the 'b in ('a, 'b) msgf is a () of some sort.)

Alas, is isn’t how it works; 'b will be a function corresponding to types of values required by the annotations in the provided format string. This is manifest if one pokes around at the type of m in these examples via e.g. merlin or vscode:

Logs.info (fun m -> m "No format annotations");

(unit, Format.formatter, unit, unit) format4 ->
unit

and

Logs.info (fun m -> m "message %d" i)

(int -> string -> unit, Format.formatter, unit, unit) format4 ->
int ->
string ->
unit

I remember looking at these types in vscode at an earlier stage in my trying to puzzle out what was wrong with my code, and deciding that vscode/merlin was broken in this instance; recall my premise that 'b was always some sort of unit, and even so, in that moment, I didn’t think it was possible for a function callee (m in the second example) to develop a different arity based only on the inferred requirements of a format string argument. I’m not sure how I had such a strong sense on the latter point, when such a constraint would make all of the Format.*printf functions impossible despite my using them constantly, but again, alas.

Speaking to intuition about these things, I think it’s helpful to think of any function that accepts a format string argument as being similar to a function from a functor application, where the functor’s argument (and types provided, etc) is defined by the format string. Likewise, it’s helpful to remind one’s self from time to time that, thanks to how OCaml does currying, these definitions are (almost always) equivalent from a caller’s perspective:

let f a = fun b -> a + b
let g a b = a + b

I hope bystanders find my explication of my foundering helpful.

dbuenzli · May 1, 2023, 7:17pm

Note, this used to be. Nowadays all the compiler magic is gone in favour of a GADT.

Gabriel once wrote about the format6 parameters which should make everything clear since format4 is just:

type ('a, 'b, 'c, 'd) format4 = ('a, 'b, 'c, 'c, 'c, 'd) format6

Topic		Replies	Views
Format.kprintf usage Learning logs	17	3605	March 31, 2023
Easy_logging 0.2 Community announce	7	2264	July 14, 2021
Using Format/Fmt/Lwt_fmt to write on several formatters? Learning	7	805	August 23, 2019
Easiest way to convert Logs.Tag.t values into strings? Learning	6	718	September 28, 2020
String templating and internationalization packages for ocaml Ecosystem	6	387	February 19, 2025

Spooky control flow with Format.kasprintf & friends

Related topics