Using ppx_deriving.show for trace messages

I just started using ppx_deriving.show. It adds newlines to (edit: embeds them in the content of) the output of both pp and show which is appropriate for pretty printing. But what about tracing? Normally trace messages don’t have newlines so that grep’ing the trace file prints the entire event.

I could write separate pp functions for tracing, but in that case I might as well not use ppx_deriving. I could replace the newlines with spaces in the string that show returns and print it, but that’s not an efficient way to print traces.

Has anyone else run into this? I didn’t see anything in the ppx_deriving issues.

Can you share an example? I don’t recall this as a problem with the show deriver, and it seems unlikely, since how could it compose (that is, the pp function for type a * b is composed of the function for each of a and b).

Sure, I just saw it. Let me distill it a little.

utop # #require "ppx_deriving.show" ;;
utop # type t = {dialog_idx: int; reply_idx: int; reply: string list; n_circular: int} [@@deriving show] ;;
type t = {
  dialog_idx : int;
  reply_idx : int;
  reply : string list;
  n_circular : int;
}
val pp : Format.formatter -> t -> unit = <fun>
val show : t -> string = <fun>
utop # show {dialog_idx = 1; reply_idx = 2; n_circular = 3; reply =  ["one"; "two"; "three"]} ;;
- : string =
"{ dialog_idx = 1; reply_idx = 2; reply = [\"one\"; \"two\"; \"three\"];\n  n_circular = 3 }"

The newline is after the list.

now do:

Fmt.(str "%a" pp [value of type t here])

[I think that’s how it’s done – I’m not going to test it out.]

That is to say, there’s a difference between “show” and “pp” in that “show” intended for use at the toplevel, where “pp” is intended for composition. I rarely-if-ever use show.

pp places the newline in the same place as show, so I naively assumed that the same code is used for both. But perhaps they’re different in some cases.

utop # asprintf "%a" pp {dialog_idx = 1; reply_idx = 2; n_circular = 3; reply =  ["one"; "two"; "three"]} ;;
- : string =
"{ dialog_idx = 1; reply_idx = 2; reply = [\"one\"; \"two\"; \"three\"];\n  n_circular = 3 }"

Edit: they’re the same.

In case this is what you’re going to ask next:

utop # printf "%a\n" pp r ;;
{ dialog_idx = 1; reply_idx = 2; reply = ["one"; "two"; "three"];
  n_circular = 3 }
- : unit = ()

OH, NOW I SEE! You meant newlines in the interior of the printout. Yes, that’s normal and expected. And truthfully, why would you do otherwise? We all had the same problem in Java, back in the day. And removing those newlines meant an unholy hell of trying to reinfer them during log-analysis.

Let’s back up: your problem here is that you have a byte-stream, and you want to put records into it for later demarcated parsing. As Marshall Rose said oh-so-long-ago, there are three ways of doing this:

  1. length-count (prepend a length-count)
  2. sentinel (a special character at the end of each record) – this is your “let’s use newline as the sentinel”
  3. byte-stuffing (convert that sentinel into something else when it occurs inside the record)

[it’s been a long, long time, so I might be misremembering]

So when I wrote a high-performance logging framework for Java, I chose (went back and looked at the code) length-count (#1). But I’ve used the sentinel approach – only, not “\n”, but rather “%|%” which never shows up in Java log output, for more “dash it off one-offs”.

Log-messages are also meant to be processed by humans, which is why removing newlines and whitespace is actually not-very-helpful.

I understand your point, but I disagree. Grepping logs is common and doesn’t give you the entire matching event if newlines are embedded. This is from my experience in debugging production systems. But it is fine for us to disagree. :slight_smile:

Oh, I understand that you need to grep things. That’s what Perl scripts are for, eh? A Perl script like:

$/ = "%|%" ;
while (<>) {
  print $_ if $_ =~ /foo bar buzz/s;
}

and of course, it’s trivial to make a version that takes a cmdline pattern. And compiles that pattern into a thunk before the while-loop.

Again going back to Java, stack-tracebacks are common. When I build large systems in OCaml they’re common also. So grepping isn’t very useful: I don’t want the line containing the pattern I was searching for, but rather the log-record. And as I showed above, it’s pretty trivial to write a script to grep on log-records, not lines.

You cann also put out “\0” (null) as a sentinel/terninator; then you can use “grep -z” that treats the null as the line-ending character. I never did that, b/c I found it ugly; I thought “%|%” was much better, b/c it stayed in ASCII.

Thanks for all this info, it is useful. But really my desire is to get rid of the newlines (or replace them with something) when using deriving.show. I realized just now that I have been accustomed to newline-free output when using Rust’s derived Debug format, so that is part of why I was surprised. I think my mistake is in not realizing that deriving.show is all about pretty-printing, not debug output, including the show function. And that not everyone (maybe no one else) wants newline-free output. :slight_smile:

Oh, lots of people want newline-free output. But I think once you start seeing large things getting pretty-printed (e.g. stack tracebacks), if it’s newline-free, your eyes roll back in their sockets, and if that happens a few too many times, it really, really hurts. So you learn to like newlines and whitespace-preservation.

What we did (in Java production systems) is to put everything about an error except for the stack trace in a single line, followed by a stack trace with newlines as usual. But of course that’s not the only option as you point out. It’s just what I am used to doing.

I thought I should add that, no, pp/show are definitely for debug, not pretty-printing. No end-user wants to see OCaml types. So a pattern you’ll see is that there are functions pp, and also functions pp_hum. With the latter as the “pretty” printing functions. This realy is a difference of opinion about newlines in debug output. And you’re not alone, as I said before.

1 Like

To answer the original question, the interpretation of new lines can be tuned in the formatter when using Format. For instance, we can set a low level device that outputs '↵` on newlines with

let set_escaped_newline ppf =
  let low_level_device = Format.pp_get_formatter_out_functions ppf () in
  let out_newline () = low_level_device.out_string "↵" 0 3  in
  let low_level_device = { low_level_device with out_newline } in
  Format.pp_set_formatter_out_functions ppf low_level_device

If we set up a version of sprintf that uses this kind of low_level device

let sprintf fmt =
  let b = Buffer.create 20 in
  let ppf = Format.formatter_of_buffer b in
  let () = set_escaped_newline ppf in
  Format.kfprintf
    (fun ppf -> Format.pp_print_flush ppf (); Buffer.contents b)
    ppf fmt

then

sprintf "@[<v>%a@]" Fmt.Dump.(list Fmt.int) (List.init 25 Fun.id)

returns

"[0; 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16; 17; 18; 19; ↵ 20; 21; 22; 23; 24]"
4 Likes

Wow, thank you!! That will give me what I need.