Using Format to define styled strings

I’ve been building a small styled string in order to make pretty display to terminal, markdown, and HTML.

type style =
  | Bold
  | Italics
  | Underline
  | FG of Color.t
  | BG of Color.t

type t =
  | String of string
  | Styled of style * t
  | Concat of t * t

Since defining the strings using constructors directly is tedious, I figured I could just use the Format.kfprintf function, using symbolic tags to mark the begin/end of styles. However, I’ve run into issues with building my formatter. The print_open_stag and print_close_stag functions are called out of sync with the formatter’s output function (which is only called after a flush, which closes all open tags…). I’ve found a workaround by using mark_open_stag which prints a particular string which is recognized by the output function, but that is a bit hackish:


let open_str = "\002\001open\001\002"
let close_str = "\002\001close\001\002"

let makef ?(margin=max_int) ?(max_indent=max_int - 1) fmt_str =
  let open Format in

  let buffer = ref (String "") in
  let style_stack = Stack.create () in

  let open_style string =
    let style = style_of_string string in
    Stack.push (!buffer, style) style_stack;
    buffer := String "" in
  let close_style () =
    let inner = !buffer in
    match Stack.pop style_stack with
    | (outer, style) -> buffer := Concat (outer, Styled (style, inner))
    | exception Stack.Empty ->
      raise (Invalid_argument "Styled_string.makef: too many closing styles") in

  let output str =
    (* recognize open/close style special strings *)
    if String.starts_with ~prefix:open_str str then
      let style = String.sub str (String.length open_str) (String.length str - String.length open_str) in
      open_style style
    else if str = close_str then close_style ()
    (* otherwise, simply concate the new string *)
    else buffer := Concat(!buffer, (String str)) in

  let fmt = make_formatter
    (fun str pos len -> output (String.sub str pos len))
    ignore in
  pp_set_mark_tags fmt true;
  pp_set_formatter_stag_functions fmt {
    mark_open_stag = (function String_tag s -> open_str ^ s | _ -> "");
    mark_close_stag = (function String_tag _ -> close_str | _ -> "");
    print_open_stag = ignore;
    print_close_stag = ignore;
  };
  pp_safe_set_geometry ~max_indent ~margin fmt;
  kfprintf (fun fmt ->
    pp_print_flush fmt ();
    while not (Stack.is_empty style_stack) do close_style () done;
    !buffer) fmt fmt_str

Is there any better way to do this? I really think the print_open_stag functions are more suited for this than the mark ones, but I could not get them working.

Note that I’ve found the format-doc library which allows building such symbolic format strings, and I could use that, but I’m also trying to learn a bit more about Format along the way and a library just for this seems overkill. Sadly, I couldn’t just copy the library’s technique since it appears to rely on the undocumented OCamlInternalFormat rather than the Format API.

1 Like

You may be interested in A `text` function with markup for styling · Issue #65 · dbuenzli/fmt · GitHub, where I discussed a few similar items.

In the end I settled on a similar approach to yours where I was using mark_*_stag. My reading of the documentation suggests this is the correct one, since I wouldn’t want the terminal escape sequences to be considered part of the length for e.g. line splitting. Why do you view it as hackish?

mark_*_stag works well when you want to insert characters to serve as markup/ansi sequence in a string. However, that’s not what is happening here: I use mark_*_stag to print a particular string, and then recognize that string in the output function to perform an action (and not add the string). It feels like this is what print should do: directly update the formatter state, and be separate from the actual content to be printed.

With this implementation: if the user inputs my close_str, the printing will break. In my case this is unlikely since I’ve chosen a really weird string, but still a bit hackish.

For a more general case, when printing with markup, the output function could be used to safely escape any markup in user text, so that the only markup comes from semantic tags. This needs to be inserted by print_*_stag, as mark_*_stag also goes through that output function.

For example, imagine a simple markdown printer:

print_md "@{<italic>x = a*b}"

If implemented with the mark_*_stag method, this will render incorrectly because the * in the text conflict with the * inserted for formatting. If we want our output function to escape the * in the text, then the formatting * must be inserted by another method, otherwise it will be escaped as well.

Ah, I see. Is this because you don’t want to write different stag implementations for the different output formats? Otherwise, it seems like your concerns would be partially alleviated by having the stag output the desired output format directly, though I’m not sure about escaping in that context

Actually the full use case is a bit more complex than that. I want to print tables, where cells can be individually formatted. This requires computing columns width, which can only be done once the full table is complete. Thus I can’t insert formatting directly in my cells, as that would mess up the width, hence the symbolic representation, which preserves width information and enables rendering at a later date.

This gets even more complicated when rendering to, say ANSI characters. I want to be able to alternate background colors between table rows individually. But that means that, if a cell sets a background color, it can’t unset it without knowing the row’s background color. Here also the symbolic representation and late rendering helps.

I guess my real question here is why do mark and print behave so differently?

For a minimal example, consider:

open Format

let using_mark_stag fmt_str =
  let fmt = make_formatter
    (fun str pos len -> printf "output: %s@." (String.sub str pos len))
    ignore in
  pp_set_mark_tags fmt true;
  pp_set_formatter_stag_functions fmt {
    print_open_stag = (fun _ -> ());
    print_close_stag = (fun _ -> ());
    mark_open_stag = (function (String_tag s) -> "open tag: " ^ s | _ -> "unknown tag");
    mark_close_stag = (function (String_tag s) -> "close tag: " ^ s | _ -> "unknown tag")
  };
  kfprintf (fun fmt -> pp_print_flush fmt ()) fmt fmt_str

let using_print_stag fmt_str =
  let fmt = make_formatter
    (fun str pos len -> printf "output: %s@." (String.sub str pos len))
    ignore in
  pp_set_print_tags fmt true;
  pp_set_formatter_stag_functions fmt {
    print_open_stag = (function (String_tag s) -> printf "open tag: %s@." s | _ -> printf "unknown tag");
    print_close_stag = (function (String_tag s) -> printf "close tag: %s@." s | _ -> printf "unknown tag");
    mark_open_stag = (fun _ -> "");
    mark_close_stag = (fun _ -> "")
  };
  kfprintf (fun fmt -> pp_print_flush fmt ()) fmt fmt_str

Calling both these functions gives very different results. Specifically, my problem is that with mark_stags, the tags are placed correctly in the text, whereas with print_stag, they are not:

# using_mark_stag "hello @{<bold>world@}!";;
output: hello 
output: open tag: bold
output: world
output: close tag: bold
output: !
- : unit = ()
# using_print_stag "hello @{<bold>world@}!";;
open tag: bold
close tag: bold
output: hello 
output: world
output: !
- : unit = ()

I believe that is because you’re mixing formatters. if you make the print_ functions close over fmt you get the expected ordering:

let using_print_stag fmt_str =
  let fmt = make_formatter
    (fun str pos len -> printf "output: %s@." (String.sub str pos len))
    ignore in
  pp_set_print_tags fmt true;
  pp_set_formatter_stag_functions fmt {
    print_open_stag = (function (String_tag s) -> fprintf fmt "open tag: %s@." s | _ -> printf "unknown tag");
    print_close_stag = (function (String_tag s) -> fprintf fmt "close tag: %s@." s | _ -> printf "unknown tag");
    mark_open_stag = (fun _ -> "");
    mark_close_stag = (fun _ -> "")
  };
  kfprintf (fun fmt -> pp_print_flush fmt ()) fmt fmt_str;;
using_print_stag "hello @{<bold>world@}!";;

output: hello 
output: open tag: 
output: bold
output: close tag: 
output: bold
output: 

output: 

output: world
output: !
- : unit = ()

Yes, but that is what I want. I have an my own buffer to store the style stack. I want the print_stagand output functions to update that buffer separately, but in order. I do not want the print_stag function to go through the output function.

Unfortunately, the formatter has its own internal buffer, which it uses to buffer calls to output but not to print_stag, which means these are called out of order. I do not seem to have a way to flush that internal buffer in the print_stag functions, since pp_print_flush does a lot more than that: it also closes all boxes and tags.

Format has more than a buffer, it has a queue of items waiting for a decision on a dominating break hint, thus it is not straightforward to flush the queue without promising that there would be no more contents on the stream of command.

For your use case, my impression is that your root issue is that you trying to lower the stream of Format command to a format richer than the one supported by Format’ out_* functions.

In this case, your solution of adding your own escape sequence in the out_string input seems hackish but reasonable to me. In particular, you could check that the input format string doesn’t contain your escape sequence.

However, from your description I am not sure if you are using the formatting engine part of Format? If this is not the case, it could be simpler to create a Printf interpreter which adds support for tags by creating your own version of CamlinternalFormat.output_acc.

(The CamlinternalFormat* modules are mostly internals because they are not guaranteed to be as stable as the rest of the standard library stable nor documented, there are fine to use in my opinion.)

P.S: your makef function are buggy because max_int is not a valid parameter for margin and max_indent, you should use Format.pp_infinity.