Generation of OCaml code

I’m working on a router in which I would describe routes using S-expressions (sexplib); for an example, see the screenshot below. What is the best way to translate S-expressions into OCaml code at build time (I use Dune/Jbuilder)?

IMPORTANT!: My question is not about how to do this with Dune/Jbuilder, but about the generation of the OCaml code itself.

Beyond the basic printf-based generator, a simple solution might be to use the compiler-libs library to create a small custom compiler (or you could just generate the OCaml AST) and then add the corresponding rules and dependencies to dune.

compiler-libs looks interesting to me, but I suspect it would be harder to use than plain string concatenation or something similar.

Why not simply write a code generator that parses the sexp and writes the result to a file as OCaml source? Then in dune you can define a rule whose target is the output file and whose action calls this executable (which can itself be defined as an executable target). See for example: https://github.com/xapi-project/xen-api/blob/master/ocaml/xapi/jbuild
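
To make the idea concrete, here is a minimal sketch of such a printf-based generator. Everything in it is an assumption for illustration: the `(route <name> <path>)` format is hypothetical, and the local `sexp` type is only a stand-in with the same two constructors as `Sexplib.Sexp.t`, so that the sketch has no external dependency.

```ocaml
(* Stand-in for Sexplib.Sexp.t (same two constructors), defined locally
   so that this sketch compiles without sexplib. *)
type sexp = Atom of string | List of sexp list

(* Hypothetical route format: (route <ocaml_name> <path>).
   Each route becomes one top-level string binding. *)
let route_binding = function
  | List [ Atom "route"; Atom name; Atom path ] ->
      (* %S prints the path as an escaped OCaml string literal. *)
      Printf.sprintf "let %s = %S\n" name path
  | _ -> invalid_arg "route_binding: expected (route <name> <path>)"

(* Concatenate the generated bindings into one OCaml source string. *)
let generate (routes : sexp list) : string =
  String.concat "" (List.map route_binding routes)

let () =
  let routes =
    [ List [ Atom "route"; Atom "home"; Atom "/" ];
      List [ Atom "route"; Atom "user_profile"; Atom "/users/:id" ] ]
  in
  print_string (generate routes)
```

In a real generator the routes would be read from a file and the output written to the path that the build rule expects, rather than printed to stdout.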

Indeed, generating the AST (with compiler-libs + ppx_metaquot) has a higher entry cost than simply concatenating strings, but it can be more reliable and maintainable in the long term.

This is exactly what I want to do (see the IMPORTANT note at the bottom of my question). My question was about the code generation itself. I had thought about simple string concatenation and was asking about alternatives to that approach.

@octachron has recommended taking a look at compiler-libs, which looks very intriguing to me but, for an OCaml newbie like me, it looks like substantially more work. I think I will use stupid simple string concatenation for the first version, but I’ll keep compiler-libs in mind for a future refactoring.

If generating OCaml code without location directives pointing to the source is acceptable to you, you can use the approach that we use in atdgen. Instead of writing just printfs and trying to keep track of the indentation, we produce a tree of lines of code.

You can define your own, it’s straightforward:

type t =
| Line of string        (** single line (not indented) **)
| Block of t list       (** indented sequence **)
| Inline of t list      (** in-line sequence (not indented) **)

The functions that generate OCaml code return a t list rather than writing to a buffer. Here’s a random sample:

let l =
  insert (Line "Bi_outbuf.add_char ob ',';") (Array.to_list a)
in
let op, cl =
  if p.std then '[', ']'
  else '(', ')'
in
[
  Block [
    Line (sprintf "Bi_outbuf.add_char ob %C;" op);
    Inline l;
    Line (sprintf "Bi_outbuf.add_char ob %C;" cl);
  ]
]
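
Turning such a tree into text takes only a few lines. The following is a minimal sketch and not atdgen's actual implementation: the `render` function, its `indent` parameter, and the `example` value are hypothetical names introduced here for illustration.

```ocaml
type t =
  | Line of string   (* single line (not indented) *)
  | Block of t list  (* indented sequence *)
  | Inline of t list (* in-line sequence (not indented) *)

(* Render a list of nodes to a string. Each Block adds one level of
   indentation; Inline keeps the current level. *)
let render ?(indent = 2) (nodes : t list) : string =
  let buf = Buffer.create 256 in
  let rec go depth = function
    | Line s ->
        Buffer.add_string buf (String.make (depth * indent) ' ');
        Buffer.add_string buf s;
        Buffer.add_char buf '\n'
    | Block l -> List.iter (go (depth + 1)) l
    | Inline l -> List.iter (go depth) l
  in
  List.iter (go 0) nodes;
  Buffer.contents buf

(* A small tree describing a two-line function body. *)
let example =
  [ Line "let f x =";
    Block [ Line "let y = x + 1 in"; Inline [ Line "y * 2" ] ] ]

let () = print_string (render example)
```

Because generator functions return plain lists, they can be composed with ordinary list operations, and indentation is decided in one place at rendering time.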

It seems quite similar to Format’s formatting boxes: blocks sound like @[<hv 2>...@] (or @[<v 2>...@] ?), and Inline might be <hov> (or <h>?).
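
For comparison, here is a small sketch of the same kind of layout written with Format boxes from the standard library. The choice of a vertical box with indentation 2 for the block is just one of the options mentioned above, and `generated` is a hypothetical name.

```ocaml
(* An outer vertical box holds the whole definition; the inner
   "@[<v 2>" box indents every line after a break hint "@," by 2. *)
let generated : string =
  Format.asprintf "@[<v 0>@[<v 2>let f x =@,let y = x + 1 in@,y * 2@]@]"

let () = print_endline generated
```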

Yes, except that the solution I’m proposing here produces subpar code (long lines, wasted whitespace). The goal here is to increase the readability of the generator’s source code, at the expense of reducing the readability of the generated code. It’s a worthy trade-off when the generated code is boilerplate that rarely has tricky bugs. Also, Format can be slow on some pathological input. I don’t know if it’s typical on generated code, but it’s nice to know it won’t happen.

For higher-quality output, we have easy-format, which is a functional wrapper around the Format module. Even though I’m the original author, I find it too cumbersome and not worth the effort if the user isn’t going to read the generated code.

There is also https://github.com/stedolan/malfunction that may be of interest.

I wouldn’t worry about indentation so much. Why not use a tool such as ocp-indent, either when you want to read the file or as the last stage of your generation process (so that location directives are correct)?

I’m sure ocp-indent is suitable in some cases, but AFAIK it won’t break or merge lines for you. For example,

let x =
  f a
  || f b

won’t be turned into

let x = f a || f b

You could use ocamlformat for that.
