How can we simplify walks through AST?

Hello,

I’m writing a compiler in OCaml, and have finished implementing lexer, parser, type inference, for a small subset of my language.

Currently I’m implementing a borrow checker like Rust.

Unlike other languages which choose to lower their AST to a simpler form and then do semantic analysis on them, I’m currently doing all analysis on the raw AST, because I want to compile down to C. Not Assembly.

As of now, I have been successful, as OCaml allows us to write deeply nested mutually recursive code easily, however this has caused my design to be a monolithic blob of code.

For example, the type inference itself is [514 lines of recursive code (Github Link)]

While I’m okay with this for the type inference, this design has become somewhat prohibitive for more advanced semantic analysis, such as borrow checking, because this monolithic design,

  • Forces me to do all tests in a single go, and I can not run smaller passes through the AST.
  • Even if I want to implement smaller passes in future, I can not do it, because this design would prohibit it.

What can I do?

I do not want to rewrite the “walks” through the AST. I’m looking for a “visitors” pattern solution that will help me simplify my design.

I tried using the visitors library since it seemed promising, but since my AST is somewhat complicated, it does not work “out of the box”, as they say.

I’m currently looking at ppx_deriving library to see if I can use its simpler iter, fold methods to write a custom visitor for my AST.

Do you know of any other library or tutorial that I can use to write my own custom visitors?

Or any other sources that teaches how to simplify walks through ASTs in OCaml?

Thank You
~oxcrow

Did you consider some libraries for visitors pattern?
GT supports polymorphic variants
visitors is somewhat faster

A common trick to address the mess of mutually recursive functions (let rec type_fun ... and type_instr ... and type_expr ... and ....) is to use open recursion (type_fun takes type_instr and type_expr as argument and similarly for the rest). You then have one module outside these where you tie the knot with a let rec …. and. If the number of function makes it impractical, using open recursion at the level of modules is also an option (these become functors) and you tie the knot with a recursive module. You may need to rework your code so that one of your module on the cycle is safe. I like this approach since, locally, each function looks the way it should when writing functional code, with pattern matching and “recursive” calls, just each function in its own file.

3 Likes

I think if you truly want to visit every node (like when compiling), the visitor pattern is not that useful. Its use is more if you want to do something for every expression in your AST, but you don’t care about the rest of the nodes at all.

1 Like

I’m sorry to be unable to give you good answers to your questions. But I thought I should at least pass along the pointers that others gave me, when I asked similar questions. They pointed me at two things:

(1) in Scheme, there is a somewhat well-developed “nanopass” methodology for doing compilers as many small passes

(2) in Haskell, there’s “Scrap Your Boilerplate”

Both of these are attempts to get rid of the rote visitor boilerplate, so you can focus on the actual AST -work-. I’m not going to pretend I understood them well: I’m …. no longer “in the biz”, so when stuff gets tiresome, I just abandon it. So I didn’t get very deep into this stuff.

But it might be a place to go looking for ideas.

1 Like