Refactoring tools for OCaml? Type-based refactoring?


What are the current tools to help refactor OCaml code?

I have a very heavy refactoring where I previously defined an AST like
type expr =
| Int of int
| String of string
| Plus of expr * expr

and I need to add some extra information at each node, like
type expr = { e: expr_kind; id: int }
and expr_kind =
| Int of int
| String of string
| Plus of expr * expr

That means old code like
match e with
| Int i -> …
must become
match e.e with
| Int i -> …

and each time I return an expression, I must now encapsulate it with a constructor function.
For example
match stuff with
| Whatever -> Int 2

must now become
match stuff with
| Whatever -> mke (Int 2)

where let mke ekind = { e = ekind; id = gensym() }
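
To make the target shape concrete, here is a minimal self-contained sketch of the new representation. The question does not show gensym, so the counter-based version below is an assumption, and eval is a hypothetical consumer added only to show the `match e.e with` pattern.

```ocaml
type expr = { e : expr_kind; id : int }
and expr_kind =
  | Int of int
  | String of string
  | Plus of expr * expr

(* Assumption: gensym is a simple global counter; the original is not shown. *)
let counter = ref 0
let gensym () = incr counter; !counter

(* Constructor function from the question: wraps a kind with a fresh id. *)
let mke ekind = { e = ekind; id = gensym () }

(* Hypothetical consumer: old code matched on [e], new code matches on [e.e]. *)
let rec eval (e : expr) : int option =
  match e.e with
  | Int i -> Some i
  | String _ -> None
  | Plus (a, b) ->
    (match eval a, eval b with
     | Some x, Some y -> Some (x + y)
     | _ -> None)

(* Every place that used to return a bare constructor now goes through mke. *)
let two = mke (Int 2)
let sum = mke (Plus (two, mke (Int 3)))
```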

There are lots of files involved, so I’d rather use a tool that can help automate some of it.

Right now the only solution I see is to abuse the LSP server to identify all
the places programmatically and combine that with some emacs kungfu (macros)
to help automate most of it (and then rely on the typechecker to manually tune things).

There’s the rotor tool, which was demoed at last year’s OCaml workshop and looked very promising (API migration: compare transformed (OCaml 2020) - ICFP 2020).

However, it only supported old versions of OCaml when I tried it, and there doesn’t seem to have been any recent activity.

It looks mostly restricted to renaming though? Could it perform the refactoring
on ASTs I presented above?

I know of nothing that you could use. Internally at Jane Street, @ceastlund has built some tools that do what you suggest: basically run a build, observe build errors, and apply transformations based on the returned errors. But I don’t know of any generally available versions of said tools.
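
As a rough illustration of the "observe build errors" step (not the Jane Street tooling, which is not public), the sketch below parses the location header that the OCaml compiler prints before an error. The record type and function name are hypothetical, and only the single-line `File "...", line N, characters A-B:` form is handled.

```ocaml
(* Hypothetical record for one reported error location. *)
type err_loc = { file : string; line : int; col_start : int; col_end : int }

(* Parse a header such as:  File "foo.ml", line 12, characters 4-9:
   Returns None for any line that does not have this shape.
   Multi-line locations ("lines 3-5, ...") are deliberately not handled. *)
let parse_error_loc (s : string) : err_loc option =
  match
    Scanf.sscanf s "File %S, line %d, characters %d-%d"
      (fun file line col_start col_end -> { file; line; col_start; col_end })
  with
  | loc -> Some loc
  | exception (Scanf.Scan_failure _ | Failure _ | End_of_file) -> None
```

A driver would then group these locations by file and decide, per error, which textual change to apply before rebuilding.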

As others have said, I don’t think there is anything publicly available for this kind of thing. When we have had to make large-scale refactorings like this at LexiFi, we have sometimes written ad-hoc automatic tools to help us. It is not as hard as one would think :)

The easiest case is when the transformation can be phrased in purely syntactic terms. In this case, you can write a tool using compiler-libs that parses each file, obtaining the corresponding Parsetree. Then you perform two passes. First, you walk through the Parsetree, keeping track of the places where you need to change something: in your situation, it would be the match scrutinee and the final expression of each match case. Second, you do a “rewriting” pass where you use the information that you obtained in the previous pass to textually insert the new code fragments in the original source files.
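
The second, “rewriting” pass can be sketched with the standard library alone. This is only an illustration under stated assumptions: the edit type and apply_edits are hypothetical names, and the offsets are hard-coded here, whereas a real tool would take them from the Parsetree locations collected in the first pass (for example the pos_cnum of loc_start and loc_end).

```ocaml
(* Hypothetical edit: insert [text] at byte [offset] of the original source. *)
type edit = { offset : int; text : string }

(* Apply all insertions in one left-to-right sweep over the original string.
   Offsets always refer to the original source, so edits do not shift each other.
   Assumes every offset lies within 0 .. String.length src. *)
let apply_edits (src : string) (edits : edit list) : string =
  (* Stable sort: edits at the same offset keep their list order. *)
  let sorted = List.stable_sort (fun a b -> compare a.offset b.offset) edits in
  let buf = Buffer.create (String.length src + 64) in
  let last =
    List.fold_left
      (fun pos { offset; text } ->
         (* Copy the untouched source between the previous edit and this one. *)
         Buffer.add_substring buf src pos (offset - pos);
         Buffer.add_string buf text;
         offset)
      0 sorted
  in
  Buffer.add_substring buf src last (String.length src - last);
  Buffer.contents buf

(* Scrutinee [e] ends at offset 7; the case body spans offsets 24 to 35. *)
let src = "match e with | Int i -> Int (i + 1)"
let edits =
  [ { offset = 7; text = ".e" };
    { offset = 24; text = "mke (" };
    { offset = 35; text = ")" } ]
let rewritten = apply_edits src edits
```

Inserting text rather than pretty-printing the modified Parsetree leaves comments and formatting in the original files untouched.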

If the transformation cannot be phrased in purely syntactic terms, then you need to combine it with reading the .cmt files which contain the type-annotated Typedtree in order to find the places that need rewriting, and it is a bit more involved, but the overall logic remains the same.

Of course, it is so easy to do this kind of refactoring “by hand” just by following the compiler errors that the investment of writing this kind of tool is only worth it if you have a large codebase to refactor. It was worth it for us at LexiFi (~600k LOC), but for more reasonably-sized codebases the tradeoff may be less clear.



Even a tool to detect and remove dead code from sources would be useful.
I.e. you want to release some code from a prototype: everything not reachable
from main() should be removed.
For the mental sanity of people who will try to read the code later…

For global dead code elimination (but note this is not exactly the same as “everything not reachable from main() should be removed”), I know of