Producing and using typed ASTs for all source files in a project

I’m trying to produce typed ASTs for all source files in an OCaml project (e.g. a library or git repo). The goal is to inspect these typed ASTs to learn how OCaml modules/libraries are used throughout a project. What I’d like to see:

  • Map of module functions/methods associated with number of applications throughout each source file.
  • Line numbers of each function/method application.
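
To make the target concrete, here is a minimal sketch of the summary I have in mind, using only the standard library. The `application` record and the `tally` and `report` functions are hypothetical names for illustration; how the list of applications gets extracted from the typed AST is a separate problem.

```ocaml
(* Hypothetical record: one function application found in a source file. *)
type application = { func : string; line : int }

module StringMap = Map.Make (String)

(* Group applications by function name; each entry holds the line numbers
   of that function's applications, sorted in ascending order. *)
let tally (apps : application list) : int list StringMap.t =
  let add acc { func; line } =
    StringMap.update func
      (function None -> Some [ line ] | Some ls -> Some (line :: ls))
      acc
  in
  List.fold_left add StringMap.empty apps
  |> StringMap.map (List.sort compare)

(* Print one line per function: its name, application count and lines. *)
let report (m : int list StringMap.t) : unit =
  StringMap.iter
    (fun func lines ->
      Printf.printf "%s: %d application(s) at line(s) %s\n" func
        (List.length lines)
        (String.concat ", " (List.map string_of_int lines)))
    m
```

The number of applications is just the length of each line list, so a single map covers both bullet points.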

I tried several approaches and failed.

1 Write a simple parsing program using compiler-libs

Initially, I thought to use the functions in compiler-libs.common to parse the source files and produce typed ASTs. The parsing program’s source:

let lexbuf = Lexing.from_channel @@ open_in "test.ml"
let impl = Parse.implementation lexbuf
let typed_ast =
  Typemod.type_toplevel_phrase
    Env.initial_safe_string
    impl

It parses a simple example program, test.ml:

let () =
  List.iter print_endline ["a";"b";"c";]

This fails with an exception:

Exception:
Typetexp.Error
 ({Location.loc_start =
    {Lexing.pos_fname = ""; pos_lnum = 2; pos_bol = 9; pos_cnum = 11};
   loc_end = {Lexing.pos_fname = ""; pos_lnum = 2; pos_bol = 9; pos_cnum = 20};
   loc_ghost = false},
 <abstr>, Typetexp.Unbound_module (Longident.Lident "List")).

I’m assuming this is because the given (empty) Env doesn’t contain Stdlib or any other libraries/implementations. I couldn’t find how to create/populate the Env value with module locations, etc.

Ultimately it doesn’t matter, because this highlights that the parsing program would need to know about every implementation and library a project uses before it could be useful. That likely means inspecting dune and/or .merlin files and installing and loading the relevant packages/libs.
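
For reference, one possible way to get an environment that knows about the standard library is to let compiler-libs build the load path itself. The sketch below assumes OCaml 4.10 or later and linking against compiler-libs.common; these signatures have changed between compiler releases, so treat it as a starting point rather than a definitive recipe. `type_file` and its `extra_dirs` parameter are hypothetical names.

```ocaml
(* Requires compiler-libs.common. Assumes OCaml 4.10 or later; these
   signatures have changed between compiler releases. *)

(* Type-check one implementation file in an environment where the
   standard library is visible. [extra_dirs] is a hypothetical parameter
   listing directories that hold the .cmi files of other libraries the
   file depends on. *)
let type_file ?(extra_dirs = []) (path : string) =
  Clflags.include_dirs := extra_dirs @ !Clflags.include_dirs;
  (* Build the load path from Clflags.include_dirs plus the stdlib dir. *)
  Compmisc.init_path ();
  let env = Compmisc.initial_env () in
  let ic = open_in path in
  Fun.protect
    ~finally:(fun () -> close_in ic)
    (fun () ->
      let lexbuf = Lexing.from_channel ic in
      (* Record the file name so reported locations are meaningful. *)
      Location.init lexbuf path;
      let impl = Parse.implementation lexbuf in
      Typemod.type_toplevel_phrase env impl)
```

The result is a tuple whose first component is the Typedtree.structure; the remaining components differ between compiler versions, so they are not destructured here. The directories for `extra_dirs` still have to come from somewhere, which is the dune/.merlin problem described above.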

2 Use annotation files

I then saw that ocamlc and ocamlopt support the flags:

  • -dparsetree - Prints each file’s AST to stdout only, so I’d have to use the rather ugly approach of redirecting stdout to a file and re-ingesting it from there.
  • -annot - Produces a .annot file alongside the other files during compilation. This file appears to contain the data I need. If I could generate a .annot file for each source file in the project during the dune build, I could parse them all and use the data therein.

Unfortunately dune doesn’t seem to support using the -annot compiler flag (it doesn’t error but it also doesn’t produce any .annot files). Maybe someone can shed light on whether I should raise this as an issue?

Using ocamlbuild alone would take some effort as I’d need to append the correct packages from the dune file with -pkgs. Let’s see what people say about dune’s -annot support before I consider this approach.

3 Use merlin

After some further thought and investigation, I figured merlin must already be parsing the sources of entire projects and constructing typed ASTs for the various tasks it performs.

ocamlmerlin doesn’t appear to have a command that outputs the full AST. The description of ocamlmerlin -server outline seemed hopeful but it doesn’t output anything useful for my example.

I finally spent some time reading the merlin source to understand how merlin itself works internally. I hoped to find a point where I could diverge to my own logic that can use the typed AST. I didn’t find that point so here I am.

You may read the .cmt files with the module Cmt_format, which is shipped with compiler-libs (for example, by calling Cmt_format.read filename). The .cmt files contain the typed tree (in the cmt_annots field of the cmt_infos record) and subsume the .annot files; this is where the information processed by merlin comes from.
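
A minimal sketch of that, assuming the program is linked against compiler-libs.common. It uses Cmt_format.read_cmt, which returns the cmt_infos record directly; `structure_of_cmt` is a hypothetical helper name.

```ocaml
(* Requires compiler-libs.common. *)

(* Return the typed tree stored in a .cmt file, or None when the file
   holds something other than a complete implementation (an interface,
   a partial tree, ...). *)
let structure_of_cmt (path : string) : Typedtree.structure option =
  let infos = Cmt_format.read_cmt path in
  match infos.Cmt_format.cmt_annots with
  | Cmt_format.Implementation str -> Some str
  | _ -> None
```

Note that a .cmt file can only be read by compiler-libs from the same compiler version that produced it.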

Thanks, I was able to use Cmt_format to find the info I need.

Traversing the entire tree recursively in search of all function applications (i.e. Texp_apply) is quite laborious. Is there a library that can simplify this traversal for me (e.g. Given a Typedtree.structure, return all expressions of type Texp_apply)?
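
One option that needs no extra library is Tast_iterator from compiler-libs, which, as far as I know, ships with OCaml 4.09 and later. The sketch below overrides only the expression case and records applications whose head is a plain identifier; `applications` is a hypothetical name, and applications of anything else (e.g. the result of another application) are skipped.

```ocaml
(* Requires compiler-libs.common; assumes Tast_iterator is available. *)

(* Collect (function path, line number) for every application whose
   head is a plain identifier, e.g. List.iter in [List.iter f xs]. *)
let applications (str : Typedtree.structure) : (string * int) list =
  let found = ref [] in
  let expr (self : Tast_iterator.iterator) (e : Typedtree.expression) =
    (match e.Typedtree.exp_desc with
     | Typedtree.Texp_apply
         ({ Typedtree.exp_desc = Typedtree.Texp_ident (path, _, _); _ }, _) ->
       let line =
         e.Typedtree.exp_loc.Location.loc_start.Lexing.pos_lnum
       in
       found := (Path.name path, line) :: !found
     | _ -> ());
    (* Keep descending so nested applications are visited too. *)
    Tast_iterator.default_iterator.Tast_iterator.expr self e
  in
  let iter = { Tast_iterator.default_iterator with Tast_iterator.expr } in
  iter.Tast_iterator.structure iter str;
  List.rev !found
```

Because the default iterator handles every other node, only the one case of interest has to be written by hand.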

Unfortunately I still have the same issue with dune not respecting/using the -bin-annot compiler flag (i.e. no .cmt files are produced).

Never mind the dune issue! I hadn’t noticed that the binary annotation files were actually present in the _build directory:

$ find ./ -iname "*.cmt"
./_build/default/.test.eobjs/byte/test.cmt
./_build/default/.test.eobjs/native/test.cmt
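
The same search can be done from OCaml with the standard library alone, which is handy for feeding every .cmt file to an analysis program. This is a sketch assuming OCaml 4.10 or later (for List.concat_map); `find_cmt_files` is a hypothetical name.

```ocaml
(* Recursively list the files under [dir] whose name ends in ".cmt".
   Entries that cannot be inspected (e.g. dangling symlinks) are skipped. *)
let rec find_cmt_files (dir : string) : string list =
  Sys.readdir dir
  |> Array.to_list
  |> List.concat_map (fun name ->
         let path = Filename.concat dir name in
         match Sys.is_directory path with
         | exception Sys_error _ -> []
         | true -> find_cmt_files path
         | false -> if Filename.check_suffix name ".cmt" then [ path ] else [])
```

As the output above shows, dune keeps a byte and a native copy of each .cmt file, so a caller will probably want to keep only one of the two.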

You should be able to derive a visitor for Typedtree.structure that collects Texp_apply nodes (produced either by visitors or by ppxlib_traverse). The override library that I announced today aims to let you use such derivers on types that are defined in other modules. Deriving a collector for Texp_apply is still quite laborious, since there are some name clashes to resolve and I don’t know of a way to make some types opaque with ppxlib_traverse. Still, here is a solution: the module Typedtree_collect_texp_apply uses ppxlib_traverse to provide two functions:

val collect_texp_apply_from_structure
  : Typedtree.structure -> texp_apply list

val collect_texp_apply_from_binary_annots
  : Cmt_format.binary_annots -> texp_apply list

Cool, thanks for referencing back to my question. I’ll see if I can make use of override as you mentioned.