I’m trying to produce typed ASTs for all source files in an OCaml project (e.g. a library or git repo). The goal is to inspect these typed ASTs to learn how OCaml modules/libraries are used throughout a project. What I’d like to see:
- A map from each module function/method to the number of times it is applied in each source file.
- The line number of each function/method application.
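As a rough illustration of those two items, the per-file result could be a map from function name to the line numbers where it is applied, with the count derived from that list. This is only a sketch of the target data shape using the standard library; `StringMap`, `usage`, `record`, and `report` are hypothetical names chosen for illustration.

```ocaml
module StringMap = Map.Make (String)

(* Hypothetical result type: applied function name -> line numbers of its
   applications within one source file. *)
type usage = int list StringMap.t

(* Record one application of [name] at [line]. *)
let record (name : string) (line : int) (u : usage) : usage =
  StringMap.update name
    (function None -> Some [ line ] | Some lines -> Some (line :: lines))
    u

(* Flatten to (name, number of applications, sorted line numbers),
   ordered by name. *)
let report (u : usage) : (string * int * int list) list =
  StringMap.fold
    (fun name lines acc ->
      (name, List.length lines, List.sort compare lines) :: acc)
    u []
  |> List.rev
```

Keeping the line numbers and deriving the count from them means a single traversal of a typed AST would be enough to fill both items.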
I tried several approaches and failed.
1 Write a simple parsing program using compiler-libs
Initially, I thought to use the functions in compiler-libs.common to parse the source files and produce typed ASTs. The parsing program’s source:
let lexbuf = Lexing.from_channel @@ open_in "test.ml"
let impl = Parse.implementation lexbuf
let typed_ast =
  Typemod.type_toplevel_phrase
    Env.initial_safe_string
    impl
It parses a simple example program, test.ml:
let () =
  List.iter print_endline ["a";"b";"c";]
This fails with an exception:
Exception:
Typetexp.Error
({Location.loc_start =
{Lexing.pos_fname = ""; pos_lnum = 2; pos_bol = 9; pos_cnum = 11};
loc_end = {Lexing.pos_fname = ""; pos_lnum = 2; pos_bol = 9; pos_cnum = 20};
loc_ghost = false},
<abstr>, Typetexp.Unbound_module (Longident.Lident "List")).
I’m assuming this is because the given (empty) Env doesn’t contain Stdlib or any other libraries/implementations. I couldn’t find how to create/populate the Env value with module locations, etc.
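One possible way to get a populated environment, offered as a sketch under assumptions rather than a confirmed fix: compiler-libs.common also has a `Compmisc` module, whose `init_path` sets up the load path (including the standard library directory) and whose `initial_env` returns an environment with `Stdlib` opened. The sketch assumes OCaml 4.09 or later, since earlier releases give `init_path` a different signature, and `typecheck_file` is a hypothetical helper name. The result of `Typemod.type_toplevel_phrase` is returned as-is because the shape of that tuple differs between compiler versions.

```ocaml
(* Sketch: type-check one file against an environment that knows Stdlib.
   Link against compiler-libs.common. Assumes OCaml >= 4.09. *)
let typecheck_file (path : string) =
  (* Initialise the load path; directories listed in Clflags.include_dirs
     are taken into account, followed by the standard library directory. *)
  Compmisc.init_path ();
  (* Initial environment with the standard library opened. *)
  let env = Compmisc.initial_env () in
  let ic = open_in path in
  let ast =
    Fun.protect
      ~finally:(fun () -> close_in ic)
      (fun () ->
        let lexbuf = Lexing.from_channel ic in
        (* Record the file name so reported locations are not anonymous. *)
        Location.init lexbuf path;
        Parse.implementation lexbuf)
  in
  (* The first component of the result is the Typedtree.structure. *)
  Typemod.type_toplevel_phrase env ast
```

For code that uses other libraries, the directories holding their .cmi files would presumably have to be appended to `Clflags.include_dirs` before calling `Compmisc.init_path`.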
Ultimately it doesn’t matter, because it highlights that the parsing program will need to know about all the implementations and libraries used by a particular project before it can be useful. This likely means inspecting dune and/or .merlin files and installing and loading some packages/libs.
2 Use annotation files
I then saw that ocamlc and ocamlopt support the flags:
- -dparsetree: Prints each file’s AST to stdout only. I’d have to use the rather ugly approach of redirecting stdout to a file and re-ingesting it from that file.
- -annot: Produces a .annot file along with the other files during compilation. This file appears to contain the data I need. If I could generate a .annot file for each source file in the project during the dune build, I could parse them all and use the data therein.
Unfortunately dune doesn’t seem to support using the -annot compiler flag (it doesn’t error, but it also doesn’t produce any .annot files). Maybe someone can shed light on whether I should raise this as an issue?
Using ocamlbuild alone would take some effort, as I’d need to append the correct packages from the dune file with -pkgs. Let’s see what people say about dune’s -annot support before I consider this approach.
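A related route that may be worth checking, stated here as an assumption rather than something verified for this project: the compilers can also write binary annotation files (.cmt) under -bin-annot, and dune normally passes that flag, leaving the .cmt files somewhere under _build. If that holds, the typed AST could be read back with `Cmt_format.read_cmt` and walked with `Tast_iterator` from compiler-libs.common. The sketch below assumes OCaml 4.09 or later; `applications_of_cmt` is a hypothetical helper, and it only reports applications whose head is a plain identifier.

```ocaml
(* Sketch: list (function path, line number) for every application found in
   a .cmt file. Link against compiler-libs.common. Assumes OCaml >= 4.09. *)
let applications_of_cmt (cmt_path : string) : (string * int) list =
  let found = ref [] in
  let default = Tast_iterator.default_iterator in
  let expr (sub : Tast_iterator.iterator) (e : Typedtree.expression) =
    (match e.Typedtree.exp_desc with
     | Typedtree.Texp_apply
         ({ Typedtree.exp_desc = Typedtree.Texp_ident (path, _, _);
            exp_loc; _ }, _) ->
       (* Record the resolved path and the line where the function occurs. *)
       let line = exp_loc.Location.loc_start.Lexing.pos_lnum in
       found := (Path.name path, line) :: !found
     | _ -> ());
    (* Keep descending so nested applications are visited as well. *)
    default.Tast_iterator.expr sub e
  in
  let iterator = { default with Tast_iterator.expr = expr } in
  let cmt = Cmt_format.read_cmt cmt_path in
  (match cmt.Cmt_format.cmt_annots with
   | Cmt_format.Implementation structure ->
     iterator.Tast_iterator.structure iterator structure
   | _ -> () (* interfaces, packed modules and partial results are skipped *));
  List.rev !found

let () =
  match Sys.argv with
  | [| _; cmt_path |] ->
    List.iter
      (fun (name, line) -> Printf.printf "%s\tline %d\n" name line)
      (applications_of_cmt cmt_path)
  | _ -> prerr_endline "usage: list_apps FILE.cmt"
```

Run on the .cmt produced for the test.ml example above, this would be expected to report the List.iter application on line 2. The .cmt file must come from the same compiler version as the compiler-libs the program is linked against.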
3 Use merlin
After some further thought and investigation, I figured merlin must already be parsing entire projects and constructing typed ASTs for the various tasks it performs.
ocamlmerlin doesn’t appear to have a command that outputs the full AST. The description of ocamlmerlin -server outline seemed hopeful, but it doesn’t output anything useful for my example.
I finally spent some time reading the merlin source to understand how merlin works internally. I hoped to find a point where I could branch off into my own logic that uses the typed AST. I didn’t find that point, so here I am.