Enumerate value definitions

What is a good way to enumerate the value definitions (implementations) of a compilation unit (module file)? For disambiguation, when a path is already given, the right kind of function is Env.find_value : Path.t -> t -> value_description. However, I would like to enumerate all the paths that define values in a structure (and its sub-modules). It would also be nice to know how to generate the subset of those paths that are available through the interface, from the .cmi or .cmti file.

Motivation: this is equivalent to providing a tag file that lets you “jump to definition”. Currently, using the Typedtree API (.cmt file), I can generate a reference, but an external reference points to the val_loc in the signature file. I would like to be able to point to the implementation as well.

If you want to do it by hand, you could walk through the list of structure_item's stored in the .cmt file looking for Tstr_value items, keeping track of the “current path”. The current path is initialized with the identifier of the top-level module, and updated each time you go under a Tstr_module. Of course, more work is required to handle include, etc.
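A minimal sketch of that walk, under loud assumptions: the types below are toy stand-ins that only borrow the Typedtree constructor names, not the real compiler-libs definitions (which carry much more data and differ between compiler versions). The point is only to show how the current path is threaded through the recursion.

```ocaml
(* Toy model, NOT the real Typedtree: each constructor keeps only the
   names needed to illustrate path tracking. *)
type structure_item =
  | Tstr_value of string list                  (* names bound by one let *)
  | Tstr_module of string * structure_item list (* sub-module and its items *)
  | Tstr_other                                  (* types, includes, etc. *)

(* [path] is the current module path, innermost component first.
   Returns the dotted path of every value definition found.
   List.concat_map needs OCaml 4.10 or later. *)
let rec values_of_structure path items =
  List.concat_map
    (function
      | Tstr_value names ->
          List.map (fun n -> String.concat "." (List.rev (n :: path))) names
      | Tstr_module (name, sub) ->
          (* going under a module extends the current path *)
          values_of_structure (name :: path) sub
      | Tstr_other -> [])
    items

(* Hypothetical compilation unit "Top" with one sub-module. *)
let example =
  [ Tstr_value [ "x" ];
    Tstr_module ("Sub", [ Tstr_value [ "f"; "g" ]; Tstr_other ]);
    Tstr_value [ "y" ] ]

let paths = values_of_structure [ "Top" ] example
```

With the real Typedtree the same shape applies, but the names would come from the patterns of each value binding, and include would need its own case.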

To know which of the values are exposed on the interface, you can play a similar game but walking through the signature_item's stored in the .cmti file, looking for Tsig_value and updating the current path each time you go under a Tsig_module.
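As a rough illustration of the interface side, here is the same game on a toy signature type. Again these are simplified stand-ins that borrow the Tsig_ names, not the real Typedtree types, and the defined list is a hypothetical result of walking the implementation.

```ocaml
(* Toy model, NOT the real Typedtree signature items. *)
type signature_item =
  | Tsig_value of string                        (* val declaration *)
  | Tsig_module of string * signature_item list (* sub-module signature *)
  | Tsig_other

(* Dotted paths of all values declared in the interface.
   [path] is the current module path, innermost component first. *)
let rec values_of_signature path items =
  List.concat_map
    (function
      | Tsig_value n -> [ String.concat "." (List.rev (n :: path)) ]
      | Tsig_module (name, sub) -> values_of_signature (name :: path) sub
      | Tsig_other -> [])
    items

(* Keep only the implementation paths that the interface also declares. *)
let exported ~defined ~declared =
  List.filter (fun p -> List.mem p declared) defined

(* Hypothetical interface of a unit "Top". *)
let declared =
  values_of_signature [ "Top" ]
    [ Tsig_value "x"; Tsig_module ("Sub", [ Tsig_value "f" ]) ]

(* Hypothetical paths found in the implementation of "Top". *)
let defined = [ "Top.x"; "Top.Sub.f"; "Top.Sub.g"; "Top.y" ]

let visible = exported ~defined ~declared
```

List.mem is linear, so for large units a Hashtbl or Set of declared paths would be the more sensible choice.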

I believe both merlin and ocp-index expose this functionality, so it may be worthwhile to go read their source. Also the annot functionality exposes “jump to definition” information if memory serves; you can look in the cmt2annot.ml file.



Thanks! I started with cmt2annot.ml. It uses Env.find_value, which gives the val_loc in the interface (signature) file. merlin is a bit hard to understand, as it needs to handle a more complicated usage model.

Suppose I walk the Typedtree of an implementation: the information I want should be in some env. The question is which one. Is str_final_env the right one to look at? I only care about externally accessible values. To enumerate the possible paths I could do as you suggested and look for Tstr_value. If I use str_final_env instead of the str_env of a particular structure_item, would things like include already be taken care of?

I’m not an expert, but I’m not sure you will be able to easily extract the information you want from Env.t. Environments are local entities that map identifiers (Ident.t) in scope at specific points of the program to different kinds of objects. As far as I remember it is not easy to get access to the “global” Path.ts through them.

That’s why I suggested walking through the Typedtree of the interface and manually keeping track of the current path, incrementally building up a mapping Path.t -> value_description. But again, I’m not an expert, so perhaps I’m missing something.


This is what I ended up doing. It turned out that I did not need to look for Tstr_value: the structure type of Typedtree contains a signature field, str_type. The Sig_value items of this list can be used to map each Ident.t to its val_loc, which is the actual location of the definition (implementation). And yes, I still need to track entering and exiting modules to form the path.
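For anyone following along, a minimal sketch of the idea on a toy model. The location record and the constructors below are simplified assumptions that only borrow the Sig_ names; the real Types.signature_item constructors and Location.t carry more information and vary by compiler version.

```ocaml
(* Toy model, NOT the real Types module. *)
type location = { file : string; line : int }

type signature_item =
  | Sig_value of string * location              (* name and its val_loc *)
  | Sig_module of string * signature_item list  (* sub-module signature *)
  | Sig_other

(* Build an association list from dotted path to definition location,
   tracking module entry and exit through the [path] argument
   (innermost component first). *)
let rec locations path items =
  List.concat_map
    (function
      | Sig_value (n, loc) -> [ (String.concat "." (List.rev (n :: path)), loc) ]
      | Sig_module (name, sub) -> locations (name :: path) sub
      | Sig_other -> [])
    items

(* Hypothetical str_type of a unit "Top" implemented in top.ml. *)
let str_type =
  [ Sig_value ("x", { file = "top.ml"; line = 1 });
    Sig_module ("Sub", [ Sig_value ("f", { file = "top.ml"; line = 4 }) ]);
    Sig_other ]

let table = locations [ "Top" ] str_type

(* "Jump to definition": look a path up in the table. *)
let jump path = List.assoc_opt path table
```

Here jump "Top.Sub.f" yields the location recorded for f in the implementation file, and an unknown path yields None.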