How do dune libraries and library dependencies work?

It took me a while to learn this, but I now understand that “library” is an ocaml term, not just a dune term, and that dune turns a library stanza into a library file, also called an archive file, which according to the ocaml compiler man page ends with the extension .cmxa.

I’d like to understand a few things about library dependencies “below” the level of dune itself, i.e. without the abstraction dune provides for letting libraries use external libraries. I can’t tell how much of this is dune itself and how much is just a thin dune layer on top of plain opam/ocaml/ocamlopt stuff.

  1. Is the name of the archive file relevant to ocaml code at all in terms of modules? i.e. does the archive file itself implicitly define the top-level module that dune makes available? Or is the top-level module in the archive file? I imagine it’s the latter.

  2. When dune turns a library stanza into a top-level module in an opam package it builds, that top-level module is then available in a consuming dune project’s library or executable whose libraries clause specifies the consumed library’s public_name, which must begin with its opam package name. Is this public_name lookup specific to using modules from dune-built projects? Or could I specify a non-dune opam package’s module in public_name to use it in a dune project? If not, how does one use a non-dune package’s modules in a dune project?

  3. I imagine dune uses a dune-built opam package’s dune-package file to resolve what modules are available and how to make them available in compiled ocaml code. Is there a lower-level way to inspect what top-level modules are available in an installed opam package? ocamlobjinfo looks promising but it’s not clear to me how to interpret its output in terms of how to use a library file’s modules.

Thanks for any help.

The latter. As far as the compiler is concerned, a library is an arbitrary collection of modules.

No, any package that has Findlib metadata (META) can appear in the libraries field as long as it is in the “search path” of Dune.

Correct. dune-package is basically a richer version of the Findlib metadata file META, and Dune uses it whenever it is available. If dune-package is not available (i.e. for packages not built by Dune), Dune does its best with just the META file.

If you look in the installation directory of your package, every .cmi file corresponds to a top-level module. You should ignore the files with __ in their names. The double underscore is an artifact of Dune’s implementation of wrapped libraries; these files correspond conceptually to the submodules of the (only) top-level module of the wrapped library.

Cheers,
Nicolas

Thank you. This is so clarifying. Although it’s lower level, I think including this information in the dune manual might ease the path for newcomers to ocaml - it would have for me - because it demystifies things and empowers me to debug my dev environment as I move beyond hello world or a very basic dream app. Most of the tutorials start with ocamlopt and then introduce ocamlfind and then dune, but mostly omit how each level of abstraction maps to the one beneath it. I think some explanation of that could have a step-change impact on newcomers’ learning experience. Maybe I just missed that information in the usual startup materials, but I found your reply incredibly empowering. Thanks!

A couple other questions:

  1. Is the .cmxa file involved in compilation at all then? The dune manual says that a library stanza is turned into an archive file of the same name, which I thought meant that the generated archive file is central to its use as a dependency. Is that the case? Or is it just the sibling .cmi file with the same name that’s involved?

  2. What is the incantation to compile an ocaml module using a .cmxa file as a dependency, using either ocamlopt or ocamlfind? Or is that just not what it’s used for? What is the .cmxa file for?

Since you asked, let me try to explain the full compilation pipeline, starting from the simplest case, which is that of bytecode:

  • An interface file foo.mli produces foo.cmi with ocamlc -c foo.mli. An implementation foo.ml produces foo.cmo with ocamlc -c foo.ml. If the .mli file does not exist, then this command also produces foo.cmi in addition to foo.cmo. The file foo.cmi contains the signature of the module Foo. The file foo.cmo contains the raw bytecode of Foo.

  • If a file bar.ml depends on the module Foo, then its compilation command ocamlc -c bar.ml depends on foo.cmi. But it does not depend on foo.cmo. This is what is known as separate compilation: the .ml files can be compiled independently once the .cmi’s have been produced.

  • You can put the modules Foo and Bar together in a single library mylib.cma, which is an arbitrary collection of modules. You do this with ocamlc -a -o mylib.cma foo.cmo bar.cmo. Note that this does not involve the .cmi files at all; it is just a way to bundle a bunch of .cmo object files into a single file.

  • Finally, if you want to build an executable from a source file main.ml which uses the modules from the library mylib.cma (i.e. Foo and Bar), then you first produce main.cmo with the compilation command ocamlc -c main.ml, similarly to what was done with foo.ml and bar.ml. As mentioned, this step will require foo.cmi and bar.cmi if the modules Foo and Bar are referenced inside main.ml. You then link everything together into an executable using the command ocamlc -o main.exe mylib.cma main.cmo. This step links all the bytecode together into a single file and does not require any .cmi file.

In short, .cmi files are only used for compilation; .cmo and .cma files are only used at linking time. If you have used C, one could say that .cmi files are like .h headers, .cmo files are like .o object files, and .cma files are like .a archives. The compilation flow is actually pretty similar, except in a safe language :)

The pipeline for native code is exactly the same, except that ocamlc is replaced by ocamlopt, .cmo files are replaced by .cmx files and .cma files are replaced by .cmxa files. There is only one difference in the compilation flow which is that the native-code compiler performs inlining across modules. To do that, the compiler needs access to the (already compiled) functions from any referenced module, which are stored in their .cmx files. Accordingly, these files are needed in addition to the .cmi files during compilation (as opposed to only during linking, as in the case of bytecode).

Concretely, using the same module names that we used above, if you produce bar.cmx from bar.ml with the compilation command ocamlopt -c bar.ml and this file references the module Foo, then this command will require foo.cmx in addition to foo.cmi for cross-module inlining. You can disable cross-module inlining by passing -opaque to the compiler, in which case the last command will not require foo.cmx anymore and foo.cmi alone will suffice.

Hope that helps!

Cheers,
Nicolas

Thank you. This is a goldmine of information on how it all works.

I’m currently restructuring the Dune manual and including explanation guides for things like this. One document I added to the next version is a tour of the OCaml ecosystem, which is an attempt to explain the role of the compiler, findlib, and opam, as well as the relation between modules, libraries, and packages. Do you find this document useful?

Yes, thank you. That page is a great step in the direction I have in mind, but it still requires the reader to go to many other places to get a working understanding of what the page describes. I’m busy writing other things for a couple of weeks, but I plan to take a crack at a page with what I have in mind. I think I understand enough of the low-level details to write that. I’m thinking of a page that explains the build process, including how source files and modules map to target files and modules, then how that maps to an opam package’s contents, then maybe what ocamlfind does (I don’t see that as too important if the goal is to use dune), and then the structure dune imposes and how its stanzas map to those lower-level things. Then the abstractions are no longer opaque magic but transparent convenience.

Great! That would indeed make a great explanation guide. I created a doc request on the issue tracker to discuss this.