I don’t know why, but it really bugs me when I have to create a .mli file to introduce a signature to a .ml module. Perhaps it disrupts my flow while I’m coding. Or simply having to flip back and forth between two files. I’d much prefer if I could add and view the signature at the top of a .ml module.
Has anybody done any work or proposals in this area? I might be the only one who cares. I was thinking something along the lines of …
(* New anonymous module type to add a signature
to the module you are editing.
Must be at top of file. *)
module type _ = sig
  type t
  val create : unit -> t
end
(* The above constrains the module
structure that follows it. *)
type t = SomethingPrivate
let create () = SomethingPrivate
open struct
  let a_private_function x = x + 1
end
let y = a_private_function 10
But sometimes, that can lead to some type anchoring issues:
open struct type t = A end
let x = A
Here, type t (and its constructor A) is present in the current scope, but as it is not exported, it is impossible to type x correctly. If the module had a signature, it would be easy to realise that there is no acceptable type for x and that we would either have to change the opening directive or not export x at all. This is a problem known as type anchoring (and is documented here: https://inria.hal.science/hal-03526068/file/main.pdf)
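For contrast, here is a minimal sketch of an `open struct` use that avoids the anchoring problem, because nothing exported mentions the private type. The names `state`, `label` and `labels` are made up for illustration, and the remark about the rejected binding reflects my understanding of the compiler's behaviour rather than a quoted error message.

```ocaml
(* Private helpers: neither [state] nor [label] is exported. *)
open struct
  type state = Idle | Busy

  let label = function
    | Idle -> "idle"
    | Busy -> "busy"
end

(* Fine: the exported value has type [string list], which does not
   mention the private type [state]. *)
let labels = [ label Idle; label Busy ]

(* By contrast, uncommenting the next line is expected to be rejected,
   since [current] would need the type [state], which has no name
   outside the [open struct] (assumption: this is the anchoring error
   described above). *)
(* let current = Idle *)
```

Only values whose types can be written without the hidden names can be exported this way.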
But to be honest, I am in favour of having an mli (because of modular compilation, because it is nice for documentation, and because it makes encapsulation and type abstraction “easy”).
Your first form include (struct ... end : sig ... end) seems “clean” (it should be functionally equivalent since it does not introduce any new module type, value, etc). In fact, that would be trivial to write up in a PPX, although for reasons mentioned below I won’t.
modular compilation: As a thought experiment, I can imagine a transformation of a .ml file into its true .ml and .mli representation based purely on the AST, if the signature embedding is standardized (ex. I suggested module type _ = sig ... end (* rest of module *)). And since that transformation exists, that means that the modular compilation feature can be done without a .mli. Is my thinking wrong?
If my thinking isn’t wrong, I may allow embedded signatures in MlFront in the future. Currently MlFront makes a symlink from a module A.ml to its fully-qualified module path (ex. Something__A.ml). It seems easy enough to break A.ml into Something__A.mli and Something__A.ml while scanning the AST for module dependency analysis. Supporting embedded signatures would obviously be better if it were in the compiler itself, but at least in my head the interim solution sounds ok.
I’m skipping discussion of a PPX solution since a PPX would make it difficult to distinguish between signatures and structures during dependency analysis. Keeping signatures/structures separate should allow me to infer requires/exports without user intervention: Proposal: a new `exports` field in `findlib` META files
Incidentally, in LexiFi’s fork of the OCaml compiler we had this extension for a long time until a few years ago (we called it “inline signatures”):
[%%sig: A] B
which would be translated as include (struct B end : sig A end). This was used mainly for plugins which could then consist of a single file, which in turn simplified their compilation on the client machine.
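To make that translation concrete, here is a hand-written sketch of what the expansion looks like in plain OCaml today. The contents (`t`, `create`, `bump`, `to_int`, and the hidden helper `next`) are invented for illustration; only the `include (struct ... end : sig ... end)` shape comes from the description above.

```ocaml
include (
  struct
    type t = int

    let create () = 0

    (* Not listed in the signature below, so hidden from later code. *)
    let next x = x + 1

    let bump = next
    let to_int x = x
  end :
  sig
    type t (* abstract from here on *)

    val create : unit -> t
    val bump : t -> t
    val to_int : t -> int
  end)

(* Code after the [include] only sees what the signature exposes. *)
let one = to_int (bump (create ()))
```

After the `include`, `t` is abstract and `next` is out of scope, which is the same effect a separate .mli would have on clients of the module.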
Also: it is more efficient. With a separate .mli, if you modify the .ml, you don’t need to recompile any other module (if you are compiling in -opaque mode, aka Dune’s dev profile).
A nice possibility with having the .mli inlined is that ocamlc -i could be updated/extended to understand it - so in theory having started with a unified .mli/.ml it’d be quick to split them (that would also allow installing the “.mli”). Also…
… you wouldn’t lose this as much, since the build system could always extract the .mli prior to compilation (and consequently only recompile the .cmi on actual interface changes).
There are likely also a few usability advantages that people do not see. For example, once you start documenting your signatures (because we all do, right?) I’m not too keen on having to skip a wall of doc strings and explanations to get to the implementation of the interface.
Even without documentation, a long interface will cause the same issue, and indeed it harms readability. In fact, most of the uses we had internally of this construction were for the case of the empty signature…
Hah, I’m not really advocating for it, but there’d be no reason from the compiler’s perspective to restrict where the inlined mli should go (so it could be at the end of the file)
The thread turned far away from the original question. Yes, some people will find .ml modules, math proofs and other documents … things written in topological order … highly readable. Good for those people. I am not in that camp.
Other people will find documents that deviate from the scientific format hard to read … the format that starts with an abstract first, followed by structured exposition, gathered together in a single document. I am in that camp. I especially find it hard to read topologically sorted documents and must skip to the end of the code and read backwards (ie. skip a wall of code) to understand it.
Part of this thread makes me laugh since it reminds me of Gulliver’s Travels, where people are fighting about which side of the egg to crack first (the big end or the small end?). But there is a moderate usability issue underneath that I now have some good ideas how to tackle. Thanks.
I find that using a modal editor (Vim + the vim-ocaml plugin) completely eliminates any feelings of disruption when navigating between files. Toggling between a ml and an mli feels much smoother and faster than jumping back and forth between rows within one file. I’d rather just hit two keystrokes to auto-open the mli buffer than have to manually find and mark any rows with documentation that I want to be able to reference.
I say this not to just defend the status quo (or any particular editor), but maybe suggest that editor tooling could be improved across the board to make this experience smoother for everyone? Likewise, if people started a convention of including the signature within a single ml file, I would want my editor to be able to navigate to and from that signature just as smoothly as it currently does with two files.
IIRC this was also eventually added to the OCaml VSCode extension (and was supported in caml-mode ever since I started programming in OCaml, which is… a very long time ago). So I don’t think anyone has been left behind there.
One thing I wished for a long time though is the ability for merlin to follow up and switch between .mli and .ml when locating rather than end up with “already at definition point”, see this issue.
My two cents. I like having separate .mli file for all the reasons people said — it is good for documentation and setting expectations about what a module does, without forcing people to slog through the actual implementation.
There is, however, a very real problem that leads users to dislike having to write .mli files, which is that you have to senselessly repeat things in both the .mli and .ml file that will in any well-typed ascription ultimately be identical. For example, definitions of module types, datatypes, records, etc. all must be repeated, and this can get painful. One way around this is to put those things in another .ml file that lacks a signature, and then just open that module in both the .mli and .ml file, but I think that decreases the utility of the .mli file as a convenient and user-friendly vehicle for documentation and expectations about code.
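A rough single-file sketch of that workaround, with nested modules standing in for the separate files. `Shapes_types` plays the role of the signature-less .ml file, and the signature and structure of `Shapes` play the roles of the .mli and .ml; all of these names are hypothetical.

```ocaml
(* Stand-in for a shapes_types.ml file with no .mli: the datatype is
   written exactly once. *)
module Shapes_types = struct
  type shape =
    | Circle of float
    | Rect of float * float
end

module Shapes : sig
  (* Stand-in for shapes.mli: opens the shared types instead of
     repeating the definition. *)
  open Shapes_types

  val area : shape -> float
end = struct
  (* Stand-in for shapes.ml: same open, no repeated definition. *)
  open Shapes_types

  let area = function
    | Circle r -> Float.pi *. r *. r
    | Rect (w, h) -> w *. h
end

let six = Shapes.area (Shapes_types.Rect (2.0, 3.0))
```

The cost mentioned above is visible here: a reader of the interface has to go to `Shapes_types` to see what a `shape` actually is.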
Can this be fixed in ML? I am not certain. The behaviour that one really wants is that the .mli file comes first and is used to inform the elaboration of the .ml file, so that when the .ml file is elaborated against its interface, various missing bits could easily be filled in. But that is not the ML architecture: in ML, a signature is synthesized from a module (without any regard to the intended specification) and then this is unified (modulo subsumption) with another signature that is separately ascribed. A better architecture would be to have information flow inwards from the ascription, but then it would no longer be ML (which is, naturally, not the biggest concern for me in comparison to the practical improvements that would be possible under a bidirectional design).
If VS Code users still find that it disrupts their flow, perhaps it could use some more attention? Anyway, I realize that the OP doesn’t mention any particular editor, so I may be veering off-topic.
To bring things back on track, my larger point is that one of the root problems mentioned (feeling your flow disrupted when navigating between a signature and an implementation) is probably caused by the tools you use to view the source files as much as it is by the files themselves. I worry that it’s unproductive to only discuss how to rearrange code without also considering how our “flow” of navigating code works or how it could be improved.
While I’m not very fond of this trick, I agree it can sometimes be useful. For the interested reader there’s a longer description of the ins and outs of the trick by @CraigFe here.
(One good thing about the pain of duplication is that it encourages you to keep your types abstract which is often a very good thing :–)
Up to definitional equality, there is always a unique implementation of a datatype or record specification. There may be many textually different implementations, depending on what things are abstract and what definitions are in scope, but one does not reason up to textual equivalence. (EDIT: I think that what I said may be incorrect in the presence of datatype cloning, as in type foo = bar = Foo | Bar. This is subtler than I thought… But nonetheless, we can imagine providing good defaults.)
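For readers who have not met the cloning form mentioned in the EDIT, a small sketch of what it does. The wrapping modules `Original` and `Clone` and the function `to_original` are made up here to keep the two sets of constructors apart.

```ocaml
module Original = struct
  type bar = Foo | Bar
end

module Clone = struct
  (* Re-exports the constructors under a new type name while stating
     that [foo] and [Original.bar] are the same type. *)
  type foo = Original.bar = Foo | Bar
end

(* Typechecks with no conversion, because the two types are equal. *)
let to_original (x : Clone.foo) : Original.bar = x
```

So a specification like `type foo = Foo | Bar` can be satisfied either by a fresh datatype or by a clone of an existing one, which is presumably the subtlety the EDIT is pointing at.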
VS Code has Alt-O to flip between .mli and .ml. However, if the .mli doesn’t already exist, then a new .mli file is created with its signature inferred. That means either a) opening/scanning the Project Explorer to find out if the .mli exists or b) trimming/deleting the verbose auto-generated .mli if I don’t want to do the existence check. Since that takes time, I don’t use that feature often. In the context of an inline .mli, it would be two keystrokes: Ctrl-Home to skip to top of the file, and then Alt-Left to navigate back to where I was in the module. Just realized that having inline .mli also lets me see in the Project Explorer up to twice as many modules as I do today. I must have Stockholm Syndrome for not noticing that earlier.
(And yes, I love vim as a single-file editor. Just not as an IDE).