Has there been a syntax proposed for combining .mli into .ml?

jbeckford · August 20, 2024, 12:31am

I don’t know why but it really bugs me when I have to create a .mli file to introduce a signature to a .ml module. Perhaps it disrupts my flow while I’m coding. Or it is simply having to flip back and forth between two files. I’d much prefer if I could add and view the signature at the top of a .ml module.

Has anybody done any work or proposals in this area? I might be the only one who cares. I was thinking something along the lines of …

(* New anonymous module type to add a signature
   to the module you are editing.
   Must be at top of file. *)
module type _ = sig
  type t
  val create : unit -> t
end

(* The above constrains the module
   structure that follows it. *)
type t = SomethingPrivate
let create () = SomethingPrivate

xvw · August 20, 2024, 1:21am

You can use arbitrary module expressions in include and open, e.g.:

include (struct
  type t = int
  let f x = x
end :sig
  type t
  val f : int -> t
end)

You can also use a module type S on top of your file:

module type S = sig
  type t
  val create : unit -> t
end

include (struct
  type t = int
  let create () = 10
end : S)

This allows you to recover the module’s signature as a module type, using ModuleName.S, without having to go through module type of ModuleName.
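As a rough sketch of that point, here is how client code might name the signature. The names Counter, Twice, incr and value are invented for illustration, and a nested module stands in for a separate counter.ml file:

```ocaml
(* Counter stands in for a file counter.ml written in the style above. *)
module Counter = struct
  module type S = sig
    type t
    val create : unit -> t
    val incr : t -> t
    val value : t -> int
  end

  include (struct
    type t = int
    let create () = 0
    let incr n = n + 1
    let value n = n
  end : S)
end

(* Client code can name the signature directly as Counter.S,
   instead of writing [module type of Counter]. *)
module Twice (C : Counter.S) = struct
  let incr_twice c = C.incr (C.incr c)
end

module T = Twice (Counter)

let result = Counter.value (T.incr_twice (Counter.create ()))
```

Here Counter.t stays abstract outside the include, so the only way to observe it is through value.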

Or you can use open struct ... end for having private stuff (mentioned in https://www.cl.cam.ac.uk/~jdy22/papers/extending-ocamls-open-draft.pdf):

open struct
  let a_private_function x = x + 1
end

let y = a_private_function 10

But sometimes that can lead to some type anchoring issues:

open struct type t = A end
let x = A

Here, type t (and its constructor A) is present in the current scope, but as it is not exported, it is impossible to type x correctly. If the module had a signature, it would be easy to realise that there is no acceptable type for x and that we would either have to change the opening directive or not export x at all. This is a problem known as type anchoring (and is documented here: https://inria.hal.science/hal-03526068/file/main.pdf)
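One way out, sketched below under the assumption that x is only needed internally, is to hide x together with t and export only values whose types do not mention t. The names Anchored and describe are invented for illustration:

```ocaml
(* Expected to be rejected at the top level of a file, because the
   exported x would need the hidden type t:

     open struct type t = A end
     let x = A
*)

(* Accepted: x is hidden together with t, and the only exported value
   has a type that does not mention t. *)
module Anchored : sig
  val describe : unit -> string
end = struct
  open struct
    type t = A
    let x = A
  end

  let describe () = match x with A -> "A"
end

let described = Anchored.describe ()
```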

I have also written (in French, but the code examples should be accessible) about “importation scheme using OCaml module language”: xvw - OCaml, modules et schémas d'importation

But to be honest, I am in favour of having an mli (because of modular compilation, it is nice for documentation, and it makes encapsulation and type abstraction “easy”).

jbeckford · August 20, 2024, 2:21am

Thanks for the references! Light reading

Your first form include (struct ... end : sig ... end) seems “clean” (it should be functionally equivalent since it does not introduce any new module type, value, etc). In fact, that would be trivial to write up in a PPX, although for reasons mentioned below I won’t.

But to be honest, I am in favour of having an mli (because of modular compilation, it is nice for documentation, and it makes encapsulation and type abstraction “easy”).

modular compilation: As a thought experiment, I can imagine a transformation of a .ml file into its true .ml and .mli representation based purely on the AST, if the signature embedding is standardized (ex. I suggested module type _ = sig ... end (* rest of module *)). And since that transformation exists, that means that the modular compilation feature can be done without a .mli. Is my thinking wrong?

If my thinking isn’t wrong, I may allow embedded signatures in MlFront in the future. Currently MlFront makes a symlink from a module A.ml to its fully-qualified module path (ex. Something__A.ml). It seems easy enough to break A.ml into Something__A.mli and Something__A.ml while scanning the AST for module dependency analysis. Supporting embedded signatures would obviously be better if it were in the compiler itself, but at least in my head the interim solution sounds ok.

I’m skipping discussion of a PPX solution since a PPX would make it difficult to distinguish between signatures and structures during dependency analysis. Keeping signatures/structures separate should allow me to infer requires/exports without user intervention: Proposal: a new `exports` field in `findlib` META files

nojb · August 20, 2024, 5:49am

Incidentally, in LexiFi’s fork of the OCaml compiler we had this extension for a long time until a few years ago (we called it “inline signatures”):

[%%sig: A] B

which would be translated as include (struct B end : sig A end). This was used mainly for plugins which could then consist of a single file, which in turn simplified their compilation on the client machine.
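To make the translation concrete, here is a hand-written expansion of a hypothetical single-file plugin whose inline signature is val run : unit -> string. The extension is not part of mainline OCaml, and the names greeting, run and output are invented for this sketch:

```ocaml
(* What a file of the form
     [%%sig: val run : unit -> string]
     ...definitions of greeting and run...
   would become under the translation described above. *)
include (struct
  (* Private: greeting is not listed in the signature. *)
  let greeting = "hello"
  let run () = greeting ^ ", plugin"
end : sig
  val run : unit -> string
end)

let output = run ()
```

After the include, run is visible to the rest of the file and to clients, while greeting is not.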

Also: it is more efficient. With a separate .mli, if you modify the .ml, you don’t need to recompile any other module (if you are compiling in -opaque mode, aka Dune’s dev profile).

Cheers,
Nicolas

dra27 · August 20, 2024, 7:40am

A nice possibility with having the .mli inlined is that ocamlc -i could be updated/extended to understand it - so in theory having started with a unified .mli/.ml it’d be quick to split them (that would also allow installing the “.mli”). Also…

… you wouldn’t lose this as much, since the build system could always extract the .mli prior to compilation (and consequently only recompile the .cmi on actual interface changes).

dbuenzli · August 20, 2024, 8:02am

There are likely also a few usability advantages that people do not see. For example, once you start documenting your signatures (because we all do, right?) I’m not too keen on having to skip a wall of doc strings and explanations to get to the implementation of the interface.

nojb · August 20, 2024, 8:09am

Even without documentation, a long interface will cause the same issue, and indeed it harms readability. In fact, most of the uses we had internally of this construction were for the case of the empty signature…

Cheers,
Nicolas

dra27 · August 20, 2024, 8:10am

Hah, I’m not really advocating for it, but there’d be no reason from the compiler’s perspective to restrict where the inlined mli should go (so it could be at the end of the file)

dbuenzli · August 20, 2024, 8:11am

But then I’m not too keen on having to skip a wall of code to be able to understand what the module exposes to the code base ;–)

jbeckford · August 20, 2024, 1:06pm

The thread turned far away from the original question. Yes, some people will find .ml modules, math proofs and other documents … things written in topological order … highly readable. Good for those people. I am not in that camp.

Other people will find documents that deviate from the scientific format hard to read … the format that starts with an abstract first, followed by structured exposition, gathered together in a single document. I am in that camp. I especially find it hard to read topologically sorted documents and must skip to the end of the code and read backwards (ie. skip a wall of code) to understand it.

Part of this thread makes me laugh since it reminds me of Gulliver’s Travels, where people are fighting about which side of the egg to crack first (the big end or the small end?). But there is a moderate usability issue underneath that I now have some good ideas how to tackle. Thanks.

dbuenzli · August 20, 2024, 2:17pm

Ah no, the thread rather tells you that it’s better to keep the egg shell separate from its content. Then this debate simply does not exist :–)

JohnJ · August 20, 2024, 2:58pm

I find that using a modal editor (Vim + the vim-ocaml plugin) completely eliminates any feelings of disruption when navigating between files. Toggling between a ml and an mli feels much smoother and faster than jumping back and forth between rows within one file. I’d rather just hit two keystrokes to auto-open the mli buffer than have to manually find and mark any rows with documentation that I want to be able to reference.

I say this not to just defend the status quo (or any particular editor), but maybe suggest that editor tooling could be improved across the board to make this experience smoother for everyone? Likewise, if people started a convention of including the signature within a single ml file, I would want my editor to be able to navigate to and from that signature just as smoothly as it currently does with two files.

dbuenzli · August 20, 2024, 5:55pm

IIRC this was also eventually added to the OCaml VSCode extension (and has been supported in caml-mode ever since I started programming in OCaml which is… a very long time ago). So I don’t think anyone has been left behind there.

One thing I wished for a long time though is the ability for merlin to follow up and switch between .mli and .ml when locating rather than end up with “already at definition point”, see this issue.

jonsterling · August 21, 2024, 3:21pm

My two cents. I like having a separate .mli file for all the reasons people said: it is good for documentation and setting expectations about what a module does, without forcing people to slog through the actual implementation.

There is, however, a very real problem that leads users to dislike having to write .mli files, which is that you have to senselessly repeat things in both the .mli and .ml file that will in any well-typed ascription ultimately be identical. For example, definitions of module types, datatypes, records, etc. all must be repeated, and this can get painful. One way around this is to put those things in another .ml file that lacks a signature, and then just open that module in both the .mli and .ml file, but I think that decreases the utility of the .mli file as a convenient and user-friendly vehicle for documentation and expectations about code.
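A minimal sketch of that workaround, with nested modules standing in for separate files. The names Shape_types, Shapes, shape and area are invented for illustration:

```ocaml
(* Shape_types stands in for a shape_types.ml file with no .mli. *)
module Shape_types = struct
  type shape =
    | Circle of float
    | Rect of float * float
end

(* Shapes stands in for shapes.mli + shapes.ml; both sides open
   Shape_types rather than restating the definition of shape. *)
module Shapes : sig
  open Shape_types
  val area : shape -> float
end = struct
  open Shape_types

  let area = function
    | Circle r -> Float.pi *. r *. r
    | Rect (w, h) -> w *. h
end

let rect_area = Shapes.area (Shape_types.Rect (2.0, 3.0))
```

The cost mentioned above is visible here: a reader of the Shapes signature has to look in Shape_types to learn what shape is.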

Can this be fixed in ML? I am not certain. The behaviour that one really wants is that the .mli file comes first, and is used to inform the elaboration of the .ml file; thus when the .ml file is elaborated against its interface, various missing bits could easily be filled in. But that is not the ML architecture: in ML, a signature is synthesized from a module (without any regard to the intended specification) and then this is unified (modulo subsumption) with another signature that is separately ascribed. A better architecture would be to have information flow inwards from the ascription, but then it would no longer be ML (which is, naturally, not the biggest concern for me in comparison to the practical improvements that would be possible under a bidirectional design).

JohnJ · August 21, 2024, 3:33pm

If VS Code users still find that it disrupts their flow, perhaps it could use some more attention? Anyway, I realize that the OP doesn’t mention any particular editor, so I may be veering off-topic.

To bring things back on track, my larger point is that one of the root problems mentioned (feeling your flow disrupted when navigating between a signature and an implementation) is probably caused by the tools you use to view the source files as much as it is by the files themselves. I worry that it’s unproductive to only discuss how to rearrange code without also considering how our “flow” of navigating code works or how it could be improved.

xvw · August 21, 2024, 3:42pm

I usually use an mli file to hold common types and module types, and I declare it with the dune field (modules_without_implementation my_mli)

yawaramin · August 21, 2024, 3:47pm

Will they? If you have an abstract or private type, they have to be different by definition, no?

dbuenzli · August 21, 2024, 3:52pm

While I’m not very fond of this trick, I agree it can sometimes be useful. For the interested reader there’s a longer description of the ins and outs of the trick by @CraigFe here.

(One good thing about the pain of duplication is that it encourages you to keep your types abstract which is often a very good thing :–)

jonsterling · August 21, 2024, 3:55pm

Up to definitional equality, there is always a unique implementation of a datatype or record specification. There may be many textually different implementations, depending on what things are abstract and what definitions are in scope, but one does not reason up to textual equivalence. (EDIT: I think that what I said may be incorrect in the presence of datatype cloning, as in type foo = bar = Foo | Bar. This is subtler than I thought… But nonetheless, we can imagine providing good defaults.)
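To illustrate the subtlety, here is a small sketch in which one datatype specification is satisfied both by a fresh type and by a clone of an existing one. The names Original, SPEC, Fresh, Cloned and to_string are invented for illustration:

```ocaml
module Original = struct
  type bar = Foo | Bar
end

module type SPEC = sig
  type foo = Foo | Bar
end

(* Implementation 1: a fresh datatype. *)
module Fresh : SPEC = struct
  type foo = Foo | Bar
end

(* Implementation 2: a clone that is also equal to Original.bar. *)
module Cloned : SPEC = struct
  type foo = Original.bar = Foo | Bar
end

(* Outside any ascription, the clone and the original are interchangeable. *)
type foo = Original.bar = Foo | Bar

let to_string (x : foo) = match x with Foo -> "Foo" | Bar -> "Bar"

let s = to_string Original.Foo
```

Both Fresh and Cloned check against SPEC, so the specification alone does not say which one a default elaboration should pick.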

jbeckford · August 21, 2024, 11:14pm

VS Code has Alt-O to flip between .mli and .ml. However, if the .mli doesn’t already exist, then a new .mli file is created with its signature inferred. That means either a) opening/scanning the Project Explorer to find out if the .mli exists or b) trimming/deleting the verbose auto-generated .mli if I don’t want to do the existence check. Since that takes time, I don’t use that feature often. In the context of an inline .mli, it would be two keystrokes: Ctrl-Home to skip to top of the file, and then Alt-Left to navigate back to where I was in the module. Just realized that having inline .mli also lets me see in the Project Explorer up to twice as many modules as I do today. I must have Stockholm Syndrome for not noticing that earlier.

(And yes, I love vim as a single-file editor. Just not as an IDE).
