Has there been a syntax proposed for combining .mli into .ml?

Regarding the annoyance of duplicating exposed type definitions, I’m not sure that combining ml and mli into a single file would resolve that. The solutions proposed in this thread (i.e. annotating a module with a signature within the ml) would still require duplication. To avoid it, you would still need to define the types elsewhere and alias/include them.

2 Likes

I expect most people are aware of it, but I just wanted to make sure that ocaml-print-intf is mentioned somewhere in this thread. I mostly share @jonsterling’s point of view, but I’ve found that ocaml-print-intf significantly reduces the annoyance of having to repeat things identically. Often I first write the .ml file, then when it contains everything I expect to want, I run ocaml-print-intf to generate an .mli file that exposes everything, then edit the generated .mli file manually to remove things I don’t want to expose (and add opens and remove unnecessary module qualifications, since ocaml-print-intf isn’t smart about that). (Note that for weird reasons you can’t pipe the output of ocaml-print-intf directly to the .mli you want; you have to pipe it to a temporary file and then afterwards copy that temporary file to the correct .mli filename.)

Of course when I then add things to the .ml file that I also want to export, I have to copy them to the .mli file, but that’s not as bad. If I make a whole lot of changes to the .ml file at once, sometimes I then go back to re-generate a default .mli file.

Honestly, what I find the most annoying thing about writing .mli files is that they don’t accept the syntax for function types that has explicit universal type variables. Because I use a lot of GADTs, I almost always have to give my functions an explicit type with universal type variables, like

let f : type a b. a -> b -> (a, b) foo = ...

Since the .mli file only needs to know the type of things, you would think I could just copy this type declaration there and change let to val – but no, I have to also go in and change it to

val f : 'a -> 'b -> ('a, 'b) foo
1 Like

I think it was mentioned earlier but most editor tooling can do this with a button or keycombo press now eg I’ve done it in VSCode a few times.

I think that it would be nice to write .mli files separately from .ml files, but have a syntax to “include” the .mli signatures at the beginning of the .ml files and not have to repeat them – unless we want to provide more/stronger information in the implementation.

I think that this can be obtained somewhat nicely (with small orthogonal features), at least in first approximation – the details are trickier.

  1. Allow to mix signature items with structure items in modules. Each signature item must be followed by at least one structure item for the same name, and all structure items that follow it must refine it. For example, type t ;; type t = int or val x : int ;; let x = 42 are valid, but val x : int ;; let x = "foo" is not.
  2. Introduce a syntax to refer to the signature exposed in the .mli of the current compilation unit. @jbeckford proposed module type of _ above, any non-ambiguous syntax would do.

With these constructions, having include module type or include module type of _ at the beginning of the module would automatically import all .mli declarations, and only those that are too abstract would have to be refined in the .ml.

I have a draft of RFC for this that has been lying on my disk for years. But when we write the details down, there are two problems:

A. The interaction between this and shadowing are tricky; some people use a programming style with several let foo ... = ... function declarations for the same function foo in the file, possibly at different types. (For example, first an auxiliary function with extra arguments, and then a corollary function that hides the complexity away.) In practice what we really want is for the val foo : ... signature in the module to be matched to the last declaration (the one that is matched against the .mli today), but the natural specification instead mandates that it matches the first declaration, or maybe all of them. I don’t really like this shadowing-heavy style, so I would be happy to distance ourselves from it, but it makes the proposal harder to sell.

B. This basically introduces a mechanism for forward declarations, and it is very tempting to combine this with more generalized recursion between declarations. The safe/conservative thing to do is to forbid any extra recursion compared to the current language (by ruling out cyclic references arising from forward declaration), but the natural/tempting thing to do is to at least accept this use of forward declaration for non-term-level constructs, that don’t have tricky initialization questions. (And one can shoot oneself in the foot even better, for example by suggesting a systematic elaboration to recursive modules.) Having to ponder this question adds decision fatigue and chances for disagreement, and in my experience it makes writing a concrete design proposal harder.

1 Like

I had a thought coming at this problem from the opposite direction from that suggested by @gasche.

We have a nice and concise syntax for fully opaque types in

type t

and for “private” types (destructable but not constructable) in

type t = private foo

So I’ve been wondering if we could smooth out this rough edge with similar syntax to mark transparent types. E.g.,

type t = transparent
(* or *)
type t = public
(* or *)
type t = _
(* or whatever *)

Which would mean something like, type t (in a signature) exposes exactly whatever the type is specified to be (in the implementation). This would mean you don’t have to repeat the type definition in the interface, but just note that the type should be exposed transparently, i.e., mark it as public.

IMO, this feels a bit more natural (and less complex) than a feature that would allow “importing” parts of an interface file into the implementation. I think the key point in its favor is that it allows us to continue thinking of interfaces exclusively as a way of presenting what is to be exposed by an implementation.

Two problem I can anticipate with this idea: (1) afaik, the type language is currently identical in signatures and structures, and this would break that symmetry (but of course there are already many asymmetries between the two); (2) more significantly, imo adding features which only work in mli files but not on signatures generally seems like a troublesome complexification, but I don’t know what my suggestion here would imply when used in a signature that could be implemented by many different modules… iiuc, the idea of importing types from an mli file also has this problem?

[edit] a third problem, which is the worst of the tree I think, is that under my suggestion interfaces would no longer give you everything needed to know the public api of an implementation. Gache’s suggestion doesn’t have this problem.

IIRC the MixML design solves this duplication: MixML - Mixin' Up the ML Module System

1 Like

I’ve also though about this problem a bit, and I suggest a third way. In think it is unavoidable to have an opaque ascription between the .ml and .mli file, as the .mli serves as the interface for separate compilation, as noted in your [edit].

Right now the user can either (1) accept the result of the typechecker directly (putting no signature or no mli file) or (2) force a given signature (using ascription or an mli file). I propose a way to guide the typechecker but without writing down the whole signature: local declarations and export markers.

Local declarations are easy, they do not appear in the result signature:

module M = struct
  type t = int 
  local let x = ref 0
  let fresh () = x := !x+1; !x
end

would give :

module M : sig 
  type t = int 
  val fresh : unit -> int 
end

A local decl local decl is just syntactic sugar for open struct decl end (note that local type declarations can cause signature avoidance)

Then, I propose export markers for type and module type declarations [!export as ]: a type can be marked to be exported as abstract, or as private:

type t0 = A | B of int 
type t1 = C | D of string [!export as abstract]
type t2 = E | F of bool [!export as private]
module type T0 = sig val x : int val y : bool end
module type T1 = sig end [!export as abstract]

would give:

type t0 = A | B of int
type t1
type t2 = private E | F of bool
module type T0 = sig val x : int val y : bool end
module type T1

I can see (at least) three issues that would need design decisions:

  1. The type could have its original definition visible in the whole enclosing struct, or up to the whole file – I don’t know what’s best.
  2. Making a type abstract or private can be incorrect if the rest of the signature depends on the original equality (via paths with functor applications for instance). Here, we can either (1) infer the signature of the module ignoring the export tags, then make the change of the export tags and recheck the result signature or (2) infer the signature of the module with the information that those declarations will later be changed, and throw an error early if need be. A new error message would be needed.
  3. Using export tags with alias types like type t = int would be quite fragile, as the typechecker can arbitrarily rewrite such types, which would have a dramatic impact on what is abstracted and what is not in the result signature. Therefore, either (1) types declarations with export tags should be treated differently by the engine that chooses names, or (2) it should be deactivated for such declarations.
1 Like

I think this motivation is compelling, but am not sure that the point about the inclusion being at the beginning is crucial. Perhaps the difficult details in the vicinity of shadowing (though probably not recursion) would be easier to avoid if the feature was to include specific named declarations from the signature into a/the structure rather than the whole signature at once, since far less (no?) shadowing is supported in signatures. That is, if there was an .mli with a type definition:

type t = A of int | B of string

then in the .ml one could write, say,

include type t

with the semantics being the same as if

type t = A of int | B of string

had been written at exactly the same point in the .ml file. Then which of multiple shadowed definitions in the .ml would be associated with the definition in the .mli would be resolved by the position of the “include” in the .ml. The idea is that there will be a unique definition associated to type t by the .mli, so at any point in a .ml, one could write include type t as a shorthand for the particular definition of type t in the .mli. Similarly for module and other kinds of top-level signature item.

Note that this approach would require build systems to consider a .ml to depend on the corresponding .mli, as expected, but not vice versa. So IIUC (but would be happy to learn otherwise) there would be no negative impact on separate compilation.

There is a lot that this approach would not cover, for instance when a series of modules is defined in the .ml and then given a signature with module rec in the .mli even though no module rec was used in their .ml definition, but it would allow to avoid duplicating type definitions in many common cases.

This approach would cater to the camp who are happy to write the .mli separately from the .ml, but still would like to avoid duplicating definitions where possible. For the very different objective of not writing the .mli separately, something else would be needed, but I have never wanted that.

4 Likes