At what point do you start writing an .mli file?

This is more of a personal/philosophical question than a technical problem. When you’re building something new with OCaml, at what point do you write an interface file for your module?

Do you start with the interface file, and then fill in the implementation later? How far do you go before you start implementing the interface? Do you instead try to get something working first, and only then think about how to modularize it and define the interfaces between modules? Or is it a mix of both? Are there any rules of thumb that you follow?

I’m new to OCaml, and really enjoying it so far. However I find myself falling into the trap of spending too much time designing and documenting .mli files, only to later find, when implementing them, that I hadn’t considered some drawback to the interface, and have to throw out some work. I’m prone to do this in every language, but it seems even more tempting in OCaml to spend all of my time designing interfaces. It might come from the clear separation between .mli and .ml files – in other languages, the implementation and interface are mixed together in the same file, so both stay on my mind.

I’m not really asking for any kind of solution, I’m just curious about others’ habits.

You will find, I suspect, all kinds of ML programmers: from people like me, who barely write documentation, to people like dbuenzli@, who write lovely, copious documentation in their MLI files. (I once wrote a quite large caml-light system with barely any comments anywhere, not even header or algorithmic comments. The people who came after me must have loved me, sigh. As they say at Google: “sometimes, the best documentation is the source code”. HA!)

I typically start writing code, and only at the point when I think it needs “tightening” do I write interfaces for it. And since often that time is when the code is either going to be provided as an argument to a functor, or is the body of a functor, even THEN I don’t write MLI files – I just write signatures in the ML file and use 'em then-and-there. So when do I write MLI files? When I find that my project is getting big enough that I need the stability of the MLI file in order to be able to do better parallel compilation. Yeah, I know: “that’s a terrible reason to add MLI files, mang”. It is what it is.
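As a rough illustration of writing a signature directly in an .ml file and using it right away as a functor argument type, here is a minimal sketch. The names ORDERED, Make_max, Int_order and Int_max are hypothetical, invented for this example; they are not from the post.

```ocaml
(* Hypothetical signature, declared in the .ml file itself (no .mli). *)
module type ORDERED = sig
  type t
  val compare : t -> t -> int
end

(* A functor whose argument is constrained by that signature. *)
module Make_max (O : ORDERED) = struct
  let max a b = if O.compare a b >= 0 then a else b
end

(* An implementation that satisfies ORDERED; it is left unsealed,
   so the equality t = int stays visible to callers. *)
module Int_order = struct
  type t = int
  let compare (a : int) (b : int) = compare a b
end

module Int_max = Make_max (Int_order)

let () = Printf.printf "%d\n" (Int_max.max 3 7)
```

The signature only exists because the functor needs it, which matches the workflow described above: no separate interface file is involved.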

I find that MLI files get in the way of refactoring, and since refactoring in a large project can involve a number of files, it’s just painful to have the MLI files around at that point. But for stable interfaces, sure/sure/sure, MLI files are great. And (of course) for allowing better parallel compilation.

I would say 90% of the time I write the interface first. Usually that tells me if what I want to do actually makes sense and is likely to do what I want. If I’m just writing something closer to a quick script I won’t bother with a .mli file. Sometimes I really don’t have a strong idea of what I’m trying to accomplish and I might jump in with the .ml file.

And even when I write the .mli file first, obviously it’s not done so I go back and forth and update it as I’m implementing the .ml file.

I write mli files both before and after.

When doing DDD, I start with mli files. I start implementing when:

  • the main business rules are encoded in types
  • I feel confident about the code architecture

During implementation I write the ml file and create or update the corresponding mli file.

Sometimes I have one or two ml files on their own, and I create an mli file for documentation afterward.

When someone puts a gun to my head.

More seriously, when I release a library, I might document the most important module and add a .mli file for it.

PS: .mli files are a real pain when you are prototyping something

I tend to write mli files almost immediately after starting a ml file, try to keep them decently well-documented unless I’m rapidly prototyping something, and keep any public-facing documentation there instead of it cluttering my ml file. I do so to make sure I know exactly what the dependency surface between bits of my code is (and also so the compiler can tell me if there are things in my ml file that aren’t actually used by it).

I think the only files where I have ml without an mli are entry points into programs, usually.

Experiences may vary. For myself, I haven’t found this to be a problem; often it’s even a benefit. It keeps me on target for what I’m trying to accomplish. My interfaces tend not to change that much when prototyping, but it depends on how one solves problems.

Is there a way to do something like this for random function declarations?

3 | val ge : int -> bool
Error : Value declarations are only allowed in signatures

Out of curiosity, why would you want to do that?

Do you mean something like:

module type S = sig val ge : int -> bool end
module M : S = struct let ge x = x >= 0 end

??

I would only do this if I -needed- the signature S for some functor-argument. In particular, this isn’t a substitute for an MLI file. Or more exactly, it’s no better: it’s just as brittle, and is worse in the sense that it doesn’t allow parallel compilation.

In case it’s not obvious, by “parallel compilation” I mean that if I have

  1. two ML files, a.ml and b.ml
  2. b.ml mentions A
  3. then I cannot compile b.ml until after a.ml is compiled
  4. but if I have an MLI file a.mli then I can compile a.mli first, then both .ml files in parallel.

I guess I don’t really mean something very much like that. It’s what I did once in a while in Haskell:

ge :: Int -> Bool
ge i = i >= 0

I find that very useful documentation, not least when I’m struggling with some error and in this way I document my intention to the compiler to govern the path its type inference takes. But it helps a lot when I just come back years later and have little recollection of how these functions fit together and what they’re doing. I want this in the code, not in some separate file, because the code is what I look at when trying to figure out what I’m doing. (And especially I don’t want it in a separate file that’s going to be interpreted as the complete interface, because that isn’t what I intend at all.)

Your module/signature doesn’t do that, of course, because I’d keep redefining module M for each new function I’d document in this way.

In OCaml I can’t write

val ge: int -> bool
let ge i = i >= 0

?

No, there’s no way to do this in OCaml (that I know of, and I think I have a pretty firm grasp of what’s allowed in the syntax). The closest you can get is:

let (ge : int -> bool) = fun x -> x >= 0

I think people do this from time-to-time when debugging. But typically, once you have your code compiling, you remove these type constraints … because except for polymorphic variants and objects, the principal type schemes property guarantees that the compiler would come up with that type (or a more general one) so why bother? [With the special case where you -want- a less-general type, or want to force local type-abstractions.]
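To make the "less-general type" and "local type-abstractions" cases concrete, here is a small sketch of what the different forms of annotation on a let-binding do. All function names (id, int_id, add_one, poly_id, pair_swap) are made up for illustration.

```ocaml
(* No annotation: the compiler infers the most general type, 'a -> 'a. *)
let id x = x

(* A constraint that deliberately asks for a less general type. *)
let int_id : int -> int = fun x -> x

(* Caveat: a plain 'a in a constraint is only a unification variable.
   This definition is accepted, and add_one gets type int -> int. *)
let add_one : 'a -> 'a = fun x -> x + 1

(* An explicitly polymorphic annotation: the definition is rejected
   unless the body really is polymorphic in 'a. *)
let poly_id : 'a. 'a -> 'a = fun x -> x

(* Locally abstract types: a and b are treated as distinct abstract
   types inside the body. *)
let pair_swap (type a b) ((x, y) : a * b) : b * a = (y, x)

let () = Printf.printf "%d %d\n" (int_id 3) (add_one 1)
```

The add_one case is worth keeping in mind when using constraints as documentation: the written type is not necessarily the type the function ends up with.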

Also [channelling dbuenzli@ grin – I really do think he does a great job with documenting his lovely contributions to the ocaml world] one shouldn’t have to look at the code to understand what a module does, right? The MLI file is both an interface, and also its documentation. [I say, as I almost religiously fail to put any doc-comments in my MLI files, sigh. mea culpa, mea culpa, mea maxima culpa]

Haskell style is to annotate the value’s type above the value definition. OCaml style is to annotate the type in the interface. They are just different. After programming in OCaml people usually come to appreciate its style.

fwiw an OCaml style where you annotate functions in .ml also works, and
is pretty nice.

I sometimes tend to write big .ml files, without a .mli, but with
submodules like that inside:

module Foo : sig
  type t
  val foo : t -> …
end = struct
  …
end

This is convenient for fast iteration as you don’t have to annotate/copy
everything into a .mli, but you can still have abstraction barriers.
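For a filled-in version of that pattern, here is a minimal sketch with a hypothetical Counter module (the name and its functions are assumptions for this example, not the elided code above). The inline signature makes t abstract and hides anything it does not list.

```ocaml
module Counter : sig
  type t                      (* abstract outside this module *)
  val zero : t
  val incr : t -> t
  val to_int : t -> int
end = struct
  type t = int                (* the representation is visible only here *)
  let step = 1                (* internal: not in the signature, so hidden *)
  let zero = 0
  let incr n = n + step
  let to_int n = n
end

let () =
  let c = Counter.(incr (incr zero)) in
  Printf.printf "%d\n" (Counter.to_int c)
```

Outside the module, writing Counter.step or Counter.zero + 1 is a type error, so the abstraction barrier holds without any separate .mli file.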

As I understand it, the .mli interface serves a quite different purpose. If I make that file, and fail to “annotate” a function there, then that function becomes unavailable outside, and conversely if for some reason I want to constrain availability of functions in that way, I can’t annotate those functions. Which is to say, it isn’t for annotation in the sense I’m talking about.

This is somewhat off topic and I don’t want to pursue it too far at the expense of the question of when you’d use an .mli file. But it isn’t a question of various ways to annotate a function type. Haskell provides that functionality and it’s widely used. OCaml doesn’t.

[ETA: Whoa, does Haskell not have the equivalent of MLI files?]

Well, I’d note that the “ML philosophy” [with which I agree] has -always- been:

The compiler will give you the most general type, so why on earth would you want to write down a type? It cannot be better than the type the compiler would provide for you, yes? And if you want to know the types of all the struct-level functions/values, you can always do ocamlc -i ... Heck, an IDE could automatically run that and keep the types around to show you when you hover over function-names, right? And meanwhile, all that extra annotation is just … distracting wasteful space. The only time that you’d want to write down a type, is when you want a less general type than what the compiler would infer. And for that, we have MLI files, because typically that only happens when you have a number of functions that share some common types, and you want to restrict them all from the perspective of “outside”, while not restricting their views of each other. That is to say, you’re implementing an abstract data-type.

It’s not that most MLs are “unable” to provide this function. It is that … implementers choose not to, because it’s not regarded as being a useful way of writing programs in the ML world. And this is true not just of Caml, but also of SML. [though my knowledge of SML stops in 1992, so hey, maybe it’s changed since then – only 28 years!]

I think we are still on topic as you raised a good point of not wanting to put certain annotations in interface files.

Haskell provides a very specific syntax for annotation that is not widely known or used outside of Haskell, i.e.

foo :: Int -> Int
foo int = ...

OCaml provides two more generally-known forms of annotation syntax:

let foo : int -> int =
  fun int -> ...

(* and *)

let foo (int : int) : int = ...

And of course it also offers interface annotation syntax for public items.

Why does the let foo : type = ... syntax not count?

Not that I’m aware of. You declare the visible symbols at the top of the source file, in all cases - so more completely, 1) Haskell doesn’t automatically export all definitions, and 2) Haskell separates export from annotation, so you don’t need to declare a type for a function just to export it and you can declare a type without exporting it.

I stand by my account of the value of this a couple of messages back. If I’m trying to override type inference, it’s because I’m debugging a problem where the type inference is starting from the wrong end and I can’t see the problem. Once that’s done, the annotated type will normally be the same as the inferred type, except visible to the reader.

I wouldn’t care to ask for the Haskell declaration syntax, but it has occurred to me to wonder if the prohibition against val in the .ml file really serves any purpose.

As others have pointed out, you get that effect by putting a type-constraint in the definition, viz. let f : T = blabla