I’ve just started learning OCaml, and the most pronounced pain point so far has been having to repeat myself between interface and implementation files.
Don’t get me wrong, I think there are a lot of places where it makes sense to separate interface from implementation, and specifically to weaken type-level guarantees made by the interface to give more flexibility for later changes to the implementation. I think the fact that this is inherent in multi-file OCaml programs is for the most part a strength of the language.
By the same token, there are cases where it would really make sense to carry definitions made in the interface file over to the implementation file. Say I have the interface file my_data_structure.mli, and it looks something like this:
module type ELEMENT = sig
  type t
  ...
end
module type HYPERPARAMETERS = sig
  val foo: bar
  ...
end
module Create (Element: ELEMENT) (Hyperparameters: HYPERPARAMETERS) : sig
  type element = Element.t
  ...
end
And now say I have to write my_data_structure.ml. I’m going to have to duplicate the definitions of ELEMENT and HYPERPARAMETERS, and as far as I can tell there is little to no reason I’d want either to have a different definition within the implementation file. The same goes for the guarantee inside Create that type element = Element.t: I’m going to have to repeat that declaration and guarantee inside my implementation.
I’ve found a few workarounds so far, such as this include hack, and my preferred so far is to extract ELEMENT and HYPERPARAMETERS into their own interface-only module(s). And this certainly works, but it still feels over-complicated and inelegant.
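For readers unfamiliar with the second workaround, here is a minimal single-file sketch of it. It assumes the module types move into a hypothetical module named My_data_structure_intf; nested modules stand in for the separate files, and the members to_string, capacity, empty, add and to_strings are invented for illustration, since the original signatures are elided.

```ocaml
(* Stand-in for a hypothetical file my_data_structure_intf.ml that holds
   only module types, so it needs no .mli of its own. *)
module My_data_structure_intf = struct
  module type ELEMENT = sig
    type t
    val to_string : t -> string   (* illustrative member *)
  end

  module type HYPERPARAMETERS = sig
    val capacity : int            (* illustrative member *)
  end
end

(* Stand-in for my_data_structure.mli (the signature) and
   my_data_structure.ml (the structure). Neither repeats the module types. *)
module My_data_structure : sig
  open My_data_structure_intf
  module Create (Element : ELEMENT) (Hyperparameters : HYPERPARAMETERS) : sig
    type element = Element.t
    type t
    val empty : t
    val add : element -> t -> t
    val to_strings : t -> string list
  end
end = struct
  open My_data_structure_intf
  module Create (Element : ELEMENT) (Hyperparameters : HYPERPARAMETERS) = struct
    type element = Element.t
    type t = element list
    let empty = []
    (* Ignore new elements once the configured capacity is reached. *)
    let add x xs =
      if List.length xs >= Hyperparameters.capacity then xs else x :: xs
    let to_strings xs = List.map Element.to_string xs
  end
end

module Int_element = struct
  type t = int
  let to_string = string_of_int
end

module Two_slots = struct
  let capacity = 2
end

module Int_store = My_data_structure.Create (Int_element) (Two_slots)
```

Note that type element = Element.t still appears on both sides; this approach only removes the duplication of the module types.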
So this leaves me with two questions:
Is there a better workaround I’ve yet to stumble upon? Is there some idiom I’m unaware of that the community has centered around for solving just this sort of problem?
From a design standpoint, what’s the reasoning behind this behavior of the language? It seems to me that, in any place where a type declaration / definition that is present in the interface file is missing from the implementation file, the compiler could simply substitute the interface as a “default”, and otherwise allow what’s in the implementation file to be a local override (provided it’s compatible with the interface). Is there something I’m not considering that makes this infeasible / undesired, or is this just not something anyone’s gotten around to implementing?
There are a couple of tricks you can use to work around it in some situations. You already discovered the ‘top-down development’ trick; there’s also a ‘recursive modules’ trick, which works for modules that contain only type definitions (and also external definitions): https://blog.janestreet.com/a-trick-recursive-modules-from-recursive-signatures/ . This one actually also works with a single module.
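A minimal sketch of the recursive-module trick, assuming a hypothetical module Types with an invented shape type and ELEMENT module type:

```ocaml
(* The signature is written once; "= Types" makes the module its own
   implementation. This relies on the signature declaring no values,
   only types and module types. *)
module rec Types : sig
  type shape = Circle of float | Square of float   (* hypothetical type *)

  module type ELEMENT = sig
    type t
    val to_string : t -> string
  end
end = Types

(* Both definitions are usable without having been written a second time. *)
let area (s : Types.shape) : float =
  match s with
  | Types.Circle r -> 3.14159 *. r *. r
  | Types.Square side -> side *. side

module Int_element : Types.ELEMENT with type t = int = struct
  type t = int
  let to_string = string_of_int
end
```

The limitation mentioned above still applies: as soon as the module needs an ordinary value or function defined with let, the definitions have to be split again.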
I am not bothered by the need to create the explicit interface file, but then again, I loved using Modula-2 back in the day.
My usual trick to avoid typing too much is to use the compiler itself to generate the first draft of the .mli file; if you use ocamlc with the -i flag, it will dump out the definitions of the values in a given .ml file; you just edit a bit (and remove what isn’t part of the explicit interface) and you’re set.
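As a small illustration of that workflow, assume a hypothetical file stack.ml with the contents below. The comment shows roughly what the compiler should print and one way the result might be trimmed by hand.

```ocaml
(* stack.ml: hypothetical implementation, written before any interface *)
type t = int list
let empty : t = []
let push (x : int) (s : t) : t = x :: s
let helper_limit = 100   (* internal detail that should not be exported *)

(* Running `ocamlc -i stack.ml` should print something like:

     type t = int list
     val empty : t
     val push : int -> t -> t
     val helper_limit : int

   A hand-edited stack.mli could then make t abstract and drop the helper:

     type t
     val empty : t
     val push : int -> t -> t
*)
```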
All this said, every once in a great while I find myself having a problem where something is generating code and I need to have different text in the interface and the implementation. I believe the last time this happened I was playing with Menhir and wanted the generated token type for use in the lexer to get cleaned up a bit. I believe I used a combination of cppo and some other tools to achieve what I wanted.
I’m very much behind requiring everything specified in the implementation file to be repeated in the interface; it’s just not clear to me why the opposite is also required.
If I write type foo = ... in the implementation file, it’s not clear that I intended foo to be visible outside of the implementation, and so it makes sense that I need to mention something about it in the interface file.
But say now I give a type foo = ... definition in the interface file. It’s possible for me to give a different definition of foo in the implementation (provided type compatibility rules are respected), but if I give no definition, my program is guaranteed invalid. From the admittedly small set of OCaml programs I’ve written so far, in any case where I expose a type or module type in a signature, I have no intention of specifying that type any differently in the implementation.
Obviously that option to do so should remain open, but I don’t understand why the compiler cannot use type definitions given in an interface as defaults when they are not also given in an implementation. Or if that’s too much, having a syntax like type foo = default that can be used in the implementation would go a long way towards staying DRY.
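To make the duplication concrete, here is a single-file sketch in which a signature plays the role of the .mli and a structure the role of the .ml. The Palette module and its color type are hypothetical.

```ocaml
module Palette : sig
  type color = Red | Green | Blue          (* exposed on purpose *)
  val to_string : color -> string
end = struct
  (* Deleting the next line is a compile error: OCaml does not carry the
     definition over from the signature. *)
  type color = Red | Green | Blue
  let to_string = function
    | Red -> "red"
    | Green -> "green"
    | Blue -> "blue"
end
```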
Using the expression type foo = ... seems to hide the problem from you.
AFAIK, you should follow the best practice of using an abstract type (already discussed on this website), as follows:
(* foo.ml *)
module type FOO =
sig
  type t
  val bar : t -> int (* or anything you want *)
end
module Foo : FOO =
struct
  type t = int list (* or anything else that makes sense *)
  let bar = List.length (* todo: placeholder implementation *)
end
This way, you can later switch the structure to a better type without impacting the user of your module, as long as he respects your module signature.
The contrary would create a strong and painful dependency, and he would be forced to update his program every time you change the structure of your module (if the signature of your module and the structure share the same concrete type).
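A short sketch of that point, under the assumption that FOO also exposes a hypothetical constructor of_list: two structures with different concrete types satisfy the same signature, and a client written against FOO compiles unchanged with either.

```ocaml
module type FOO = sig
  type t
  val of_list : int list -> t   (* hypothetical constructor *)
  val bar : t -> int
end

(* First representation: a plain list. *)
module Foo_list : FOO = struct
  type t = int list
  let of_list xs = xs
  let bar = List.length
end

(* Later representation: an array. *)
module Foo_array : FOO = struct
  type t = int array
  let of_list = Array.of_list
  let bar = Array.length
end

(* The client sees only FOO, so it cannot depend on the representation. *)
module Client (F : FOO) = struct
  let count xs = F.bar (F.of_list xs)
end

module With_list = Client (Foo_list)
module With_array = Client (Foo_array)
```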
I’m not disputing that abstract types are highly useful (and that there are many places where choosing them is the right answer). But there are other places (e.g. module types that are accepted by functors exposed as part of the API, or variants which you consciously decide to expose as part of your API) where having to repeat yourself just seems needless.
I can also understand having the language be “idiot-proof” and guide people towards best practices, but take my hypothetical type t = default opt-in syntax. Something like that would still require the programmer to consciously decide, “yes, I want my interface to match my implementation for this type”, and someone would likely not learn of such a feature until they explicitly sought out how to do such a thing concisely.
I’ll also say that I don’t think @RogerT’s example is actually a counterexample to my point. To emphasize, I’m only asking why type definitions from the interface aren’t copied into the implementation, and not the other way around. In a hypothetical world where OCaml worked the way I wanted, if you had already written the implementation of type t, you would still need to separately specify its interface, and there decide that it should be left abstract. Everything put into the interface would still have to be put there manually and deliberately, and I think that that alone should be sufficient for guiding users of the language towards these best practices.
Edit
I should clarify that when I previously said
I only meant that when I write type t = ... in the interface, I’ve yet to find a case where I wanted a different definition in the implementation. I’ve definitely used abstract types and agree with their value, but perhaps it sounded like I didn’t, and that’s what @RogerT was trying to get at.
Type definitions in a signature can be abstract or concrete. You already know the pros and cons, especially for the user of the module.
Type definitions in a structure must be concrete (e.g. type abc = int * string * string list) because you need that to write concrete functions that use that concrete type.
I’m not aware of any way to use an abstract type (type t) in a structure; OCaml experts can comment on that if necessary.
You can copy type definitions from the structure (aka implementation) to the signature (aka interface), either manually or with the compiler to save time: ocamlc -i (then leave the types concrete or make them abstract).
You can write a module or a set of modules (i.e. a program) without writing any signature. It’s OK as long as it type checks. It’s usually the way one learns OCaml, because that is enough effort already and adding signatures may add confusion.
What may disturb you is that there are many ways to write an OCaml program.
Depending on your goal and requirements, you may find it easier to start by writing the signatures of all your modules because you know exactly what you want. Or you may want to start by playing a little bit with the implementation of some functions to clarify what the signatures should be (val, type, exception).
Or you may feel more comfortable writing your module structures directly, and deciding at the end whether you need a signature and what signature will meet your requirements.
OCaml offers you these degrees of freedom.
So, each programmer can choose his way.
I think in the general case there’s no way to guarantee that you’d do it correctly. For example, you could have a private type abbreviation–this lets you wrap old types in new types without allocation and let users conveniently convert from the new type to the old type (one-way only). You would write type new_type = private old_type in the interface but actually define it as type new_type = old_type in the implementation.
So you could introduce a special rule for when you see private types that they must be copied from the interface to the implementation but just without the private access modifier. But what if in some case you actually wanted to keep the private modifier? Strictly speaking, it’s allowed by OCaml, but a type-copy rule would introduce a special case where it won’t be. I think this kind of special casing is what the language implementors try to avoid.
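A minimal sketch of the private-abbreviation case described above, using a hypothetical Positive module: the signature says private int, while the structure uses the plain abbreviation.

```ocaml
module Positive : sig
  type t = private int          (* interface: read-only view of an int *)
  val make : int -> t option
end = struct
  type t = int                  (* implementation: plain abbreviation *)
  let make n = if n > 0 then Some n else None
end

(* Users can coerce a Positive.t to int, but not the other way around. *)
let double (p : Positive.t) : int = (p :> int) * 2
```

Here the two definitions of t differ on purpose, which is exactly the situation a copy-from-interface rule would have to handle.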
I’m only asking why type definitions from the interface aren’t copied into the implementation, and not the other way around.
I think that doing this properly in the general case requires at least higher-order unification. So it’s probably possible but it is not easy. There are some similar requirements for implementing modular implicits so I’m hoping that we can have a look at doing this as part of that work.
For cases where literal inclusion suffices, ppx_import announces in its README that “It is also possible to import items from your own .mli file.” I haven’t tried it yet.