Concise Module Syntax

language-design
ideas

#1

I just want to throw out there an idea I’ve had for enhancing OCaml’s syntax and gauge the reactions.

Because we have no type-based dispatch (outside of objects), it’s really important for us to be able to be able to create modules on the fly in OCaml. This is similar to the way that functions have low-cost syntax – the easier it is to create lambdas, the more they can be used. Well, OCaml’s ‘currency’ is really modules, and the easier it is to create and manipulate them, the more easily they can be used:

module MyModule = struct
    type t
    let foo x = x
    let bar y = y
    let baz z = z
end

(* This is syntactic sugar for importing only type t, val foo, val bar *)
open MyModule[type t; foo; bar]
(* It should be approximately equivalent to
module M = struct type t = MyModule.t let foo = MyModule.foo let bar = MyModule.bar end
open M *)
...

(* Syntactic sugar for locally opening MyModule with only foo and bar exposed in the type *)
MyModule[foo; bar].(let x = x + foo 3 in print_int x)

...

(* If a local selective open is made multiple times, we may want to factor it into a module: *)
module M = MyModule[foo; bar]
(* This is equivalent to 
    module M = struct let foo = MyModule.foo let bar = MyModule.bar end
*)

The concise syntax would be shorthand for creation of a local anonymous module. This takes care of the problem with local opens introducing potentially shadowing variables into a local context, as indexing only the variables you want to include is trivial:

(* In file modules.ml *)
module A = struct
    let foo x = x + 1
end
module B = struct
    let bar y = y + 2
    (* The following line is sneaked in one day, potentially causing a problem *)
    let foo y = 15
end

(* in file main.ml *)
open Modules
let () =
    A.(foo 10 +
        B.(bar 4 + foo 12)) (* here we aren't calling the right foo anymore *)
    |> print_int

(* Safe alternative: *)
let () =
    A[foo].(foo 10 +
        B[bar].(bar 4 + foo 12)) (* this works *)
    |> print_int

(* Alternatively *)
let () =
    let open A[foo] in
    let open B[bar] in
    foo 10 + bar 4 + foo 12 |> print_int

(* Also, opens are the only way to import operators, so *)
module Infix = struct let (-) x y = x - y let (+) x y = x + y end
open Infix[-]
(* Is a great way to specify which infix operator you want *)

(* It's also worth thinking about excluding certain imports. Sometimes this is more concise *)
module M = module MyModule[not foo; not bar]
(* would take everything in the module but values foo and bar. Once you choose to exclude values/types, you cannot include any, as that would become too confusing *)

I’m sure I’m not thinking of nuances in the module language right now, but I really think this could solve a lot of problems for us, while making the module language easier to use. Opinions?


#2

The final example looks even better as

let () =
  A.foo 10 + B.bar 4 + A.foo 12
  |> print_int

You might prefer the A.() syntax when the names are longer than A and B, but when it gets as gross as that nested-open example, you can shorten the module names right in the let binding:

let () =
  let module A = AReallyLongName in
  let module B = BaReallyLongName in
  A.foo 10 + B.bar 4 + A.foo 12
  |> print_int

This stops being an alternative when you’re opening the module to get its operators, but I would also think it very tedious to have to say B[bar;+;*;-] oh wait I also need B[bar;+;*;-;/] - I’d rather use non-operator aliases like A.sub, A.mul, and continue to avoid the nested open.


#3

IME, from other languages, explicitly listing out all the things you want from a namespace is extremely common - the prevailing practice in Rust, C++, Python, JS, and basically any other language is “just import the things you need”. It was very weird for me to come to OCaml, where the only thing you can do is open the whole module. It seems reasonable to have some syntax to allow this style, anyways.

IMO, basically the only modules that should be open’d are operator modules - Arith modules, Monad modules, etc., which have a guaranteed interface.


#4

Sure, and that’s an option in some circumstances. The examples are contrived for obvious reasons – there are places where a local open is the easiest option. For example, you cannot use B.(+) in an infix manner, so for infix operators, you have to use local opens. Also, if you’re repeatedly accessing the same module (many more times than the number of elements you’re actually importing), you may want to locally open it. It’s nice to be selective in what you’re importing in this case to make sure you’re not hit by an import bug later on.

Additionally, the benefit of selectively importing at the module level is huge, since you’re no longer filling your namespace with potentially unwanted values/types.


#5

I agree 100%. It’s even worse in OCaml since the lack of type-based dispatch (objects in Java/C++/Python, traits/type classes in Rust/haskell) means we constantly have to open modules locally for convenience. If we integrate my suggested syntax, it may even be worthwhile to add an optional warning when a complete open is made of a module from another file.


#6

The usual thing you do in OCaml though is not open modules. You can refer to their contents by name. When the name is long, you can shorten it. I’m sure there’s some JS framework where you do a lot of Python-style ‘import froms’, but most of what I see there is more similar to OCaml: you bind a module to a variable to use in your code and you can use a terse name like fs for convenience.

If you reuse lots of operators (not defining +/ or +: but just + and -) then you’ll want to use lots of local opens. If you use lots of local opens you’ll sometimes want to nest them. If you nest local opens because you’re always reusing names you’ll run easily to conflicts with names – so you’ll want local code duplication in the form of a list of the names you’ll be using right next to your uses of those names. Not just the end result, but some intermediate steps of this sequence are pretty ugly, but you put yourself on this path by trying to be cute. I’d rather see development on alternatives to this entire sequence, but I agree that a selective local open is a natural result of adding a problem like a nestable local open.


#7

The final example is very ugly. It is also much prettier than any real-world use would be as it uses very few (2) and very short (3-letter) identifiers. Wouldn’t you actually create selective aliases earlier in your code, and then open those for that example? In which case the final example would look exactly like the penultimate one, using current syntax. Although you still might like like a terse syntax when you create the aliases.


#8

Anybody else have feedback? It’s ok if you don’t like it, I just want to know if this is something worthwhile doing.


#9

I think being able to selectively import only some names from a module is a very worthwhile goal. I’m not 100% sure how I feel about your syntax, but I have no better suggestion at this time.


#10

I also tend to prefer alternatives that would limit and discourage opens (including locals) by making qualified use even more effective and flexible. However, I cannot deny that it seems idiomatic to open certain kinds of modules:

  • global opens for modules that provide a number of submodules (like Core or Notty)
  • local opens for modules that provide infix operators and other DSLs (like monad libraries or HTML generators in Ocsigen).

If I were trying to achieve open MyModule[type t; foo; bar] with existing syntax, I’d go for

module M = MyModule
type t = M.t
let foo = M.foo
let bar = M.bar

which is less verbose than declaring a submodule and opening it, but only barely, and still twice as verbose as the syntax you propose. ime, these are pretty good grounds to support your proposal.

I am less swayed by the local use, since MyModule[foo; bar].(let x = x + foo 3 in print_int x) is significantly more verbose and redundant than let x = x + MyModule.foo 3 in print_int x

Similarly, I’m not sure I see what there is to gain from this pattern: module M = MyModule[foo; bar]. Once the name space is contained in M, qualified usage should suffice to keep things clean, and we’d only need this explicit export syntax if we decide to open M later. Perhaps this is so you can open M later without worrying about pollutants? Maybe this make sense…

I think we might make a stronger case for this usage

let () =
    A[foo].(foo 10 +
        B[bar].(bar 4 + foo 12)) (* this works *)
    |> print_int

by using the let syntax for local opens:

let () =
    let open A[foo] in
    let open B[bar] in
    foo 10 + bar 4 + foo 12 |> print_int

In general, I think I am favorably disposed towards this idea!


#11

Imagine that foo and bar had been used 5+ times each within those parentheses, and it makes a lot more sense.

I didn’t list it, but I meant to imply that all module syntax would have this extension, including this form of local opens.


#12

I figured. I only meant to suggest that an example along these lines might more compelling, from a rhetorical perspective, as it doesn’t have to cope with the nested parens and indentation.


#13

Good idea. I added your version.


#14

This might be a slow-traffic week due to holidays in Europe, North America, and some other regions.

I like the proposal a lot. Not sure how much weight my opinion carries, as I think that I’m still an atypical OCaml user. (On the other hand, users and potential users need not all be typical.)

First, for me personally, this syntax

module M = struct
   type t = MyModule.t
   let foo = MyModule.foo
   let bar = MyModule.bar end
open M

always seemed unnecessarily verbose and a bit unnatural. Its relationship to

module MyModule = struct
    type t
    let foo x = x
    let bar y = y
    let baz z = z
end

is initially confusing, since the purpose of the second bit of code is to define a module, while one’s intention in using the first bit of code is just to make available some but not all of what’s defined in the defined module. It’s all intelligible in the end, but when you’re starting out, keeping this straight requires attention and practice. Your proposed syntax, @bluddy, by contrast, is simple to use and clear to me from the start.

Second, personally I don’t especially like it when someone opens a module at the beginning of a file. This practice seems somewhat common, and I’m not saying it’s a bad practice. However, it means that when you’re reading the code, if you don’t know Pervasives and the rest of the language well, then you don’t know where to look for a function that’s not defined in the file. If there are multiple modules that have been opened, you have to look through the docs or source for all of them. When a module shadows something in Pervasives, you might not realize that until something function is preface by “Pervasives.”, which is also a surprising thing to see. (I think that it’s occasionally reasonable to shadow functions in Pervasives.) By contrast,

open MyModule[type t; foo; bar]

at the beginning of a file notifies you right away where t, foo, and bar come from.

Your proposal is similar to a common idiom in Clojure in which only specific named functions are imported from a module. I don’t use that feature often in Clojure, but I do use it, and it’s clear that many Clojure users prefer it, fwiw.

In both Clojure and OCaml I typically define short module aliases at the top of a file and then preface all functions from another module with the appropriate alias. I use local opens sometimes, too. The module alias strategy is a bit silly for operators, though, so I would use the proposed syntax for operators that I want to use throughout a file, and would probably use it in a few other cases. However, given what’s common in other languages such as Clojure, I think that syntax like yours would get used by others, though perhaps not by many people who’ve been using OCaml for a long time, since the current methods would usually have become second nature for them.


#15

FWIW you can do something like this currently (though not for types):

let foo, bar = MyModule.(foo, bar)

#16

In case nobody has mentioned it yet, there is also the following trick:

include struct
  open M
  let foo, bar = foo, bar
end

#17

That’s rather clever. I think I’m going to have to remember that one…


#18

As a random rant, an unstructured flow of thoughts. First of all, I have a feeling that you’re trying to solve a wrong problem. It’s like trying to fix skis that do not slide on a gravel, though the real solution would be not to use skis on the gravel, in the first place :slight_smile:

First of all, my personal opinion is that,

let () =
    let open A[foo] in
    let open B[bar] in
    foo 10 + bar 4 + foo 12 |> print_int

Should be written as

let () = 
    print_int (A.foo 10 + B.bar 4 + A.foo 12)

That’s more readable and this is the essence of modularity. When I will read a piece of code that looks like your snippet, I would mentally do the translation to my version, because I want to know, at the point of call, which function is which.

Probably, what you’re trying to do here, is to reinvent type classes or modular implicits, by moving name bindings from explicit to implicit. In that case, a better syntax would be:

val main : Fooable a, Barable b . a -> b -> unit
let main x y = print_int (foo x + bar y + foo x)

In a long-term, I’m not sure, that type classes or modular implicits will actually make the code readable, as you should always manually consider the context (and the type specifications would be ubiquitous, that’s kind of defeats the whole purpose of the type inference). Though they indeed help to write code that is more concise. But more concise doesn’t mean better. Readability is what matters. Haskell code is unreadable, unless it is a toy example. OCaml is for big real-world programs, that can survive only on a foundation of strong modularity.

Now lets deal with another misconception - that in OCaml you can’t import only a part of a module. Of course you can, and this is a part of a module system.

Where in Python you will say

from factory import create

in OCaml you will say

include (Factory : sig type t val create : unit -> t end)

when in Python you say

import factory.create 

in OCaml you write

module Factory = (Factory : sig type t val create : unit -> t end)

In other words, the module type defines the list of module items that are imported. So what we really lack in OCaml is that we can’t say

open (Factory : sig type t val create : unit -> t end)

we need to do

module X = (Factory : sig type t val create : unit -> t end)
open X

And we don’t have concise for composing module types and modules on the fly. A good example of such syntax would be Coq, for example, in Coq T <+ Fooable <+ Barable is the same as

sig 
  include T
  include Fooable
  include Barable 
end

And the same could be done with modules, i.e., <+ corresponds to the include statement on the module level.

However, this won’t work, since T, Fooable, and Barable, all define the same type t. And Coq provides a nice solution for this problem. In Coq, when F in include F is a functor, then it automatically applies the current module to F. This is crazily nice, and it not only helps with the problem above but would be also useful, when you gradually derive a module definition using functors from some base.

We also lack a small syntactic sugar that will allow us to automatically import definitions from a parameter in a functor, e.g.,

module type Fooable (include T : T) = struct
  val foo : t -> int
end

is the same as

module Foo (T : T) = struct
  include T
  val foo : t -> int
end

We also lack a direct ability to define parametrized module types. We have parametrized modules, called functors. But we don’t have parametrized signatures, i.e., signatures that are parametrized with signatures.

For example, if we will have all these nice features we should be able to define our example as follows

module type T = sig type t end
module type Fooable(include T : T) = sig val foo : t -> int end
module type Barable(include T : T) = sig val bar : t -> int end

and now we can write the main function using first class modules instead of modular implicits or type classes:

let main (type x y) 
   (open Foo : T <+ Fooable with type t = x) 
   (open Foo : T <+ Barable with type t = y) x y = 
   foo x + bar y + foo x

and the invocation will look like

let () = main (Int <+ struct let foo x = x end) (Float <+ struct let bar = int_of_float) 42 42.

Still too verbose though, so apparently we also need a shorter syntax for the struct/end construct, if we are going to use lots of unnamed modules, i.e.,

let () = main (Int <+ {let foo x = x}) (Float <+ {let bar = int_of_float}) 42 42.

Or even allow to drop struct/end in certain context (i.e., in x and y in x <+ y):

let () = main (Int <+ let foo x = x) (Float <+ let bar = int_of_float) 42 42.

I believe, that all tricks above could be seen as pure syntactic sugar, and thus they could be implemented in camlp4 or ppx or some other preprocessor. Alternatively, you can use Coq as a preprocessor for OCaml :smiley:


#19

How about allowing the limited import only in local opens,
let open A.[foo] in
and possibly
A[foo].(...) ?
Since defining the limited module seems only useful when opening it anyway.

Also, out of curiosity, is this not expressible by a module signature?
E.g.
MyModule[type t; foo] would be equivalent to
module Mfoo : sig type t val foo end with type t = MyModule.t = MyModule

Edit – Unfortunately I wrote this before reading @ivg’s answer. My first suggestion is somewhat similar to the idea of a direct signature specification in the open statement it seems.


#20

I normally really, really like your writing, but in this case, it seems to me like you didn’t read the thread up to this point. Yes, the example is contrived. What about infix operators? What about many usages of the same few values? The M.( ... foo ...) pattern is widely used in OCaml for a reason.

I’m not mistaking it for modular implicits, which I would love to have (I expect them to land around summer of 2027 – get hyped everyone!), and your other syntax suggestions are interesting but not relevant to my suggestion here. In general, I find first class modules incredibly disappointing, since having to pass both the module and the data separately is very messy. Until OCaml gets proper type-based dispatch, it’s lagging severely behind other languages. But that’s not the point of this thread.

This is the point of this thread. Look how heavy the syntax is compared to python. I don’t want to worry about the types of the module I’m importing when importing selectively. This is why we want to have a concise syntax. It’s especially bad precisely because OCaml doesn’t have type-based dispatch (unlike python, for example), so we have to constantly import modules. Modules are our ‘main weapon’, so to speak. And because of that, the less friction there is when using modules, the better, in my opinion.

And before somebody comments that you could also create a local module without types (I realize that’s the case, but the syntax is still heavy - as @mmottl pointed out above, for example,) I realize that, but it’s still heavy syntax and doesn’t cover all the corner cases as neatly as my proposal. IMO, we need a fast and easy way to selectively open modules and exclude/include contents.

Currently, OCaml programmers are seeding potential bugs everywhere by having blanket module opens and local opens that can easily be messed up by shadowing as time goes on, so I consider this to be an important concern.