Inline module expr

There was a feature I feel should be in OCaml : the ability to use a functor directly instead of having to name its result.
I felt that instead of
module SString = Set.Make(String)
let e = SString.empty
should have been able to do :
let e = Set(String).empty
In my opinion this is way clearer, and makes use of functors more confortable (up to a point).
I did a very rough draft here : https://github.com/EmileTrotignon/ocaml/tree/inline-module-expr
The implemented syntax provided is inferior to the one I wanted. First of all you still have to use Set.Make(String) and you cannot use Set(String), but I think this a library issue.
The working syntax is the following:

let empty = (.[Set.Make(String)].).empty
let set1 = (.[Set.Make(String)].).singleton "abc"
let set2 = (.[Set.Make(String)].).singleton "def"

let _ = 
  (.[Set.Make(String)].).iter (fun s -> print_endline s) ((.[Set.Make(String)].).union set1 set2)

I put those dirty “(.[” around the module-expr node because this was the fastest way for me to remove the conflicts in the grammar. It is not meant to stay that way.

And also the way I implemented that pollute the namespace inside the expressions, because I only modified the parser ( “module_expr.ident” is compiled to “let module Inlined_____ = module_expr in Inlined_____.ident”).

I make this post to discuss this feature : was it already discussed (I found no trace) ? Is it even a good idea ? If you think it is a good idea, how should I (or someone else) go about implementing it cleanly ?

2 Likes

It’s a minor thing, but I also am not super-fond of having (sometimes duplicated) modules here and there to hold the results of simple functor applications.

AFAICT, the main approach that happens to address this is used in Jane Street’s libraries, where functors are de-emphasized. Rather, a module bearing type and e.g. comparator implementations is passed by value to “constructors”:

let empty = Set.empty (module String)
let set1 = Set.singleton (module String) "abc"

Note though that this approach does carry a performance penalty, as explored here: Functorial vs "comparator" container interfaces?

FWIW, I would personally appreciate something like Set(String).empty, which could probably be made as efficient as the current functor approach.

1 Like

Well, that’s the problem.

The entire reason why applying functors in path is currently forbidden is that it is misleading wrt performances. Many users seem to think that functors are statically applied. That’s not the case. If you put a functor application in each call to Map(Foo).add, you need to compute that functor application every time, and it’s not particularly cheap.

The decision to forbid them syntactically was made at a time where the compiler did very few optimizations, which would make such repeated functor application a performance disaster.

Nowadays, you could argue that the compiler could be a lot more aggressive in optimizing such code patterns. Nevertheless, things are not so simple:

  1. Some functors are not pure, so constant-folding them changes the semantics
  2. We still do not have real program-wide optimisations, so each module will have its own application of the functor, duplicating a lot of works
  3. OCaml is often praised for having very good “baseline” performances, before trying to optimize your code. The situation with functors-in-path put it a pretty similar situation to Haskell whose perf baseline is at the whim of the optimizer.

All that being said, such optims are not exactly rocket science (flambda is pretty good at doing them) and more importantly, we are going to need them for modular implicits anyway, so who knows …

2 Likes

I did not know about the way Core does that, it’s very interesting.

I do think my implementation is as efficient as the current approach, in terms of runtime efficiency at least, if defunctorisation is a thing (is it ?)

Why not just implement this as a PPX rewriter? Something that converts

[%module Set.Make(String)].empty
into
let module SS = Set.Make(String) in SS.empty

?

I recognize that the syntax is a -little- cumbersome, but then again, this is precisely the tradeoff that was made when OCaml went with PPX rewriters over syntax-extension (and I support that tradeoff: it has achieved its desired effect – that you can safely combine multiple PPX rewriters from completely independent authors with a good deal of confidence).

ETA: Heck, it doesn’t seem that difficult to do: if you can be a little more voluble/precise as to what you think should be permitted in extension payload, I could write you a little prototype, just for the fun of it.

I do think that this needs to be a language feature, I am not sure a ppx would be that usefull.
I tried doing a ppx prototype actually, but I could not figure it out. Editing the parser turned out way simpler for some reason.
I am not even sure this is possible in a clean way with a ppx. What I understood from ppx is that they should only replace their payload with an other ast node. But “let module A = … in” is not an AST node.

Indeed, I hadn’t thought of that at all when writing my hot-take speculation. (Sidebar: I’d love to read a rationale for implementing an impure functor :grimacing:)

Isn’t this typical practice as things stand? Presuming pure functors (which I guess is already right out), consolidating and lifting those applications into one on a per-module basis would be equivalent to e.g. every module having its own top-level StringMap declaration as needed anyway.

(1) a PPX rewriter is not constrained to only rewriting its payload. And even if it were thus constrained, we could amend my starting-point example to
[%module Set.Make(String).empty] .

(2) If the payload is constrained to be an extended-module-path (from the grammar: https://caml.inria.fr/pub/docs/manual-ocaml/names.html#module-path ) I think this should be easily implementable.

(3) why do you think that a PPX rewriter-based solution would be insufficient? I mean, assuming that it is implementable?

I don’t understand the motivation. Naming intermediate results is what makes programs readable (and, together with referential transparency, understandable).

In the particular case of functors you rarely use a single value of the result. I just do not see the interest of forcing your reader to have to reparse and understand the functor application each time you use one of its value.

3 Likes

grin

(1) exceptions are implemented via extensible-variants
(2) PPX rewriters like deriving.{show,yojson,etc} that handle exceptions (and typically, extensible-variants) must have a pre-existing mutable location where they store function they compute, which is -updated- when applying deriving to an exception
(3) if such an exception is declared in a functor … voila voila voila, you see where this leads, yes?

To complement @Chet_Murthy:

  1. Even the simple declaration of an exception and/or extension of a variant is impure (it generates new ids).
  2. It’s easier than you think. let enumerate = let r = ref 0 in fun () -> incr r; !r placed inside a functor would prevent constant-folding.
  3. There are genuine impure cases, for instance when you want to encapsulate imperative context (for solver, graphic stacks, hardware stuff, etc) but keep re-entrance. You make a functor that initialize the context when applied, and then provide the API using that context.

Of course naming intermediate results is useful, and I do not think that if the feature I propose is implemented, it will be good for every use case.
However, in the case of the exemple I used, I do think that Set(String) is a better syntax than module SetString = Set.Make(String) : it is more readable, and requires less typing, there is no need to look elsewhere to know the definition of the module, and it really looks likes generics in other languages with “()” instead of “<>”.

Actually, it only requires less typing, if there’s only one use of such a module in a file, right? The minute you have a few, then giving a name to the module-application reduces typing. And readability comes down to the naming of that intermediate module, right?

To go back to what @Drup wrote, one of the good things about OCaml, is that if you write code in the usual way, you can have a good idea what the performance is. Module-application is not the same as function-application: it can produce code-bloat, and adversely affect performance.

We’ve all worked with languages where choices were made to produce more succinct code, or code with less apparent duplication. For larger programs, it’s not clear that those languages do any better than OCaml, and sometimes they do markedly worse.

Nevertheless, as I described, you can get what you want, with a PPX rewriter. It would seem like that’s a largely sufficient solution.

1 Like

Yeah the typing argument is mostly dependant on what name you pick and how many times you use the module in the file.

Regarding the fact that it would make performance non obvious, this actually makes a defect of OCaml more obvious : functor application, for at outsider, look like they should be computed compile-time. They are not, except if optimized away I guess (I would really like information about in what case this is done). I think there should be a mecanism for compile-time module generation a bit more explicit than “maybe the compiler or flambda will do it for you”. Macros could do that for instance, I think there was a paper on that by Leo White, did it go anywhere ? (and even then, if the macro generates a struct, you need to name it before accessing it).
The arguments above about non pure modules are quite convincing too…
An argument I forgot to include was that it would feel nice for the sake of generality. Why is module access restricted to module_path when you have module_expr that could replace it and is more general ?

OK, because I have nothing better to do (on this “are we going to be Fascists or not?” day in the USA), I implemented the PPX rewriter I proposed. You can find it at: https://github.com/chetmurthy/pa_ppx_moduledot . If you checkout this repo, there are instructions for installing supporting packages and then building. Let me know if you have problems, and of course for further suggestions for development.

Notes:

(1) this is based on camlp5 and pa_ppx so you cannot (directly) combine this PPX rewriter with the “standard” collection that most Ocaml users prefer. [Though, there are simple ways of combining them, by preprocessing with camlp5 first, and then handing the result to ocaml+ppx.] However, there are a bunch of work-likes in pa_ppx for most of that standard collection, and a bunch of other ones that do new and interesting things.

(2) Notwithstanding, I’m not trying to convince you to use pa_ppx and camlp5. This PPX rewriter took 21 minutes to write (from README thru the tests) and the actual rewriter is 1k of code. It should be easy to write such a thing using the standard PPX rewtier tooling – dead easy in fact.

(3) I’ve chosen to use a hard-coded module-name. Obviously this could be replaced by a fresh-variable name, but that would have put me over the 30-minute mark, just deciding how to create fresh variables. [just kidding, I was just -lazy-]

Long-winded post coming (and I’ll be repeating some of the arguments already made by others). TL;DR version: what you propose is not easy to implement, and even if we implemented it binding your functor applications would always be better.

Most functor applications create types. For applicative functors (most of them), this isn’t that much of a problem as Set.Make(String).t is a valid type that is common to all applications of the Set.Make functor to the String module. For generative functors this is much more problematic. If we had defined Set.Make as a generative functor, then the following code would fail to typecheck:

let empty = Set.Make(String)().empty

This is because the type of empty would only be valid in the scope of the functor application, and by not binding the functor application you don’t have a scope associated to it.
Not having a good place to bind types is also a problem for applicative functors, but slightly less so as it’s more likely that the types created can stay valid in a larger scope.

Regarding compile-time evaluation of functor applications: functors are just regular functions under the hood. There is no way to know whether two successive applications of the same functor to the same argument are supposed to produce the same result, without deep inspection of the functor’s code.
For applicative functors, you do have the guarantee that two applications of the same functor to the same argument produce the same types, but that doesn’t mean that the values are the same.
To get around this, Flambda asks the front-end to flag function definitions that correspond to functors, and will try to inline those functions more aggressively than usual. In most cases, this will end up looking like compile-time evaluation, but with your approach of using applications at each use of the functor you’re making the compiler do the same work a lot of times, while binding the functor application to a name as early as possible means the work is only done once. Some work on purity annotations could improve the situation, but this is still not completely trivial (typically, if you have two different functions in your program that call the same pure function with the same arguments, it’s not always obvious that you can share the result as you would have to move the call to a scope that is visible from both use sites. This could result in unnecessary extra computations if the two functions are called at most once each and never both of them).

And I’m not fond of comparisons with generics. In a way, the generic Set<String> corresponds in OCaml to string set (for some parametric type set). The reason why OCaml uses functors for sets and not regular parametric types is that a set is parametrized by its comparison function. In the generics variant, this requires that you can find the canonical comparison function for each type. With functors, you can have two different set types with elements of the same type, but with different comparison functions.

If you want to implement something that looks like generic sets in pure OCaml, here is one possibility (I’m choosing a list-like implementation for simplicity, but this works equally well with other data structures):

type 'a set = Empty | Cons of 'a * 'a set constraint 'a = < compare : 'a -> int; ..>

let empty = Empty

let rec mem x = function
  | Empty -> false
  | Cons (y, l) ->
    let c = x#compare y in
    if c = 0 then true
    else if c > 0 then mem x l
    else (* c < 0 *) false

let rec add x = function
  | Empty -> Cons (x, empty)
  | Cons (y, l) as l0 ->
    let c = x#compare y in
    if c = 0 then l
    else if c > 0 then Cons (y, add x l)
    else (* c < 0 *) Cons (x, l0)
     
(* Other functions here *)

The main drawback is that set elements must have a comparison function canonically associated to them, so it will only work with objects (or maybe with modular implicits in the future). But if you want to instantiate it with strings, all you have to do is to wrap your strings in objects:

let wrap str =
  object method s = str method compare other = String.compare str other#s end

let _ = Set.mem (wrap "Hello") (Set.add (wrap "World") Empty)

This is basically how generics work in object-oriented languages. In OCaml it’s very inefficient, because little work has been done on optimising objects (and it would actually be very hard to do given the other constraints of the language), so the functor approach is recommended instead. Also, the functor approach guarantees that all sets of the same type agree on the comparison function, while it’s not enforceable in the object-oriented world.

Regarding simplified paths, currently compilation units cannot be compiled as functors, so whenever you need to use a functor you need to either have it bound locally (you can do module Set = Set.Make if you want), or access it through an indirection. There is some ongoing work on lifting the restriction and allowing compilation units that are functors, but in most cases it’s better to pack together the functor itself and the various interfaces it uses (for the arguments and result) so I don’t expect this feature to be used for shortening functor paths.

Finally, a few thing I’ve seen to deal with the problem:

  • Make a module, in your project, that contains all the functor applications you often need, along with short aliases for commonly used modules if you need to. You can then make this the only opened module in each of your files, giving good readability without making it too painful to resolve what a short alias refers to.
  • When you define a module around a specific type, pre-instantiate commonly used functors. You would then use Mytype.Set in subsequent files, instead of Set.Make(MyType). You can even do this for standard library modules if you want:
(* myString.ml *)
include String
module Set = Set.Make(String)
module Map = Map.Make(String)

Combined together, these two tricks can likely remove most of the functor applications in your code, without making it too hard to understand.

8 Likes