ICFP and improvement of module language

kantian · October 24, 2024, 4:27pm

This year, at ICFP, Samuel Vivien (@samsa1) did a presentation of his implementation of modular explicit (corresponding paper and PR). It introduces an extension of first-class module with module-dependent functions that allows writing code à la Haskell type-class (or Rust traits) in a simpler way. For instance:

module type Type = sig type t end

let id (module A : Type) (x : A.t) = x

(* instead of the more cumbersome *)
let id (type a) (module A : Type with type t = a) (x : a) = x

There is also, independently, a RFC to have type as possible module arguments.

In Haskell (or Rust) there is a clean syntax when a function depend on a type that must satisfy multiple type classes (or traits). So, I was wondering whether a new operator in the signature language might also be useful.

I would propose to add a binary infix operator & in the signature language to write code like this:

let foo (module A : S1 & S2) (x : A.t) = ...

Today we have to define a new module type in order to do so (which is cumbersome):

module type S3 = sig
  include S1
  include S2 with type t := t
end

let foo (module A : S3) (x : A.t) = ...

(* we could also write *)
module type S3 = S1 & S2

Its semantic will mimic the deductive rule of conjunction with one introduction rule and two elimination rules:

(* introduction *)

  (M : S1)   (M : S2)
--------------------------
     (M : S1 & S2)

(* elimination *)
  (M : S1 & S2)
------------------
    (M : S1)

  (M : S1 & S2)
------------------
    (M : S2)

Do you think it is a good idea ?

samsa1 · October 24, 2024, 4:42pm

Just as a notice : the elimination rules are wrong in the general case because we can have shadowing between names that are in S1 and S2. (in that case S1 & S2 and S2 & S1 are strictly different)

mbarbin · October 24, 2024, 7:47pm

Your post motivated me to finish a draft I had sitting on my machine for a while. This is a tutorial that demonstrates a related (but unsafe!) approach using a handler (a runtime value containing first-class modules). I added a section with an example requiring multiple Traits.

I hope you’ll forgive the shameless plug — it’s just to illustrate the concept. Having support for this kind of functionality in the module language sounds useful to me.

kantian · October 24, 2024, 9:05pm

The & operator doesn’t operate on module but on signature, so overlapping is not a problem. You can prove, with those rules, that & is commutative and associative, in the same way you prove that conjunction is such in propositional calculus.

Here some properties of & (I will use = to denote equivalence[1] of signatures):

S & S = S (* idempotence *)
S1 & S2 = S2 & S1 (* commutativity *)
S1 & (S2 & S3) = (S1 & S2) & S3 (* associativity *)
(S1 & S3) & (S2 & S3) = S1 & S2 & S3 (* overlapping is not a problem *)

(* There is an empty signature, unique up to equivalence, satisfied by no modules *)
module type Empty = (Type with t = int) & (Type with t = string)

(* There is a top signature, satisfied by any module *)
module type Top = sig end

S & Empty = Empty & S = Empty
S & Top = Top & S = S

I think such an operator will be a great addition to the signature algebra, and it will ease the use of traits with the new modular explicit syntax.

[1]: Two signature S and S' are equivalent when a module M satisfies S if and only if it satisfies S'.

No problem. It’s great to have example where this functionality will be useful (in addition to its use in Haskell and Rust).

samsa1 · October 25, 2024, 7:55am

As such you have a distinction between

module type S3 = sig
    include S1
    include S2 with type t := t
end

and

module type S3bis = S1 & S2

?

For example if I have :

module type S1 = sig
    type t
    val v : t -> t
end
module type S2 = sig
   type t
   val v : t * t
end

I’ll have S3 = sig type t val t * t end with is strictly different from the presentation of S1 & S2 you gave above.

lukstafi · October 25, 2024, 8:16am

This, but not limited to t, rather for all overlapping abstract types.

EmileTrotignon · October 25, 2024, 9:18am

I like this idea.

I think the plan to solve this was to use sub module.

So module Int would have a Print module and a Compare module.

A function that uses both would need two arguments.

I think your proposal looks better, and require less changes to libraries to work, so I like it better.

dbuenzli · October 25, 2024, 9:47am

Really ? That sounds horrible.

What’s wrong with defining

module type Myexpectation = sig
  include Ordered 
  include Printable with type t := t
end

And then using:

let foo (module A : Myexpectation) (x : A.t) = ...

I don’t think it’s that bad in practice. E.g. see here and usage here for examples.

EmileTrotignon · October 25, 2024, 9:55am

I am not a 100% sure about that, but I think I remember a demo of modular implicit that had very small submodules (the paper for modular implicit has small submodules, I think thats because the module have to be explicitly marked as implicitly selectable)

They have advantages as well, one being that you can make the submodule a functor for parametric types.

print (module List.Pp(Int.Pp)) my_list is noisy and annoying but is there an alternative ? Maybe print (module List.Pp(Int)) my_list would be nicer, but less consistent (maybe thats fine ?)

You proposition with an explicit module type would be quite annoying when you just want to use temporary printers for a function that already has a module input, because of the amount of boilerplate. Appart from that I think its ok,

kantian · October 25, 2024, 3:03pm

Yes, I imagined the & as a new primitive construct in the module type language (which doesn’t necessary need to be translated in sig ... end definition under the hood). ~~Your two module types S3 and S3bis are distinct but equivalent: they have the same extension, any module that inhabits S3 also inhabits S3bis and reciprocally.~~
Edit: I had misread your example, and didn’t see at first that S1 and S2 disagree on the type of v. In this case S3bis = S1 & S2 is empty (no module can inhabit S1 and S2) and non equivalent to S3 = S2.

You could see this new construction S1 & S2 as a convenient way to define the greatest common subtype of S1 and S2 (up to equivalence). The elimination rules state that it’s a common subtype, and the introduction rule states it’s the greatest one.

It’s a more convenient (and I would say a more ergonomic way) for a programmer to say that a type should satisfy the constraint in S1 and the constraints in S2 than with the sig include S1 include S2 with type t:=t trick. In other words, we add the notion of product (conjuction), that respects subtyping, as a primitive in the module type language (because that’s really what the programmer has in mind when he uses the include trick).

@EmileTrotignon : in the modular implict, there is a lot of small submodules because of the diamond problem when you want to constraint a type class with another one (in Haskell, a type that implement the Ord class must also implement the Eq one). But I’m not sure it’s a good thing to try to mimic Haskell on this side.

@dbuenzli: there is nothing wrong with include trick, it’s just cumbersome. The modular explicit feature doesn’t add any expressiveness to the language, but it makes it easier and more ergonomic to use some (already existing) features. You could for instance write:

module type Myexpectation = Ordered & Printable

And, honestly, I find it clearer what it means (and obviously shorter).

kantian · October 25, 2024, 3:42pm

This proposal won’t solve the diamond problem of modular implicit. The diamond problem arises, indeed, when there is overlapping signatures and different solutions for the implicit argument. The problem will stay with the & operator. The solution proposed to solve this was indeed to use small module aliases.

dbuenzli · October 25, 2024, 3:57pm

Sure but I bet the complications start when you factor in the whole type signature language, rather than a simple flat interface with a single type t = … and a bunch of functions.

In other words, I have nothing against your proposal but I suspect that the semantics of & would be a bit more involved than your exposition so far.

dlesbre · October 25, 2024, 4:00pm

From what I understand, your method of resolving conflicts in two modules combined by & (eg types that cannot be equal, or values with different types) is to just declare that is the Empty (which I would rather call unsatisfiable) signature?

(* There is an empty signature, unique up to equivalence, satisfied by no modules *)
module type Empty = (Type with t = int) & (Type with t = string)

I’m not sure that is the best way to deal with such conflicts.

For one, when combining large signature you might not know there is a conflict. Another case is if you are using signatures from a library, any additions to their signature could lead to new conflicts and unsatisfiable types (whereas addition are generally considered harmless interface changes). Therefore I believe such cases should be a least a compilation warning, and most probably an error.

Just to throw a few more ideas of how conflicts should be resolved:

Unsatisfiable types: if both S1 and S2 define a value v with incomparable types, then S1 & S2 cannot be instanciated
Raise an error
Shadowing: S1 & S2 has the value v with type specified in S2
Intersection: only keep the part of the module with no conflict: S1 & S2 has no value v, but keep their non-conflicting components.

Its also possibly a good idea to distinguish between type conflicts (S1 and S2 define t in conflicting ways) and value conflicts (they both have a value v with different types). For instance, we could throw an error in for type conflicts and use shadowing or intersection for value conflicts.

kantian · October 25, 2024, 7:43pm

It’s semantic is complete. When you have an object in your hand, if you know it’s an apple and it’s red, you can conclude it’s a red apple (introduction rule). If you know it’s a red apple, then you know it’s an apple and it’s red (elimination rules). That’s all you need to know in order to use correctly the expression red apple and know what it means.

By the way, it may not be so easy to implement this in the actual code base of the compiler. I don’t have any knowledge of the module language type checker implementation.

@dlesbre: it may be useful, from an UX point of view, to throw an error when a type is unsatisfiable because the constraints are contradictory. I use the example of Empty to show the correspondence with conjunction in propositional logic. By the way, among your ideas, the last one will break the elimination rules and you can’t find any module M to instantiate the introduction rule: if the two module types disagree on the type of a value, their greatest subtype is empty.

I realize that I read too quickly the example of @samsa1, and indeed S3bis = S1 & S2 is empty and so is not equivalent to S3 (the one with the include trick). If you know that the object in your hand is an apple and that is also a pear, then you have nothing in your hand. It’s emptiness is a consequence of the impossibility to instantiate the introduction rule because the two premises are contradictory. If you could have instantiated the introduction rule, then it’s always necessary safe to use the elimination ones: they are just an analysis principle to retrieve the two premises used in the introduction rule. You don’t have to worry on how to use something, if it’s impossible to produce it at first.

The difference between & and include trick is that the former is a semantic notion whereas the latter is just a syntactic one (an advanced form of copy/past), like the distinction between subtyping and inheritance in the object fragment of the language.

lukstafi · October 28, 2024, 4:34pm

Is wanting this feature right now a sign that my design is bad, or is it an overlooked gem of an idea?

kantian · October 28, 2024, 5:39pm

Which feature ? The operator & I described ? If so, I think it will be more useful module dependent functions that have not already landed in OCaml (the PR is still under review). By the way, you can still achieve its results with the include trick (without a check that the signatures may overlap and disagree).

I think a little bit more about what @dbuenzli and @dlesbre said. Today conflicts are resolved by raising an error, It should be the same with &. For instance that’s what I get today if I try to define the empty module type:

module type Empty = module type of Int with type t = float;;
Error: In this `with' constraint, the new definition of t
       does not match its original definition in the constrained signature:
       Type declarations do not match:
         type t = float
       is not included in
         type t = int
       The type float is not equal to the type t

Concerning the application of & to the whole type signature language. Today the include directive is limited to signatures. We can’t use it with functor type.

module type F = sig end -> sig end
module type S = sig include F end;;
Error: This module type is not a signature

I conjecture that the use of & on signature will be equivalent to the include trick with a check that the constraints are not contradictory (@samsa1 example’s S1 & S2 will fail). But I think it will be possible to apply & also to functor types. Since -> is contravariant in its domain and covariant in its codomain, we should have:

(A -> B) & (C -> D) = (A | C) -> (B & D)

where | is the dual of &: the least common supertype. Since I don’t think we should also add the | operator, this module type should fail if we don’t have one of this subtyping relation A <: C or C <: A.

Are you referring to the way you use first-class module in the design of your code ? If so, you can do as @dbuenzli suggest in this comment. It will give you the same result without the “safety” check.

octachron · October 28, 2024, 6:21pm

@kantian, your semantics is partially incomplete because it doesn’t take in account projections:
defining (M <: x & y) as M <: x && M <: y is not enough, one also need to be able to define the semantics of (M <: x & y).field.

In particular, runtime position of value component and runtime size are part of the type of module type for the intermediary language of explicitly typed OCaml program.
This discrepancy between the source language and the explicitly typed language is solved by inserting runtime coercion. For instance, in the last line of

module F(X: sig val y:int val x; int end) = struct end
module XY = struct let x = 0 let y = 1 end
module E = F(XY)

is translated to something that would be equivalent to:

module E = F( (XY :> sig val y:int val x:int ) )

Taking this point in account, & is not really commutative and for instance x & y places the
fields of x and then the missing fields from y. In particular,

module type t = x & y

is not equivalent to

module type t = y & x

However, that “low-level” concern does seem quite well contained, and does not sound like it would impact significantly the usability of &.

A more impactful concern might that all definitions in M: x & y should be writable in isolation, which means that the type system cannot skip the elaboration of x&y into a signature, aka a list of individually well-formed definitions and rules out all bottom signature.

lukstafi · October 28, 2024, 6:44pm

Thanks all for the discussion! Yes: the operator &, with the operational semantics of the include trick plus error check, with the substitution of all shared types in the second signature, with a type error instead of shadowing if there is type mismatch between same-named fields.

Edited to add: shadowing for include is a good feature – the operator & would be complementary in this regard.

kantian · October 28, 2024, 11:37pm

I tend to considered this as an implementation detail. The compiler is free to choose the runtime representation it wants. Moreover, strictly speaking, x & y is not a signature (but a module type) and a module at this type doesn’t have any field. It’s only if x and y are signatures, and after the application of one of the elimination rules, that you can speak of a field of your module.

When I said it is commutative, I say it is so up to equivalence (like in category theory we always state proposition up to isomorphism). For instance, consider this two module type:

module type Eq = sig type t val eq : t -> t -> bool end
module type Ord = sig type t val compare : t -> t -> int

Then the following two modules types are distinct, they don’t have the same runtime representation, but they are equivalent:

module Eq_and_Ord = sig
  include Eq
  include Ord with type t := t
end

module Ord_and_Eq = sig
  include Ord
  include Eq with type t:=t
end

Then the compiler may choose that Eq & Ord is the first one and Ord & Eq the second one: they are indeed distinct but equivalent. They have the same extension (any module that inhabits one inhabits the other), the fact that the casting operation is not a no op at runtime is an implementation details (it could also have been a no-op as in the object fragment of the language). In the same way that the proposition A and B and B and A are two distinct but equivalent propositions in propositional logic.

kantian · October 29, 2024, 8:03pm

After some reflection, I thought it was indeed incomplete and that I have to extend & to the core type language (because of subtyping relation in object and polymorphic variant). Fortunately, these relation are not taken into account with type coercion in the module language:

module type A = sig type t = <x : int> val v : t end
module type B = sig type t = <y : int> val v : t end
module type C = sig
  type t = <x : int; y : int >
  val v : t
end

module Cast (M : C) = struct module M : A = M end;;
Error: Signature mismatch:
       Modules do not match:
         sig type t = < x : int; y : int > val v : t end
       is not included in
         A
       Type declarations do not match:
         type t = < x : int; y : int >
       is not included in
         type t = < x : int >
       The type < x : int; y : int > is not equal to the type < x : int >
       The second object type has no method y

At first I was worry that A & B = C, but its not the case. The only rules we should add is that t & t = t and t & t' = bottom if t is not equal to t' (it seems that’s the rule implemented in the module type checker), with this other rule for values

val v : t    val v : t'
-----------------------
   val v : t & t'

That’s should be sufficient to do this:

Let see with the S1 & S2 example:

(M : S1 & S2)              (M : S1 & S2)
-------------              --------------
   (M : S1)                  (M : S2)
-----------------          ---------------
 M.v : M.t -> M.t         M.v : M.t * M.t
-----------------------------------------------
 M.v : (M.t -> M.t) & (M.t * M.t) = bottom

This is the reason why S1 & S2 must be ruled out.

So when we want to elaborate x & y into a signature, we explore each signature, each label will have a list of types resulting from the elimination rule and projection (for M.v it’s the list [M.t -> M.t; M;t * M.t]) and we fold & over this list. If one label have a bottom type the product is ruled out, otherwise we have an elaboration as a signature.

Topic		Replies	Views
First class modules and argument types Learning	5	777	January 20, 2023
[Gagalium blog post] Rethinking abstract module types - feedback wanted Community	11	1228	December 13, 2023
Module system resources Learning	3	1539	November 28, 2017
OCaml compiler development newsletter, issue 2: May 2021 Community compiler , news , compiler-newsletter	8	2778	June 23, 2021
Documenting what signatures are intended to be implemented by a module Ecosystem documentation , module-type	6	274	July 17, 2024

ICFP and improvement of module language

Related topics