Why can't OCaml simplify this (conjunctive) polymorphic variant type?

Hello,

The problem I’m about to describe initially arose using Ocsigen with some pretty large polymorphic variant types. I tried to simplify the problem as much as possible, but please keep in mind that a solution like “change your types” is not relevant for me :slight_smile:

Consider the following source code:

type u1 = [ `A | `B ]
type u2 = [ `A | `B | `C ]

let p1 = function `T (`A) | `T (`B) -> ()
let p2 = function `T (`A) | `T (`B) | `T (`C) -> ()
let p3 = function `T (`A) | `T (`B) | `T (`C) | `U -> ()

let f h =
  p1 (h :> [ `T of u1 ]);
  p2 (h :> [ `T of u2 ]);
  p3 (h :> [ `T of u2 | `U ])

When asking OCaml to infer its type (ocamlc -i mnwe.ml), I get:

type u1 = [ `A | `B ]
type u2 = [ `A | `B | `C ]
val p1 : [< `T of [< `A | `B ] ] -> unit
val p2 : [< `T of [< `A | `B | `C ] ] -> unit
val p3 : [< `T of [< `A | `B | `C ] | `U ] -> unit
val f : [< `T of [< u2 ] & [< u2 ] & [< u1 ] ] -> unit

So far so good, just note how weird the type of f is, specifying twice that T (I don’t know how to typeset “acute T” in code on this forum :stuck_out_tongue: ) should contain a subtype of u2.

The problem is if I want to add an mli interface to that implementation.

Here are two interfaces. The first will compile (it is a subset of the interface inferred by OCaml, so its no surprise), but is ugly in my opinion, especially with larger types. The second won’t compile, while it seems obvious that it is the same.

First interface (working):

type u1 = [ `A | `B ]
type u2 = [ `A | `B | `C ]

val f : [< `T of [< u2] & [< u2] & [< u1] ] -> unit

Second interface (not working): It is forbidden to remove duplicates in the conjunction.

type u1 = [ `A | `B ]
type u2 = [ `A | `B | `C ]

val f : [< `T of [< u2] & [< u1] ] -> unit

The example won’t compile either with the following interface.

Third interface (not working): It is forbidden to change the order of conjuncts in the conjunction. This is true even when adapting the example to use only p1 and p2.

type u1 = [ `A | `B ]
type u2 = [ `A | `B | `C ]

val f : [< `T of [< u2] & [< u1] & [< u2] ] -> unit

I tried to force the type of the h parameter in the f function, but the inferred type just got even more bloated:

let f (h : [< `T of [< u2] & [< u1] ]) =
  p1 (h :> [ `T of u1 ]);
  p2 (h :> [ `T of u2 ]);
  p3 (h :> [ `T of u2 | `U ])

val f : [< `T of [< u2] & [< u1] & [< u1] & [< u2] & [< u2] -> unit

I am using OCaml 4.03.0.

So the questions are : Is there some trick that would allow me to have beautiful interfaces? Should this be considered a bug and should I fill a bug report? Am I too much of a newbie to fiddle with such advanced constructions of the language?

Have a nice day!

Fabian Pijcke

See this issue for a simpler example that generates unnecessary conjunctions and some explanation of the behaviour.

As for your possible interfaces, equality between different sets of conjunctions only works for exactly the same sets. Doing better than that is a difficult problem and isn’t really needed. What you should be doing in your interface is getting rid of the conjunction, since u1 is a subset of u2 that would be:

val f : [< `T of [< u1 ] ] -> unit
1 Like

Mmh I see. Thanks for the link :slightly_smiling_face:

However the Ocsigen types I’m playing with are not included to each other, and the intersection of the (three) types contains 49 variants (in the current version of Ocsigen, and I would like my code to be portable between different versions, say as long as type names do not change), themselves parameterized by different types.

The “clean” way of specifying this into the interface would be to declare a type myinter containing it, but then I would need to paste this type in the implementation also. I’d rather repeat twice the type name as a conjunction ^^

I agree that it is not needed per se, but how do OCaml language developers decide what is and what is not? AFAICS the only thing needed is to order the list of conjuncts and to remove duplicates before comparing them :stuck_out_tongue:. Of course I’m by no means an OCaml language developer and I guess that it is way more complicated than that, I’m just curious :slight_smile:

What about using something like:

type t = [`A | `B | `C]
type u = [`C | `D | `E]

type 'a t_and_u = 'a
  constraint 'a = [< t]
  constraint 'a = [< u]

Then you can give f the type _ t_and_u -> unit.

the only thing needed is to order the list of conjuncts and to remove duplicates before comparing them

Yeah, these conjunctions are not conjunctions in the normal subtyping sense. They are really a list of arbitrary unifications that need to be performed when making the type “fixed” (i.e. when getting rid of the <). They are not designed for representing intersections of variants and they would do a pretty bad job of it. They should really be thought of as an artifact of type inference that is supposed to be removed as soon as possible with a type annotation.

3 Likes

This is very interesting. I had already seen the constraint keyword in some places but this is the first time I really have to use it. However I am stuck again. The problem comes from an over-simplification of what I am doing. Although your technique works great when h is a value, it breaks again when h is a function from an alphatype to the polymorphic variant type.

Here is the same example taking this issue in consideration:

type u1 = [ `A | `B | `D ]
type u2 = [ `A | `B | `C ]

let p1 v h = match h v with `T (`A) | `T (`B) | `T (`D) -> ()
let p2 v h = match h v with `T (`A) | `T (`B) | `T (`C) -> ()
let p3 v h = match h v with `T (`A) | `T (`B) | `T (`C) | `U -> ()

type 'a t_and_u = 'a
  constraint 'a = [< u1]
  constraint 'a = [< u2]
  
let f (g : 'a) (h : 'a -> [< `T of _ t_and_u]) =
  p1 g (h :> 'a -> [ `T of u1 ]);
  p2 g (h :> 'a -> [ `T of u2 ]);
  p3 g (h :> 'a -> [ `T of u2 | `U ])

And the inferred interface:

val f :
  'a ->
  ('a -> [< `T of [< `A | `B ] t_and_u & [< u1 ] & [< u2 ] & [< u2 ] ]) ->
  unit

So the spurious conjuncts kick in again. I tried to change the type of t_and_u so to something like ('a, 'b) t_and_u = 'a -> [< 'T of 'b] constraint 'b = [< u1] constraint 'b = [< u2] (replace the single quote before T with an acute accent), but the compiler then complains about 'b being unbound.

Sorry to bother again :frowning:

That code is always going to infer extra conjuncts, but you can specify the right interface with an annotation:

# module M : sig
  val f : 'a -> ('a -> [< `T of _ t_and_u ]) -> unit
end = struct
  let f (g : 'a) (h : 'a -> [< `T of _ t_and_u]) =
    p1 g (h :> 'a -> [ `T of u1 ]);
    p2 g (h :> 'a -> [ `T of u2 ]);
    p3 g (h :> 'a -> [ `T of u2 | `U ])
end;;

module M : sig val f : 'a -> ('a -> [< `T of [< `A | `B ] t_and_u ]) -> unit end
2 Likes

Okay, many thanks for all the information! For some reason I still have conjuncts even with your last technique, but anyway I think it is easier to just repeat the type in the interface rather than having a submodule just to get a cleaner interface.

Have a nice day!

Fabian

The essential reason why conjunctions can’t be generally simplified is because the presence of the corresponding constructor depends on the result of the delayed unification. If unification fails, the constructor is removed (disallowed), but the other constructors are still allowed. Consider for example:

let is_a a = function `Check a' when a = a' -> true | `Check _ | `Pass -> false;;
let f (x : [< `Check of 'a & 'b | `Pass]) (a : 'a) (b : 'b) : bool =
    match x with 
    | `Check _ -> is_a a (x :> [< `Check of 'a | `Pass]) &&
                  is_a b (x :> [< `Check of 'b | `Pass])
    | `Pass -> true;;
f (`Check 1) 1 1;;
f (`Check '1') '1' '1';;
f (`Check 1) 1 2;;
f `Pass 2 2;;
f `Pass '1' 1;;
f `Pass '1' "11111";;

But not:

f (`Check '1') '1' "11111";;


Error: This expression has type string but an expression was expected of type char

I.e. when the types 'a and 'b are different, only `Pass is allowed, but when they are the same, both `Pass and `Check are allowed.
In the example it seems the presence of `T should not depend on the success of delayed unification, so exactly this presence can be annotated to remove the delayed unification:

let f (g : 'a) (h : 'a -> [< `T of _ > `T]) =
  p1 g (h :> 'a -> [ `T of u1 ]);
  p2 g (h :> 'a -> [ `T of u2 ]);
  p3 g (h :> 'a -> [ `T of u2 | `U ]);;

(this can actually be [<`T of _ | `Some_other_constr1 of 'a | `Some_other_constr2 of 'b > `T])
The type-checker will now perform the unification immediately, but the inferred type can indeed contain a huge constant intersection that can be expressed concisely in the signature using the t_and_u.

Edit: actually the presence requirement can be further removed by adding redefinition of f just after:

let f (g : 'a) (h : 'a -> [< `T of _ > `T]) = ...
let f a h = f a (h :> _ -> [ `T of _ | `U]);;

val f : 'a -> ('a -> [< `T of [< `A | `B ] | `U ]) -> unit = <fun>
2 Likes

This is unrelated to the topic, but I just wanted to inform you that I found a nice (in my humble opinion) solution to my problem, which is surrounding anything that’s passed with a span tag, so that the type needed is phrasing regardless of the (previously) surrounding tag :face_with_raised_eyebrow:

Now the interface looks really clean, and I feel like the trick in the implementation is really acceptable :slight_smile:

@schrodibear, please allow me to delay reading your comment until tomorrow, my brain seems switched off already…