Struggling with (higher-kinded?) polymorphism

I think I have a problem that’s relatd to higher-kinded-types, but I’m not sure.

First attempt:

Consider the following problem. Say I have a function that shuffles an array:

let shuffle_array (arr : 'e array) : unit =
  for i = Array.length arr - 1 downto 1 do
    let j = Random.int (i + 1) in
    let tmp = arr.(i) in
    arr.(i) <- arr.(j);
    arr.(j) <- tmp
  done

Now I want create similar functions shuffle_list and shuffle_seq that work on lists and sequences, respectively. I tried this:

let shuffle_convert
  (input_conv : 'i -> 'e array)
  (output_conv : 'e array -> 'o)
  (input : 'i)
  : 'o =
  let arr = input_conv input in
  shuffle_array arr;
  output_conv arr

But it doesn’t work as wanted, because if I pass Array.of_list and Array.to_list as conversion functions to shuffle_convert, I won’t get a polymorphic function:

let shuffle_list_fails : 'a. 'a list -> 'a list =
  shuffle_convert Array.of_list Array.to_list
Error: This definition has type 'a list -> 'a list which is less general than
         'a0. 'a0 list -> 'a0 list

I tried something like this, but it also doesn’t work (not even syntactically):

let shuffle_convert_higher
  (input_conv : 'e 'i -> 'e array)
  (output_conv : 'e array -> 'e 'o)
  (input : 'e 'i)
  : 'e 'o =
  let arr = input_conv input in
  shuffle_array arr;
  output_conv arr
Error: Syntax error: ')' expected, the highlighted '(' might be unmatched

A weak version works:

I can create a weak version of the function:

let shuffle_list_weak : 'a list -> 'a list =
  shuffle_convert Array.of_list Array.to_list

In utop it gets shown as:

val shuffle_list_weak : '_a list -> '_a list = <fun>

I don’t understand why it is denoted as '_a and not '_weak1, but it does behave like a weak type variable.

utop # shuffle_list_weak [1;2;3;4;5];;
- : int list = [4; 2; 1; 3; 5]
utop # shuffle_list_weak ["A";"B";"C"];;
Error: This constant has type string but an expression was expected of type int

So this is not what I want.

Practical solution (with boilerplate):

Note that it is possible to create the functions I want with a bit of boilerplate (i.e. when I don’t use shuffle_convert):

let shuffle_list_boilerplate : 'a. 'a list -> 'a list =
  fun input ->
    let arr = Array.of_list input in
    shuffle_array arr;
    Array.to_list arr

let shuffle_seq_boilerplate : 'a. 'a Seq.t -> 'a Seq.t =
  fun input ->
    let arr = Array.of_seq input in
    shuffle_array arr;
    Array.to_seq arr

Thus, I do have a practical solution. But I would like to understand the underlying problem better.

Using modules as a workaround (that sometimes fails):

I did some searches and was wondering if this is related to higher-kinded types (not sure if I even understand what that is), but apparently OCaml doesn’t have them but there is a work-around for it using modules.

As far as I understand, the problem in shuffle_convert as written above is that the types 'i and 'o should rather be type constructors, but I can’t write 'e 'i and 'e 'o as types because the compiler expects type variables to refer to concrete types and not to type constructors, right?

So I tried to work around this issue using modules:

module type Container = sig
  type 'a t
  val to_array : 'a t -> 'a array
  val of_array : 'a array -> 'a t
end

module Shuffle_convert (C : Container) = struct
  let f : 'a. 'a C.t -> 'a C.t =
    fun input ->
      let arr = C.to_array input in
      shuffle_array arr;
      C.of_array arr
end

This compiles, YAY! :smiley: (Even with the explicit polymorphic annotation for f, which is what I want.)

But, apart from the bloated syntax, I was disappointed when I tried this:

let shuffle_list_weak_again =
  let module Shuffle_list = Shuffle_convert (struct
    type 'a t = 'a list
    let to_array = Array.of_list
    let of_array = Array.to_list
  end) in
    Shuffle_list.f;;
val shuffle_list_weak_again : '_weak1 list -> '_weak1 list = <fun>

Why is shuffle_list_weak_again weak?

If I instantiate the module outside of the function, it works:

module Shuffle_list = Shuffle_convert (struct
  type 'a t = 'a list
  let to_array = Array.of_list
  let of_array = Array.to_list
end);;

let shuffle_list = Shuffle_list.f

module Shuffle_seq = Shuffle_convert (struct
  type 'a t = 'a Seq.t
  let to_array = Array.of_seq
  let of_array = Array.to_seq
end);;

let shuffle_seq = Shuffle_seq.f
module Shuffle_list : sig val f : 'a list -> 'a list end
val shuffle_list : 'a list -> 'a list = <fun>
module Shuffle_seq : sig val f : 'a Seq.t -> 'a Seq.t end
val shuffle_seq : 'a Seq.t -> 'a Seq.t = <fun>

Questions:

My open questions:

  • Is my approach of shuffle_convert_higher theoretically correct (just not supported by OCaml)?
  • Why does shuffle_list_weak not show '_weak1 in its type signature but '_a? Is the word “weak” noise here and can be omitted? When is it omitted?
  • What is the practical approach to the problem? Simply avoiding it by skipping the step of a generalized shuffle_convert and directly write each function?
  • Why is shuffle_list_weak_again weak and shuffle_list not? Both seem to be defined in the same way, except that shuffle_list_weak_again uses a locally instantiated functor. Is this a bug? Or am I missing something?

Also, thanks again for all your responses on this forum so far. :folded_hands: I feel like l learned a lot here already.

I haven’t read everything in detail but it looks like you’re hitting the value restriction.
In particular, you should be able to replace shuffle_list_fails with the following:

let shuffle_list : 'a. 'a list -> 'a list =
  fun l -> shuffle_convert Array.of_list Array.to_list l
2 Likes

I can confirm, this fixes it! Thanks. I’ll read into that.

I wrongly thought that shuffle_convert has already a wrong type, but apparently it is good as is.

I also tried to see if the value restriction causes problems with shuffle_list_weak_again, and indeed it does. The following fix works there too:

-let shuffle_list_weak_again =
+let shuffle_list_weak_again x =
   let module Shuffle_list = Shuffle_convert (struct
     type 'a t = 'a list
     let to_array = Array.of_list
     let of_array = Array.to_list
   end) in
-    Shuffle_list.f
+    Shuffle_list.f x

Still a bit weird why I don’t need this when instantiating modules at the top level. :thinking:

Shuffle_list.f is a value, so it can be generalized, but let module Shuffle_list = ... in Shuffle_list.f is not.
Simpler examples:

# let id x = x;;
val id : 'a -> 'a = <fun>
# let id2 = id (* Value *);;
val id2 : 'a -> 'a = <fun>
# let id3 = let () = ignore () in id (* Not value *);;
val id3 : '_weak1 -> '_weak1 = <fun>

I checked the manual (OCaml 5.4, Weakly polymorphic types), and the manual mentions the prefix '_weak (and not '_):

Type variables whose names start with a _weak prefix like '_weak1 are weakly polymorphic type variables, sometimes shortened to “weak type variables”.

The typechecker try to respect your choice of type variables, thus if you write

let f: 'a -> 'a = Fun.id Fun.id

it will use the name '_a to indicate that it is a weakly polymorphic type variable named a. The names weak%d are used for anonymous weakly polymorphic type variables. (I should update the manual to mention the underscore case).

However in general, you don’t want to use unification variables at all (except for some expert use cases). You should use locally abstract types, either with the inline version:

let shuffle_convert (type i e o) 
  (input_conv: i -> e array) 
  (output_conv: e array -> o)
  (input i): o =
  ...

or the more general version (which work with recursive function) :

let shuffle_convert: type i e o. 
  (i -> e array) -> (e array ->o) -> u -> o = 
  fun input_conv output_conv input ->

this will give you far better error messages when writing the function itself and will ensure that
no functions are less polymorphic than you might expect:

let wrong_twice_with_error (type a) (f: a -> a) (x:a) = f (f (x+1))

Error: The value x has type a but an expression was expected of type int

compared to

let wrong_twice_accepted (f: 'a -> 'a) (x:'a) = f (f (x + 1))

val wrong_twice_accepted : (int → int) → int → int

Ah, so any type variable beginning with '_ is weak, and the weak part is just inserted when they are anonymous. Yeah, the manual is a bit misleading, though strictly speaking not wrong (it doesn’t say that those beginning with '_weak are the only weak type variables, but it is somewhat implied to the reader).

I’m curious what’s wrong with them and when to use them and when not. I have used them a couple of times (especially with type constructors, like val f : 'a list -> 'a.

This brings me back to the previously discussed: Understanding type quantification and locally abstract types - #18 by jbe.

I’m not sure exactly why I would want locally abstract types. Afterall, they don’t protect me from this:

let shuffle_list_badsurprise (type a) : a list -> a list =
  shuffle_convert Array.of_list Array.to_list

Where I get:

val shuffle_list_badsurprise : '_a list -> '_a list = <fun>

Which is weakly polymorphic, as I learned. So the locally abstract types don’t ensure my function is polymorphic.

In that other thread, I (possibly wrongly) concluded:

But that seems to be wrong:

# let x (type a) : a = 5;;
Error: The constant 5 has type int but an expression was expected of type a

So I think I still don’t fully understand what abstract local types do exactly.

To avoid the bad surprise above, I can also write:

let shuffle_list_rejected : 'a. 'a list -> 'a list =
  shuffle_convert Array.of_list Array.to_list
Error: This definition has type 'a list -> 'a list which is less general than
         'a0. 'a0 list -> 'a0 list

Same as I could write:

let shuffle_list_rejected : type a. a list -> a list =
  shuffle_convert Array.of_list Array.to_list
Error: This definition has type 'a list -> 'a list which is less general than
         'a0. 'a0 list -> 'a0 list

But my point is: In this case, I don’t need the locally abstract type to protect me from surprises, but I need the explicit polymorphic annotation (either using 'a. or type a.). So why and when should I use locally abstract types (other than when dealing with GADTs)? (Sorry if I have trouble understanding the concept, but I think it’s pretty complex to grasp.)

Any time that you are adding type annotations that you intend to be polymorphic.

It is true that they cannot make the result of a computation polymorphic, but that is a general rule: type annotations don’t increase polymorphism in OCaml[1]. In particular, they cannot influence the value restriction which is a stronger rule:

  • The result of a computation can never be polymorphic[2].

In a sense, it is a contradiction to try to add polymorphic type annotations to the defining of a non-syntactic value. Another important consequence is that for the core language not writing any annotation is the easiest way to be sure that functions are as polymorphic as possible.

Going back to definition that can be polymorphic and type annotations, a good point of a comparison is

let id (type a) (x:a): a =
  x + 1
2 |   x + 1
      ^
Error: The value x has type a but an expression was expected of type int

which gives you error on the expression which is incompatible with keeping the function id polymorphic, with

let id (x:'a): 'a = 1 + x

val id: int → int

that happily unifies the toplevel unification type variable 'a with int.

From a higher point point view, you can read

let id (x:'a): 'a = 1 + x

as

  • "Let name 'a the type of the expression x in the definitionlet id = ... .

Contrarily (going into the gritty details to try to avoid sowing more confusion)

fun (type a) (x:a) -> ( 1 + x : a)

reads:

  1. Let a be an abstract/black box/unknown type inside the expression fun (x:a) -> (1 + x:a),
  2. Once we leave the context of this expression consider that a is an free type variable 'a.

Moreover the interaction of these two points guarantees that if you use locally abstract types as the first arguments of a let-definition of a syntactic value, the variable will be generalized (to an universally quantified variable) in the type of the defined term. Indeed, if the definition

let f (type a) (x:a) = x

typechecks then:

  • The type of fun (x:a) -> x will be exactly a -> a
    (because the type a is locally abstract (aka an arbitrary unknown type) in the context of fun (type a) -> ... )
  • Then the type of the body of the let (aka fun (type a) (x:a) -> x) will be 'a -> 'a
    (here 'a is a free ordinary type variable)
  • Moreover, since the body of the let is a syntactic value, we can generalize free variables in the type of the body to obtain the type of the definition
  • Consequently, the type of f is the expected polymorphic type 'a. 'a -> 'a.

This means that when using locally abstract types in let-definition

let chain (type a b) (f:a -> b) (g:b -> c): a -> c = 
  fun x -> x |> f |> g

you can read (type a b) as “let a and b be polymorphic/rigid type variables in the body of the let”.

Looking at the question from the other side, toplevel unification type variables are intended to be used to give a local name to a single type in a toplevel definition:

let f (x: int * int * int as 'a) (y:'a) = [x;y]
type u = A of (int -> int -> int as 'a) | B of 'a * 'a

and adding a type annotation containing a toplevel unification type variable make functions less polymorphic. The obvious case would be

let f (x:'a) (y:'a * 'a) = x, y

where without the annotation the type of f would 'x -> 'y -> 'x * 'y. But toplevel unification variables can also hinder let-generalization. For instance, this definition is valid:

let f x =
  let pair y = x, y in
  pair 0, pair "zero"

whereas this one is not

let f x =
 let pair (y:'a) = x, y in
 pair 0, pair "zero"
3 |  pair 0, pair "zero"
                  ^^^^^^
Error: This constant has type string but an expression was expected of type
        int

because adding an annotation on y means that you intend the type of pair to be 'a -> 'b * 'a
for the same type 'a in the whole body of the function f.

Locally abstract types have a more intuitive scope as they are not implicitly scoped at toplevel definition. Moreover, they work
and maps more closely to the concept of a type which is locally a black box and thus can be safely generalized to an universally quantified type variable when defining a value in a let definition.


  1. outside of mutually polymorphic recursive functions (and type annotations can make GADTs function typechecks) ↩︎

  2. except on irrelevant type parameters with the relaxed value restriction ↩︎

1 Like

The way I usually explain it is that, initially, any type variable is global (its concrete type is still unknown, but at the end, there can be only one such concrete type, hence the notion of “weak”). Later on, in some specific cases, the typechecker can notice that using a global type variable is overly restrictive and that it can make the definition prenex polymorphic. When you write let f (type a) : a -> a = ..., you are introducing a new global type variable and give it a local name, but it does not change the fact that it starts as weak, and whether f can eventually be generalized to be prenex polymorphic is orthogonal. But when you write let f : type a. a -> a = ..., you are explicitly telling the compiler that the final type of f is supposed to be prenex polymorphic, and it is a compilation error if not.

But locally abstract types don’t ensure that, right?. Two examples:

Which, due to the value restriction, has a non-polymorphic type (but compiles). Another example:

let x (type a) : a option ref = ref None

Which can’t be polymorphic (but isn’t rejected by the compiler and also turned into a weakly polymorphic value).

So when I want to make a type annotation that I intend to be polymorphic, why shouldn’t I use 'a. or type a. instead, which both are explicitly polymorphic with regard to the resulting type (and not just introducing an abstract but not necessarily polymorphic type inside)?

I would like to understand when exactly (and why) I should favor (type a) over 'a. or type a.

My current understanding is:

  • Both 'a. and type a. describe a polymorphic type.
  • Both (type a) and type a. require that within the following definition, a cannot be made equal to a concrete type within that definition, but (edit: in case of (type a)) may still be non-polymorphic (e.g. due to the value restriction).

Taking the “not_id” example in the manual on the value restriction and annotating it with a locally abstract variable (not explicitly polymorphic) gives:

let not_id (type a) : a -> a = (fun x -> x) (fun x -> x);;
val not_id : '_a -> '_a = <fun>

Which is not polymorphic. So what practical use has this (outside GADTs)?

If I want to make sure the function is polymorphic, shouldn’t I use this then:

let not_id : type a. a -> a = (fun x -> x) (fun x -> x);;
Error: This definition has type 'a -> 'a which is less general than
         'a0. 'a0 -> 'a0

Then I see the problem, and can fix it:

let id_again : type a. a -> a = fun x -> (fun x -> x) (fun x -> x) x;;
val id_again : 'a -> 'a = <fun>

As I see it, (type a) can still give me bad surprises, while type a. can’t. So why use the former and not always the latter for polymorphic annotations? Also, why should I used type a. and not the syntactically shorter 'a.?

Regarding terminology, a is not a type variable, right? Type variables start with an apostrophe ('), right?

I think the manual here is also somewhat inconsistent. It first says:

This construction [regarding locally abstract types] is useful because the type constructors it introduces can be used in places where a type variable is not allowed.

So I would say when I write (type a), then a is not a type variable but a type constructor.

Yet later, the manual says:

The (type typeconstr-name) syntax construction by itself does not make polymorphic the type variable it introduces, […].

So here, a would be a type variable. Is a a type variable or not?

1 Like

It is a type variable. Using an apostrophe is just a shorthand to avoid having to explicitly declare it. The issue with this shorthand is that it is not obvious for the compiler where it should put the actual declaration of a variable that uses an apostrophe. See below.

It is just that the manual does not explain clearly what it means by “not allowed”. It is actually allowed, you can write 'a there, but it would not have the correct meaning. Indeed, you want the type variable that appears in the declaration of the exception to be used both inside the inner module and outside of it. So, if you use the version with an apostrophe, you are stuck because it becomes impossible to use the variable outside. But if you give it a proper declaration with the (type t) syntax outside the inner module, then the variable can also be used there.

I noticed that I can’t use explicit polymorphic annotations when I use the let f x = … notation, instead of let f = fun x -> …

let f (type a) : a -> a = fun x -> x  (* Works. *)
let f : 'a. 'a -> 'a = fun x -> x  (* Works too. *)
let f (type a) (x : a) : a = x  (* Works also. *)
let f 'a. (x : 'a) : 'a = x  (* A syntax like this doesn't exist, because
                                ['a.] would need to appear in a single type. *)

So I guess that means (type a) is the most I can do when I use the let f x = … notation. But I still have no real consistent view, when to use what. I guess (type a) is better than just using 'a in a let?

Side note: When I use val, I would (have to) stick to 'a anyway (which automatically implies polymorphism):

module M : sig val f : 'a -> 'a end = struct let f x = x + 1 end
Error: Signature mismatch:
       Modules do not match:
         sig val f : int -> int end
       is not included in
         sig val f : 'a -> 'a end
       Values do not match:
         val f : int -> int
       is not included in
         val f : 'a -> 'a
       The type int -> int is not compatible with the type 'a -> 'a
       Type int is not compatible with type 'a

So:

  • In val, a type variable ('a) is always universally quantified. (Here it’s always clear what to do.)
  • In let f x = … I can (should?) use (type a) for a polymorphic annotation? (edit: which doesn’t really enforce a polymorphic type as shown in my previous post)
  • In let f = fun x -> … I could also use 'a or type a. for a polymorphic annotation.

But I guess I should not use a lot of type annotations anyway?

So many options, and I’m confused when to use what. I mean… I understand more or less how these annotations work and what they do, but I lack a strategy for good coding style and understanding the reasons behind it.