Weak type variables are instantiated w.r.t the environment at the time of their introduction (?)

Hi,

I stumbled upon the following situation involving weak type variables:

# let x = ref None;;
val x : '_weak1 option ref = {contents = None}
# type 'a res =
  | Done of 'a 
  | Undone of unit
;;
type 'a res = Done of 'a | Undone of unit
# x := Some (Done 5);;
Error: This expression has type 'a res but an expression was expected of type
         'weak1
       The type constructor res would escape its scope
# let x = ref None;;
val x : '_weak2 option ref = {contents = None}
# x := Some (Done 5);;
- : unit = ()

This is a simplified example, it actually occurred in “real” code, so it was quite hard for me to see where the error came from.

My interpretation is that weak type variables should ultimately be instantiated, which all manuals and books that I know of indeed say, but only with types existing in the environment when the weak type variable was introduced, which -I think- no manual or book says (most examples in the literature instantiate x with an integer, but int is already declared in the OCaml prelude).

So this a question for OCaml experts:

  1. Is my interpretation correct or at least sufficiently accurate for everyday programming?
  2. If so, would you agree that it would be relevant to update the OCaml manual (at least) (and why not RWO2) to explain this?
  3. In this specific case, would it be possible to implement in the compiler an improved error message with an hint saying that the type constructor res didn’t even exist in the environment when _weak1 was introduced, and so cannot even be used when mutating x?

Contrary to the name “Weakly polymorphic types” weakly typed values are not at all polymorphic, but truly monomorphic, they could have one and only one type, we just delayed its instantiation until we have more information. So a weak type variable is a placeholder for a concrete type. Thus we have to apply the same reasoning to weakly typed values as we apply to the values with monomorphic type, i.e., that a value must have a type, and obviously a type must be declared before the value is created. I hope that this explanation is less confusing and less surprising, however your point of view is also correct. I agree, that the error message is obscure, and that some explanations why this happen should occur in the manual. If not in the manual, then at least here and probably we can extend an article about weak type variables here. Besided, the notion of type constructor escape and its escape is not very well covered in OCaml manual and actually anywhere, that’s why error messages that mention this vaguely defined scope produce so much confusion. Probably, we should either define it, or the compiler should stop mentioning scope at all, keeping it as an implementation detail.

1 Like

Yes, I know the distinction between polymorphic and weak variables. But I had never linked the value of weak variables to their environment… and, as I wrote, tutorials and manuals usually gloss over this as they stick to types from the prelude.

That makes sense.

Concerning the “scope escape” message, this is indeed quite cryptic… Beyond the question of whether it should be removed, and/or whether it should be defined in the manual, the main question for me is rather whether it’s possible to get clearer/more detailed messages in at least some situations where this error happens.

On a related note, the error message story in OCaml is far better than before (I started using Caml in 1997) but I really wish they were even better. I know @charguer had worked on that (and there’s also this work on [ANN] harmatia, "prettified" compiler error message plugin - #3 by paurkedal)… Some messages are really hard to explain to students, and even for teachers that don’t know what a scope escape is :slight_smile: , and the thing is that some of these messages sometimes happen even when not using advanced type features.

On a related note, the error message story in OCaml is far better than before (I started using Caml in 1997) but I really wish they were even better.

For the record, @octachron and myself have been working on improving the error messages lately, with changes that either appeared in 4.07 or will appear in the next release (@octachron improved a number of typing error messages among other things, and I integrated patches by @charguer and also worked on the formatting of error messages).

In my experience, there is often some tension between different tradeoffs when trying to improve (typing) error messages with a specific example in mind.
On one end, the simplest solution seems to have a specific code-path in the compiler recognizing this specific code pattern, and producing a perfectly fitting error message. But at the extreme, this can make the compiler messages unpredictable (deviate a bit from this pattern and you go back to undecipherable error messages) and the compiler’s implementation hard to maintain because of all the ad-hoc special cases.
So one is tempted to try to generalize a bit from the example; but then we have to make sure that the specially crafted error messages do also make sense in a more general setting: otherwise, the compiler could be trying to provide hints “did you mean to do …?” when they make no sense, which is worse than no hint at all.

2 Likes

Yeah, I was just trying to provide an easier explanation why this happens, as if we want to improve the error message, we need to find an explanation that is easier to understand (who knows, maybe the current explanation is the best :D).

An error that we see is actually an instance of a more general problem of generalization and in fact has nothing to do with the weakness. The scopes here (or levels as they are implemented in the OCaml typchecker) are in fact an implementation detail, an optimization, that was necessary to implement an efficient type checker. In the original Caml implementation, there were no levels (and hence no notion of escaping). The semantic notion that this level check is capturing is that the declaration of a type must dominate all its uses. Which must hold for all types, not only weak types.

So when a type constructor or a type escapes its scope or a scope of an equation (my favorite) it means that the type is either used before defined or, vice verse, may be used after it became undefined. Perhaps a better error message would explain the problem in terms of defined/undefined. The use before define is definitely easy to understand, so for our case, we can generate an error

The type 'a res is used before defined

(ideally a location of the use should be provided, though unification trace doesn’t contain it). For the dual case, when a type outlives its definition, it is not that easy to provide a better error message. Something like this might work:

This expression has a type x that is local and may become unknown later. Either bind x to a locally abstract type or constrain it to a type with a wider scope.

The message could be specialized, for example

type univ = Univ : 'a * ('a -> string) -> univ
let show (Univ (x,show)) = x;;

produces a cryptic:

Error: This expression has type $Univ_'a
but an expression was expected of type 'a
The type constructor $Univ_'a would escape its scope

Probably, the following might be better:

This expression has $Univ_'a which is local to the enclosing let definition. Local types should either remain in the scope of their definition or bound to types with wider scopes.

Note, the above are just thoughts aloud, maybe they will help to find better error messages and better explanations why something is untypable. But in general, we need an easy to read and to understand explanation of the OCaml typechecker, and error messages consistent with this explanation.

Thanks, that’s quite enlightening.
Let me take this occasion to thank you as well as @octachron, @gasche and @lpw25 for your strong investment in explaining intricate OCaml issues so well and with so much patience on this very site.

3 Likes