Is [] a type or not?

I wondered whether [] is a type, since it would be a natural notation for an empty polymorphic variant type. At first it doesn’t seem to be:

# let f (x : []) : [] = x;;
Error: Syntax error: type expected.

But this works:

# type foo = [];;
type foo = []
# let f (x : foo) : foo = x;;
val f : foo -> foo = <fun>

However, it doesn’t seem to be an empty type:

# let f (x : foo) = match x with _ -> .;;
Error: This match case could not be refuted.
       Here is an example of a value that would reach it: []

This error message suggests that [] is an element of the type [], and indeed it seems to be, at least as long as the type was given a name:

# let y : foo = [];;
val y : foo = []
# let y : [] = [];;
Error: Syntax error
# let f (x : foo) = match x with [] -> 3;;
val f : foo -> int = <fun>

Normally [] denotes the empty list, so I thought maybe [] was “the type of empty lists”. But if so, these “lists” don’t seem to have anything to do with ordinary ones:

# 3 :: y;;
Error: This expression has type foo but an expression was expected of type
         int list
# let z : 'a list = [];;
val z : 'a list = []
# y = z;;
Error: This expression has type 'a list but an expression was expected of type
         foo

I am very confused. Is [] a type, as it seems to sort of be? If so, what type is it, and why does it only work in some places and not others? If not, what is going on?

It’s not exactly a type, it’s a type constructor. If I remember correctly, there are five special “keywords” that are parsed and typed as type constructors: [] and (::), for the list type, (), for the unit type, and true and false for the bool type.
Apart from this special parsing rules, there is nothing particular about these constructors and you can reuse them for custom types:

# type t = () | [] | (::) | true | false | Something;;
type t = () | [] | (::) | true | false | Something
# [| (); []; (::); true; false; Something|];;
- : t array = [|(); []; (::); true; false; Something|]
2 Likes

[] is a data constructor, not a type. Replace it with Nil (just my choice of name, nothing special about it) everywhere in your code sample and you get exactly the same results.

1 Like

Oh! I get it. So when I wrote

type foo = []

I was defining a new variant type foo with one constructor [] having no arguments. I have to say I think it’s super confusing to have just a few special cases of tokens that are interpreted as data constructors. At least in the case of

type foo = ()

I get a

Warning 65 [redefining-unit]: This type declaration is defining a new '()' constructor
which shadows the existing one.
Hint: Did you mean 'type foo = unit'?

Wouldn’t it make sense to give similar warnings for the other special cases?

2 Likes

Yeah, exactly. And you’re totally right, redefining [] should issue warning 65 too. Looks like a gap. Maybe file an issue on the compiler repo?

Re-using the list data constructor names is occasionally useful, in particular when working with heterogeneous lists:

# type _ t =
  | [] : unit t
  | ( :: ) : 'a * 'b t -> ('a -> 'b) t ;;

# [ 1; "foo"; true ] ;;
- : (int -> string -> bool -> unit) t
5 Likes

That’s neat; so even if you redefine [] and (::), the builtin syntax [ _ ; _ ; ... ; _] still expands in terms of the new ones. Why do you use 'a -> 'b rather than, say, 'a * 'b?

1 Like

'a * 'b also works, and is functionally equivalent :slight_smile:

Using -> has the minor convenience of not generating lots of nested parentheses in printed types, since -> is already right-associative.

# type _ t =
  | [] : unit t
  | ( :: ) : 'a * 'b t -> ('a * 'b) t ;;

# [ 1; 2; 3; 4; 5 ] ;;
- : (int * (int * (int * (int * (int * unit))))) t
2 Likes

@CraigFe, do you disagree that redefining [] and (::) should issue a warning, because of that use case? Or would it be reasonable to issue a warning to help people like me, and let someone who really wants to use them for heterogeneous lists ignore or silence the warning?

How about redefining (), true, and false – are there any use cases for those?

The reason for the warning about () is because it is a common mistake and not very useful to redefine it. On the other hand, redefining other predefined constructors is less common and also more obviously useful. So it would seem OK not to warn about redefining them.

Cheers,
Nicolas

Is redefining true and false useful?

1 Like

IMHO, it’s more a footgun than useful.

utop # type bool = true | false;;
utop # let x = true;;
utop # if x then 1 else 2;;
Error: This expression has type bool/1 but an expression was expected of type
         bool/2
       because it is in the condition of an if-statement
       File "_none_", line 1:
         Definition of type bool/2
       Hint: The type bool has been defined multiple times in this toplevel
         session. Some toplevel values still refer to old versions of this
         type. Did you try to redefine them?

How about a warning if [] is redefined without simultaneously redefining (::)? That seems more likely to indicate someone, like me, not knowing what they’re doing. I expect most if not all real use cases would involve redefining both together?

1 Like

I was curious about how prevalent redefining (::) is in the opam repository, and it seems to happen quite a bit:

List of packages
$ opam grep "( *:: *)"
  • accessor
  • amqp-client
  • amqp-client-async
  • amqp-client-lwt
  • archi
  • archi-async
  • archi-lwt
  • b0
  • batsat
  • batteries
  • bitwuzla
  • bitwuzla-bin
  • bitwuzla-c
  • blake3
  • brr
  • bsbnative
  • camlon
  • camlp5
  • coccinelle
  • codept
  • colombe
  • conformist
  • containers
  • containers-data
  • containers-thread
  • coq
  • coq-of-ocaml
  • coqide
  • crowbar
  • doc-ock
  • dot-merlin-reader
  • earley
  • elpi
  • embedded_ocaml_templates
  • extlib
  • farfadet
  • GT
  • git-cohttp-mirage
  • gopcaml-mode-merlin
  • graphql
  • graphql-async
  • graphql-cohttp
  • graphql-lwt
  • graphql_parser
  • immutable
  • iocaml
  • jsonrpc
  • ketrew
  • learn-ocaml
  • learn-ocaml-client
  • lem
  • linksem
  • liquidsoap

    (At this point, opam grep ran out of disk space on my machine… :grin:.)

My suspicion is that adding a naive warning for redefinition of (::) would overall cause more inconvenience than it would save, although it might make sense to warn on redefinition of (::) but not [] in the same type. (This still warns against some valid use-cases, but I would guess far fewer of these occur in the wild.)

I agree that the construct type t = [] in particular is very surprising with regard to polymorphic variants. I would propose adding a subsection to the OCaml manual on polymorphic variants explaining:

  • the (lack of a) constructor-less polymorphic variant type, and the reasoning for this;
  • the gotcha mentioned here w.r.t. [];
  • a pointer to the section on empty variant types.

Ultimately, I think just making type foo = [] invalid wouldn’t be enough to fix the gotcha: there needs to be a longer-form explanation somewhere (although this could be linked from a hint, as is done for some of the other warnings).

Relatedly, there was some discussion back in 2013 about banning redefinition of (::) altogether, but this doesn’t seem to have led anywhere.

1 Like

Those are not all redefinitions, for instance the (::) in batsat are in rust code.

Speaking of footguns, it appears that if you redefine () to take an argument, it completely fubars the parser or something:

# type foo = () of int;;
Warning 65 [redefining-unit]: This type declaration is defining a new '()' constructor
which shadows the existing one.
Hint: Did you mean 'type foo = unit'?
type foo = () of int

# 2 * (1 + 1);;
Error: The constructor () expects 1 argument(s),
       but is applied here to 0 argument(s)

# let f x = x;;
Error: The constructor () expects 1 argument(s),
       but is applied here to 0 argument(s)
3 Likes

Nice find! I wonder how long this has been lurking in the shadows :grin:

Redefining () as

type t = () of int

does not mess up with the parser. (and you can even construct value of this type with

let x= () 0

The responsible is utop which is rewritting

1 + 2;;

as

let () = 1 + 2

which does not work when () is not the expected unit.

2 Likes

I see that it works fine in a source file rather than at toplevel. But let () = 1 + 2 doesn’t typecheck even ordinarily – 1 + 2 is not of type unit – so that can’t be exactly what utop is doing, can it?

Because that’s the wrong syntax for pattern matching variants, essentially you’re doing, notice that all let is a pattern matching.

type t = Variant of int
let Variant = 1 + 2