Sum type constructor declaration subtlety (and documentation wanted!)

Howdy! On the Discord channel this morning, someone brought up the fact that in

type foo = 
| A of int*int
| B of (int*int)

A and B are quite different. A is a constructor taking two arguments, even though it looks like it takes a tuple, while B actually takes a tuple.

This means that the two cannot be used interchangeably and in fact have different enough meanings to matter in a lot of contexts.

(This is confusing to a beginner because the parentheses look like meaningless precedence grouping, but they are syntactically quite significant. Likewise, the * in the A case looks like it means “tuple”, but it does not.)
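To make the distinction concrete, here is a minimal sketch using the foo type from above. The value names (a, b, p, b', sum) are only illustrative. Note that construction looks identical for both constructors; the difference shows up when you already have a pair in hand.

```ocaml
type foo =
  | A of int * int    (* two arguments *)
  | B of (int * int)  (* one argument: a pair *)

(* Building values looks the same for both constructors. *)
let a = A (1, 2)
let b = B (1, 2)

(* An existing pair can be handed to B directly... *)
let p = (1, 2)
let b' = B p
(* ...but `A p` is rejected by the type checker: A expects 2 arguments. *)

(* Both can be matched with a pair-shaped pattern. *)
let sum = function
  | A (x, y) -> x + y
  | B (x, y) -> x + y
```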

I didn’t know about this distinction myself. I haven’t been using OCaml that long, but it has been long enough that I was surprised I never understood this difference before. It is not well explained in the documentation (or perhaps I simply missed it being pointed out), and it would probably be good to be much more explicit in the official manual, and for books like RWO to mention this prominently for beginners.

Yeah, this deserves a mention in RWO. I’ll put it on the list…

One thing @bluddy noted during the discussion which was valuable for having it click in my head: if constructors had function types a la Haskell, then the obvious distinction is:

A: int -> int -> foo
while
B: (int * int) -> foo
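OCaml constructors are not functions and must be fully applied, so the types above are only an analogy. As a sketch, one can write wrapper functions that do have those types. The names make_a and make_b are hypothetical helpers, not anything from the standard library.

```ocaml
type foo =
  | A of int * int
  | B of (int * int)

(* Hypothetical wrappers whose types mirror the Haskell-style reading. *)
let make_a : int -> int -> foo = fun x y -> A (x, y)
let make_b : int * int -> foo = fun p -> B p

(* make_a can be partially applied; make_b consumes whole pairs. *)
let xs = List.map (make_a 1) [2; 3]
let ys = List.map make_b [(1, 2); (1, 3)]
```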

I filed an issue on this btw: https://caml.inria.fr/mantis/view.php?id=7783

See also here: Constructors with several arguments.
The ‘revised syntax’ tried to fix that. I wonder what Reason does here?

This sounds like a good idea. I think it should be possible to add a small paragraph at the end of the variants and records section of the manual.

That would seem like the right location. I’d propose something like the following language:

Note that although the syntax is almost the same, there is a significant difference between a variant constructor that takes multiple arguments and one that takes a single tuple as an argument. Consider the following type:

type foo = 
| A of int*int
| B of (int*int)

Although the syntax for A and B seems nearly the same, the parentheses are quite significant. A is a constructor that takes two arguments, both of which are integer typed values, while B is a constructor that takes a single argument, a tuple of two integers. These behave quite differently in many circumstances, such as when pattern matching. Be careful not to confuse them.

This is a good start, but I think your text is context-heavy for a beginner: the “many circumstances” would be really unclear for this audience. An alternative path might be

Beware that the syntax for a constructor with multiple arguments

 type t = A of float * float

looks a lot like a tuple but this is not the case: all arguments are kept separated inside the constructor block, and it is not possible to extract those as a tuple without constructing a new tuple

let to_tuple (A (x,y)) = (x,y)
let wrong_to_tuple (A x) = x (* rejected: A expects two arguments *)

If you want to pack all arguments together, it is possible to use a tuple as a constructor argument at the price of a bulkier memory representation:

type s = B of (float * float)
let to_tuple_bis (B x) = x

I find the beginning of what you wrote there very clear, i.e.:

Beware that the syntax for a constructor with multiple arguments

type t = A of float * float

looks a lot like a tuple but this is not the case:

but I think as a beginner I would have found the rest quite confusing. It discusses memory representation and its tradeoffs, which a beginner will not be prepared to think about, and it brings in a couple of different let declaration styles as well.

How about:

Beware! The syntax for a constructor with multiple arguments:

type t = A of float * float

looks confusingly like it takes a tuple as its argument. However, A is a constructor that takes two float arguments. It is different from:

type t = B of (float * float)

in which B is a constructor that takes one argument, a tuple of two floats.

The two will behave differently under pattern matching, as A has two arguments, and B has only one.

(It might be reasonable to then show a difference in pattern matching between A and B.)

Yes, the part about memory was too technical.
I agree that an example would complement your text nicely:

let a_to_tuple a = match a with
 | A (x,y) -> (x,y)
let b_to_tuple b = match b with
| B t -> t 

Also, I would rename the second type to s, to avoid using type-directed disambiguation (which is now described in a subsection at the end of this section) at this stage.

Finally, I think that in a tutorial, it is not fair to introduce a choice without either explaining its consequences or recommending one of the options, so I would add:

When in doubt, it is generally more efficient to use the multiple arguments version.

Maybe with a reference to the C runtime chapter?
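For the curious, the efficiency difference can be peeked at with the Obj module. This is only a sketch: Obj is an unsafe, implementation-level module that should not appear in normal code, and the names a_fields, b_fields and b_inner are made up for the illustration. With the standard runtime I would expect the A block to hold its two fields directly, while the B block holds a single field pointing to a separate tuple block.

```ocaml
type t = A of int * int
type s = B of (int * int)

(* Obj is unsafe and implementation-specific; used here only to inspect block sizes. *)

(* Number of fields stored directly in the A block. *)
let a_fields = Obj.size (Obj.repr (A (1, 2)))

(* Number of fields stored directly in the B block. *)
let b_fields = Obj.size (Obj.repr (B (1, 2)))

(* Number of fields in the tuple block that B's single field points to. *)
let b_inner = Obj.size (Obj.field (Obj.repr (B (1, 2))) 0)
```

So the tuple version costs one extra block and one extra indirection, which is what the “more efficient” advice refers to.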

Reason does the same (different syntax for each), but it is specifically mentioned in the documentation:

Be careful not to confuse a constructor carrying 2 arguments with a constructor carrying a single tuple argument:

type account =
 | Facebook(string, int) /* 2 arguments */;
type account2 =
 | Instagram((string, int)) /* 1 argument - happens to be a 2-tuple */;

Ok, but to construct the values, does it also allow the same syntax in both cases like OCaml syntax does?

I.e. are let a = Facebook("", 0) and let a2 = Instagram("", 0) both valid?

I find that aspect the most confusing.

IMO the most confusing part is that

let b_to_tuple b = match b with B (x, y) -> (x, y)

is also valid.

I see your point, but since B t is a valid pattern, it seems coherent that expanding t to (x,y) works.

Yeah. It might be important to mention that, while also noting the difference in what it means. Because the syntax is confusing, it would be best to err on the side of explicitness and clarity here.

Yes. Maybe one could point out that A t does not work to make the difference clear? Not sure.
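A small sketch of which patterns are accepted for each constructor, in case it helps settle the wording. The function names are only illustrative, and I use two separate types to stay clear of type-directed disambiguation.

```ocaml
type t = A of int * string
type s = B of (int * string)

(* B has one argument, so a variable pattern binds the whole pair. *)
let b_whole (B p) = p

(* The pair inside B can also be destructured in place. *)
let b_split (B (x, y)) = (x, y)

(* A has two arguments; returning a pair means building a new one. *)
let a_split (A (x, y)) = (x, y)

(* A lone wildcard is accepted for A and ignores both arguments. *)
let a_any = function A _ -> true

(* let a_whole (A p) = p   is rejected: A expects 2 arguments. *)
```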

An example from the Mantis thread seems particularly appropriate.

type t = A of int * string
let x = (3, "haha")
let y = A x (* This fails, because A needs two parameters. *)

That sounds like the right example to illustrate the difference.

One could add:

let y = A (3, "haha") (* This succeeds, because here (3, "haha") is parsed as two constructor arguments. *)
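If the pair already exists, the usual way out is to take it apart first and pass the components separately. A minimal sketch, where a_of_pair is a hypothetical helper name:

```ocaml
type t = A of int * string

let x = (3, "haha")

(* Destructure the pair, then supply A's two arguments. *)
let y = let (n, s) = x in A (n, s)

(* The same idea as a reusable (hypothetical) helper. *)
let a_of_pair (n, s) = A (n, s)
let y' = a_of_pair x
```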

I think this is a wart in the language but fortunately not encountered so frequently.