Sum type constructor declaration subtlety (and documentation wanted!)

perry · April 23, 2018, 3:45pm

Howdy! On the discord channel this morning, someone brought up the fact that in

type foo = 
| A of int*int
| B of (int*int)

A and B are quite different. A is a constructor taking two arguments, even though it looks like it takes a tuple, while B actually takes a tuple.

This means that the two cannot be used interchangeably and in fact have different enough meanings to matter in a lot of contexts.

(This is confusing to a beginner for two reasons: the parentheses look like meaningless precedence grouping, but they are syntactically quite significant; and the * in the A case looks like it means “tuple”, but it does not.)

I didn’t know about this distinction myself. I haven’t been using OCaml that long, but it has been long enough that I was surprised I never understood this difference before. It is not well explained in the documentation (or perhaps I simply missed it being pointed out), and it would probably be good to be much more explicit in the official manual, and for books like RWO to mention this prominently for beginners.

Yaron_Minsky · April 23, 2018, 4:50pm

Yeah, this deserves a mention in RWO. I’ll put it on the list…

perry · April 23, 2018, 5:17pm

One thing @bluddy noted during the discussion which was valuable for having it click in my head: if constructors had function types a la Haskell, then the obvious distinction is:

A: int -> int -> foo
while
B: (int * int) -> foo
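OCaml constructors are not functions and have to be fully applied, so the signatures above can only be approximated by wrapping each constructor in an ordinary function. A minimal sketch of that idea (the names mk_a and mk_b are made up for illustration and are not part of any library):

```ocaml
type foo =
  | A of int * int
  | B of (int * int)

(* Hypothetical wrappers: they give each constructor the function type
   it would have if constructors were curried functions. *)
let mk_a : int -> int -> foo = fun x y -> A (x, y)
let mk_b : int * int -> foo = fun p -> B p

let pair = (1, 2)
let from_pair = mk_b pair   (* a ready-made tuple fits B directly *)
let from_parts = mk_a 1 2   (* A wants its two arguments separately *)
```

Writing mk_a as fun p -> A p would be rejected, which is the distinction the two signatures express.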

perry · April 23, 2018, 9:09pm

I filed an issue on this btw: https://caml.inria.fr/mantis/view.php?id=7783

n4323 · April 24, 2018, 9:20am

See also here: Constructors with several arguments.
The ‘revised syntax’ tried to fix that. I wonder what Reason does here?

octachron · April 24, 2018, 11:40am

This sounds like a good idea. I think it should be possible to add a small paragraph at the end of the variants and records section of the manual.

perry · April 24, 2018, 1:04pm

That would seem like the right location. I’d propose something like the following language:

Note that although the syntax is almost the same, there is a significant difference between a variant constructor that takes multiple arguments and one that takes a single tuple as an argument. Consider the following type:
type foo = 
| A of int*int
| B of (int*int)
Although the syntax for A and B seems nearly the same, the parentheses are quite significant. A is a constructor that takes two arguments, both of which are integer typed values, while B is a constructor that takes a single argument, a tuple of two integers. These behave quite differently in many circumstances, such as when pattern matching. Be careful not to confuse them.

octachron · April 24, 2018, 1:40pm

This is a good start, but I think your text is context-heavy for a beginner; the “many circumstances” would be really unclear for this audience. An alternative path might be

Beware that the syntax for a constructor with multiple arguments
 type t = A of float * float
looks a lot like a tuple but this is not the case: all arguments are kept separate inside the constructor block, and it is not possible to extract them as a tuple without constructing a new tuple:
let to_tuple (A (x,y)) = (x,y)
let wrong_to_tuple (A x) = x (* rejected: A expects 2 arguments *)
If you want to pack all arguments together, it is possible to use a tuple as a constructor argument at the price of a bulkier memory representation:
type s = B of (float * float)
let to_tuple_bis (B x) = x

perry · April 24, 2018, 2:21pm

I find the beginning of what you wrote there very clear, i.e.:

Beware that the syntax for a constructor with multiple arguments
type t = A of float * float
looks a lot like a tuple but this is not the case:

but I think as a beginner I would have found the rest quite confusing. It discusses things like memory representation and tradeoffs on that which a beginner will not be prepared to think about, and brings in a couple of different let declaration styles as well.

How about:

Beware! The syntax for a constructor with multiple arguments:
type t = A of float * float
looks confusingly like it takes a tuple as its argument. However, A is a constructor that takes two float arguments. It is different from:
type t = B of (float * float)
in which B is a constructor that takes one argument, a tuple of two floats.

The two will behave differently under pattern matching, as A has two arguments, and B has only one.

perry · April 24, 2018, 2:28pm

(It might be reasonable to then show a difference in pattern matching between A and B.)

octachron · April 24, 2018, 2:33pm

Yes, the part about memory was too technical.
I agree that an example would nicely complement your text:

let a_to_tuple a = match a with
  | A (x, y) -> (x, y)
let b_to_tuple b = match b with
  | B t -> t

Also, I would rename the second type to s, to avoid using type-directed disambiguation (which is now described in a subsection at the end of this section) at this stage.

Finally, I think that in a tutorial, it is not fair to introduce a potential choice without either explaining the potential consequences of this choice or advising on a branch, so I would add:

When in doubt, it is generally more efficient to use the multiple-argument version.

Maybe with a reference to the C runtime chapter?
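For the curious, the difference in representation can be observed with the Obj module. This is only a rough sketch: it assumes the standard runtime representation and no [@@unboxed] annotation, Obj is unsafe and implementation-specific, and the helper name fields is made up for illustration.

```ocaml
type t = A of int * int
type s = B of (int * int)

(* Obj is used here only to peek at block sizes; it should not
   appear in ordinary code. *)
let fields v = Obj.size (Obj.repr v)

let a = A (1, 2)
let b = B (1, 2)

let a_fields = fields a   (* fields stored directly in A's block *)
let b_fields = fields b   (* fields stored directly in B's block *)

(* The single field of B is itself a block: the tuple. *)
let b_inner = Obj.size (Obj.field (Obj.repr b) 0)
```

Under those assumptions, A keeps both integers in one block, while B keeps one pointer to a separate two-field tuple block, which is the extra indirection behind the efficiency remark.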

bobbypriambodo · April 24, 2018, 4:00pm

Reason does the same (different syntax for each), but its documentation specifically mentions it.

Be careful not to confuse a constructor carrying 2 arguments with a constructor carrying a single tuple argument:
type account =
 | Facebook(string, int) /* 2 arguments */;
type account2 =
 | Instagram((string, int)) /* 1 argument - happens to be a 2-tuple */;

n4323 · April 24, 2018, 4:10pm

Ok, but to construct the values, does it also allow the same syntax in both cases like OCaml syntax does?

I.e. is let a = Facebook("", 0) and let a2 = Instagram("",0) valid?

I find that aspect the most confusing.

n4323 · April 24, 2018, 4:14pm

IMO the most confusing part is that

let b_to_tuple b = match b with B (x, y) -> (x, y)

is also valid.

octachron · April 24, 2018, 4:31pm

I see your point, but since B t is a valid pattern, it seems coherent that expanding t to (x,y) works.
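To make the asymmetry concrete, here is a small sketch of which patterns each constructor accepts. The two types get different names, as suggested earlier in the thread, and the function names are only illustrative.

```ocaml
type t = A of int * string
type s = B of (int * string)

(* B carries one tuple, so it can be bound whole or taken apart. *)
let b_whole (B p) = p
let b_parts (B (n, str)) = (n, str)

(* A carries two arguments, so both must appear in the pattern;
   the tuple on the right-hand side is newly built. *)
let a_parts (A (n, str)) = (n, str)

(* let a_whole (A p) = p
   is rejected by the type checker: A expects 2 arguments. *)
```

So B (x, y) is just B t with t expanded, whereas for A the parenthesized pair is the only form that binds both arguments.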

perry · April 24, 2018, 4:45pm

Yah. It might be important to note that, while also noting the difference in what it means. Because the syntax is confusing, it would be best to err on the side of explicitness and clarity here.

n4323 · April 24, 2018, 5:10pm

Yes. Maybe one could point out that A t does not work to make the difference clear? Not sure.

perry · April 24, 2018, 5:56pm

An example from the Mantis thread seems particularly appropriate.

type t = A of int * string
let x = (3, "haha")
let y = A x (* This fails, because A needs two parameters. *)

octachron · April 24, 2018, 8:59pm

That sounds like the right example to illustrate the difference.
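For completeness, a sketch of how one could still get from an existing tuple to A: take the tuple apart first and hand both pieces to the constructor. The helper name a_of_pair is hypothetical.

```ocaml
type t = A of int * string

let x = (3, "haha")

(* Destructure the tuple, then pass both components to A. *)
let y = let (n, s) = x in A (n, s)

(* Hypothetical helper doing the same for any such pair. *)
let a_of_pair (n, s) = A (n, s)
let y' = a_of_pair x
```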

n4323 · April 25, 2018, 8:42am

one could add

let y = A (3, "haha") (* This succeeds, because here (3, "haha") is parsed as two constructor arguments. *)

I think this is a wart in the language but fortunately not encountered so frequently.
