Howdy! On the Discord channel this morning, someone pointed out that in

```ocaml
type foo =
  | A of int * int
  | B of (int * int)
```

A and B are quite different. A is a constructor taking two arguments, even though it looks like it takes a tuple, while B really does take a single tuple.
This means that the two cannot be used interchangeably and in fact have different enough meanings to matter in a lot of contexts.
(This is confusing to a beginner because the parentheses look like meaningless precedence grouping, but they aren’t: they are syntactically significant. Likewise, the * in the A case looks like it means “tuple”, but it does not.)
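A minimal sketch of what this means in practice, reusing the `foo` type from above: a tuple that already exists as a value can be handed to `B` directly, but `A` needs its two integers supplied separately. The names `p`, `b` and `a` are only for illustration.

```ocaml
type foo =
  | A of int * int
  | B of (int * int)

(* An existing tuple value *)
let p = (1, 2)

(* B takes a single tuple argument, so p can be passed as-is *)
let b = B p

(* A takes two separate arguments: writing [A p] is rejected by the
   type checker, so the tuple has to be taken apart first *)
let a = let (x, y) = p in A (x, y)
```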
I didn’t know about this distinction myself. I haven’t been using OCaml that long, but it has been long enough that I was surprised never to have understood it before. It is not well explained in the documentation (or perhaps I simply missed it), and it would be good for the official manual to be much more explicit about it, and for books like RWO to point it out prominently for beginners.
One thing @bluddy noted during the discussion, which really made it click for me: if constructors had function types à la Haskell, the distinction would be obvious. `A` would have type `int -> int -> foo`, while `B` would have type `int * int -> foo`.
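OCaml constructors are not first-class functions, but a small sketch can make the analogy concrete. The hypothetical helpers `make_a` and `make_b` below wrap each constructor in an ordinary function, and their types spell out the curried versus tupled distinction.

```ocaml
type foo =
  | A of int * int
  | B of (int * int)

(* Curried, like a two-argument constructor: one integer at a time *)
let make_a : int -> int -> foo = fun x y -> A (x, y)

(* Tupled, like a one-argument constructor taking a pair *)
let make_b : int * int -> foo = fun p -> B p

(* Partial application only makes sense for the curried wrapper *)
let make_a_1 = make_a 1
```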
That would seem like the right location. I’d propose something like the following language:
Note that although the syntax is almost the same, there is a significant difference between a variant constructor that takes multiple arguments and one that takes a single tuple as an argument. Consider the following type:
```ocaml
type foo =
  | A of int * int
  | B of (int * int)
```
Although the syntax for A and B seems nearly the same, the parentheses are quite significant. A is a constructor that takes two arguments, both of which are integer typed values, while B is a constructor that takes a single argument, a tuple of two integers. These behave quite differently in many circumstances, such as when pattern matching. Be careful not to confuse them.
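As a small illustration of the pattern matching difference (a sketch, with `sum` as a made-up function name): the pattern for `A` has to name its two arguments separately, while a single variable after `B` binds the whole tuple.

```ocaml
type foo =
  | A of int * int
  | B of (int * int)

(* A has two arguments, so the pattern binds them one by one;
   B has one argument, so [p] is bound to the entire tuple.
   Writing [A p] here would be a type error. *)
let sum = function
  | A (x, y) -> x + y
  | B p -> fst p + snd p
```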
This is a good start, but I think your text relies too much on context for a beginner; the “many circumstances” would be really unclear to this audience. An alternative path might be:
Beware that the syntax for a constructor with multiple arguments

```ocaml
type t = A of float * float
```

looks a lot like a tuple, but this is not the case: all arguments are kept separate inside the constructor block, and it is not possible to extract them as a tuple without constructing a new tuple:
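```ocaml
(* Correct: bind both arguments, then build a fresh tuple *)
let to_tuple (A (x, y)) = (x, y)

(* Wrong: rejected by the compiler, since A expects 2 arguments, not 1 *)
let wrong_to_tuple (A x) = x
```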
If you want to pack all arguments together, it is possible to use a tuple as a constructor argument at the price of a bulkier memory representation:
```ocaml
type s = B of (float * float)

(* The whole tuple is a single argument and can be returned directly *)
let to_tuple_bis (B x) = x
```
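To make the memory remark concrete, here is a sketch that peeks at the runtime representation through the standard library's `Obj` module, an unsafe, implementation-specific interface used here purely for illustration. `A`'s block holds its two arguments as two fields, while `B`'s block holds one field pointing to a separate tuple block. This layout is a detail of the current runtime, not a guarantee of the language.

```ocaml
type t = A of float * float
type s = B of (float * float)

(* Number of fields in the heap block of a value (illustration only) *)
let size v = Obj.size (Obj.repr v)

(* A: one block whose two fields are the two arguments *)
let a_fields = size (A (1.0, 2.0))

(* B: one block with a single field... *)
let b_fields = size (B (1.0, 2.0))

(* ...and that field is itself a separate two-field tuple block *)
let b_tuple_fields = Obj.size (Obj.field (Obj.repr (B (1.0, 2.0))) 0)
```

So `B` costs an extra block and an extra indirection compared with `A`.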
I find the beginning of what you wrote there very clear, i.e.:
Beware that the syntax for a constructor with multiple arguments
```ocaml
type t = A of float * float
```
looks a lot like a tuple but this is not the case:
but I think that as a beginner I would have found the rest quite confusing. It discusses things like memory representation and the tradeoffs involved, which a beginner is not yet prepared to think about, and it also brings in a couple of different let declaration styles.
How about:
Beware! The syntax for a constructor with multiple arguments:
```ocaml
type t = A of float * float
```
looks confusingly like it takes a tuple as its argument. However, A is a constructor that takes two float arguments. It is different from:
```ocaml
type t = B of (float * float)
```
in which B is a constructor that takes one argument, a tuple of two floats.
The two will behave differently under pattern matching, as A has two arguments, and B has only one.
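One subtlety worth a sketch: a tuple-shaped pattern such as `(x, y)` is accepted after both constructors, which can hide the difference. The asymmetry only shows up when binding the argument as a single value. The second type is named `s` here, following the renaming suggested later in the thread; the function names are made up.

```ocaml
type t = A of float * float
type s = B of (float * float)

(* A tuple-shaped pattern is accepted for both constructors... *)
let sum_a (A (x, y)) = x +. y
let sum_b (B (x, y)) = x +. y

(* ...but only B can bind its argument as one value *)
let swap_b (B p) = B (snd p, fst p)
(* let swap_a (A p) = ...   rejected: A expects 2 arguments *)
```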
Yes, the part about memory was too technical.
I agree that an example would nicely complement your text:
```ocaml
let a_to_tuple a = match a with
  | A (x, y) -> (x, y)

let b_to_tuple b = match b with
  | B t -> t
```
Also, I would rename the second type to s, to avoid using type-directed disambiguation (which is now described in a subsection at the end of this section) at this stage.
Finally, I think that in a tutorial it is not fair to introduce a choice without either explaining its consequences or recommending one option, so I would add:
When in doubt, it is generally more efficient to use the multiple-argument version.
Yah. It might be important to note that, but also to point out the difference in what the two forms mean. Because the syntax is confusing, it would be best to err on the side of explicitness and clarity here.