How to represent tuples in AST?

Let’s say newcomers are writing miniML interpreter with tuples. How should these tuples be represented?

  • | PExp_tuple of expression list is done in the compiler, but this representation allows empty and singleton tuples. i.e. the single value could have multiple representations which doesn’t look algebraic, and will lead to extra code to handle invariants.
  • | PExp_tuple of expression * expression * expression list may look cumbersome to use (without active patterns) but is solid enough.

Which way should be recommended to beginners? I kind of tend to 2nd one, but it is difficult to parry the reference to OCaml compiler. Maybe there is a 3rd way without diving to dependent types?

1 Like

I think the first one. Even though, as you indicate, some cases cannot occur because of syntactic restrictions in the language, there’s nothing unsound about the unit and singleton cases. In other words, you will not need any special code to handle or disqualify those cases, the same code that works for n-tuples for n >= 2 will also work for 0- and 1-tuples, it is just that those cases will never be exercised.



One thing that I have seen in many introductory texts about ML-like languages is that n-tuples with n > 2 are just syntactic sugar for nested pairs. So the AST would just have a constructor

| PExp_pair of expression * expression 

A tuple (a, b, c) would then be parsed as either ((a,b),c) or more likely (a, (b, c)). With this you cannot have empty or unary tuples, and the evaluation routine also gets simplified. It also allows you to define the fst and snd functions like in OCaml, and they would be well defined.

No, type:

# # (1,(2,3));;
- : int * (int * int) = (1, (2, 3))
# ((1,2),3);;
- : (int * int) * int = ((1, 2), 3)
# (1,2,3);;
- : int * int * int = (1, 2, 3)
# fst (1,2,3)
Error: This expression has type 'a * 'b * 'c
       but an expression was expected of type 'd * 'e

The 3 first tuples have different types. And the last expression shows that a 'a * 'b * 'c is not a pair. It can be compared to a structure with 3 fields. (I guess the memory representation is similar).

Yes, such a representation will permit 0-tuples or 1-tuples. But the idea is to have a grammar which doesn’t permit it. Something like this tuples := expr ( ',' expr )+. (Adapt it to your prefered grammar compiler). Then, in OCaml, you won’t be able to have such tuples. (42) can’t be interpreted as a tuple.

Yes, in OCaml those three have a different type. But if you were to write your own miniML interpreter (as I understood was the intention here) you may choose to only have pairs and parse tuples as nested pairs. Then fst and snd are always well defined for any tuple.

So basically , would just become an infix operator for constructing pairs, and you would need to choose if it’s right or left associative.

However, you do pay in terms of expressivity, as the expressions (1, 2, 3) and (1, (2, 3)) (assuming , to be right associative) would both end up having the type int * (int * int), and sometimes you might want to make them distinct.

But it should not be the way to explain functional programming to newcomers! They can already write the code with many corner cases and implict invariants in Java. We should be able to do better.

I guess that newcomers have some math background. When in math, we have a dimension 3 vector in R^3, we mean RxRxR, not Rx(RxR), then I do not see the pedagogic interest of decomposing a 'a * 'a * 'a as a 'a * ('a * 'a). We have such a decomposition about lists because they are implemented like this and this permits a recursive processing with them, but it may not be so intuitive. I mean that when I read [1; 2; 3], I see a 3 items unit, not fundamentally different from [|1; 2; 3|]. The main difference is that it is easier to remove the first item in a recursion. (Operators complexity are also different, obviously).

Perhaps newcomers come from Python, and here, (1,2,3) is not (1,(2,3)). The only difference between Python and OCaml tuples is that Python can express 1-tuple with a special notation: (1,), and the 0-tuple, (), is considered as a tuple where OCaml name it a unit.

I have already made a langage M interpreter. Having a lists in the AST (like a set of arguments or a set of record fields) doesn’t make any problem.

Sure, but type theory is not math. And if you think HoTT is math then it’s the other way around: all tuples are built from pairs.

P.S. “functional programming” and “type theory” are not synonymous.

P.P.S Its been a very long time, but in my experience mathematical pedogogy pays no attention at all to type theory.

That makes it sound like 2 different names for the same thing. But they are totally different animals, not in any way comparable. IOW OCaml ‘unit’ is not in any sense related to () in Python.

And so, what it is if it’s not math?

Not really. Flatten tuples or the one built from pairs are isomorphic, and that’s all we need to know if we want to reason about their denotation. But, from an operational point of view (space and time consumption), one representation could be better than the other.

They are related in a way: 0 length tuples and unit only admit a single possible value.

But sure, the Python () can be used with all functions which deal with tuple (len…) and is surely stored as an 0-array with a tuple tag. In OCaml, we don’t have generic tuple function (no len…), then we can’t compare further. OCaml () is also implemented differently (unboxed value).

And Python doesn’t consider different types for different length of tuples.

Yes, really! If you don’t think all n-tuples are built inductively from pairs, then you must be committed to infinitely many tuple constructors. Build me a 10-billion tuple without that.