Syntax sugar for records behind constructors

Preface

I want to try my hand at implementing and submitting an RFC for what I perceive to be a small quality-of-life feature around inline records. I don’t have much experience with language implementation, so I’m posting here to gauge how difficult or disruptive this would actually be, and whether it has been attempted, suggested, or turned down before.

Consider the following:

type r = R of { mutable a : int }

let r = R { a = 42 }

The nice thing about inline records used like this is that they allow better disambiguation without runtime overhead. The not-so-nice thing is that you get a lot of syntactic overhead when dealing with values of those types.
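To make that overhead concrete, here is a small sketch in today's OCaml of reading and updating the single field of the r type above. The helper names get_a and set_a are made up for illustration.

```ocaml
type r = R of { mutable a : int }

let r = R { a = 42 }

(* Reading the field requires destructuring through the constructor. *)
let get_a (R { a }) = a

(* Mutating it requires binding the inline record to a name first;
   the bound name may be used for field access but cannot escape. *)
let set_a (R inner) v = inner.a <- v

let () =
  set_a r 43;
  assert (get_a r = 43)
```

With a plain record, both operations would simply be r.a and r.a <- 43.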

Let’s explore the potential for improvement here:

TL;DR

This post juggles a few ideas that would make working with inline records and tuples nicer. The ideas are only tangentially related and don’t interfere with one another, but together they would be greater than the sum of their parts. I’m trying to get the community’s and the maintainers’ opinions on them. One of them might crystallize into a full-blown RFC if the reception appears to be good. The ideas are, in short:

  1. being able to pattern-match and/or construct (inline-)records positionally, without naming the fields
  2. making tuples regular records with numbered fields
  3. bringing the same syntactic forms we have for records to inline records, bypassing the constructors

I’m personally most interested in no. 3.

Pattern matching and Construction

Records add names to constructor fields, thus making them order-independent, but that doesn’t change the fact that the declaration has syntactic ordering. We can use that fact to our advantage:

type r = R of { a : int; b : int; c : int }

let r = R(1, 2, 3) (* R { a = 1; b = 2; c = 3 } *)
let f = function
  | R(x, y, z) -> x + y + z

Now, the function f might not be convincing at first, since we already have the punned form R{a; b; c}. But notice that here I’m not forced to use the field names, and x, y, z are full-fledged patterns. This will prove useful later.
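Spelled out in today's syntax, the positional forms above would presumably desugar roughly as follows. This is only a sketch of the intended meaning, not an existing feature, and g is a made-up example showing that each position can hold an arbitrary pattern.

```ocaml
type r = R of { a : int; b : int; c : int }

(* Proposed: let r = R (1, 2, 3) *)
let r = R { a = 1; b = 2; c = 3 }

(* Proposed: | R (x, y, z) -> x + y + z *)
let f = function
  | R { a = x; b = y; c = z } -> x + y + z

(* Literals and wildcards work in any position.
   Proposed: | R (0, _, z) -> z
             | R (x, _, _) -> x *)
let g = function
  | R { a = 0; c = z; _ } -> z
  | R { a = x; _ } -> x
```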

This could also be generalized to regular records, but doing so means we need to think about how to name tuple fields, perhaps tup._1 and so on. I think this already exists as #1 tup in SML, which predates OCaml, so I’m curious why OCaml tuples were designed to be distinct.

Deep nested access and modification

type r = {a : s}
 and s = {b : t}
 and t = {mutable c : int}

let access r = r.a.b.c
let modify r = r.a.b.c <- 42

These forms are only possible for normal records. Add constructors and everything becomes much more verbose:

type r = R of {a : s}
 and s = S of {b : t}
 and t = T of {mutable c : int}

let access (R { a = S { b = T {c} } }) = c
let modify (R { a = S { b = T t } }) = t.c <- 42

Having positional field patterns, as mentioned above, alleviates this to a level close to that of the dot syntax:

let access (R S T {c}) = c

But I’m convinced there is real value in having anonymous access at hand, where you don’t have to pattern-match and create bindings. r.a.b.c + r.x is certainly easier and more straightforward than creating two bindings through potentially long constructor names.
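Without language support, the closest approximation today is a set of hand-written projector functions, one per field, that hide the constructors. A minimal sketch, where every helper name (r_a, s_b, t_c, set_t_c) is hypothetical:

```ocaml
type r = R of { a : s }
 and s = S of { b : t }
 and t = T of { mutable c : int }

(* One projector per field, each hiding its constructor. *)
let r_a (R { a }) = a
let s_b (S { b }) = b
let t_c (T { c }) = c
let set_t_c (T inner) v = inner.c <- v

(* Stand-ins for r.a.b.c and r.a.b.c <- v *)
let access r = r |> r_a |> s_b |> t_c
let modify r v = set_t_c (r |> r_a |> s_b) v
```

This avoids bindings at the use site, but the boilerplate grows with every field, which is the part the proposed dot syntax would remove.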

The issue of refutability and ambiguity

The ambiguity is the same one we already deal with in the language today. The difference is that the programmer can now comfortably disambiguate when needed, assuming they always declare records in the inline style. Refutability, however, is a genuinely new issue: record patterns used to be irrefutable, and with constructors in front of them they no longer are.

We don’t have untagged union types (thankfully imo). One cannot write the following declaration:

type u = { a : int } | { a : string }

But one can write it tagged, like this:

type u = I of { a : int } | S of { a : string }

This means r.a can have an ambiguous type. And even if the language implicitly picks one type, you still end up with a partial match after desugaring.

My answer to this is last-declaration resolution, similar to what already happens with:

type u = { a : int }
type v = { a : int }

let x = { a = 42 } (* val x : v *)

So in this case r.a would desugar to match r with S { a } -> a, and you get the usual warning no. 8. I can’t think of a way around this, and there probably shouldn’t be one.
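Concretely, under last-declaration resolution, r.a on the tagged type u would presumably expand to something like get_a below. The function name is made up, and the warning is silenced with an attribute only so that the sketch compiles under strict warning settings.

```ocaml
type u = I of { a : int } | S of { a : string }

(* Hypothetical desugaring of [r.a]: the last constructor declaring [a] wins.
   The match is partial, so warning 8 would normally be reported here. *)
let[@warning "-8"] get_a (r : u) : string =
  match r with
  | S { a } -> a

let () =
  assert (get_a (S { a = "ok" }) = "ok");
  (* Applying it to the other constructor fails at run time. *)
  match get_a (I { a = 1 }) with
  | _ -> assert false
  | exception Match_failure _ -> ()
```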

There shouldn’t be ambiguity around positional pattern-matching. For positional construction specifically (i.e. R(1, 2, 3) => R{ a = 1; b = 2; c = 3 }), however, regular untagged records would face some difficulties and wouldn’t benefit much from the feature. Single-field records would have no way of being constructed positionally, and two records that share the same shape could not be told apart.

type a = { b : int; c : float }
type x = { y : int; z : float }
type h = { j : int }

let r = (1, 2.3) (* : a ? x *)
let s = (4) (* : ?? h ??? *)

My answer to this is to always treat positionally constructed records as tuples, and only override that when there’s no ambiguity, i.e. let r : x = 1, 2.3 works fine. But we could be even stricter and say this feature is only available for inline records. Maximal symmetry isn’t that noble a goal.
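The "tuple by default, record when annotated" rule can be approximated today with explicit conversion functions, which do by hand what the annotation would select implicitly. A sketch, with a_of_tuple and x_of_tuple as hypothetical names:

```ocaml
type a = { b : int; c : float }
type x = { y : int; z : float }

(* Without an annotation, a positional literal stays a plain tuple. *)
let r = (1, 2.3)

(* Proposed: let ra : a = 1, 2.3
             let rx : x = 1, 2.3 *)
let a_of_tuple (b, c) : a = { b; c }
let x_of_tuple (y, z) : x = { y; z }

let ra = a_of_tuple r
let rx = x_of_tuple r
```

The same tuple feeds both records, which is exactly why an unannotated (1, 2.3) cannot be resolved to either a or x.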

RFC(-ish)

As you can see, my ideas are currently completely on paper, and mostly stolen from other programming languages. I’ll probably also steal their homework when I actually get to implementation, but I want to hear your opinions on what’s presented so far. Is this a horribly bad idea for a reason I didn’t consider? Is it horribly complex to actually implement? Would it be welcomed by both the maintainers and the community?
I don’t want to start working on what turns out to be a fool’s errand (at least at my current experience level: little background in theory, a couple of ppxes, and only a passing look at the compiler’s internals).

That’s all, thank you for reading!


Slightly related: if you haven’t already, you might also want to look at F*, which tackles this problem with auto-generated projectors: Inductive types and pattern matching — Proof-Oriented Programming in F* documentation

Some quick personal opinion on the propositions:

1. is a no: fields are position-independent by design. One should not be able to refer to fields positionally.
2. is a breaking change: there is no tuple type family currently. It would be better to implement a heterogeneous array type, which can be done in user-land (with a bit of Obj.magic) without compiler support.
3. is an even more complex form of type-directed disambiguation. I am unconvinced that the change is worth the additional implicitness.
