Extensible Records in OCaml

gilbert · June 16, 2018, 2:43am

Hi, I’m relatively new to OCaml but have been digging in deep for the past 6 months (thanks to ReasonML). OCaml has turned out to be a wonderful language with a lot of great features, but one feature that is sorely missing for me is extensible records.

In this post I argue that implementing extensible records is a worthwhile effort, covering:

Why extensible records?
The ideal solution
The interim solution

Why extensible records?

Extensible records shine most when writing and working with frameworks. Take a web server framework for instance. It’s common for frameworks to support composable, reusable middleware that you can easily plug into your app. See how the following authenticate_user middleware function is used:

let me_route =
get("/me")
>>> authenticate_user
>>> fun req ->
  User.json req.ctx.user |> Response.send(200)

In our pretend web framework, a route has a context. This context happens to be an extensible record, allowing middleware to extend it with arbitrary fields. In this case, the authenticate_user function extends req.ctx with a new user field.

This is a powerful pattern, evidenced when you start working with multiple middleware functions:

let update_team_route =
patch("/teams")

(* Adds req.ctx.team *)
>>> param(id => { team: Team.get(id) })

(* Adds req.ctx.user *)
>>> auth_user

(* Checks req.ctx.user has access to req.ctx.team *)
>>> auth_team_admin

(* Adds req.ctx.body *)
>>> json_body
>>> fun req -> ...

Extensible records take some of the best aspects from both the dynamic and static programming worlds. Even though our functions add arbitrary fields, OCaml will statically disallow you from composing middleware that tries to read fields that don’t exist.

The Ideal Solution

In an ideal world, OCaml would support extensible records directly. Because current “normal” records are optimized for fast constant-time field access, extensible records would have to be a separate feature.

I have two ideas on how to approach this, but I don’t want to take too much focus away from the interim solution.


Idea 1: New Datatype

As a new, built-in datatype, extensible records would only need (I think) minimal syntax to introduce into the language. For starters, an extensible record can be marked by a backtick (giving a nod to polymorphic variants), whereas updating a record can be marked using a pipe character, a with keyword, or nothing at all:

let x_rec = `{ x = 10 }

let xyz_rec = `{ x_rec | y = "hmm"; z = "nice" }
(* or *)
let xyz_rec = `{ x_rec with y = "hmm"; z = "nice" }
(* or *)
let xyz_rec = `{ x_rec; y = "hmm"; z = "nice" }

Field access will probably require more bikeshedding, but here’s one way to do it:

let x_rec = `{ x = 10 }
let n = x_rec`.x

Whichever operator is chosen, type inference will of course be as streamlined as in the rest of the language:

let f record = record`.x + record`.y

Here f is inferred as having the type `{ x: int; y: int; 'r } -> int

Idea 2: Extensible Objects

When it comes to typing, OCaml’s object system is already very close to extensible records. The only missing feature is that OCaml cannot extend an arbitrary object with a new method. This might look something like:

let obj2 = object extend obj1 with method y = 20 end

A major downside is the mismatch that objects have internal state while extensible records do not. This isn’t a blocker to the feature, but it might be a blocker to optimization. A way to mark an object as “pure” might solve this problem.
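To illustrate how close the object system already is on the typing side, here is a small sketch in today's OCaml. The names f, point2 and point3 are made up for illustration. Reading fields through an open object type already works; only adding a field to an arbitrary object is missing.

```ocaml
(* Inferred type: < x : int; y : int; .. > -> int
   The ".." is the open row: any object with at least x and y is accepted. *)
let f record = record#x + record#y

let point2 = object method x = 1 method y = 2 end
let point3 = object method x = 1 method y = 2 method z = "extra" end

let sum2 = f point2
let sum3 = f point3

(* Rejected at compile time, because the object has no method y:
   let _ = f (object method x = 1 end) *)
```

This is the object counterpart of the `{ x: int; y: int; 'r } -> int type from Idea 1.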

The Interim Solution

My primary motivation for wanting extensible records is to use them with BuckleScript. This is why I present a small addition that could be implemented and used today, without blocking any potential future implementation of our extensible friends.

The following code is already valid OCaml:

type t = < m : int >
type u = < n : int; t; k : int >

However, the following is not:

# type 'a add_x = (< .. > as 'a) -> < 'a; x : int >;;
Error: The type < .. > is not an object type

If something like this could be supported in the type system, we could immediately start using it in BuckleScript external types compiling to JavaScript objects. In other words, this would be useful even without a native OCaml extensible records implementation.

Conclusion

Extensible records are a powerful feature that allows writing APIs that are both succinct and type safe. I really, really want extensible types for the web framework I’m writing in BuckleScript!

I’m new to the OCaml community, so I have to ask: Is this feasible? How do features get approved for implementation? I’m willing to contribute and/or help start a fund for this feature. Obviously I’m very excited about this.

(special thanks to @octachron for patiently answering my workaround questions)

paurkedal · June 16, 2018, 2:53pm

The issue is that the 'a parameter in your add_x should be a row variable, which isn’t supported by OCaml. I think it would be a useful extension to both object types and polymorphic variants; I’ve hit the issue myself a few times.
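To make the limitation concrete, here is a hedged sketch of what can be written today. The function add_x and the objects below are hypothetical. Because there is no row variable to carry the rest of the object through, every field to keep has to be copied by hand, and anything not copied disappears from the result type.

```ocaml
(* Inferred type: < m : 'a; .. > -> < m : 'a; x : int >
   The input row is open, but the output type is closed: only m and x survive. *)
let add_x o =
  object
    method m = o#m
    method x = 10
  end

let base = object method m = 1 method k = "dropped" end
let extended = add_x base

let total = extended#m + extended#x

(* extended#k would be a type error: k is not part of the result type. *)
```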

Another solution you may consider is hmap. This wouldn’t typecheck the middleware usage, only the keys, since the container itself has no type information about which keys are present.
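For readers unfamiliar with the approach, here is a minimal, self-contained sketch of a heterogeneous map with typed keys. It is not the hmap library's API; the module Hmap_sketch and all names in it are assumptions made for illustration. It shows the trade-off described above: the value type behind each key is checked statically, but whether a key is present is only known at runtime, hence the option result.

```ocaml
module Hmap_sketch : sig
  type 'a key
  type t
  val create_key : unit -> 'a key
  val empty : t
  val add : 'a key -> 'a -> t -> t
  val find : 'a key -> t -> 'a option
end = struct
  (* Each key gets its own private constructor of this extensible variant. *)
  type binding = ..

  type 'a key =
    { id : int
    ; inject : 'a -> binding
    ; project : binding -> 'a option
    }

  module Int_map = Map.Make (Int)

  type t = binding Int_map.t

  let next_id = ref 0

  let create_key (type a) () : a key =
    let module M = struct
      type binding += B of a
    end in
    incr next_id;
    { id = !next_id
    ; inject = (fun v -> M.B v)
    ; project = (function M.B v -> Some v | _ -> None)
    }

  let empty = Int_map.empty
  let add key value map = Int_map.add key.id (key.inject value) map

  let find key map =
    match Int_map.find_opt key.id map with
    | Some b -> key.project b
    | None -> None
end

let user_key : string Hmap_sketch.key = Hmap_sketch.create_key ()
let team_key : int Hmap_sketch.key = Hmap_sketch.create_key ()

(* Only user_key is added; the type of ctx does not record that. *)
let ctx = Hmap_sketch.(empty |> add user_key "gilbert")

let found_user = Hmap_sketch.find user_key ctx
let found_team = Hmap_sketch.find team_key ctx
```

A handler reading team_key from ctx compiles fine and only discovers the missing value at runtime, which is exactly the check extensible records would move to compile time.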

pveber · June 17, 2018, 6:30am

Have you had a look at ppx_poly_record? It seems close to achieving what you want.

gilbert · June 17, 2018, 6:42pm

@pveber Unfortunately you cannot extend ppx_poly_records:

The same restrictions of objects apply to poly records too. For example, you cannot newly add fields by !{ e with l = e' }: the expression forces e contain the field l

pveber · June 17, 2018, 11:30pm

Right, sorry. There was also an older prototype by Jacques Garrigue that could extend records too, although not completely satisfactorily. It’s called polymap on this page:

https://www.math.nagoya-u.ac.jp/~garrigue/code/ocaml.html

gilbert · June 18, 2018, 5:35am

Ah man, polymap is so close to what I want! It’s unfortunate that accessing non-present fields causes a runtime error.

It’s fascinating how the syntax they chose is almost identical to the one I proposed. We just might be on to something.

leostera · June 18, 2018, 6:21am

How would you know otherwise when two different middleware extend the record with the same fields? What about overriding a currently existing field with a different value? How can we assure subsequent middlewares are not implicitly relying on a particular middleware’s output? Can we statically guarantee correct-ordering?

I think we could build a datatype that represents how the request is being handled:

let req: Http.Request.t; /* let's assume this one */
let mid1: Mid.Cors.t(Http.Request.t) = Mid.Cors.run req
let mid2: Mid.ParseBody.t(Mid.Cors.t(Http.Request.t)) = Mid.ParseBody.run mid1
let mid3: Mid.Auth.t(Mid.ParseBody.t(Mid.Cors.t(Http.Request.t))) = Mid.Auth.run mid2

This would mean your middleware consumption would happen as a recursive function that either builds whatever record you want with it, or ideally an implementation of a Foldable functor that can fold these down to a record.

You still get the advantages of a record by the end, but in the meantime every middleware you throw in the chain adds relevant type-information to it.
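One possible reading of this idea in plain OCaml is sketched below. All type, field and function names (cors, parsed_body, auth, run_cors, summarize, and so on) are hypothetical, and a hand-written function stands in for the Foldable functor; it is only meant to show how each step in the chain shows up in the type.

```ocaml
type request = { path : string }

(* Each middleware wraps whatever came before it in its own type. *)
type 'a cors = { origin : string; cors_inner : 'a }
type 'a parsed_body = { body : string; body_inner : 'a }
type 'a auth = { user : string; auth_inner : 'a }

let run_cors inner = { origin = "*"; cors_inner = inner }
let run_parse_body inner = { body = "{}"; body_inner = inner }
let run_auth inner = { user = "admin"; auth_inner = inner }

(* The order of the chain is visible in the type. *)
let handled : request cors parsed_body auth =
  { path = "/teams" } |> run_cors |> run_parse_body |> run_auth

(* Fold the nested value down to a flat record at the end of the chain. *)
type summary =
  { s_path : string
  ; s_origin : string
  ; s_body : string
  ; s_user : string
  }

let summarize (a : request cors parsed_body auth) =
  let b = a.auth_inner in
  let c = b.body_inner in
  { s_path = c.cors_inner.path
  ; s_origin = c.origin
  ; s_body = b.body
  ; s_user = a.user
  }

let flat = summarize handled
```

Note that summarize is tied to one particular chain shape, so a different ordering of middleware needs its own converter.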

Just my two cents on the matter!

PS: the idea of extending records has been explored academically before too, take a look at: https://www.microsoft.com/en-us/research/wp-content/uploads/1999/01/recpro.pdf and https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/scopedlabels.pdf; you may find something useful there!

gilbert · June 18, 2018, 4:22pm

How would you know otherwise when two different middleware extend the record with the same fields? What about overriding a currently existing field with a different value? How can we assure subsequent middlewares are not implicitly relying on a particular middleware’s output?

The answers to these questions are all stated explicitly in the extensible record types. It’s true that a middleware gets more freedom, but it also can’t do anything “in secret”, at least in the system I have in mind.

Can we statically guarantee correct-ordering?

What kind of ordering do you mean?

This would mean your middleware consumption would happen as a recursive function that either builds whatever record you want with it, or ideally an implementation of a Foldable functor that can fold these down to a record.

Can you expand on what this would look like? The solutions I could find required working with option types, or required writing a new type definition & converter function for each route.

PS: the idea of extending records has been explored academically before too

Yes! Scoped labels is what I have in mind, and one I’ve personally implemented before elsewhere.

bcc32 · June 18, 2018, 6:16pm

FOR AMUSEMENT PURPOSES ONLY

You can simulate some aspects of the requested behavior using labeled arguments and continuation-passing style.

module Http_method = struct
  type t =
    | PATCH
    (* and others... *)
end

module Request = struct
  type t =
    { path : string
    ; method_ : Http_method.t
    ; params : (string * string) list
    }
  ;;

  let path t = t.path

  let method_ t = t.method_

  let get_param t key = List.assoc key t.params
end

module Response : sig
  type t

  val of_string : string -> t
  val send : t -> unit
end = struct
  type t = string

  let of_string x = x

  let send = print_endline
end

let add_team k req =
  let get_team id =
    if id = "1"
    then ("the one and only team")
    else (failwith "not a team")
  in
  let team_id = Request.get_param req "teamid" in
  k req ~team:(get_team team_id)
;;

let auth_user k req =
  let username = Request.get_param req "username" in
  let password = Request.get_param req "password" in
  let authorized = username = "admin" && password = "letmein" in
  k req ~authorized
;;

type route = (Response.t -> unit) -> Request.t -> (unit -> unit) -> unit

let (>>) f g x = f (g x)

let make_response k _req ~authorized ~team =
  let resp =
    if authorized
    then ("your team name is \"" ^ team ^ "\"")
    else "not authorized..."
  in
  k (Response.of_string resp)
;;

let update_team_route : route =
  fun k req next_route ->
    if (Request.path req = "/teams" && Request.method_ req = PATCH)
    then (
      let stack = add_team >> auth_user >> make_response in
      stack k req)
    else next_route ()
;;

let all_routes k req =
  update_team_route k req (fun () ->
    failwith "out of routes")
;;

let main () =
  all_routes Response.send
    { path = "/teams"
    ; method_ = PATCH
    ; params = [ "teamid", "1"
               ; "username", "admin"
               ; "password", "letmein" ]
    }
;;

let () = main ()

Unfortunately, I haven’t been able to find a way to get the compiler to be less picky about the order of labels (reversing ~team and ~authorized in the arguments to make_response causes a type error). Also, you have to list all of the arguments you will receive, which is very clumsy.

Anyway, I thought this might give somebody a chuckle.

davesnx · December 17, 2021, 5:11pm

“Extensible records” can imply a little more than a new type for the type-checker, as @leostera points out.

I’m thinking of Elm here, where you can write a function that accepts an extensible record while naming only one field (or a few fields, but not all). This gives a form of polymorphism in which any record with that field is valid for the type-checker.

Without having any idea on how to do that in a toy language or even think about doing it in OCaml, I have the impression that it’s a massive change.

With that being said, it’s an incredible feature from Elm that I wish were possible in OCaml, I agree there!

yawaramin · December 17, 2021, 6:53pm

I’ve actually implemented the extensible records use case you describe in my framework, using plain objects: Ch05_Filters (re-web.ReWeb__Manual.Ch05_Filters)

Note, that’s in ReasonML syntax but the equivalent example in OCaml would be:

let validate_session next request =
  match request |> Request.cookies |> List.assoc_opt "SESSION" with
  | Some session ->
    let ctx = object
      method prev = Request.context request
      method session = session
    end
    in
    request |> Request.set_context ctx |> next
  | None ->
    `Unauthorized |> Response.of_status |> Lwt.return

Nothing fancy, just recursively keep stashing the previous version of the context in a new object method prev, in each new middleware that changes the context. So you might end up walking backwards through the chain of methods like (Request.context request)#prev#prev#xyz to grab the piece of context you want from some middleware farther back in the chain.

This is all typechecked of course, so middlewares naturally need to be in the correct order.
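The pattern can be tried without any framework. The following is a stripped-down sketch with hypothetical names (with_session, with_user, describe) and no ReWeb types, showing how the prev chain is typed and why the order matters.

```ocaml
(* Each "middleware" wraps the previous context in a new object. *)
let with_session session prev =
  object
    method prev = prev
    method session = session
  end

let with_user user prev =
  object
    method prev = prev
    method user = user
  end

(* Inferred type:
   < prev : < session : string; .. >; user : string; .. > -> string
   so the session must sit exactly one step back in the chain. *)
let describe ctx = ctx#user ^ " via session " ^ ctx#prev#session

let ctx = () |> with_session "abc123" |> with_user "admin"
let description = describe ctx

(* Swapping the two steps is rejected at compile time:
   let _ = describe (() |> with_user "admin" |> with_session "abc123") *)
```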

struktured · August 18, 2022, 8:26pm

This topic appears to have been dormant for the last couple of years.

Was there ever an RFC made for this feature? I went through them and did not see anything relevant.

Is anyone actively working on one? If so, I would like to learn more.
