Is there any ppx to reduce the boilerplate from translating from one record to another?

There are several situations where I have very similar record types. They are usually 90% the same, but some fields differ: a field needs to be mapped to a different value, needs to be omitted, or a new field needs to be added.

In OCaml code, this usually means that I need to manually write functions that copy all the other fields and add the little extra required information. I was wondering if there is some PPX that can help reduce this boilerplate?

This usually happens to me when crossing domains. For example, records extracted from the database are very similar to records that contain some extra computed fields, and I wish I didn’t have to copy all the other fields by hand.

Perhaps you could nest the DB record in the second record? Then there should be no duplication boilerplate, right?

Can you provide some self-contained examples?

[full disclosure: I think the camlp5-based PPX rewriter pa_ppx_migrate might do the trick; it isn’t going to be possible to use it, b/c it’s based on camlp5, but if it is otherwise suitable, you might find that implementing something like it did the trick. So trying the experiment (which I’d be happy to do) might give you data-points]

[perhaps this has already occurred to you; if so, please ignore]

This problem is related to the problem of migrating from one version of the OCaml AST, to the next (or prior). Fields appear, disappear, sometimes change shape. Ditto for constructors in algebraic datatypes. But most fields/constructors stay about the same. So it’s possible to write a tool that generates all the boilerplate for the stuff that stays the same, while allowing you to shove in the few bits for what changes.

I’ve written something to do that (pa_ppx_migrate) (but again, based on camlp5). I’m sure that such a tool could be really useful for many OCaml problems: I use pa_ppx_migrate all the time.

B/c in addition to “migrating” from one type to another, it trivially migrates from a type to itself. And since the “migrator” object allows one to override any type’s function with another that can use the original function as a fallback, one can customize the migrator in a very precise way to do other interesting things.

I considered record nesting, but it is not a good idea for our use-case for several reasons. The first one is that it breaks encapsulation. If you still have access to the original record with its original fields, you can get all kinds of bad ideas about taking a shortcut and picking some fields from there.
But it can also be confusing, especially in the scenarios where the feature I’m asking about makes most sense. Imagine you have a record with 50 fields, where 15 of them need to be transformed, and the rest may be kept the same. It will be very confusing to remember which ones need to be picked from the root of the record and which ones are supposed to be read from the originally nested records. Apply this transformation a couple of times and the thing may become a worse mess than hundreds of lines of boilerplate.

You could possibly use first-class modules, but that adds a challenge as well. In general, I don’t think there is a good solution to this. Certainly you could make your own. But I think any solution will only work at small scale. Imagine if you have a business object that needs to be serialized to the DB and also to the API, and these are subtly different types. So you need to represent three very similar but not quite equal types. I think most ways you slice that problem, it’s going to be awkward. But it could be an interesting PPX to try to write.

I run into this problem more than I’d like. Thinking about it a bit, ppx_fields_conv gets you halfway there.

Given:

type t = {
  dir : [ `Buy | `Sell ];
  quantity : int;
  price : float;
  mutable cancelled : bool;
} [@@deriving fields]

It will generate:

module Fields : sig
  (*...*)
  val create
    :  dir:[ `Buy | `Sell ]
    -> quantity  : int
    -> price     : float
    -> cancelled : bool
    -> t

That gives you a curryable record composer.

The other half, which doesn’t exist yet AFAIK, would be to have a generated curryable de-structurer:

   val destructure
     : t -> (   dir:[ `Buy | `Sell ]
             -> quantity:int
             -> price:float
             -> cancelled:bool
             -> 'a)
     -> 'a

Then you would be able to write an adapter that leans heavily on currying to avoid concerning yourself with the fields that didn’t change.

module Foo2d = struct
  type t = { x: float; y: float } [@@deriving fields]
end

module Foo3d = struct
  type t = { x: float; y: float; z: float } [@@deriving fields]
end

let upgrade_to_3d foo2d =
  Foo2d.Fields.destructure foo2d (Foo3d.Fields.create ~z:0.0)

let downgrade_to_2d foo3d =
  Foo3d.Fields.destructure foo3d (fun ~z:_ -> Foo2d.Fields.create)

Might be a fun exercise to write this destructure ppx.
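
Until such a ppx exists, the idea can be tried out with hand-written functions. The sketch below is only an assumption about what a generated destructure could look like: it defines plain create and destructure functions inside each module instead of relying on a Fields submodule. In this hand-written version the continuation has to accept its labels in declaration order, so the downgrade names ~x and ~y explicitly before ignoring ~z.

```ocaml
module Foo2d = struct
  type t = { x : float; y : float }

  (* Curryable composer, similar in spirit to Fields.create. *)
  let create ~x ~y = { x; y }

  (* Hypothetical de-structurer: feeds every field to a continuation. *)
  let destructure { x; y } k = k ~x ~y
end

module Foo3d = struct
  type t = { x : float; y : float; z : float }

  let create ~x ~y ~z = { x; y; z }
  let destructure { x; y; z } k = k ~x ~y ~z
end

(* Partially applying ~z leaves a function of ~x and ~y,
   which is exactly what Foo2d.destructure expects. *)
let upgrade_to_3d foo2d =
  Foo2d.destructure foo2d (Foo3d.create ~z:0.0)

(* The extra field is dropped by ignoring its labelled argument. *)
let downgrade_to_2d foo3d =
  Foo3d.destructure foo3d (fun ~x ~y ~z:_ -> Foo2d.create ~x ~y)
```

The unchanged fields never have to be mentioned in the upgrade direction; in the downgrade direction they are mentioned once, which is still less than copying them field by field.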

This does sound quite useful, and while reading your comment, I thought “surely fields_conv already has this!” Alas, it doesn’t. Feels like a very natural enhancement there.

I was thinking of something similar (in case the answer to this question was: “no, there isn’t any”).
The part I like about your solution is that it seems to scale to any number of fields without any extra declarations or requirements from the user side. The only thing I don’t like about it is that it has positional arguments (hence, it is easy to put the wrong argument at the wrong position) and that it seems to be dependent on the declaration order of the record.

My idea has two variants, and I don’t know which one I like.
First, given you have the following structures:

type foo = {
  dir : [ `Buy | `Sell ];
  quantity : int;
  price : float;
}

type to_bar = {
  canceled : bool;
} [@@from foo]

Then the PPX will generate a function that takes a foo and the intermediary representation to_bar, and returns a bar (which is also a type generated by the PPX):

val foo_to_bar : foo -> to_bar -> bar

This has the advantage that you can construct the intermediary record with the values you want, and with all the modifications you need and the type system will not allow you to provide the wrong fields with the wrong values.
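
Written out by hand, the output of this first variant might look roughly like the sketch below. The [@@from foo] attribute is the hypothetical one proposed above and no existing PPX is assumed, so the bar type and the foo_to_bar function that it would generate are simply written manually here.

```ocaml
type foo = {
  dir : [ `Buy | `Sell ];
  quantity : int;
  price : float;
}

(* Intermediary record: only the fields that are new in the target. *)
type to_bar = {
  canceled : bool;
}

(* What the hypothetical PPX would derive: foo's fields plus to_bar's. *)
type bar = {
  dir : [ `Buy | `Sell ];
  quantity : int;
  price : float;
  canceled : bool;
}

(* The boilerplate the PPX would be expected to write for us. *)
let foo_to_bar (f : foo) (extra : to_bar) : bar =
  { dir = f.dir;
    quantity = f.quantity;
    price = f.price;
    canceled = extra.canceled }
```

Because the extra values arrive as a record, forgetting a new field or giving it the wrong type is a compile-time error at the call site.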

The other alternative, rather than creating the intermediary record type, is to declare both the source and destination types, and let the PPX compare them, create an intermediary record with those values that are new or different in the target structure, and generate the same function:

type foo = {
  dir : [ `Buy | `Sell ];
  quantity : int;
  price : float;
}

type bar = {
  dir : [ `Buy | `Sell ];
  quantity : int;
  price : float;
  canceled: bool
}  [@@from foo]

This alternative has the advantage of being clearer when reading the code: you can see both structures in their final form, and you choose the names of both rather than having the PPX try to figure them out. The disadvantage is that it may need extra annotations for those fields that you may want to change but that are equal in both types.

Same here, I thought fields_conv was going to have that functionality, but it does not.

Since you posted an example, I thought I’d work thru it with pa_ppx_migrate. The code is here: sandbox-public/migrate-records at master · chetmurthy/sandbox-public · GitHub

Notes:

  1. rec_types.ml has your types; I had to change the polymorphic variant into a regular one, b/c pa_ppx_migrate doesn’t support the former, and for some reason it isn’t just ignoring it. But that’s an easily-fixable bug. [ETA: fixed per below]

  2. rec_migrate.ml has the “migration”. It specifies the two types, specifies that the field dropped should be skipped, and the new field canceled should be computed by the expression __dt__.aux (which is the auxiliary field of the dispatch-table, used to pass arguments like this into the migration).

  3. I’ve included the result of the PPX rewriter, in rec_migrate_ppo.ml, so you can see what a PPX rewriter might produce.

Here’s a transcript. As you can see, when we construct the dt object, we pass false, and that gets used as the value of canceled. That aux value can be of course arbitrarily complex.

# #load "rec_types.cmo";;
# #load "rec_migrate.cmo";;
# open Rec_migrate ;;
# let f = { dir = Buy ; quantity = 10 ; price = 1.0 ; dropped = true } ;;
val f : Rec_migrate.foo =
  {dir = Buy; quantity = 10; price = 1.; dropped = true}
# let dt = make_dt false ;;
val dt : bool Rec_migrate.dispatch_table_t =
  {aux = false; migrate_dir_t = <fun>; migrate_foo = <fun>}
# dt.migrate_foo dt f ;;
- : Rec_types.bar =
{Rec_types.dir = Rec_types.Buy; quantity = 10; price = 1.; canceled = false}
# 

ETA: fixed the problem that was blocking using polyvariants. added file include_ml to load everything in the toplevel.
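
For readers who don’t want to open the repository, here is a hand-written sketch of the dispatch-table shape that the transcript suggests. It is not the output of pa_ppx_migrate: the names aux, migrate_dir_t, migrate_foo and make_dt come from the transcript, while the bodies, the local Rec_types module and dt_doubling are assumptions made for illustration.

```ocaml
(* Stand-in for rec_types.ml; assumed shape, using regular variants. *)
module Rec_types = struct
  type dir_t = Buy | Sell
  type bar = { dir : dir_t; quantity : int; price : float; canceled : bool }
end

type dir_t = Buy | Sell
type foo = { dir : dir_t; quantity : int; price : float; dropped : bool }

(* One migration function per type, plus an auxiliary value. Each function
   receives the table itself so that entries can call one another. *)
type 'aux dispatch_table_t = {
  aux : 'aux;
  migrate_dir_t : 'aux dispatch_table_t -> dir_t -> Rec_types.dir_t;
  migrate_foo : 'aux dispatch_table_t -> foo -> Rec_types.bar;
}

let make_dt aux = {
  aux;
  migrate_dir_t =
    (fun _dt -> function Buy -> Rec_types.Buy | Sell -> Rec_types.Sell);
  migrate_foo =
    (fun dt f ->
      { Rec_types.dir = dt.migrate_dir_t dt f.dir;
        quantity = f.quantity;
        price = f.price;
        (* [dropped] is skipped; [canceled] comes from the aux value. *)
        canceled = dt.aux });
}

let dt = make_dt false

(* Override one entry while using the original as a fallback. *)
let dt_doubling =
  { dt with
    migrate_foo =
      (fun d f ->
        let b = dt.migrate_foo d f in
        { b with Rec_types.quantity = 2 * b.Rec_types.quantity }) }
```

The last definition illustrates the override-with-fallback point made earlier: the replacement delegates to the default migration and only adjusts the one field it cares about.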

I think ppx_stable does exactly what you want.

module Foo = struct
type t = {
  dir : [ `Buy | `Sell ];
  quantity : int;
  price : float;
}
end

module Bar = struct
type t = {
  dir : [ `Buy | `Sell ];
  quantity : int;
  price : float;
  canceled: bool
} [@@deriving stable_record ~version:Foo.t ~remove:[canceled] ]
end

EDIT: changed add to remove so that this example compiles
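
To give an idea of the boilerplate this replaces, here is a hand-written sketch of a pair of converters for the two types above. The function names follow the to_…/of_… convention mentioned later in the thread; the exact names and signatures that ppx_stable generates are not shown in this thread, so treat them as assumptions and check the ppx’s own documentation.

```ocaml
module Foo = struct
  type t = {
    dir : [ `Buy | `Sell ];
    quantity : int;
    price : float;
  }
end

module Bar = struct
  type t = {
    dir : [ `Buy | `Sell ];
    quantity : int;
    price : float;
    canceled : bool;
  }

  (* Bar.t -> Foo.t: "remove" the field that Foo.t does not have. *)
  let to_Foo_t ({ dir; quantity; price; canceled = _ } : t) : Foo.t =
    { Foo.dir; quantity; price }

  (* Foo.t -> Bar.t: the missing field must be supplied by the caller. *)
  let of_Foo_t ({ Foo.dir; quantity; price } : Foo.t) ~canceled : t =
    { dir; quantity; price; canceled }
end
```

Only the field that differs appears as an extra argument; everything else is copied mechanically, which is the part a deriver can take over.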

Yes! That is exactly what I was looking for. Too bad that none of my google search terms lead me to that ppx.

By the way, it seems to me that their example of adding a field is wrong?

module V1 = struct
  type t =
    { x0 : X0.t
    ; x1 : X1.t
    }
end

module V2 = struct
  type t =
    { x0 : X0.t
    } [@@deriving stable_record ~version:V1.t ~add:[x1]]
end

let convert_to_v1 (v2 : V2.t) : V1.t =
  V2.to_V1_t v2 ~x1:(X1.of_int 1234)

If I want to add a field, what I want is to go from V1 to V2, not the other way around.

The ppx generates both a to_V1_t and an of_V1_t function, I think, so you’re able to go in both directions – but I guess my example should say “remove”? I don’t remember exactly

That sounds awesome.
I think the problem is that the described workflow goes against my intuition.
Putting it in the terms the ppx uses, what I want is:

  • Go from V1 to V2
  • Add the annotations always on the newest/target type: V2
  • Define the operations in terms of what needs to be done to go from V1 to V2. This means that add should mean add fields that are new in V2 and not present in V1, and that remove should mean remove fields that are present in V1 but are not going to be present in V2.

Does that make sense?

I edited my example so that it compiles. So now the operations are the ones that need to be applied to v2 to get v1 (remove in this case). That should make more sense. I think the ppx is on opam, so maybe try out a couple of examples to get a better intuition. I’m mostly working from memory, so I might not be a good source for the exact convention.

Really appreciate the effort you put into putting this all together. The tool looks impressive, but I think it is for a much more advanced use-case than mine? The example you gave has more code than the average “conversion” situation that I usually face. But it is good to know that such a tool exists in case I have to deal with more complicated situations.

well, it’s fair to say I wrote it to automate things like converting between versions of the OCaml AST (it comes with conversions between all the 4.X AST versions) but I use it routinely in many projects for both simple conversions, and also for map/iter over complex AST types.

For flat record-types it’s probably overkill. That said, I think the approach is the right way to go: use an attribute of the typedecl that carries info about which fields to remove, which to add, and how to compute the values of the fields-to-add, based on the old record’s fields and an auxiliary argument. It’s very simple to write such attributes down.

I don’t know how hard it would be to write a PPX rewriter for this using the standard tooling, but I think that’s your way forward.

It wasn’t much effort. Like, literally, a few minutes, and then after noticing why I had to use a hack, another few minutes.

I was thinking about your use-case, and it seems like you could do something really straightforward, based on the extension syntax:

[%migrate: ty1 -> ty2]

where each of ty1 and ty2 is a record type. And of course, you could use ppx_import to pull in those types. Then maybe a few attributes to indicate how to deal with new fields in ty2. Should be straightforward.