There are several situations where I have very similar record types. They are usually 90% similar, but some fields differ: a field needs to be mapped to a different value, needs to be omitted, or a new field needs to be added.
In OCaml code, this usually means that I need to manually write functions that copy all the other fields and add the little extra required information. I was wondering if there is some PPX that can help reduce this boilerplate?
This usually happens to me when crossing domains. For example, records extracted from the database are very similar to records that contain some extra computed fields, and I wish I didn’t have to copy all the other fields by hand.
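To make the boilerplate concrete, here is a minimal sketch. The `db_order` and `order` types and all of their fields are invented for illustration; only the shape of the problem comes from the description above.

```ocaml
(* Hypothetical record as it comes out of the database. *)
type db_order = {
  id : int;
  customer : string;
  quantity : int;
  unit_price : float;
}

(* The same record plus one computed field. *)
type order = {
  id : int;
  customer : string;
  quantity : int;
  unit_price : float;
  total : float;
}

(* Every shared field is copied by hand; only [total] is actually new. *)
let order_of_db (r : db_order) : order = {
  id = r.id;
  customer = r.customer;
  quantity = r.quantity;
  unit_price = r.unit_price;
  total = float_of_int r.quantity *. r.unit_price;
}
```

With four shared fields this is tolerable; with fifty it is most of the function.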
[full disclosure: I think the camlp5-based PPX rewriter pa_ppx_migrate might do the trick; it isn’t going to be possible for you to use it directly, b/c it’s based on camlp5, but if it is otherwise suitable, you might find that implementing something like it would work for you. So trying the experiment (which I’d be happy to do) might give you data points]
[perhaps this has already occurred to you; if so, please ignore]
This problem is related to the problem of migrating from one version of the OCaml AST, to the next (or prior). Fields appear, disappear, sometimes change shape. Ditto for constructors in algebraic datatypes. But most fields/constructors stay about the same. So it’s possible to write a tool that generates all the boilerplate for the stuff that stays the same, while allowing you to shove in the few bits for what changes.
I’ve written something to do that (pa_ppx_migrate) (but again, based on camlp5). I’m sure that such a tool could be really useful for many OCaml problems: I use pa_ppx_migrate all the time.
B/c in addition to “migrating” from one type to another, it trivially migrates from a type to itself. And since the “migrator” object allows one to override any type’s function with another that can use the original function as a fallback, one can customize the migrator in a very precise way to do other interesting things.
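The override-with-fallback idea can be sketched in plain OCaml. This is a hand-written illustration, not code generated by pa_ppx_migrate; the `dt` record, its field names, and the `order` type are assumptions made up for the example.

```ocaml
type dir = Buy | Sell
type order = { dir : dir; quantity : int; price : float }

(* A "dispatch table": one function per type. Each function receives the
   table itself, so nested types are always migrated through the table. *)
type dt = {
  migrate_dir : dt -> dir -> dir;
  migrate_order : dt -> order -> order;
}

(* Default table: the identity migration from each type to itself. *)
let default = {
  migrate_dir = (fun _ d -> d);
  migrate_order = (fun dt o -> { o with dir = dt.migrate_dir dt o.dir });
}

(* Override a single entry; the original function is used as a fallback. *)
let flip_sides = {
  default with
  migrate_dir = (fun dt d ->
    match default.migrate_dir dt d with
    | Buy -> Sell
    | Sell -> Buy);
}

let flipped =
  flip_sides.migrate_order flip_sides { dir = Buy; quantity = 10; price = 1.5 }
```

Because `migrate_order` goes through the table rather than calling a fixed function, overriding `migrate_dir` alone changes how every enclosing type is rewritten.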
I considered record nesting, but it is not a good idea for our use-case for several reasons. The first one is that it breaks encapsulation. If you still have access to the original record with its original fields, you can have all kinds of bad ideas about how to just shortcut to pick some fields from there.
But it can also be confusing, especially in the scenarios where the feature I’m asking about makes most sense. Imagine you have a record with 50 fields, where 15 of them need to be transformed, and the rest may be kept the same. It will be very confusing to remember which ones need to be picked from the root of the record and which ones are supposed to be read from the originally nested records. Apply this transformation a couple of times and the thing may become a worse mess than hundreds of lines of boilerplate.
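A small sketch of the nesting approach shows both problems. The `bar` type and the rounding of `price` are assumptions chosen only to have one transformed field.

```ocaml
type foo = { dir : [ `Buy | `Sell ]; quantity : int; price : float }

(* Nested variant: [price] is recomputed, everything else stays in [base]. *)
type bar = { base : foo; price : float; canceled : bool }

let bar_of_foo (f : foo) ~canceled : bar =
  { base = f; price = Float.round f.price; canceled }

let b =
  bar_of_foo ({ dir = `Buy; quantity = 10; price = 99.6 } : foo) ~canceled:false

(* The reader has to remember which level holds which field... *)
let quantity = b.base.quantity
let price = b.price

(* ...and nothing stops code from reading the stale original value. *)
let stale_price = b.base.price
```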
You could possibly use first-class modules, but that adds a challenge as well. In general, I don’t think there is a good solution to this. Certainly you could make your own. But I think any solution will only work at small scale. Imagine if you have a business object that needs to be serialized to the DB and also to the API, and these are subtly different types. So you need to represent three very similar but not quite equal types. I think most ways you slice that problem, it’s going to be awkward. But it could be an interesting PPX to try to write.
This does sound quite useful, and while reading your comment, I thought “surely fields_conv already has this!” Alas, it doesn’t. Feels like a very natural enhancement there.
I was thinking of something similar (in case the answer to this question was: “no, there isn’t any”).
The part I like about your solution is that it seems to scale to any number of fields without any extra declarations or requirements from the user side. The only thing I don’t like about it is that it has positional arguments (hence, easy to get the wrong argument at the wrong position) and that it seems to be dependent on the declaration order of the record.
My idea has two variants, and I don’t know which one I like.
First, given you have the following structures:
type foo = {
  dir : [ `Buy | `Sell ];
  quantity : int;
  price : float;
}
type to_bar = {
  canceled : bool;
} [@@from foo]
Then the PPX will generate a function that takes a foo and the intermediary representation to_bar, and returns a bar (which is also a type generated by the PPX):
val foo_to_bar : foo -> to_bar -> bar
This has the advantage that you can construct the intermediary record with the values you want, and with all the modifications you need and the type system will not allow you to provide the wrong fields with the wrong values.
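As a hand-written sketch of what such a PPX might generate for this first variant: `[@@from foo]` is the attribute proposed above and does not exist in any library I know of, so it appears only as a comment here, and the generated names are assumptions.

```ocaml
type foo = {
  dir : [ `Buy | `Sell ];
  quantity : int;
  price : float;
}

(* Written by the user; the proposed [@@from foo] attribute would go here. *)
type to_bar = { canceled : bool }

(* What the hypothetical PPX might generate: the merged type... *)
type bar = {
  dir : [ `Buy | `Sell ];
  quantity : int;
  price : float;
  canceled : bool;
}

(* ...and the conversion that copies the shared fields. *)
let foo_to_bar (f : foo) (extra : to_bar) : bar = {
  dir = f.dir;
  quantity = f.quantity;
  price = f.price;
  canceled = extra.canceled;
}

let b =
  foo_to_bar
    ({ dir = `Sell; quantity = 5; price = 9.5 } : foo)
    ({ canceled = true } : to_bar)
```

Since `to_bar` is an ordinary record, forgetting `canceled` or giving it the wrong type is a compile-time error, and there is no argument order to get wrong.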
The other alternative, rather than creating the intermediary record type, is to declare both the source and destination types, and let the PPX compare them and create an intermediary record with those values that are present or different in the target structure, and generate the same function:
type foo = {
  dir : [ `Buy | `Sell ];
  quantity : int;
  price : float;
}
type bar = {
  dir : [ `Buy | `Sell ];
  quantity : int;
  price : float;
  canceled : bool;
} [@@from foo]
This alternative has the advantage of being clearer when reading the code: you can see both structures in their final form, and you choose the names of both rather than having the PPX try to figure them out. The disadvantage is that it may need extra annotations for those fields that you may want to change but are equal in both types.
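The disadvantage can be illustrated with another hand-written sketch. The name `foo_to_bar_extra` and the `with_fee` helper are hypothetical; the point is only that a field with the same name and type on both sides gets copied silently.

```ocaml
type foo = { dir : [ `Buy | `Sell ]; quantity : int; price : float }
type bar = { dir : [ `Buy | `Sell ]; quantity : int; price : float; canceled : bool }

(* Hypothetical generated code: only [canceled] differs between the two
   types, so it is the only field the intermediary record asks for. *)
type foo_to_bar_extra = { canceled : bool }

let foo_to_bar (f : foo) (extra : foo_to_bar_extra) : bar =
  { dir = f.dir; quantity = f.quantity; price = f.price;
    canceled = extra.canceled }

(* [price] is identical on both sides, so it is copied verbatim. Changing
   it needs either an extra annotation or a manual fix-up like this one. *)
let with_fee (f : foo) : bar =
  { (foo_to_bar f { canceled = false }) with price = f.price +. 1.0 }
```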
rec_types.ml has your types; I had to change the polymorphic variant into a regular one, b/c pa_ppx_migrate doesn’t support the former, and for some reason it isn’t just ignoring it. But that’s an easily-fixable bug. [ETA: fixed per below]
rec_migrate.ml has the “migration”. It specifies the two types, specifies that the field dropped should be skipped, and that the new field canceled should be computed by the expression __dt__.aux (which is the auxiliary field of the dispatch-table, used to pass arguments like this into the migration).
I’ve included the result of the PPX rewriter, in rec_migrate_ppo.ml, so you can see what a PPX rewriter might produce.
Here’s a transcript. As you can see, when we construct the dt object, we pass false, and that gets used as the value of canceled. That aux value can be of course arbitrarily complex.
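For readers without the files at hand, here is a hand-written approximation of the idea. It is not the output of pa_ppx_migrate: the module layout, the `dt` record, and `make_dt` are assumptions, and only the names `dropped`, `canceled`, `__dt__` and `aux` are taken from the description above.

```ocaml
type dir = Buy | Sell

module V1 = struct
  type t = { dir : dir; quantity : int; dropped : int }
end

module V2 = struct
  type t = { dir : dir; quantity : int; canceled : bool }
end

(* Dispatch table carrying an auxiliary value of type ['aux]. *)
type 'aux dt = {
  aux : 'aux;
  migrate_t : 'aux dt -> V1.t -> V2.t;
}

(* [dropped] is skipped, and [canceled] is computed from [__dt__.aux]. *)
let make_dt aux = {
  aux;
  migrate_t = (fun __dt__ (r : V1.t) : V2.t ->
    { V2.dir = r.V1.dir; quantity = r.V1.quantity; canceled = __dt__.aux });
}

(* Passing [false] when building the table makes it the value of [canceled]. *)
let dt = make_dt false
let v2 = dt.migrate_t dt { V1.dir = Buy; quantity = 3; dropped = 0 }
```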
The ppx generates both a to_V1_t and an of_V1_t function, I think, so you’re able to go in both directions – but I guess my example should say “remove”? I don’t remember exactly
That sounds awesome.
I think the problem is that the described workflow goes against my intuition.
Putting it in the terms the ppx uses, what I want is:
Go from V1 to V2
Add the annotations always on the newest/target type: V2
Define the operations in terms of what needs to be done to go from V1 to V2. This means that add should mean add fields that are new in V2 and not present in V1. And that remove should mean remove fields that are present in V1 but are not going to be present in V2.
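The reason the convention matters is that the same field is an “add” in one direction and a “remove” in the other. A minimal hand-written sketch, with made-up field names:

```ocaml
module V1 = struct
  type t = { id : int; legacy_note : string }
end

module V2 = struct
  type t = { id : int; canceled : bool }
end

(* V1 -> V2: remove [legacy_note] (only in V1), add [canceled] (only in V2). *)
let v2_of_v1 ~canceled (r : V1.t) : V2.t =
  { V2.id = r.V1.id; canceled }

(* V2 -> V1: the same two fields swap roles. *)
let v1_of_v2 ~legacy_note (r : V2.t) : V1.t =
  { V1.id = r.V2.id; legacy_note }
```

Annotations written on V2 and phrased as “V1 to V2” correspond to `v2_of_v1` here.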
I edited my examples so that they compile. So now the operations are the ones that need to be applied to v2 to get v1 (remove in this case). That should make more sense. I think the ppx is on opam, maybe try out a couple of examples to get a better intuition. I’m mostly working from memory, so I might not be a good source for the exact convention.
Really appreciate the effort you put into putting this all together. The tool looks impressive, but I think it is for a much more advanced use-case than mine? The example you gave has more code than the average “conversion” situation that I usually face. But it is good to know that such a tool exists in case I have to deal with more complicated situations.
well, it’s fair to say I wrote it to automate things like converting between versions of the OCaml AST (it comes with conversions between all the 4.X AST versions) but I use it routinely in many projects for both simple conversions, and also for map/iter over complex AST types.
For flat record-types it’s probably overkill. That said, I think the approach – which is to use an attribute of the typedecl that carries info about which fields to remove, which to add, and how to compute the values of the fields-to-add, based on the old record’s fields and an auxiliary argument – is the right way to go. It’s very simple to write such attributes down.
I don’t know how hard it would be to write a PPX rewriter for this using the standard tooling, but I think that’s your way forward.
I was thinking about your use-case, and it seems like you could do something really straightforward, based on the extension syntax:
[%migrate: ty1 -> ty2]
where each of ty1 and ty2 is a record type. And of course, you could use ppx_import to pull in those types. Then maybe a few attributes to indicate how to deal with new fields in ty2. Should be straightforward.