Idea: Standard OCaml runtime type representation

Idea: OCaml should provide an “upstream integrated and mandated runtime type representation”

In the thread “My Thoughts on OCaml vs Haskell/Rust in 2023” (post #12), @dbuenzli wrote:

I said it more than once but I think this is a red herring. The pain point is a lack of upstream integrated and mandated runtime type representation.

Even without a built-in deriving mechanism, that would improve the ecosystem by orders of magnitude, allowing users to define a single value M.repr for their M.t types which can then be used generically by a diversity of libraries of type-indexed combinators (formatters, comparison, equality, UI editors, random value generators, codecs with your pet serialization format, etc.).

I think it is a good idea. It could also be helpful for debugging in ocamldebug.
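To make the quoted idea concrete, here is a minimal sketch of what a single M.repr value consumed by generic functions might look like. The `repr` GADT, the `show` and `equal` functions, and the `Inventory` module are all hypothetical names invented for illustration; none of them is an existing or proposed standard library API.

```ocaml
(* Hypothetical, deliberately tiny type representation (an assumption,
   not an existing library): one constructor per supported type former. *)
type _ repr =
  | Int : int repr
  | String : string repr
  | List : 'a repr -> 'a list repr
  | Pair : 'a repr * 'b repr -> ('a * 'b) repr

(* A generic printer, written once and indexed by the representation. *)
let rec show : type a. a repr -> a -> string = fun r v ->
  match r with
  | Int -> string_of_int v
  | String -> Printf.sprintf "%S" v
  | List r -> "[" ^ String.concat "; " (List.map (show r) v) ^ "]"
  | Pair (ra, rb) ->
    let (a, b) = v in
    "(" ^ show ra a ^ ", " ^ show rb b ^ ")"

(* A generic equality over the same representation. *)
let rec equal : type a. a repr -> a -> a -> bool = fun r x y ->
  match r with
  | Int -> Int.equal x y
  | String -> String.equal x y
  | List r -> List.equal (equal r) x y
  | Pair (ra, rb) ->
    let (xa, xb) = x and (ya, yb) = y in
    equal ra xa ya && equal rb xb yb

(* A user module exposes one [repr] value next to its type [t]. *)
module Inventory = struct
  type t = (string * int) list
  let repr : t repr = List (Pair (String, Int))
end

let inv : Inventory.t = [ ("apple", 3); ("pear", 0) ]
let () = print_endline (show Inventory.repr inv)
```

The point of the sketch is that `Inventory` defines `repr` once, and every library written against `repr` (printing, equality, and so on) works on `Inventory.t` without further derivation.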

Here are some libraries that provide runtime representations that were mentioned here and here:

Clearly there is a need for this in the OCaml ecosystem. There are at least 4 (!) libraries people wrote for our small ecosystem!

Each library unfortunately will have advantages and disadvantages. It would be very useful if OCaml unified around a standard.

The simplest approach would be to “choose” one of the libraries above and see how we can “standardize” the functionality it provides into OCaml.


There are also

Incidentally, at LexiFi we have had a fork of OCaml for over 20 years that tightly integrates a runtime representation of types with the typechecker, and the result is a very powerful combo (it is the core technology behind most of our development).

I agree that having a common representation of types upstream would be great. But as already remarked by @dbuenzli, the difficult part is to reach a consensus on the design (especially as some of the questions may not have clear answers). In any case, we already have a module for it: https://github.com/ocaml/ocaml/blob/trunk/stdlib/type.mli :slight_smile:

Anyway, for those who want to learn more about the background:

Cheers,
Nicolas


As many might agree, compromise is an integral part of language design and evolution. To lock in the many gains a mandated OCaml type representation would bring, we need to give up on getting everything perfect in advance and on reaching 100% agreement with everybody.

Here we can follow the standard library model: if you don’t like the OCaml standard library, you can provide your own (e.g. Jane Street’s Core). If you don’t like a future OCaml mandated type representation, you can bring your own.


There is also a thread discussing how to encode dynamic types from scratch: “Types as first class citizens in OCaml”. It has been extremely useful for me, as I’m trying to design a dataframe library for OCaml to make it more usable for data science and ML use cases.


I wrote another one (and decided recently to rewrite it). I have opinions about what I think would be a useful addition for type witnesses in the forthcoming Type module in OCaml 5.1, but I’m not ready to defend the details yet. I do think type witnesses are useful, and we should probably have them in the standard library.
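For readers unfamiliar with type witnesses, the following is a small sketch of what they enable, using the `Type.Id` submodule of the `Type` module mentioned above (so it assumes OCaml 5.1 or later). The `packed` type and the `unpack`, `int_id`, and `string_id` names are hypothetical and only serve the illustration.

```ocaml
(* Assumes OCaml >= 5.1, where Stdlib.Type is available. *)

(* A value packed together with a runtime witness of its type. *)
type packed = Pack : 'a Type.Id.t * 'a -> packed

(* Each call to Type.Id.make creates a fresh identifier; the annotation
   fixes which type the identifier stands for. *)
let int_id : int Type.Id.t = Type.Id.make ()
let string_id : string Type.Id.t = Type.Id.make ()

(* Recover the payload only when the two witnesses are provably equal;
   matching on Type.Equal is what lets the type checker return [v] as [a]. *)
let unpack : type a. a Type.Id.t -> packed -> a option =
  fun id (Pack (id', v)) ->
    match Type.Id.provably_equal id id' with
    | Some Type.Equal -> Some v
    | None -> None

let () =
  let box = [ Pack (int_id, 42); Pack (string_id, "hello") ] in
  List.iter
    (fun p ->
      match unpack int_id p, unpack string_id p with
      | Some i, _ -> Printf.printf "int: %d\n" i
      | _, Some s -> Printf.printf "string: %s\n" s
      | None, None -> print_endline "unknown")
    box
```

Note that two identifiers created by separate calls to `Type.Id.make` are never provably equal, even when annotated with the same type, so identifiers need to be shared rather than recreated.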

Has anyone done a comparative analysis of these libraries?


I will likely do that shortly as I want this for Dream.


Maybe it’s also worth checking dynamic types in SML#: “SML# feature: dynamic types and typed manipulation of JSON” (Part II Tutorials, SML# Document Version 4.0.0).


Library comparison results here.


Thanks! I will comment here from the perspective of a general-purpose runtime type representation. We need a runtime representation that allows inspecting and traversing the runtime structure and writing transformers on top of that. I agree with the sentiment expressed here that this would alleviate the need for almost all deriving PPXs. LexiFi’s lrt seems the most sophisticated and it can even auto-derive the representation, but I was unable to get it to compile (ppxlib version mismatch). I filed an issue in the repo.

I then tried out Dyn (thanks @anmonteiro for the pointer) instead as it is simple and likely to be already installed in a lot of switches. There is value in its simplicity. If we had a PPX that derived a Dyn.t codec, we could fairly easily build a lot of interesting things on top of it. Here’s a simple example:

(* #require "dyn";; *)

(* Given any runtime representation, we can derive a pretty-printer for it. *)
let rec dyn_pp ppf = function
  | Dyn.String s ->
    Format.pp_print_string ppf s
  | Record fields ->
    Format.pp_print_string ppf "{\n";
    List.iter (fun (k, v) -> Format.fprintf ppf "%s = %a;\n" k dyn_pp v) fields;
    Format.pp_print_string ppf "}"
  | _ ->
    failwith "TBD"

module Person = struct
  (* We have a type *)
  type t = { name : string; email : string }

  (* Get a dyn representation of its values *)
  let dyn { name; email } =
    Dyn.(record ["name", string name; "email", string email])

  let undyn = function
    | Dyn.Record fields ->
      begin match List.assoc "name" fields, List.assoc "email" fields with
      | String name, String email ->
        { name; email }
      | _ -> invalid_arg "Person.undyn"
      end
    | _ ->
      invalid_arg "Person.undyn"
end

let () = Format.printf "%a\n" dyn_pp (Person.dyn { name = "a"; email = "a@b" })

A pretty-printer is just one use case. Obviously we can imagine writing derivers for all sorts of codecs and other useful tooling. Just to note again that this doesn’t address antron’s needs for Dream; that is better discussed in the new thread.

Note that this is not a runtime type representation. It’s a uniform representation for values that you inject your values into or project from.

For example, a runtime type representation allows you to deconstruct a value generically, without having to convert it to another value as is done here, by deconstructing the runtime type representation in parallel with the value.
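The distinction can be sketched as follows: instead of converting a Person.t into an intermediate universal value, a generic printer walks a description of the type and reads the fields directly from the original value. The `repr` and `field` types and the `pp` function are hypothetical names made up for this sketch, not the API of any of the libraries mentioned in the thread.

```ocaml
(* Hypothetical representation of types (an assumption for illustration):
   a record is described by its fields, each with a name, the
   representation of the field type, and a projection out of the record. *)
type _ repr =
  | String : string repr
  | Record : string * 'r field list -> 'r repr
and 'r field = Field : string * 'a repr * ('r -> 'a) -> 'r field

(* Generic printer: the representation and the value are deconstructed in
   parallel; no intermediate copy of the value is built. *)
let rec pp : type a. a repr -> Format.formatter -> a -> unit =
  fun r ppf v ->
    match r with
    | String -> Format.fprintf ppf "%S" v
    | Record (_, fields) ->
      Format.fprintf ppf "{ ";
      List.iter
        (function Field (name, fr, get) ->
           Format.fprintf ppf "%s = %a; " name (pp fr) (get v))
        fields;
      Format.fprintf ppf "}"

module Person = struct
  type t = { name : string; email : string }

  (* The representation describes the type once; it is not rebuilt
     for each value. *)
  let repr : t repr =
    Record ("Person.t",
            [ Field ("name", String, (fun p -> p.name));
              Field ("email", String, (fun p -> p.email)) ])
end

let () =
  Format.printf "%a@." (pp Person.repr) { Person.name = "a"; email = "a@b" }
```

Compared with the Dyn example, `Person.repr` is a single static value, and `pp` only calls the field projections on the value it is given.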


Thanks for the clarification. If I’m not mistaken, either representation can be used to create things like roundtripping codecs to various formats?

Sure, but the runtime costs are likely to be different: with Dyn you need to convert to/from an intermediate universal representation before doing anything. Effectively you are making a shallow copy of your value.

A runtime type may also allocate when you unfold it, e.g. on recursion steps for recursive types, but likely significantly less, and it does not allocate at all on non-recursive types like Person.t.


I didn’t know about that one. I didn’t fully grok some of the structural bits there but it looks quite interesting. It would have been nice to have examples of generic functions written with it though.

However one thing I feel is missing from that one is the ability to attach heterogeneous dictionaries to nodes of the type representation. Either on each node directly or as a special node tagging another one with the dictionary (the latter being likely much more inconvenient for processors and users).

I think this is quite important to have, as it allows libraries acting on the type representation to declare typed keys that users can use to enrich their representation and guide the behaviour of the generic functions provided by the libraries.

For example, a UI or HTML form library would declare a key for storing human-readable labels (or keys to them for i18n), a serialization library would provide a key to tag data that should be ignored for serialization (and a default value to use on deserialization, but in this case I don’t think that could be typed), etc. The hypothetical Stdlib.Type.Repr module could even define a few standard keys.
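A rough sketch of the idea, under stated assumptions: typed keys are hand-rolled here with the classic local-exception trick, the dictionary is a plain list, and the `repr` type, the `form_label` and `codec_skip` keys, and the two "library" functions are all hypothetical names invented for illustration.

```ocaml
(* A typed key: a fresh injection/projection pair into the open type exn. *)
type 'a key = { inj : 'a -> exn; prj : exn -> 'a option }

(* Each call defines a fresh local exception, so two keys never collide,
   even when they store values of the same type. *)
let new_key (type a) () : a key =
  let exception E of a in
  { inj = (fun v -> E v); prj = (function E v -> Some v | _ -> None) }

(* A heterogeneous dictionary: a list of injected bindings. *)
type meta = exn list
let empty : meta = []
let add k v (m : meta) : meta = k.inj v :: m
let find k (m : meta) = List.find_map k.prj m

(* Hypothetical representation whose field nodes carry a dictionary. *)
type _ repr =
  | String : string repr
  | Record : 'r field list -> 'r repr
and 'r field = Field : string * meta * 'a repr * ('r -> 'a) -> 'r field

(* Keys that two imaginary libraries might declare. *)
let form_label : string key = new_key ()  (* a form library *)
let codec_skip : bool key = new_key ()    (* a serialization library *)

(* "Form library": human-readable labels, falling back to field names. *)
let labels : type r. r repr -> string list = function
  | String -> []
  | Record fields ->
    List.map
      (fun (Field (name, meta, _, _)) ->
         Option.value (find form_label meta) ~default:name)
      fields

(* "Serialization library": names of the fields that are not skipped. *)
let serialized_fields : type r. r repr -> string list = function
  | String -> []
  | Record fields ->
    List.filter_map
      (fun (Field (name, meta, _, _)) ->
         if find codec_skip meta = Some true then None else Some name)
      fields

type person = { name : string; email : string; session : string }

(* The user enriches the representation with the libraries' keys. *)
let person_repr : person repr =
  Record
    [ Field ("name", add form_label "Full name" empty, String,
             (fun p -> p.name));
      Field ("email", empty, String, (fun p -> p.email));
      Field ("session", add codec_skip true empty, String,
             (fun p -> p.session)) ]
```

Each library only looks up its own keys and ignores the rest, which is what lets independent libraries annotate the same representation without knowing about each other.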

LexiFi seems to have that albeit in the typical stringly fashion of unityped languages. That’s what happens when you work outside the rich ML language :–)


Experience report: I also want to express appreciation for thierry-martinez/refl (GitHub: “OCaml PPX deriver for reflection”). I found it worked well for some complex types that I was trying to derive a show for. I had problems with repr and with ppx_deriving’s show deriver (it’s possible I was doing something wrong in my usage of those libraries).


@let-def shared a while ago a fork of the compiler with syntactic representation for most OCaml values: GitHub - let-def/ocaml at tagl

(There are some limitations.) It works on 64-bit platforms by repurposing some bits of value headers to store enough metadata that you can give a meaningful representation with high certainty.

It is useful enough to implement a polymorphic printf or debugger, but not for safely serializing/deserializing data.

The nice thing is that the changes to the compiler are very close to the ones needed by BuckleScript to pass the same information to the JavaScript runtime. I wonder if it could benefit JSOO too.


Hi @gasche @octachron ! You’re some of the active OCaml compiler contributors – I’d request you to have a look at the above thread in case you’ve not already.

The large number of projects trying to provide OCaml with a runtime type representation and the discussion above seem to indicate this is a real need.

Does the compiler team have a view on this? Could OCaml provide a mandated runtime type representation? A mandated runtime type representation cannot and would not ever be perfect but it would help the OCaml ecosystem move forward in many areas.

I would suggest possibly just picking one of the above mentioned projects, improving it in ways that are necessary and adopting it into the OCaml default distribution. It would save a lot of effort. What do you think?

Hot take: typerepr mechanism blessed by the compiler should be completely opt-in and not mandated.

We could instead make it possible to “crystallize” the type info of a type expression at compile time, which would suffice for most use cases imo.

The use of the word “mandated” can be a source of friction. Indeed things should be as opt in as possible as far as runtime type representation goes.

The compiler, of course, can provide a lot of power to this runtime representation and do things that are difficult for libraries to do. All this would depend on the final design, if the compiler team ever wanted to have this in OCaml core.

If you like typed representation, you should pick one of the libraries that you referenced and start using it.

Personally, I have no intention of making that decision for you.

The existence of multiple libraries for typed representation is more a sign that those libraries are straightforward to write (I have written one, for instance) for a small subset of the OCaml type system, but impossible to design so that they cover all features of OCaml well.

Thus, it doesn’t seem like a good idea to freeze the exploration of typed representations outside the compiler at this stage.

There is nothing stopping people from starting to use one of the libraries above right now. I don’t see how imposing a choice would help anyone.

That sounds like you would be interested in working on such a project; what are you waiting for?

Personally, I am unconvinced that typed representations are a panacea. They are useful in very precise use cases (serialisation/deserialisation for instance). But even after reading the back-and-forth in the current discussion, they don’t seem really useful in general. Note for instance that, as far as I understand, the use of typed representations à la LexiFi is infectious, and one ends up carrying typed representations as arguments to many (most?) functions.

Moreover, this leads us to the other elephant in the room: typed representations break abstractions.

Consequently, I am personally opposed to the idea of integrating typed representations in the compiler at this time.
