Idea: Standard OCaml runtime type representation

To a certain extent, the dynamic type is more flexible because you expose it once, and it can be used in any number of ways by someone else in some other piece of code. Conversely, a ppx deriving statement is normally issued at the definition point.

The same distinction can also be observed between most OCaml derivers (ppx_deriving_yojson, ppx_deriving_sexp, etc.), which are each specific to a single format, and, say, Serde in Rust, which is derived once and then used with any number of serialization formats. That means that, in practice, you can take a library that has Serde derivers and use it with an obscure serialization format.

4 Likes

If I understand correctly what you (@yawaramin, @dbuenzli and @c-cube) said, we just need an 'a typ and then we will have structurally polymorphic functions such as:

to_json : 'a typ -> 'a -> Json.t
to_protobuf : 'a typ -> 'a -> Protobuf.t
to_string : 'a typ -> 'a -> string
print : 'a typ -> 'a -> unit
eq : 'a typ -> 'a -> 'a -> bool

Is that right?

My second point was that if you uncurry the previous types and pack the product argument in an existential, you will have something like:

to_json : dyn -> Json.t
to_protobuf : dyn -> Protobuf.t
to_string : dyn -> string
(* etc *)

I believe that was the fear of @octachron, and it is basically what Eio does with its use of the object system (an interface in Go is just an object, as in OOP, that is automatically generated by the compiler, much like the implicit argument in modular implicits).
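
For concreteness, here is a minimal, self-contained sketch of the packing, where a plain printer function stands in for the 'a typ that the generic function would normally interpret (all names here are illustrative, not an existing API):

type dyn = Dyn : ('a -> string) * 'a -> dyn

(* Once packed, the generic function no longer mentions 'a. *)
let to_string = function Dyn (show, v) -> show v

let () = print_endline (to_string (Dyn (string_of_int, 42)))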

There’s dariusf/ppx_debug: Tools for record-and-replay debugging (github.com).
It does parse source files but relies on compile-time type information.

I’m not sure why that’d be a fear. That sounds extremely useful, sign me up! I’m fairly confident that a lot of people would also be relieved if serialization/printing/typed SQL queries/… became that easy.

2 Likes

A bunch of PPXs would certainly become obsolete and be replaced by simple OCaml functions.

I understand @octachron and @gasche’s reluctance, even though pushing the implementation of dyntypes to the compiler/stdlib would be the surest way of getting past the collective action problem of getting everyone to agree on a common representation for dyntypes.
But here’s the crucial thing: We reap the benefits of dyntypes as long as everyone agrees on a common representation, regardless of whether or not they are “built-in” into stdlib. Pushing dyntypes into the stdlib solves the collective action problem but it’s not a technical requirement.
Would it be too idealistic to presume that the “community” can at least attempt to get past the collective action hurdle by picking one of the existing dyntype libraries and coalescing around it?
(I wrote “community” because that’s a nebulous concept and I don’t expect everyone to jump on board; but maybe if enough people start that ball rolling then momentum can eventually build.)

1 Like

It depends on the interface chosen for the generic functions. I would say that this is ok:

val to_string : 'a typ -> 'a -> string

whereas this is awkward in OCaml (a taste of Python):

val to_string : dyn -> string

Why should I pack my value to convert it to a string? In such a form I can no longer use it with the functions specific to its type.

By the way, it’s clearly not obvious how to design such an 'a typ. As stated by @yallop in the article cited by @gasche, this type must capture the shape of all possible data types that we can define in OCaml. Thinking out loud, there is previous work that used combinators: it can serve as inspiration both for the 'a typ data type and for its module interface. 'a typ can be an AST of the DSL of these combinators, and each generic function will have to interpret this AST according to its needs.
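
To make that concrete, here is a minimal sketch of one possible shape for such an 'a typ, written as a small GADT of combinators, together with a generic to_string that interprets it. The constructor names and the function are assumptions for illustration, not an existing library:

type _ typ =
  | Int : int typ
  | String : string typ
  | List : 'a typ -> 'a list typ
  | Pair : 'a typ * 'b typ -> ('a * 'b) typ

(* Each generic function is an interpreter of this representation. *)
let rec to_string : type a. a typ -> a -> string = fun t v ->
  match t with
  | Int -> string_of_int v
  | String -> Printf.sprintf "%S" v
  | List elt -> "[" ^ String.concat "; " (List.map (to_string elt) v) ^ "]"
  | Pair (ta, tb) ->
      let a, b = v in
      "(" ^ to_string ta a ^ ", " ^ to_string tb b ^ ")"

let () =
  print_endline (to_string (List (Pair (Int, String))) [ (1, "a"); (2, "b") ])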

That’s of course not the idea.

The idea is that the type representation allows you to quickly derive implementations of well-known function signatures. At a certain point you may still want to hand-code them for a variety of reasons (e.g. performance, or genericity not doing the right thing™).

So the idea is not to have say:

module M : sig 
  type t 
  val typ : t typ
end

But rather:

module M : sig 
  type t 
  val equal : t -> t -> bool 
  val pp : Format.formatter -> t -> unit
  val typ : t typ
end

The relief here is simply that, at least initially, you can quickly get to something with:

module M = struct
  type t = … 
  let typ = …
  let equal v0 v1 = Fun.Generic.equal typ v0 v1
  let pp ppf v = Fun.Generic.pp typ ppf v
end

Rather than repeating the same structural boilerplate in equal and pp.
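
For contrast, here is a sketch of the hand-written structural code that those two generic calls would replace; the record type and its fields are made up for illustration:

module M' = struct
  type t = { id : int; name : string }

  (* Structural equality and printing, written out by hand. *)
  let equal v0 v1 = Int.equal v0.id v1.id && String.equal v0.name v1.name

  let pp ppf v = Format.fprintf ppf "{ id = %d; name = %S }" v.id v.name
end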

6 Likes

It is not possible to have one true representation, but it’s good to choose a reasonable one.

There is a wealth of work, for instance in Haskell, about how one might represent the runtime structure of a type. Here is a useful link: GHC.Generics - HaskellWiki. See the section “Representation Types”. There are some combinators, i.e. constructors, defined that can be used to manually construct a runtime representation of a type, and there is a Generic deriving mechanism that does that for you (using the constructors just mentioned).

There is also a paper, A Generic Deriving Mechanism for Haskell, mentioned in the Wiki, which goes into more detail. This is available in Haskell today and is not a dead-end research project. It is provided as a language extension.

See A generic deriving mechanism for Haskell | Proceedings of the third ACM Haskell symposium on Haskell to get this directly.

Also see 6.19.3. Generic programming — Glasgow Haskell Compiler 9.8.1 User's Guide for more up-to-date information.

I haven’t looked at any of the OCaml library implementations but I suspect they use some similar approaches.

So, in the OCaml stdlib we can decide to:
(a) simply provide the combinators and leave the full construction of the runtime type representation to the end user or a ppx;
or (b) also build functionality into the compiler/standard library that does the automatic derivation of the runtime type using the combinators defined in (a).

Mapping this onto the module M example above:

If I had only option (a) available, let typ = ... would need to be constructed manually by me using the combinators mentioned above. For any sufficiently complex types it quickly becomes totally unmanageable to do manually. Ppxes could do that for me using the combinators defined in Fun.Generic, but I don’t know how useful that would be, because the objective of this thread is to lower the barrier to doing things like debug printing a type, even for OCaml newbies, without ppxes.

As an end user, I would like option (b): for the compiler/standard library to be able to simply build the runtime type representation for me using these combinators. I could always do it myself with the combinators, ignore all of this by providing my own definitions in module M above, or use one of the existing runtime type libraries that we already have today.


In closing, a related question is the following: how come Haskell, which is a more principled language in (some) respects than OCaml, is able to achieve consensus on these issues that seem far more controversial in OCaml-land?

I’m not talking about a complex, invasive, high-end type-theory feature, but about simple opt-in functionality that allows me to pretty-print my type without having to choose from a menagerie of libraries and fiddly ppxes.

Rather than let this go into an endless discussion that gets repeated every decade or so, I want to persist and try to bring the issue to a conclusion (at the risk of a sharp retort from one of the compiler contributors :slight_smile: ).

Would the compiler team be interested in providing this feature, @gasche, @octachron? If it is dead on arrival (as it seems to be so far), then let’s say so again and move on to other things. Also, let us assume that we’re not doing this via modular implicits, because that is many years in the future.

P.S. I don’t have enough OCaml expertise to be able to suggest what exact design approach this feature would take, just that as a user I need this feature. Even if it means adding some special compiler sauce to do the automatic derivation, I’m for it (we have marshalling!). Why wait endlessly for a generic mechanism that does this? OCaml is already full of inconsistencies and some ugly things due to historical evolution; why not build something that is totally opt-in and non-invasive to provide this feature? It would be a big win for users! Any “non-principledness” added to OCaml by this feature will be far outweighed by its usefulness!

1 Like

You can persist as much as you want on this forum; that’s not the way things are going to move forward. In general, if you want something in the stdlib you need a concrete proposal that is a little bit solid and convincing.

I have been looking a bit into this stuff again and may propose something at some point. But even before considering upstreaming it, it would definitely need to first be battle-tested outside the stdlib, and I would need to convince myself that it is what we really want in the stdlib.

Regarding automatic derivation: I’m personally not interested in it, since the representation you want is contextual. You may want private and public representations, you may want a different representation for equality and for your printer, you may want multiple versioned representations for abstract types to support schema evolution of stuff you stored away in a given representation, etc. There is also the question of how to attach metadata to the representation.

I don’t think automatic derivation is essential; you could as well have an external tool using compiler-libs to spit out a default representation that you then cut and paste into your code and tweak to your wishes.

2 Likes

Great – I look forward to that – I hope you are able to make forward progress on that.

Which is why I have been advocating choosing something that is already available and hence “battle tested”. Why not improve refl, repr or the other libraries towards what you want to achieve? Let’s say you build something: how long does it take to get built, battle-tested, and accepted? Why not skip the building and “battle testing” stages by going with something that exists already?

It’s all too easy to fall into the NIH (not-invented-here) trap. But it’s possible that you do have ideas that are truly different from what exists already. As you have an impressive track record of contributions to OCaml, I think there is definitely hope for something good to happen!

A point here: Haskell’s generic runtime type representation is probably general enough that you could do all this too. The point is to build a general enough generic runtime representation of the type. How you interpret it to deal with the printing case, equality case, etc. that you mention is up to the support code that consumes these runtime type data structures.

I think the emphasis should be on just building a general, detailed representation. Consumers of the representation could ignore parts of the data structure that are not relevant to their use case in printing, equality, serialization, etc.

Of course, you could always get fancier and have more sophisticated scenarios that require more power and flexibility. I personally would be happy with just something simple (and even limited) out of the box.

That’s precisely why you want metadata and/or multiple representations. A given generic function can’t guess, for example, what part of the representation it should ignore, but it could either consult metadata bits to guide its behaviour, or you could feed it with a different representation.

This could be cheaply achieved by simply providing a metadata field in the various constructors provided. Continuing with the Haskell analogy, see 6.19.3. Generic programming — Glasgow Haskell Compiler 9.8.1 User's Guide.

(Pasted here for convenience)

-- | Unit: used for constructors without arguments
data U1 p = U1

-- | Constants, additional parameters and recursion of kind Type
newtype K1 i c p = K1 { unK1 :: c }

-- | Meta-information (constructor names, etc.)
newtype M1 i c f p = M1 { unM1 :: f p }

-- | Sums: encode choice between constructors
infixr 5 :+:
data (:+:) f g p = L1 (f p) | R1 (g p)

-- | Products: encode multiple arguments to constructors
infixr 6 :*:
data (:*:) f g p = f p :*: g p

You could just add a metadata record field in all the constructors for U1, K1 etc. in the OCaml equivalent.
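
A rough OCaml rendering of that idea might look like the following sketch; none of these names come from an existing library, they are assumptions for illustration:

(* Metadata that generic functions may consult or ignore. *)
type meta = { field_name : string option; doc : string option }

let no_meta = { field_name = None; doc = None }

(* Each combinator carries its metadata alongside the structure. *)
type _ typ =
  | Int : meta -> int typ
  | String : meta -> string typ
  | List : meta * 'a typ -> 'a list typ
  | Pair : meta * 'a typ * 'b typ -> ('a * 'b) typ

(* A debug printer could ignore the metadata entirely, while e.g. a schema
   generator could use [field_name] and [doc]. *)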

So here is my vision:

  • Let the compiler/stdlib build a default generic representation for you, sans any special metadata. Under the hood, the compiler/stdlib simply uses combinators similar to the Haskell ones. This will allow you to deal with simple cases like debug printing.
  • Should you, as the user, find the automatically derived representation provided by the stdlib/compiler lacking in some way, or be unhappy with having only one representation, you always have the option to build your own representation, with any metadata you wish to add, using the same constructors (combinators) we have been discussing. This representation could be built manually, via a ppx, or through some tool.

Here, BTW, a separate tool feels worse than ppxes: I would personally just use ppxes rather than an ad-hoc tool.

In any case, the objective (for me) is to not have any external tools or new steps, and just be able to do this out of the box in OCaml. That would also help newbies. I feel that we need something that is simple and built in.

In any case, we can already do complicated things with refl etc. In fact, you could probably deal with the scenarios you suggest without using the ppx side of those libraries, and take full control of building out the runtime representation with the combinators they provide! Why do you wish to introduce another library that will suffer from xkcd: Standards?

Of course, you do point out that you eventually want OCaml to adopt this. But the path you wish to embark on is long, because you’re advocating building, battle-testing, getting it accepted, etc.

I personally am out of steam. Since you seem willing to “show the code” and/or “show the technical proposal” (at some point), I will defer to your best judgement!

Why? A library that integrates with compiler-libs to generate boilerplate code for serialisation would work with Merlin, with LSP-server, with Dune promotion rules, and with other build systems.

1 Like

This is what I would love:
(a) A library that integrates with compiler-libs and comes with the default OCaml distribution. It provides combinators that allow you to build runtime type representations (à la Haskell) with any metadata you want to attach to those combinators.

(b) The very same library allows you to automatically derive a default runtime type representation using the combinators mentioned in (a). (In this case the metadata would be basically blank.) This one should be sufficient for most purposes, like debug printing. This is where I differ from @dbuenzli, who does not want automatic derivation. Rust has automatic derivation for #[derive(Debug)], #[derive(Eq)] etc., but you can always override it!

(c) If anybody wants to build more complex, custom runtime representations with metadata, by feeding type definitions written in pure OCaml or a DSL (like capnp) to this future external tool (which uses this library) to generate a runtime type representation, that is OK. If that is what @dbuenzli wants, then that’s cool!

But for me the target is the 80% use case which is (b). Most users will just want to use (b). As long as that is provided, I’m happy.

For me, most users, most of the time (especially newbies), just want to generate equality or debug printing for data structures without wrestling with ppxes. That is what I want OCaml to get. It’s great if we eventually build a lovely tower of functionality using (c). The road to (c) is long.

However, (a) and (b) can be achieved fast by looking at repr, refl, etc., seeing if those libraries are any good, and seeing if their non-ppx base code can be incorporated into the OCaml distribution as a core library. Of course we would need to add automatic derivation of runtime type representations without using the ppx infrastructure. That might require some custom compiler code. Since we don’t have typeclasses, this code might need to be ad hoc. To me the “non-generic” nature of this is OK, given the huge benefit it gives the average user, the fact that modular implicits are many years in the future, and the fact that special-case things like marshalling already exist as a precedent.

2 Likes

Just to answer this question: they have much more expert bandwidth than we do. I don’t know what the current status is, but Haskell used to have a whole research center paid for by Microsoft. In OCaml, we have a lot of average consumers of the language (like me), many beginners, and then a very small group of experts concentrated in the core team, plus a growing expert group within Jane Street (that mostly focuses on whatever interests Jane Street). Our experts just don’t have the bandwidth to experiment with a lot of different things, whereas Haskell has masses of dedicated PL experts proposing complex changes to the language, as well as a built-in mechanism for activating various experimental language extensions.

1 Like

To add to this, they do seem to place less emphasis on backward compatibility in GHC, and to put additions in language extensions. Reading the discussions around breakage on the Haskell Discourse makes me personally quite happy with the way the OCaml compiler team handles things.

4 Likes

I didn’t say this is impossible, just that it is not obvious how to choose and design such a type. I just skimmed quickly over the Haskell wiki; it seems they chose something rather natural for algebraic data types (that’s the first idea I had in mind): polynomials (possibly infinite, à la Taylor series) on types.

For instance, the type of lists is this infinite polynomial: 1 + A + A² + A³ + A⁴ + ..., or in other words a list is empty, or has one element, or two elements, or three elements, and so on. A funny point with this representation is that if you formally differentiate it, you get 1 + 2*A + 3*A² + 4*A³ + ..., which is the type of zippers on lists. This principle is general: a zipper is a derivative. Consequently, that would also be a possible use case for this generic type representation: automatically deriving the zipper type of some data structures.
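
As a small illustrative sketch of that derivative point, 1 + 2*A + 3*A² + ... is (list A) * (list A), which is exactly the classic list zipper (the names below are just for illustration):

(* [left] holds the elements before the hole, in reverse order;
   [right] holds the elements after it. *)
type 'a list_zipper = { left : 'a list; right : 'a list }

let of_list l = { left = []; right = l }

(* Move the hole one step to the right, if possible. *)
let move_right z =
  match z.right with
  | [] -> None
  | x :: rest -> Some { left = x :: z.left; right = rest }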

Just another point: modular implicits won’t solve the problem at hand; you will still need to write the boilerplate code that this generic approach to type representation tries to avoid.

Sounds to me like this would have all the bad sides of ppx (depending on non-stable APIs such as compiler-libs), in addition to committing generated code to git. Generating the first time is one thing, but if you modify your type, now you need to remember to update its type representation as well?

2 Likes