Idea: Standard OCaml runtime type representation

One of the reasons pointing at the examples of Rust and Haskell doesn’t sway me is that both of those languages chose to make canonical coherence a property of their ad-hoc polymorphism mechanics, and I think that’s a detraction not a feature. We know that canonical coherence is not ever going to be a feature of any kind of OCaml ad-hoc polymorphism, e.g. modular implicits, because it’s just not theoretically possible in the face of the OCaml module sublanguage.

Like @dbuenzli, I’m interested in seeing a good proposal to expand the standard library support for type representation. Having written one of my own (and having encountered reality with it to some less than optimal results), I’m wary about efforts to translate what works in Rust and Haskell to OCaml. (I’m working now on a refactoring of the type representation system in Orsetto, and if I learn anything useful in the process, then I’ll come forward with what I find.)

1 Like

I don’t think any usage of ad-hoc polymorphism is implied in the discussions above. This is just plain and simple about providing a runtime type representation for OCaml types. (And not going via the Modular Implicits route because that does not exist).

The reference to Rust is to illustrate how simple it should be to debug print a type, provide equality comparisons etc. OCaml should provide the same ability but through a different underlying mechanism, namely a generic runtime type representation from which a pretty printer etc. can be built. This is what OCaml libraries like refl do today. This discussion to me is mostly about asking OCaml people to subsume some of this functionality into the default distribution rather than having to use ppx-es and additional libraries.

2 Likes

I wish people would stop quoting that cartoon. Let people create in peace.

I’m not sure whether you had a close look at these libraries, but some are of a daunting complexity, others are largely undocumented, some are completely ad-hoc. I honestly do not understand everything that some of these do, and I dislike some of their ergonomics. So if someone is going to propose one of these upstream it’s not going to be me.

Now I have written enough type indexed APIs and used them manually on my types longing for a good simple typerep. I also have been using constrained forms of type representation to map rectangular data to arbitrary sequences of OCaml types and back here and there and would be interested in seeing how a more general type rep could fit in that story.

I’m just going to try to first have something that suits me for using with my type indexed combinators and see how it fares in practice in real applications. Whether that ends up as a proposal upstream at some point is another question, but a totally premature one, I do not have enough experience with these to confidently advocate for inclusion upstream.

4 Likes

You’ve created some good libraries and I hope you’re able to create something simple and fresh in this space !

My comments were mainly to see if something pre-existing could be leveraged. If your creative juices flow from doing something from scratch then you definitely must do it :slight_smile: !

There are a lot of smart OCaml folks out there. The libraries (e.g. Refl) look frighteningly scary and they do often work impressively well. The issue is that anything truly generic does often get complex because it needs to deal with so many scenarios. So I put it down to the complexity of the domain rather than any deficiency in the library author’s approach.

It would not be so simple in Rust (or Haskell) if traits (or type classes) did not have the coherence property. That’s my point.

1 Like

That’s not necessarily the case. Coherence has no impact on whether we can derive a runtime type from a static type definition using a standardized or built-in PPX. Also, if we leave auto-derivation aside for a moment, we can obviously hand-write the runtime types as all the libraries mentioned above show.

Of course, but I still contend it’s not that simple. When you can automatically derive more than one, and the scope of each is different, then you can’t have the simple interface that Rust and Haskell provide. Moreover, in the absence of coherence, the applicability of a unified standard type representation is less clear. Maybe you need more than one type representation according to the scopes where they are to be used. That’s why I’m wary of proposals like this. It’s easy to get it wrong.

1 Like

There are always tradeoffs. Every new feature in any language comes with additional ways of going wrong. But given that OCaml is quite explicit, the scope for going wrong doesn’t seem large to me.

How do we do this today? We write down a type definition and use some ppx plugin.

type t = A of int | B of int [@@deriving show]

Now we have some additional functions autogenerated in our module called pp, show etc. The functions are generated within the module that type t is in. It does not pollute the global environment.
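To make "autogenerated in our module" concrete, here is a hand-written approximation of the two functions. This is only a sketch: the code the show plugin really generates, and its exact output format, differ in detail, so treat the strings below as illustrative.

```ocaml
(* Hand-written approximation of what [@@deriving show] adds to a module.
   Assumption: the real plugin's output format is not reproduced exactly. *)
type t = A of int | B of int

(* pp writes a value of type t to a formatter. *)
let pp fmt = function
  | A n -> Format.fprintf fmt "(A %d)" n
  | B n -> Format.fprintf fmt "(B %d)" n

(* show renders a value to a string by reusing pp. *)
let show x = Format.asprintf "%a" pp x
```

Both functions live next to the type, so a caller writes M.show v for a module M, and nothing is added to the global environment.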

Let’s take a more complicated example. This time we use the refl library to generate a runtime type representation that you can use to build a debug printer, among other things.

(Example reproduced from https://github.com/thierry-martinez/refl/blob/master/README.md – see that document for more details)

type 'a binary_tree =
  | Leaf
  | Node of { left : 'a binary_tree; label : 'a; right : 'a binary_tree }
        [@@deriving refl]

Now we can obtain the runtime type representation for string binary_tree by using the ppx form [%refl: ... ]

# Refl.show [%refl: string binary_tree] []
    (Node { left = Leaf; label = "root"; right = Leaf });;
- : string = "Node { left = Leaf; label = \"root\"; right = Leaf }"

(Note that we need to explicitly specify what type appears for 'a. Here it is string).

To show something, we also need to explicitly pass the runtime type representation to the Refl.show function (here that representation is obtained with the ppx form [%refl: string binary_tree]).

Note that at no point was the runtime type representation inferred from the ambient environment.
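For readers who have not seen this style before, here is a tiny stand-in for the idea using only the standard library. It is not refl's representation (which is far richer); the type ty and the function show are names made up for this sketch. The point is only that the representation is an ordinary value passed by hand.

```ocaml
(* A toy runtime type representation as a GADT. Hypothetical, not refl's API. *)
type _ ty =
  | Int : int ty
  | Str : string ty
  | Lst : 'a ty -> 'a list ty
  | Pair : 'a ty * 'b ty -> ('a * 'b) ty

(* A type-indexed printer: the representation drives how the value is shown. *)
let rec show : type a. a ty -> a -> string =
 fun ty v ->
  match ty with
  | Int -> string_of_int v
  | Str -> Printf.sprintf "%S" v
  | Lst t -> "[" ^ String.concat "; " (List.map (show t) v) ^ "]"
  | Pair (a, b) ->
    let x, y = v in
    Printf.sprintf "(%s, %s)" (show a x) (show b y)

(* The caller spells out the representation; nothing is looked up implicitly. *)
let s = show (Lst (Pair (Int, Str))) [ (1, "one"); (2, "two") ]
```

Here Lst (Pair (Int, Str)) plays the role that [%refl: string binary_tree] plays in the refl example.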

The discussion in this thread has mainly been about removing some of the tedium and custom nature of using ppxes, various opam libraries etc. by incorporating this functionality into the standard OCaml distribution. This is so that newbies (and others) could do this without having to learn ppxes, choose from a menagerie of libraries, change the dune config to use the ppx etc. in order to simply debug print something.

This would continue to be explicit in the future too, if what we have been discussing on this thread is ever realized.

Here refl built a runtime representation for me because I used their ppx. I could also have built a runtime representation myself with the combinators provided by refl, without using refl’s ppx. (Of course, this manually built representation would need to be consistent with the structure of the type to be useful within refl).

In the future, you could have a representation auto generated by the OCaml stdlib (the equivalent of refl’s ppx doing that for me above). This auto generated representation should be OK for most users. But then you could have alternate representations built by different tools and ppxes, perhaps with some additional metadata. Every time you call a function that requires a runtime representation you would need to pass it explicitly with the value; it would not be inferred for you.

So in that respect it is different from typeclasses in Rust/Haskell.

Also, I don’t think anybody is claiming that there would be a unified single representation – you could have multiple representations for different use cases.

Everything remains explicit. There is not much scope for messing up with ppxes today, and that scope would not increase if this functionality were provided in some way by the OCaml standard distribution in the future.

1 Like

It would be by the compiler not the stdlib. But the more I look at this the less I’m convinced by that.

The design space and usability trade-offs for producing and consuming representations look rather large. I’m not sure trying to find the sweet spot and then build it into the compiler itself is a good idea (having a sweet spot, usable and good enough representation in the stdlib for interop in the eco-system is another question).

It looks to me that a better idea would be for the compiler to rather provide good magic primitives (think __LOC__ and co) that effectively provide some kind of type safe/type level Obj interface. Mainly this all seems to revolve around being able to easily materialize constructors and projectors from product type definitions (records, tuples, variant cases). But I expect even more push back from that idea :grin:.
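To give an idea of what "materializing projectors" buys you, here is a hand-written sketch of the kind of value such primitives might hand back for a record. No such primitive exists today; the field type and point_fields are made up for illustration, and a compiler-provided version would presumably look different.

```ocaml
type point = { x : int; y : int }

(* A field packs a name, a projector and a printer for the field's type.
   The field's own type is hidden existentially. Hypothetical shape. *)
type 'r field = Field : string * ('r -> 'a) * ('a -> string) -> 'r field

(* Written by hand here; the idea is that a primitive could materialize it. *)
let point_fields : point field list =
  [ Field ("x", (fun p -> p.x), string_of_int);
    Field ("y", (fun p -> p.y), string_of_int) ]

(* A generic record printer that only needs the list of fields. *)
let show_record (fields : 'r field list) (r : 'r) : string =
  let show_field = function
    | Field (name, get, show) -> name ^ " = " ^ show (get r)
  in
  "{ " ^ String.concat "; " (List.map show_field fields) ^ " }"
```

With this, show_record point_fields { x = 1; y = 2 } walks the projectors without knowing anything else about point. Constructors would need a similar treatment, which this sketch leaves out.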

1 Like

Yes, some support for this feature (that should be in stdlib) will be required from the compiler (in the way I am envisioning this feature) because now the ppx mechanism will not be available. I have mentioned this here.

The design space could be large, but it doesn’t matter so much as long as your representation is sufficiently generic. The Haskell example is a good one that is sufficiently generic.

The case for being able to generate multiple representations is also overrated in my view. It seems to me that refl, repr, lrt ppxes are producing only one representation via their ppx interface. Why are libraries that are mature and battle tested happy producing one representation only? It would seem to me, arguing from a software evolution point of view, that by now if multiple representations were so important we would have them!

My theory is this: multiple representations are no doubt cool and offer some compelling use cases. But most of the time, we just need a single generic-enough representation. Haskell (that we have been describing above) AFAIK generates a single representation.

In any case, the way I envision this feature, multiple representations are definitely possible. The stdlib (with help of the compiler) will generate one for you (this is the refl-ppx analog case). You can use other approaches via a ppx, external tools etc. to generate any alternate ones with metadata, should you want it.

Some summary points from me:

(1) If you’re thinking of providing an external tool (as I think you are @dbuenzli ) to generate representations that does not sound too exciting to me at least. I might as well stick to repr, refl with their ppx workflow. Ppxes though fiddly are used by so many other tools and integrate well by now into OCaml. I don’t want a custom tool ideally. See here for a comment by @c-cube about external tools which I agree with.

(2) If it is not designed from the very beginning with buy-in from the Compiler team to have this in some shape or form in the OCaml distribution, there is no point going through this exercise. Some of these existing libraries with a ppx approach are quite satisfactory already. The whole point of this to me is to allow OCaml users to have things like debug printing of a type out-of-the-box without ppxes and external tools.

I will watch this space in OCaml in the coming years because progress likely will not be fast. In the meanwhile I will keep adding #[derive(Debug)] or #[derive(Eq)] in my Rust code and continue to (internally) groan how tedious it is to achieve something similar in OCaml :slight_smile:

I would still like to understand the tradeoffs of a runtime type representation vs the current ppx approach. It seems to me like the ppx approach digs into the type representation (i.e. fairly unstable API) of the compiler, whereas a runtime type representation would be more declarative and stable (potentially). However, there also seem to be costs involved with the runtime representation approach, not least of which is the performance cost of perusing the representation in real-time. This isn’t a big deal for show, which is slow anyway, but as a method for equality and comparison it could be problematic.

But even in Rust or Haskell, you have to type derive something. So if the “community” reached a large-enough consensus on (possibly a variant of) an already-existing lib+ppx proposition and organized its maintenance, it seems to me it would be a good start.

Good point. One could provide more stability guarantees if the runtime type representation is built by the compiler. Ppxes need to peruse the surface structure of a type to build a representation. The compiler could potentially use the deeper knowledge it has of the type. The disadvantage here is that some people would complain that representations not built by the compiler would be second class citizens, as they would not have access to any “special” information the compiler decided to use. I don’t know what we should do here.

(Note: Both the ppx and the compiler, if it had such a feature, would need to construct a runtime type representation. You can’t escape from that)

I am in favor of this. It is indeed a good start. This is the “dune” approach. dune is not the only build system that is possible in OCaml. Over the years it has been able to build momentum and is now the de-facto standard (rather than a de-jure standard like cabal might be in the Haskell world).

A Chapter in Real World OCaml that uses this “consensus” library would be helpful. Currently many ppxes are used in that book e.g. [@@deriving sexp] but I don’t think there is usage of any runtime type representation library/ppx like refl, repr etc.

However in general this is still unsatisfying because it relegates to shared knowledge and convention what should be a core ability of OCaml.

Of course if you’re really worried about efficiency you would write proper OCaml code today without any of these ppx-es and runtime types etc. and define Eq, Comparisons for the types in question yourself. Autogenerated code via ppxes for this use case is always going to be slower.

Obviously not. Ppx-generated code is as efficient as hand-crafted code. What you get with [@@deriving eq] is what you would get from let equal a b = Fun.generic.equal typ a b only after the pass of a very, very, very, very aggressive inliner (something that doesn’t exist today and is unlikely to exist any time soon).
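A sketch of the two styles being compared, using a made-up toy representation (ty and generic_equal are illustrative names, not an existing API). The generic version walks the representation at runtime on every call; the direct version is roughly what a deriver, or a person, would write for this one type.

```ocaml
(* Toy representation; hypothetical, standard library only. *)
type _ ty =
  | Int : int ty
  | Pair : 'a ty * 'b ty -> ('a * 'b) ty

(* Generic equality: dispatches on the representation at runtime. *)
let rec generic_equal : type a. a ty -> a -> a -> bool =
 fun ty a b ->
  match ty with
  | Int -> (a : int) = b
  | Pair (ta, tb) ->
    let a1, a2 = a and b1, b2 = b in
    generic_equal ta a1 b1 && generic_equal tb a2 b2

type t = int * (int * int)

let typ : t ty = Pair (Int, Pair (Int, Int))

(* Representation-driven equality for t. *)
let equal_generic (a : t) (b : t) = generic_equal typ a b

(* Specialized equality: no representation, no dispatch. *)
let equal_direct ((a, (b, c)) : t) ((a', (b', c')) : t) =
  a = a' && b = b' && c = c'
```

Both functions agree on every input; the difference is only in how much work happens per call, which is why an inliner would have to unfold generic_equal against typ completely to recover equal_direct.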

1 Like

I think it would be very good to bless certain ppxs as standardized and expect them to be automatically included with any OCaml compiler distribution. I don’t necessarily want to increase the burden on the compiler team, but nowadays I can’t imagine writing code without certain ppxs such as show, equal, and compare. dune could include these ppxs by default.

1 Like

Still disagree. Ppxes have no domain specific knowledge of your data structure. They might compare fields in the wrong order or may not know that you can establish non-equality/equality/some other property in a very specific way – this could be due to some complex data structure invariant that is not expressed in the type system or any other number of reasons.
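A small example of the kind of invariant I mean. The type int_set is made up for illustration: a set stored as a list where order carries no meaning. An equality that just follows the structure of the type (which is what a deriver can see) gives the wrong answer.

```ocaml
(* Invariant not expressed in the type: no duplicates, order is irrelevant. *)
type int_set = { elems : int list }

(* What a structure-following equality amounts to: order-sensitive. *)
let structural_equal a b = a.elems = b.elems

(* Equality that respects the invariant: compare after normalizing order. *)
let equal a b =
  List.sort compare a.elems = List.sort compare b.elems

let s1 = { elems = [ 1; 2; 3 ] }
let s2 = { elems = [ 3; 1; 2 ] }
```

Here s1 and s2 denote the same set, yet only equal says so.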

If your interest is only debug printing, then I’d rather suggest investing time in better source level debugger support for OCaml. I think we definitely have different perspectives here. Runtime type representations are useful in general for interfacing your types with other systems.

Sweeping the representation aspect under the carpet by saying “just take the most generic one” is not a good idea. You want to capture an appropriate amount of structure. E.g. you could capture lists as a generic sum or expose them as a special case in the representation. The day you interface with a system that has a built-in notion of lists, that matters.

A good design balancing act has to be made here between expressiveness and convenience to enable working programmers to devise their own type indexed functions (who understands the 691 lines of undocumented types of refl ? is that really the level of details you want to expose to programmers ? who has experience working with that representation ?).
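To illustrate the list point, here is a sketch against a made-up miniature JSON type. With a dedicated list case in the representation, a list maps to a native array; if lists are only visible as a generic Nil/Cons sum, the natural encoding is a nested chain. All names (json, ty, to_json, list_as_sum) are hypothetical.

```ocaml
(* A miniature target format with a built-in notion of lists (arrays). *)
type json = Num of int | Str of string | Arr of json list

(* Representation that exposes lists as a special case. *)
type _ ty =
  | Int : int ty
  | Lst : 'a ty -> 'a list ty

(* With the special case, a list becomes a native array. *)
let rec to_json : type a. a ty -> a -> json =
 fun ty v ->
  match ty with
  | Int -> Num v
  | Lst t -> Arr (List.map (to_json t) v)

(* What falls out if a list is only seen as the sum Nil | Cons of 'a * 'a list. *)
let rec list_as_sum : type a. a ty -> a list -> json =
 fun t v ->
  match v with
  | [] -> Str "Nil"
  | x :: rest -> Arr [ Str "Cons"; to_json t x; list_as_sum t rest ]
```

The first encoding of [1; 2] is a two element array; the second is a nested Cons structure that no JSON consumer would recognize as a list.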

Regarding using ppx to generate code equivalent to type indexed functions, of course you can do anything with such a crude tool. But it’s non compositional, problematic with code you don’t own, brittle and high maintenance. Devising your own also requires quite some expertise and I don’t think that it should be the case.

A type representation that lives in the language is more approachable to use by programmers, good enough (versus generated code) for many tasks and quite flexible (e.g. you can describe stuff you don’t own).
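As a sketch of "describing stuff you don’t own": with a representation that is an ordinary value, a type from another module can be described after the fact, without touching its definition. Below, Complex.t from the standard library is described through a pair of floats. The ty type and its Iso case are made up for this example.

```ocaml
(* Toy representation with an isomorphism case. Hypothetical. *)
type _ ty =
  | Float : float ty
  | Pair : 'a ty * 'b ty -> ('a * 'b) ty
  | Iso : 'a ty * ('b -> 'a) * ('a -> 'b) -> 'b ty

let rec show : type a. a ty -> a -> string =
 fun ty v ->
  match ty with
  | Float -> Printf.sprintf "%g" v
  | Pair (a, b) ->
    let x, y = v in
    Printf.sprintf "(%s, %s)" (show a x) (show b y)
  | Iso (t, to_repr, _) -> show t (to_repr v)

(* Complex.t is defined in the stdlib, not by us; we describe it from outside. *)
let complex : Complex.t ty =
  Iso
    ( Pair (Float, Float),
      (fun c -> (c.Complex.re, c.Complex.im)),
      fun (re, im) -> { Complex.re = re; im = im } )
```

A [@@deriving ...] attribute, by contrast, has to sit on the type definition itself, which is exactly what you cannot edit for code you don’t own.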

It should be stressed again that whether the representation should be generated and how (ppx, compiler, tool) is an entirely different and orthogonal question. It seems people keep on wanting to conflate the two aspects.

3 Likes

I keep using Debug printing as an example but my interest is also other fundamental operations like Equality, Comparison etc. However, my use cases are indeed simpler than what you wish to plan for.

What you have stated is true: whether or not the representation should be generated is separate from how it is generated (ppx, compiler, tool). But I am merely indicating a design point, which is:

  • I want generation of the runtime representation
  • Compiler can generate it for you. This will be the situation in most cases for simple Debug printing, non-complex Equality of data structures etc… The representation is best effort and may not capture every nuance and does not have metadata
  • Others (ppx, external tools) can generate alternate representations (optionally) for more complex situations. You can attach metadata to guide tools that wish to consume this representation.
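A rough sketch of how a default and an alternate, metadata-carrying representation of the same type could coexist. Everything here is hypothetical: the Meta case, the "redact" key and both representation values are invented for the example.

```ocaml
(* Toy representation with an optional metadata wrapper. Hypothetical. *)
type _ ty =
  | Int : int ty
  | Str : string ty
  | Pair : 'a ty * 'b ty -> ('a * 'b) ty
  | Meta : (string * string) list * 'a ty -> 'a ty

(* A consumer that honours one made-up metadata key, "redact". *)
let rec show : type a. a ty -> a -> string =
 fun ty v ->
  match ty with
  | Int -> string_of_int v
  | Str -> Printf.sprintf "%S" v
  | Pair (a, b) ->
    let x, y = v in
    Printf.sprintf "(%s, %s)" (show a x) (show b y)
  | Meta (kvs, t) -> (
    match List.assoc_opt "redact" kvs with
    | Some "true" -> "<redacted>"
    | _ -> show t v)

(* The "best effort" representation a compiler or stdlib might produce. *)
let default_rep = Pair (Str, Str)

(* An alternate representation of the same type, with metadata attached. *)
let annotated_rep = Pair (Str, Meta ([ ("redact", "true") ], Str))
```

The caller chooses which one to pass: show default_rep prints both components, show annotated_rep hides the second.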

I have repeated the above a few times. Voicing this design point doesn’t mean the person does not understand other options and possibilities.

In fact, the line “Let us not conflate X with Y”, which one often reads in such debates, strikes me as implying that the other party does not understand all the axes. Maybe the party does and is voicing an approach which is most meaningful to them?

I think I am going to bow out now. It’s been an interesting debate nevertheless. As you correctly pointed out, in order to make a difference one will need to produce something other than words on a forum. I am rooting for you to hopefully find some time to contribute in this area since you indicated you might be interested. I unfortunately cannot because I’m working on something else outside OCaml at the moment. Have a great Sunday !

1 Like

Standardizing on one PPX

Perhaps I can move this thread forward a slight bit. I’m planning to provide refl in the global environment for DkML users. That means beginners who use the DkML Windows installer (and the DkML macOS installer when I get around to packaging it) don’t need to install refl. Immediately after installing they can type utop and #require "refl";;.

I will also be privileging a few other libraries: base for Real World OCaml readers, SDL for graphics, and many of @dbuenzli’s libraries.

From a high level, I’m shifting the conversation from adoption into the standard compiler and/or standard library to adoption into a distribution compiler and/or distribution library. Less risk! And the DkML distribution has been a good vehicle over the past two years to get a (huge) number of patches upstreamed to various places.

PS. I am not the biggest fan of PPX-es. But it is what we have today!

Compiler Patches

For patching the compiler (e.g. LexiFi style patches), as long as the change is source-compatible with the standard OCaml compiler and has some community vetting in this forum and a sniff test from OCaml compiler devs, I’ll be happy to accept the patches.

Lifecycle

The changes might sit in a distribution for a couple years, but it will unblock beginners immediately. And, if the changes are popular, that will be a good signal for adoption upstream.

Thoughts? (I won’t be responding until tomorrow)

5 Likes

This is a really good idea. DiskuvML’s OCaml distribution is exactly the kind of initiative that will move the ecosystem forward. It’s clear from responses here that the core team doesn’t believe in a batteries-included approach like Python or Go. But anyone else is free to include all the batteries they want.

1 Like