Idea: Standard OCaml runtime type representation

Thanks – appreciate your comprehensive reply. It was direct and provided reasons.

My OCaml skills right now are nowhere near what would be necessary to implement something like this better than existing libraries and try to get this into OCaml. Here, I was hoping to get a discussion going and see what could come of it!

I spend a lot of my time in Rust nowadays. But I love OCaml and I think it really has gotten pretty amazing over the years!

I still think it is very frustrating for newcomers to not be able to (debug) print a data structure and have to resort to additional libraries and ppxes. This ability should be available out of the box. As an OCaml expert, one might often lose sight of what newcomers to the language struggle with.

Adding #[derive(Debug)] to a data structure in Rust, without having to install a single extra library, and then being able to println! it is really very powerful. A mechanism, whether through runtime type representation, automatic derivation (via future modular implicits) or some other approach, should be provided!

3 Likes

That’s a red herring. So does Marshal, with the added benefit that it will segfault your program if your representation changes and you unmarshal a previous representation (or if a bit flips).

I’m not sure I see what’s infectious. Take Fmt for example. You would have the current type indexed combinators plus one that acts on the type representation.

Whether you want to craft your pretty printer manually with the type indexed combinators or be happy with the generic one that acts on the type representation is up to you. There’s no infection, it will just be useful for your modules to provide a typerep value, rather than say only M.pp which already often does the moral equivalent albeit in a specialized way, as a few libs will be able to derive stuff automatically for you from that.
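To make the distinction concrete, here is a minimal sketch of a generic printer that acts on a type representation, next to a hand-crafted one. The `ty` GADT and the names `pp`, `show` and `pp_point` are assumptions made up for illustration; they are not an existing Stdlib or Fmt API.

```ocaml
(* Hypothetical, minimal type representation (not a Stdlib API). *)
type _ ty =
  | Int : int ty
  | String : string ty
  | List : 'a ty -> 'a list ty
  | Pair : 'a ty * 'b ty -> ('a * 'b) ty

(* Generic printer: a single function that works for every represented type. *)
let rec pp : type a. a ty -> Format.formatter -> a -> unit =
 fun ty ppf v ->
  match ty with
  | Int -> Format.pp_print_int ppf v
  | String -> Format.fprintf ppf "%S" v
  | List elt ->
      let sep ppf () = Format.pp_print_string ppf "; " in
      Format.fprintf ppf "[%a]" (Format.pp_print_list ~pp_sep:sep (pp elt)) v
  | Pair (a, b) ->
      let x, y = v in
      Format.fprintf ppf "(%a, %a)" (pp a) x (pp b) y

(* A hand-crafted, type-indexed printer can coexist with the generic one. *)
let pp_point ppf (x, y) = Format.fprintf ppf "<%d|%d>" x y

let show ty v = Format.asprintf "%a" (pp ty) v
```

A caller picks whichever fits: `show (Pair (Int, Int)) p` for the generic rendering, or `pp_point` when a specialized layout is wanted.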

I’m not sure what to respond to that. We are likely programming in different worlds.

I’m personally absolutely not interested in having a representation in the compiler (understood as a magic operator 'a -> 'a Type.Repr.t).

I’m just interested in eventually having a good representation in the Stdlib that allows me to represent the structure of types and attach typed metadata to this AST (but I’m not in the “let’s just take one” hurry that @sid is in). I’m perfectly fine with hand-crafted values, especially since the magic operator would likely not provide me the appropriate hooks to annotate the AST with interesting typed metadata.

5 Likes

Everyone would like a version of Haskell’s deriving that works well in OCaml, and we have discussed this many times in the past. (You might be curious about Jeremy Yallop’s deriving proposal back in 2007; the work of Grégoire Henry and Pierre Chambart presented in 2013 was an attempt to upstream the Lexifi approach in OCaml.)

The main difficulty is that no one, to my knowledge, has proposed a design that clearly works well in presence of type abstraction in modules, so all existing proposals work well in simple cases and don’t work in a well-understood way when the whole language is considered. The work of Grégoire Henry in 2013 tried its best to solve this problem; at the time the solutions sketched were not felt to be convincing enough to push for finishing and upstreaming the work. As far as I know, other works in this area have not tried to address the problem – they represent fully known types, or have types without a representation, but don’t particularly consider the case of abstract types that are unknown outside the abstraction boundary and known inside.

I think that your ( @sid ) general argument in this thread is that the usability benefits of making a half-baked approach easy to use are larger than the costs of the half-bakedness. I don’t know if that is the case. In general I think that it is rather healthy to have an upstream language evolution process that prefers to be conservative in new features and enforce high standards for growing the language.

Note: I’m not sure what proposal is being made here. Integrating an existing library in the standard library? Supporting automatic derivation for datatype definitions, without requiring a ppx mechanism? Something else? How should the proposals be evaluated, what are the use-cases that people would want to cover? Maybe things would be clearer if people considered articulating the design that they have in mind, for example by writing an RFC or pre-RFC to explain it.

Personally my diffuse, non-expert impression of this general area is that type classes / modular implicits would also help with many of the same needs that type-representations are aimed at (possibly with a better potential in terms of runtime efficiency), and also other uses (they enable convenient programming that is polymorphic over a functor or container data-structure for example). If I wanted to personally get involved in helping progress there, I would try working on modular implicits. (But this is a personal judgement, and this evaluation may not hold for people that don’t have a background in type systems as may be useful to work on modular implicits.)

8 Likes

I think this thread underlines an important issue with modern OCaml,
compared to other languages. Everyone complains at some point about the
difficulty of just printing a value, something trivial in most other
languages (even F# has “%A”, for example). Serialization is also a
task that appears in most programs, and we have no unified way of
tackling it (aside from libraries like
serde.ml (leostera/serde.ml on GitHub) and others).

The main difficulty is that no one, to my knowledge, has proposed a
design that clearly works well in presence of type abstraction in
modules, so all existing proposals work well in simple cases and don’t
work in a well-understood way when the whole language is considered. The
work of Grégoire Henry in 2013 tried its best to solve this problem; at
the time the solutions sketched were not felt to be convincing enough to
push for finishing and upstreaming the work. As far as I know, other
works in this area have not tried to address the problem – they
represent fully known types, or have types without a representation, but
don’t particularly consider the case of abstract types that are unknown
outside the abstraction boundary and known inside.

Honestly, these days, OCaml’s answer to typeclass-ish behavior is to use
conventions around names, combined with scoping/local open. It’s true
for let-operators (let* seems to be the classic monadic bind), infix
operators like + or (::), printers (val pp : Format.formatter -> t -> unit), serializers (to_yojson/of_yojson/to_sexp/… etc.), and
so on.
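As an illustration of the naming-convention style described above, here is a small sketch. The `Point` and `Option_syntax` modules are invented for the example; only `Option.bind` and `Format.fprintf` come from the standard library.

```ocaml
module Point = struct
  type t = { x : int; y : int }

  (* Conventional names: pp, equal, and an infix operator. *)
  let pp ppf p = Format.fprintf ppf "(%d, %d)" p.x p.y
  let equal a b = a.x = b.x && a.y = b.y
  let ( + ) a b = { x = a.x + b.x; y = a.y + b.y }
end

(* By convention, let* is the monadic bind of whatever module is in scope. *)
module Option_syntax = struct
  let ( let* ) = Option.bind
end

(* A local open selects which "instance" of + is meant. *)
let sum = Point.({ x = 1; y = 2 } + { x = 3; y = 4 })

let total =
  let open Option_syntax in
  let* a = Some 1 in
  let* b = Some 2 in
  Some (a + b)
```

Nothing here is checked by the compiler as an interface; the consistency comes entirely from convention and scoping.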

So in this vein, had we a standard dynamic type representation, abstract
types should just expose a value val dyn_ty : t ty or something like
that. It might be what was proposed in 2013, and be imperfect, but has the situation
changed? Are modular implicits going to happen, ever?
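A rough sketch of what exposing such a value from an abstract type could look like follows. The `ty` type, its `View` constructor, and the `Username` module are all hypothetical; this is not the 2013 design nor any existing library, just one way the idea might be encoded.

```ocaml
(* Hypothetical representation with a case for abstract types. *)
type _ ty =
  | Int : int ty
  | String : string ty
  | Pair : 'a ty * 'b ty -> ('a * 'b) ty
  (* View (name, repr, proj): show an abstract 'a through a public 'b. *)
  | View : string * 'b ty * ('a -> 'b) -> 'a ty

let rec show : type a. a ty -> a -> string =
 fun ty v ->
  match ty with
  | Int -> string_of_int v
  | String -> Printf.sprintf "%S" v
  | Pair (a, b) ->
      let x, y = v in
      Printf.sprintf "(%s, %s)" (show a x) (show b y)
  | View (name, repr, proj) -> Printf.sprintf "%s %s" name (show repr (proj v))

module Username : sig
  type t
  val of_string : string -> t option
  val dyn_ty : t ty (* the module decides what it reveals *)
end = struct
  type t = string
  let of_string s = if s = "" then None else Some s
  let dyn_ty = View ("Username", String, fun s -> s)
end
```

In this encoding the abstraction boundary is only crossed through the projection the module author chose to publish, so `show Username.dyn_ty u` works without the client learning that `t` is a `string`.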

OCaml is a language with many pragmatic choices, some of which are
honestly a lot uglier than exposing a dynamic type that could break
abstraction. Let’s not forget the existence of Marshal or “=” (both of
which don’t work with opaque types in general, since these might contain
closures.) The existence of ppx is also reliant on the unstable
Parsetree format, and yet without it these printer/serializer/… problems
would be even greater.

9 Likes

(Also in reply to @gasche)

Some amount of “freezing” helps. Let me explain with an example: Haskell and Golang “freeze” you into an IO system. The advantage is that, in each of these languages, every library that performs IO can work with any other library that performs IO.

In OCaml, we need to ask: do you use Async, Lwt, blocking calls, or now Eio? These are all mutually incompatible to a large extent, which splinters an already small ecosystem. By settling on one way to perform IO, Haskell/Golang bought themselves some future inflexibility but in the interim created a strong foundation where hundreds of libraries could flourish. Similarly, some amount of “freezing” around typed representations will help by providing a basic foundation for debugging, serialization etc.

Sounds like modular implicits are a very tough problem in OCaml, tougher in the face of things like abstract types. Modular implicits are in the future and, who knows, they may never arrive! Modular implicits could be OCaml’s nuclear fusion: always destined to be in the future :slight_smile: ! Runtime typed representations could solve much of the pain of just getting simple debug deriving working without modular implicits.

But this does not imply I’m advocating a half-baked approach. refl, for instance, struck me as a nice library; let’s see if it can meet the needs of 80% of the cases! In fact, @gasche in one of the responses says that the OCaml compiler follows the 80/20 principle: it provides 80% of the benefits with 20% of the complexity. Let’s extend that to runtime typed representations! Let’s deal with 80% of the use cases and provide them to everyone via the Stdlib! Putting something in the Stdlib does not prevent better alternatives that more sophisticated users can use. Vectors in OCaml are another sore point for me: OCaml should provide dynamically sized arrays in its stdlib but does not yet. Hopefully PR can reach some consensus soon!

@dbuenzli has strong feelings on this and would like more primitives to be provided as far as runtime typed representations go rather than one blessed library! I’m OK with that too! Something, anything :slight_smile: !

TL;DR: The inability to simply derive a pretty printer for a data structure is a huge gap. Automatic derivation via modular implicits is the principled way to implement this, but no one knows when OCaml will get modular implicits. Failing this, we have the runtime type representation route: it locks you in to a specific approach (like Marshalling in OCaml does, as correctly mentioned above) but it does not have to deeply affect the language itself. It would be in the stdlib: most people would use it, and people who didn’t want it could build their own.

4 Likes

There is indeed no propagation if you are using the type representation in a static way that could be seen as a delayed metaprogramming layer.

Trying to reframe my thoughts: I agree that generating derived modules for debugging/serialization/iteration is useful. However, I disagree that dynamic type representations are a good fit for metaprogramming, and they open the door to many other problems.

Thinking out loud, has anyone ever tried to write a debug library generator that would generate debug functions from installed cmi files à la odig? After all, for non-abstract types, there is no need in general to couple the generation of debug functions with the initial parsing of source files.

2 Likes

I agree with your general sentiment, but it’s worth noting that things aren’t linear or predictable. The Async-Lwt split (which we all dislike) may allow us to replace it with something far better in the form of Eio. This is similar to the way that before dune, there were many different build systems, and that painful fragmentation made the community eager to adopt a single superior solution once one was available.

It’s true, but notice that there’s going to be a very heavy debate to get this kind of thing into the stdlib. I think the key thing is this: where is the superior solution that massively outperforms ppx? Right now I can derive a pretty printer and serialization for all of my types using ppx, with only a few of them requiring any extra work. Without a serious advantage to a different method, it’s going to be hard to convince the core team to make large changes to the status quo.

3 Likes

Very correct. However, Haskell got its multi-threaded IO runtime system in ~2000. OCaml got Eio + Multicore in 2022. It took two decades. In that time period, so many users and companies that could have come to OCaml went elsewhere. The result is a more impoverished OCaml library ecosystem, with fewer users, companies, OCaml Ph.Ds, researchers… For a language to thrive, continue being relevant, get funding, get users, libraries etc., change now needs to happen faster.

On the positive side, OCaml’s usage is now increasing, the fundamentals of the language are strong and a brighter future still lies ahead. But, taking another two decades for Modular implicits for instance would not be possible. We need to solve the issues on this thread in one way or another sooner.

1 Like

I wonder how this will work for some obvious examples of type derivers. For instance, when deriving printers and equality comparators for the OCaml parsetree, you want to:

  1. print location_stack as “-”
  2. usually (but not always) print Location.t as “-”
  3. when you print Location.t in a meaningful way, you don’t want to print all of it, and you definitely want to print it in a manner different from the OCaml constructor/record representation, which is far too verbose for a data-type that is present everywhere
  4. in comparators, almost always you want to ignore location_stack, Location.t (that is, “equal=(fun x y -> true)”)
  5. but sometimes, e.g. in tests, you want to actually check for equality, in order to verify that locations are copied forward in the expected way

All of this seems difficult to accomplish with a single universal representation into which data-types are transformed, before show/equal are applied to them.

Just a few examples: there are many more.
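For readers trying to picture the requirements listed above, here is a deliberately tiny sketch of one conceivable encoding, where the policy for locations is chosen by the caller of the generic function rather than fixed in the representation. Everything in it (`loc`, `ty`, `loc_policy`, `show`, `equal`) is hypothetical and much simpler than the real parsetree; it does not settle whether the approach scales to the cases the post has in mind.

```ocaml
(* Stand-in for Location.t; the real type is richer. *)
type loc = { line : int; col : int }

type _ ty =
  | Int : int ty
  | Loc : loc ty
  | Pair : 'a ty * 'b ty -> ('a * 'b) ty

(* How locations are rendered is decided per call. *)
type loc_policy = Hide | Short

let rec show : type a. loc_policy -> a ty -> a -> string =
 fun policy ty v ->
  match ty with
  | Int -> string_of_int v
  | Loc -> (
      match policy with
      | Hide -> "-"
      | Short -> Printf.sprintf "%d:%d" v.line v.col)
  | Pair (a, b) ->
      let x, y = v in
      Printf.sprintf "(%s, %s)" (show policy a x) (show policy b y)

(* Equality that can either ignore or check locations. *)
let rec equal : type a. ignore_loc:bool -> a ty -> a -> a -> bool =
 fun ~ignore_loc ty x y ->
  match ty with
  | Int -> Int.equal x y
  | Loc -> ignore_loc || x = y
  | Pair (a, b) ->
      let x1, x2 = x and y1, y2 = y in
      equal ~ignore_loc a x1 y1 && equal ~ignore_loc b x2 y2
```

A test suite could call `equal ~ignore_loc:false` while ordinary comparisons pass `~ignore_loc:true`; whether this stays manageable with many such special cases is exactly the open question.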

I think that this hypothesis is perhaps a little too optimistic. Many issues (cancellation from a technical point of view or ownership of eio from a community point of view) are still being debated and silent solutions (in terms of communication) exist in parallel. The development of eio was not done at the same time as multicore but afterwards (the two projects, even if they are similar in what is possible and required, are orthogonal in their management).

More generally, the multitude is always healthier than a solution (with pros and cons) that would monopolise the community space - and I think we should continue to nurture this multitude rather than impose a certain solution. This multitude ensures that everyone (person, company or association) can be satisfied with a legitimate place in the community - and I prefer this kind of community to being able to auto-magically generate a pretty-printer :upside_down_face: .

More seriously, indeed, this multitude requires that some work on the existing (as you have just done) be done upstream to understand the subtleties and differences - but again, I think it is part of an engineer’s job to: 1) be aware of these possibilities 2) make an informed choice. OCaml has historically delegated a lot of issues to the community (and the state of the standard library somewhat confirms this attitude) but I’ve always appreciated the room for the language to do what I wanted it to do without requiring me to remake a whole world to do it or alienate myself from something I don’t really have control over (and I’m talking about both technical and social alienation here).

As far as dynamic types are concerned, we are satisfied with repr which does an excellent job according to our needs (the latter has very quickly integrated the question of the serialization of an int63 integer in an optimal way according to whether the value can be immediate - 64-bits - or boxed - 32-bits).

2 Likes

Type representations are not about metaprogramming (as in generating code for a given representation) they are about polytypic or generic programming (as in making algorithms that work on a generic representation of types).

Could you please be more explicit about the many problematic doors they open?

We are talking about values that belong to the programming language. It’s an extremely low risk addition, at worst it doesn’t get used (like many other things in the stdlib did) because it’s too constrained or too slow – though I’m pretty sure it can be made good enough for tons of applications.

Besides, seeing that as being useful only for debugging and serialization is short-sighted. There are many other applications: comparisons, random value generators for property-based testing, deriving user interface representations for values, etc.
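As a small illustration of one of those other applications, a random value generator can be written once against a representation. The `ty` type and `gen` function below are assumptions for the sake of the example, and the bounds (ints below 100, lists shorter than 5) are arbitrary.

```ocaml
(* Hypothetical representation, as in the other sketches in this thread. *)
type _ ty =
  | Bool : bool ty
  | Int : int ty
  | List : 'a ty -> 'a list ty
  | Pair : 'a ty * 'b ty -> ('a * 'b) ty

(* One generic generator covers every type that has a representation. *)
let rec gen : type a. Random.State.t -> a ty -> a =
 fun st ty ->
  match ty with
  | Bool -> Random.State.bool st
  | Int -> Random.State.int st 100
  | List elt -> List.init (Random.State.int st 5) (fun _ -> gen st elt)
  | Pair (a, b) ->
      let x = gen st a in
      let y = gen st b in
      (x, y)
```

A property-based test could then draw inputs with `gen st (List (Pair (Int, Bool)))` without any per-type generator code.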

5 Likes

Having a type representation means that you also introduce a dynamic type

type dyn = Dyn: 'a typ * 'a -> dyn

and end up with an unityped sublanguage inside OCaml.

Sometimes the unityped sub-language is useful! It’s opt-in and not worse
than having, say,

type printable = Printable : 'a printer * 'a -> printable

which is just an impoverished, ad-hoc version of the dyn type.
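The two definitions can be put side by side in a short sketch. The `typ` GADT here has only two cases and the helper names (`show_dyn`, `to_int`, `show_printable`) are invented for illustration.

```ocaml
(* A tiny stand-in for a type representation. *)
type _ typ =
  | Int : int typ
  | String : string typ

(* The dynamic type: a value packed with its representation. *)
type dyn = Dyn : 'a typ * 'a -> dyn

(* The ad-hoc variant: a value packed with exactly one operation. *)
type 'a printer = 'a -> string
type printable = Printable : 'a printer * 'a -> printable

(* With dyn, the consumer decides what to do by inspecting the type. *)
let show_dyn : dyn -> string = function
  | Dyn (Int, n) -> string_of_int n
  | Dyn (String, s) -> s

(* dyn also allows recovering the typed value, which printable cannot. *)
let to_int : dyn -> int option = function
  | Dyn (Int, n) -> Some n
  | Dyn (_, _) -> None

(* With printable, printing is the only thing left to do. *)
let show_printable (Printable (print, v)) = print v
```

The difference shows up in `to_int`: a `dyn` can be inspected and cast back, whereas a `printable` only ever yields a string.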

3 Likes

For sure not. Your printable type is just equivalent to an interface in Golang:


type Printable interface {
   print() string
}

whereas the dyn type is equivalent to the infamous interface {}.

1 Like

In general, yes, I spoke too quickly. But if the printer you package along with the value comes from a deriver… Then you could cut to the chase and provide the type itself so that the user can do what they want.

1 Like

You don’t have to. But basically you are against something that exists in the type system. I find these arguments rather weak, not to mention that a universal type has existed in the language for a long time.

4 Likes

It’s an extremely low risk addition, at worst it doesn’t get used (like many other things in the stdlib did)

I wanna say I’m excited about this addition if it is to be considered. That said, I worry about the opposite of this quoted phrase: I worry that a solution which is not quite adequate arrives, but then a better solution which greatly overlaps with it is later presented, and now we’re stuck with two slightly different ways of doing something, with subtle differences, causing confusion or question marks. I get this is a natural thing that happens as any language/stdlib evolves, but IDK what the maintainers’ current attitude to this is. For the sake of backwards compatibility, we’ve been stuck with such peculiarities in Stdlib that often present themselves as easy targets for dismissing the language.
I get that we can’t satisfy everyone or get everyone on board, I’m just curious whether Stdlib is better equipped today to deprecate outdated pieces of code than it was before.

What will happen when implicits land? Will Stdlib deprecate all t_of_u functions, dotted arithmetic functions, print_* functions, magical comparison functions, etc…?

2 Likes

For sure, that’s why I don’t understand what the gain is with runtime type representation compared to the ppx deriving system (if you don’t want to write your printer by hand). And the fact that it’s better to write structurally polymorphic functions instead of writing ones against a packed type (interface design à la Golang) is what I dislike in the design of Eio (you’re losing type information with such an existential wrapper).

Runtime type representation and PPX deriving system are not mutually incompatible. You can have a PPX which derives the runtime type from a static type. This runtime type is a value. Then you can use this value to construct other values e.g. JSON encoding, Protobuf encoding, diffs and patches of values of the type, etc. Instead of having a different PPX for each kind of derivation, you can have a single PPX which does a universal derivation and then lets you program in terms of normal OCaml functions and values.
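A minimal sketch of that workflow follows: one representation value, then several ordinary functions interpreting it. The `ty` type, its `Record2` constructor, and `user_ty` are hypothetical; `user_ty` is written by hand here where a PPX could generate it, and the string escaping via `%S` follows OCaml rules, which only approximates JSON.

```ocaml
(* Hypothetical representation with a two-field record case. *)
type _ ty =
  | Int : int ty
  | String : string ty
  | List : 'a ty -> 'a list ty
  | Record2 : (string * 'a ty) * (string * 'b ty) * ('r -> 'a * 'b) -> 'r ty

(* Interpretation 1: a JSON-like encoder. *)
let rec to_json : type a. a ty -> a -> string =
 fun ty v ->
  match ty with
  | Int -> string_of_int v
  | String -> Printf.sprintf "%S" v
  | List elt -> "[" ^ String.concat "," (List.map (to_json elt) v) ^ "]"
  | Record2 ((n1, t1), (n2, t2), proj) ->
      let a, b = proj v in
      Printf.sprintf "{%S:%s,%S:%s}" n1 (to_json t1 a) n2 (to_json t2 b)

(* Interpretation 2: structural equality, from the same value. *)
let rec equal : type a. a ty -> a -> a -> bool =
 fun ty x y ->
  match ty with
  | Int -> Int.equal x y
  | String -> String.equal x y
  | List elt ->
      List.length x = List.length y && List.for_all2 (equal elt) x y
  | Record2 ((_, t1), (_, t2), proj) ->
      let a1, b1 = proj x and a2, b2 = proj y in
      equal t1 a1 a2 && equal t2 b1 b2

type user = { id : int; name : string }

(* This is the only per-type code; a single PPX could derive it. *)
let user_ty = Record2 (("id", Int), ("name", String), (fun u -> (u.id, u.name)))
```

Adding a Protobuf encoder or a diff function would mean writing one more function over `ty`, not one more deriver.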

I don’t understand your second point, nor what Eio has to do with the current discussion about runtime types.

2 Likes

As @yawaramin suggests, you can have a ppx that generates your runtime type representation if you wish. But you may also want to have multiple representations for the same type (e.g. to support schema evolution).

Also deriving automatically via a ppx is not necessarily the best way of adding more typed metadata to representation (see this comment).

Besides, processors of the type representation need not deal with ppx at all. In general the system is much more flexible.

1 Like