Run-time types: uses and wants

Not in production code by any means but I hope my experience is useful:

I once tried to roll a lightweight dyn+trep using extensible variants and functors, for a discussion I was having with friends. Unfortunately I can’t remember where I put the code… What I do remember are the two issues I ran into while iterating on the design:

  • they didn’t survive marshalling (later learned Type.Id has a similar limitation)
  • ensuring their uniqueness in the general case proved difficult.
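To make the first issue concrete, here is a minimal sketch (not the lost code; the names tag, Int_tag, is_int_tag and roundtrip are made up for illustration). It relies on the behaviour documented for the Marshal module: constructors of extensible variants are matched by identity, and an unmarshalled copy loses that identity.

```ocaml
(* An open (extensible) type used as a source of type tags. *)
type tag = ..
type tag += Int_tag

(* Recognize the tag by pattern matching on its constructor. *)
let is_int_tag (t : tag) : bool =
  match t with Int_tag -> true | _ -> false

(* Round-trip a tag through Marshal, as a serializer would. *)
let roundtrip (t : tag) : tag =
  Marshal.from_string (Marshal.to_string t []) 0

let () =
  (* The original constructor matches; its unmarshalled copy does not. *)
  Printf.printf "before: %b, after: %b\n"
    (is_int_tag Int_tag)
    (is_int_tag (roundtrip Int_tag))
```

Any design that uses such constructors as type witnesses inherits this limitation, which is presumably why Type.Id behaves the same way.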

Could be a skill-issue but I guess if we’re in the process of getting a compiler-supported implementation, these two things could be addressed more automatically.

That’s as far as wants go. As for uses, I can see it making heterogeneous containers nicer to use. Could also be the basis for a checked downcast operator. Could also be enabled globally in repl and when compiling for the debugger, to print arbitrary values without having to install printers.
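As a rough illustration of the heterogeneous container and checked downcast uses, here is a sketch built on the existing Stdlib Type.Id module (OCaml 5.1 or later), not on any compiler-supported runtime types. The names binding, hmap, add, find, name and age are assumptions of the example.

```ocaml
(* Requires OCaml >= 5.1 for Stdlib.Type. *)

(* A binding packs a value together with the identifier of its type. *)
type binding = B : 'a Type.Id.t * 'a -> binding

(* A heterogeneous map, indexed by the unique integer of each identifier. *)
module Int_map = Map.Make (Int)

type hmap = binding Int_map.t

let empty : hmap = Int_map.empty

let add (k : 'a Type.Id.t) (v : 'a) (m : hmap) : hmap =
  Int_map.add (Type.Id.uid k) (B (k, v)) m

(* Checked downcast: succeeds only when the stored identifier is provably
   equal to the requested one. *)
let find : type a. a Type.Id.t -> hmap -> a option =
 fun k m ->
  match Int_map.find_opt (Type.Id.uid k) m with
  | None -> None
  | Some (B (k', v)) -> (
      match Type.Id.provably_equal k k' with
      | Some Type.Equal -> Some v
      | None -> None)

(* Hypothetical keys: each identifier doubles as a typed key. *)
let name : string Type.Id.t = Type.Id.make ()
let age : int Type.Id.t = Type.Id.make ()
```

With these keys, find name (add name "Ada" empty) returns Some "Ada", while find age on the same map returns None. The keys still have to be created and shared by hand, which is the part a compiler-supported design could automate.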

Erasure makes both code size and memory profile lean, so I hope the upcoming runtime types design doesn’t incur a cost everyone has to pay all the time. I worry that without mature DCE that might be hard.

I have a DynType module in my unpublished Acero library that I could replace with Typegist without too much grief. It’s basically a refactoring of the Type and Type.Id modules to add optional witness attributes to identifiers. I use it for the same kinds of things you would do with Typegist, i.e. to encapsulate intermediate representations of structured messages. I also use it for containing heterogeneous contexts in my various syntax tree analyzers.

If the standard library had something like this then I would use it.

In the approach to runtime types that we have at LexiFi (sketched in the presentation linked to above), we do not modify the runtime model of OCaml in any way: in spite of its name, there is nothing “runtime” about “runtime types”: values representing types are ordinary values like any other; building and serializing these values has a (small) runtime cost but you certainly do not pay anything unless you use the feature.

Cheers,
Nicolas

Reading @hyphenrf and @jhw’s answers I think there is again (see the previous discussion I linked to) a misunderstanding about what is being discussed. This is not about providing a universal/dynamic type in the standard library, it’s about reflecting types as runtime values.

and just to be even more precise

“runtime values” here just means some normal values handled by the program during execution of the program

(not something modifying the runtime of the language, nor something handled during compilation and then erased)
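A small sketch may help pin down what “types as ordinary values” means. The ty type below is a toy representation written for this example (it is neither the LexiFi nor the Typegist one), and show is a function indexed by it:

```ocaml
(* A type representation: an ordinary GADT value describing a type. *)
type _ ty =
  | Int : int ty
  | String : string ty
  | List : 'a ty -> 'a list ty
  | Pair : 'a ty * 'b ty -> ('a * 'b) ty

(* A type-indexed function: debug printing driven by the representation. *)
let rec show : type a. a ty -> a -> string =
 fun ty v ->
  match ty with
  | Int -> string_of_int v
  | String -> Printf.sprintf "%S" v
  | List elt -> "[" ^ String.concat "; " (List.map (show elt) v) ^ "]"
  | Pair (a, b) ->
      let x, y = v in
      "(" ^ show a x ^ ", " ^ show b y ^ ")"

(* The representation is built and passed around like any other value. *)
let pairs : (int * string) list ty = List (Pair (Int, String))

let () = print_endline (show pairs [ (1, "a"); (2, "b") ])
```

Nothing here touches the runtime system: a program that never builds a ty value pays nothing for it.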

Sure, and I would add that, while implementing a universal type as type universal = Universal : 'a * 'a Type.Id.t -> universal is possible, it’s tedious to match multiple possible projection types using only the Type.Id.provably_equal function.
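To show the tedium, here is a sketch using only Stdlib Type.Id (OCaml 5.1 or later). The helpers inject, project and describe and the three identifiers are invented for the example:

```ocaml
(* Requires OCaml >= 5.1 for Stdlib.Type. *)
type universal = Universal : 'a * 'a Type.Id.t -> universal

let inject (id : 'a Type.Id.t) (v : 'a) : universal = Universal (v, id)

(* Projection succeeds only for the identifier used at injection. *)
let project : type a. a Type.Id.t -> universal -> a option =
 fun id u ->
  match u with
  | Universal (v, id') -> (
      match Type.Id.provably_equal id id' with
      | Some Type.Equal -> Some v
      | None -> None)

let int_id : int Type.Id.t = Type.Id.make ()
let string_id : string Type.Id.t = Type.Id.make ()
let float_id : float Type.Id.t = Type.Id.make ()

(* Each candidate type needs its own projection attempt, so the code
   nests one level deeper per case. *)
let describe (u : universal) : string =
  match project int_id u with
  | Some i -> "int " ^ string_of_int i
  | None -> (
      match project string_id u with
      | Some s -> "string " ^ s
      | None -> (
          match project float_id u with
          | Some f -> "float " ^ string_of_float f
          | None -> "unknown"))
```

Every additional type adds another nested project call, since an identifier alone gives nothing to pattern match on.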

My DynType module has a universal type that uses its !'a identifier type instead of 'a Type.Id.t and that allows for extracting the type witness value from the type identifier value and matching on the witness. It means taking care to attribute the witness properly when injecting, but that’s less tedious than a sprawling tree of nested project calls.

I’m very much in favor of having a well-designed representation of types in values.

Something like Typegist has all the components that I’d want in such a design.

About automatically deriving the type reflections: I think it would still probably be a convenient feature to have; it would make it even less work to call convenience generic functions such as one for debug printing, like we have in the toplevel, if the runtime type can be synthesized for simple types. Having the ability to specify metadata in the type definition could be a plus, but not necessary or could be thought out later, since there is always the possibility to construct the type reflection by hand.

I expect that several libraries that use “runtime types” will implement a way to override the default behaviour like @dbuenzli does with Jsont, and/or to implement a specific behaviour for a given abstract type. Indeed, in the literature on type-indexed functions in OCaml there are three possible attitudes toward abstract types:

  • Fail on them since their structure is unknown. Rather limiting for our type-indexed functions.
  • Go through alternate representations aka “views” via a bidirectional mapping. This is what Typegist does.
  • Implement a mechanism for registering custom behaviours on specific abstract types, effectively implementing a form of ad hoc polymorphism similar to Haskell typeclasses. Indeed, going through an alternate representation can be limiting sometimes. For instance, if I implement type-safe generic serialization, I may want to let the user plug their own (de)serializers for things like collections, instead of forcing potentially large collections into e.g. arrays in order to serialize them.

Approaches 2 and 3 are not exclusive and what is nice is that this ad hoc polymorphism can be implemented on top of runtime types as a library and doesn’t need to be part of the core functionality.
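For readers unfamiliar with the second approach, here is a minimal sketch of a view, assuming a toy representation type (ty, View, show and int_set are names made up here and do not reflect Typegist’s actual API):

```ocaml
(* A toy type representation with a case for views. *)
type _ ty =
  | Int : int ty
  | List : 'a ty -> 'a list ty
  (* An abstract type 'a described through a representable type 'b and a
     bidirectional mapping between the two. *)
  | View : 'b ty * ('a -> 'b) * ('b -> 'a) -> 'a ty

let rec show : type a. a ty -> a -> string =
 fun ty v ->
  match ty with
  | Int -> string_of_int v
  | List elt -> "[" ^ String.concat "; " (List.map (show elt) v) ^ "]"
  | View (repr, to_repr, _) -> show repr (to_repr v)

(* Int_set.t is abstract: it is viewed as a list of its elements. *)
module Int_set = Set.Make (Int)

let int_set : Int_set.t ty =
  View (List Int, Int_set.elements, Int_set.of_list)

let () = print_endline (show int_set (Int_set.of_list [ 3; 1; 2 ]))
```

The cost mentioned in the third bullet is visible in the View case: the whole set is converted to a list before anything is printed.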

I don’t think your 3. fits into the picture of 1. and 2. here. You are no longer talking about representing the type as a value; you are talking about processing the representation.

In Typegist however, point 3. is precisely achieved via the typed metadata that you mentioned earlier. That is, for a given processor you can provide your own specialized implementation of the generic function in a metadata field of the described abstract (or not) type.

For the Jsont deriving function you can provide your own Json.t value to entirely bypass the automatic derivation, but metadata values can be functions too. For example, if you are unhappy with the default pretty-printing performed by Fun.Generic.pp, you can fully override its behaviour by storing your own formatter in the Fun.Generic.Meta.Fmt metadata key of the type gist of your type.

Sure, but is it extensible by the user? For example, if a collection library exports a typegist for Collection.t, and someone wants to use some type-indexed serialization library on that type (among others), which wasn’t foreseen by the author of collection, won’t the user have to redefine a new Type.Gist.t themselves with the additional metadata they need?

In any case, I’m not denying the value of metadata which can be useful in many circumstances.

Yes you are right.

I can see that being sufficient in some cases and annoying in others. For example if this type gist is used by other type gists then you don’t get your modified version and possibly a mix of type gists for the same type during generic processing. Very bad.

However, it seems to me there is an easy way to provide what you’d like (it is not provided currently; I will add it): each metadata dictionary should automatically have a Type_id key with a 'a Type.Id.t value.

That way you can recognize specific type gists and the type it represents without having to modify them (I’m wondering whether we could then perhaps fully specify the metadata on the side via maps on the type identifiers, but for nested structures it would likely be annoying)
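As a sketch of the “metadata on the side” idea, here is what a table keyed by type identifiers could look like. It uses only Stdlib Type.Id (OCaml 5.1 or later); the names printer, registry, register, lookup and show are hypothetical and not part of Typegist:

```ocaml
(* Requires OCaml >= 5.1 for Stdlib.Type. *)

(* A custom printer packed with the identifier of the type it handles. *)
type printer = Printer : 'a Type.Id.t * ('a -> string) -> printer

(* Side table from identifier uid to printer. *)
let registry : (int, printer) Hashtbl.t = Hashtbl.create 16

let register (id : 'a Type.Id.t) (pp : 'a -> string) : unit =
  Hashtbl.replace registry (Type.Id.uid id) (Printer (id, pp))

(* The stored identifier proves the printer has the requested type. *)
let lookup : type a. a Type.Id.t -> (a -> string) option =
 fun id ->
  match Hashtbl.find_opt registry (Type.Id.uid id) with
  | None -> None
  | Some (Printer (id', pp)) -> (
      match Type.Id.provably_equal id id' with
      | Some Type.Equal -> Some pp
      | None -> None)

(* A generic function consults the table before falling back. *)
let show (id : 'a Type.Id.t) ~(default : 'a -> string) (v : 'a) : string =
  match lookup id with Some pp -> pp v | None -> default v

(* Stand-in for the identifier a library would export for its type. *)
module Int_set = Set.Make (Int)

let set_id : Int_set.t Type.Id.t = Type.Id.make ()
```

A user could then call register set_id with their own printer without touching the description exported by the collection library, provided every description of that type exposes the same identifier.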

Yes, I think that would be very useful.

In fact the whole metadata story is a good part of the todo before release.

Answering the original post (I’m a bit late to the party).

I did define “types of types” multiple times. The goals varied and included:

  • displaying values for debugging
  • de/serializing values from/into various binary or JSON-like file formats for storage (sometimes even converting from one format to another)
    • or for inter-process communication
    • or for an HTTP server and client
    • or to pass JSON values to a JavaScript backend
  • generating random values (e.g. for property-based tests)
  • generating documentation

I never used a PPX for this. I avoid things that make my projects harder to go back to years after they are written. As a general rule this means I avoid dependencies when I can, only using well-made libraries that solve a very specific problem very well and that would be too much for me to reimplement (e.g. I don’t hesitate to use the js_of_ocaml compiler, or the tsdl library). Preprocessors are particularly problematic in this regard because one needs to understand how they fit into the build system, and PPXs in particular seem to break more easily after compiler upgrades. I remember being particularly upset that an old project didn’t compile anymore because the js_of_ocaml preprocessor was no longer available after being rewritten as a PPX. It took me a while to understand what changed exactly; the opam package names were different in a way that was not obvious; and of course the whole syntax had changed as well. So I feel third-party preprocessors just create more problems for me in the long run. Maybe I’m wrong but that’s why I avoid them.

If something was available in the stdlib / in the compiler itself, it would be an entirely different story and I would probably use it. Having a Type module (the types of types) in the Stdlib would be a great start, just like adding the result type allowed all libraries to be compatible instead of them redefining their own.

Nowadays however, I think my main use case for this is actually just debugging. I’m fine with writing (de)serializers by hand. It allows me to control exactly what the values look like in the target format, and when breaking changes are introduced. I’m fine with writing random value generators by hand. It allows me to control the distribution of the result. It doesn’t actually take a lot of time (and I suspect that this time can be greatly reduced with AI nowadays) and with a bit of discipline it is not as error-prone as one could think. Repetitive, no-brain tasks don’t annoy me as much anymore.

But displaying values for debugging, that’s really the main, if not the only feature I miss from OCaml. Being able to just write debug x or something that would print x = { … }\n would relieve so much pain! Multicore and algebraic effects are nice and all, but it’s nothing compared to the time lost by context switching back and forth between “I’m debugging this very complex bug” and “I need to write a display function to print this complex data structure”. Or being too lazy to write those display functions and trying to guess what went wrong for a couple of hours, only to give up and write the display functions anyway.

(I’m exaggerating — multicore and algebraic effects are great :stuck_out_tongue: )

Another option to solve the debugging problem would be to have all values carry information about their type. Before multicore, I actually had a look at it and I think that on 64-bit architectures, value tags were large enough to do that. Not sure now since I believe the runtime representation changed slightly. The benefit of this approach is that one could display any value, including abstract ones. You could even register custom print functions for each specific tag. It would probably not have a significant performance impact — instead of allocating a value with tag 123, you would allocate a value with tag 718300000123, or something, and the relevant data (e.g. “this is a constructor named Some with one parameter”, or “this is a record with fields x, y and z”) would be in a table that would optionally be linked with the program. A debugger could use this information as well. It would not solve other use cases (deserialization in particular) though, and some unboxed values would be mistaken for other things, but I don’t care.

TLDR: yes please