This is an informal survey to assess the interest in run-time types from various users. If you are not sure what this is about, check LexiFi’s talk at FUN OCaml, specifically the type reflection section (timecode on the video: 33:00, slide number: 32).
the questions
Answer in any format you like
Do you use type reflection / run-time types / whatchacallit? If not, would you use it if it was in the Stdlib or in an established lib in the ecosystem?
Do you use some in-house solution or some off the shelf library? (Are you able to show the code?)
What do you use it for? What would you use it for?
Do you think it’d be useful even without any compiler support? What kind of tooling would make this viable?
the why
I’ve seen these things re-implemented multiple times. (I’ve even had to implement a couple of them myself.) I’m not sure there can be a stdlib module that covers all use cases completely, but this survey and the ensuing conversation could maybe reveal that a large portion of use cases could be covered.
I’m keen for this to exist in the Stdlib but I’m not keen to make a proposal that fails before it even leaves the prototyping phase.
You may want to have a look at typegist which is designed to be an eventual proposal for the stdlib. It’s still not released but I’m planning to use part of my OCSF grant this year to bring it to release along with a few more directly useful tools like this translation to jsont types.
As you argue in your interesting slides, I feel PPXes are too brittle but they do fill some purpose (easy pretty-printing, conversions…). A more robust approach based on a type representation, some compiler support (like your fork does), and a commonly-agreed-upon representation in Stdlib would be most welcome. Hopefully, the most active people in this area could fight fragmentation and devise a common type representation then propose it to upstream… I guess an issue is what level of abstraction you want to provide to the user.
The prototypical uses are: generic printing/scanning, encoding/decoding between OCaml types and various data encoding formats, deriving user interfaces/database schemas/cli interfaces/etc from type definitions. In short, many of the uses of PPX, but without the brittleness, and all done in the same “stage” as the rest of your code.
There are two parts to a system in the style of what is described in the slides:
1. The definition of the “type of types”, and
2. the programmer API to build values of this “type of types”.
The crux of the matter is 1, which is a prerequisite for 2. The choice of this “type of types” defines the universe of types that you are able to represent in your system. The design surface is large, and all choices come with tradeoffs. One tricky issue is the treatment of abstract types.
With this in hand, 2 is relatively small potatoes. Compiler integration (i.e. having the compiler build the values that represent types on demand) is what makes the system ergonomic, but it is a relatively trivial part of the system.
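To make the two parts concrete, here is a minimal sketch, not the design from the slides. The GADT `ty` is a hypothetical “type of types” covering a deliberately tiny universe (integers, strings, lists, pairs); its constructors double as the programmer API for building representations by hand. `show` is one generic function written against it.

```ocaml
(* Hypothetical "type of types": a GADT indexed by the type it describes.
   The universe is limited to what these four constructors can express. *)
type _ ty =
  | Int : int ty
  | String : string ty
  | List : 'a ty -> 'a list ty
  | Pair : 'a ty * 'b ty -> ('a * 'b) ty

(* A generic printer, written once against the representation. *)
let rec show : type a. a ty -> a -> string = fun ty v ->
  match ty with
  | Int -> string_of_int v
  | String -> Printf.sprintf "%S" v
  | List elt -> "[" ^ String.concat "; " (List.map (show elt) v) ^ "]"
  | Pair (a, b) ->
    let (x, y) = v in
    "(" ^ show a x ^ ", " ^ show b y ^ ")"

(* Without compiler support, the representation is built by hand. *)
let example = show (List (Pair (Int, String))) [ (1, "a"); (2, "b") ]
(* example = {|[(1, "a"); (2, "b")]|} *)
```

Compiler integration would amount to producing the `List (Pair (Int, String))` argument automatically; the choice of constructors in `ty` is where the real design decisions sit.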
I am not arguing for PPX, but it seems to me that in most cases where one wants type introspection, it’s actually better to have that in the compilation phase (more efficient and avoids runtime errors).
Modularity and generic functions. That is, given your type description as an OCaml value, I can devise a function you did not foresee or need to support that works with your data type.
The issue with ppx is that you need to know in advance what you want to
support. You ship a library, but you didn’t add ppx_deriving_cbor? Too
bad, can’t serialize to CBOR. And it’s not super reasonable to ask that
everyone supports everyone else’s ppx.
Side note: in Rust where serde is king, we see generic libraries such as https://facet.rs/
appear for similar reasons (no compiler support but derive this one
dynamic type, and use it for everything).
Besides the very good reasons that have been mentioned to argue for the use of “runtime types”:
While it is true that in principle compile-time code generation can be more efficient, PPX code generation can also substantially increase code size and compilation time, which comes with its own set of downsides.
Note that there is nothing inherently unsafe about “runtime types”. Suitable typed APIs can be provided so that code written against them is guaranteed not to trigger runtime errors.
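As an illustration of that point, here is a small sketch of a typed dynamic-value API. All names (`ty`, `eq`, `dyn`, `cast`) are made up for the example. Comparing two representations yields a type-equality witness, so a mismatch is reported as `None` and no unchecked cast is ever performed.

```ocaml
(* Hypothetical representation, kept tiny for the example. *)
type _ ty =
  | Int : int ty
  | Bool : bool ty
  | List : 'a ty -> 'a list ty

(* Witness that two types are equal. *)
type (_, _) eq = Refl : ('a, 'a) eq

(* Structural comparison of representations, producing a witness. *)
let rec equal : type a b. a ty -> b ty -> (a, b) eq option = fun a b ->
  match a, b with
  | Int, Int -> Some Refl
  | Bool, Bool -> Some Refl
  | List a, List b ->
    (match equal a b with
     | Some Refl -> Some Refl
     | None -> None)
  | _ -> None

(* A value packed together with the representation of its type. *)
type dyn = Dyn : 'a ty * 'a -> dyn

(* Recover a statically typed value; fails gracefully on mismatch. *)
let cast : type a. a ty -> dyn -> a option = fun want (Dyn (have, v)) ->
  match equal have want with
  | Some Refl -> Some v
  | None -> None
```

Here `cast Int (Dyn (Int, 3))` is `Some 3`, while `cast Bool (Dyn (Int, 3))` is `None`. The only failure mode is an ordinary `option`, which the type checker forces callers to handle.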
I don’t currently use this, but would love to build systems that would depend on a facility like this (and which I would have to build myself, duplicating similar work).
It would also help me reduce the code size (at the expense of extra runtime cost) of libraries like smaws which generate marshalling code from bindings, which could just be types + static marshalling data with a generic marshalling layer, instead of dozens of megabytes of marshalling code.
Isn’t this solved by having a PPX for the generic runtime type representation like typegist? Once the PPX derives the typegist, anyone can use it to derive their own serialization.
An additional unrelated downside is that PPX-generated code is not hygienic. E.g., in ppx_deriving (the poster child of ppx) you cannot use some derivers if your type contains a variant also present in the ppx deriver lib’s runtime (for instance Error, because ppx_deriver needs to re-export the constructor for compatibility reasons; the deeper issue is that ppx-generated code lives under an unspecified scope where modules might have been opened).
By contrast, if I provide a library with the function val to_json : 'a reflection -> 'a -> json, the function is written within a scope I control. No need to re-export the stdlib to defensively re-shadow modules that might have been shadowed.
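A sketch of what such a library function could look like follows. The `json` and `reflection` types here are assumptions invented for the example (the `R`-prefixed constructors and `Field` are not from any existing library); only the shape of `to_json` comes from the signature above.

```ocaml
(* Hypothetical JSON tree. *)
type json =
  | Null
  | Bool of bool
  | Number of float
  | String of string
  | Array of json list
  | Object of (string * json) list

(* Hypothetical reflection type: a record is described by its fields,
   each carrying a name, the field's own reflection, and a getter. *)
type _ reflection =
  | RInt : int reflection
  | RString : string reflection
  | RList : 'a reflection -> 'a list reflection
  | RRecord : 'r field list -> 'r reflection
and 'r field = Field : string * 'a reflection * ('r -> 'a) -> 'r field

(* Written entirely inside the library's own scope. *)
let rec to_json : type a. a reflection -> a -> json = fun r v ->
  match r with
  | RInt -> Number (float_of_int v)
  | RString -> String v
  | RList elt -> Array (List.map (to_json elt) v)
  | RRecord fields ->
    Object
      (List.map
         (fun (Field (name, fr, get)) -> (name, to_json fr (get v)))
         fields)

(* User side: describe the type once, as an ordinary value. *)
type person = { name : string; age : int; tags : string list }

let person : person reflection =
  RRecord
    [ Field ("name", RString, (fun p -> p.name));
      Field ("age", RInt, (fun p -> p.age));
      Field ("tags", RList RString, (fun p -> p.tags)) ]
```

Nothing in `to_json` is spliced into the user’s module, so whatever the user has opened or shadowed cannot affect it.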
IMO yes. It’s easier to convince a library author to use one single ppx-deriving from which you can use generic functions, than to convince them to add one more ppx, just one more i promise, one more we’ll be good.
This highlights the need for a library like typegist which is the whole discussion of this thread.
But also, maybe we are ok with a ppx to generate reflection, but maybe we’d actually prefer to have some different tooling. What if you could just instruct the build system to generate those reflection values for types, even types that are in libraries you are using rather than writing? Would that be easier or maybe more likely to cause issues? idk
It should be added that one advantage of typegists is that I can even devise some for your abstract types using your public interface even if you don’t use them (assuming a well-behaved constructor and accessor interface).
Personally I’m not really convinced about having a PPX deriving typegists (though you always can). In practice data types can be quite messy (for example cache fields) and a generic function may need to be guided with directives in order to be useful. And for these directives to be useful they need to be typed according to the type of the typegist, which is going to be difficult to specify as annotations.
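The idea of describing someone else’s abstract type from the outside can be sketched as follows. This is not typegist’s API: `ty`, its `Map` case, and the `Celsius` module are hypothetical, and the sketch assumes the module exposes a constructor and an accessor that round-trip.

```ocaml
(* A third-party module with an abstract type and a public interface. *)
module Celsius : sig
  type t
  val of_float : float -> t
  val to_float : t -> float
end = struct
  type t = float
  let of_float x = x
  let to_float x = x
end

(* Hypothetical representation with a case that describes a type
   through a conversion to and from an already-described type. *)
type _ ty =
  | Float : float ty
  | String : string ty
  | Map : { repr : 'b ty; inject : 'b -> 'a; project : 'a -> 'b } -> 'a ty

(* Written by a user of Celsius, using only its signature. *)
let celsius : Celsius.t ty =
  Map { repr = Float; inject = Celsius.of_float; project = Celsius.to_float }

(* Generic functions work on the abstract type through the mapping. *)
let rec show : type a. a ty -> a -> string = fun ty v ->
  match ty with
  | Float -> Printf.sprintf "%g" v
  | String -> v
  | Map { repr; project; _ } -> show repr (project v)

let rec parse : type a. a ty -> string -> a option = fun ty s ->
  match ty with
  | Float -> float_of_string_opt s
  | String -> Some s
  | Map { repr; inject; _ } -> Option.map inject (parse repr s)
```

The author of `Celsius` never had to know about `ty`; the description lives entirely in client code.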
For example in the jsont derivation I mentioned above you can ignore fields using the generic ignore directive or if you are unhappy about the generic 'a Jsont.t derivation plug your own chosen 'a Jsont.t value rather than let the function derive it.
From that perspective I think that the small potatoes of @nojb may be a field of them. In fact I’d be less interested in the compiler generating typegists than in it giving me meta protractions to efficiently devise them (e.g. being able to get functional constructors for a record value/variant case). Combined with a real macro system, perhaps that could allow both flexible and concise typegist definitions.
Btw. it seems to me that so far we are having the same discussion as three years ago.
Not sure - also not qualified to think through all the pros/cons tbh. Without compiler support a PPX seems like the only alternative. It would tackle the mechanical (but verbose) construction of the type representation and produce a unified target representation that can then be used by all downstream libraries. Presumably it would need a ppx_import-like functionality for deriving representations for libraries that don’t have the ppx annotation? I wonder whether there are edge cases where the untyped AST doesn’t have enough information (without running a typing pass) to determine the type correctly, even for the subset of values we allow to have a runtime type? Also, when you’re writing your own generic function, say print, not annotating types (as one sometimes does) and relying on the compiler’s inference, you could pass in the wrong arguments at the call site and scratch your head about why val print : 'a repr -> 'b -> unit is failing at runtime - compiler machinery could prevent this presumably?
EDIT: P.S. Whatever happened to MacoCaml btw? Will it ever be a thing beyond academia?
Indeed, but the “old school of metaprogramming” way to deal with that would be to ask the compiler to extract the definition of such a type and then use some external tool to generate (during compilation phase) all the needed functions.
It’s still under active development. Getting some parts of the design right is taking a while, but we’re happy with the direction things are going, and will share more when it’s at a point where it’d be useful for the community to experiment.
The future can be quite difficult to predict, but that’s certainly the plan.