Answering the original post (I’m a bit late to the party).
I have defined “types of types” several times. The goals varied and included:
- displaying values for debugging
- de/serializing values from/into various binary or JSON-like file formats for storage (sometimes even converting from one format to another)
- or for inter-process communication
- or for an HTTP server and client
- or to pass JSON values to a JavaScript backend
- generating random values (e.g. for property-based tests)
- generating documentation
I never used a PPX for this. I avoid things that make my projects harder to go back to years after they are written. As a general rule this means I avoid dependencies when I can, only using well-made libraries that solve a very specific problem very well and that would be too much for me to reimplement (e.g. I don’t hesitate to use the js_of_ocaml compiler, or the tsdl library). Preprocessors are particularly problematic in this regard because one needs to understand how they fit into the build system, and PPXs in particular seem to break more easily after compiler upgrades. I remember being particularly upset that an old project didn’t compile anymore because the js_of_ocaml preprocessor was no longer available after being rewritten as a PPX. It took me a while to understand what changed exactly; the opam package names were different in a way that was not obvious; and of course the whole syntax had changed as well. So I feel third-party preprocessors just create more problems for me in the long run. Maybe I’m wrong but that’s why I avoid them.
If something was available in the stdlib / in the compiler itself, it would be an entirely different story and I would probably use it. Having a Type module (the types of types) in the Stdlib would be a great start, just like adding the result type allowed all libraries to be compatible instead of them redefining their own.
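To make “types of types” concrete, here is a minimal sketch of what such a value-level type description could look like. Everything here is hypothetical: the `ty` type, its constructors and the `show` function are names I made up for illustration, not an existing or proposed Stdlib API.

```ocaml
(* Hypothetical sketch: none of these names exist in the Stdlib. *)

(* A value of type ['a ty] describes the structure of the type ['a]. *)
type _ ty =
  | TInt : int ty
  | TString : string ty
  | TList : 'a ty -> 'a list ty
  | TPair : 'a ty * 'b ty -> ('a * 'b) ty
  | TOption : 'a ty -> 'a option ty

(* One generic function written once against the description:
   a printer usable for debugging. *)
let rec show : type a. a ty -> a -> string = fun ty v ->
  match ty with
  | TInt -> string_of_int v
  | TString -> Printf.sprintf "%S" v
  | TList t -> "[" ^ String.concat "; " (List.map (show t) v) ^ "]"
  | TPair (a, b) ->
    let (x, y) = v in
    "(" ^ show a x ^ ", " ^ show b y ^ ")"
  | TOption t ->
    (match v with
     | None -> "None"
     | Some x -> "Some (" ^ show t x ^ ")")

(* Usage: describe the type once, then print any value of that type. *)
let example : string =
  show (TList (TPair (TInt, TOption TString))) [ (1, Some "a"); (2, None) ]
```

The same description could drive a serializer or a random generator. The point of having it in the Stdlib would be that all libraries agree on one `ty` instead of each defining their own.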
Nowadays, however, I think my main use case for this is actually just debugging. I’m fine with writing (de)serializers by hand. It allows me to control exactly what the values look like in the target format, and when breaking changes are introduced. I’m fine with writing random value generators by hand. It allows me to control the distribution of the result. It doesn’t actually take a lot of time (and I suspect that this time can be greatly reduced with AI nowadays) and with a bit of discipline it is not as error-prone as one might think. Repetitive, mindless tasks don’t annoy me as much anymore.
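As an illustration of what “by hand” means here, a minimal sketch for a made-up `shape` type. The type, the JSON field names and the 80/20 split are all assumptions chosen for the example, not taken from a real project.

```ocaml
(* Hypothetical example type. *)
type shape =
  | Circle of float
  | Rect of float * float

(* Hand-written serializer: the field names and layout of the output
   are decided explicitly here, so they only change when I change them. *)
let shape_to_json (s : shape) : string =
  match s with
  | Circle r ->
    Printf.sprintf "{\"kind\":\"circle\",\"radius\":%g}" r
  | Rect (w, h) ->
    Printf.sprintf "{\"kind\":\"rect\",\"width\":%g,\"height\":%g}" w h

(* Hand-written generator: the distribution is chosen on purpose,
   here roughly 80% circles and 20% rectangles. *)
let random_shape (st : Random.State.t) : shape =
  if Random.State.int st 10 < 8 then
    Circle (Random.State.float st 10.0)
  else
    let w = Random.State.float st 10.0 in
    let h = Random.State.float st 10.0 in
    Rect (w, h)
```

A derived generator would typically pick constructors uniformly. Writing it by hand is what lets me bias it toward the cases I care about.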
But displaying values for debugging, that’s really the main, if not the only feature I miss from OCaml. Being able to just write debug x or something that would print x = { … }\n would relieve so much pain! Multicore and algebraic effects are nice and all, but it’s nothing compared to the time lost by context switching back and forth between “I’m debugging this very complex bug” and “I need to write a display function to print this complex data structure”. Or being too lazy to write those display functions and trying to guess what went wrong for a couple of hours, only to give up and write the display functions anyway.
(I’m exaggerating: multicore and algebraic effects are great.)
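For reference, this is the kind of display code I mean, written with the standard `Format` module. The `point` and `player` types and the `debug` helper are hypothetical; only the `Format` functions are real.

```ocaml
(* Hypothetical data structure to debug. *)
type point = { x : int; y : int }
type player = { name : string; pos : point; items : string list }

(* One hand-written printer per type. *)
let pp_point ppf { x; y } =
  Format.fprintf ppf "{ x = %d; y = %d }" x y

let pp_player ppf { name; pos; items } =
  Format.fprintf ppf "{ name = %S; pos = %a; items = [%a] }"
    name
    pp_point pos
    (Format.pp_print_list
       ~pp_sep:(fun ppf () -> Format.fprintf ppf "; ")
       (fun ppf s -> Format.fprintf ppf "%S" s))
    items

(* Hypothetical helper: the closest I can get to [debug x] today.
   It still needs the label and the printer to be passed explicitly. *)
let debug label pp v = Format.eprintf "%s = %a@." label pp v

let () =
  debug "p" pp_player
    { name = "bob"; pos = { x = 1; y = 2 }; items = [ "a"; "b" ] }
```

None of this is hard, but it has to be written and kept in sync for every type, usually at the exact moment I would rather be thinking about the bug.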
Another option to solve the debugging problem would be to have all values carry information about their type. Before multicore, I actually had a look at it and I think that on 64-bit architectures, value tags were large enough to do that. Not sure now since I believe the runtime representation changed slightly. The benefit of this approach is that one could display any value, including abstract ones. You could even register custom print functions for each specific tag. It would probably not have a significant performance impact — instead of allocating a value with tag 123, you would allocate a value with tag 718300000123, or something, and the relevant data (e.g. “this is a constructor named Some with one parameter”, or “this is a record with fields x, y and z”) would be in a table that would optionally be linked with the program. A debugger could use this information as well. It would not solve other use cases (deserialization in particular) though, and some unboxed values would be mistaken for other things, but I don’t care.
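To be clear about what I have in mind, here is a toy model of that table in plain OCaml. It is purely illustrative: the runtime has no such table, the `description` type and `describe` function are invented, and the tag numbers are the made-up ones from above (the literals assume a 64-bit platform).

```ocaml
(* Hypothetical model: OCaml's runtime has no such table.
   This only illustrates the lookup that a printer or debugger would do. *)
type description =
  | Constructor of { name : string; arity : int }
  | Record of { fields : string list }

(* The table that would optionally be linked with the program.
   Keys are the made-up extended tags; they need 64-bit integers. *)
let table : (int, description) Hashtbl.t = Hashtbl.create 16

let () =
  Hashtbl.replace table 718300000123
    (Constructor { name = "Some"; arity = 1 });
  Hashtbl.replace table 718300000124
    (Record { fields = [ "x"; "y"; "z" ] })

(* What a generic printer would learn from a tag alone. *)
let describe (tag : int) : string =
  match Hashtbl.find_opt table tag with
  | Some (Constructor { name; arity }) ->
    Printf.sprintf "constructor %s with %d parameter(s)" name arity
  | Some (Record { fields }) ->
    "record with fields " ^ String.concat ", " fields
  | None -> "unknown tag (table not linked, or unboxed value)"
```

Registering a custom print function for a tag would just be a second table with the same keys.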
TLDR: yes please