Thanks @gasche for the very clear explanation. However, I would like to emphasize one aspect. The wording “breaks type abstraction” implies that runtime types somehow break some fundamental property of the language, while in fact runtime types “break” type abstraction in much the same way as any serialization that allows round-tripping does: if you can jsonify an abstract type, you could modify the json and un-jsonify and you will have “broken” type abstraction. Runtime types are just a more powerful version of that.
As explained, runtime types do not “break” anything, they are just a more powerful version of any kind of serialization that allows round-tripping. If you want to keep your type “really” abstract, then you just don’t expose a type witness for it, or you only expose specific functions derived from it, or you expose an “opaque name” type witness (as mentioned in @gasche’s post).
We have a lot of experience working with runtime types at LexiFi and the fact that they “break” type abstraction in this sense has never been a source of issues. Also, we implement a version of the “opaque names” that @gasche mentioned, which allows exposing “abstract” type witnesses when needed.
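To make the “opaque name” idea concrete, here is a minimal sketch. The `ty` GADT, its `Opaque` constructor, and the `Account` module are all hypothetical names made up for illustration; this is not LexiFi's actual representation. The point is only that a witness can carry a name and nothing else, so generic code can still handle the value without learning its structure.

```ocaml
(* Toy runtime type representation; hypothetical, for illustration only. *)
type _ ty =
  | Int : int ty
  | String : string ty
  | Pair : 'a ty * 'b ty -> ('a * 'b) ty
  | Opaque : string -> 'a ty  (* carries only a name: no structure is revealed *)

(* A generic printer driven by the witness. *)
let rec show : type a. a ty -> a -> string = fun ty v ->
  match ty with
  | Int -> string_of_int v
  | String -> Printf.sprintf "%S" v
  | Pair (ta, tb) ->
    let (x, y) = v in
    Printf.sprintf "(%s, %s)" (show ta x) (show tb y)
  | Opaque name -> Printf.sprintf "<%s>" name

module Account : sig
  type t
  val make : int -> t
  val ty : t ty  (* opaque witness: usable by generic code, reveals nothing *)
end = struct
  type t = int
  let make balance = balance
  let ty = Opaque "Account.t"
end

(* The abstract component is printed as a placeholder, not as its contents. *)
let printed = show (Pair (String, Account.ty)) ("alice", Account.make 100)
```

Here `printed` mentions `<Account.t>` instead of the balance, so the witness is usable in generic code while the type stays abstract.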
I think the point is that the library author loses control with this method. Once you expose the module’s internals with runtime types, the user can do whatever they want with that information, including things they shouldn’t be doing based on specific module internals. The opposite approach is opting in to specific functionality and supporting it, as @gasche stated. This is the approach taken by type classes and by the generation of code as needed via ppx.
It’s hard to generalize from LexiFi’s experience (or that of any specific company) in this case. LexiFi is one company, and that allows for coordination between clients and providers of libraries. An open language ecosystem doesn’t have this advantage.
Generally I haven’t had problems with multiple ppxs, particularly if they use ppxlib. ppx_deriving provides multiple functionalities so they’re essentially guaranteed to work together. In any case, runtime type representation is a controversial choice. OCaml has a lot of experience with fundamental projects that made controversial choices, for which we end up paying years down the road.
I’ve only been watching this discussion cursorily, but I’ve used PPXs with large collections of types, including many that are abstract. And I don’t see how this refl proposal would be any different from using other PPX derivers. When you use a PPX deriver on an abstract type, you typically lose round-trip capability. So converting an abstract type to Yojson turns it into a token of some kind, and converting back, you get an exception. Now obviously I could have applied the deriver to the concrete type that was made abstract. But that isn’t a requirement; standard PPX derivers work fine without it. Why would things be different with refl?
Indeed, I think that “breaking abstraction” is inherent to the serialization process: in order to deserialize your value, you have to “leak” as much information, up to isomorphism, as your type contains.
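A small sketch of that round-trip argument, with made-up names (`Positive`, `deserialize_unchecked`, and a plain string standing in for the serialized form): if the deserializer trusts its input, anyone who can edit the serialized form can forge a value that the module's own constructor would have rejected.

```ocaml
module Positive : sig
  type t
  val make : int -> t option               (* enforces the invariant n > 0 *)
  val to_int : t -> int
  val serialize : t -> string
  val deserialize_unchecked : string -> t  (* trusts its input *)
  val deserialize : string -> t option     (* re-validates its input *)
end = struct
  type t = int
  let make n = if n > 0 then Some n else None
  let to_int n = n
  let serialize = string_of_int
  let deserialize_unchecked = int_of_string
  let deserialize s = Option.bind (int_of_string_opt s) make
end

let p = Option.get (Positive.make 5)

(* Tamper with the serialized form, then read it back. *)
let tampered = "-" ^ Positive.serialize p
let forged = Positive.deserialize_unchecked tampered  (* invariant broken *)
let rejected = Positive.deserialize tampered          (* invariant preserved *)
```

The validating `deserialize` shows that the author can still keep control, at the cost of writing (or customizing) the deserializer.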
As I and others said above, there is a choice. As an abstract module/type writer, I can choose which ppxs to support and which ones not to support. Once I allow refl, I give the user absolute permission to do whatever they want. For example, I can disallow comparison functions (for some domain-specific reason) but allow serialization. You can try to activate ppxs, but without a concrete implementation (which you can sometimes create by extending the module, and otherwise by vendoring it), the module author still has some say about which behaviors are supported.
Certainly true. There’s also an implicit understanding that serialization is brittle, and that the slightest internal change will mess it up unless a lot of work is done (and the library author can choose to create a serialization implementation which will conform to the derivers and still be customized to the degree he chooses). With refl, there is no abstraction barrier with regard to any possible action anymore. The module client is now fully in control.
But we can already break type abstraction with GADTs.
type (_,_) eq = Eq : ('a,'a) eq
module M : sig
type t
val make : int -> t
val typ : (int, t) eq
end = struct
type t = int
let make i = i
let typ = Eq
end
(* the value `i` has an abstract type *)
let i = M.make 2;;
val i : M.t = <abstr>
(* the abstraction boundary prevents me from using it as an `int` *)
i + 2;;
Error: This expression has type M.t but an expression was expected of type int
(* I break abstraction boundaries *)
match M.typ with Eq -> i + 2;;
- : int = 4
How? If you provide an abstract type, can a third-party user use refl to punch through the module signature and access the implementation of the type? Based on ppx_deriving, I don’t think so?
Indeed if, as a producer / module writer, you only provide a deserialising function with this signature:
deserialise : t typ -> serialisation_format -> t
you are still in full control of what can be done with your type.
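As a rough sketch of that signature in use (the `Even` module, its even-integer invariant, and the choice of `string` as `serialisation_format` are assumptions made up for this example): the witness type is itself abstract, so a consumer can pass it around but cannot match on it, and the only way to build a `t` from serialized data goes through the author's validation.

```ocaml
type serialisation_format = string  (* assumption: a plain string format *)

module Even : sig
  type t
  type 'a typ                        (* abstract witness: cannot be matched on *)
  val typ : t typ
  val deserialise : t typ -> serialisation_format -> t
  val to_int : t -> int
end = struct
  type t = int
  type 'a typ = Even_int : int typ
  let typ = Even_int
  (* The author decides what counts as a valid serialized value. *)
  let deserialise Even_int s =
    match int_of_string_opt s with
    | Some n when n mod 2 = 0 -> n
    | _ -> invalid_arg "Even.deserialise: expected an even integer"
  let to_int n = n
end

let ok = Even.deserialise Even.typ "42"
```

Passing `"7"` instead raises `Invalid_argument`, so the invariant survives even though a witness is exposed.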
In the example I gave above, when I break type abstraction the result is of type int and not of type M.t, so any other function that I could have defined on M.t would be unusable on it. But in the case of the GADT I gave, it’s even worse, because I can also go the other way and build an M.t from an arbitrary int, which breaks any invariant the abstract type is meant to maintain.
match M.typ with Eq -> (i + 2 : M.t);;
- : M.t = <abstr>
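To spell out the invariant problem, here is the same trick applied to a module that is supposed to hold only positive integers. The `Positive_int` module is a made-up variation on `M` above; the `eq` type is the one from the earlier example.

```ocaml
type (_, _) eq = Eq : ('a, 'a) eq

module Positive_int : sig
  type t
  val make : int -> t             (* rejects non-positive integers *)
  val to_int : t -> int
  val typ : (int, t) eq           (* exposing this equality is the leak *)
end = struct
  type t = int
  let make n = if n > 0 then n else invalid_arg "Positive_int.make"
  let to_int n = n
  let typ = Eq
end

(* Matching on Eq lets the type checker treat int and Positive_int.t as equal,
   so a value can be built without ever going through `make`. *)
let forged : Positive_int.t =
  match Positive_int.typ with Eq -> (-5 : Positive_int.t)
```

`forged` has the abstract type but violates the invariant that `make` enforces, so every function in the module that relies on positivity is now at risk.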
Summary, using already mentioned concrete scenarios:
1. An OCaml newcomer should be able to debug their code by printing its values, without writing excessive boilerplate and without installing an opam switch with extra packages.
2. An OCaml newcomer should be able to implement hash/compare/equals, without writing excessive boilerplate and without installing an opam switch with extra packages.
3. A module writer who handles sensitive data should be able to practice defensive programming (make it burdensome to get at the sensitive data).
4. An opam package publisher who wants to hide internal private details should be able to practice good public API design (make it burdensome to get at the private details).
5. An OCaml developer, whether newcomer or not, should be able to print/hash/compare/equals (generalizing scenarios 1 and 2).
6. An OCaml developer, whether newcomer or not, should be able to implement from_json/to_json, from_protobuf/to_protobuf, etc.
7. A developer wants to provide other developers (or themselves) generic functions that work on runtime type values.
The answers:
1+2: just pick a PPX and make it available. Any PPX. I’ve picked refl, with some pushback coming from (unrelated!) scenario 4. I haven’t seen a showstopper for providing refl to newcomers.
3+4 are edge cases but trivial. If you want to hide something, use an .mli interface and be intentional about what you expose. Do not expose from_json, the refl value (a, structure, arity, rec_group, 'kinds, positive, negative, direct, gadt) desc, etc. unless you really want to.
5+6 are not standardized, but nothing blocks an OCaml developer from using their favorite PPX or handwriting their own pp, show, etc. functions
7 has no agreement. (Personally, I’d love to see something like LexiFi’s RTTI compiler patches!)
I think that there is a big difference in nature between the sort of fragility you get from serialization (where it is well-understood that poking at the serialized form and transforming it is unsafe and will not feasibly be future-compatible, unless very strict versioning guarantees are documented) and the one you get from general runtime type information, where you advertise the full type information by default, in a way that is designed to be observed easily and used programmatically.
“Debug printing” sits somewhere in the middle: it provides slightly more fragility than serialization (it will easily find itself in the reference output of some tests, for example) but much less than exposing an interface designed for programmatic use, and it is also hugely convenient to have.
OK that’s a good point. So does that mean that if one is utilizing a type abstraction, refl is of no help, and one still needs to provide the appropriate show/equal/compare/serialize functions?
If so, I think that brings me back to the point that we need to standardize on the main ppxs and their related functions rather than a runtime type representation. If the stdlib had these ppxs built-in in some way or if they were blessed in some way, everyone could/would provide these functions for their abstract types.
Refl is no help to the external consumer of the module with the abstract type. The module author can still use refl to derive the runtime type and provide show/equal/compare/etc. functions to the module consumer. In any case what we are saying is that you would need only one deriving PPX, not a bunch of them, for every derivation.
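Roughly, the pattern looks like this. The toy `ty` representation and the generic `show`/`equal` below are stand-ins I made up for whatever a refl-style deriver would provide; the `User` module is likewise hypothetical. The witness stays private to the implementation, and the consumer only sees the derived functions.

```ocaml
(* Stand-in for a derived runtime type representation (hypothetical). *)
type _ ty =
  | Int : int ty
  | String : string ty
  | Pair : 'a ty * 'b ty -> ('a * 'b) ty

let rec show : type a. a ty -> a -> string = fun ty v ->
  match ty with
  | Int -> string_of_int v
  | String -> v
  | Pair (ta, tb) ->
    let (x, y) = v in
    "(" ^ show ta x ^ ", " ^ show tb y ^ ")"

let rec equal : type a. a ty -> a -> a -> bool = fun ty v w ->
  match ty with
  | Int -> v = w
  | String -> String.equal v w
  | Pair (ta, tb) ->
    let (x1, y1) = v and (x2, y2) = w in
    equal ta x1 x2 && equal tb y1 y2

module User : sig
  type t
  val make : string -> int -> t
  val show : t -> string          (* derived, but the witness is not exported *)
  val equal : t -> t -> bool
end = struct
  type t = string * int
  let ty = Pair (String, Int)     (* one witness, several derived functions *)
  let make name age = (name, age)
  let show = show ty
  let equal = equal ty
end
```

A consumer can call `User.show` and `User.equal`, but has no witness to feed to any other generic function, so the author still decides which operations are supported.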
Yes, exactly, and as a beginner-friendly logging/printing/etc. mechanism we are proposing to sacrifice some theoretical performance gains in exchange for general applicability. People who care about every microsecond of performance or have special correctness requirements can always avoid it and hand-write their own functions or use other third-party PPXen.
Just a quick note that, based on my initial measurements, the performance overhead for the kind of generic type representations discussed here (tagged GADT encodings with type witnesses) tends to be about 5x compared to hand-written (or ppx-derived) implementations.
There might be room for improvement of course, but I doubt it will be possible to get anywhere close to the baseline without some kind of staging.
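For readers wondering where that kind of overhead can come from, here is an illustrative sketch on a toy representation (all names are made up, and no timing claim is attached to it). The generic function inspects the witness on every call, while the hand-written one has nothing to inspect. The closure-based variant walks the witness once and returns a specialised function; it is not real staging, only a common way to avoid repeating the dispatch.

```ocaml
type _ ty =
  | Int : int ty
  | Pair : 'a ty * 'b ty -> ('a * 'b) ty

(* Hand-written equality for int * int: no witness to inspect. *)
let equal_handwritten (a1, b1) (a2, b2) = Int.equal a1 a2 && Int.equal b1 b2

(* Generic equality: re-inspects the witness on every call. *)
let rec equal_generic : type a. a ty -> a -> a -> bool = fun ty v w ->
  match ty with
  | Int -> Int.equal v w
  | Pair (ta, tb) ->
    let (x1, y1) = v and (x2, y2) = w in
    equal_generic ta x1 x2 && equal_generic tb y1 y2

(* Closure-based variant: traverse the witness once, then reuse the result. *)
let rec compile_equal : type a. a ty -> (a -> a -> bool) = function
  | Int -> Int.equal
  | Pair (ta, tb) ->
    let eq_a = compile_equal ta and eq_b = compile_equal tb in
    fun (x1, y1) (x2, y2) -> eq_a x1 x2 && eq_b y1 y2

let int_pair = Pair (Int, Int)
let equal_compiled = compile_equal int_pair
```

All three functions compute the same result on `int * int`; they differ only in how much work is redone per call.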
I’d be happy to share the benchmark results when I have time.
It’s important to put this in context though. If the application is mostly I/O then the cost of runtime type operations is a rounding error compared to the cost of I/O.
As users of LexiFi’s patched compiler (in a different company), our experience with the runtime types has been quite good. We have written several libraries using the runtime types functionality (property-based testing, type-safe database accessors, serialization libraries, etc.) - sometimes even operating on values of abstract types (C++ objects), using the “opaque names” approach - which does not break abstraction, and has not required any coordination with LexiFi or internally between library authors and clients.
In practice, a big advantage of the runtime types over ppxes has been that, with a ppx, if you want to serialize, for example, type t = {foo : Some_module.t; bar: int}, you need Some_module to have been preprocessed as well, which isn’t always the case if it’s coming from someone else’s library. The runtime types, on the other hand, are always available, at least for non-abstract types.
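A hand-expanded sketch of the ppx situation, using a stand-in `Some_module` that I define here only so the example is self-contained. With ppx_deriving's `show`, the code generated for `t` refers to `Some_module.pp`; if the third-party library was not preprocessed, that function does not exist and the consumer has to write the missing piece by hand, which is only possible because the type is not abstract.

```ocaml
(* Stand-in for a third-party module that was NOT preprocessed by a deriver:
   it exports a concrete type but no pp/show functions. *)
module Some_module = struct
  type t = A | B
end

type t = { foo : Some_module.t; bar : int }

(* The consumer has to supply the printer that the library did not ship. *)
let show_some_module = function
  | Some_module.A -> "A"
  | Some_module.B -> "B"

(* Roughly what a derived printer for t would need to do, written by hand. *)
let show r =
  Printf.sprintf "{ foo = %s; bar = %d }" (show_some_module r.foo) r.bar
```

With runtime types provided by the compiler, the consumer would not need `show_some_module` at all, since the representation of `Some_module.t` would already be available.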