Idea: Standard OCaml runtime type representation

Um, is that really true? If a type from module M is not abstract, you can always use ppx_import to import it, apply derivers, and then use those derivers on the original type, no? Is there something else going on that I’m missing?

[perhaps that ppx_import is tedious, b/c you can’t import entire groups of types, but only one type at a time?]

3 Likes

Oh nice, that is pretty cool - I did not know that ppx_import existed!

So I guess if I had:
a.ml with type t = {foo: B.t}, b.ml with type t = {bar: C.t}, and c.ml with type t = {baz: int}, I’d have to import C.t and then B.t in order to write derivers for A.t?

Does appear a bit tedious on the face of it, but good to know that it’s doable!
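
To make the a.ml / b.ml / c.ml case concrete, here is a rough sketch in plain OCaml, with no PPX involved. The three modules are inlined stand-ins for the files in the question, and the show_* functions are written by hand as placeholders for what a deriver would generate; their names and output format are my own assumptions. With ppx_import each re-declaration would instead be a type%import line, innermost type first.

```ocaml
(* Stand-ins for c.ml, b.ml and a.ml from the question. *)
module C = struct type t = { baz : int } end
module B = struct type t = { bar : C.t } end
module A = struct type t = { foo : B.t } end

(* Re-declare each type with an equation to the original, innermost first.
   The field types are written with the local names (c, b) so that each
   printer can call the local printer for its field. *)
type c = C.t = { baz : int }
let show_c (x : c) = Printf.sprintf "{ baz = %d }" x.baz

type b = B.t = { bar : c }
let show_b (x : b) = Printf.sprintf "{ bar = %s }" (show_c x.bar)

type a = A.t = { foo : b }
let show_a (x : a) = Printf.sprintf "{ foo = %s }" (show_b x.foo)

(* A value built with the original modules' types is accepted as-is. *)
let v : A.t = { A.foo = { B.bar = { C.baz = 42 } } }
let () = print_endline (show_a v)
(* { foo = { bar = { baz = 42 } } } *)
```

So yes, one import per type in the chain, in dependency order.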

Yes: if you look at the ppx_import documentation, you’ll see that there’s a way to rename “type module-identifiers” (like M.N.t), and you often have to combine that with importing multiple types. It’s a little tedious, and when I implemented my version (pa_ppx_import) for Camlp5, I added support for things like importing groups of types. But regardless, you should almost always be able to import types from far away, as long as they’re not abstract, and apply arbitrary derivers, which can then be used on values of those original types.

I do it all the time. I. Mean. All. The. Time.

2 Likes

It would be nice if you could write it up as an article explaining it, either on ocaml.org or ocamlverse.

I think in general, we need to coordinate more about ppxs and their usage. I think part of the reason people turn to refl is because it’s “simpler”, but ppxs are quite good as they are. I use ppx_import, ppx_deriving, ppx_deriving_yojson – I would love to see them become more featureful but for that we need to focus on one for each task and talk/guide/discuss how to use them most effectively. refl is unfortunately a distraction from that and a split in effort.

3 Likes

If you’re talking about documenting how we could use ppx_import to do @@deriving for types that are defined in other modules that we don’t “own”, here is a useful snippet from the ppx_import README (the ocaml-ppx/ppx_import repository on GitHub) that tells you what to do.

It's possible to combine import and deriving to derive functions for types that you do not own, e.g.:

type%import longident = Longident.t [@@deriving show]
let () =
  print_endline (show_longident (Longident.parse "Foo.Bar.baz"))
(* Longident.Ldot (Longident.Ldot (Longident.Lident ("Foo"), "Bar"), "baz") *)

Note that you need to require import before any deriving plugins, as otherwise deriving will not be able to observe the complete type.

Indeed this is a powerful trick mentioned by @Chet_Murthy in reply to @ahem 's comment. There are more fancy things mentioned by Chet, I’ll leave him to document that. I wanted to concretely comment on this for people who didn’t know about it already.

Please note that technically we don’t need to use ppx_import – it merely makes it more convenient and maintainable.

We could have explicitly copy-pasted the type definition, like this, to achieve the same result.

(* We have copy-pasted the Longident.t definition *)
(* We didn't use ppx_import here but achieve the same result *)
type t = Longident.t =
    Lident of string
  | Ldot of t * string
  | Lapply of t * t
  [@@deriving show]

This uses the ability of the OCaml type system to tell the compiler that two types are equal. The point here is that Longident.t is external to us, so we need to redefine it locally (so that we can attach [@@deriving] to it) and tell the compiler that the local definition is equal to Longident.t.

It goes without saying that all functions that take a Longident.t will work just the same when you give them a t (as the types are the same).

As this is a facility that the OCaml type system gives you, the same thing will also work in refl when it comes to deriving things for types we don’t “own”.
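
Here is a small self-contained check of the "types are the same" claim. Ext is a hypothetical stand-in for a module we don't own (a cut-down Longident, so that this runs without compiler-libs), and show is hand-written in place of a derived printer.

```ocaml
(* Hypothetical stand-in for an external module. *)
module Ext = struct
  type t = Lident of string | Ldot of t * string
  let last = function Lident s -> s | Ldot (_, s) -> s
end

(* Local re-declaration: the [= Ext.t =] equation makes both names denote
   the same type, and the compiler checks that the constructors match. *)
type t = Ext.t = Lident of string | Ldot of t * string

(* Hand-written stand-in for what a [show] deriver would produce. *)
let rec show = function
  | Lident s -> Printf.sprintf "Lident %S" s
  | Ldot (p, s) -> Printf.sprintf "Ldot (%s, %S)" (show p) s

let x : t = Ldot (Lident "Foo", "bar")

let () =
  print_endline (show x);      (* local function on the local type *)
  print_endline (Ext.last x)   (* original function accepts the same value *)
```

If the copied definition drifts from the original (say, a constructor is missing), the type equation is rejected at compile time, which is the main safety net of the copy-paste approach.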

1 Like

PPXs have costs as well:

  • Complexity of dealing with PPXs
  • Maintenance burden of supporting and upgrading PPXs
  • Relying on OCaml parsetree data structure
  • Compilation speed impact
  • Increase in size of output executables
  • Moving more operations into magic macro land away from regular OCaml value land
  • Forcing users to assemble a menagerie of PPXs and their dune configs
3 Likes

I don’t expect you to believe me, but:

#2 is simply not an issue. If you or others find writing and maintaining PPX rewriters burdensome, that’s b/c you’re using subpar infrastructure.

#3: I did a diff once between versions of the OCaml AST from 4.10 thru 5.0. You would be surprised at how little changes. Truly very little changes. And so, to demonstrate that this isn’t a problem I did the following:

  • implemented pervasive quasiquotations for the 5.0 OCaml parsetree
  • then ported it backwards to each of 4.14…4.10 (to demonstrate that maintaining it in the face of a changing AST is easy)
  • then converted ppx_jsobject_conv (which I know/knew nothing about – it was just the most-recent example cited by others as “wow, PPX rewriters suuuuck” in this discussion forum) to use these pervasive quasiquotations
  • and THEN hacked ppx_jsobject_conv build so that it forced the use of different OCaml parsetree versions – to demonstrate that the PPX rewriter code worked with all the various syntaxes from 4.10 thru 5.0, without any changes needed.

My point being: no, dealing with the changing OCaml parsetree isn’t even breaking a sweat, unless you explicitly use some feature that isn’t supported by other parsetree versions. But that’s vanishingly rare.

#last [dune configs] This is why I use Makefiles. They work, and they’re simple.

#1 I don’t know what you mean here. PPXes are much simpler than dealing with GADTs and crazily complex type families, I would think.

ETA: I should note that the pervasive quasiquotation support is implemented using Camlp5, but when it is used, it works with the standard ppxlib infrastructure. And it could be re-implemented to be independent of Camlp5.

2 Likes

When I was a newcomer I purposefully avoided anything that was ‘extra’ in any way in order to reduce the mental load, which is a strategy I would recommend to anyone. I therefore expect many newcomers to avoid PPXs, which means no debug printing for them. Now, if a subset of PPXs were made part of the core of the language, then a newcomer could be expected to use them from the start. They would need to be documented in beginner tutorials, updated always in time for compiler releases etc.

Is that good or bad? Are we hoping for something better than PPX mid-term and want to avoid locking them in?

3 Likes

I don’t think so. There’s no plan for anything more complex than PPX. What people have been “waiting” for is that the people responsible for PPX infrastructure decide on the best libraries to write PPXs with. I believe ppxlib is it now. ppxlib creates principles for combining PPXs so they have minimal side effects. What we want to do is to make it as easy as possible to use PPX extensions. I agree that average people are staying away from them because:

a) They require a little more info in the dune file. Even a bit more configuration seems intimidating. Making them run by default would really help IMO.
b) There’s very little documentation explaining them or how to deal with issues relating to them.
c) Writing PPXs seems intimidating - you need to know the OCaml Parsetree - and that pushes people away in general.

@rgrinberg would it be possible to have a selection of key ppxs be opt-out in dune rather than opt-in?

Until dune is capable enough to actually fetch and install these ppx’s when they’re absent, the workflow would be quite poor IMO.

As a side note, I do have some related ideas for making ppx’s easier to use in dune. For example, it would be nice if ocamldep could spit out a list of extension points and annotations present in a source file. So that if a user writes [@@deriving show], dune would know that a particular ppx must be present in the dune file. To build a map from annotations to opam packages, we could include additional metadata in the opam file:

$ cat ppx_deriving.opam
...
x-deriving: {
  ppx_deriving.show: [ "show" ]
  ppx_deriving.eq: [ "equal" ]
}

It wouldn’t be quite as automatic as including them by default, but it would at least guide users on how to get started with them.

5 Likes

I’m interested to see your benchmarks - optimising generic functions by pre-computing the closures is usually quite efficient. It’s what repr is doing by using a staging API, for instance.
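
For readers who haven't seen the staging trick, here is a minimal sketch of the idea on a toy type representation. This is not the API of repr or refl; the ty GADT and the show function are made up for illustration. The point is that show ty walks the representation once and returns a closure, so applying that closure to many values does no further dispatch on ty.

```ocaml
(* A toy runtime type representation (hypothetical, for illustration). *)
type _ ty =
  | Int : int ty
  | String : string ty
  | List : 'a ty -> 'a list ty
  | Pair : 'a ty * 'b ty -> ('a * 'b) ty

(* Staged printer: the match on the representation happens when [show] is
   applied to [ty]; the sub-printers are computed before the closure is
   returned, not on every call. *)
let rec show : type a. a ty -> a -> string = function
  | Int -> string_of_int
  | String -> Printf.sprintf "%S"
  | List elt ->
    let show_elt = show elt in
    fun xs -> "[" ^ String.concat "; " (List.map show_elt xs) ^ "]"
  | Pair (a, b) ->
    let show_a = show a and show_b = show b in
    fun (x, y) -> "(" ^ show_a x ^ ", " ^ show_b y ^ ")"

(* The representation is traversed here, once. *)
let show_pairs = show (List (Pair (Int, String)))

let () = print_endline (show_pairs [ (1, "a"); (2, "b") ])
(* [(1, "a"); (2, "b")] *)
```

Whether this closes the gap with PPX-generated code is exactly what benchmarks would have to show.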

1 Like

I understand that having a hard dependency on these ppxs would create a chicken-and-egg problem since with dune we’re already so close to the “bare metal”, so to speak. But could we have a soft dependency? If they are found, they could be run automatically.

Huh, you make me instantly think of a “ppxdep” that would be like “ocamldep”, but would scan a source file and compute the list of PPX rewriters that it probably depends upon. That seems like a really trivial thing to write. One would want it to be driven by a little database of patterns, so as to be extensible in an easy way that wouldn’t require adding code. Maybe PPX rewriters could come with “pattern files” that would be added to that database.

This doesn’t solve the problem of ensuring that those rewriters are installed, but it could solve the problem of ensuring that they get quasi-automatically invoked when needed.

@bluddy without having a way to tell the users what went wrong if they’re not found automatically, the user experience would be poor.

@Chet_Murthy The way I imagine it, there’s no need for ppxdep to know about a database. It would be enough to just return the names of all extension points and annotations. The build system could manage this database because it knows about a ppx rewriter. When defining a ppx rewriter, one should define all the patterns it matches on outside the rewriter itself. This is so that we can build this pattern database without building and executing the rewriters themselves. In other words, we want this database to be available for all packages, installed or otherwise.
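
To give the "ppxdep" idea some shape, here is a deliberately naive sketch using only the standard library. Everything in it is hypothetical: derivers_in does a plain textual scan for [@@deriving ...] (it knows nothing about comments, string literals, or deriver options, which a real tool would handle by walking the parsetree), and providers is a made-up in-memory stand-in for the database that the build system would assemble from package metadata.

```ocaml
(* Names listed in [@@deriving a, b] attributes of [src], in source order. *)
let derivers_in (src : string) : string list =
  let marker = "[@@deriving" in
  let mlen = String.length marker and n = String.length src in
  let rec scan i acc =
    if i + mlen > n then List.rev acc
    else if String.sub src i mlen = marker then begin
      match String.index_from_opt src (i + mlen) ']' with
      | None -> List.rev acc
      | Some j ->
        let body = String.sub src (i + mlen) (j - i - mlen) in
        let names =
          String.split_on_char ',' body
          |> List.map String.trim
          |> List.filter (fun s -> s <> "")
        in
        scan (j + 1) (List.rev_append names acc)
    end
    else scan (i + 1) acc
  in
  scan 0 []

(* Hypothetical database: deriver name -> library providing it. *)
let providers =
  [ "show", "ppx_deriving.show";
    "eq", "ppx_deriving.eq";
    "yojson", "ppx_deriving_yojson" ]

(* Libraries needed by [src]; derivers missing from the database are dropped. *)
let required_ppxs src =
  derivers_in src
  |> List.filter_map (fun d -> List.assoc_opt d providers)
  |> List.sort_uniq String.compare

let example =
  "type t = { x : int } [@@deriving show, eq]\ntype u = A | B [@@deriving show]"

let () = List.iter print_endline (required_ppxs example)
```

Keeping the scan (names only) separate from the lookup (database) matches the split described above: the first part needs no knowledge of any rewriter, and the second can live entirely in the build system.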

2 Likes

Fair enough: I certainly would prefer this. What I meant though, is that I think before adding support to the opam-file, we ought to implement “ppxdep” and see what it looks like, what complexity is required in the database. Also, there’s the slight problem that there can be more than one PPX rewriter that support particular attrs/extensions. Not sure how that gets dealt with. All of that will have to be hashed-out, so best to experiment before modifying opam.

But it’s a good idea, to automatically infer which PPX rewriters are required.

In the case of a conflict, I would imagine that we list all the available options to the user.

There would be no need to experiment with this. The opam maintainers allow us to add arbitrary metadata to packages via x-* attributes. I think this should be all we need.

1 Like

Followup: The just-announced DkML 2 has refl automatically available in the global environment (see the “[ANN] DkML 2.0.x Releases” announcement). The Quick Start shows what can be done in utop-full immediately after installation. When it is ready, I also plan to make @dbuenzli’s typegist available for folks who want more control (performance, no PPX, etc.).

2 Likes

Regarding the opinion that type representations could break abstraction: I don’t think this would be a problem in practice. A library which doesn’t want to expose the internals of an abstract type should not export a representation for this type, only the printers, comparators, generators etc. that the library intends to offer.

Failure to export a useful printer, comparator, etc. should be treated as a defect of the library, much like today a library can fail to provide some useful functions on an abstract type.
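
As an illustration of that discipline, here is a small sketch of a library module that keeps its type abstract and exports only the operations it wants to offer. The Account module and its fields are invented for the example; nothing here depends on any particular type-representation library.

```ocaml
module Account : sig
  type t  (* abstract: no representation of t is exported *)
  val make : owner:string -> secret:string -> t
  val pp : Format.formatter -> t -> unit
  val equal : t -> t -> bool
end = struct
  type t = { owner : string; secret : string }
  let make ~owner ~secret = { owner; secret }
  (* The exported printer deliberately leaves out the secret field. *)
  let pp ppf a = Format.fprintf ppf "<account %s>" a.owner
  let equal a b =
    String.equal a.owner b.owner && String.equal a.secret b.secret
end

let a = Account.make ~owner:"alice" ~secret:"hunter2"
let () = Format.printf "%a@." Account.pp a
(* <account alice> *)
```

Clients can print and compare accounts, but since no representation value for Account.t is exported, a generic function has nothing to inspect.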

We could also have a way to “customize” parts of the runtime representation of a type through a mechanism based on attributes, like what is done with ppx_deriving. This would make it possible to “hide” some “sensitive” parts of a type, if need be.

Such a standard type representation would make it possible to replace some use cases of Ppxlib with type-safe alternatives. In addition, type representations could be used in PPXes themselves to replace AST pattern-matching. The maintenance of Ppxlib has a non-negligible cost for the community (shout out to @pitag), so in the long term, phasing out the parts of its API that can be replaced with manipulation of type-representing values is very interesting, I think.

1 Like