Using Ppxlib with Module Signatures

ngernest · April 3, 2023, 6:29pm

Hi all! I’m currently working on an automated tool for property-based testing of OCaml modules that uses the Core.Quickcheck library — I’m trying to write a PPX deriver that will take in a module signature, and derive appropriate ADT / function definitions for PBT purposes.

I’ve looked into using Ppxlib.deriving and Ppx.metaquot, although it appears that Ppxlib doesn’t have the ability to generate code from a module signature (based on this blogpost). I was wondering if anyone knows of any PPX derivers that take in an AST corresponding to a module signature and add more AST nodes, or if we should be using other PPX libraries (eg. ppx_deriving) to write the preprocessor instead? Thanks!

Chet_Murthy · April 4, 2023, 12:34am

In principle, it should be impossible for a type-deriver applied to a typedecl in a module signature to generate code. A type-deriver is supposed to act only locally, inserting items immediately after the typedecl it’s acting upon (due to an annotation). Heck, it’s not supposed to modify the typedecl, even. So since it’s restricted to adding items immediately after the “item” that is the typedecl, it follows that in a structure, a type-deriver acting on a structure-item typedecl can only produce structure items. And respectively in a signature, a type-deriver acting on a signature-item typedecl can only produce signature-items.

What are you trying to do? Can you provide an example?

Chet_Murthy · April 4, 2023, 1:41am

After some thought: maybe you’d like to produce a structure (a list of structure-items) based on a type-decl that is found in a signature? That could be accomplished by just putting the type-decl into a .ml file (as a structure-item). And since that would entail two copies of the same type-decl, you could use ppx_import to copy the type-decl from the signature/module-type, into the .ml file (structure). And then apply the type-deriver there ?

Maybe that’s what you’re looking to do?

ngernest · April 4, 2023, 1:51am

Thanks so much for the reply! We’ll look into using ppx_import.

We’re currently trying to create something along the following lines:

Suppose we have a module signature StackIntf (for a stack data structure):

module type StackIntf = sig
  type 'a t
  val push: ‘a -> ‘a t -> ‘a t
  val pop: ‘a t -> ‘a option
  val peek: ‘a t -> ‘a option
  val is_empty: ‘a t -> bool 
  val length: ‘a t -> int
end

We were hoping to create a PPX deriver which would take in the module signature and generate a module of the following form:

module StackTest = struct

  (** ADT denoting symbolic commands *)
  type cmd =
    | Push of char
    | Pop
    | Peek
    | Is_empty
    | Length 
    [@@deriving sexp_of, quickcheck]

  (** Generator of symbolic commands 
      (takes in a state argument for state-dependent command generation) *)
  let gen_cmd (st : state) : cmd Generator.t = ...


  (** Given a command [cmd] and the current state [st], move 
      the model's state to the next state by interpreting [cmd] *)
  let next_state (cmd : cmd) (st : state) : state =  ...

  (** Further property-based testing functions omitted**)
end

Here, the StackTest module property-based testing code akin to the QCSTM state-machine PBT library for OCaml, by defining an ADT representing symbolic commands (commands over the Stack with the same name as the Stack signature’s function names) and relevant QuickCheck generators for this ADT.

We were wondering whether this process could be automated (and applied to generic module signatures) by writing a PPX deriver.

Chet_Murthy · April 4, 2023, 1:58am

Oh how interesting. If you can provide more examples, I think one could assess the feasibility of your proposal. Looking at that QCSTM thing, I see some examples there. It seems like the hard parts are:

figuring out which parameter in each entrypoint is the “state” (as input and output)
figuring out how to “read out” the state to check for correctness

Then there’s the question of who’s going to build the test-scripts ?

But a priori, this seems like a doable thing. You aren’t looking for a type-deriver at all. Rather, just something that takes a signature and generates a full module of code. There’s nothing wrong with that – nothing at all.

P.S. ppx_import almost certainly won’t help. B/c all it does is import types and what you want to (uh) “transform” is entire signatures.

ngernest · April 4, 2023, 4:13am

Thanks for your reply! Here is another example.

Another capability we would like our tool to have is to take two modules that implement the same interface, and see whether they are behaviourally / observationally equivalent.
For example, suppose we have an interface SetIntf (representing finite sets):

module type SetIntf = sig
  type 'a t
  val mem : 'a -> 'a t -> bool
  val add : 'a -> 'a t -> 'a t
  val size: 'a t -> int
  val union: 'a t -> 'a t -> 'a t
  val inter: 'a t -> 'a t -> 'a t
end

Also suppose we have two modules respectively containing different implementations of SetIntf (eg. one implementation uses binary search trees, the other uses lists without duplicates).

We would ideally like to be able to generate a functor CompareSetImpls that takes in two modules of module type SetIntf and returns a test harness module as follows:

(** Functor that returns a test harness 
    comparing two modules that both implement the [SetIntf] signature *)
module CompareSetImpls (A : SetIntf) (B : SetIntf) = struct 

    type state = Set.M(Int).t

    (** The respective types of the systems under test (SUTs) *)
    type sutA = int A.t 
    type sutB = int B.t

    (** Generic SUT type *)
    type sut = AImpl of sutA | BImpl of sutB 
    
    (** Symbolic commands for the set ADT *)
    type cmd = 
      | Mem of int 
      | Add of int 
      | Remove of int 
      | Union 
      | Intersection
      | Size 
      | Is_empty 
      [@@deriving sexp_of, quickcheck]

     (** Generate symbolic commands based on the model's current state 
      *  TODO: figure out what to do about binary operations eg. [Union] *)
     let gen_cmd (st : state) : cmd Generator.t = ...

    (** Given a command [cmd] and the current state [st], move 
      the model's state to the next state by interpreting [cmd] *)
    let next_state (cmd : cmd) (s : state) : state = ...

    (** Interprets the symbolic command [cmd] over the two SUTs,
        & checks whether the result of interpreting [cmd] over [sutA] 
        & [sutB] agree with each other. 
        Here, [st] refers to the model's state prior to executing [cmd]. *)
    let compare_cmd (cmd : cmd) (st : state) (sutA : sutA) (sutB : sutB): bool = ...

    (** Checks whether preconditions hold for a given command and state *)
    let precond (cmd : cmd) (st : state) : bool = ...
end

This second example may potentially be too complicated for automation purposes.
In this example, ideally the compare_cmd function would be able to compare the state of the two implementations to check for correctness by examining some property across the two. For example, to check that the add operation for both Set implementations works, we would check that the size of both Set implementations increases by 1 after calling add.

However, I agree that figuring out which parameter in each entry point is the “state” will be challenging (especially for binary operations such as union & intersection in the example above). Hopefully our tool will feature some ability to build test-scripts (although this is very far down the line and not fully well-formed).

Chet_Murthy · April 4, 2023, 4:31am

More questions:

I assume that all these states are immutable?
Do you have fully-worked examples you can reference? So I can see how you intend to use these generated functions ?
It would be useful to understand how you will examine a state – e.g., in “compare_cmd” how you will compare these two states sutA and sutB.
I would think that (in your set example above) mem and size should be inspection operations, and add, union, inter (and probably “init”/“empty”) should be the operations ?

octachron · April 4, 2023, 7:17am

Overall, you are not describing a ppx but a code generator. In particular, if you want to generate code from module types, the good starting point is not the parsetree but the module type itself (that you can obtain using compiler-libs).

Starting from there, using ppx.metaquot can be a good way to generate valid parsetree before either marshalling the AST or printing it as text.

ngernest · April 4, 2023, 3:30pm

Thanks @Chet_Murthy for the questions!

(1 & 2) I assume that all these states are immutable? Do you have fully-worked examples you can reference? So I can see how you intend to use these generated functions ?

Yep you’re correct, we’re currently working with purely functional data structures for now. This repo contains some tentative worked out code for the aforementioned Set & Stack examples. (There are some low-level particulars eg. shrinking for QuickCheck generators which need to be worked out)

The specifics of our implementation are still up in the air, but hopefully I can report back with a link to a research poster in a few months’ time when this project is complete.

(3 & 4) In compare_cmd how will you compare these two states sutA and sutB? I would think that (in your set example above) mem and size should be inspection operations, and add, union, inter (and probably “init”/“empty”) should be the operations ?

Here’s a tentative implementation of compare_cmd, specialised to the Sets example. Essentially, we pattern match on the command, and use inspection operations (eg. mem, size, is_empty) that don’t depend on the underlying representation to compare the behaviour of the two modules.

We think operations that return some 'a t (eg. add, union) should be the operations (since they modify the underlying state), while functions like mem and size that return some type (eg. bool or int) independent of the underlying representation should be the inspection/observation operations. Ideally, when comparing two implementations of the same signature, we would only rely on the observation operations instead of interrogating the underlying state of the two modules.

ngernest · April 4, 2023, 3:34pm

Thanks @octachron for the feedback!

if you want to generate code from module types, the good starting point is not the parsetree but the module type itself (that you can obtain using compiler-libs).

Would you happen to have a pointer to the appropriate compiler-libs tools for obtaining the module-type? Thus far, I’ve been using Parsetree & Ast_helper to inspect the AST (and trying to find the module type therein), but perhaps there are better tools in compiler-libs for dealing with modules (I’d be grateful to hear about them thanks!)

octachron · April 4, 2023, 4:00pm

Typemod.type_implementation and Typemod.type_interface are the entry point for typing compilation units. Note that it might be even better to start from cmt or cmi files since they already contain all the type information (without needing to retrieve the build context). Reading cmi (resp. cmt files) files can be done with Cmi_format.read_cmi (resp. Cmt_format.read_cmt).

ngernest · April 4, 2023, 5:25pm

Thank you! I was wondering if you had any pointers to resources / example code for learning how to use Typemod and Cmi_format?

I tried looking online for various tutorials and found:

Jun Furuse’s compiler-libs-hack example repo from 2013
This blog post from 2013

However, I wasn’t sure if there are more updated sources that I should refer to.

Thank you!

pitag · April 4, 2023, 6:07pm

Btw, just as a side-note coming back to your first question/statement:

The problem about blog posts is that they get out of date: the post you’re referring to is from before ppxlib added support for derivers on module types. Here’s the up-to-date API documentation, which includes support for module types. However, you seem more interested in generating new modules than in deriving code inside a module, so this is just a side-note.

ngernest · April 4, 2023, 6:14pm

Thanks for the info, this is helpful! I’ll take a look at the source code for the add function in the Ppxlib.Deriving module.

Topic		Replies	Views
PPX: `[@@deriving fold_sig]` that generates a signature and a functor Ecosystem	18	680	February 7, 2023
Deriving make for abstract types in signatures Ecosystem ppx_deriving , code-generation	2	1032	September 13, 2018
Writing my own type-deriver Ecosystem ppx , core	3	1221	September 26, 2017
Ppxlib: preprocess structure and signature item simultaneously Learning ppx , ppxlib	1	718	June 29, 2021
Ppxlib.metaquot and identifiers Learning ppx	5	589	November 13, 2023

Using Ppxlib with Module Signatures

Related topics