[Solved] Piping into constructors

I often write code that uses pipes, and sometimes have constructors on the end of those pipes:

type val = DInt of int

let myfunc : val = 5 |> DInt

This was all well and good when I was using Oasis. When I try to use jbuilder to build instead, I get this error:

Error: The constructor DInt expects 1 argument(s),
       but is applied here to 0 argument(s)

If I amend my code to:

let myfunc = DInt 5

then everything works great.

Since I like to pipe into constructors and was very happy doing it, my question is this: what was Oasis doing that allowed me to do that, and that jbuilder does not?

This was my _oasis file:

Name: ConstructorWeirdness
Version: 0.1
License: PROP

OASISFormat: 0.4
BuildTools: ocamlbuild
Plugins: DevFiles (0.4)

Executable "constructorWeirdness"
  Path: src
  MainIs: main.ml
  BuildDepends:
    cohttp.lwt,
    yojson,
    core,
    ppx_jane,
    threads,
    postgresql

  # use byte because osx wants us to click a button every time
  CompiledObject: byte

Here’s my jbuild:

(jbuild_version 1)

(executable
  ((name main)
    (public_name main)
    (libraries (core cohttp.lwt yojson threads ppx_jane postgresql))))

Any thoughts? Thanks!

1 Like

I suspect you will not be able to reproduce this. In OCaml constructors are not functions.

(Also, please be careful not to abuse the pipe operator; it sometimes makes code more obscure than enlightening.)

1 Like

I don’t see how your code could have compiled. There are at least two errors: val is a keyword and OCaml doesn’t allow constructors to be used as functions. Your build tool wouldn’t matter here.

The reason is that your oasis setup applies the ppx rewriter ppx_jane, whereas your jbuild setup does not. Let me explain the details:

The package ppx_jane bundles up a bunch of ppx rewriters, one of which is ppx_pipebang. This rewriter syntactically replaces x |> f with f x. As a result, the line

let myfunc : val = 5 |> DInt

gets rewritten to

let myfunc : val = DInt 5

which is valid OCaml (forgetting for a moment that val is a keyword).

In your oasis setup, ppx_jane is handled correctly, thus the rewrite gets applied and compilation succeeds.
In your jbuilder setup however, you include ppx_jane as a library (rather than a ppx rewriter), and so the rewrite never happens. Thus compilation fails, because 5 |> DInt is not valid OCaml (since constructors are not functions in OCaml).
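To make the rewrite concrete, here is a minimal sketch of what compiles without any ppx rewriter. The type is renamed to `value` (an assumption made here, since `val` is a keyword), and wrapping the constructor in an anonymous function is a common workaround, not something Oasis or jbuilder does for you:

```ocaml
(* `value` is a stand-in name; `val` itself is a reserved keyword. *)
type value = DInt of int

(* Plain application: what `5 |> DInt` becomes after the syntactic rewrite. *)
let direct : value = DInt 5

(* Without a rewriter, wrap the constructor in a function before piping. *)
let piped : value = 5 |> (fun x -> DInt x)

let () = assert (direct = piped)
```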

As an aside, the fact that constructors are not first-class citizens in OCaml is somewhat annoying and has no conceptual justification. As far as I know, it’s a design choice to simplify the implementation / improve performance.

6 Likes

Thanks for pointing out this package. I didn’t recall it. What’s the point of this? I vaguely recall there was some performance hit when using |>. Is that the reason?

From what I remember of the explanation, it wasn’t even for performance. In fact, it existed in OCaml’s predecessor, caml-light. The feature simply didn’t seem useful enough to the designers to include it in OCaml, which in retrospect seems really odd. Not only that, it’s near-impossible to insert this feature in a backwards-compatible way, which means OCaml is most likely never going to have it.

Thanks @smolkaj, that did indeed fix it!

Here’s the jbuild file that works:

(jbuild_version 1)

(executable
  (
    (name dark)
    (public_name dark)
    (modes (byte))
    (libraries (core cohttp.lwt yojson threads postgresql))
    (preprocess (pps (ppx_jane)))
  )
)

That said, while I absolutely love constructors as functions, I’d be wary of using a ppx rewriter which subtly changes the semantics of the code in a very surprising way. Maybe it would be better to use a deriving plugin which generates functions for constructors and then has the regular OCaml semantics?

3 Likes

Unfortunately, you cannot define functions that begin with an uppercase letter (and thus, have the same syntax as constructors) in OCaml. If you’re happy with lowercase first-class constructors, ppx_variants_conv can derive them for you automatically.
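As a rough sketch of what such lowercase constructor functions look like, they can also be written by hand. The type `value` and the names `dint` and `dstr` below are illustrative assumptions; the exact output of ppx_variants_conv may differ in details:

```ocaml
type value =
  | DInt of int
  | DStr of string

(* Hand-written lowercase counterparts of the constructors. *)
let dint (i : int) : value = DInt i
let dstr (s : string) : value = DStr s

(* Being ordinary functions, they can be piped into and passed around. *)
let five : value = 5 |> dint
let words : value list = List.map dstr [ "a"; "b" ]
```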

Again, there is nothing deep here. Morally, constructors should be functions, and they are in more academic functional languages such as Standard ML or Haskell. OCaml is a more pragmatic language, and it’s willing to sacrifice strict mathematical aesthetics here and there if it allows for better performance / convenience. IMHO, that’s also what makes it a much more usable language than Standard ML and others, so I can live with it even though the lack of first class constructors is ugly.

EDIT: That said, I would be curious to hear what the rationale behind this design choice was. I cannot believe it wasn’t thought to be useful – who wouldn’t appreciate being able to write List.map ~f:Some, for example?
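For the `Some` case specifically, two workarounds exist in plain OCaml today. This sketch uses the standard library's `List.map` (so no `~f` label) and `Option.some`, which assumes OCaml 4.08 or later:

```ocaml
(* Eta-expand the constructor by hand. *)
let wrapped : int option list = List.map (fun x -> Some x) [ 1; 2; 3 ]

(* Or use the stdlib function that wraps the Some constructor. *)
let wrapped' : int option list = List.map Option.some [ 1; 2; 3 ]

let () = assert (wrapped = wrapped')
```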

1 Like

Do you have any idea what changes to the compiler would be necessary to introduce to make constructors functions?

Xavier Leroy provided his view on constructors as functions in this old email thread:
http://caml.inria.fr/pub/ml-archives/caml-list/2001/08/47db53a4b42529708647c9e81183598b.en.html

6 Likes

Thanks for finding that. As I remembered, the reasoning was weak. I misremembered that caml-light had curried constructors, though.

1 Like

The problem is one of syntax. Since we currently require parentheses to accept multiple arguments, we can’t just switch to accepting curried arguments instead. It would have to be a switch like -safe-string, and many core devs aren’t satisfied with that transition yet, and wouldn’t want to start a new one. Also, trying to support both syntaxes at the same time (parentheses and currying) gets really hairy, especially since we allow dropping the parentheses for a single argument.

Seems to me that the issue is really with the definition of constructors. If the syntax were

type t =
  | C1 : int -> int -> t
  | C2 : int*int -> t  

then Xavier’s second point would disappear.

You’re intruding on GADT territory with this syntax, and I’m not experienced enough to comment on this. It bears remembering that constructors aren’t just construction functions – they’re also used to destruct and pattern match, and I’m not sure how pattern matching would work with your example. Also,

type t =
    | C1 of int * int
    | C2 of (int * int)

aren’t the same. The first is a constructor of 2 numbers, while the second is a constructor that takes a pair. It makes more sense to me to make the first one curriable in the language, but that would require a radical change.
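The difference shows up as soon as an existing pair is passed to each constructor. A small sketch (the names `p`, `a`, `b`, and `sum` are illustrative):

```ocaml
type t =
  | C1 of int * int
  | C2 of (int * int)

let p = (1, 2)

(* C1 needs its two arguments written out at the application site. *)
let a = C1 (1, 2)

(* C2 takes a single argument that happens to be a pair. *)
let b = C2 p

(* let bad = C1 p   is rejected: C1 expects 2 arguments, not 1 *)

let sum = function
  | C1 (x, y) -> x + y
  | C2 q -> fst q + snd q
```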

The metatheory behind algebraic data types is well-studied in the programming language literature and well-understood. What I’m proposing here has nothing to do with GADTs, I’m just proposing a different syntax for algebraic datatypes. (By the way, it’s not my invention, it’s standard for example in the Coq proof assistant.)

That said, the same syntax can be used to define GADTs, too. That’s another reason why the syntax is nice: things are more uniform that way, and it becomes clear syntactically that algebraic data types are indeed just a subset of the generalized algebraic data types.

Your example gets exactly to my point. The int * int in C1 of int * int is not a tuple type, but syntactically it looks like one. With the GADT syntax, the difference between C1 and C2 becomes obvious and explicit. And no one would expect C2 to be curried with that syntax.
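For comparison, OCaml's existing GADT-style declarations can already express both constructors, although only in the tupled form; the curried `int -> int -> t` from the proposal is, to my understanding, not accepted by the compiler. A minimal sketch (the names `pair`, `x`, `y`, and `first` are illustrative):

```ocaml
type t =
  | C1 : int * int -> t    (* two arguments *)
  | C2 : (int * int) -> t  (* one argument, a pair *)

let pair = (1, 2)
let x = C1 (1, 2)
let y = C2 pair

(* let z = C1 pair   is rejected: C1 expects two arguments *)

let first = function
  | C1 (a, _) -> a
  | C2 q -> fst q
```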

3 Likes

FWIW there’s also ppx_curried_constr that aims to provide this feature.

EDIT: sorry to bump this thread, I didn’t notice the date.

1 Like