The answer is probably: "no, because of value restriction"

I’m translating some Haskell code into OCaml and I think I’m bumping into the value restriction. Here’s a simplified case that’s bugging me:

type ('a, 'b) t = C of ('a -> 'b)
let id (* : ('a, 'a) t *)= C (fun x -> x)
let make f = C f
let weak_id (* : ('_weak1, '_weak1) t *) = make (fun x -> x)

By using the type constructor, I can build polymorphic values, but if I define the same value through a function it has weak type variables. I’m just asking if there’s some way to recover polymorphism when using functions in this case, to avoid copy-pasting too much code…

The answer to the value restriction is usually to eta-expand, in this case though you actually want to capture the closure inside the C data constructor so the only option left is to make weak_id a function:

let weak_id () = make (fun x -> x)
(* val weak_id : unit -> ('a, 'a) t *)

Note that it’s not super idiomatic in OCaml to use single-constructor variants like this; typically we’d just alias the type itself, which gets rid of a lot of incidental complexity. If you need to make the type abstract, you can use a module:

module M : sig
  type (-'a, +'b) t
  val make : ('a -> 'b) -> unit -> ('a, 'b) t
end = struct
  type ('a, 'b) t = 'a -> 'b
  let make f () = f
end

Thanks @yawaramin for the suggestions! I wanted to avoid the first one to maximize sharing of values in memory (I’ve simplified the code too much for that aspect to appear). As for the second suggestion, I am not sure how it solves the problem:

# let x = M.make (fun x -> x);;
val x : ('_weak1, '_weak1) M.t = <abstr>

I’m starting to think I might have to give up the idea of a direct translation of the original Haskell code into OCaml…

Sorry, you’re right, I’ve updated my suggested code. But it is using the () trick to turn it into a function.

If you want to preserve sharing, you can do something along these lines:

type ('a, 'b) t = C of ('a -> 'b)

type endo = { f : 'a . 'a -> 'a }
type homo_t = { x : 'a . ('a, 'a) t }

let make ({ f } : endo) : homo_t = { x = C f }

let id_record = make { f = fun x -> x }

let id (* : 'a . ('a, 'a) t *) = id_record.x

It is not possible to generalize make (fun x -> x) right now in OCaml, because the type checker is not tracking purity of expressions, i.e., it doesn’t know that make f is C f and assumes that make is a arbitrary function of type ('a -> 'b) -> ('a,'b) t, which, in particular, could be a partial application, that hides a reference to the ('a -> 'b) parameter inside, so that ('a -> 'b) type could escape the scope of the weak_id definition, therefore it must not be generalized.

We can see, however, that let id = C (fun x -> x) is generalized. This is because OCaml knows that constructor applications are pure expressions. The same is true for let id = {f = fun x -> x} provided that we have ('a,'b) t = { f : 'a -> 'b } type.

Therefore, right now the only way to prove to the type checker that your expression is pure, is to use a constructor. You can use the private keyword if you want more abstractness and control on the module invariants, e.g.,

module M : sig 
   type ('a,'b) t = private {f : 'a -> 'b }
end = struct 
   type ('a,'b) = {f : 'a -> 'b}
end

and expect the users of your interface to say M.{f=x} instead of M.make x to retain the purity.

With that said, the work on tracking the purity is ongoing and there is an RFC PR that more or less tracks the work. It is quite problematic, though, to retrofit this feature into the existing language.

I would argue with that. At least in the code that I am writing, it is idiomatic as type alias is not a type definition and I prefer to give types to everything. An immediate bonus is that it helps the type checker give you precise types and provide error messages relative to the problem location, without requiring any type annotations. I, however, prefer to use records for such cases, mostly for syntactic convenience.

1 Like

That’s an argument that I can understand but, in this case, do you use [@unboxed] annotation to avoid indirection? I have a vague memory of an article from @gasche (maybe after a Marrakech retreat) where he explained that in some case this annotation can make the code slower, which seems a litlle bit odd. Do you have some experience with this ?

2 Likes

It’s about angstrom and this type definition:

type 'a t =
  { run : 'r. ('r failure -> ('a, 'r) success -> 'r State.t) with_state }

It seems that a use of [@@unboxed] slows down the process (in this case, the parsing) due to the unnecessary allocation of closures. I found only this issue but I did not find materials about that.

2 Likes

Yes, I do use [@@unboxed] and yes, when you use it on a type definition that is a function, it will indeed be a function so

type ('a,'b) fn = Fun of ('a -> 'b) [@@unboxed]

is only different from

type ('a,'b) fn = 'a -> 'b

from the perspective of the type checker, and indeed, when you have a definition

let add x = Fun (fun y -> x + y)

then the compiler will interpret it as

let add x = fun y -> x + y

which is the same as

let add x y = x + y

which will be later transformed by the compiler into an uncurried form,

(def add (lambda (x,y) (+ x y)))

And as a result all calls to add x will create a closure, the same as with the vanilla definition of let add x y = x + y. Which is perfectly fine and reasonable and should be expected, unless you hit corner cases such as demonstrated in angstrom, where compiler’s uncurrying led to functions of so big arity that the number of arguments became larger than the number of available registers, therefore the compiler had to fall back to spilling arguments to the stack, which resulted in a severe loss of performance. To solve this problem is is possible to apply staging, e.g., this trick will disable uncurrying1:

let add x = (); Fun (fun y -> x + y)

1) Well, no guarantees applied, as at some point the compiler may decide to remove (); as a pure no-op. But this is used largely in OCaml code base and for that particular reason.

5 Likes

It is if it’s abstract, right? Compiler enforces it as a separate type.

From the perspective of the module user, yes. From the perspective of the module implementer, no. So I am making types like type name = {name : string} for my own sake, that makes the code more readable and types more descriptive, e.g., imagine

let output_name {name} = print_endline name 

which has type

name -> unit

instead of more generic,

string -> int

that can easily hide bugs in the code.

Indeed this was the type in question. The explanation in the issue’s comment, and the @ivg one above, make sense. I was surprised at this time because I didn’t think that there could be such corner case with unboxing.

By the way, I found the blog post from @gasche, and I’ll quote him:

There is a dirty secret about [@@unboxed] : despite what most people think, it is extremely rare that it can make programs noticeably faster, because the GC is quite fast and combines allocations together – many allocations of a boxed object are combined with an allocation for their content or container, and and often the indirection points to an immediately adjacent place in memory so is basically free to dereference. It may help in some extreme low-latency scenario where code is written to not allocate at all, but I have never personally seen a program where [@@unboxed] makes a noticeable performance difference.

That is, until angstrom . Adding [@@unboxed] to this record field makes the program noticeably slower.

Morality of history : should we use [@@unboxed]?

2 Likes

Of course we should, again angstrom is really a specific corner case of a very specialized library. And while I agree with the message in general, I was using type ('a,'b) fn = Fun of ('a -> 'b) long before [@@unboxed] hit the mainstream OCaml without fearing that it will slow down my programs, well except the case when the constructor parameter was immediate, as without relying on unboxing types like type nat = Nat of int will incur a lot of performance loss.

Using [@@unboxed] maybe won’t make your program super fast but it will in general improve its health. It will produce less junk, thus the minor heap will fill in slower and there will be fewer major cycles. The only argument against the [@@unboxed] directive is that it is a directive and a noise about the inner implementation detail, it disturbs attention of a reader and a pretty low-level thing. So, instead we should use -unboxed-types command line option of the compiler and use the [@@boxed] attribute when low-level details matter.

And just to make it clear, in angstrom it reduced the memory footprint, yes the resulting program was slower, but just because of a switch from a chain of closures to a one big function that was slow. But this problem is very specific to the angstrom design and shall not be generalized.

4 Likes