OCaml compiler development newsletter, issue 6: March 2022 to September 2022

The Jane Street compilers team is hard at work on a number of fronts, though perhaps less visibly than in the past. Previously, we worked on a feature and got it merged upstream without releasing it internally. This meant that our colleagues would have to wait for a main trunk release of OCaml before getting the new feature, a turnaround time that was sometimes too slow. In order to decrease latency between feature conception and use, we have now changed to a new workflow where we develop and release features internally while we work to upstream.

Not only does this new workflow get features into the hands of our developers faster, it also allows us to improve the final product:

  • The upstreaming discussion can be informed by actual practice by the 700+ OCaml developers within Jane Street. We can, for example, examine developer adoption rates and performance impact with ease.
  • Because we can access the entire code base where a feature is deployed, we can correct design mistakes with low cost. For example, if a feature proves too confusing to use in practice, we can change its syntax. Indeed, Jane Street regularly makes broad updates to its code base, and this kind of change fits within our workflow. For main-trunk OCaml, this means that features are more fully tested than they could be otherwise.

Upstreaming – contributing back to an open-source project and working in a language that reaches beyond our walls – remains a core value for the compilers team. (Indeed, one of my explicit job responsibilities at Jane Street is to help facilitate this process.) In addition, all our compiler work is fully open-source.

That said, we have a number of features we’re excited to share progress on, listed below. Expect to see proper proposals for these (posted to the RFCs repo) in due course.

Local allocations, implemented by Stephen Dolan and Leo White. This adds support for stack-allocated blocks, enforcing memory safety by requiring that heap-allocated blocks never point to stack-allocated blocks (and stack-allocated blocks never point to shorter-lived stack-allocated blocks). Function arguments can be annotated as local_ to say that the function does not store that argument. For example,

val map : 'a list -> f:local_ ('a -> 'b) -> 'b list

says that map does not store the function it is passed anywhere in the output, allowing the function closure to be stack-allocated. Stack-allocated record, variants, etc., are also possible. See also Stephen Dolan’s talk at ML’22.

This is a large addition to OCaml’s type system, and we’re still learning about how to use it best. We are actively learning from its deployment within Jane Street to influence the final version of this feature for upstreaming.

include functor, implemented by Chris Casinghino. This syntactic extension allows a module to include the results of applying a functor to the prefix of the module already written. For example:

module type Indexed_collection = sig
  type 'a t
  val mapi : 'a t -> f:(int -> 'a -> 'b) -> 'b t
end

module Make_map(M : Indexed_collection) : sig
  val map : 'a M.t -> f:('a -> 'b) -> 'b M.t
end = struct
  let map t ~f = M.mapi t ~f:(fun _ -> f)
end

module List = struct
  type 'a t =
  | Nil
  | Cons of 'a * 'a t

  let mapi t ~f = ...

  include functor Make_map
end

let evens = List.(map (Cons (1, Cons (2, Cons (3, Nil)))) ~f:(fun x -> x*2))

Despite being “just” a syntactic convenience, this has already received wide uptake within Jane Street.

Comprehensions, implemented by Antal Spector-Zabusky. This adds support for both list and array comprehensions, such as

let pythagorean_triples_up_to n =
  [ a,b,c
    for a = 1 to n
    for b = a to n
    for c = b to n
    when a * a + b * b = c * c ]

The feature works similarly for arrays, with [| ... |] syntax.

Immutable arrays, implemented by Antal Spector-Zabusky. This adds support for immutable arrays. These are just like normal arrays, but immutability ensures that the contents of the array do not change.

Unboxed types, with a very early implementation by Chris Casinghino and proposal by me (with the help of my colleagues). Stephen Dolan and I have given talks about the design.

Currently, all variables and fields must store values that are stored in a single machine word (e.g. 64 bits); furthermore, this word must either be a pointer to garbage-collected memory or have its bottom bit tagged to denote that the garbage collector should skip it. Unboxed types relax this restriction, allowing a single variable or field to hold structures smaller or larger than a word, or store a word that the garbage collector will know not to scan. In so doing, unboxed types effectively allow records and variants to be inlined into one another, enabling programmers to structure their compile-time abstractions differently than their run-time memory layout.

The core innovation is the notion of a layout, which classifies a type (much like a type classifies a value). A layout describes how a value should be stored and whether it can be examined by the garbage collector. By assigning layouts to abstract types and type variables, we can abstract over non-word layouts. Much more is in the proposal.

Polymorphic parameters, implemented and documented by Leo White. This feature allows function parameters to have polymorphic types. For example:

let select
  : 'b 'c. selector:('a. ('a * 'a) -> 'a) -> ('b * 'b) list -> ('c * 'c) list -> ('b * 'c) list
  = fun ~selector bbs ccs ->
    List.map2 (fun bb cc -> selector bb, selector cc) bbs ccs

The select function chooses either the first components or the second components of the input lists to comprise the output list. The selector function must be polymorphic, because the elements in the pairs in the input lists may have different types. (Note that 'b and 'c are universally quantified in the type of select, so we know that the type checker does not unify them.)

25 Likes