Jane Street OCaml compiler extensions

I’ve noticed some blog posts and YouTube videos about Jane Street’s OCaml compiler extensions, such as modal types. I read somewhere that some of these are already merged into their internal live branches.

But it’s hard to know what the latest news on this is.

I have a couple of questions:

  • Are these Jane Street compiler extensions built on top of 4.14 or 5.x?
  • What about dune support and merlin/ocaml-lsp support? How do engineers at Jane Street use the compiler with these extensions?

Thanks!

4 Likes

This fork of the opam repository contains their implementation of the compiler, which extends 4.14. (However, their Flambda2 work is currently based on 5.1.)

I think you should expect that some things are broken out of the box. For me:

  1. base installs okay, and dune installs okay (you get dune version 3.10)
  2. core installs if you get version 0.30.0 of ppxlib
  3. merlin-lib breaks, mostly due to implicit dependencies that ocamlopt apparently knows how to add automatically but the Jane Street fork does not (I made a PR for this last night)

Skim through this issue thread: one person went to the trouble of forking their repository and making an opam repository with working versions of all the key development packages.

Once you get anything installed at all, you can follow along with this YouTube video series: https://www.youtube.com/watch?v=PgUsmO0YyQc

I went through several of the videos yesterday and hollered out loud when I saw the allocation count for my code drop from 177k allocations to 2k. Very exciting!
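If you want to reproduce this kind of measurement yourself, allocation counts can be read off in stock OCaml from the Gc module. A minimal sketch (allocated_words is a hypothetical helper, not part of any library; the videos may use different tooling):

```ocaml
(* Count minor-heap allocation around a computation using Gc.minor_words,
   which reports the total words allocated on the minor heap so far. *)
let allocated_words f =
  let before = Gc.minor_words () in
  let result = f () in
  let after = Gc.minor_words () in
  (result, after -. before)

let () =
  (* Build 1000 boxed pairs: each cons cell and each tuple allocates. *)
  let lst, words =
    allocated_words (fun () -> List.init 1000 (fun i -> (i, i)))
  in
  Printf.printf "%d elements, ~%.0f minor words allocated\n"
    (List.length lst) words
```

This only counts minor-heap words, which is the number that drops when allocations move to the stack.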

Although it took about 15 minutes of tinkering with annotations to get it to that level of performance :smile:

6 Likes

You asked for recent news, and the last git commit to that repository is from 5 months ago. However, they’re obviously still working on it, as the latest YouTube tutorial is only 2 weeks old.

2 Likes

The repository for the compiler that Jane Street develops and uses is public: https://github.com/ocaml-flambda/flambda-backend.
It is currently based on 5.1, with the option of using a 4.x runtime (their 5.1 runtime also lacks some of the multicore features; I believe at least effects are not supported, and I’m not sure about multiple domains). It is being moved to 5.2.

Compatibility is mixed. Dune should work out of the box, but tools that depend on compiler libraries (typically merlin or ppx) need to be patched, and while Jane Street has written such patches for the tools they use they might not have all been published (and the ones that have been published may be out of date).

4 Likes

Thank you — very useful answers.

It seems to me there are hard compromises to be made if you want to use modal types today.

Ideally, there would be a nightly compiler and opam repository that let you use the whole stable OCaml 5.2 ecosystem without any trade-offs: effects, domains, and LSP support should all be available in addition to these new Jane Street extensions.

I wanted to use these features… but I’ll probably have to wait. Given the speed at which things like this evolve, I’m not optimistic; incorporation into the mainline is probably still years away.

Which is a pity, because I’ve been hoping to use a mainstream, mature GC language with linear types or similar features (which Jane Street seems to be working on; modal types are just one of several related things they seem to be planning).

My use case is systems programming, where linear types are very useful. It would be great to have linear types with the convenience of a GC.

3 Likes

What’s the issue with more recent dune releases? Can you file an issue about this?

1 Like

I have experimented with Koka, which is not really mature or ready for production use, but you may be interested in checking it out for systems programming. The main pitch is that it uses reference counting rather than tracing garbage collection. It has no runtime: it compiles to C, extended with a reference count for each variable, and deallocates a value once its reference count drops to zero. They also replace malloc with a custom allocator better suited to the many small allocations typical of functional programming. Their benchmarks show good performance compared to OCaml on red-black trees and similar functional data structures.

(Another major selling point is that the language is heavily based on effects but I don’t know enough about effect systems to comment.)

Koka opted not to use a linear type system because they were concerned about code duplication from having linear and non-linear versions of key functions. You can label a function as linear, and the compiler will check that it is possible to use the function in a linear way; but at the point of use it may behave linearly or non-linearly, depending on a dynamic reference-count check that decides whether it is safe to overwrite the variable. Arrays work the same way: an imperative update is in-place if there is no aliasing of the array (as determined by a dynamic reference-count check), and otherwise incurs an array copy. They call this immutability, for reasons that are not clear to me; I would call it a no-aliasing guarantee or something similar.
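The array behaviour can be illustrated with a small sketch. Koka makes the in-place-or-copy decision dynamically via the reference count; since OCaml has no such count, this hypothetical functional_set takes an explicit unique flag instead, which is only an approximation of the idea:

```ocaml
(* Sketch of Koka's "functional but in-place" update. [unique] stands in
   for Koka's dynamic check that no other reference to the array exists. *)
let functional_set ~unique arr i v =
  if unique then (arr.(i) <- v; arr)   (* sole owner: update in place *)
  else begin
    (* The array is aliased: preserve the old version by copying. *)
    let copy = Array.copy arr in
    copy.(i) <- v;
    copy
  end
```

Either way the caller observes a value-semantics update; only the cost differs.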

Numerical/array programming is a weak point (the support is really not there yet), but Koka seems potentially usable for systems programming tasks.

3 Likes

Etienne,

I honestly did not try a more recent version of dune; I just followed the instructions on their page, which specify dune 3.10 (see below). Since you are already creating a new opam switch with a new OCaml compiler, you should not conflict with an existing dune install unless you rely on newer dune features, I think.

# This may take some time
opam switch create 4.14.1-jst --repos janestreet-bleeding-with-extensions,default --packages ocaml-base-compiler=4.14.1-18,dune=3.10.0
eval $(opam env --switch 4.14.1-jst)
2 Likes

No problem; I just wanted to let you know that I wasn’t aware of an issue there.

1 Like

Hi! To add to the existing answers in this thread:

  • The with-extensions repository uses an old version of our compiler, lacking support for features we’ve added since.
  • Our current compiler at flambda-backend uses OCaml 5.1 with effects and domains disabled.
  • We open-source our patches to Merlin and OCamlformat on our GitHub. However, we don’t have an easy installation mechanism for them yet.

We are working on updates to the with-extensions repository, together with tooling and the necessary patches to external packages. We aim to start regular pushes using our latest compiler by September :slight_smile:

7 Likes

Liquidsoap also does a lot of very short-term memory allocation, which pushes the OCaml GC to its limits. The local keyword seems very, very interesting, but I think it would need to be upstreamed for us to consider it.
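For readers who have not seen it, here is a rough sketch of what the annotation looks like. Treat this as pseudocode: it compiles only with Jane Street’s extended compiler, and the exact syntax has changed across versions.

```ocaml
(* Pseudocode: requires Jane Street's extended compiler, not mainline OCaml. *)
let average a b =
  let local_ pair = (a +. b, 2.0) in  (* tuple is stack-allocated *)
  fst pair /. snd pair                (* the result escapes; the tuple does not *)
```

The compiler checks that a local_-annotated value never escapes its region, which is what makes the stack allocation safe.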

3 Likes

Hi @dkalinichenko! I just compiled flambda-backend, but it does not seem to include local. Is that extension still actively developed? Are there any plans to upstream it? Thanks!

1 Like

My bad! The syntax has changed. Looking at the doc now!

Reducing the allocation count by 98.8% sounds impressive, but we should be cautious in interpreting the results. This replaces 175k short-lived minor-heap allocations with 175k stack allocations, and the two allocation methods have a lot in common. From the paper (which explains this well), the benefits seem to come from second-order cache-locality effects: in the paper, the locality annotations do not make the functions faster by themselves; the speedup only appears when the computation is mixed with other work. “Avoiding allocations” is probably a bad mental model for understanding the effect of stack allocation.

Also, by reducing the number of short-lived minor allocations so drastically, this probably moves the GC’s position in the throughput-vs-latency space. A fair comparison should control for this effect using GC parameters.

(I wonder if the Liquidsoap use-case is that of large buffers one wants to exclude from GC, in which case this is in a different category altogether.)

4 Likes

Yes, they are both “allocations”, so in that sense they are the same, but stack “allocations” are, to me, fundamentally different…

Sorry, I have not read the paper, but wouldn’t you agree that stack allocations are much cheaper than OCaml’s GC-based allocations, simply because allocation is essentially just bumping a pointer and deallocation just subtracting from it? The stack itself is only properly allocated at the beginning of the program, to some initial size. Only when that initial size is exceeded might there be an “allocation” in the true operating-system sense (perhaps a guard page is touched, leading to an additional page being allocated to the stack, and so on).

The traditional wisdom is that the minor heap is cheaper, not more expensive, than stack allocation for short-lived values: on the minor heap, allocation is also just bumping a pointer, while deallocation is free (the cost of a minor GC is proportional to the amount of live values). But of course, over the years the costs have shifted from instructions to memory accesses…

(Note: I expect your stack allocations to be coalesced with the allocation of the stack frame, so they are free, except when you lose tail recursion. But again, the memory accesses are what matter.)
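The “deallocation is free” point can be observed directly in stock OCaml: after allocating many short-lived pairs and keeping none of them live, a forced minor collection promotes almost nothing, because only live values are copied to the major heap. A small sketch:

```ocaml
(* Allocate 100k short-lived pairs, keep none of them, then force a minor
   GC and see how little is promoted: dead minor-heap values cost nothing. *)
let () =
  for i = 1 to 100_000 do
    (* Sys.opaque_identity keeps the compiler from removing the allocation. *)
    ignore (Sys.opaque_identity (i, i + 1))
  done;
  let before = (Gc.quick_stat ()).Gc.promoted_words in
  Gc.minor ();
  let after = (Gc.quick_stat ()).Gc.promoted_words in
  Printf.printf "words promoted by the forced minor GC: %.0f\n"
    (after -. before)
```

The loop allocates around 300k words, yet the promotion count stays close to zero.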

2 Likes

I think the key factor is GC pressure. Many short-lived allocations mean that the minor heap fills up more quickly, which causes the heuristic that promotes objects to the major heap to fire sooner. Promotion is expensive, both in computational terms and for cache locality, and it also causes the next major GC to happen sooner.

2 Likes

I recommend controlling for these factors by adjusting GC parameters; otherwise we do not know whether the measured benefit comes from stack allocation or from changing the GC’s behaviour in a way that could be obtained more easily by changing GC parameters. Using a 256MB minor heap for throughput is not unheard of (cf. the Coq proof assistant).

I do believe in the benefits of stack allocation, but what I am saying is that one must be careful about what we measure.
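Concretely, in stock OCaml the minor heap can be enlarged either from the environment (OCAMLRUNPARAM=s=32M) or programmatically via the Gc module. A minimal sketch; the 32M-word figure (256MB on 64-bit, matching the Coq example) is just an illustration:

```ocaml
(* Enlarge the minor heap so minor collections become rare, as a baseline
   to benchmark against. minor_heap_size is in words, not bytes. *)
let () =
  let params = Gc.get () in
  Gc.set { params with Gc.minor_heap_size = 32 * 1024 * 1024 };
  Printf.printf "minor heap size: %d words\n"
    (Gc.get ()).Gc.minor_heap_size
```

If the speedup from stack allocation survives this baseline, it is more likely a genuine locality effect rather than a side effect of reduced GC activity.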

3 Likes

I don’t think that is a great solution. If huge minor heaps were the answer to many short-lived allocations, they would be common, and they’re not. First, memory consumption would be out of control, especially on OCaml 5.0+, where each domain needs its own minor heap. Second, once you do hit the minor-heap limit, you get a very long pause, because there is a lot of garbage to go through, none of it cache-local. Third, your general cache locality may suffer, since you are not reusing the same small area of memory but moving on to other regions. You can think of it as turning the minor heap into a kind of stack, but one that just grows until it fills up, without any of the additional knowledge that makes stacks efficient.

Basically, a stack is the best allocation scheme we know of: it completely mirrors execution flow, allows gradual allocation and deallocation, and tends to be very good for cache locality. It’s no coincidence that the fastest languages (Rust, C, C++) are also the ones that emphasize stack allocation.

1 Like

It’s a bit of both.

The program operates in short streaming cycles, say 0.02s. During each cycle, short-term values are produced. Most of them are meant to be disposed of quickly, but others may end up being buffered for the longer term.

Also, the nature of the data varies: it can be short chunks of audio PCM, currently represented as OCaml float arrays (more convenient for audio computations), or large Bigarrays/C memory allocations such as video frames.

I’d love to get more insight on how to handle this.

At the moment, I am considering a ref-counting API, at least for our large Bigarrays, so that we:

  • Declare their memory as managed externally.
  • Declare their allocated memory size to be 0, so as not to increase the GC’s pacing.
  • Dispose of the C memory when we know we’re done with it, and let the GC clean up the remaining empty shell whenever appropriate.

It seems that this would work, but it is tricky to track because of the immutable nature of values and the multiple places where we re-use/re-package them.
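A minimal sketch of such a ref-counting wrapper, in case it helps the discussion. All names here (rc, create, retain, release) are hypothetical; in the real use case, free would release the externally managed C memory behind a Bigarray, leaving the empty shell to the GC:

```ocaml
(* A value plus an explicit reference count and a finalization function. *)
type 'a rc = {
  value : 'a;
  mutable count : int;
  free : 'a -> unit;
}

let create ~free value = { value; count = 1; free }

(* Every re-use/re-packaging site must retain, so aliases are counted. *)
let retain t =
  t.count <- t.count + 1;
  t.value

(* Each consumer releases when done; [free] runs exactly once, when the
   last reference goes away. *)
let release t =
  t.count <- t.count - 1;
  if t.count = 0 then t.free t.value
```

The tracking difficulty mentioned above shows up here as the obligation to thread a retain/release pair through every place a value is aliased or re-packaged; missing a retain frees too early, missing a release leaks.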