Are the changes in Jane Street's OCaml fork to be upstreamed?

Hi, I have not followed OCaml development closely for a while, but I happened to find the repo ocaml-flambda/ocaml-jst: OCaml plus Jane Street extensions (github.com).

The changes seem to be extensive (including changes to concrete syntax), and upstreaming them may take a long time.
Are they planned to be upstreamed, or is this intended to remain a separate fork?

If so, what’s the main motivation behind this?
Note I don’t have opinions on this, but want to understand a little more about the technical assessment. Best – Hongbo

My understanding is that this is primarily a separate fork containing extensions that they use internally at Jane Street. Of course, some of these changes may find their way into the upstream compiler if they are submitted for consideration and go through the usual review process. But there is no explicit plan in this sense.

Cheers,
Nicolas

As far as I know, everything in there will eventually be upstreamed or dropped. Having a separate repo allows Jane Street to experiment seriously (and at scale) without bothering the upstream maintainers with early prototypes.
But given that upstream has limited time for reviewing, it will take a long time for all the features there to make their way upstream.
The first feature to be upstreamed is unboxed types; currently it’s in the RFC stage upstream (https://github.com/ocaml/RFCs/pull/34).

I took a look at the closed PRs and these language-level changes stood out to me:

  • unboxing
  • region-based stack allocation
  • immutable arrays
  • better support/guarantees for tail calls
  • polymorphic parameters (a bit like system F) – documentation
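
For context on the last item, here is a minimal sketch in plain upstream OCaml of the workaround that polymorphic parameters would presumably make unnecessary. Today an argument that must stay polymorphic inside the callee has to be wrapped, for example in a record with a polymorphic field. The names id_fn, apply and use_twice are made up for illustration, and the description of the ocaml-jst feature is my reading of it, not something taken from its documentation.

```ocaml
(* Upstream OCaml: a function argument is monomorphic inside the callee,
   so we wrap it in a record whose field is explicitly polymorphic. *)
type id_fn = { apply : 'a. 'a -> 'a }

(* [f.apply] is used at two different types in the same body. *)
let use_twice (f : id_fn) = (f.apply 1, f.apply "one")

(* Assumption: with polymorphic parameters, the record wrapper could be
   replaced by an annotation on the parameter itself. *)
let () =
  let n, s = use_twice { apply = (fun x -> x) } in
  Printf.printf "%d %s\n" n s
```

Without the record, applying the same parameter to both an int and a string would be rejected by the upstream type-checker.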

Jane Street engineer here (back from some holiday last week).

We hope to upstream everything that exists in the ocaml-jst repo. Of course, that is contingent on the ideas we’re developing there being accepted by the community, and we expect to engage in this upstreaming process with care and openness to new ideas. I’ll also admit that we have built up a little upstreaming debt (too many features we have developed with too little upstreaming), but we expect to pay this down over the coming year. Bottom line here: it is understandable that there would be some questions about our commitment to upstreaming given what can be seen externally – but we really are committed to it, with regular internal conversations about how best to proceed.

Why have ocaml-jst instead of just working upstream? For two reasons:

  1. Language design is hard. And we at Jane Street have a great opportunity to design new features, test them extensively in a realistic environment, and then change them. Because we have access to the entire code base where the features are deployed, we can even change concrete syntax relatively easily. So by developing internally, releasing internally, and then upstreaming with experience, we can be more confident that the feature design is correct.
  2. We get a faster turnaround between idea conception and internal deployment. Working solely with upstream, we would develop an idea, go through a long design discussion with upstream, implement, merge, wait for release, wait for the rest of Jane Street to be ready to upgrade, and upgrade. Now, we can implement an idea in parallel with its design, rolling it out internally in stages (as appropriate), and then upstream later. This is a big win for us, and well worth the extra time spent moving changes back and forth.

I can’t yet say with confidence that this is the right way to go – we’ve reaped the benefits of the split (earlier internal delivery of features) and not paid all the costs (upstreaming after-the-fact). But the benefits are big, so I think this approach is likely to stay.

This sounds reasonable, but is there a lingering danger of internally building up reliance on a feature that, after long upstreaming discussions, eventually gets rejected by the community?
What’s the contingency plan for such a case?

Maintaining the feature out-of-tree indefinitely?

Incidentally, as a curiosity, at LexiFi some non-trivial patches to the compiler have been maintained out-of-tree for more than twenty years!

Cheers,
Nicolas

It may be nice to have a jane-base-compiler in the opam repository (and perhaps a lexifi-base-compiler). I did the same for dkml-base-compiler at opam - dkml-base-compiler (weird; v3 link shows nothing) for cross-compiling, Android Arm32 support and other patches. Makes it easy for a broader set of people to test and perhaps contribute, although it requires some work with the opam team.

That’s an interesting idea, @jbeckford. I’m still newish to the OCaml ecosystem and can’t yet gauge how hard or easy this would be. I’ll bring the idea up with my colleagues.

Is the region-based stack allocation thing a Gc replacement?
How does it fare in production?

That would not be possible in OCaml because lifetimes in general are indeterminate (and ref-counting is a form of Gc). It is opt-in (only values with the right annotation / type).

Local allocations can be seen as a second GC, with static scope-based collection, and an invariant that the normal GC cannot point to the local GC.
Whether an allocation is local or not is determined during type-checking, and the type-checker is in charge of enforcing the relevant invariants (escape analysis and insertion of static collection). Annotations can force the type-checker to fail if a value cannot be allocated locally, and if no annotations are present the type-checker will use local types whenever it is safe to do so.
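
To make the escape rule concrete, here is a minimal sketch written in plain upstream OCaml, so it compiles with a stock compiler. The comments mark where a locality annotation would go in ocaml-jst. The `local_` spelling is how I understand that repo's syntax and should be treated as an assumption, and sum_pair and make_pair are made-up names.

```ocaml
(* [p] is built, read, and dropped inside the function: it never escapes,
   so it is the kind of value that could be allocated locally.
   Assumed ocaml-jst spelling: let local_ p = (a, b) in ... *)
let sum_pair a b =
  let p = (a, b) in
  fst p + snd p

(* [p] is returned to the caller, so it escapes its scope and has to live
   on the normal GC heap. With a local annotation on [p], the type-checker
   would be expected to reject this function. *)
let make_pair a b =
  let p = (a, b) in
  p

let () =
  Printf.printf "%d\n" (sum_pair 1 2);
  let x, y = make_pair 3 4 in
  Printf.printf "%d %d\n" x y
```

The difference between the two functions is only whether the pair outlives the scope that created it, which is exactly the property described above as being checked during type-checking.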

A key property of dkml-base-compiler is that it is strictly compatible with the upstream OCaml distribution, and hence is ok to be an opam package that satisfies the ocaml package dependency.

Other compiler variants (such as the Lexifi or Jane Street forks) would not be appropriate to fill in such a role, since they differ substantially from the upstream distribution. Nothing stops them from being in a package namespace that is clearly differentiated from ocaml of course, but there should never be any user confusion about whether or not they’re using an upstream compiler after doing an opam init.

Indeed - it’s “trivially” possible to set this up with a separate opam-repository (e.g. branches in the already existing janestreet/opam-repository and LexiFi/opam-repository!). The separation of the repositories ensures that users do not accidentally end up with a non-standard compiler. Using a different package name (as dkml-base-compiler does) is also good, as it makes it clearer when reporting problems that a different compiler is in use.

There are two things which could be usefully done in ocaml/opam-repository to aid such alternate repositories (done already for dkml-base-compiler and also for the pre-5.0 multicore ocaml compilers) in order to ensure that those alternate repositories only contain new packages and don’t have to override any existing ones.

As was done for dkml-base-compiler in ocaml/opam-repository#22719 - updating ocaml/opam-repository’s ocaml package to support it means that the ocaml package itself doesn’t have to be included in the fork (and so definitely stays in sync), and it also reserves the package name (e.g. jane-base-compiler, lexifi-base-compiler, etc.).

Before multicore was merged, but crucially after the plan to merge it was hatched, new base packages were added to ocaml/opam-repository to ease the use of the archived ocaml-multicore/multicore-opam ocaml-variants packages and to allow packages to be marked in a coherent way in ocaml/opam-repository as either not working with multicore (e.g. because of naked pointers) or requiring multicore (e.g. eio, domainslib, etc.). A package which depends on jane-base-compiler or lexifi-base-compiler needs to live in a separate repository (since no compiler in ocaml/opam-repository supports them). However, again, reserving the names of these compiler packages in ocaml/opam-repository would allow packages in ocaml/opam-repository which are known not to work with these forks to be marked as conflicting with them. That’s similarly useful as it stops the forked opam-repositories from overriding packages which already exist in ocaml/opam-repository. It does add a maintenance burden on that meta-data for new releases, but perhaps remembering to mark new versions of packages which conflict with forked compilers would be a maintenance task for the owners of the forked repositories.

It would be interesting to combine this with ocaml-5.
Unless variables are to be shared between threads, I don’t see why they should be under the control of a parallel GC.
If the number of variables under control of the region-based allocator is maximized, I guess it could improve performance. Maybe both in sequential and in parallel programs.