Are changes in the janestreet's ocaml fork to be upstreamed?

bobzhang · February 17, 2023, 7:51am

Hi, I have not followed OCaml development closely for a while, but happened to find the repo ocaml-flambda/ocaml-jst (OCaml plus Jane Street extensions) on GitHub.

There seem to be quite a lot of changes (including to the concrete syntax), and it may take a long time for them to be upstreamed.
Is it planned to be upstreamed or intended to work as a separate fork?

If so, what’s the main motivation behind this?
Note I don’t have opinions on this, but want to understand a little more about the technical assessment. Best – Hongbo

nojb · February 17, 2023, 8:57am

My understanding is that this is primarily a separate fork containing extensions that they use internally at Jane Street. Of course, it may happen that some of these changes find their way to the upstream compiler, if these changes are submitted for consideration and go through the usual review process. But there is no explicit plan in this sense.

Cheers,
Nicolas

vlaviron · February 17, 2023, 9:16am

As far as I know, everything in there will be eventually upstreamed or dropped. Having a separate repo allows Jane Street to experiment seriously (and at scale) without bothering the upstream maintainers with early prototypes.
But given that upstream has limited time for reviewing, it will take a long time for all the features there to make their way upstream.
The first feature to be upstreamed is unboxed types; currently it’s in the RFC stage upstream (https://github.com/ocaml/RFCs/pull/34).

lukstafi · February 17, 2023, 10:01am

I took a look at the closed PRs and these language-level changes stood out to me:

- unboxing
- region-based stack allocation
- immutable arrays
- better support/guarantees for tail calls
- polymorphic parameters (a bit like System F) – documentation

reisenberg · February 28, 2023, 3:18pm

Jane Street engineer here (back from some holiday last week).

We hope to upstream everything that exists in the ocaml-jst repo. Of course, that is contingent on the ideas we’re developing there being accepted by the community, and we expect to engage in this upstreaming process with care and openness to new ideas. I’ll also admit that we have built up a little upstreaming debt (too many features we have developed with too little upstreaming), but we expect to pay this down over the coming year. Bottom line here: it is understandable that there would be some questions about our commitment to upstreaming given what can be seen externally – but we really are committed to it, with regular internal conversations about how best to proceed.

Why have ocaml-jst instead of just working upstream? For two reasons:

1. Language design is hard. And we at Jane Street have a great opportunity to design new features, test them extensively in a realistic environment, and then change them. Because we have access to the entire code base where the features are deployed, we can even change concrete syntax relatively easily. So by developing internally, releasing internally, and then upstreaming with experience, we can be more confident that the feature design is correct.
2. We get a faster turnaround between idea conception and internal deployment. Working solely with upstream, we would develop an idea, go through a long design discussion with upstream, implement, merge, wait for release, wait for the rest of Jane Street to be ready to upgrade, and upgrade. Now, we can implement an idea in parallel with its design, rolling it out internally in stages (as appropriate), and then upstream later. This is a big win for us, and well worth the extra time spent moving changes back and forth.

I can’t yet confidently say this is the right way to go – we’ve reaped the benefits of the split (earlier internal delivery of features) and not paid all the costs (upstreaming after-the-fact). But the benefits are big, so I think this approach is likely to stay.

n4323 · March 1, 2023, 9:06am

This sounds reasonable, but is there a lingering danger of internally building up reliance on a feature that, after long upstreaming discussions, eventually gets rejected by the community?
What’s the contingency plan for such a case?

nojb · March 1, 2023, 1:36pm

Maintaining the feature out-of-tree indefinitely?

Incidentally, as a curiosity, at LexiFi some non-trivial patches to the compiler have been maintained out-of-tree for more than twenty years!

Cheers,
Nicolas

jbeckford · March 1, 2023, 3:45pm

It may be nice to have a jane-base-compiler in the opam repository (and perhaps a lexifi-base-compiler). I did the same with dkml-base-compiler on opam (weird; the v3 link shows nothing) for cross-compiling, Android Arm32 support and other patches. That makes it easy for a broader set of people to test and perhaps contribute, although it requires some work with the opam team.

reisenberg · March 1, 2023, 8:43pm

That’s an interesting idea, @jbeckford. I’m still newish to the OCaml ecosystem and can’t yet gauge how hard or easy this would be. I’ll bring the idea up with my colleagues.

UnixJunkie · March 2, 2023, 12:32am

Is the region-based stack allocation thing a Gc replacement?
How does it fare in production?

lukstafi · March 2, 2023, 10:10am

That would not be possible in OCaml because lifetimes in general are indeterminate (and ref-counting is a form of Gc). ~~It is opt-in (only values with the right annotation / type).~~

vlaviron · March 2, 2023, 11:30am

Local allocations can be seen as a second GC, with static scope-based collection, and an invariant that the normal GC cannot point to the local GC.
Whether an allocation is local or not is determined during type-checking, and the type-checker is in charge of enforcing the relevant invariants (escape analysis and insertion of static collection). Annotations can force the type-checker to fail if a value cannot be allocated locally, and if no annotations are present the type-checker will use local types whenever it is safe to do so.

avsm · March 2, 2023, 11:49am

A key property of dkml-base-compiler is that it is strictly compatible with the upstream OCaml distribution, and hence is ok to be an opam package that satisfies the ocaml package dependency.

Other compiler variants (such as the LexiFi or Jane Street forks) would not be appropriate to fill such a role, since they differ substantially from the upstream distribution. Nothing stops them from being in a package namespace that is clearly differentiated from ocaml of course, but there should never be any user confusion about whether or not they’re using an upstream compiler after doing an opam init.

dra27 · March 2, 2023, 12:34pm

Indeed - it’s “trivially” possible to set this up with a separate opam-repository (e.g. branches in the already existing janestreet/opam-repository and LexiFi/opam-repository!). The separation of the repositories ensures that users do not accidentally end up with a non-standard compiler. Using a different package name (as dkml-base-compiler does) is also good, as it makes it clearer when reporting problems that a different compiler is in use.

There are two things which could be usefully done in ocaml/opam-repository to aid such alternate repositories (done already for dkml-base-compiler and also for the pre-5.0 multicore ocaml compilers) in order to ensure that those alternate repositories only contain new packages and don’t have to override any existing ones.

1. As was done for dkml-base-compiler in ocaml/opam-repository#22719 - updating ocaml/opam-repository’s ocaml package to support it means that the ocaml package itself doesn’t have to be included in the fork (and so definitely stays in sync), and it also reserves the package name (e.g. jane-base-compiler, lexifi-base-compiler, etc.).

2. Before multicore was merged, but crucially after the plan to merge it was hatched, new base packages were added to ocaml/opam-repository to ease the use of the archived ocaml-multicore/multicore-opam ocaml-variants packages and to allow packages to be marked in a coherent way in ocaml/opam-repository as either not working with multicore (e.g. because of naked pointers) or requiring multicore (e.g. eio, domainslib, etc.). A package which depends on jane-base-compiler or dkml-base-compiler needs to live in a separate repository (since no compiler in ocaml/opam-repository supports them). However, again, reserving the names of these compiler packages in ocaml/opam-repository would allow packages in ocaml/opam-repository which are known not to work with these forks to be marked as conflicting with them. That’s similarly useful, as it stops the forked opam-repositories from overriding packages which already exist in ocaml/opam-repository. It does add a maintenance burden on that meta-data for new releases, but perhaps remembering to mark new versions of packages which conflict with forked compilers would be a maintenance task for the owners of the forked repositories.
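
As a rough illustration of marking a package as conflicting with a forked compiler, an opam file in ocaml/opam-repository could look something like the sketch below. The package itself, its maintainer address and its dependency bounds are made up for the example, and jane-base-compiler is only the name floated earlier in this thread, not a package that is known to exist.

```
opam-version: "2.0"
# Hypothetical package, used only to illustrate the idea.
synopsis: "Example library that is known not to build with a forked compiler"
maintainer: "maintainer@example.com"
depends: [
  "ocaml" {>= "4.14.0"}
  "dune" {>= "3.0"}
]
# Assumption: the name jane-base-compiler has been reserved in
# ocaml/opam-repository, as suggested above.
conflicts: [
  "jane-base-compiler"
]
```

With a conflicts field like this, opam's solver would refuse to install the package into a switch that has jane-base-compiler installed, so the forked repository would not need to carry its own overriding copy of the package.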

UnixJunkie · March 3, 2023, 12:50am

It would be interesting to combine this with ocaml-5.
Unless variables are to be shared between threads, I don’t see why they should be under the control of a parallel GC.
If the number of variables under control of the region-based allocator is maximized, I guess it could improve performance. Maybe both in sequential and in parallel programs.
