Next priority for OCaml?

I want to support the unpopular desire for an LLVM backend. Another compiler implementation, one that compiles more slowly but produces faster code (perhaps via whole-program optimization), would be reasonable. At least then nobody could say that OCaml is N times slower than the JVM or Chez Scheme.

3 Likes

I would be hard pressed to mention one priority for OCaml, because I think that many things are important – and important in different ways for different people. I thought that I could give some information on what support I have been involved in, either as a compiler maintainer or as a member of the OCaml Software Foundation, on some of the topics that have been mentioned here.

  • Modular implicits: we’ve been trying to find people we could fund to make progress on this, but it is difficult – it requires a lot of type-system expertise, and time. We are funding internships with @yallop in Cambridge to experiment with the system, and I am thinking of getting my hands dirty during the next school year with some module-side implementation work – possibly diving into the PRs of Matthew Ryan.

  • Type system for effects: I have decided to set up an informal working group for people interested in working on this (mostly academics, also Jane Street) to exchange information, and in particular to decide on a common set of evaluation criteria – with compatibility with existing non-effect-using OCaml code as a top priority. I hope to announce something more structured next September, with the aim of getting a clearer idea of potential design proposals by the end of the academic year.

  • Debuggers: the OCaml Foundation funded ocamlearlybird’s author, hackwaly, in 2020-2021, before they left to work full-time for a startup. ( [ANN] ocamlearlybird now an OCaml Software Foundation supported project , [ANN] ocamlearlybird 1.0.0 beta1 , etc. ) I learned a week ago that @sim642 is interested in continuing maintenance of the project, and we are looking at funding his work.

    (ocamlearlybird is a bytecode debugger, similar to the venerable ocamldebug – it is a client for the same debugger protocol. Separately we would ideally have good debugging support with the native compiler. That aspect of the debugging story has been worked on by Mark Shinwell in the past, but there is a disagreement among compiler maintainers about what is the right technical approach to support DWARF information in the compiler – it is a complex format that is not necessarily a good match for OCaml, and adding it to the compiler backend is fairly invasive.)

  • A priority that people don’t know about, but that in my mind comes before “typed effects”, is to refactor and clean up the implementation of the OCaml type checker, which has historically been a place of technical debt in the compiler. There has been a fair amount of work on this in the last year, driven by Jacques Garrigue and his group at Nagoya ( in particular Takafumi Saikawa ), with Florian Angeletti and myself on the reviewer side, and in the last few months some very welcome help from the Jane Street group working on language features (in particular Richard Eisenberg, Chris Casinghino, Nick Roberts), who have been lending a hand with refactoring and refactoring-review work.

    More generally, I have been worried this year about maintenance workforce for the OCaml compiler codebase ( Maintenance bottlenecks in the compiler distribution ). I think that solving this issue is also a priority, at the level of the compiler distribution. We have made some progress, with notably maintenance contributions from OCamlPro and Jane Street, but I think that the situation still requires careful monitoring.

  • Relocatability: I agree with @smorimoto that this is important. I would want all OCaml packages to be relocatable, notably as the right first step to enable caching or even binary distribution of OCaml package build artifacts. There has been work in this direction from David Allsopp in the last few years, which I understand gets closer every year to being in a state that could be submitted as upstream PRs. My strategy so far is to wait for this to make progress.

  • Tooling: things that directly come to mind are:

    • development-environment tools whose recommended workflow is to have separate/isolated development environment(s) for each project. The opam client allows this with local switches, but I think the UI could be streamlined a bit to make this the easy default. (For example, not everyone knows how to build a local switch with only the dependencies of the project installed, to start hacking on it.) Ideally we would have caching of package builds across those development environments, and maybe even some distributed caching of build artifacts (assuming relocatability).
    • a “code upgrade” tool to which I can specify program transformations (for example: “rewrite SomeLib.somefun $bar $baz into SomeLib.somefun $bar (SomeLib.convert $baz)”), and which applies them to my source code in a diff-friendly way.
22 Likes

Re: debugging, I’ll chime in shallowly, but I think that, looking forward, the kind of debugging OCaml will need could look a lot more like Erlang’s tracing capabilities. In short, you define:

  • a pattern to look for (the kind of functions and inputs you are looking for)
  • a range (every Erlang process, a specific process, etc.)
  • whether to see the returned value

You then start the tracer and run your program, and it prints out all the matches your program makes. This is pretty powerful, since you can essentially reconstruct an entire execution if your pattern is general enough.
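To make this concrete, here is a toy sketch of what a pattern-based tracer could look like in OCaml. Everything here (the names, the wrapper-based design, the int-only types) is invented for illustration and is not an existing library:

(* Wrap a function so that calls whose argument matches a predicate
   are reported, optionally with the returned value. *)
let traced ~name ?(show_return = true) ~matches f x =
  if matches x then begin
    Printf.printf "call %s %d\n%!" name x;
    let r = f x in
    if show_return then Printf.printf "ret  %s -> %d\n%!" name r;
    r
  end else f x

let () =
  (* Trace only calls to [square] with an argument greater than 2. *)
  let square = traced ~name:"square" ~matches:(fun x -> x > 2) (fun x -> x * x) in
  List.iter (fun x -> ignore (square x)) [1; 2; 3; 4]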

Super useful if you can run this from utop to see in real-time what’s being matched.

And of course, invaluable if it can be done in a restricted manner on running production code.

4 Likes

I just recalled a passage from Stuff Goes Bad: Erlang in Anger:

Forget your debuggers, their use is too limited. Tracing makes sense in Erlang at all steps of your system’s life cycle, whether it’s for development or for diagnosing a running production system.

2 Likes

Two thoughts:

  1. Holy cow, yes!

  2. I don’t know that this is a priority for OCaml itself, because tracing of this sort – pervasive tracing – is something for developers to insert in their code. But the sort of tracing … well, that’s something that really matters. I firmly believe that the right kind of tracing is the sort that is least costly to not use. That is to say, you put the tracing lines in, they’re actually compiled in so you can turn them on with a runtime directive, but they’re not enabled by default.

Specifically, Google’s glog has this sort of tracing. Also, I think that for the development of transactional applications, the sort of tracing that Google’s Dapper uses (where a tracepoint is conditionally enabled, and only fires if a “trace-enabled transaction” executes through that line) is extremely valuable.

As a note, Trace requires the user to insert tracing in their code, but costs roughly an atomic load plus a comparison to None in the case where it’s not used.
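For illustration, here is a minimal sketch of that cheap-when-disabled pattern in plain OCaml (requires the Atomic module, stdlib since 4.12; the names are invented and this is not the actual Trace library):

(* The installed tracer, if any. Checking it costs one atomic load
   plus a comparison to None. *)
let tracer : (string -> unit) option Atomic.t = Atomic.make None

(* The message is a thunk, so no formatting happens while disabled. *)
let trace msg =
  match Atomic.get tracer with
  | None -> ()
  | Some emit -> emit (msg ())

(* Enable at runtime, e.g. from a debug console or signal handler. *)
let enable () = Atomic.set tracer (Some print_endline)

let () =
  trace (fun () -> "never formatted: tracing is off");
  enable ();
  trace (fun () -> Printf.sprintf "x = %d" (6 * 7))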

4 Likes

Persistent workers.
Java persistent workers for Bazel

A bazel persistent worker for rust

2 Likes

I was recently writing, in OCaml, a code generator that spits out both OCaml code and Rust code. One stark contrast is the helpfulness of the compiler error messages.

OCaml: Error on line X, col Y. Here is line X, col Y underlined.

Rust: Error on line X, col Y. Here is line X, col Y underlined. I was expecting Foo but got Bar. The following one-line changes would fix the error:

  1. helpful, context sensitive, suggestion 1
  2. helpful, context sensitive, suggestion 2
    etc …
3 Likes

Our code very often interoperates with other systems with which it communicates over unreliable protocols with very weak contracts (the file system and text-based messages from other programs, or quirky behavior in custom programs of very respectable age). I have to test the code in a real-life setting to see what actually happens when it runs – what values come through, what gets set, etc. The way I expect it to work is very rarely how it actually works.

To be completely fair, when I worked with OCaml for 2+ years in 2018-2020, I obviously did not debug anything using a debugger. I used logging/tracing a lot, which is also a form of debugging.

3 Likes

My impression of Chez, when I tried it, was that it produced code that tends to be the fastest among Schemes, but which was in a wholly different speed category from ocamlopt. I could’ve just been using it wrong, though…

Take for example this issue: benchmark improvement advice · Issue #248 · cisco/ChezScheme · GitHub
After being optimized by Scheme gurus, it was ~2 times slower than the Go code, which in turn is ~1.5 times slower than the equivalent OCaml code.

I just remembered something I’d love to see pick up activity again: Transparent Ascription. It’s a high-quality, low-disruption addition to the language that would make dealing with modules and interfaces more ergonomic.

1 Like

Distributed OCaml should be the next priority.

2 Likes

Would you mind providing a reference for the claim that Go code tends to be ~1.5 times slower than the equivalent OCaml code? That’s news to me (the resource I usually check if I want to see language benchmarks is this) and I’d like to follow up on your claim.

(I haven’t looked at the source code for the benchmarks on that Vercel site, so they might not be “equivalent”.)

As always with these benchmark games… you have to be mindful of what you are really comparing. Are the Go and OCaml codes doing equivalent things? Are they equally optimized? Are they idiomatic in their respective languages? Were they written by someone actually fluent in the language?

For instance, the very first benchmark in your link, binary trees, uses terrible OCaml code: it implements binary trees with objects and options, so you pay for dynamic dispatch and you have twice the amount of boxing you’d have with an idiomatic ADT implementation. This code was possibly written by someone who is not very familiar with OCaml and who translated OO habits. The Go version has neither of these costs and actually does what the idiomatic OCaml version would do!
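For reference, the idiomatic ADT encoding meant here is something like the following sketch – each node carries its children directly, with no objects and no options:

type tree =
  | Leaf
  | Node of tree * int * tree

(* Traversal pays no dynamic dispatch and no option boxing. *)
let rec checksum = function
  | Leaf -> 0
  | Node (l, v, r) -> v + checksum l + checksum r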

The LRU benchmark uses four times as much boxing as necessary (it implements a doubly linked list where all pointers are options, and all values are boxed pairs).

The Fannkuch redux benchmark misses opportunities to use Array.blit.
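For concreteness, the kind of opportunity meant here is replacing a manual element-by-element copy with a single Array.blit call (copy_prefix is a made-up name for illustration):

(* Manual loop version. *)
let copy_prefix src dst n =
  for i = 0 to n - 1 do
    dst.(i) <- src.(i)
  done

(* Equivalent, using the runtime’s optimized block copy:
   Array.blit src src_pos dst dst_pos len *)
let copy_prefix' src dst n = Array.blit src 0 dst 0 n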


Since I’m writing in this thread, I feel entitled to answer the initial question:

  1. I do believe a typeclass-like mechanism would be a game changer, as it would make the language significantly more convenient, and in particular more appealing to beginners (you can print values!) or when compared to other languages. Tangentially, it would partially address the debugging story by making it easier to just print stuff.
  2. I’m eager for meta-programming, not only for optimization reasons, but also because you wouldn’t have to dive into tooling hell to do simple pre-processing if it were in the language itself.
  3. (Algebraic effects are a nice feature to have but, practically speaking, I wouldn’t have much use for them beyond inverting an iterator into a generator (which could already be done with threads); a sketch of that inversion follows this list. On the other hand, paranoid people like myself would love to have exceptions recorded in the type system.)
  4. On the low-level optimization front, along with unboxing and stack allocation, I’d love to have some form of lifetime analysis and opportunistic reuse of blocks, perhaps as a transparent optimization pass, but at that point I might as well use Rust instead…
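Here is the iterator-to-generator inversion mentioned in point 3, as a minimal sketch using OCaml 5 effect handlers (specialized to int for brevity; nothing here is a library API):

open Effect
open Effect.Deep

type _ Effect.t += Yield : int -> unit Effect.t

(* Turn an internal iterator into a pull-style generator. *)
let generate (iter : (int -> unit) -> unit) : unit -> int option =
  let next = ref (fun () -> None) in
  let start () =
    match_with iter (fun x -> perform (Yield x))
      { retc = (fun () -> next := (fun () -> None); None);
        exnc = raise;
        effc = (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Yield x ->
              Some (fun (k : (a, int option) continuation) ->
                (* Suspend here; the next call resumes the iteration. *)
                next := (fun () -> continue k ());
                Some x)
          | _ -> None) }
  in
  next := start;
  fun () -> !next ()

let () =
  let gen = generate (fun yield -> List.iter yield [1; 2; 3]) in
  let rec loop () =
    match gen () with
    | Some x -> Printf.printf "%d " x; loop ()
    | None -> print_newline ()
  in
  loop ()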
6 Likes

The code is provided directly within the issue I linked, and the OCaml port was meant to be as close to a 1:1 translation of that as possible:

(* unnecessary record but we're sticking to what Go is doing *)
type matrix = {
  m : float array array;
  r : int;
  c : int;
}

let newmatrix ?(randomized = false) r c = { r; c; m =
  let arr = Array.make_matrix r c 0. in
  if randomized then
    for r = 0 to r - 1 do
    for c = 0 to c - 1 do
      arr.(r).(c) <- Random.float 100. /. (2.0 +. Random.float 100.)
    done done;
  arr
}

let mult m1 m2 =
  if m1.c <> m2.r then invalid_arg "incompatible matrices";
  let r = newmatrix m1.r m2.c in
  for i = 0 to m1.r - 1 do
  for k = 0 to m2.r - 1 do
  for j = 0 to m2.c - 1 do
      r.m.(i).(j) <- r.m.(i).(j) +.  m1.m.(i).(k) *.  m2.m.(k).(j)
  done done done;
  r

(* unused in benchmark, left here anyway *)
let smult s m =
  let r = newmatrix m.r m.c in
  for i = 0 to m.r - 1 do
  for j = 0 to m.c - 1 do
      r.m.(i).(j) <- s *. m.m.(i).(j)
  done done;
  r

let sanity () =
  mult
    { r = 2; c = 2; m = [| [|1.; 2.|]; [|3.; 4.|] |] }
    { r = 2; c = 2; m = [| [|5.; 6.|]; [|7.; 8.|] |] }
  =
    { r = 2; c = 2; m = [| [|19.; 22.|]; [|43.; 50.|] |] }

let bench () =
  let sz = 500 in
  let _ = assert (sanity()) in
  for _ = 0 to 10 do
    let a = newmatrix sz sz ~randomized:true
    and b = newmatrix sz sz ~randomized:true
    in
    let t = Sys.time() in
    let _ = mult a b in
    let t = Sys.time() -. t in
    Printf.printf "took %.0f ms\n%!" (t *. 1000.)
  done
1 Like

My understanding is that modular implicits are a long-term research project and not something to expect in the near to mid term. As convenient as they would be for certain kinds of work, I think it might be helpful to brainstorm ways to improve on the existing boilerplate instead of waiting for an uncertain feature.

Similarly, my impression is that unboxed types and stack allocation with modes are still experimental and not yet at the point where there’s a plan to merge them into the main compiler. (But I’d be happy to be proven wrong.)

I think typed effects are more mid-term, with a more concrete plan.

I’m personally looking at the roadmap and looking forward to better integration for formal verification tools. Being able to do what you can do in WhyML or Dafny, but in an actual language without an extraction step, is going to be interesting.

As for debuggers, I’m not a heavy user of them, but the impression I get from the critics is that people are using them manually to step through code live instead of by programming them to do things for you.

I’ll also point out that this has overlap with “need polymorphic print”. You could always put a breakpoint, look at memory, and then use debug info to print out some kind of representation. And people do use debuggers on parallel / concurrent C++ and Rust. Ultimately, I think being able to do this with the output from the optimizing compiler is important.

But ultimately, it’s a matter of resources and priorities. Ideally, we’d have all the tools and such needed to be a full replacement for C++ at most companies. But that’s a long way off and we don’t have nearly as many devs working on these problems.

6 Likes

Oh yes, there is that too.

I program in an increasingly defensive style, i.e. with asserts everywhere. It makes assumptions explicit, it gives me more confidence in the code, and in the Future™ it may help static analyzers (and even optimizers based on static analyses!) and interactive proof assistants. I’d like to be able to distinguish different kinds of assertions, so that I would be able to ask the compiler to erase one kind or the other independently: for instance, pre-conditions at API boundaries (if they fail, that’s the user’s fault; you never want to erase these checks) versus internal invariants (if they fail, that’s your fault; you’d like to erase these checks once you are confident enough in your code). There is also a notion of “level” of assertions, which is how much you want to run them, depending on how confident you are in your code and how costly they are to evaluate. For instance:

assert ~level:9 (list_is_sorted xs) ;

I want to write that, but I really don’t want it to be evaluated unless I’m in some extreme debugging mode.
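In today’s OCaml one can approximate this with a thunk, so the check isn’t even evaluated below the chosen level. A rough sketch, with invented names:

let assert_level = ref 0  (* e.g. set from an environment variable *)

(* The condition is a thunk: below the current level it is never run. *)
let dassert ?(level = 1) cond =
  if level <= !assert_level then assert (cond ())

let list_is_sorted xs = List.sort compare xs = xs

let () =
  let xs = [1; 2; 3] in
  dassert ~level:9 (fun () -> list_is_sorted xs)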

Adding assertion kinds as a language/compiler feature seems like a relatively straightforward change but, before rushing into it, we likely want to think carefully about the more global story of integration with proof tools. I recall a recent discussion about adding a piece of syntax for inserting logical formulas, but I can’t find it anymore.

1 Like

If they are never disabled, why not call failwith or invalid_arg directly, rather than using assert? I’m just curious because I’m in the process of working out my own approach for this.

You can. I’d like assertions, including pre-conditions, to be treated somewhat specially in the language. First, because it gives more structure; for instance, you may identify a function’s contract mechanically (the difference is comparable to writing documentation in a free-form English comment versus structured annotations in some doc-string format). Second, because I’d like the possibility of them being erased (say, if a static analysis has found the assertions to be true statically; or, as the user of a library, I’m very self-confident and want the raw speed of the library), and this has consequences w.r.t. semantics (side effects may or may not be performed). Also, this is a first step towards OCaml becoming the next WhyML, with a dedicated logical language.

But in the current state of the language, raising Invalid_argument is good enough!
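Concretely, the never-erased pre-condition style meant here looks like this in current OCaml (nth_exn is a made-up example function):

(* A user-facing pre-condition: raising Invalid_argument is never
   erased, unlike assert, which -noassert compiles away. *)
let nth_exn xs n =
  if n < 0 then invalid_arg "nth_exn: negative index";
  List.nth xs n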

1 Like

Well, I’m going to offer my opinion on the question in the topic, in case anyone cares.

I think the top priority needs to be the simplification of the library packaging system. I think the ocamlfind library tool, and its indexing scheme based on the META file format, must be made obsolete.

The language (and therefore the toolchain) needs, I would argue, to provide a syntax for programs to specify how to locate modules in the subset of the package universe that is available to the compiler environment. This should not be left up to an external tool that drives the compiler. The compiler itself should perform the search and locate the .cma and .cmxa files (as appropriate) based on parameters specified in the program source code.
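Purely as an illustration of what “specified in the program source code” could mean – this syntax is invented here and exists in no tool – an attribute-style requirement might look like:

(* Hypothetical: the compiler would resolve and link the named
   packages itself, with no ocamlfind/META indirection. *)
[@@@require "uri >= 4.2"]

let () = print_endline (Uri.to_string (Uri.of_string "https://ocaml.org"))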

4 Likes