We present a resource-management model for ML-style programming languages, designed to be compatible with the OCaml philosophy and runtime model. This is a proposal to extend the OCaml language with destructors, move semantics, and resource polymorphism, to improve its safety, efficiency, interoperability, and expressiveness. It builds on the ownership-and-borrowing models of systems programming languages (Cyclone, C++11, Rust) and on linear types in functional programming (Linear Lisp, Clean, Alms). It continues a synthesis of resources from systems programming and resources in linear logic initiated by Baker.
It is a combination of many known and some new ideas. On the novel side, it highlights the good mathematical structure of Stroustrup’s “Resource acquisition is initialisation” (RAII) idiom for resource management based on destructors, a notion sometimes confused with finalizers, and builds on it a notion of resource polymorphism, inspired by polarisation in proof theory, that mixes C++'s RAII and a tracing garbage collector (GC). In particular, it proposes to identify the types of GCed values with types with trivial destructor: from this definition it deduces a model in which GC is the default allocation mode, and where GCed values can be used without restriction both in owning and borrowing contexts.
The proposal targets a new spot in the design space, with an automatic and predictable resource-management model, at the same time based on lightweight and expressive language abstractions. It is backwards-compatible: current code is expected to run with the same performance, the new abstractions fully combine with the current ones, and it supports a resource-polymorphic extension of libraries. It does so with only a few additions to the runtime, and it integrates with the current GC implementation. It is also compatible with the upcoming multicore extension, and suggests that the Rust model for eliminating data-races applies.
Interesting questions arise for a safe and practical type system, many of which have already been thoroughly investigated in the languages and prototypes Cyclone, Rust, and Alms.
Looks nice! Have you spoken about this with the OCaml development team? And if so, have you had some feedback? Are you going to continue working on the areas you point out in your conclusion (e.g. data races)?
In addition to traced pointers (with lowest bit set to 0), let us use untraced pointers (with lowest bit set to 1). The latter are allocated in the major heap and deallocated using RAII.
So far the lowest bit partitions OCaml values into pointers and integers. You’re reusing this bit to partition values into GC-managed and non-GC-managed, where the latter includes both heap-allocated (but unmanaged) values (i.e., blocks) and unboxed values such as integers and variants without arguments.
My question is: are you sure that the new representation will not break a lot in the existing runtime and even the compiler? What I’m afraid of is that compiler backends, flambda, and other low-level components may depend on the fact that values with the lowest bit set to 1 are generative and can be copied without introducing any effects.
I have to thank again a few members of the OCaml community (and outside of it), including members of the development team, for their encouragement, patience and feedback. Leo White in particular has been very helpful, and he has floated a few ideas that have not yet found their way into the proposal (I think he has independently been thinking about resource management and about re-using the type-and-effect system for this purpose), and I hope we can join forces in the future. This work was at a point where it was necessary to properly assemble and write down the ideas, and to do some bibliographical work, in order to get further feedback and foster collaborations; here we are now.
The next crucial step is to figure out the details of a type system, so we will be working on that in our team. Even leaving lifetimes aside, the proposal gives rise to interesting questions. The goal for data races is more modest at first, as explained at the end (§13.3), but I am looking forward to getting feedback and collaboration on this topic as well, of course.
This is precisely the sort of feedback that I was looking for by posting this proposal. What do you mean by generative? I am interested in more details. In addition, §12.5 describes what would be given up by allocating everything with the GC instead.
OK, let me also throw a big stone in that quiet lake:
I personally want OCaml to drop the garbage collector and start using region-based allocation.
Then, programs will not have to spend any time doing GC.
I have seen optimized OCaml programs spending 20% of their time doing GC.
I am half joking, I don’t know if this would even be possible.
Modular implicits, multicore and algebraic effects, Flambda, possibly more core OCaml developers thanks to recent announcements, lots of other improvements (e.g. better error messages from the compiler), “Bob”, and then this very work… it really seems there is a great window of opportunity in the next few years to make OCaml an even greater language than it is today. I hope all this good work will find its way into OCaml in the near future. The last steps of integration into the compiler are, I guess, often the hardest ones; I hope the people concerned will not be discouraged by this endeavor…
By generative, I mean that a computation will not have any observable effects (besides possibly an allocation). I think it’s irrelevant, since the compiler will never see the values, as it operates on different abstractions. So there shouldn’t be any problems in the middle end. Even in the backend I wouldn’t expect any problems, besides maybe in ABI implementations.
What I’m thinking is that some primitives that rely on the OCaml runtime representation of values might become unimplementable with the new representation, e.g., hash, compare, equal, etc. Take structural equality, for example: so far it uses the lowest bit to distinguish between blocks and immediates; if the bit is set to 1, it doesn’t follow the pointer. So it is not obvious to me how to implement polymorphic compare with the proposed representation (other than making such values non-comparable, like functions, for example).
The proposal considers adding an allocation method based on RAII in addition to the GC, similar for this purpose to Fluet, Morrisett and Ahmed’s linear regions (2006) (not to be confused with Tofte–Talpin regions). It has pros and cons. It is mostly suitable for long-lived memory allocations, and for large or mutation-intensive values (because it can be used to bypass the write barrier). Bringing actual figures into the discussion is very helpful. Do you have a more precise picture of how, and to what extent, an allocation method alternative to the GC would have helped?
Don’t hold your breath! Everything remains to be done. There are a few things to figure out still before one can even get started with a prototype. I would prefer if this proposal is seen as a survey of the literature and techniques, plus some connecting-the-dots, meant to show the feasibility and usefulness of such a model and motivate the real hard work.
I have thought about the generic functions, though you are right that this is not mentioned in the proposal. It makes sense to me that linear values should be compared physically: if two resources have different addresses, then they are different. This is similar to how objects are treated. The current implementation of hash, compare, and equal, which treats them like integers, is likely satisfactory.
Warning: I am not a compiler person, unlike many INRIA folks.
My understanding of region-based allocation is that all calls to malloc/free can be found at compile time. I.e., there are no more GC passes trying to find what to keep and what to free. You allocate memory when you need it and you free it as soon as you don’t need it anymore (I guess you can use a memory pool, not necessarily the system’s malloc/free calls).
I am not sure this is applicable to OCaml though. Experts should know.
I think this would be a major boon to the ecosystem. Rust is gaining a lot of mindshare on the grounds of safety.
Of course there are many angles to consider. Multicore compatibility is a great advantage for this proposal.
I’m not an OCaml expert yet, but from those I’ve interacted with, I think most OCaml hackers are well invested in the ability to write correct programs whilst still being productive with respect to “real world” projects.