A language with non-escaping stack allocations and regions

olleharstedt · April 7, 2021, 10:18am

I was thinking of designing a programming language with opt-in (or opt-out) GC, and how that could be achieved. The point is to hit a 80/20 target, where 80% of your code-base does not have to be fast, but 20% must be. Languages like Rust are mostly geared towards projects where 99% of your code must be fast, and in Go, OCaml, Java etc there’s no way to opt-out of blocked GC, and performance is not explicit nor predictable.

The opt-out of GC must be memory safe. I believe this can be achieved with non-escaping variables, without a burden of lifetime annotations like in Rust.

I wanted to apply escape analysis to enforce stack allocated variables to NOT escape their scope. They can be shared, mutated and aliased anyway besides that. This is less restrictive compared to uniqueness types (or rather, has other trade-offs, one could say).

For opt-in GC, I was thinking maybe explicit regions? Or implicit regions? Or even ref count? One can imagine a system which supports both, actually, but not sure if it’s needed.

Regions also need to be non-escaping, but you can pass around its variables in a way you can’t with pure stack allocations.

Note: Nim does not fill the gap, because it doesn’t support multiple memory allocation strategies within the same compilation unit.

The language is supposed to compile to C. Done in OCaml, of course, with Menhir.

Thankful for any feedback. ^^

Edit:

Possible syntax:

local p = Point {1, 2};     // Stack alloc
let q = Point {2, 3} in r;  // Region
let s = Point {4, 5};       // Ref count

UnixJunkie · April 8, 2021, 1:45am

There are some people who sometimes write memory-constant OCaml code.
Jerome Feret from INRIA / ENS Paris once told me that he did that in a project.
https://www.di.ens.fr/~feret/
I guess it is not easy to do.

olleharstedt · April 8, 2021, 8:22am

With Obj.magic?

(Must have 20 chars, bla bla bla bla bla bla)

atavener · April 8, 2021, 9:44pm

I often wish OCaml had a low-level “sublanguage” with manual memory management and control over memory layout – for that 20%. And of course, magically some way to use layout-specific unboxed datatypes from anywhere in the language. It’s some kind of dream-ideal, which I fully expect disintegrates in the light of logic, as most dreams.

Instead, we mostly rely on the uneasy union of C with OCaml… which I’m considering Zig or Rust (had hopes for Kit, but it died) for replacing the C. Still, it would be awesome to not have to write these different parts in such different languages… and again, ideally have more seamless interplay.

I find the GC itself to be mostly-acceptable overhead. It’s possible to have a game/simulation churning out frames and incurring a quite regular per-frame GC cost by triggering the collection per frame. Especially if most per-frame allocations are really just for that frame. I don’t see spikes like C#/Unity games, but I haven’t taxed anything too much. The bigger problem I have is the wild amount of pointer indirection from everything being boxed and few options for more concrete data representations (packed structs/records and/or arrays – not just float array or bigarray).

Having non-GC options would be great too though, and avoid the need for such developer-care to work with/around GC. Especially for regions/pools, which are so stupid-fast and ideal for simulations (and probably could take most of the per-frame allocation burden). Manual memory management in general, as in C/C++, in a codebase written with a lot of dynamic allocs… can even be worse than GC though, if you profile the alloc/free time. So, I’m careful to not vilify GC outright.

I like the simple syntax proposal, but have no idea how all the details would iron out.

I was very optimistic for Rust when it was basically OCaml with low-level programming… but it turned into something very imperative and more C+±tainted in practice, and completely built around the borrow-checker (blessing and a curse). I’d much rather do that 80% in OCaml still. So, I’ll be very interested if you can make that happen!

olleharstedt · April 8, 2021, 10:14pm

This guy summarizes the trip Rust took before version 1.0: PatchMixolydic comments on Three situations regarding memory

Interesting to hear about your use-case from GC and performance.

C# is a language that moves towards features enabling better memory control without sacrificing safety, I’ve heard.

Yeah, boxing is a problem, as is the interaction between boxed and unboxed values. There are not so many ideas out there for memory-safe GC opt-out, I think. Kind of an underdeveloped area of research. Most just throw away safety, it seems.

I have zero continuity with regard to hobby project, so no promises. Just putting some ideas out there, and will hopefully have time for a prototype to get a deeper understanding of the challenges. Will try to write up a conceptual article, too.

If OCaml ever gets a system of affinity, there might be possibilities for tight memory control.

Yes, I think the solution of GC + unsafe language is problematic still, in an enterprise setting. You usually don’t want this, I think. Well, maybe depending on domain. And I actually don’t know which domains would be suitable for an “80/20” language like that, either with opt-in or opt-out GC. More conceptualization needed. Games usually do the language combination stuff, like C + Lua or whatever.

droyo · April 8, 2021, 10:48pm

I was just reading about Wuffs and it made me think of this thread. It’s a language with a C-like syntax that compiles to C, but has some more constraints and safety features, and is supposed to be for writing performance or safety-critical fragments of a larger program as a library, not as a general-purpose language.

I like that idea; you use a more comfortable language for most of your program, then switch to a familiar but more restricted environment for the parts that need to be fast. I wonder how OCaml could be restricted to remove the need for garbage collection and boxed values.

c-cube · April 8, 2021, 11:03pm

I also wish it were possible to push OCaml further, to use a
sub-language with flat memory layout, no implicit allocations, etc.
Something like what cython is to cpython, or maybe even a subset that
would compile to C with automatic/hidden bindings to regular OCaml :-).

That’s just dreams though, no one is putting the effort in to push such
a big project forward, and that’s understandable.

olleharstedt · April 9, 2021, 6:57am

Maybe @Drup can chime in on affinity and tight memory control?

Drup · April 9, 2021, 11:33am

Yes and No.

If you want to do unmanaged manual allocations you need affinity/linearity to make it safe … but that doesn’t give you nearly as much performance gains as you think it does. The real gain happens when breaking out of ocaml’s uniform memory representation, and for that, you will have issues not only with the GC, but also polymorphism and many pieces of the OCaml semantics and compiler.

In the end, to behave correctly in a language like OCaml, the objects you are talking about will have to:

Never contain parametrized types
Never be given to polymorphic code
Not point to GC-managed objects

It’s feasible, albeit difficult. There is a sort-of limited first step RFC by @stedolan (https://github.com/ocaml/RFCs/pull/10) that gives appropriate tools at the typing level to talk about such objects.

In any case, it’s very limited, and attempts to have more are roughly equivalent to turn OCaml into Rust. and you might as well invest in the (very promising!) Rust FFI tools instead.

In general, I think this is very interesting topic, but unless we make huge changes to the language, it’ll always be limited in OCaml. However, we are making huge changes to the language right now, so never say never …

BikalGurung · April 9, 2021, 2:29pm

This got me curious. Do you mean like effects? or implicits ? or perhaps the fabled namespaces ?

B.

Drup · April 9, 2021, 3:15pm

“Simply” multicore.

atavener · April 9, 2021, 4:12pm

That’s great info, Drup!

The constraints you list sound completely reasonable to me. Far more promising than I expected, really. In practice, such constraints would only apply to some small portion of a large piece of software (eg. game), and tend to be at lower-level “leaf” subsystems anyway. These are also constraints which are natural to such lowlevel code anyway – and implicit when currently implemented in C.

The link to the unboxed proposal is nice. I’ll have to read through more fully.

I share your expectation (concern?) that things start turning into Rust. But if that “Rust-like” code is really just the “20%” lowlevel or performance-sensitive stuff, and OCaml proper is the bulk of code with seamless interop to the lowlevel. That would really be close to ideal (IMHO).

On that note, though, I will now look into the state of Rust FFI, so thanks for that mention as well!

olleharstedt · August 31, 2021, 3:01pm

Rust has the worst language ergonomics ever, however. I think alternatives could be interesting.

olleharstedt · August 31, 2021, 8:39pm

What about F*, though? It can be compiled to both OCaml and C, right? At least a subset of it can be compiled to C. Maybe that can be used as a case-study?

olleharstedt · April 4, 2024, 1:08pm

Newer proposal: https://github.com/ocaml/RFCs/pull/34

Topic		Replies	Views
Add support for stack allocation Learning language-design	28	3500	January 11, 2021
Compile a language to C with OCaml GC support Learning compiler	11	1513	January 18, 2021
Is there a way to turn off garbage collection inside of one function? Learning gc	15	2324	April 27, 2020
Tutorial on writing an OCaml bytecode interpreter from scratch in Rust? Learning	8	1423	August 6, 2024
Relaxed rules for binding a C library? Learning multicore	8	714	October 25, 2022

A language with non-escaping stack allocations and regions

Related topics