How hard would it be to add support for stack allocated variables that are not traced by the GC? The constraint would be that those variables are not allowed to “escape” (for a proper definition of escaping). This could be useful for tight loops. Many JITs do escape analysis to automatically achieve stack allocation (or reduce allocations), but I think it’s more attractive to have the ability to control when this happens, as a way to “opt out” of the GC.
Example using local for stack allocation:
let test () =
let local a : point = {x = 10; y = 20} in
let local b : string = point_to_string a in
print_endline b;
b; (* Error: b is not allowed to escape scope *)
Be careful what you wish for. If a block is not scanned by the GC, then it cannot refer to GC-allocated values, otherwise the GC might consider them dead while they can still be acceded through the stack-allocated block. So, it means that it would only contain integer values. Moreover, its address cannot even be passed to other functions, as it would then become markable by the GC, which would confuse it to no end (due to the no-naked-pointer feature). So, the use of stack-allocated blocks would be severely limited.
So, if you consider your example, variable a contains only integer values, so it could be allocated on stack, except that it is passed to function point_to_string. As for value b, it could never have been stack-allocated in the first place. First, it is returned by point_to_string, so it needs to be allocated on the heap anyway (unless the call to the function is inlined). Second, its size is dynamically determined, so it again needs to be allocated on the heap. Third, as with a, it is passed to another function.
That said, your motivation (“tight loops”) is actually a meaningful one. Thus, the OCaml compiler does perform an optimization pass to allocate such values on the stack. For example, the following example does not allocate anything on the heap (and using the ref syntactic sugar does not change that):
let foo n =
let a = { contents = 1 } in
for i = 1 to n do a.contents <- a.contents * i; done;
a.contents
Bruno Blanchet’s PhD thesis was on escape analysis and its use for automatic stack allocation of data structures in Java and in OCaml: DEA training and PhD thesis: Escape Analysis .
He observed significant speedups in Java but much more modest speedups in OCaml. This is probably because the Java implementation he modified had a slow memory allocator and GC.
Heap allocation of small, short-lived data structures is very efficient in OCaml, so the benefits of stack allocation are low. Getting rid of memory allocation altogether, as in the example given by @silene, is still a clear win, however.
Hm, well, stack allocation could be used for long-lived values as well, if their size is “known” and they don’t escape (but only if they can be passed to other functions, obviously). Still not sure about the benefit, though. One would have to test with a benchmark which is open for this kind of optimization, however that would be designed.
By integer values, I meant anything that is out of the scope of the GC, so it could also be floating-point numbers, pointers to statically allocated data, and so on. As for strings, they are represented as a pointer to a block containing a sequence of characters (and its length). If this is a literal value (i.e., a known string), the block is hardcoded in the code segment of the program. So, copying it on the stack would just waste some space without any benefit.
You could make your stack-allocated block look like a regular block (i.e., prepend it with a header), and preemptively color it black, so that the GC skips it if it ever follows a pointer to it.
Obviously, you would still need to perform an interprocedural analysis to make sure that none of the called functions cause the pointer to outlive the stack frame. Once you have implemented such an analysis, there is not much point in having a let local keyword anymore.
If you wanted to avoid such an interprocedural analysis, you would instead need to annotate all the function signatures using a [@@local] attribute. Then, a simpler intraprocedural analysis would be able to check that local pointers are never passed as nonlocal arguments. That said, this would still make the let local keyword useless, since the escape analysis would be trivial.
What I said at the beginning of the discussion still holds: This block cannot container any non-integer values, since it will not be scanned by the garbage collector. So, your code needs to be quite peculiar to exhibit such huge blocks.
Is this necessarily true?
Consider the following type:
type t = {
a : int;
b : string;
}
The block contains an integer (not scanned by the GC) and a string (a pointer, followed by a GC, to a block marked by the GC).
What would prevent to allocate a t directly on the stack?
When the GC follows roots from the stack, what would be a fundamental obstacle to having these blocks on the stack?
Specifically, consider a function that has two local variables: a an integer and b a string. The stack frame has two words for these two values: one word to represent the integer and one word to point to a block on the heap that represents the string.
What is the fundamental difference between these two situations (a stack-allocated block that contains pointers to heap-allocated objects vs. stack pointers to heap-allocated objects) that prevents one from occurring?
You are right. If you color the stack-allocated block in black, and then register all the fields of the block as roots, then you can store GC-handled values in it. But you are going to lose a lot of the advantages of a stack-allocated block. In particular, your stack-allocated block will be scanned by the GC a lot more often than an oldified block, and thus be slower.
On this line of work, you might want to read the retrospective by some of the people involved:
“A Retrospective on Region-Based Memory Management” (Tofte, Birkedal, Elsman, Hallenberg).
A relevant paper: “Gentrification gone too far? affordable 2nd-class values for fun and (co-)effect” by Osvald et al.
Perhaps a lower-hanging fruit than (or a stepping stone to) more invasive proposals involving regions/lifetimes (the paper presents some more use-cases than avoiding allocations in higher-order code).
It is exactly the scheme I described, i.e., annotate all the relevant function arguments with a [@local] attribute (except that their attribute is named @local, because this is Scala and not OCaml). So, it is not that ambitious. In fact, this is the strict minimum to support stack allocations in a viable way, as far as I know. Any less and you will fail to be sound.