A lesson from Ruby's GC?


#1

An insightful blog post looks at Ruby’s GC and why Ruby apps seem to use so much memory:

It appears that malloc fails to release memory back to the OS. There is a proposed fix and it reduces memory consumption dramatically. I wonder whether this provides a lesson for the OCaml GC.

The proposed fix is to call malloc_trim(0) as part of the GC cycle to force malloc to release (grey) pages that it knows are unused.

Maybe this could be tried in an experiment where an OCaml program calls malloc_trim() via a small C binding - without modifying the compiler runtime.


#2

A quick google search shows that Ruby has a conservative GC that does not perform compaction, whereas OCaml’s GC is exact and performs compaction. The following paper is interesting though: Powers et al., “Mesh: Compacting Memory Management for C/C++ Applications” https://arxiv.org/abs/1902.04738. The authors have developed a drop-in replacement for malloc that performs compaction without relocation, and they show a benchmark where they improve the memory usage in Ruby (among others).


#3

I would expect that compaction is orthogonal to whether malloc returns pages to the OS or not. The pages being used in OCaml might be filled better but if malloc does not return them, it could still result in a high memory foot print. It would be interesting to do what the author did: ask the GC for the memory it knows about and look at the memory allocated by the process.


#4

In the blog post you posted, it’s pointed out that, by default, only OS pages at the top of the heap are returned. This is only a problem if you don’t have compaction.


#5

It would be great if anybody who knows the GC well would confirm that indeed an OCaml process returns memory to the OS after compaction and the picture above is not happening for OCaml.


#6

I am not an expert in this area but as far as I know:

In recent versions of OCaml, the configure script checks for huge page support (pages larger than 4KB, OCaml expects 4M at the moment).

If huge pages are supported, then the runtime will use mmap/munmap and properly return pages to the OS. Otherwise, it falls back to malloc managed allocations.

Also note that huge pages are not supported on macOS, though it should be possible to implement that (as huge pages are not a posix extension, the API is implemented differently in macOS and Linux).

Since the huge memory consumption comes from a buggy behavior of glibc, macOS and other BSDs should not be affected (though they could have implemented the same bug :D) and only linuxes running on old hardware or old kernel are affected.
So people likely to run performance sensitive OCaml processes should not be affected, but the bug can definitely apply to some.


#7
let x = ref []

let rec alloc n =
  if n > 0 then (
    x := n :: !x ;
    alloc (n - 1) )

let _ =
  alloc (100 * 1024 * 1024) ;
  x := [] ;
  Gc.compact () ;
  while true do () done

shows that memory is returned after the call to Gc.compact, both virtual and resident memory. Tested both with the toplevel and with ocamlopt; and both with and without huge pages support (by undefining HAS_HUGE_PAGES before compilation of ocaml). The binary was linked against libc.so.6 from glibc 2.28 (Ubuntu 18.10).


#8

Quoting @Drup:

In the blog post you posted, it’s pointed out that, by default, only OS pages at the top of the heap are returned

I guess it means that you need to introduce some fragmentation in the heap. For instance by interleaving C calls to malloc that retain small amounts of memory alive in the middle of your loop.
(The protocol to reproduce this situation is not that trivial :P)


#9

It is also noted in the blog post, that fragmentation was the red herring and that Ruby wasn’t really involved in the process, and it is totally the malloc issue. So I don’t see any reasons why compacting will resolve the issue.

I’m not claiming that I know GC well, but to my understanding, this may happen with OCaml, when malloc is used. The memory is returned to malloc using the caml_shrink_heap in do_compact. The heap is processed in the ascending1 order, OCaml will keep some number of chunks for itself (wanted), and then release all chunks that are above to malloc.his is the relevant code:

    /* Add up the empty chunks until there are enough, then remove the
       other empty chunks. */
    ch = caml_heap_start;
    while (ch != NULL){
      char *next_chunk = Chunk_next (ch);  /* Chunk_next (ch) will be erased */

      if (Chunk_alloc (ch) == 0){
        if (free < wanted){
          free += Wsize_bsize (Chunk_size (ch));
        }else{
          caml_shrink_heap (ch);
        }
      }
      ch = next_chunk;
    }

Since, if I understood the blogger correctly, malloc will return to the system only the chunks from the top of the heap, thus chunks that were returned to malloc by OCaml will still remain gray in the malloc heap, until the top element of the least is not freed. In other words, until the GC is able to move this block downwards. Until this will happen, the memory will be leaking.

This all is, in general, not an issue on Linux, where malloc is not used, and mmap/munmap are used instead of malloc/free.

I’m not sure that your test is representative. To trigger the issue you need to keep a block at the top of the heap. In your case, on each cycle you allocate a chain, then free the whole chain, so everything is returned to malloc (except OCaml own runtime service structures, which were allocated before you, and have low addresses).

Update. Recompacting makes this worser

In fact, the invariant that heap is sorted in the ascending order is no longer preserved, see the issue. After this change the new behavior of the compator is described by the following comment in compact.c:caml_compact_heap:

  /* Compaction may fail to shrink the heap to a reasonable size
     because it deals in complete chunks: if a very large chunk
     is at the beginning of the heap, everything gets moved to
     it and it is not freed.

     In that case, we allocate a new chunk of the desired heap
     size, chain it at the beginning of the heap (thus pretending
     its address is smaller), and launch a second compaction.
     This will move all data to this new chunk and free the
     very large chunk.

     See PR#5389

So, if the recompaction is triggered a large block with possibly high address, will be residing at the beginning of the heap, and new blocks will also stick to it, thus breaking the sortedness of the list even more. So eventually, the list will become totally unsorted, and the probability of removing high addresses will be lesser. So, on a long run, the chances are high, that malloc will leak more memory. But again, this is only when malloc is used.


1) the ascending order invariant is enforced by memory.c:caml_add_to_heap
2) I misunderstood on the first reading the what free means. This is free in terms of OCaml heap, not in terms of the malloc heap. So this are the chunks that will not be actually returned back to OCaml.


#10

Nice find. I tried again with tests that should display the head shape you describe by triggering recompaction, and I also tried to interleave manual calls to malloc as suggested, but that was inconclusive (I did not try too hard since it is now established that standard configurations do not use the glibc malloc).