About first_fit GC policy worst case complexity (it's "quadratic")

For years we have been running on prod a service which has two unfortunate properties :

  • it fragments a lot by allocating a lot of various block size values of different lifecycles
  • it has a tight requirement on latency, that compaction breaks when compacting all the freelist mess.

Freelist would easily grow to ~300/400k blocks and compact stall for a few seconds to deal with this. Our idea was to compact less and fragment less by using the first_fit policy. At first it worked fine for months. Then every few months it started to stuck in minor GC. Recently it started stalling more often and we managed to get a core and understand what’s going on here is the key take away.

when the manual say you have to pay a price in allocation cost when enabling first_fit, what it actualy means is :

first_fit policy worst case minor GC is O(blocks in minor heap * blocks in free list). This can grow really big. I’m not sure it’s a bug or how it can be improved yet, but this can easily trigger loops in gc code that will run for more 30minutes.

The worst case happens when the first block in the free list is relatively big and most of the other block in freelist are small. The flp then has ~1/2 entries, and if the freelist is large like ~150k blocks, the minor gc will scan it for every minor heap block (to check if the flp can be extended with more blocks).

I’m going to try another angle and instead try to compact really often.

Are you able to distill down this allocation policy into a reproducible test case that can be profiled independently of your codebase?

This seems very hard to do unfortunately. I’ve been trying and all i managed to do is find a totally unrelated bug few months ago. It’s especialy hard that the denegerate case take several days to reproduce.
Now i must admit this service is allocating A LOT. It is http service allocation a lot of values for headers and stuff and it holds a very big peristent memory structure based on map and sets which gets changed often.
(not that big actually, ~500MB)

The corner case on first-fit might be considered as an algorithm bug i suspect but i guess there is not much to do when freelist grows big like >200k blocks.