We boiled it down to a combination of two things: garbage collection and the allocation of custom blocks.
Our project is CPU intensive, and we found that increasing the GC's minor_heap_size improved performance by about 10-20%, at the expense of higher memory use. To do that, we have a line of code that expands the minor heap when the executable starts.
On the other side of the project, we use custom blocks to interact with C++ objects. Initially we created the custom blocks as follows:
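For anyone wanting to try the same thing, here is a minimal sketch of what such a startup line could look like using the standard Gc module. The size below is a hypothetical value, not the one we actually use, and expand_minor_heap is just an illustrative helper name.

```ocaml
(* Hypothetical value: 1M words (8 MB on a 64-bit system). The default is
   256k words; pick whatever your own measurements justify. *)
let minor_heap_words = 1024 * 1024

(* Enlarge the minor heap, leaving every other GC setting untouched. *)
let expand_minor_heap words =
  Gc.set { (Gc.get ()) with Gc.minor_heap_size = words }

(* Run once at program start, before the CPU-intensive work begins. *)
let () = expand_minor_heap minor_heap_words
```

The same setting can also be tried without recompiling through the environment, e.g. OCAMLRUNPARAM='s=1M', which is handy for quick comparisons between compiler versions.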
obj = caml_alloc_custom(&custom_block, sizeof(Object*), sizeof(Object), 10000);
With the given value for the max parameter of the custom block, we found that the blocks were not being finalized as often as we needed. Then we changed the max parameter to 1:
obj = caml_alloc_custom(&custom_block, sizeof(Object*), sizeof(Object), 1);
That worked just OK, but the objects were still not being finalized as fast as we needed. Then we changed our code to explicitly finalize every custom block. This worked perfectly, but we kept the max parameter set to 1.
That parameter in custom blocks seems to be the one causing problems in 4.07.1. The interesting thing is that we didn't see this in 4.06.1 thanks to our changes in the GC settings. Reverting the GC settings to the defaults (in 4.06.1) shows a slowdown there too.
We have changed the allocation of custom blocks to:
obj = caml_alloc_custom(&custom_block, sizeof(Object*), 0, 1);
With this change we get similar performance with both versions of the compiler.
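One way to read the three variants above, going by the OCaml manual's description of caml_alloc_custom: mem is the amount of out-of-heap resources held by the block, max is the maximum amount you are willing to see in floating garbage, and the GC cares only about the ratio mem/max. A larger ratio asks the major GC to work harder for each allocation. The sketch below only computes that ratio for each call; object_bytes is a hypothetical stand-in for sizeof(Object), and this is arithmetic, not a measurement.

```ocaml
(* Hypothetical size standing in for sizeof(Object); the real value
   depends on the C++ class. *)
let object_bytes = 64

(* mem / max for a single caml_alloc_custom call. *)
let ratio ~mem ~max = float_of_int mem /. float_of_int max

let () =
  List.iter
    (fun (label, mem, max) ->
      Printf.printf "%-36s mem/max = %g\n" label (ratio ~mem ~max))
    [ ("mem = sizeof(Object), max = 10000", object_bytes, 10000);
      ("mem = sizeof(Object), max = 1", object_bytes, 1);
      ("mem = 0, max = 1", 0, 1) ]
```

With max = 1 and a non-zero mem, the ratio is far above 1, so every allocation pushes the GC hard. With mem = 0 the custom blocks stop influencing the GC speed at all, which seems reasonable once every block is finalized explicitly. The manual itself suggests mem = 0 and max = 1 when the block holds no out-of-heap resources that the GC needs to account for.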