[CPU bug affecting OCaml] Skylake bug: a detective story

A nice blog post from Ahrefs about the Skylake and Kaby Lake CPU bug recently patched by Intel, which disproportionately affected OCaml.

For reference, see this message from the Debian mailing list.

[WARNING] Intel Skylake/Kaby Lake processors: broken hyper-threading

TL;DR: unfixed Skylake and Kaby Lake processors could, in some
situations, dangerously misbehave when hyper-threading is enabled.
Disable hyper-threading immediately in BIOS/UEFI to work around the
problem. Read this advisory for instructions about an Intel-provided

On 2017-05-29, Mark Shinwell, a core OCaml toolchain developer,
contacted the Debian developer responsible for the intel-microcode
package with key information about an Intel processor issue that could be
easily triggered by the OCaml compiler.

For people on Linux, there is a Perl script for checking whether your CPU is likely affected.

To give an idea of how likely this bug is to occur: we recently had four reports in Lwt alone from users who had been running stress tests, where the stack traces happened to suggest that Lwt might be the problem.
Question: since OCaml doesn’t do multithreading, why did hyper-threading come into play here? Is it a result of multiple processes being assigned by the OS to different virtual cores?

I'm speculating, but hyper-threading could have caused the problem even if the other virtual core was running something totally unrelated to OCaml, which probably happens frequently on any system. So, probably yes to:

Is it a result of multiple processes assigned by the OS to different virtual cores?

The blog post was amazing to read, thanks for sharing!
Xavier Leroy wrote another great post with additional information: http://gallium.inria.fr/blog/intel-skylake-bug/. It’s nice to know the details of his analysis, and I must point out that the conclusions Xavier had reached with the help of SIOU in his previous work were incredibly useful to us during the work described in our blog post.

It’s a bit sad that this story came out in fragments, but it really was a community-wide effort.