Segfaults on static compilation with alpine 3.23 (Fix: -no-pie ?)

While investigating a segfault with Alpine 3.23 and OCaml 4.14 in static compilation, which may involve thread-local storage, I stumbled across a segfault on an empty OCaml program with OCaml 5.*:

## KO static empty program segfault
#FROM ocaml/opam:alpine-3.23-ocaml-5.0
#FROM ocaml/opam:alpine-3.23-ocaml-5.1
#FROM ocaml/opam:alpine-3.23-ocaml-5.2
#FROM ocaml/opam:alpine-3.23-ocaml-5.3
#FROM ocaml/opam:alpine-3.23-ocaml-5.4
FROM ocaml/opam:alpine-3.23-ocaml-5.5

## OK
#FROM ocaml/opam:alpine-ocaml-4.14
#FROM ocaml/opam:debian-ocaml-5.4
#FROM ocaml/opam:alpine-3.22-ocaml-5.5

RUN touch "test.ml"

RUN opam-2.1 exec -- ocamlopt -ccopt -static -o test.exe test.ml
#RUN opam-2.1 exec -- ocamlopt -o test.exe test.ml

RUN sudo apk add gdb
#RUN sudo apt install gdb -y

RUN gdb -batch -ex "run" -ex "bt" --args ./test.exe

The output is:

#8 [5/5] RUN gdb -batch -ex "run" -ex "bt" --args ./test.exe
#8 0.298 warning: Error disabling address space randomization: Operation not permitted
#8 0.303 
#8 0.311 Program received signal SIGSEGV, Segmentation fault.
#8 0.311 caml_shared_try_alloc (local=0x0, wosize=wosize@entry=65, tag=tag@entry=0, reserved=reserved@entry=0) at runtime/shared_heap.c:508
#8 0.311 warning: 508   runtime/shared_heap.c: No such file or directory
#8 0.311 #0  caml_shared_try_alloc (local=0x0, wosize=wosize@entry=65, tag=tag@entry=0, reserved=reserved@entry=0) at runtime/shared_heap.c:508
#8 0.311 #1  0x00007a8fe7459210 in alloc_shr (wosize=wosize@entry=65, tag=tag@entry=0, reserved=0, noexc=0) at runtime/memory.c:424
#8 0.311 #2  caml_alloc_shr (wosize=wosize@entry=65, tag=tag@entry=0) at runtime/memory.c:455
#8 0.311 #3  0x00007a8fe7463170 in caml_init_signal_handling () at runtime/signals.c:220
#8 0.312 #4  0x00007a8fe7445339 in caml_init_domains (max_domains=<optimized out>, minor_heap_wsz=<optimized out>) at runtime/domain.c:1110
#8 0.312 #5  0x00007a8fe744d118 in caml_init_gc () at runtime/gc_ctrl.c:357
#8 0.313 #6  0x00007a8fe746ad61 in caml_startup_common (pooling=<optimized out>, argv=0x7fffb41d73c8) at runtime/startup_nat.c:106
#8 0.313 #7  caml_startup_common (argv=0x7fffb41d73c8, pooling=<optimized out>) at runtime/startup_nat.c:86
#8 0.313 #8  0x00007a8fe746aedb in caml_startup_exn (argv=<optimized out>) at runtime/startup_nat.c:134
#8 0.313 #9  caml_startup (argv=<optimized out>) at runtime/startup_nat.c:139
#8 0.313 #10 caml_main (argv=<optimized out>) at runtime/startup_nat.c:146
#8 0.313 #11 0x00007a8fe743e0cc in main (argc=<optimized out>, argv=<optimized out>) at runtime/main.c:37
#8 DONE 0.4s

It only occurs with static compilation. During my investigation on 4.14, it seemed that thread-local storage was not correctly initialized. In the GDB backtrace, you can see this at frame #0: caml_shared_try_alloc (local=0x0, ...).

I don’t know whether the problem comes from Alpine 3.23, musl (different rc compared to Alpine 3.22), or gcc (15 instead of 14 in 3.22). But when I asked an LLM to proofread this post, it advised me to try -no-pie, which avoids the segfault.

(Perhaps similar to "Segfault in static executables compiled with musl libc in 5.0.0~alpha1", issue #11463 in ocaml/ocaml on GitHub.)

local=0 is definitely wrong and looks similar to the issue you mention. But no idea what can cause this.

That libc / compiler / linker combination seems to have a broken implementation of TLS, where cross-module TLS references don’t work properly in static-PIE executables linked with -Wl,-E.

In a.c:

thread_local int var = 42;

and in b.c:

#include <stdio.h>
extern thread_local int var;
int main() { printf("%d\n", var); }

then:

$ gcc -Wl,-E -static a.c b.c -o prog && ./prog 
-20093280

(Passing both -Wl,-E and -static is a slightly weird combination, but it shouldn’t fail like this)

There’s actually a workaround already present in the runtime for broken TLS implementations, but currently it’s hardcoded to trigger only on certain platforms:

# Full support for thread local storage
# macOS and MinGW-w64 have problems with thread local storage accessed from DLLs

AS_CASE([$target],
  [*-apple-darwin*|*-w64-mingw32*|*-pc-windows], [],
  [AC_DEFINE([HAS_FULL_THREAD_VARIABLES], [1])]
)

It might be useful to expose this as a ./configure option to work around this bug.
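
To illustrate the general shape of such a workaround, here is a minimal sketch of hiding a thread-local variable behind an accessor function, so that other modules call a function instead of referencing the TLS symbol directly. This is not taken from the runtime sources: `demo_state`, `demo_get_state`, `Demo_state` and the `DEMO_FULL_THREAD_VARIABLES` switch are all hypothetical names, and I have not checked that this pattern avoids the static-PIE bug on Alpine 3.23.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a runtime's per-thread state. */
typedef struct demo_state { int id; } demo_state;

#ifdef DEMO_FULL_THREAD_VARIABLES
/* Direct access: every module that uses Demo_state references the
   thread-local symbol itself. */
_Thread_local demo_state *demo_tls_state = NULL;
#define Demo_state (demo_tls_state)
#else
/* Fallback: the thread-local symbol is private to this file, and other
   modules only ever call the accessor function. */
static _Thread_local demo_state *demo_tls_state = NULL;
demo_state *demo_get_state(void) { return demo_tls_state; }
#define Demo_state (demo_get_state())
#endif

/* Install the state for the calling thread (same in both modes). */
void demo_set_state(demo_state *s) {
  assert(s != NULL); /* the sketch only supports installing a real state */
  demo_tls_state = s;
}

/* Example consumer: id of the current state, or -1 if none is installed. */
int demo_current_id(void) {
  demo_state *st = Demo_state;
  return st != NULL ? st->id : -1;
}
```

Code written against `Demo_state` compiles unchanged in both modes, which is why a single configure-time switch is enough to select the fallback.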

Thanks a lot @stedolan for the small example; I was not able to create one myself. I will now report it on the Alpine bug tracker (EDIT: here):

#OK
#FROM alpine:3.22.3
#KO
FROM alpine:3.23.3

RUN apk add gcc build-base

RUN echo -e "#include <threads.h>\n thread_local int var = 42;" > a.c

RUN echo -e "#include <threads.h>\n extern thread_local int var;\n int main() { return var; }" > b.c

RUN gcc --std=c11 -Wl,-E -static a.c b.c -o prog

RUN ./prog; RES=$?; echo "Should be 42, it is $RES"; test "42" = $RES

I modified the example a little so that it works on Alpine 3.22, which uses gcc 14.2.