OCaml segfaults on a fresh switch

An idea… brew install gcc, make sure the real gcc is in the path and reinstall ocaml.

(I have no Mac and can’t test).

I’m afraid that’s not good advice. You would rather make sure there’s no such thing in your PATH (and the same goes for binutils). In general, having multiple C toolchains in the PATH leads to problems on macOS.
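
One way to check for the situation described above is to list every copy of a compiler or linker that is visible on the PATH. This is only a minimal sketch: `find_on_path` is a hypothetical helper (not part of opam or Homebrew), and the tool names in the usage comment are just the usual suspects, not an exhaustive list.

```shell
#!/bin/sh
# find_on_path TOOL [PATH_LIST]
# Print every executable named TOOL found in the colon-separated PATH_LIST
# (defaults to $PATH), in lookup order. Hypothetical helper for illustration.
find_on_path() (
    tool=$1
    path_list=${2:-$PATH}
    IFS=:
    set -f    # disable globbing while the path list is split on ':'
    for dir in $path_list; do
        if [ -n "$dir" ] && [ -f "$dir/$tool" ] && [ -x "$dir/$tool" ]; then
            printf '%s\n' "$dir/$tool"
        fi
    done
)

# Usage (assumed tool names):
#   for t in cc gcc clang ld ar ranlib; do find_on_path "$t"; done
```

If a tool shows up in more than one directory, the first line is the copy that will actually be used, and the extra entries are worth a closer look.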

Out of desperation I tried deleting and re-creating the switch for 4.14.1, having earlier upgraded the system ocaml compiler and opam version via homebrew. Now it cannot even finish creating the switch:

$ opam switch create 4.14.1

<><> Installing new switch packages <><><><><><><><><><><><><><><><><><><><>  🐫
Switch invariant: ["ocaml-base-compiler" {= "4.14.1"} | "ocaml-system" {= "4.14.1"}]

<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><><><><>  🐫
∗ installed base-bigarray.base
∗ installed base-threads.base
∗ installed base-unix.base
∗ installed ocaml-options-vanilla.1
⬇ retrieved ocaml-base-compiler.4.14.1  (cached)
∗ installed ocaml-base-compiler.4.14.1
∗ installed ocaml-config.2
[ERROR] The compilation of ocaml.4.14.1 failed at "ocaml /Users/csg63/.opam/4.14.1/share/ocaml-config/gen_ocaml_config.ml 4.14.1 ocaml".

#=== ERROR while compiling ocaml.4.14.1 =======================================#
# context     2.1.4 | macos/x86_64 | ocaml-base-compiler.4.14.1 | https://opam.ocaml.org#b26ac460
# path        ~/.opam/4.14.1/.opam-switch/build/ocaml.4.14.1
# command     ~/.opam/opam-init/hooks/sandbox.sh build ocaml /Users/csg63/.opam/4.14.1/share/ocaml-config/gen_ocaml_config.ml 4.14.1 ocaml
# exit-code   2
# env-file    ~/.opam/log/ocaml-12016-f90f94.env
# output-file ~/.opam/log/ocaml-12016-f90f94.out
### output ###
# sh: /Users/csg63/.opam/4.14.1/bin/ocamlc: cannot execute binary file
# Exception: Failure "Bad return from 'ocamlc -where'".



<><> Error report <><><><><><><><><><><><><><><><><><><><><><><><><><><><><>  🐫
┌─ The following actions failed
│ λ build ocaml 4.14.1
└─
┌─ The following changes have been performed
│ ∗ install base-bigarray         base
│ ∗ install base-threads          base
│ ∗ install base-unix             base
│ ∗ install ocaml-base-compiler   4.14.1
│ ∗ install ocaml-config          2
│ ∗ install ocaml-options-vanilla 1
└─
Switch initialisation failed: clean up? ('n' will leave the switch partially installed) [Y/n] Y

So I’m starting to wonder if this is more likely an issue with my system ocaml install (though the system ocaml does compile hello world into a working executable).

I re-ran this, and didn’t clean up the failed switch, so I could poke at the ocamlc binary it was trying to run. file doesn’t flag it as a binary, but as random data. This appears to be because the first roughly 4.8MB of the file consists of just 0s. Then at that point, when looking at a hex dump of the file, I do start to see tell-tale signs of a binary (what look like identifier names related to ocaml and ASTs and types and such). Then after a bit there’s another several-MB chunk of 0s.
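
The zero-filled regions described above can be measured without reading a hex dump by eye. The following is a rough sketch that assumes `head -c` is available (it is on macOS and GNU systems); `count_zero_bytes` is a hypothetical helper name, and the path and byte count in the usage comment are taken from the log and the rough 4.8MB figure above.

```shell
#!/bin/sh
# count_zero_bytes FILE [N]
# Print how many NUL bytes occur in the first N bytes of FILE
# (or in the whole file when N is omitted). Hypothetical helper.
count_zero_bytes() {
    file=$1
    limit=${2:-}
    if [ -n "$limit" ]; then
        total=$(head -c "$limit" "$file" | wc -c)
        # LC_ALL=C keeps BSD tr from rejecting arbitrary binary bytes.
        nonzero=$(head -c "$limit" "$file" | LC_ALL=C tr -d '\000' | wc -c)
    else
        total=$(wc -c < "$file")
        nonzero=$(LC_ALL=C tr -d '\000' < "$file" | wc -c)
    fi
    echo $(($total - $nonzero))
}

# Usage:
#   count_zero_bytes ~/.opam/4.14.1/bin/ocamlc 4800000
```

If the printed number is close to the window size, the start of the file really is zero-filled rather than merely unrecognised by `file`.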

And trying with 4.14.0 yields the same results with a broken/corrupt ocamlc.

I think it’s very unlikely to be an issue with your system ocaml, as it is not used at all when building a new switch.
If you have some extra hard drive lying around, you might want to try creating a switch on that drive just to check if it’s a hardware failure or not:

$ cd /my_external_drive/some_folder
$ opam switch create . 4.14.1
$ opam install dune

I can’t think of anything else that would produce the results you’ve reported.


I don’t have an external drive handy.

I’ve escalated to moving my existing ~/.opam and trying opam init.

I’ve tried once with opam init --compiler=4.14.1 and the initial switch creation fails in the same way as when I tried to create a fresh 4.14.1 switch with an existing installation — it ends up with an ocamlc binary in the switch’s bin that isn’t actually a valid executable.

opam init by itself seems to install a default switch “successfully,” though it’s unclear to me whether ~/.opam/default/bin is supposed to be empty in this case; it doesn’t seem to install an opam-managed version of ocaml, but simply relies on the system ocaml.

Installing dune on this new default switch (the one that actually just uses the system ocaml install) succeeds. So I can now accomplish what I originally set out to do (build a particular 3rd-party dune project), but it’s not clear to me that my new default switch is actually working as intended. Can anyone confirm if an empty ~/.opam/default/bin directory is correct/expected after a fresh opam init?

If it’s using ocaml-system (meaning opam has detected that a system-wide OCaml compiler is already available), it is indeed expected. If you want to force opam to install everything in your switch, you have to use the ocaml-base-compiler packages. For example:

opam init --compiler=ocaml-base-compiler.4.14.1
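
To see which of the two compiler packages a given switch ended up with, the installed package list can be filtered. This is a sketch only: `compiler_flavour` is a hypothetical helper, and it assumes that `opam list --installed --short` prints one package per line, with or without a `.version` suffix.

```shell
#!/bin/sh
# compiler_flavour
# Read installed package names on stdin, one per line, and print
# "system" if ocaml-system is present, "switch-local" if
# ocaml-base-compiler is present, and "unknown" otherwise.
compiler_flavour() {
    flavour=unknown
    while IFS= read -r pkg; do
        case $pkg in
            ocaml-system|ocaml-system.*)
                flavour=system ;;
            ocaml-base-compiler|ocaml-base-compiler.*)
                flavour=switch-local ;;
        esac
    done
    echo "$flavour"
}

# Usage (assumed output format, see above):
#   opam list --installed --short | compiler_flavour
```

A result of "system" would be consistent with an empty bin directory in the switch, since the compiler binaries then live outside ~/.opam.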

Thanks for confirming.

Per above, init with 4.14.1 still fails with a corrupt ocamlc binary.

With my working opam re-init, I can create a switch for 4.05.0 successfully, but 5.0.0 fails with the same issue as 4.14.1 and 4.14.0.

I’m starting to do a binary search over which ocaml versions I can install from opam now…

  • opam switch create 4.12.1 fails
  • opam switch create 4.09.1 succeeds
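
The manual search above could be scripted roughly as follows. This is a sketch under a strong assumption: failures are deterministic and monotonic, so every version before the first bad one passes and every later one fails. `first_failing` and `try_switch` are hypothetical names, and the `opam` invocation in the usage comment is an untested guess at a non-interactive call.

```shell
#!/bin/sh
# first_failing CHECK V1 V2 ...
# Binary search over an ordered version list. CHECK is a command that
# returns 0 when a version builds. Prints the first version for which
# CHECK fails; returns 1 if every version passes.
first_failing() {
    check=$1
    shift
    n=$#
    lo=1               # every version before position lo is known to pass
    hi=$((n + 1))      # every version from position hi on is known to fail
    while [ "$lo" -lt "$hi" ]; do
        mid=$(((lo + hi) / 2))
        eval "candidate=\${$mid}"    # fetch the mid-th remaining argument
        if "$check" "$candidate"; then
            lo=$((mid + 1))
        else
            hi=$mid
        fi
    done
    if [ "$lo" -le "$n" ]; then
        eval "printf '%s\n' \"\${$lo}\""
    else
        return 1
    fi
}

# Usage (hypothetical check; creates one throwaway switch per probe):
#   try_switch() { opam switch create "bisect-$1" "$1" --yes >/dev/null 2>&1; }
#   first_failing try_switch 4.05.0 4.09.1 4.10.2 4.11.2 4.12.0 4.12.1 4.14.0 4.14.1 5.0.0
```

If the failure is intermittent, the monotonicity assumption does not hold and the result of such a search should be treated with suspicion.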

To rule it out, this is not an issue with the disk being almost full?
(the binary search you’re doing is a good thing to try)

Nope, disk is not full. 4.10.2 succeeds.

Have to run for now, but 4.11.2 succeeds. The only release between that and one I know fails (4.12.1) is 4.12.0, which I’ll try when I get a chance.

Okay, this is utterly bizarre now. Yesterday building 4.12.1 failed. Today, after building 4.12.0 succeeded, I retried 4.12.1, and it worked. Then I tried 4.14.1 (the version I was originally failing with) and it worked. And 5.0.0 works, even though all of these failed yesterday. I have not changed my machine since I started this binary search over versions: I have not rebooted, and I have not updated any software since 4.12.1 and higher failed yesterday. The only thing I can think of is that I put my laptop to sleep and woke it up several times. That is all.

I observed the same failure pattern (4.8MB of zeros, part of a binary, then more zeros) on multiple versions, yet my disk doesn’t report any errors. And since I kept the broken switch builds around, the new failures were being written to different disk blocks anyway, so I don’t think it’s disk-related. I suppose it’s plausible I have some faulty RAM, but it seems unlikely that macOS would keep backing the same part of only the builds at 4.12.1 and above with the same faulty page, while never doing so for builds below that version.

On the other hand, given that I changed none of the software configuration between yesterday and today, it seems hard for me now to conclude that this wasn’t some kind of hardware error (or I suppose it could have been a kernel virtual memory management bug where sleep/wake cycles reset some data structure, having written such bugs myself before…).

Either way, this is now looking a lot like some (transient?) hardware/OS failure, rather than an actual opam issue.