Cross-compiling a legacy version of compiler-libs.common with a later version of OCaml

I am trying to build compilerlibs/ocamlcommon.cmxa from an older branch of OCaml, namely 4.06.1+BS, using OCaml 4.10.2. However, most of the targets fail with Error: Unbound module Stdlib. For example, the following does NOT work:

../../../../inria/ocaml/boot/ocamlrun ../../../../inria/ocaml/boot/ocamlc -use-prims ../byterun/primitives -strict-sequence -absname -w +a-4-9-41-42-44-45-48 -g -warn-error A -bin-annot -nostdlib -safe-string -strict-formats `sh ./Compflags list.cmi` -c list.mli

However these two worked:

../../../../inria/ocaml/boot/ocamlrun ../../../../inria/ocaml/boot/ocamlc -use-prims ../byterun/primitives -strict-sequence -absname -w +a-4-9-41-42-44-45-48 -g -warn-error A -bin-annot -nostdlib -safe-string -strict-formats `sh ./Compflags camlinternalFormatBasics.cmi` -c camlinternalFormatBasics.mli

../../../../inria/ocaml/boot/ocamlrun ../../../../inria/ocaml/boot/ocamlc -use-prims ../byterun/primitives -strict-sequence -absname -w +a-4-9-41-42-44-45-48 -g -warn-error A -bin-annot -nostdlib -safe-string -strict-formats `sh ./Compflags camlinternalFormatBasics.cmo` -c camlinternalFormatBasics.ml

../../../../inria/ocaml/boot/ocamlrun ../../../../inria/ocaml/boot/ocamlc -use-prims ../byterun/primitives -strict-sequence -absname -w +a-4-9-41-42-44-45-48 -g -warn-error A -bin-annot -nostdlib -safe-string -strict-formats `sh ./Compflags pervasives.cmi` -c pervasives.mli

../../../../inria/ocaml/boot/ocamlrun ../../../../inria/ocaml/boot/ocamlc -use-prims ../byterun/primitives -strict-sequence -absname -w +a-4-9-41-42-44-45-48 -g -warn-error A -bin-annot -nostdlib -safe-string -strict-formats `sh ./Compflags pervasives.cmo` -c pervasives.ml

But the same thing worked when 4.10.2 was bootstrapping itself. The difference is that the more up-to-date build process has stdlib.ml and compiles that first. Is there a way to get past this?

The context for this attempt is that I am trying to see if I can build a native M1 binary for the ReScript compiler (or bs-platform), and hopefully figure out a way to decouple the ReScript build from OCaml’s own backend native code generation. 4.10.2 works natively on M1, so in theory I should be able to use it to compile an earlier version of the compiler to native code, no? Sure, the native code generation of the older version still would not work, but here we only need the frontend of the older compiler to work.

I don’t think building the compiler with a newer version of itself is fully supported, but you may want to try the following approach:

  • Install the compiler you want to compile with (either with opam or manually). I’ll assume it’s installed in /installdir/; replace this with the actual installation directory in the following steps.
  • Check out the sources of the compiler you want to build, and cd into its directory.
  • Run make CAMLC=/installdir/ocamlc.opt CAMLOPT=/installdir/ocamlopt.opt compilerlibs/ocamlcommon.cmxa

I think this should produce the expected result.

If you end up with errors because the standard library you’re building with is too recent, you will need to rebuild the stdlib too and this will be more complicated (as you’ve already seen).

I am using the boot/ compilers to avoid dependence on the wrong version of Stdlib etc. What I found is that init_env as defined in compmisc is responsible for initializing the initial environment. Despite the -nostdlib flag, Pervasives still needs to be loaded in order to proceed and nowadays that means opening Stdlib. So I made progress by shimming with a stdlib.ml with the following one-liner:

include Pervasives

and compiling it with the -nopervasives flag. This way, when compiling with the newer compiler, Pervasives is autoloaded, matching the older behavior. With this step I was able to compile much of the old stdlib, with some additional type annotations, until it reached the following statement involving a format string:

71 |   let l1 = String.length (sprintf "%.0f" st.minor_words) in
                                       ^^^^^^
Error: This expression has type 'a * 'b
       but an expression was expected of type
         CamlinternalFormatBasics.float_conv

Nowadays CamlinternalFormatBasics.ml defines

type float_conv = float_flag_conv * float_kind_conv

But in 4.06.1 it was a single variant type. The question is why ocamlc thinks that the format string literal is of the type defined in 4.10.2. Is this information hard coded in C code? Does that mean I should update camlinternalFormatBasics and camlinternalFormat to that of 4.10.2 to match what C code expects and pray that it does not break the 4.06.1 code some other way?

Again, you’re trying to compile an old stdlib with a new compiler. That’s a lot of trouble, because the stdlib is deeply linked to the compiler itself. However, the standard library is mostly backwards-compatible so there shouldn’t be any problem compiling the compiler itself with a more recent compiler and stdlib.

It’s likely that you’ll need to have a version of the stdlib compatible with your newly-compiled compiler, so there will be an additional step needed where you rebuild the stdlib with your new compiler, but for now I would start by checking that you can indeed build the 4.06 compiler itself with a 4.10 compiler+stdlib.

@vlaviron You are right that I should first check whether I can compile with 4.10+stdlib. Unfortunately, the answer appears to be no. The first blocking change was in the Format module (Format.pp_set_formatter_tag_functions is deprecated, suggested replacement Format.formatter_stag_functions has compatibility issues).

Going back to the previous issue with compiling the 4.06.1 stdlib: it appears that ocamlc types format string constants (literals) directly, which may conflict with the desired type in the old stdlib. Yet changes to the types of format string constants have been made in the past. The change to float_conv was made in this commit. I wonder if some special procedure had to be followed for such changes. The change involved committing the binaries boot/ocamlc and boot/ocamllex. Is that common? Does that mean all the branches involved in the commit share a common boot/ocamlc? How often do boot/ocamlc and the other boot binaries change?

Format string constants seem to be a very special case, as their types are inferred directly. I wonder if there are any other situations where the compiler also implicitly types things.

About format strings: when the typer knows that a given string is expected to have the special CamlinternalFormatBasics.format6 type, instead of compiling the string in the normal way, the compiler first builds an actual format value by parsing the string, then deconstructs this value to generate the AST node corresponding to it that will be placed instead of the string in the result.
This means that the CamlinternalFormatBasics module used to compile the typer must be the same as the one that will be linked with the programs generated by the compiler being compiled.
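
To make this concrete, here is a small sketch (not taken from the compiler sources) showing that a format literal is a structured CamlinternalFormatBasics value rather than a plain string. The names fmt and starts_with_float are made up for illustration. The pair pattern on the first argument of Float assumes the float_conv definition quoted above for 4.10.2; against the 4.06.1 definition, where float_conv is a single variant, the same pattern would be rejected, which is the same kind of mismatch as the error reported earlier.

```ocaml
open CamlinternalFormatBasics

(* With this annotation the typer parses the literal at compile time and
   emits a structured format value instead of a plain string. *)
let fmt : (float -> string, unit, string) format = "%.0f"

(* Hypothetical helper: inspect the first node of the format's AST.
   The pair pattern only type-checks where float_conv is a pair
   (float_flag_conv * float_kind_conv), as quoted for 4.10.2. *)
let starts_with_float : type a b c d e f. (a, b, c, d, e, f) format6 -> bool =
  fun (Format (ast, _raw)) ->
    match ast with
    | Float ((_flag, _kind), _padding, _precision, _rest) -> true
    | _ -> false

let () =
  assert (starts_with_float fmt);
  assert (not (starts_with_float ("plain text" : (string, unit, string) format)));
  (* The original literal is kept alongside the parsed AST. *)
  assert (string_of_format fmt = "%.0f");
  print_endline (Printf.sprintf fmt 3.7)
```

Because the typer builds this value using whatever CamlinternalFormatBasics it was compiled against, the constructors it emits have to line up with the module the compiled program later links with.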

I tried to work around this problem by copying the camlinternalFormat* files from 4.06 into the typing directory (under a new name) and patching typing/typecore.ml to refer to these modules instead, and I eventually managed to get it to work.
Here is the command I used:

OCAMLPARAM=_,alert=-all,w=-60-66-67 make CAMLC=ocamlc.opt CAMLOPT=ocamlopt.opt compilerlibs/ocamlcommon.cmxa

I’ve had to make a number of small patches, in four categories:

  • Format-related patches. This includes the copy of the camlinternalFormat* modules, patches to typing/typecore.ml to make it use them, patches to the camlinternalFormat.ml copy to work around GADT-related errors (just some additional type annotations), patches to the build system (Makefile and .depend) to take them into account properly.
  • Build-system patches. Mostly around two tools that are used during compilation: tools/make_opcodes and tools/cvt_emit. The command that runs them needs to be patched so that it does not prefix them with boot/ocamlrun. Since we’re compiling them with an already installed compiler, they’ll run fine if called directly.
  • Warning attributes: a number of files (mostly in the middle-end/ directory, but also a few in asmcomp) have warning headers that are too strict. I’ve added additional exceptions in the attribute for the warnings that I stumbled upon, mostly -3 for uses of Pervasives and -60-66-67 for various module-related warnings that would have no meaning in 4.06.
  • A patch to utils/identifiable.ml (and its interface) to make the filter_map function compatible with the version added in the stdlib recently. This meant removing the label, so a few files in middle-end that used the function needed to be patched too.
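
As a rough illustration of the last category, the sketch below contrasts a labelled filter_map with an unlabelled one and shows how a call site changes. This is not the code from utils/identifiable.ml: the module String_map, the function names, and the exact shape of the labelled signature are assumptions made for the example, and the function is written with Map.fold so that it does not depend on a recent stdlib.

```ocaml
module String_map = Map.Make (String)

(* Assumed older style: map first, function passed with the ~f label. *)
let filter_map_labelled t ~f =
  String_map.fold
    (fun key value acc ->
       match f key value with
       | None -> acc
       | Some v -> String_map.add key v acc)
    t String_map.empty

(* Unlabelled style: function first, then the map, which is the argument
   order used by the stdlib's filter_map functions. *)
let filter_map f t = filter_map_labelled t ~f

let () =
  let m = String_map.(empty |> add "a" 1 |> add "b" 2 |> add "c" 3) in
  let keep_odd _key v = if v mod 2 = 1 then Some (v * 10) else None in
  (* Old call site: labelled argument. *)
  let r1 = filter_map_labelled m ~f:keep_odd in
  (* Patched call site: label removed, arguments reordered. *)
  let r2 = filter_map keep_odd m in
  assert (String_map.bindings r1 = [("a", 10); ("c", 30)]);
  assert (String_map.equal (=) r1 r2)
```

Dropping the label is a mechanical change, but every caller has to be updated in the same patch, which is why the middle-end files needed touching too.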

And with that I’ve successfully built the ocamlcommon.cmxa library. Apart from the tools patch, I believe that the resulting source tree can still be bootstrapped so you should be able to use the normal build process to produce the bytecode libraries (stdlib and the other libraries) if you need them.

I strongly advise against building the stdlib with a different version of the compiler than the one it’s made for, as parts of the stdlib are there to expose compiler-specific features. Even with a backwards-compatible interface, there is no guarantee that the underlying compiler features used to provide the interface stay the same, so it really needs to be compiled with the correct version of the compiler.

You can disable the deprecation warnings and move on with your life :grin: