New name: ocamlcc, the OCaml Compiler Collection, analogous to GCC, the GNU Compiler Collection. Not only do we have multiple compilers (ocamlc.byte, ocamlopt.byte, etc.), but the compilers themselves may also serve to drive a C toolchain, to compile, assemble, and link C code.
Platforms and toolchains have been revised to support recursive (staged) builds.
Recursive builds. Well, at least quasi-recursive. There is only one target for building a compiler, bin:ocamlcc. To build the compiler, this target needs a compiler, which it gets from a toolchain, which depends on bin:ocamlcc, which needs a compiler, and so on, recursively, until we get to the base case, the precompiled boot/ocamlc compiler. So the boot compiler (boot/ocamlc) builds the stage 1 compiler, which builds the stage 2 compiler, which builds the stage 3 compiler. See Staged builds.
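As a rough illustration of that recursion, here is a toy shell function. It is hypothetical, not part of OBazl, and does not invoke Bazel or compile anything; it only models the dependency chain, printing which compiler builds which stage until it bottoms out at the precompiled boot/ocamlc:

```sh
# Toy model of the quasi-recursive bootstrap (hypothetical, not OBazl code):
# the compiler for stage N is built by the compiler from stage N-1, and the
# recursion stops at the precompiled boot/ocamlc. Nothing is compiled here;
# the function only prints the chain of builds in the order they happen.
build_chain() {
  # $1 is the stage to produce. POSIX sh restores positional parameters
  # when a function returns, so $1 is still ours after the recursive call.
  if [ "$1" -le 0 ]; then
    echo "boot/ocamlc (precompiled)"   # base case
    return 0
  fi
  build_chain $(( $1 - 1 ))            # first obtain the compiler we need
  if [ "$1" -eq 1 ]; then
    echo "stage 1: built by boot/ocamlc"
  else
    echo "stage $1: built by stage $(( $1 - 1 ))"
  fi
}

build_chain 3
```

Running it prints four lines, from "boot/ocamlc (precompiled)" up to "stage 3: built by stage 2", i.e. the order in which the builds must happen.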
Revised configuration logic. The goal is to eliminate dependency on autotools (./configure). This is very much a Work-In-Progress; the code is in //config. For more info see notes/autoconf.
Use the link command from the Bazel CC toolchain. OBazl now uses information from the cc toolchain selected (and configured) by Bazel to set the command string used by OCaml to run the C linker (Config.mkexe). For more info see ocaml_cc_config and notes/linking. [TODO: same thing for the assemble command Config.asm]
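To check which link and assemble commands a given compiler ended up with, "ocamlc -config" prints the compiler's configuration as "key: value" lines. The helper below is a sketch that extracts the two commands from that output; it assumes the keys are named mkexe and asm (as in recent OCaml releases), and the function name is made up for illustration:

```sh
# Show the link (Config.mkexe) and assemble (Config.asm) commands that an
# OCaml compiler reports. Reads "key: value" lines as printed by
# "ocamlc -config" on stdin. Assumption: the keys are named mkexe and asm.
show_toolchain_cmds() {
  awk '
    /^mkexe: / { sub(/^mkexe: /, ""); print "link:     " $0; next }
    /^asm: /   { sub(/^asm: /, "");   print "assemble: " $0; next }
  '
}

# Typical use, with the compiler under test on PATH:
#   ocamlc -config | show_toolchain_cmds
```

Reading from stdin rather than calling ocamlc directly makes it easy to point the same helper at different compilers (boot, stage 1, etc.) and compare what each one recorded.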
Revised preprocessing. OBazl eliminates shell scripts and tools, instead using a template engine written in portable C to generate code.
In other words, this version includes some stuff beyond just getting the Bazel build to work, in particular concerning configuration and preprocessing. Part of the motivation there is to pave the way to Windows support by eliminating dependency on Unix-ish stuff.
Maintainers of the Makefiles may be interested in some of that stuff, in particular using templates for code generation instead of sh, awk, sed, etc. The template engine is written in portable C. Personally I find that using templates simplifies things considerably.
Feedback is always welcome. The issue tracker is enabled on the GitHub repo, and there is a Discourse server at obazl.
I don't see any discussion of build times. For compiler development, short feedback loops are important, so I care about the amount of time required to build the system, the ability to perform incremental rebuilds, and the ability to quickly build only the part of the system that I rely on: typically, to debug type-checker issues, building ocamlc and ocaml (the bytecode version) is enough. We spend a lot of unpleasant Makefile maintenance time to ensure that parallel builds sort of work (poorly), just to reduce build time. How are the build times with your Bazel-based build system?
Incremental and parallel builds should just work in Bazel. I can see multiple build tasks running when I build.
Bear in mind that 1) I haven't tried to optimize anything; and 2) the current build protocol does not include a "coldstart"; it just builds from the sources, so any change to them triggers a rebuild starting from boot/ocamlc, meaning ocamlc.byte gets built twice. Once I implement a coldstart target (to snapshot the result of the current process), that won't happen. With that in mind, I ran some builds:
bazel clean --expunge
time bazel run bin:ocamlcc --config=ocamlc.byte
…
bazel run bin:ocamlcc --config=ocamlc.byte 0.04s user 0.05s system 0% cpu 1:41.85 total
The --expunge means Bazel starts from nothing, so these figures include Bazel startup time.
Then I made some changes and recompiled:
after a small change to config.ml:
bazel build bin:ocamlcc --config=ocamlc.byte 0.04s user 0.04s system 0% cpu 1:11.50 total
then I added a comment to typing/typecore.ml:
bazel build bin:ocamlcc --config=ocamlc.byte 0.03s user 0.04s system 0% cpu 48.076 total
then added a comment to bytecomp/bytegen.ml:
bazel build bin:ocamlcc --config=ocamlc.byte 0.03s user 0.04s system 0% cpu 41.853 total
add comment to driver/compile.ml
bazel build bin:ocamlcc --config=ocamlc.byte 0.02s user 0.03s system 1% cpu 3.032 total
add comment to bytecomp/emitcode.ml
bazel build bin:ocamlcc --config=ocamlc.byte 0.02s user 0.03s system 1% cpu 4.438 total
How does that compare?
A distinctive feature of the Bazel build is that targets can depend directly on submodules in a namespace ("wrapped" lib) without depending on the whole thing. Between that and -opaque, it might be possible to eliminate some unnecessary compilations. I haven't looked into it closely yet. Bazel also has tools to help with optimization (e.g. Build performance metrics), but I have not used them yet.
There are probably some things in the build rules themselves that can be optimized. I haven't worried about that yet; the first priorities are correctness, completeness, and user experience.
From a distance, this seems to be in the ballpark of what I would expect, but I don't know what your machine is like. You could compare for yourself by running make ocamlc after the same changes. Another interesting measure would be "building everything", which we often need to do to run the testsuite.
I'm not saying that build times should be a priority of your work, but I think that the documentation could benefit from some ballpark performance numbers, at least to reassure people that build times are not strongly degraded by the new build system. ("Building everything from scratch" and "incrementally recompiling ocamlc.byte after a type-checker change" would perhaps be two informative measures.)
Oops, forgot to mention: MacBook Pro, 2.4 GHz 8-core Intel i9, 64 GB memory, macOS Ventura 13.0.1.
Yeah, good point; people will want to see that regardless of my priorities. I'll set aside some time this weekend to address this. I can say something about Bazel features involving performance even if I don't have benchmark numbers.
Good thing you asked. I discovered a bug in my dependency logic that was causing a bunch of unnecessary recompiles, which made things way slower than the Makefile builds. Now they're comparable. Adding a comment to typing/btype.ml after a full compile (three rounds each):
Bazel (--//config/stage=0 means only build the first stage):
bazel build bin:ocamlcc --config=ocamlc.byte --//config/stage=0
0.02s user 0.03s system 4% cpu 1.322 total
bazel build bin:ocamlcc --config=ocamlc.byte --//config/stage=0
0.02s user 0.03s system 3% cpu 1.293 total
bazel build bin:ocamlcc --config=ocamlc.byte --//config/stage=0
0.02s user 0.02s system 3% cpu 1.247 total
Makefiles:
make ocamlc 0.78s user 0.32s system 70% cpu 1.541 total
make ocamlc 0.77s user 0.23s system 97% cpu 1.021 total
make ocamlc 0.80s user 0.23s system 97% cpu 1.061 total
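For an at-a-glance comparison, the wall-clock ("total") times of the three rounds above can be averaged. A minimal awk-based sketch (the mean helper is only for illustration):

```sh
# Arithmetic mean of the numbers on stdin, one per line, to 3 decimals.
mean() { awk '{ s += $1; n++ } END { printf "%.3f\n", s / n }'; }

printf '1.322\n1.293\n1.247\n' | mean   # Bazel, stage 0: prints 1.287
printf '1.541\n1.021\n1.061\n' | mean   # make ocamlc:    prints 1.208
```

That is roughly 1.29 s for Bazel versus 1.21 s for make, i.e. comparable, with the first make round (1.541 s) pulling its average up.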
For the record, I find your reporting format a bit hard to read:
bazel build bin:ocamlcc --config=ocamlc.byte --//config/stage=0
0.02s user 0.03s system 4% cpu 1.322 total
I'm not sure what "user", "system" and "total" mean here. I would guess that the command ran in about 0.05s (0.02s spent in userland, 0.03s in system routines), consuming 4% of a CPU core, but took 1.322 seconds of compute time in total. This suggests a massive amount of parallelism (0.05s of "real time", the time a human user has to wait at the prompt, for 1.3s of compute time summed over all CPU cores involved; that's 26 cores assuming perfect speedup), but I think it is more likely that bazel uses detached processes in a way that confuses your system time accounting.
On the other hand, the make times are easier for me to interpret:
make ocamlc 0.78s user 0.32s system 70% cpu 1.541 total
This suggests 1.54s of compute time for 0.78s + 0.32s = 1.1s of "real" time, so a 1.4x parallelism speedup. (But maybe my guess on what total means is wrong.)
I ran time under zsh for both, so the output fields have the same meanings. I believe "total" means wall-clock time. The bazel command says to build ocamlc.byte, but only stage 0 (the first build, using boot/ocamlc).
Below are the numbers using /usr/bin/time instead (on a Mac) with -l (print rusage), -h (human-readable output), and -p (one line per measurement), after building and then adding a comment to typing/btype.ml. My understanding is that "user" and "sys" refer to CPU time, the former for user mode (libs), the latter for time spent in the kernel, and "real" is wall-clock time.
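Since "total" is wall-clock time and user + system is CPU time charged to the timed command (and the child processes it waited for), their ratio is the CPU utilization that zsh prints as "NN% cpu". A sketch of that arithmetic applied to the lines above (the function name is hypothetical):

```sh
# Summarize a zsh `time` line such as
#   make ocamlc 0.78s user 0.32s system 70% cpu 1.541 total
# "total" is wall-clock time; user + system is CPU time charged to the timed
# command and the children it waited for. Their ratio is the utilization
# that zsh reports as "NN% cpu".
zsh_time_summary() {
  awk '{
    u = 0; s = 0; t = 0
    for (i = 2; i <= NF; i++) {
      if ($i == "user")   { u = $(i-1); sub(/s$/, "", u) }
      if ($i == "system") { s = $(i-1); sub(/s$/, "", s) }
      if ($i == "total") {
        t = $(i-1)
        # long runs are printed as m:ss.ss, e.g. 1:41.85
        if (index(t, ":") > 0) { split(t, p, ":"); t = p[1] * 60 + p[2] }
        else t = t + 0
      }
    }
    if (t > 0)
      printf "wall %.3fs, cpu %.2fs, utilization %.0f%%\n", t, u + s, 100 * (u + s) / t
  }'
}

echo 'make ocamlc 0.78s user 0.32s system 70% cpu 1.541 total' | zsh_time_summary
# prints: wall 1.541s, cpu 1.10s, utilization 71%
```

For the make line this gives about 71% (zsh printed 70%; the displayed times are already rounded), and for the stage-0 Bazel line about 4%. The Bazel figure is that low presumably because the bazel command is only a client: the compile actions are run by the long-lived Bazel server process, which is not a child of the timed command, so its CPU time is not counted.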
$ /usr/bin/time -l -h -p bazel build bin:ocamlcc --config=ocamlc.byte --//config/stage=0
INFO: Analyzed target //bin:ocamlcc (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //bin:ocamlcc up-to-date:
bazel-bin/bin/_boot/ocamlcc
INFO: Elapsed time: 1.194s, Critical Path: 1.07s
INFO: 6 processes: 2 internal, 4 darwin-sandbox.
INFO: Build completed successfully, 6 total actions
real 1.22
user 0.02
sys 0.02
7147520 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
3605 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
9 messages sent
26 messages received
1 signals received
4 voluntary context switches
278 involuntary context switches
25046991 instructions retired
32757699 cycles elapsed
1982464 peak memory footprint
and with makefiles:
$ /usr/bin/time -l -h -p make ocamlc
... echoed cmds omitted ...
real 1.03
user 0.75
sys 0.22
153260032 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
76009 page reclaims
62 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
5 signals received
208 voluntary context switches
276 involuntary context switches
480335554 instructions retired
298145387 cycles elapsed
3883008 peak memory footprint
Pretty huge difference for user and sys times. The difference in real time is no doubt due in part to Bazel overhead - it runs in the JVM as a service.
I tried running a clean build but got an error:
$ make clean
$ /usr/bin/time -l -h -p make ocamlc
... elided ...
./runtime/ocamlrun tools/make_opcodes -opcodes < runtime/caml/instruct.h > bytecomp/opcodes.ml
/bin/sh: ./runtime/ocamlrun: No such file or directory
make: *** [bytecomp/opcodes.ml] Error 127
real 73.94
user 37.38
sys 5.03
So I ran "make world", but I don't know how to then force a complete rebuild of ocamlc. BTW, there is no analogous target under Bazel; you can't build everything at once. That's a future enhancement.
Under bazel:
$ bazel clean
$ /usr/bin/time -l -h -p bazel build bin:ocamlcc --config=ocamlc.byte --//config/stage=0
INFO: Analyzed target //bin:ocamlcc (82 packages loaded, 1559 targets configured).
INFO: Found 1 target...
Target //bin:ocamlcc up-to-date:
bazel-bin/bin/_boot/ocamlcc
INFO: Elapsed time: 46.831s, Critical Path: 28.38s
INFO: 989 processes: 463 internal, 526 darwin-sandbox.
INFO: Build completed successfully, 989 total actions
real 46.87
user 0.03
sys 0.03
7380992 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
3680 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
39 messages sent
197 messages received
1 signals received
4 voluntary context switches
660 involuntary context switches
25256424 instructions retired
29407371 cycles elapsed
2060288 peak memory footprint
I'm not sure what to make of those user and sys numbers; they're almost the same for a clean build as for a build that compiles only one module. I ran it again with verbosity on, to make sure everything was in fact being built, and got similar numbers:
real 33.12
user 0.04
sys 0.06
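Because the CPU time of Bazel's actions is apparently not charged to the bazel client, Bazel's own end-of-build INFO lines are a more useful record: wall-clock elapsed time, critical path, and the number of actions actually run (989 for the clean build versus 6 for the one-module rebuild above, which also confirms that the clean build really did rebuild everything). A sketch that extracts them; the helper is hypothetical and relies only on the INFO line formats shown above:

```sh
# Summarize the INFO lines Bazel prints at the end of a build, e.g.
#   INFO: Elapsed time: 46.831s, Critical Path: 28.38s
#   INFO: Build completed successfully, 989 total actions
# Hypothetical helper; it only relies on the line formats shown above.
bazel_summary() {
  awk '
    /Elapsed time:/ {
      for (i = 1; i < NF; i++) {
        if ($i == "time:") { e = $(i+1); sub(/s,?$/, "", e) }
        if ($i == "Path:") { c = $(i+1); sub(/s$/, "", c) }
      }
    }
    /total actions/ {
      for (i = 2; i <= NF; i++) if ($i == "total") n = $(i-1)
    }
    END { printf "elapsed %ss, critical path %ss, %s actions\n", e, c, n }
  '
}
```

Bazel writes these INFO lines to stderr, so merge it into the pipe, e.g. bazel build bin:ocamlcc --config=ocamlc.byte --//config/stage=0 2>&1 | bazel_summary.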
P.S.: The Makefiles apparently always build all of the runtime variants (ocamlrund, ocamlruni, etc.). Bazel only builds what you ask for. If you want a debug runtime, add --//runtime:DEBUG; if you also want a "debugger" build (with -g), add --//runtime=dbg (note: "=" not ":"). For an instrumented runtime: --//runtime:INSTRUMENT.
So to compare build times for the runtime, I ran "make world", then added a comment to runtime/intern.c, then:
$ /usr/bin/time -h make runtime -j
...
cd stdlib; ln -sf ../runtime/libcamlrun.a .
0.83s real 1.29s user 0.54s sys
(By my count this cmd builds at least five libs: libcamlruni.a, libcamlrun.a, libcamlrun_pic.a, libcamlrun_shared.so, and libcamlrund.a, and three executables: ocamlruni, ocamlrun, ocamlrund.)