Dear community,
We recently ran ocamlformat on qubes-mirage-firewall, and it causes hash mismatches in our reproducible builds. To investigate further, I tried a simple OCaml hello-world program and ran the following:
$ ls hello.* && \
cat hello.ml && \
ocamlc -no-keep-docs -no-keep-locs hello.ml && \
sha256sum hello.* && \
rm hello.{cmi,cmo}
hello.ml
let () =
Printf.printf "I print a string %s\n" "hello world";
44d8263b230861cc88ce77b0e487f900a92e18d410117dd0391eb46a55f1133e hello.cmi
c6783064d7e3f93d094dc5f0e3f09cb68f5e0795d50861d4a20c4fac41dbda03 hello.cmo
f53eb6f88ca1cc3ca9cadb48ebb68c95259885db0dc6587ae01b9c3154fb12ac hello.ml
and then:
$ sed -i 's/Printf.printf/ Printf.printf/' hello.ml && \
ls hello.* && \
cat hello.ml && \
ocamlc -no-keep-docs -no-keep-locs hello.ml && \
sha256sum hello.* && \
rm hello.{cmi,cmo}
hello.ml
let () =
 Printf.printf "I print a string %s\n" "hello world";
44d8263b230861cc88ce77b0e487f900a92e18d410117dd0391eb46a55f1133e hello.cmi
0ac7fa1e82532d0ac9bae7970d3ccf64c36afaaed4c2af3a8813cdef166bddbd hello.cmo
5efdf847e0477a2b82b1e95b9fde66f85516b386c4125134e3065fee3a474c8e hello.ml
The first run is my baseline. In the second run I only add a space at the start of a line, which (as I understand it) shouldn't affect the compiled output. The hello.cmi hash is unchanged, but I don't understand why the hello.cmo file has a different hash.
Is there an option I've missed that would give me an identical output file, or is this unavoidable?
I wouldn't be so sure about that. The idea of a reproducible build is rather that the same source gives identical output, not that any change to the source leaves the output untouched. Source locations in stack traces come to mind.
You're right.
And indeed the two .cmo files differ by one byte. diffoscope shows me more differences, but I guess the .cmo embeds bytes derived from the source file.
I guess my answer is that it can't be avoided.
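To see exactly where two builds diverge without reaching for diffoscope, a plain byte-by-byte comparison is enough. The sketch below is a hypothetical helper (the file name cmpbytes.ml and the function names are made up), similar in spirit to cmp -l; it assumes OCaml 4.14 or later for the In_channel module.

```ocaml
(* Hypothetical helper: list the byte offsets at which two files differ,
   similar in spirit to `cmp -l`.
   Usage (assumed): ocaml cmpbytes.ml first.cmo second.cmo *)

(* Read a whole file as a string (In_channel exists since OCaml 4.14). *)
let read_file path =
  In_channel.with_open_bin path In_channel.input_all

(* Offsets, in increasing order, where the common prefix of [a] and [b]
   differs. A size difference is reported separately by the caller. *)
let diff_offsets a b =
  let common = min (String.length a) (String.length b) in
  let acc = ref [] in
  for i = common - 1 downto 0 do
    if a.[i] <> b.[i] then acc := i :: !acc
  done;
  !acc

let () =
  (* Only run the comparison when exactly two paths are given. *)
  if Array.length Sys.argv = 3 then begin
    let a = read_file Sys.argv.(1) and b = read_file Sys.argv.(2) in
    Printf.printf "sizes: %d vs %d bytes\n" (String.length a) (String.length b);
    List.iter
      (fun i ->
        Printf.printf "offset %d: 0x%02x vs 0x%02x\n" i
          (Char.code a.[i]) (Char.code b.[i]))
      (diff_offsets a b)
  end
```

For the experiment above, one would keep a copy of the first hello.cmo (instead of removing it) and pass both copies to the script.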
Since -g is not passed to the compiler, there shouldn't be any locations in the resulting bytecode. I don't think we want to give strong guarantees about whether locations are present without -g, but this looks surprising to me. By the way, the -no-keep-locs flag only affects .cmi files (you can combine it with -g without losing backtraces).
I understand that I won't be able to avoid this issue anyway, since the vendored code in my duniverse folder, which gets built into the unikernel, contains lots of __POS__ references.
For example, when I build the MirageOS hello world (mirage-skeleton/tutorial/hello at main · mirage/mirage-skeleton · GitHub) with OCaml 5.3, I hit the issue again:
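A minimal sketch of why __POS__ makes this unavoidable: __POS__ (like __LOC__, __LINE__ and the location carried by assert) is expanded at compile time into constants holding the file name, line and column, and those constants stay in the compiled code whether or not -g is passed. Shifting a line by a single space therefore changes a constant in the output. The names pos_a, pos_b, start_col and assert_location are made up for illustration. This does not by itself account for the plain hello.ml case above, which uses none of these constructs.

```ocaml
(* __POS__ has type string * int * int * int:
   (file, line, start column in the line, end column in the line).
   It is a compile-time constant, independent of -g. *)

(* Same expression twice; the second binding has one extra space before
   its name, so its __POS__ starts one column further right. *)
let pos_a = __POS__
let  pos_b = __POS__

let start_col (_file, _line, col, _end_col) = col

(* [assert false] raises Assert_failure (file, line, column), built from
   the same kind of location constant. *)
let assert_location () =
  try assert false with Assert_failure loc -> loc

let () =
  let file, line, col, _ = pos_a in
  Printf.printf "pos_a: %s line %d col %d\n" file line col;
  Printf.printf "pos_b: col %d\n" (start_col pos_b);
  let file, line, col = assert_location () in
  Printf.printf "assert false: %s line %d col %d\n" file line col
```

Running this shows pos_b's column one higher than pos_a's, which is exactly the kind of byte-level change a whitespace-only edit introduces.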
$ opam --version
2.3.0
$ mirage --version
v4.9.0
$ ocamlc --version
5.3.0
$ opam pin
$ mirage clean && rm -rf duniverse && mirage configure -t spt && make depend
$ dune build && sha256sum dist/hello.spt
0bb1392c30a52e3990213b81eff5e4b2f1b9520d64273c27b2edcf1c8ab1f1e9 dist/hello.spt
$ <modify unikernel.ml by adding a unique space>
$ git diff
diff --git a/tutorial/hello/unikernel.ml b/tutorial/hello/unikernel.ml
index 33b3ccf..d35cbac 100644
--- a/tutorial/hello/unikernel.ml
+++ b/tutorial/hello/unikernel.ml
@@ -2,7 +2,7 @@ open Lwt.Infix
let start () =
let rec loop = function
- | 0 -> Lwt.return_unit
+ | 0 -> Lwt.return_unit
| n ->
Logs.info (fun f -> f "hello");
Mirage_sleep.ns (Duration.of_sec 1) >>= fun () ->
$ dune build && sha256sum dist/hello.spt
fbbcbd45eed68dff2f015134823cfcb3ae648f3f15bb29df77b92a50b774da7d dist/hello.spt