After finding _caml_apply2
high in the perf profile where I don’t expect while tuning some code, I have not been able to figure out how to convince ocamlopt/flambda to inline and resolve functions as I would like.
It seems that when a functor is passed a recursive module as argument, even if the functor gets inlined, calls to the functions in the argument module are still not resolved, and get called using e.g. _caml_apply2
. Is there any way around that (well, other than rewriting the arguments to be non-recursive modules)?
As a concrete example, consider:
❯ cat -n rec_mod.ml
1 module Key : sig
2 type t
3
4 val compare : t -> t -> int
5 val of_string : string -> t
6 end = struct
7 include String
8
9 let of_string s = s
10 end
11
12 module rec Key_rec : sig
13 type t
14
15 val compare : t -> t -> int
16 val of_string : string -> t
17 end = struct
18 include String
19
20 let of_string s = s
21 end
22
23 module KeyMap = Map.Make (Key)
24 module KeyMap_rec = Map.Make (Key_rec)
25
26 let key_test () = Key.of_string "one"
27 let key_test_rec () = Key_rec.of_string "one"
28
29 let test () =
30 let open KeyMap in
31 let k = Key.of_string "one" in
32 find k (add k 1 empty)
33
34 let test_rec () =
35 let open KeyMap_rec in
36 let k = Key_rec.of_string "one" in
37 find k (add k 1 empty)
After compiling with:
❯ ocamlopt -g -O3 -S rec_mod.ml
we can see the difference between KeyMap
and KeyMap_rec
in the generated assembly code. For the non-recursive functor argument Key
, _caml_string_compare
is called directly:
_camlRec_mod__test_126:
.loc 5 29 9
.cfi_startproc
subq $8, %rsp
.cfi_adjust_cfa_offset 8
L668:
movq _camlRec_mod__const_immstring_156@GOTPCREL(%rip), %rsi
movq %rsi, %rdi
.loc 3 202 28
call _caml_string_compare
cmpq $1, %rax
jne L667
movl $3, %eax
addq $8, %rsp
.cfi_adjust_cfa_offset -8
ret
.cfi_adjust_cfa_offset 8
.align 2
L667:
cmpq $1, %rax
jge L666
movl $1, %ebx
jmp L665
.align 2
L666:
movl $1, %ebx
L665:
movq _camlRec_mod__const_immstring_156@GOTPCREL(%rip), %rax
.loc 1 140 15
addq $8, %rsp
.cfi_adjust_cfa_offset -8
jmp _camlRec_mod__find_397
.cfi_adjust_cfa_offset 8
.cfi_adjust_cfa_offset -8
.cfi_endproc
But for the recursive argument module Key_rec
, the call is indirect using GOTPCREL
and _caml_apply2
:
_camlRec_mod__test_rec_149:
.loc 5 34 13
.cfi_startproc
subq $8, %rsp
.cfi_adjust_cfa_offset 8
L675:
movq _camlRec_mod__Key_rec_223@GOTPCREL(%rip), %rax
movq (%rax), %rax
.loc 5 36 10
movq 8(%rax), %rbx
movq _camlRec_mod__const_immstring_156@GOTPCREL(%rip), %rax
.loc 5 36 10
movq (%rbx), %rdi
.loc 5 36 10
call *%rdi
L669:
movq %rax, (%rsp)
movq _camlRec_mod__Pmakeblock_arg_213@GOTPCREL(%rip), %rbx
movq (%rbx), %rdi
movq %rax, %rbx
.loc 1 138 18
call _caml_apply2
L670:
cmpq $1, %rax
jne L674
movl $3, %eax
addq $8, %rsp
.cfi_adjust_cfa_offset -8
ret
.cfi_adjust_cfa_offset 8
.align 2
L674:
cmpq $1, %rax
jge L673
movl $1, %ebx
jmp L672
.align 2
L673:
movl $1, %ebx
L672:
movq (%rsp), %rax
.loc 1 140 15
addq $8, %rsp
.cfi_adjust_cfa_offset -8
jmp _camlRec_mod__find_1037
.cfi_adjust_cfa_offset 8
.cfi_adjust_cfa_offset -8
.cfi_endproc
(Just for context, I wouldn’t be asking this is the profiler wasn’t telling me to look hard at code that is analogous to this.)
Tagging some flambda folks who might be interested: @mshinwell, @lpw25, @vlaviron.