Hi! I’ve been doing profiling with landmarks and it has worked fairly well, but my current codebase uses a particular monadic style that makes profiling quite difficult, and I wanted to gather ideas or information on how to make this easier.
Basically, most of the execution is spent in an 'a Iter.t monad (i.e. ('a -> unit) -> unit), to model symbolic execution and non-determinism.
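For readers unfamiliar with this style, here is a minimal sketch of the shape of such a monad. It is a toy stand-in rather than our actual code or the Iter library itself: the names return, branch and to_list are made up for illustration, and only the type and the argument order of flat_map are meant to mirror Iter.

```ocaml
(* Toy stand-in for the Iter monad: a computation is a function that
   calls its continuation once per result. *)
type 'a t = ('a -> unit) -> unit

(* A computation with exactly one result. *)
let return (x : 'a) : 'a t = fun k -> k x

(* Hypothetical non-deterministic choice: every alternative is fed
   to the continuation in turn. *)
let branch (xs : 'a list) : 'a t = fun k -> List.iter k xs

(* Sequencing: run [m], and for each of its results run [f x]
   with the same final continuation. *)
let flat_map (f : 'a -> 'b t) (m : 'a t) : 'b t =
 fun k -> m (fun x -> f x k)

(* Run a computation and collect every result, in order. *)
let to_list (m : 'a t) : 'a list =
  let acc = ref [] in
  m (fun x -> acc := x :: !acc);
  List.rev !acc

(* Two branches, each of which branches again: four results. *)
let results =
  to_list (flat_map (fun x -> branch [ x; x * 10 ]) (branch [ 1; 2 ]))
```

Every step is a small closure taking a continuation, which is where the many tiny functions in our code come from.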
Furthermore, we make good use of wrapper functions, e.g. this simple wrapper for an effect for global state:
    let with_crate (crate : t) f =
      let open Effect.Deep in
      try f () with effect Get_crate, k -> continue k crate
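In case anyone wants to reproduce this, here is a self-contained sketch of the same wrapper. The crate record, the Get_crate declaration and the get_crate helper are simplified placeholders I am assuming for the example, and it uses an explicit Effect.Deep.try_with handler instead of the effect pattern syntax, so it should also compile on OCaml 5 versions that lack that syntax.

```ocaml
(* Placeholder crate type; the real one is much richer. *)
type crate = { name : string }

(* Assumed declaration of the effect used for global state. *)
type _ Effect.t += Get_crate : crate Effect.t

(* Hypothetical accessor: asks the enclosing handler for the crate. *)
let get_crate () : crate = Effect.perform Get_crate

(* Runs [f ()] with [crate] installed as the answer to Get_crate.
   Note that this adds one more frame around every call to [f]. *)
let with_crate (crate : crate) (f : unit -> 'a) : 'a =
  let open Effect.Deep in
  try_with f ()
    {
      effc =
        (fun (type b) (eff : b Effect.t) ->
          match eff with
          | Get_crate ->
              Some (fun (k : (b, _) continuation) -> continue k crate)
          | _ -> None);
    }

let name = with_crate { name = "demo" } (fun () -> (get_crate ()).name)
```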
The combination of these two things makes profiling quite impractical:
- Because of the wrapper functions, my entire profile is a dozen calls deep, making it hard to read (this is a minor problem). I was wondering if there are alternatives to these wrapper functions, or an option when profiling to omit these frames.
- Because our monad uses smaller functions very heavily, and we make extensive use of let-binds to compose computation, the main monadic functions do not show up at all in the profile. For instance, I have an exec_stmt : stmt -> unit Iter.t function that, despite being called hundreds of thousands of times, does not appear.
I suspect this is because of the let-binds, or possibly because it is immediately wrapped in a function too:
    and exec_stmt stmt : unit t =
      let@ () = with_loc ~loc:stmt.span in
      match stmt.kind with
      | Nop -> ok ()
      | Assign (({ ty; _ } as place), rval) ->
          let* ptr = resolve_place place in
          let* v = eval_rvalue rval in
          State.store ptr ty v
      | ...
This is a small snippet of that function. Note we define let@ as ( @@ ), so this is effectively doing:
    with_loc ~loc:stmt.span @@ fun () ->
    match stmt.kind with
    | ...
And similarly, let* is basically Iter.flat_map, so that’s just an additional closure.
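To make the desugaring concrete, here is a small runnable model of it. Everything in it is an assumption for illustration: the monad is a toy version of Iter, with_loc is a hypothetical wrapper that only records the location, and sugared/desugared stand in for a function like exec_stmt.

```ocaml
type 'a t = ('a -> unit) -> unit

let ok (x : 'a) : 'a t = fun k -> k x

let flat_map (f : 'a -> 'b t) (m : 'a t) : 'b t =
 fun k -> m (fun x -> f x k)

let ( let* ) m f = flat_map f m
let ( let@ ) = ( @@ )

(* Hypothetical with_loc: records the location once the computation
   is actually run, then runs the wrapped body. *)
let trace : string list ref = ref []

let with_loc ~(loc : string) (f : unit -> 'a t) : 'a t =
 fun k ->
  trace := loc :: !trace;
  f () k

(* Written the way our monadic functions are written. *)
let sugared (x : int) : int t =
  let@ () = with_loc ~loc:"stmt" in
  let* a = ok (x + 1) in
  let* b = ok (a * 2) in
  ok (a + b)

(* The same function with the binding operators expanded by hand. *)
let desugared (x : int) : int t =
  with_loc ~loc:"stmt" @@ fun () ->
  flat_map (fun a -> flat_map (fun b -> ok (a + b)) (ok (a * 2))) (ok (x + 1))

let run (m : 'a t) : 'a list =
  let acc = ref [] in
  m (fun x -> acc := x :: !acc);
  List.rev !acc

(* In this toy model, calling the function only builds a closure;
   the body runs later, when a continuation is supplied. *)
let pending = sugared 3
let ran_before = List.length !trace
let r1 = run pending
let r2 = run (desugared 3)
```

In this model the call to sugared returns before any of its body has run, and the actual work happens inside anonymous closures. Whether that is what hides exec_stmt in the real profile is exactly what I am unsure about.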
If you have any recommendations on how profiling can be made easier with our current implementation, please advise! I am very happy to add ppx annotations or make small modifications that would make profiling more effective (as long as performance isn’t negatively affected).
I care a lot about performance and this project has been quite hard to benchmark :P
Thanks!
