Performance of the new effects in comparison to monads

My reading of the effects paper is that, in addition to improving programmability, effects should be strictly equal to or higher performance than the equivalent monadic code (modulo things like flambda), primarily because they can avoid allocating closures etc.

My microbenchmark is at: prototyping-testing/main.ml at 37f5d35ea6f644339bfb6426e47d6b4b1de0b93b · Cjen1/prototyping-testing · GitHub

My use case is a state machine for a distributed system where on each iteration it receives a list of incoming messages and emits a list of messages to send.

In the benchmark I am testing just the emitting of the messages to send, by looping to send all the 'messages' (integers here, from n → 0). There is also a (minimal?) variation which does the same using a reference.

I suspect that what is happening is a tradeoff between unwinding the stack (for the effect system) and the allocations (for the monad), but it is also highly likely that I have missed something obvious!

(Also, just to point out: the effects system not being quite as fast as, for example, the monadic approach matters little when you consider that you get a very good way of writing generic effectful code without having to deal with coloured functions.)

("Starting test" (n 10))
Estimated testing time 40s (4 benchmarks x 10s). Change using '-quota'.
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Name   β”‚ Time/Run β”‚ mWd/Run β”‚ Percentage β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ EffFun β”‚ 353.02ns β”‚ 225.00w β”‚    100.00% β”‚
β”‚ EffRef β”‚ 199.26ns β”‚ 178.01w β”‚     56.45% β”‚
β”‚ Monads β”‚  91.73ns β”‚ 234.00w β”‚     25.98% β”‚
β”‚ SimRef β”‚  34.06ns β”‚  36.00w β”‚      9.65% β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

("Starting test" (n 1000))
Estimated testing time 40s (4 benchmarks x 10s). Change using '-quota'.
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Name   β”‚ Time/Run β”‚ mWd/Run β”‚ mjWd/Run β”‚ Prom/Run β”‚ Percentage β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ EffFun β”‚  30.64us β”‚ 20.03kw β”‚  116.31w β”‚  116.31w β”‚    100.00% β”‚
β”‚ EffRef β”‚  16.37us β”‚ 15.03kw β”‚  173.63w β”‚  173.63w β”‚     53.43% β”‚
β”‚ Monads β”‚  10.10us β”‚ 22.01kw β”‚  126.35w β”‚  126.35w β”‚     32.97% β”‚
β”‚ SimRef β”‚   3.72us β”‚  3.01kw β”‚   34.39w β”‚   34.39w β”‚     12.13% β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

("Starting test" (n 10000))
Estimated testing time 40s (4 benchmarks x 10s). Change using '-quota'.
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Name   β”‚ Time/Run β”‚  mWd/Run β”‚ mjWd/Run β”‚ Prom/Run β”‚ Percentage β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ EffFun β”‚ 415.01us β”‚ 200.03kw β”‚  11.47kw β”‚  11.47kw β”‚    100.00% β”‚
β”‚ EffRef β”‚ 333.82us β”‚ 150.04kw β”‚  17.19kw β”‚  17.19kw β”‚     80.44% β”‚
β”‚ Monads β”‚ 230.45us β”‚ 220.02kw β”‚  12.59kw β”‚  12.59kw β”‚     55.53% β”‚
β”‚ SimRef β”‚  70.04us β”‚  30.01kw β”‚   3.43kw β”‚   3.43kw β”‚     16.88% β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

With effects I think you only need to pay the "cost" of the effect when the effect is actually triggered (e.g. on a syscall, although I assume for network syscalls you could also perform them optimistically and rely on EAGAIN to trigger the effect; I haven't checked how Eio is implemented).

With monads you have to pay that cost all the time, because each computation and function return type needs to be wrapped in the monad, which can result in the allocation of a lot of closures, especially when a value is waiting on the return of a syscall (some more short-lived than others; although Lwt has some optimizations for when the result is immediately known, I think it can still build up chains of deferred callbacks when the Lwt.t result is not immediately available).
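To make the closure-allocation point concrete, here is a minimal sketch (hypothetical toy code, not the benchmark's actual implementation; the names `M`, `Emit`, `monadic`, `effectful` are my own, and the Effect module requires OCaml 5.x): a writer-style monad allocates a closure and a pair on every bind, while the direct-style effect version only runs the handler when `Emit` is performed.

```ocaml
(* Monadic version: each [bind] wraps the continuation in a closure
   and allocates an intermediate result pair. *)
module M = struct
  type 'a t = 'a * int list            (* value plus emitted messages *)
  let return x = (x, [])
  let emit m = ((), [m])
  let bind (x, ms) f = let (y, ms') = f x in (y, ms @ ms')
end

let monadic n =
  let rec go i acc =
    if i = 0 then acc
    else go (i - 1) (M.bind acc (fun () -> M.emit i))
  in
  snd (go n (M.return ()))

(* Effect version: the emitting code is direct-style; the handler
   collects messages and resumes the continuation each time. *)
open Effect
open Effect.Deep

type _ Effect.t += Emit : int -> unit Effect.t

let effectful n =
  let msgs = ref [] in
  match_with
    (fun () -> for i = n downto 1 do perform (Emit i) done)
    ()
    { retc = (fun () -> List.rev !msgs);
      exnc = raise;
      effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Emit m -> Some (fun (k : (a, _) continuation) ->
            msgs := m :: !msgs; continue k ())
        | _ -> None) }

(* Both emit the integers n → 1. *)
let () = assert (monadic 3 = [3; 2; 1])
let () = assert (effectful 3 = monadic 3)
```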

So a more realistic comparison of concurrency monads vs effects would measure both paths:

  • immediate value propagation, where the monad, or the compiler via flambda, can optimize away some of the overhead

  • suspended/blocked values, which might trigger the construction of a lot of closures to be run when the value becomes available (probably best to measure the Lwt and Async implementations here directly, because they'll be a lot more complicated than simple ones, especially since they might also have to deal with cancellation, etc., and they may have some optimizations of their own, e.g. I think Lwt uses mutation internally to avoid some pathological cases). This depends on the depth of the call stack (so e.g. try 10-30 binds after the "sleeping" monad value), but if you happen to fold_left bind over a list the chains can be much, much longer.
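As an illustration of that suspended path (a hypothetical toy promise of my own, not Lwt's actual implementation): every bind on a still-pending value allocates a callback closure, so a fold_left of bind over a list roots a long chain of callbacks at the single pending promise, all of which must fire when it resolves.

```ocaml
(* Toy promise: a pending promise carries a list of callbacks. *)
type 'a state = Resolved of 'a | Pending of ('a -> unit) list ref
type 'a promise = 'a state ref

let pending () : 'a promise = ref (Pending (ref []))
let return x : 'a promise = ref (Resolved x)

let on_resolve p f =
  match !p with
  | Resolved x -> f x
  | Pending cbs -> cbs := f :: !cbs    (* allocate and store a callback *)

let resolve p x =
  match !p with
  | Resolved _ -> invalid_arg "already resolved"
  | Pending cbs ->
      p := Resolved x;
      List.iter (fun f -> f x) (List.rev !cbs)

(* [bind] on a pending promise allocates two closures per link. *)
let bind p f =
  let r = pending () in
  on_resolve p (fun x -> on_resolve (f x) (fun y -> resolve r y));
  r

(* fold_left bind over a list builds the whole chain up front, rooted
   at the one pending promise; resolving it runs every callback. *)
let demo () =
  let start = pending () in
  let chained =
    List.fold_left (fun acc i -> bind acc (fun x -> return (x + i)))
      start [1; 2; 3]
  in
  resolve start 0;
  match !chained with Resolved v -> v | Pending _ -> -1

let () = assert (demo () = 6)
```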

I think effects can avoid the latter problem altogether; see Reflection without Remorse for a description of the problem (it affects Haskell more because Haskell doesn't have mutation and has to resort to other data structures to recover the performance lost on long monad chains; I think Lwt achieves a similar optimization through the use of mutation, but a naively implemented monad would pay the full cost).


Sorry if this seems out of place, but what's this Caml.Effect module you open? I don't see it in the standard library, nor can I find any reference to it on the web.

They (probably) use the Jane Street libraries, which re-export the stdlib as Caml.

Yep using Core requires the reexport (afaik).

Granted, but I don't see an Effect module in the stdlib reference at https://v2.ocaml.org/manual/stdlib.html

Yep it gets added in OCaml 5.0.
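For anyone curious, here is a minimal use of the new stdlib Effect module (a sketch assuming OCaml 5.x; the `Get` effect and `run` function are made up for illustration — with Core open, the stdlib is reachable as Caml, hence Caml.Effect in the linked benchmark):

```ocaml
(* A "Get" effect whose handler answers every request with 21. *)
open Effect
open Effect.Deep

type _ Effect.t += Get : int Effect.t

let run () =
  try_with
    (fun () -> perform Get + perform Get)   (* direct-style client code *)
    ()
    { effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Get -> Some (fun (k : (a, _) continuation) -> continue k 21)
        | _ -> None) }

let () = assert (run () = 42)
```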


Oh sorry, I keep forgetting that 5.0 isn't the official release yet. :wink: (I'm still on 4.13.1 but I have my eye on 5.x for Eio & friends. Didn't think effects would be in that release; I thought they were for much later. I must be confusing "effects" vs "typed effects".)

Indeed, there hasn't been much (visibly) going on for effect typing in the compiler, but the runtime mechanism for effect handlers (delimited-control support, new garbage collector, etc.) has been in the works for the last few years and has successfully landed in 5.0.0.

Delimited Continuations vs Lwt for Threads | MirageOS seems like a relevant article.


That post is quite old and focuses on user-level delimited continuations from the (honestly excellent) delimcc library. Effects' delimited continuations are first-class in the runtime and limited to one-shot (linear) usage… I imagine that gives them different performance characteristics?

There was a blog post where the authors of angstrom observed performance improvements when they switched from Lwt to effects; let me see if I can find it…