Aha. I needed to take a little more care with where I placed the abstractions for continuations.
Erasing the post, because now it all works as desired.
Restoring the deleted post:
I was recently chewing over monadic concurrency, and got to wondering what the performance difference is between writing in direct style and writing in the -simplest- kind of monadic style I could imagine: just continuations in tail position.
So I wrote a benchmark: https://github.com/chetmurthy/event/blob/master/tests/test_ab.ml
It’s a simple test with two modes: “direct” and “kont”. In both modes, it loops through a buffer, fetching a character and bitwise-or-ing it into an accumulator (to simulate -using- the fetched byte, and hopefully to force the optimizer and hardware to wait for that fetch). I’m using the 4.11.1+flambda compiler with “-O3 -unbox-closures”, so hopefully I’m getting excellent inlining and allocation-avoidance.
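For readers who don't want to click through, here is a minimal sketch of what the two modes look like in spirit. This is not the code from test_ab.ml: the names `direct_or`, `fetch_k`, and `kont_or` are made up for illustration, and the real benchmark may structure its buffer and loop differently.

```ocaml
(* Hypothetical sketch; names are illustrative and not taken from test_ab.ml. *)

(* Direct style: a plain loop that ORs each fetched byte into an accumulator. *)
let direct_or (buf : Bytes.t) : int =
  let acc = ref 0 in
  for i = 0 to Bytes.length buf - 1 do
    acc := !acc lor Char.code (Bytes.get buf i)
  done;
  !acc

(* CPS fetch: instead of returning the byte, hand it to the continuation [k],
   which is called in tail position. *)
let fetch_k (buf : Bytes.t) (i : int) (k : char -> 'a) : 'a =
  k (Bytes.get buf i)

(* "kont" style: the rest of the loop is passed as a continuation to each fetch. *)
let kont_or (buf : Bytes.t) : int =
  let n = Bytes.length buf in
  let rec loop i acc =
    if i >= n then acc
    else fetch_k buf i (fun c -> loop (i + 1) (acc lor Char.code c))
  in
  loop 0 0

let () =
  let buf = Bytes.init 1024 (fun i -> Char.chr (i land 0x7f)) in
  (* Both styles must compute the same accumulator. *)
  assert (direct_or buf = kont_or buf);
  Printf.printf "direct=%d kont=%d\n" (direct_or buf) (kont_or buf)
```

In the CPS version, the `fun c -> ...` closure captures `i` and `acc`, so unless the compiler manages to inline `fetch_k` and the continuation, each iteration may allocate a closure and make an indirect call. That is one plausible place for a slowdown to come from in a loop this small.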
And yet, direct-style is 5x faster than CPS on this (admittedly micro)benchmark. I wonder why … It’s been decades since I went down into the assembler … does anybody have any ideas on what might be going wrong here?
./test_ab.opt direct 1024000
258MiB/sec: 1024000 read in 0.003792 secs

./test_ab.opt kont 1024000
50MiB/sec: 1024000 read in 0.019679 secs
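As a sanity check on those figures, here is a small sketch of the throughput arithmetic. I'm assuming the benchmark reports bytes divided by elapsed seconds, in units of 2^20 bytes, rounded to the nearest integer; `mib_per_sec` is a hypothetical helper, not a function from the repo.

```ocaml
(* Hypothetical helper: throughput in MiB/sec, assuming 1 MiB = 1048576 bytes. *)
let mib_per_sec (bytes : int) (secs : float) : float =
  float_of_int bytes /. secs /. 1048576.0

let () =
  let direct = mib_per_sec 1024000 0.003792 in
  let kont = mib_per_sec 1024000 0.019679 in
  (* Print the two rates and the ratio between the elapsed times. *)
  Printf.printf "direct: %.2f MiB/sec\n" direct;
  Printf.printf "kont:   %.2f MiB/sec\n" kont;
  Printf.printf "ratio:  %.2fx\n" (0.019679 /. 0.003792)
```

Under that assumption the numbers round to the reported 258 and 50 MiB/sec, and the ratio of elapsed times comes out a little over 5x.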
P.S. I guess, for such a simple benchmark, I was hoping that the compiler could inline enough to discover the loop. Ah, well.