Causes of long futex calls using Async

I’m currently trying to debug a performance problem. The setup is several nodes talking to each other via Async RPCs, plus a single node that acts as a gateway to clients.

When the gateway doesn’t communicate with any other nodes, performance is in the ~100k ops/s range. However, as soon as the gateway makes another RPC call to a second node before replying to the client, performance drops to ~400 ops/s.

Checking strace for long syscalls, I’m seeing futex calls that apparently take ~1s to complete. Is this normal? I’d presume this is caused by contention, probably over the Async lock, but I can’t see why there should be that much contention.
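For anyone reproducing this, here is a sketch of how to capture exactly those slow futex calls (the `GATEWAY_PID` variable is hypothetical; substitute the pid of the slow process). `-f` follows all threads, and `-T` appends the time spent inside each syscall as `<seconds>`:

```shell
# Attach to the running gateway and trace only futex syscalls,
# timing each one. The awk filter keeps calls that took >= 1 second,
# using the <seconds> suffix that -T appends to each line.
strace -f -T -e trace=futex -p "$GATEWAY_PID" 2>&1 |
  awk -F'[<>]' '$2 >= 1'
```

Lines that survive the filter are the ~1s futex waits; the futex address in each line tells you whether it is always the same lock being contended.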

I wouldn’t normally be so quick to blame the environment, but there’s at least one condition-signalling bug in the latest shipping glibc that went undiscovered for about four years.

If you are on glibc, a quick way to verify is to try Ubuntu 16.04, which predates the major rework of the pthread signalling code, and see if your performance improves.
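A quicker first check than installing a different distro is to see which glibc you are actually running. If I recall correctly, the reworked condition-variable implementation landed in glibc 2.25, so Ubuntu 16.04 (glibc 2.23) predates it while Ubuntu 18.04 (glibc 2.27) includes it:

```shell
# Report the glibc version in use on this machine.
ver=$(getconf GNU_LIBC_VERSION | awk '{print $2}')
echo "glibc $ver"
```

If this prints 2.25 or later, the new condvar code is in play and the downgrade test is worth running.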

EDIT: actually, you should go ahead and mention your platform details (including CPU make/model). There’s a surprisingly large number of defects that affect mutexes.


Thanks! I’ll check that tomorrow! I think it’s running Ubuntu 18.04, so that may be the cause.

Thanks for the pointer! Unfortunately, having just rebuilt on Ubuntu 16.04, I see no change, which means something else is the problem. Thanks regardless!


That throughput seems kind of slow even in the fast case. What CPU make/model? You can get it with lscpu | grep 'Model name'

Also, try attaching to the slow running process with gdb -p $pid and running thread apply all bt; that prints the call stacks of all threads. Feel free to paste the output here.
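For a non-interactive capture, the same thing can be scripted in one line ($pid is hypothetical; `-batch` makes gdb run the given commands and exit rather than dropping into a prompt):

```shell
# Attach, dump every thread's backtrace to a file, then detach.
gdb -p "$pid" -batch -ex 'thread apply all bt' > stacks.txt
```

Running it a few times while the process is slow works as a poor man's profiler: whichever frames keep showing up (e.g. a futex wait under a lock acquire) are where the time is going.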