This error arises via Lwt:
Thread 1 killed on uncaught exception Sys_error("Mutex.unlock: Operation not permitted")
Called from file "thread.ml", line 39, characters 8-14
tezos-node: internal error, uncaught exception:
(Sys_error "Mutex.lock: Resource deadlock avoided")
Raised at file "src/core/lwt.ml", line 2998, characters 20-29
Called from file "src/unix/lwt_main.ml", line 26, characters 8-18
Called from file "src/bin_node/node_snapshot_command.ml", line 124, characters 10-26
Called from file "cmdliner_term.ml", line 25, characters 19-24
Called from file "cmdliner.ml", line 26, characters 27-34
The high-level code in this example constrains the lwt version to < 4.3, so I don’t yet have a test against the latest lwt. This looks like it may be related to https://github.com/ocaml/ocaml/pull/9757/files
If anyone has a suggested workaround, or can help construct a simple example that triggers the error, I’d be very grateful.
This is indeed very relevant to the discussion at https://github.com/ocaml/ocaml/pull/9757.
What’s happening is that a thread is unlocking a mutex that was locked by another thread. The mutex abstraction does not allow this, but the OCaml documentation does not mention the restriction, and several mutex implementations actually support it, while others may report an error (as here) or silently crash.
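A minimal sketch of that situation, in case it helps with building a small reproduction. It only uses Mutex and Thread from the standard threads library; the function name unlock_from_other_thread is made up for illustration. Whether the unlock raises depends on the OCaml version and the platform's mutex implementation, so no particular output is claimed here.

```ocaml
(* Requires the threads library (e.g. the threads.posix findlib package). *)

(* Lock [m] in the calling thread, then try to unlock it from a second
   thread. Returns [Ok ()] if the runtime accepted the foreign unlock,
   or [Error msg] if it raised [Sys_error msg]. *)
let unlock_from_other_thread () =
  let m = Mutex.create () in
  Mutex.lock m;
  let result = ref (Ok ()) in
  let t =
    Thread.create
      (fun () ->
        try Mutex.unlock m with Sys_error msg -> result := Error msg)
      ()
  in
  Thread.join t;
  !result

let () =
  match unlock_from_other_thread () with
  | Ok () -> print_endline "unlock from another thread was accepted"
  | Error msg -> print_endline ("unlock from another thread failed: " ^ msg)
```

If the runtime uses error-checking mutexes, the Error branch should carry a message like the "Mutex.unlock: Operation not permitted" in the trace above; with other implementations the unlock may simply go through.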
What would be nice to know is whether this “unlock a mutex locked by another thread” is a bug in the libraries involved in your backtrace (Lwt, bin_node, cmdliner) or an essential part of their design. In the latter case we may need a Boolean semaphore abstraction in addition to the mutex abstraction.
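To make the idea concrete, here is a rough sketch of what a Boolean semaphore could look like when built from Mutex and Condition. The type bsem and the functions make, acquire and release are hypothetical names for illustration, not an existing API from this thread. The point is that release may be called from a different thread than the one that called acquire, because the internal mutex is always locked and unlocked within a single call.

```ocaml
(* Requires the threads library on OCaml versions where Mutex and
   Condition are not part of the standard library proper. *)

(* Hypothetical Boolean semaphore: [available] is true when it can be taken. *)
type bsem = {
  m : Mutex.t;              (* protects [available]; never held across calls *)
  c : Condition.t;          (* signalled when the semaphore becomes available *)
  mutable available : bool;
}

let make available =
  { m = Mutex.create (); c = Condition.create (); available }

(* Block until the semaphore is available, then take it. *)
let acquire s =
  Mutex.lock s.m;
  while not s.available do
    Condition.wait s.c s.m
  done;
  s.available <- false;
  Mutex.unlock s.m

(* Make the semaphore available again; any thread may call this. *)
let release s =
  Mutex.lock s.m;
  s.available <- true;
  Condition.signal s.c;
  Mutex.unlock s.m
```

Code that currently locks a mutex in one thread and unlocks it in another could, under this assumption, call acquire and release instead without tripping the ownership check.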
@web are you using the “async_switch” mode of Lwt?
I’m not sure how to tell, but searching for any code that sets async_method to Async_switch finds nothing.
@web You can check by calling this function from Lwt_unix:
val default_async_method : unit -> async_method
More details can be seen in the Lwt_unix docs
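For example, a small program along these lines should print the method in effect. This is only a sketch: it assumes the lwt.unix library is linked and that the async_method constructors are Async_none, Async_detach and Async_switch as listed in the Lwt_unix docs; the helper string_of_async_method is a made-up name. Newer Lwt releases may flag Async_switch as deprecated.

```ocaml
(* Requires the lwt.unix library. *)

(* Hypothetical helper: name each async method for printing. *)
let string_of_async_method = function
  | Lwt_unix.Async_none -> "Async_none"
  | Lwt_unix.Async_detach -> "Async_detach"
  | Lwt_unix.Async_switch -> "Async_switch"

let () =
  print_endline
    ("default async method: "
     ^ string_of_async_method (Lwt_unix.default_async_method ()))
```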