Dear Ivan, Xavier, Yotam,
thank you very much for your answers, help is really appreciated!
I think I have a better understanding now on what the situation is regarding process interruption in different platforms.
Maybe I can add a bit more the concrete use case I have.
We are developing a user interface for “reactive” proof checking. In such model, the IDE forwards changes to a daemon which will then update the document and initiate checking of proofs.
Such checking is by nature speculative; when the user updates the document, ongoing proof checking has to be cancelled and new one has to be launched, in an incremental fashion.
In this case, the controller is a “Language Server Protocol” daemon for Coq. The controller never does any expensive computation and it is basically a polling loop, so it works fine.
On the worker end, we have full-fledged Coq processes, and they have a pretty fat state and like to do long, costly computations.
The main example of reactive proof checking is Isabelle/PIDE, which TTBOMK is implemented over a preemptive threading model. In this setting, speculative execution is carried out by futures, however these are cheap to cancel.
Indeed I think I’ve heard the PIDE developers say that it took a long time to fine-tune the runtime so thread cancellation worked properly among platforms.
Going back to the Coq case, it seems to me indeed that we are facing a hard problem, as these kind of reactive UI interfaces seem to be one of the few use cases where a multi-threaded implementation may work better than a message-passing model.
Even so, my guess is that at least for Linux, using signals to interrupt should work well enough: it is unlikely that a Coq process is stuck on a true non-allocating loop. But of course, as we are bootstrapping a new language server, we’d like to provide a good Windows experience as much as possible.
Unfortunately adding polling to Coq is not a good option; in fact, there is already a “polling” check in Coq (
check_for_interrupt), and even if it is just a few instructions, it has an observable impact on performance; having to do a system call would likely be bad given the nature of Coq hot paths such as reduction. It also wouldn’t work in all cases, as not every hot path has been instrumented to call the poll function.
I’d guess that the way to proceed would be to use some heuristics so users have to opt-in for expensive computation to take place [at the risk of having to use the
KILL hammer) and indeed, try to thin state and process startup time. That should provide a decent experience on Windows for a wide range of users.
With all of this into account, IMHO the following questions do remain:
are we missing some better possibility than what is described in the summary above [killing processes on windows, trying to signal, if still stuck kill on Unix] ?
is there some chance we could improve the Windows behavior in this aspect? Is there some existing Mantis issue / roadmap on this aspect?
We would be happy to add some platform-specific code to enable a better interruption model on Windows, would this be doable in some principled way?
We could even contribute some manpower to help with that if needed.