Cancelling a CPU bound operation

orbitz · March 6, 2025, 12:09pm

Hello, I’m implementing a multi-tenant service in OCaml and I’m trying to make some QoS guarantees.

One issue is that I have a synchronous function which is executed using input from the user, and that function is complicated enough that it’s difficult to determine exactly how expensive a given call will be. For 99.9% of inputs the call is essentially free, but some user inputs can make it extremely expensive. There are some heuristics I could implement, but they all depend on what kind of hardware the function is being executed on and are all sort of hand-wavy.

I’d like to ensure that the function execution will be cancelled if it does not complete within a specified amount of time. It is, of course, important that the program is in a well-defined state after this.

I’m already offloading the function call to another domain so it does not block the main domain, however that is insufficient because the domain pool can be saturated quite cheaply.

I see that there is a GitHub issue for being able to asynchronously cancel a domain[0] but it is still open.

Any recommendations here?

So far my best idea is to toss the function into another CLI and spawn it; however, that is a pretty expensive solution, both in terms of run-time in the average case and code to implement. Another option is that…old school CGI-like architecture makes a lot more sense in this context, and then offloading the whole request handling to a separate process.

[0] Support for interrupting another domain · Issue #11411 · ocaml/ocaml · GitHub

jfeser · March 6, 2025, 3:03pm

This library looks interesting: Memprof_limits (memprof-limits.Memprof_limits). I haven’t tried it myself, however.

I think the most robust solution is to modify the function to check a flag, which allows you to ensure that it always stops somewhere reasonable and does appropriate cleanup. But that does require some code changes.
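To make the flag idea concrete, here is a minimal sketch using only the standard library (`Atomic`, so OCaml 4.12 or later). The names `token`, `check`, `expensive` and `run_cancellable` are made up for illustration, and `expensive` is just a stand-in loop, not the function from the original question.

```ocaml
exception Cancelled

(* A cancellation token: one atomic flag shared between the worker and
   whoever supervises it (for example a timer on the main domain). *)
type token = bool Atomic.t

let create_token () : token = Atomic.make false
let cancel (t : token) : unit = Atomic.set t true

(* Called by the worker at points where it is safe to stop. *)
let check (t : token) : unit = if Atomic.get t then raise Cancelled

(* Hypothetical stand-in for the expensive function. It polls the token
   every 4096 iterations so the check stays cheap. *)
let expensive (t : token) (n : int) : int =
  let acc = ref 0 in
  for i = 1 to n do
    if i land 0xFFF = 0 then check t;
    acc := !acc + (i mod 7)
  done;
  !acc

(* Turn the exception into a result at the boundary. *)
let run_cancellable (t : token) (n : int) =
  match expensive t n with
  | v -> Ok v
  | exception Cancelled -> Error `Cancelled
```

The supervisor calls `cancel` once the deadline has passed, and the worker stops at its next `check`. The worst-case delay is therefore bounded by the longest stretch of code without a check, which is what makes this approach predictable.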

Chet_Murthy · March 6, 2025, 3:03pm

Long ago people tried to do this in Java, and eventually after many tears were shed, the answer was that “you can’t get there from here.” Asynchronously cancelling a systhread (that hasn’t cooperatively established at what points it is willing to be canceled, so it can clean up) is always liable to leave things in a mess. Long ago people tried to do multi-tenancy in Java, and eventually came around to one of two solutions:

(a) invent your own language that will insert the correct “savepoints” into generated code so that some external actor can cancel threads by instructing the errant thread to terminate at one of those savepoints (it’s been a long time, but IIRC Salesforce Apex did that)

(b) use multiple processes/address-spaces.

Or as I once put it, “there is only one application lifecycle: it starts with fork/exec, and it ends with exit”.

P.S. One thing people sometimes do is to keep a pool of processes hanging around, to dispatch work to, so they can avoid process startup overhead and amortize that overhead across many invocations (in the expected case, that these invocations don’t spin off into an infinite loop). Since this is a GCed pointer-safe language, you can at least have reasonable assurance that previous dispatching of work to a process didn’t leave around turds for the next invocation to trip over.
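For option (b), a rough sketch of the simplest variant (one forked child per call, no pool) might look like the following. It assumes a Unix system, the `unix` library and OCaml 4.14 or later for `In_channel`. `run_in_child` is a hypothetical helper, not a library function, and it only handles string results. One caveat as far as I know: in OCaml 5, `Unix.fork` fails once other domains have been spawned, so this would have to run in a process that has not created any.

```ocaml
(* Run [f] in a forked child and wait at most [timeout] seconds for its
   result. Returns [None] on timeout or if the child fails. *)
let run_in_child ~(timeout : float) (f : unit -> string) : string option =
  let r, w = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
    (* Child: compute, send the result through the pipe, then exit
       immediately without running the parent's at_exit handlers. *)
    Unix.close r;
    let status =
      try
        let oc = Unix.out_channel_of_descr w in
        output_string oc (f ());
        close_out oc;
        0
      with _ -> 1
    in
    Unix._exit status
  | pid ->
    Unix.close w;
    let ic = Unix.in_channel_of_descr r in
    let output =
      match Unix.select [ r ] [] [] timeout with
      | [], _, _ ->
        (* Deadline passed with no data: kill the child outright. The
           parent's heap is untouched, so its state stays well-defined. *)
        Unix.kill pid Sys.sigkill;
        None
      | _ -> Some (In_channel.input_all ic)
    in
    close_in ic;
    (* Always reap the child so it does not linger as a zombie. *)
    let _, status = Unix.waitpid [] pid in
    (match output, status with
     | Some s, Unix.WEXITED 0 -> Some s
     | _ -> None)
```

A pool of long-lived workers would replace the `fork` per call with a request/response protocol over the pipes, and only pay the process startup cost again when a worker has to be killed.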

orbitz · March 6, 2025, 4:12pm

Thank you, I did see memprof limits in the GitHub issue but I clearly didn’t read deep enough and just assumed it was memory related.

jfeser · March 6, 2025, 5:05pm

Yeah, that was my initial assumption as well. My main worry about this approach is that it requires all of your code (and any library code) to be async-exception-safe. I think that might not be true of most OCaml code.

lambda_foo · March 6, 2025, 11:11pm

Memprof_limits looks promising and rather similar to what I was going to suggest from Haskell: “Asynchronous Exceptions in Practice” by Simon Marlow. In Haskell it works nicely, with some corner cases around needing bracketing and masking of exceptions for critical sections. For OCaml I think more code will not be async-exception-safe and it’ll be harder to have working general OCaml code.

Rucikir · March 11, 2025, 10:50am

If you’re on Unix you could use Unix.setitimer (based on POSIX setitimer and soon POSIX timers) to have the OS send you a signal after some time, install a signal handler, and raise from the signal handler to interrupt the thread/domain. I’m not sure if you can guarantee that the signal handler will be executed in the second domain, though.
We’ve had success using GC alarms or Gc.Memprof (statmemprof) callbacks for OCaml 4.14 and 5.3. I suppose Memprof_limits is based on it.
Another weird idea was to use a PPX to instrument the functions that could be stopped with points where it would check how much time has elapsed.
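A minimal sketch of the `Unix.setitimer` idea, assuming a single-domain program on Unix with the `unix` library linked. `with_timeout` and `Timeout` are hypothetical names. It makes no claim about which domain runs the handler when several exist, and the wrapped code has to tolerate an exception appearing at an arbitrary point, which is the async-exception-safety concern raised earlier in the thread.

```ocaml
exception Timeout

(* Run [f ()], giving up after roughly [seconds] of wall-clock time. *)
let with_timeout (seconds : float) (f : unit -> 'a) : 'a option =
  (* Raise from the handler; OCaml runs it at the next point where the
     runtime polls for pending signals. *)
  let previous =
    Sys.signal Sys.sigalrm (Sys.Signal_handle (fun _ -> raise Timeout))
  in
  let disarm () =
    (* A zero it_value cancels the timer; then restore the old handler. *)
    ignore
      (Unix.setitimer Unix.ITIMER_REAL
         { Unix.it_interval = 0.0; it_value = 0.0 });
    Sys.set_signal Sys.sigalrm previous
  in
  ignore
    (Unix.setitimer Unix.ITIMER_REAL
       { Unix.it_interval = 0.0; it_value = seconds });
  (* Known gap in this sketch: if the signal fires after [f] returns but
     before [disarm] finishes, Timeout escapes to the caller. *)
  Fun.protect ~finally:disarm (fun () ->
      try Some (f ()) with Timeout -> None)
```

Note that a `try ... with _ ->` inside `f` would swallow `Timeout`, so code run this way should avoid catch-all handlers or re-raise exceptions it does not recognise.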
