Cancelling a CPU bound operation

Hello, I’m implementing a multi-tenant service in OCaml and I’m trying to make some QoS guarantees.

One issue is that I have a synchronous function that runs on user-supplied input, and it is complicated enough that it’s difficult to predict how expensive a given call will be. For 99.9% of inputs the call is essentially free, but some inputs can make it extremely expensive. There are some heuristics I could implement, but they all depend on the hardware the function runs on and are all sort of hand-wavy.

I’d like to ensure that the function execution will be cancelled if it does not complete within a specified amount of time. It is, of course, important that the program is in a well-defined state after this.

I’m already offloading the function call to another domain so it does not block the main domain, but that is insufficient because the domain pool can be saturated quite cheaply.

I see that there is a GitHub issue for being able to asynchronously cancel a domain[0] but it is still open.

Any recommendations here?

So far my best idea is to move the function into a separate CLI program and spawn it, but that is a pretty expensive solution, both in run time in the average case and in code to implement. Another option is that…an old-school CGI-like architecture makes a lot more sense in this context, offloading the whole request handling to a separate process.

[0] Support for interrupting another domain · Issue #11411 · ocaml/ocaml · GitHub

This library looks interesting: Memprof_limits (memprof-limits.Memprof_limits). I haven’t tried it myself, however.

I think the most robust solution is to modify the function to check a flag, which allows you to ensure that it always stops somewhere reasonable and does appropriate cleanup. But that does require some code changes.
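
To make the flag idea concrete, here is a minimal sketch using only the standard library. The names `Cancelled`, `expensive` and `run` are hypothetical, and the loop is a stand-in for the real function. It assumes some other domain or thread (for example, one watching a timer) sets the flag when the deadline passes.

```ocaml
(* Hypothetical names throughout; [expensive] stands in for the real function. *)
exception Cancelled

(* Polls the shared flag once per iteration, so cancellation can only
   happen at a point the function itself has chosen. *)
let expensive ~cancel n =
  let acc = ref 0 in
  for i = 1 to n do
    if Atomic.get cancel then raise Cancelled;
    acc := !acc + i
  done;
  !acc

(* Turns the cancellation exception into an option for the caller. *)
let run ~cancel n =
  match expensive ~cancel n with
  | v -> Some v
  | exception Cancelled -> None

let () =
  let cancel = Atomic.make false in
  assert (run ~cancel 10 = Some 55);
  (* In a real service another domain would set this after a timeout. *)
  Atomic.set cancel true;
  assert (run ~cancel 10 = None)
```

Because the exception is only raised at the check, any cleanup can be placed right there (or in an ordinary `Fun.protect` around the call) without worrying about being interrupted at an arbitrary instruction.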

Long ago people tried to do this in Java, and eventually, after many tears were shed, the answer was “you can’t get there from here.” Asynchronously cancelling a systhread (one that hasn’t cooperatively established the points at which it is willing to be cancelled, so that it can clean up) is always liable to leave things in a mess. People who tried to do multi-tenancy in Java eventually came around to one of two solutions:

(a) invent your own language that will insert the correct “savepoints” into generated code so that some external actor can cancel threads by instructing the errant thread to terminate at one of those savepoints (it’s been a long time, but IIRC Salesforce Apex did that)

(b) use multiple processes/address-spaces.

Or as I once put it, “there is only one application lifecycle: it starts with fork/exec, and it ends with exit”.

P.S. One thing people sometimes do is to keep a pool of processes hanging around, to dispatch work to, so they can avoid process startup overhead and amortize that overhead across many invocations (in the expected case, that these invocations don’t spin off into an infinite loop). Since this is a GCed pointer-safe language, you can at least have reasonable assurance that previous dispatching of work to a process didn’t leave around turds for the next invocation to trip over.
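
As a rough illustration of the process-based approach (one fresh child per call, not a pool), here is a sketch that assumes a Unix system and the `unix` library. `run_with_timeout` and `read_all` are hypothetical helpers, and the result is passed back as a string over a pipe purely to keep the example short. One caveat I believe holds on OCaml 5: `Unix.fork` fails once other domains have been spawned, so in a multi-domain service the workers would have to be forked before any domain is created.

```ocaml
(* Sketch only: requires the unix library; helper names are hypothetical. *)

(* Read from [fd] until end of file. *)
let read_all fd =
  let buf = Buffer.create 256 and chunk = Bytes.create 4096 in
  let rec loop () =
    match Unix.read fd chunk 0 (Bytes.length chunk) with
    | 0 -> Buffer.contents buf
    | n -> Buffer.add_subbytes buf chunk 0 n; loop ()
  in
  loop ()

(* Run [f] in a child process; kill it if nothing arrives within [timeout] seconds. *)
let run_with_timeout ~timeout (f : unit -> string) : string option =
  let rd, wr = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
    (* Child: compute, write the result, and leave without running the
       parent's at_exit handlers. Never return into the caller's code. *)
    Unix.close rd;
    let code =
      try
        let s = f () in
        ignore (Unix.write_substring wr s 0 (String.length s));
        0
      with _ -> 1
    in
    Unix._exit code
  | pid ->
    Unix.close wr;
    let output =
      match Unix.select [ rd ] [] [] timeout with
      | [], _, _ -> Unix.kill pid Sys.sigkill; None  (* deadline passed *)
      | _ -> Some (read_all rd)
    in
    Unix.close rd;
    (* Reap the child and only trust the output if it exited cleanly. *)
    match snd (Unix.waitpid [] pid), output with
    | Unix.WEXITED 0, Some s -> Some s
    | _ -> None
```

For example, `run_with_timeout ~timeout:0.2 (fun () -> while true do () done; "never")` should come back as `None` after about 0.2 seconds, with the parent's heap untouched by whatever the child was doing.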

Thank you, I did see memprof limits in the GitHub issue but I clearly didn’t read deep enough and just assumed it was memory related.

Yeah, that was my initial assumption as well. My main worry about this approach is that it requires all of your code (and any library code) to be async-exception-safe. I think that might not be true of most OCaml code.

Memprof_limits looks promising and is rather similar to what I was going to suggest, based on Simon Marlow’s “Asynchronous Exceptions in Practice” for Haskell. In Haskell it works nicely, with some corner cases around needing bracket and masking exceptions in critical sections. For OCaml I think more code will not be async-exception-safe, so it’ll be harder to make general OCaml code work this way.

If you’re on Unix you could use Unix.setitimer (based on POSIX setitimer and soon POSIX timers) to have the OS send you a signal after some time, install a signal handler, and raise from the signal handler to interrupt the thread/domain. I’m not sure if you can guarantee that the signal handler will be executed in the second domain, though.
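
A minimal single-domain sketch of the `setitimer` approach, assuming a Unix system and the `unix` library; `Timeout` and `with_timeout` are hypothetical names. It deliberately leaves open which domain runs the handler in a multi-domain program. It also has a small race: if the timer fires after `f` returns but before the timer is disarmed, the exception escapes, which is exactly the async-exception-safety problem mentioned above.

```ocaml
(* Sketch only: requires the unix library; names are hypothetical. *)
exception Timeout

let with_timeout seconds f =
  (* The handler runs at the next poll point (e.g. an allocation) and raises. *)
  let old_handler =
    Sys.signal Sys.sigalrm (Sys.Signal_handle (fun _ -> raise Timeout))
  in
  let set_timer v =
    ignore (Unix.setitimer Unix.ITIMER_REAL
              { Unix.it_interval = 0.0; it_value = v })
  in
  let disarm () =
    set_timer 0.0;                           (* a zero value cancels the timer *)
    Sys.set_signal Sys.sigalrm old_handler   (* restore the previous handler *)
  in
  set_timer seconds;
  match f () with
  | v -> disarm (); Some v
  | exception Timeout -> disarm (); None
  | exception e -> disarm (); raise e
```

Note that the signal is only acted on when the running code reaches a poll point, so a loop that never allocates may not be interrupted promptly on older compilers.
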
We’ve had success using GC alarms or Gc.Memprof (statmemprof) callbacks on OCaml 4.14 and 5.3. I suppose Memprof_limits is based on that.
Another weird idea was to use a PPX to instrument the functions that should be stoppable, inserting points where the code checks how much time has elapsed.
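
Written by hand, the kind of check point such a PPX might insert could look like the sketch below. `Deadline_exceeded`, `make_checker` and `fib_with_budget` are hypothetical names, and `fib` is just a stand-in for an expensive function. It uses `Sys.time` to stay within the standard library; that measures CPU time for the whole process, so with several busy domains a monotonic wall clock would probably be the better choice.

```ocaml
(* Hypothetical names; a hand-written version of what a PPX could generate. *)
exception Deadline_exceeded

(* Returns a [check] function to call at the instrumented points.
   [budget] is in seconds of process CPU time (see the caveat above). *)
let make_checker ~budget =
  let deadline = Sys.time () +. budget in
  let calls = ref 0 in
  fun () ->
    incr calls;
    (* Only consult the clock every 1024 calls to keep the check cheap. *)
    if !calls land 1023 = 0 && Sys.time () > deadline then
      raise Deadline_exceeded

(* Naive Fibonacci with a check point at every call. *)
let rec fib check n =
  check ();
  if n < 2 then n else fib check (n - 1) + fib check (n - 2)

let fib_with_budget ~budget n =
  match fib (make_checker ~budget) n with
  | v -> Some v
  | exception Deadline_exceeded -> None
```

Here `fib_with_budget ~budget:10.0 10` completes normally, while `fib_with_budget ~budget:0.05 60` gives up once the budget is spent. As with the flag approach, the exception can only surface at a check point, so the instrumented code stays in a well-defined state.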