I’m writing bindings to a C library. There is a create function which allocates and initializes a C library object and returns it as a custom OCaml value. Creating the object with the C library allocates resources that stick around after the program exits unless cleaned up with the library delete function. I’ve written a finalization callback to clean up the resources but it’s not called when the program exits. The OCaml GC tutorial says that you can force the finalization callback to be called with at_exit Gc.full_major. So, question: Are there downsides to registering a full GC collection at program exit from a library in order to clean up resources?
Happy to provide some code if that might clarify the question.
[have to ask] One assumes those resources are -outside- the process itself, yes? E.g. shared-memory segments which need to be cleaned-up? [or files, named-pipes, etc, etc] Stuff like that? B/c if they’re in-process-only, there’s no need to clean them up, right?
@Chet_Murthy oops, I should have clarified. Yes, the resources are outside the process. I’m creating an AF_XDP socket and there are some resources allocated in the kernel that need to be released. If they aren’t released the program won’t be able to reuse them after restarting.
In that case, either
(a) this doesn’t apply
or (b) you’ve thought thru the issues related to “finalizing multiple external resources during process-exit, but in the wrong order”, yes? I had to deal with this in a database-access extension for Perl5 … decades ago.
Regardless, it’s an important problem. In a way, maybe OCaml should have a special version of the GC that simply scans for finalizable blocks and calls their finalizers, allocating whatever nursery is required and not bothering to GC at all. B/c after all, anybody who does nontrivial consing in a finalizer … not worthy of consideration grin.
I guess my worry was that calling Gc.full_major could cause issues with other libraries or other parts of the program. I just couldn’t figure out why the finalizers weren’t called by default.
Well, that’s pretty easy to answer. There’s a famous (?) quote from Hans Boehm: “finalization is no excuse for inadequate resource management”. Lots of subsystems don’t adequately manage resources (e.g., keeping track of which processes have access to interprocess comms objects), and finalization isn’t a solution. For instance, when I worked with DB2, it used shmem/sem/queues for IPC. And if DB2 crashed -hard- (or a machine-local client did), those resources didn’t get cleaned up. We all had our “ipcclean” scripts that we ran religiously in order to clean that shit up, after stopping and before restarting DB2.
So you can imagine that a language-runtime implementer might just punt on the issue: after all, doing a full GC can’t actually solve all those issues (just make them less likely).
This is an excellent suggestion! I had never thought of it, but for anything whose finalization does not require actual ML code, this … is a perfect solution.
ETA [not sure, but it seems …]: But upon reading the C++ standard, it isn’t so clear that std::shared_ptr will be cleaned-up at C++ std::exit time. It seems unavoidable to explicitly keep a list of pointers, and register an atexit() handler to finalize 'em at exit() time. Ah, well.
Thanks for your thoughts here folks. This helps clarify my thinking on this.
I’m trying to decide between two different interfaces:
module Socket : sig
  type t
  val create : string -> t
  val delete : t -> unit
  val do_something : t -> unit
end
and
module Socket : sig
  type t
  val create : string -> t
  val do_something : t -> unit
end
In each interface the create function is a binding to a C function. create device allocates memory and installs a bpf program on the ethernet interface device.
In the first interface delete frees the memory and uninstalls the bpf program. If the bpf program is not uninstalled with delete you’ll get an error when you call create again. The OS doesn’t uninstall the bpf program after the program exits.
The applications that I’m writing generally don’t need to create a socket on the same interface multiple times during program lifetime. To make my life easier, I was thinking I could do the work of delete in a finalization callback. This way I wouldn’t have to worry about double-frees or forgetting to call delete. For this to work, I need the finalization callback to be executed at program exit, which I can do with at_exit Gc.full_major.
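To make the idea concrete, here is a minimal pure-OCaml sketch of the finalizer-plus-at_exit approach. It is only an illustration under assumptions: the record type, the `released` log and `release` are hypothetical stand-ins for the custom block and the C library's delete function, and `Gc.finalise` stands in for the finalizer that real bindings would attach in C.

```ocaml
(* Hypothetical stand-in for the kernel-side state: records which devices
   have been cleaned up. Real bindings would call the C delete function. *)
let released : string list ref = ref []

(* The mutable field guarantees a heap-allocated block (required by
   Gc.finalise) and makes release idempotent. *)
type t = { device : string; mutable live : bool }

let release (s : t) : unit =
  if s.live then begin
    s.live <- false;
    released := s.device :: !released
  end

let create (device : string) : t =
  let s = { device; live = true } in
  (* Ask the GC to call release once s has become unreachable. *)
  Gc.finalise release s;
  s

(* Force a full major collection at exit so that finalizers of values that
   are unreachable by then get a chance to run. *)
let () = at_exit Gc.full_major
```

One caveat worth keeping in mind: a full major collection only finalizes values that are already unreachable. A socket that is still referenced at exit, for example one bound at the top level of a module, is not collected, so its finalizer does not run.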
[been thinking about your problem, and how to solve it without heavy lifting]
Assumptions:
(1) there are a number of objects that need to be cleaned-up.
(2) it doesn’t matter in which order they’re cleaned-up [if it does, that can be addressed, but for now, I’ll ignore it]
(3) there are pointers for these objects that your OCaml code manipulates.
You could use a weak-pointer table and put the pointers to the objects in there. At exit time, you could walk the table and call the finalizers. No need for a full GC.
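A rough sketch of the weak-table idea, under the three assumptions above. Everything here is hypothetical illustration: `handle` stands in for the wrapped C pointer, `release` for the library's delete function, and `released` is just a log so the effect is observable. The table comes from `Weak.Make` in the standard library, so it does not keep handles alive by itself.

```ocaml
(* Hypothetical handle: id stands in for the C pointer. *)
type handle = { id : int; mutable live : bool }

(* Log of released ids, in place of real cleanup. *)
let released : int list ref = ref []

(* Idempotent, so calling it twice for the same handle is harmless. *)
let release (h : handle) : unit =
  if h.live then begin
    h.live <- false;
    released := h.id :: !released
  end

(* Weak hash set: entries disappear once the GC has collected the handle. *)
module Registry = Weak.Make (struct
  type t = handle
  let equal a b = a.id = b.id
  let hash h = Hashtbl.hash h.id
end)

let registry = Registry.create 16
let next_id = ref 0

let create () : handle =
  incr next_id;
  let h = { id = !next_id; live = true } in
  Registry.add registry h;
  (* Covers handles that are collected before exit and so leave the table. *)
  Gc.finalise release h;
  h

(* Walk the table and release whatever is still alive; no full GC needed. *)
let release_all () : unit = Registry.iter release registry

let () = at_exit release_all
```

The `Gc.finalise` call is an addition to the suggestion as stated: without it, a handle that gets collected before exit would silently drop out of the weak table and never be released.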
What about the following interface, which is explicit about the point in the program lifetime at which the resource is deallocated?
module Socket: sig
  type t
  val with_socket : string -> (t -> 'a) -> 'a
end
Two remarks:
In the implementation, remember to also consider the case where the function raises an exception, and clean up your resource there as well; Fun.protect from the standard library can help you do this.
If you like cool syntactic tricks, you can write
Socket.with_socket name @@ fun socket ->
  <the rest of my code>
to have something that looks a bit like a declaration (of socket) with a body, instead of a nested subfunction.
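For what it's worth, an implementation of with_socket along these lines might look like the sketch below. It assumes OCaml 4.08 or later for `Fun.protect`. The internal `create` and `delete` are hypothetical stand-ins for the C bindings, and `device` and `open_count` are extra helpers that exist only so the behaviour can be observed; they are not part of the proposed interface.

```ocaml
module Socket : sig
  type t
  val with_socket : string -> (t -> 'a) -> 'a
  val device : t -> string        (* illustration only *)
  val open_count : unit -> int    (* illustration only *)
end = struct
  type t = { device : string }

  (* Number of sockets created but not yet deleted. *)
  let live = ref 0

  (* Stand-ins for the real bindings to the C create/delete functions. *)
  let create device = incr live; { device }
  let delete (_ : t) = decr live

  let with_socket device f =
    let socket = create device in
    (* finally runs whether f returns normally or raises. *)
    Fun.protect ~finally:(fun () -> delete socket) (fun () -> f socket)

  let device s = s.device
  let open_count () = !live
end
```

Since t is abstract and create/delete are not exported, callers cannot forget the cleanup or delete twice. They can still let the socket escape the callback (by returning it or storing it somewhere), which the type system does not prevent.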
As others have said using/relying on Gc.full_major to free your resources is a bad idea.
However I would like to say that having an option – especially in test programs – to run Gc.compact at exit is very useful. It’s particularly good at finding memory corruption / logical errors in your bindings. For some tools like virt-v2v we have a hidden option --debug-gc which does precisely this and is used in non-production builds and tests.
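A minimal sketch of such a switch, assuming the flag is simply looked up in the argument vector. This is not how virt-v2v implements it; `install_debug_gc` is a hypothetical helper written for illustration.

```ocaml
(* Register Gc.compact to run at exit when --debug-gc is present in argv.
   Returns whether the hook was installed. *)
let install_debug_gc (argv : string array) : bool =
  let enabled = Array.exists (fun arg -> arg = "--debug-gc") argv in
  if enabled then at_exit Gc.compact;
  enabled

(* Typical use at program start-up. *)
let (_ : bool) = install_debug_gc Sys.argv
```

Compaction walks and moves the whole major heap, so a binding that left a dangling or unregistered pointer behind has a good chance of crashing right there in the test run rather than at some random point in production.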
I like this a lot. I was trying to help the interface consumer by having the cleanup work done automatically even in the uncaught exception case. with_socket and Fun.protect seem like a more reasonable approach.