Downsides to calling Gc.full_major at exit?

suttonshire · April 20, 2020, 12:33am

I’m writing bindings to a C library. There is a create function which allocates and initializes a C library object and returns it as a custom OCaml value. Creating the object with the C library allocates resources that stick around after the program exits unless cleaned up with the library delete function. I’ve written a finalization callback to clean up the resources but it’s not called when the program exits. The OCaml GC tutorial says that you can force the finalization callback to be called with at_exit Gc.full_major. So, question: Are there downsides to registering a full GC collection at program exit from a library in order to clean up resources?

Happy to provide some code if that might clarify the question.

Cheers

Chet_Murthy · April 20, 2020, 12:37am

[have to ask] One assumes those resources are -outside- the process itself, yes? E.g. shared-memory segments which need to be cleaned-up? [or files, named-pipes, etc, etc] Stuff like that? B/c if they’re in-process-only, there’s no need to clean them up, right?

suttonshire · April 20, 2020, 1:18am

@Chet_Murthy oops, I should have clarified. Yes, the resources are outside the process. I’m creating an AF_XDP socket and there are some resources allocated in the kernel that need to be released. If they aren’t released the program won’t be able to reuse them after restarting.

Chet_Murthy · April 20, 2020, 1:36am

In that case, either
(a) this doesn’t apply
or (b) you’ve thought thru the issues related to “finalizing multiple external resources during process-exit, but in the wrong order”, yes? I had to deal with this in a database-access extension for Perl5 … decades ago.

Regardless, it’s an important problem. In a way, maybe OCaml should have a special version of the GC that simply scans for finalizable blocks, calling their finalizers, and allocating whatever nursery is required, not bothering to GC at all. B/c after all, anybody who does nontrivial consing in a finalizer … not worthy of consideration grin.

suttonshire · April 20, 2020, 1:48am

I hope so

I guess my worry was that calling Gc.full_major could cause issues with other libraries or other parts of the program. I just couldn’t figure out why the finalizers weren’t called by default.

Chet_Murthy · April 20, 2020, 2:59am

Well, that’s pretty easy-to-answer. There’s a famous (?) quote from Hans Boehm: “finalization is no excuse for inadequate resource management”. Lots of subsystems don’t adequately manage resources (e.g., keeping track of which processes have access to interprocess comms objects), and finalization isn’t a solution. For instance, when I worked with DB2, it used shmem/sem/queues for IPC. And if DB2 crashed -hard- (or a machine-local client did), those resources didn’t get cleaned-up. We all had our “ipcclean” scripts that we ran religiously in order to clean that shit up, after stopping and before restarting DB2.
So you can imagine that a language-runtime implementer might just punt on the issue: after all, doing a full GC can’t actually solve all those issues (just make them less likely).

gadmm · April 20, 2020, 7:42pm

Drawbacks:

Not collecting values that are still live when exit is called
Increased shutdown time

The first one is a problem if exit is called in the middle of the program as a way to handle errors (which should be considered bad practice).

I am curious about your use-case. Have you considered deterministic resource management? (See my reply to Wrapping C++ std::shared_ptr and similar smart pointers, section 2.)

Chet_Murthy · April 20, 2020, 8:34pm

This is an excellent suggestion! I had never thought of it, but for anything whose finalization does not require actual ML code, this … is a perfect solution.

ETA [not sure, but it seems …]: But upon reading the C++ standard, it isn’t so clear that std::shared_ptr will be cleaned-up at C++ std::exit time. It seems unavoidable to explicitly keep a list of pointers, and register an atexit() handler to finalize 'em at exit() time. Ah, well.

gadmm · April 20, 2020, 8:58pm

Of course, calling exit is bad practice. Better use an exception (and make sure to catch it in the main).
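To illustrate the point: calling exit does not unwind the stack, so cleanup handlers installed with Fun.protect are skipped, whereas an exception runs them on the way up. A minimal sketch, assuming a hypothetical Quit exception and a cleaned_up flag that stands in for releasing a real resource:

```ocaml
(* Hypothetical exception raised instead of calling [exit] deep in the program. *)
exception Quit of int

(* Stand-in for releasing a real resource; only records that cleanup ran. *)
let cleaned_up = ref false

(* Some work that bails out early by raising rather than exiting. *)
let work () =
  Fun.protect
    ~finally:(fun () -> cleaned_up := true)
    (fun () -> raise (Quit 2))

(* Run [main] and translate [Quit] into an exit code for the caller. *)
let run main =
  match main () with
  | () -> 0
  | exception Quit code -> code
```

The program's last line would then be something like `let () = exit (run work)`, so that exit is only ever called from the top level, after all cleanup has run.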

suttonshire · April 20, 2020, 9:04pm

Thanks for your thoughts here folks. This helps clarify my thinking on this.

I’m trying to decide between two different interfaces:

module Socket : sig
  type t
  val create: string -> t
  val delete: t -> unit
  val do_something: t -> unit
end

and

module Socket: sig
  type t
  val create: string -> t
  val do_something: t -> unit
end

In each interface the create function is a binding to a C function. create device allocates memory and installs a bpf program on the ethernet interface device.

In the first interface delete frees the memory and uninstalls the bpf program. If the bpf program is not uninstalled with delete you’ll get an error when you call create again. The OS doesn’t uninstall the bpf program after the program exits.

The applications that I’m writing generally don’t need to create a socket on the same interface multiple times during program lifetime. To make my life easier, I was thinking I could do the work of delete in a finalization callback. This way I wouldn’t have to worry about double-frees or forgetting to call delete. For this to work, I need the finalization callback to be executed at program exit, which I can do with at_exit Gc.full_major.

Now to weigh all the options presented here…
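For reference, the finalizer-based variant of the second interface could look roughly like this. It is only a sketch: the record type and the released_devices list are hypothetical stand-ins for the C custom block and the library's delete function, and Gc.finalise is used here in place of a finalizer written in C.

```ocaml
(* Hypothetical stand-in for the custom value returned by the C binding. *)
type socket = { device : string }

(* Stand-in for the C library's delete: records which devices were released. *)
let released_devices = ref []

let create device =
  let s = { device } in
  (* Run the cleanup the first time [s] is found unreachable. *)
  Gc.finalise (fun s -> released_devices := s.device :: !released_devices) s;
  s

(* Force a full collection at exit so pending finalizers get a chance to run. *)
let () = at_exit Gc.full_major
```

Note that a socket that is still reachable at exit (for example from a global) is not collected by this final full_major, so its finalizer does not run.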

Chet_Murthy · April 20, 2020, 9:11pm

Don’t know if you intended it, but your message was cut short, I think.

Chet_Murthy · April 20, 2020, 9:17pm

[been thinking about your problem, and how to solve it without heavy lifting]

Assumptions:
(1) there are a number of objects that need to be cleaned-up.
(2) it doesn’t matter in which order they’re cleaned-up [if it does, that can be addressed, but for now, I’ll ignore it]
(3) there are pointers for these objects, that your OCaml code manipulates.

You could use a weak-pointer table and put the pointers to the objects in there. At exit time, you could walk the table and call the finalizers. No need for a full GC.
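A rough sketch of that idea, using Weak.Make from the standard library as the weak table. The socket record and release function are hypothetical placeholders for the custom value and the C delete call:

```ocaml
(* Hypothetical stand-in for the C-backed socket value. *)
type socket = { device : string; mutable released : bool }

(* Idempotent cleanup; the real version would call the C delete function. *)
let release s =
  if not s.released then s.released <- true

(* Weak set of sockets: membership does not keep a socket alive. *)
module Registry = Weak.Make (struct
  type t = socket
  let equal = ( == )
  let hash s = Hashtbl.hash s.device
end)

let live = Registry.create 16

let create device =
  let s = { device; released = false } in
  Registry.add live s;
  s

(* Walk the table and release whatever is still alive. *)
let release_all () = Registry.iter release live

let () = at_exit release_all
```

One caveat with this sketch: a socket that is garbage-collected before exit simply disappears from the weak table, so it would still need a finalizer (or an explicit delete) to release its resources.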

gasche · April 22, 2020, 6:01am

What about the following interface, which is explicit about the point in the program lifetime at which the resource is deallocated?

module Socket: sig
  type t
  val with_socket : string -> (t -> 'a) -> 'a
end

Two remarks:

In the implementation, remember to also consider the case where the function raises an exception and clean up your resource there as well; Fun.protect from the standard library can help you do this.
If you like cool syntactic tricks, you can write

Socket.with_socket name @@ fun socket ->
<the rest of my code>

to have something that looks a bit like a declaration (of socket) with a body, instead of a nested subfunction.
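A minimal sketch of such an implementation, assuming create and delete are the C bindings (replaced here by hypothetical stand-ins that only count open sockets; device and open_count are added purely so the sketch can be exercised):

```ocaml
module Socket : sig
  type t
  val with_socket : string -> (t -> 'a) -> 'a
  val device : t -> string
  val open_count : unit -> int
end = struct
  type t = { device : string }

  (* Number of sockets currently open; stands in for kernel-side state. *)
  let opened = ref 0

  (* Stand-ins for the C create/delete bindings. *)
  let create device =
    incr opened;
    { device }

  let delete (_ : t) = decr opened

  (* [delete] runs whether [f] returns normally or raises. *)
  let with_socket device f =
    let s = create device in
    Fun.protect ~finally:(fun () -> delete s) (fun () -> f s)

  let device s = s.device
  let open_count () = !opened
end
```

Since create and delete are not exported, the consumer can neither forget to call delete nor call it twice.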

rwmjones · April 24, 2020, 4:18pm

As others have said using/relying on Gc.full_major to free your resources is a bad idea.

However I would like to say that having an option – especially in test programs – to run Gc.compact at exit is very useful. It’s particularly good at finding memory corruption / logical errors in your bindings. For some tools like virt-v2v we have a hidden option --debug-gc which does precisely this and is used in non-production builds and tests.

suttonshire · April 25, 2020, 4:23pm

I like this a lot. I was trying to help the interface consumer by having the cleanup work done automatically even in the uncaught exception case. with_socket and Fun.protect seem like a more reasonable approach.

Thanks.

yawaramin · April 25, 2020, 6:09pm

Oh, nice. This can also be made a let-operator, something like:

module Socket: sig
  type t
  val ( let& ) : string -> (t -> 'a) -> 'a
  (** 'Borrow' (a la Rust) a socket *)
end

E.g. usage,

let open Socket in
let& socket = name in
<the rest of my code>
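One way the operator might be implemented (binding operators need OCaml 4.08 or later). This is only a sketch: the socket record and the open counter are hypothetical stand-ins for the real binding, and device and open_count exist just to make the example checkable.

```ocaml
module Socket : sig
  type t
  val device : t -> string
  val open_count : unit -> int
  val ( let& ) : string -> (t -> 'a) -> 'a
end = struct
  type t = { device : string }

  (* Number of sockets currently open; stands in for kernel-side state. *)
  let opened = ref 0

  let device s = s.device
  let open_count () = !opened

  (* Create the socket, run the body, and always release the socket. *)
  let ( let& ) device body =
    incr opened;
    let s = { device } in
    Fun.protect ~finally:(fun () -> decr opened) (fun () -> body s)
end

(* The socket is released as soon as the body after [in] finishes. *)
let device_name_length name =
  let open Socket in
  let& socket = name in
  String.length (device socket)
```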
