For a long time I wondered about signals. Essentially, I would like to program without having to think about them at all.
For example, POSIX says that close can fail with EBADF (which I am happy to handle) and EINTR (the system call was interrupted) which I would rather not handle since presumably it involves doing something awful like re-trying the close call. So I want a close call that either fails with EBADF, or succeeds and closes the fd.
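The semantics being asked for can be sketched as a small wrapper (names are mine, and the "EINTR means the fd is gone" assumption holds on Linux per its man page, but not on every Unix):

```c
#include <errno.h>
#include <fcntl.h>   /* for open() in the usage example */
#include <unistd.h>

/* Hypothetical wrapper: call close() exactly once and treat EINTR as
 * success. On Linux the kernel releases the descriptor before any step
 * that can fail, so retrying would race with other threads reusing the
 * fd. Returns 0 when the fd is closed, -1 (errno == EBADF, EIO, ...)
 * otherwise. */
static int close_no_eintr(int fd)
{
    if (close(fd) == 0)
        return 0;
    if (errno == EINTR)   /* Linux semantics: the fd is already gone */
        return 0;
    return -1;
}
```

Usage: `close_no_eintr(open("/dev/null", O_RDONLY))` returns 0; an invalid descriptor yields -1 with `errno == EBADF`, the one error the original post is happy to handle.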
Linux man page for close has the following:
Retrying the close() after a failure return is the wrong thing to do,
since this may cause a reused file descriptor from another thread to
be closed. This can occur because the Linux kernel always releases
the file descriptor early in the close operation, freeing it for
reuse; the steps that may return an error, such as flushing data to
the filesystem or device, occur only later in the close operation.
But this seems to stop short of saying that close will always close the fd even if EINTR is returned (“steps that may return an error” could be interpreted as those steps which may return non-EINTR errors). So it is not even clear to me how (in a concurrent setting) you can actually close a file descriptor cleanly.
There is even an LWN article about the POSIX and Linux behaviours of close (POSIX says to retry close in a loop; Linux says to never do this).
Is there any way to program without having to consider these signals? Presumably some signals are fine (those that forcibly kill the process because something terrible has happened). But the rest I just don’t want to think about; I would rather deal with the errors that occur as a result of using the system calls, not from some unrelated signal-type thing.
How do Core and other libraries deal with this?
At a more basic level: what is the safe way to (portably?) close a file descriptor? And why is this so hard???
Ah, but the point of my post is that it seems impossible to even close an fd in a portable way. So I don’t see that CCIO can provide any such guarantee (and indeed it seems to just call OCaml’s standard Unix.close).
Because NFS has no ‘reserve some space for me’ operation and your local machine buffered the data until close() was called, you can only get an ‘out of disk space’ error on the close(); the first the remote fileserver heard of your new data is when your local machine started sending it writes as you closed the file. Now suppose that this close() takes long enough that it gets interrupted with an EINTR. If the file descriptor is now invalid, your program has no way to find out that the data it thought it had written has in fact been rejected.
Thus it’s at least sensible for Unix systems to worry about this potential case and decide that close() should not close the file descriptor in the EINTR case.
I’m not sure I agree with his reasoning. If you want to detect errors due to data not being written you need to fsync and pick up the errors from that. Assuming a valid fd, close should always close it. This is the only interface that makes sense in a concurrent setting.
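The pattern described here, fsync to surface write-back errors and then close the still-valid descriptor once, might look like this (function name is mine; the final close assumes Linux's "EINTR still closed the fd" behaviour):

```c
#include <errno.h>
#include <fcntl.h>   /* for open() in the usage example */
#include <unistd.h>

/* Sketch: surface delayed write errors via fsync (which is safe to
 * retry on EINTR, since it does not consume the fd), then close the
 * descriptor exactly once. Returns 0 on success, -1 with errno set on
 * failure; on Linux the fd is closed either way. */
static int flush_and_close(int fd)
{
    int r;
    do {
        r = fsync(fd);             /* errors from buffered writes show up here */
    } while (r == -1 && errno == EINTR);
    if (r == -1) {
        int saved = errno;
        close(fd);                 /* still release the fd; error already captured */
        errno = saved;
        return -1;
    }
    if (close(fd) == 0 || errno == EINTR)
        return 0;
    return -1;
}
```

This keeps error detection (fsync, retryable) separate from resource release (close, called once), which is the only split that stays race-free when other threads can reuse descriptor numbers.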
Could you be more precise as to what the problem is here?
As far as I can recall, bos tries to be as robust as possible, which means that if EINTR is raised, we retry. Now, if the fd was in fact already closed, the retry will fail with EBADF, which is ignored (see the uses of close in the module); if it wasn’t, the retry will close it.
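The strategy described here can be sketched in C (this is my reconstruction, not the actual bos code, which is OCaml): retry on EINTR, and if a retry then fails with EBADF because the first call did close the fd, count that as success. The caveat raised earlier in the thread still applies: in a multi-threaded program, that EBADF check can be fooled if another thread has already reused the descriptor number.

```c
#include <errno.h>
#include <fcntl.h>   /* for open() in the usage example */
#include <unistd.h>

/* Sketch of the retry-and-ignore-EBADF strategy: keep retrying close()
 * on EINTR; if a retry reports EBADF, assume an earlier attempt
 * actually closed the fd and report success. Returns 0 on success,
 * -1 with errno set otherwise. */
static int close_retry(int fd)
{
    int retried = 0;
    for (;;) {
        if (close(fd) == 0)
            return 0;
        if (errno == EINTR) {      /* HP-UX style: fd may still be open */
            retried = 1;
            continue;
        }
        if (errno == EBADF && retried)
            return 0;              /* an earlier close evidently succeeded */
        return -1;                 /* EBADF on first try, EIO, ... */
    }
}
```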
For the concurrent case, I suppose we could allocate a global lock and require all uses of close to take the lock first, and then potentially retry the close (on HP-UX et al.?)… (Will to live slowly ebbing away)
I’m unable to think of a version of Unix where close can actually be interrupted by a signal. Yes, POSIX claims it is possible, and I suppose a version would be standards conformant if it permitted it to happen, but I’ve never seen it in a real implementation. After doing Unix systems programming for 33 years or so now, I’ve also never seen close(2) fail on a valid descriptor, and I’ve never seen a bug caused by failure to test for EINTR as a return value from close(2) with retry.
I might be horribly mistaken, but I think you can just ignore that completely. Certainly if you did ignore it, your code would look like essentially everyone else’s.
This permits the behavior that occurs on Linux and many other
implementations, where, as with other errors that may be reported by
close(), the file descriptor is guaranteed to be closed. However, it
also permits another possibility: that the implementation returns an
EINTR error and keeps the file descriptor open. (According to its
documentation, HP-UX’s close() does this.) The caller must then once
more use close() to close the file descriptor, to avoid file
descriptor leaks. This divergence in implementation behaviors
provides a difficult hurdle for portable applications, since on many
implementations, close() must not be called again after an EINTR
error, and on at least one, close() must be called again. There are
plans to address this conundrum for the next major release of the
I’ll take people’s word for it that HP-UX used to do this. I would ignore it. That behavior is clearly broken. System calls should be interruptible (at least to userland’s knowledge) only if they’re long. Short calls of guaranteed duration like close(2) should never allow themselves to be interrupted. Don’t worry about it, and don’t program on the assumption that it can happen.