select has a well-known limitation: it only supports file descriptor numbers below 1024 (the usual FD_SETSIZE) because of a fixed-size bitmap in C. Note that this refers to the file descriptor number, not the number of file descriptors you want to watch: it can fail even when watching a single file descriptor if its value is 1024 or higher.
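To make the "number, not count" point concrete, here is a minimal C sketch. The helper names fd_fits_in_fd_set and select_readable_checked are hypothetical, made up for illustration. It only shows the bounds check a careful caller has to make before touching an fd_set; it is not how any particular library does it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper: true when fd can be represented in an fd_set.
   The bound applies to the descriptor's numeric value, not to how many
   descriptors are being watched. */
bool fd_fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical wrapper: wait for a single descriptor to become readable.
   It refuses out-of-range descriptors with EINVAL, because calling FD_SET
   on them would write outside the fixed-size bitmap. */
int select_readable_checked(int fd, struct timeval *timeout)
{
    if (!fd_fits_in_fd_set(fd)) {
        errno = EINVAL;
        return -1;
    }
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    return select(fd + 1, &readfds, NULL, NULL, timeout);
}
```

Watching one descriptor whose number is FD_SETSIZE or higher fails here, even though only a single descriptor is involved.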
Alternatives are poll and epoll and there are various bindings in OCaml for them. However I was trying to ensure that I really switched the entire application over to these alternative interfaces, and in a project with hundreds of recursive dependencies on other OCaml packages this becomes difficult.
In C it would be easy: check that the binary doesn’t refer to the select symbol, and then (unless you play tricks with dlsym) you know it doesn’t call it.
Even if a single call to ‘select’ from a library I link slips through, it may result in an occasional error, so before bumping the file descriptor limit for the program I want a static way to prove its absence.
What I came up with is not very reliable: disassemble the binary and check for calls to unix_select (you can see the details at https://github.com/xapi-project/xen-api/pull/4877#issuecomment-1816210034). Because it is not a direct call, this involves checking for a ‘mov’ of the ‘unix_select’ address to ‘rax’ followed by a call to ‘caml_c_call’. This then results in a smallish list of OCaml functions, such as ‘Thread.wait_timed_read’, etc.
Although not entirely reliable as a way to prove the absence of select (what if something calls Thread.timed_read?) it did help to identify bugs in programs that were already thought to be safe on >1024 FDs (xenopsd which has a bug in a dependency ezxenstore that calls select).
Next I thought I could use ‘objdump -dS’ to look for calls to these functions but that doesn’t work: calling that on a large binary is very slow and incomplete (you don’t see all the source code of all the opam libraries by default).
I also tried to use bap, but it cannot track indirect calls by default in its latest version (there was an earlier experiment in cbat_tools that implemented vsa that may have been able to, but that only worked with bap 1.5 not 2.x).
I also thought about grepping for Unix.select/etc. during a CI build, but that may be unreliable (what if some code does ‘open Unix’ and then calls just ‘select’, etc.).
My next steps would be to:
- try to see whether I could get dead code elimination working well enough and granular enough such that linking the Unix module doesn’t result in linking all the stubs that introduce the ‘select’ call
- try to create a CI build that removes ‘select’ from the Unix module (and timed_read/write from Thread) and see whether that succeeds.
But maybe there is a more obvious approach that I missed. Any suggestions?
I think the 1024 FD limitation is serious for server applications and the limitation can be easily picked up by using a library that itself or transitively uses select. So I do think that getting away from select in the library ecosystem, or at least tracking it, is worth more attention.
I wasn’t aware of this limitation, it’s not great. I only tend to use Unix.select to implement a timer that sleeps while select-ing on a FD so it can be woken early, but it’s bad that it can fail due to totally unrelated limits.
Naively, could this be fixed by… changing the stdlib so that Unix.select uses something less bad behind the scenes? Maybe the API would have to change, which I understand is not great, but select could at least be deprecated, because no one is going to actually use it in production to implement async IO when epoll, kqueue and the like exist?
oxenstored tried to provide a drop-in replacement for select implemented using poll. Although it solves the file descriptor limit it is horrible for performance (it constructs a hashtable every time).
That was done at the OCaml level, though, and it might be possible to get closer to equivalent performance by implementing the drop-in replacement in C and keeping just the OCaml API intact.
(Code that needs to watch a large number of file descriptors would hopefully know to use something more efficient, and Lwt (with conf-libev) and Eio already do).
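As a rough illustration of the "implement it with poll in C" idea above, here is a minimal sketch of a select-like wait over an array of descriptors. The function name wait_readable and its signature are hypothetical, and it covers only the read set. It is not the oxenstored code, and it is not a full replacement for the Unix.select stub.

```c
#include <errno.h>
#include <poll.h>
#include <stdlib.h>

/* Hypothetical helper: wait until any of the n descriptors in fds is
   readable, or until timeout_ms elapses (-1 blocks indefinitely).
   On success ready[i] is set to 1 if fds[i] is readable (or at EOF or in
   error), else 0. Returns the number of ready descriptors, 0 on timeout,
   -1 on error with errno set. Descriptor numbers are not limited by
   FD_SETSIZE because no fd_set is involved. */
int wait_readable(const int *fds, size_t n, int *ready, int timeout_ms)
{
    struct pollfd *pfds = calloc(n ? n : 1, sizeof *pfds);
    if (pfds == NULL)
        return -1;

    for (size_t i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
    }

    int rc = poll(pfds, (nfds_t)n, timeout_ms);
    int saved_errno = errno;
    int count = 0;

    if (rc >= 0) {
        for (size_t i = 0; i < n; i++) {
            /* Invalid descriptors (POLLNVAL) are reported as not ready
               in this sketch; a real replacement would surface them. */
            ready[i] = (pfds[i].revents & (POLLIN | POLLHUP | POLLERR)) != 0;
            count += ready[i];
        }
    }

    free(pfds);
    errno = saved_errno;
    return rc < 0 ? -1 : count;
}
```

The pollfd array is a flat copy of the input, so the per-call cost is one allocation and two linear passes, with no hashtable. A real stub would also need to handle the write and exception sets and convert OCaml lists on the way in and out.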
Whether it is problematic in practice depends on the extent of your control over user limits. open and cognates allocate the lowest file descriptor number that is available (and a file descriptor number becomes available again as soon as it is closed), and FD_SETSIZE is normally 1024. By default the maximum soft user limit for open files is also the same as FD_SETSIZE, so if user code doesn’t play around with ulimit I have taken it in the past that I should be OK with select.
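The reasoning above can be sketched in C as two small checks. Both function names are hypothetical. The first compares the soft RLIMIT_NOFILE with FD_SETSIZE; the second demonstrates the lowest-available-number rule using dup, which follows the same allocation rule as open. The second check assumes no other thread opens or closes descriptors while it runs.

```c
#include <stdbool.h>
#include <sys/resource.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical check: true when the soft limit on open files is no larger
   than FD_SETSIZE, so any newly allocated descriptor number stays below
   FD_SETSIZE. Descriptors inherited from before the limit was lowered are
   not covered by this check. */
bool soft_limit_keeps_select_safe(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return false; /* limit unknown: assume unsafe */
    return rl.rlim_cur <= (rlim_t)FD_SETSIZE;
}

/* Hypothetical demonstration: a descriptor number becomes available again
   as soon as it is closed, and the next allocation picks the lowest free
   number. Returns true when the closed number was reused. */
bool closed_fd_number_is_reused(void)
{
    int p[2];
    if (pipe(p) != 0)
        return false;

    int first = dup(p[0]);   /* lowest free number at this point */
    if (first < 0) {
        close(p[0]);
        close(p[1]);
        return false;
    }
    close(first);            /* the number is free again */
    int second = dup(p[0]);  /* should get the same number back */

    bool reused = (second == first);
    if (second >= 0)
        close(second);
    close(p[0]);
    close(p[1]);
    return reused;
}
```

If soft_limit_keeps_select_safe returns false, for example after raising the limit in a service file, the argument no longer holds and any select call in the process becomes a potential failure point.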
We do want to increase the ulimit though (e.g. in a systemd .service file), 1024 is quite limiting:
- if you have a file descriptor leak your program will stop working very frequently. If we could increase the file descriptor limit temporarily then we may be able to fix the bug with very little downtime (e.g. instead of failing once a day with EMFILE you’d fail once a month, or not at all because hopefully a month later you’d have a fix already).
- if you need to service a large number of concurrent actions (e.g. in our case: 1000 VM/host) then you use up almost all your FDs just handling one task on behalf of them, and you’re very limited in what functions you can call (e.g. a typical pattern to wait for an event might be to use a pipe, because there is no pthread_cond_timedwait, but that already consumes 2 FDs, etc.).
- with such a small number of FDs it is more difficult to protect against malicious or buggy clients flooding your server. If you could increase the number of FDs then you could set it higher than the max number of source ports on a single IP and then attacks from a single machine would no longer easily take down your server. Of course other defences are still needed.
Here’s another variation on this idea, assuming the server runs in a Linux environment: install an LD_PRELOAD hook that causes select() calls to crash, log, or signal failure in some other appropriate way.
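A minimal sketch of such a hook in C, assuming a dynamically linked binary whose select() calls go through libc. This variant logs to stderr and fails with ENOSYS; the message text and the choice of ENOSYS are illustrative assumptions, and replacing the body with abort() would give the crashing variant.

```c
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Interposed definition of select(): when this object is loaded ahead of
   libc, calls to select() resolve here instead of reaching the real
   implementation. */
int select(int nfds, fd_set *restrict readfds, fd_set *restrict writefds,
           fd_set *restrict exceptfds, struct timeval *restrict timeout)
{
    static const char msg[] = "select() called: blocked by preload hook\n";

    (void)nfds;
    (void)readfds;
    (void)writefds;
    (void)exceptfds;
    (void)timeout;

    /* write() rather than stdio, so the hook stays usable from any context. */
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;

    errno = ENOSYS;
    return -1;
}
```

Built as a shared object (for example with cc -shared -fPIC, producing a library with a name of your choosing) and listed in LD_PRELOAD when starting the server, this reports every select() call at runtime. It does not catch pselect, direct system calls, or statically linked code, so it complements rather than replaces a static check.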