Since I know you’re open to suggestions, I thought I’d point out that a dedicated “cookbook” section could make a great addition to ocaml.org.
I’m afraid I don’t have the time to make extensive remarks, as it would amount to a full rewrite of the tutorial. But basically:
It is not a good idea to mostly eschew error handling and correct resource handling (i.e. use of Fun.protect ~finally). It’s precisely in this kind of tutorial that good habits should be established. There are too many functions here that people will cut and paste and that can leak file descriptors, including the first read_from_file, which claims to handle errors, and the second read_from_file, which is plain wrong (racy, to be precise). If you want to show incorrect usage, you should at least put a comment in the code.
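A minimal sketch of both habits, assuming OCaml 4.14 or later (for In_channel.input_all): open the file directly and handle Sys_error instead of checking for existence first, which removes the window between check and open, and close the channel with Fun.protect ~finally on every exit path. The name read_file is hypothetical; this is not the tutorial’s read_from_file.

```ocaml
(* Hypothetical helper: read a whole file, returning [Error msg] instead of
   raising. There is no separate existence check, so there is no window
   between "check" and "open" in which the file can change. *)
let read_file (path : string) : (string, string) result =
  match open_in_bin path with
  | exception Sys_error msg -> Error msg
  | ic ->
      (* [Fun.protect ~finally] runs [close_in_noerr ic] whether the body
         returns normally or raises. *)
      Fun.protect
        ~finally:(fun () -> close_in_noerr ic)
        (fun () ->
          match In_channel.input_all ic with
          | contents -> Ok contents
          | exception Sys_error msg -> Error msg)
```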
Don’t promote the use of Sys.file_exists: it’s a broken function that will only lead to head-scratching (or pure anger on a bad day) for both programmers and end users. It turns permission errors into false. Run `mkdir bla && touch bla/file.txt && chmod ug-x bla` and enjoy the result of:
```ocaml
if Sys.file_exists "bla/file.txt" then
  print_endline "File exists."
else
  print_endline "File does not exist."
```
The problem is that the function currently never raises: IIRC it turns all Unix errors into false. The correct fix would be to make it raise Sys_error on Unix errors, but some programs may rely on the fact that Sys.file_exists never raises.
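For illustration, a sketch of the raising behaviour described above, built on Unix.stat from the unix library: only ENOENT and ENOTDIR count as “does not exist”, and any other error (such as EACCES in the chmod example) propagates as Unix.Unix_error. The name file_exists_strict is hypothetical; it is not a Stdlib function.

```ocaml
(* Hypothetical stricter existence check (requires the unix library).
   Only "no such file" style errors mean [false]; anything else, such as
   EACCES on a directory without search permission, is raised as
   [Unix.Unix_error]. [Unix.stat] follows symbolic links. *)
let file_exists_strict (path : string) : bool =
  match Unix.stat path with
  | _ -> true
  | exception Unix.Unix_error ((Unix.ENOENT | Unix.ENOTDIR), _, _) -> false
```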
It looks like Windows support will be improving and that, in the long term, we need to provide practical-minded documentation that works for both Windows and Unix users.
Don’t be fooled by the name: the Unix module is ill-named. It’s a carefully implemented OS abstraction library that emulates most POSIX functions on Windows (those that are not emulated are documented on this page).
If you find someone, could you ask them to add to the list something I have mentioned before and learnt the hard way: the Unix.execv* functions in the standard distribution’s unix library are not thread-safe, even though we now have domains and have had the Thread module for some time. This is not documented in the OCaml reference, even though users will assume otherwise by analogy with the underlying C functions.
This is not academic. The Lwt authors assumed them to be thread-safe: Lwt now automatically starts threads by default when encountering potentially blocking operations, but it uses Unix.execve in its Lwt_process module. This has the unfortunate effect that it works under glibc but occasionally blows up under musl.
I am not suggesting that the code should necessarily be rewritten (maybe it can’t be) but it should be documented.
By “thread-safe” I mean that it complies with POSIX and does not give rise to random lockups. (It conditionally calls malloc, which is not allowed in the child process of a multi-threaded program.)
I’m not sure I follow: which child process are we talking about? Calling exec* replaces the current process with a different executable; no new process is created. Regardless, thread-safety issues in the unix library should be reported upstream: https://github.com/ocaml/issues.
I also don’t follow. Exec is called after a fork in any real-world program. In what circumstances would a program call exec except after a fork? Lwt_process is a typical example. Can you set out the case to which you refer?
This is getting off-topic for the present thread, but just to round things up:
As mentioned, the issue of Unix.exec* allocating memory in multi-threaded programs should be reported upstream so that at least it can be documented.
On a related note, Unix.create_process was reimplemented on top of posix_spawn precisely to avoid this issue: https://github.com/ocaml/ocaml/pull/9573. Even when posix_spawn is not available, the fallback code in that function takes care not to allocate in the child process.
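For comparison, a minimal sketch of spawning a program with Unix.create_process instead of a hand-written Unix.fork followed by Unix.exec*. The run helper is hypothetical, and reporting a signal-terminated child as 255 is an arbitrary choice made for this sketch.

```ocaml
(* Hypothetical helper (requires the unix library): run [prog] with [args],
   wait for it, and return its exit code. [Unix.create_process] searches
   PATH for [prog] and handles the process creation itself. *)
let run (prog : string) (args : string list) : int =
  let pid =
    Unix.create_process prog
      (Array.of_list (prog :: args))
      Unix.stdin Unix.stdout Unix.stderr
  in
  match snd (Unix.waitpid [] pid) with
  | Unix.WEXITED code -> code
  (* Arbitrary choice for this sketch: report any signal as 255. *)
  | Unix.WSIGNALED _ | Unix.WSTOPPED _ -> 255
```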
My brief notes as an OCaml learner (but experienced dev).
“doesn’t return a file descriptor…instead…a channel” - the practical difference being what? Is it just that I can’t open a file in read/write mode?
There is quite a lot of “chat” before you get to examples / bullet-points. Maybe if you aren’t familiar with reading/writing files in other languages it gives vital context though.
Start with the “with_” example, because it is (a) shorter and (b) closes the channel correctly.
Follow the “with_” example with the steps it performs, including the try … with. This way learners will see why they want to use the short version when they can.
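A minimal sketch of that ordering, assuming OCaml 4.14 or later for In_channel: first the short with_ version, then roughly the steps it performs, written out by hand with try … with. The function names are hypothetical.

```ocaml
(* Short version: [In_channel.with_open_text] opens the file, passes the
   channel to the function, and closes it even if the function raises. *)
let first_line_short (path : string) : string option =
  In_channel.with_open_text path In_channel.input_line

(* Roughly the same steps by hand: open, use, then close on both the
   normal path and the exceptional path before re-raising. *)
let first_line_long (path : string) : string option =
  let ic = open_in path in
  try
    let line = In_channel.input_line ic in
    close_in ic;
    line
  with e ->
    close_in_noerr ic;
    raise e
```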
You need examples of writing/reading a file line-by-line immediately after that - it is likely the most common task a learner would attempt. It is unfortunate that there is no stdlib wrapper to make this less complex, but that example needs to be there.
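A sketch of line-by-line writing and reading with the In_channel and Out_channel modules (OCaml 4.14 or later); write_lines and read_lines are hypothetical helper names, not Stdlib functions.

```ocaml
(* Write each string in [lines] to [path], followed by a newline.
   The channel is closed by [with_open_text], even on exceptions. *)
let write_lines (path : string) (lines : string list) : unit =
  Out_channel.with_open_text path (fun oc ->
      List.iter
        (fun line ->
          output_string oc line;
          output_char oc '\n')
        lines)

(* Read [path] one line at a time until end of file, keeping the order. *)
let read_lines (path : string) : string list =
  In_channel.with_open_text path (fun ic ->
      let rec loop acc =
        match In_channel.input_line ic with
        | Some line -> loop (line :: acc)
        | None -> List.rev acc
      in
      loop [])
```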
No links to the relevant modules in the stdlib docs!
The examples for “Error Handling” don’t seem to close channels in the event of an exception. If I’ve understood correctly this could leak channels? You might need to move “Remembering to close channels” above this section to give context.
Maybe a note (near the bottom) about whether garbage-collection closes a channel or not?
I would also mention that writing directly to the final file is a bad pattern, more so if one is overwriting an existing valid file (e.g. settings modified by the user from the UI, persistent state, etc.): if the filesystem runs out of space or the kernel crashes in the middle of the write, the user is left with a partially written or empty file.
The proper way to do atomic file (over)writes is to write to a temporary file in the same directory, fsync and close it, then rename it to the final path, e.g. Devkit.Files.save_as.
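A minimal sketch of that recipe using only Stdlib and the unix library (for Unix.fsync); it is not Devkit’s implementation, and atomic_write_file is a hypothetical name. Note that Filename.open_temp_file creates the file with permissions 0o600 by default, so the new file does not inherit the old one’s permissions, and a fully durable version on POSIX would also fsync the containing directory after the rename.

```ocaml
(* Hypothetical helper (requires the unix library): replace the contents of
   [path] by writing a temporary file in the same directory, flushing and
   fsyncing it, closing it, and renaming it over [path]. *)
let atomic_write_file (path : string) (contents : string) : unit =
  (* Same directory as the target, so the rename stays on one filesystem. *)
  let tmp, oc =
    Filename.open_temp_file ~mode:[ Open_binary ]
      ~temp_dir:(Filename.dirname path)
      (Filename.basename path) ".tmp"
  in
  let write () =
    output_string oc contents;
    flush oc;
    (* Ask the OS to push the data to stable storage before the rename. *)
    Unix.fsync (Unix.descr_of_out_channel oc);
    close_out oc;
    (* On POSIX, rename replaces [path] atomically. *)
    Sys.rename tmp path
  in
  try write ()
  with e ->
    (* Clean up the temporary file, then propagate the original error.
       [close_out_noerr] does nothing if [oc] is already closed. *)
    close_out_noerr oc;
    (try Sys.remove tmp with Sys_error _ -> ());
    raise e
```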
And no, this is not a theoretical problem; it is actually a widespread mistake in many popular programs (ask me how I know).
Think of a tail call. A launcher program prepares arguments and environment, then launches another program as its last action. You can find examples in “the real world”, whatever that means.
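A minimal sketch of such a launcher, assuming the unix library: it builds the argument vector and environment, then calls Unix.execve as its last action, so no fork is involved. launcher_env and launch are hypothetical names; Unix.execve does not search PATH, so prog must be a path to the executable.

```ocaml
(* Hypothetical: prepend [extra] entries ("NAME=value") to the current
   environment. *)
let launcher_env (extra : string list) : string array =
  Array.append (Array.of_list extra) (Unix.environment ())

(* Hypothetical launcher: replace the current process with [prog].
   [Unix.execve] does not return on success; on failure it raises
   [Unix.Unix_error]. *)
let launch (prog : string) (args : string list) (extra_env : string list) =
  Unix.execve prog (Array.of_list (prog :: args)) (launcher_env extra_env)
```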