Help Review the new "File Manipulation" tutorial on OCaml.org

sabine · July 18, 2023, 9:19am

Hey everyone,

there’s a new version of the “File Manipulation” tutorial on

https://staging.ocaml.org/docs/file-manipulation

For comparison, the old version of this tutorial is here: File Manipulation · OCaml Tutorials.

Thanks for taking a look and giving feedback and suggestions for revising this!

chshersh · July 18, 2023, 4:08pm

Thanks a lot for writing such tutorials!

As a person who started learning OCaml only several months ago, I find articles like this especially helpful, since I can learn OCaml bit by bit.

Great content. And this version is much better than the previous one!

benjamin-thomas · July 18, 2023, 7:01pm

It looks very good, great job

It has a “cookbook” feel which I quite like.

Since I know you’re open to suggestions, I thought I’d point out that a dedicated “cookbook” section could make a great addition to ocaml.org.

dbuenzli · July 19, 2023, 8:04am

I’m afraid I don’t have the time to make extensive remarks, as it would amount to a full rewrite of the tutorial. But basically:

It is not a good idea to mostly eschew error handling and correct resource handling (i.e. use of Fun.protect ~finally). It’s precisely in this kind of tutorial that good habits should be established. There are too many functions here that people will cut and paste and that can leak file descriptors, including the first read_from_file, which claims to handle errors, and the second read_from_file, which is plain wrong (racy, to be precise). If you want to show incorrect usage you should at least put a comment in the code.
Don’t promote use of Sys.file_exists. It’s a broken function which will only lead to head scratching (or pure anger on a bad day) for both programmers and end users. It turns permission errors into false. Do a mkdir bla && touch bla/file.txt && chmod ug-x bla and enjoy the result of
```
if Sys.file_exists "bla/file.txt" then
  print_endline "File exists."
else
  print_endline "File does not exist."
```
to get an idea.
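A minimal sketch of the alternative the two points above suggest: skip the Sys.file_exists check, attempt the open, and let the failure carry the actual reason. The name read_file and the result-based signature are assumptions made for illustration, not code from the tutorial. It uses only the standard library (Fun.protect needs OCaml 4.08 or later).

```ocaml
(* Hypothetical helper: read a whole file, reporting failures as [Error].
   There is no existence check, so there is no window between "check" and
   "open", and a permission error is not silently reported as "missing". *)
let read_file (path : string) : (string, string) result =
  match open_in_bin path with
  | exception Sys_error msg -> Error msg (* missing file, permissions, ... *)
  | ic ->
      (* [Fun.protect] runs [finally] whether or not reading raises,
         so the channel is closed on every path. *)
      Fun.protect
        ~finally:(fun () -> close_in_noerr ic)
        (fun () ->
          match really_input_string ic (in_channel_length ic) with
          | contents -> Ok contents
          | exception Sys_error msg -> Error msg
          | exception End_of_file -> Error (path ^ ": unexpected end of file"))
```

With the directory set up as in the shell commands above, read_file "bla/file.txt" should return an Error whose message comes from the operating system, rather than a bare false.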

sabine · July 19, 2023, 8:28am

I’m understanding the overall sentiment here (and from the feedback on the PR itself) as this:

cookbook-style presentation of different recipes / things people may want to do is helpful
actual code examples show some bad practices - these must be fixed to promote good practices

…

Tangent: If Sys.file_exists is a broken function…

is there any way to fix that in the long-term?
Is anyone keeping a list of such “broken Stdlib functions”?
Is there a stable package that provides a safer API to the file system?

dbuenzli · July 19, 2023, 8:34am

The problem is that currently the function never raises; IIRC it turns all Unix errors into false. The correct way would be to make it raise Sys_error on Unix errors, but then some programs may rely on the fact that Sys.file_exists never raises.

The Unix module.

sabine · July 19, 2023, 9:09am

Thank you Daniel, I opened an issue: Stdlib: Make `Sys.file_exists` raise `Sys_error` in error cases, instead of returning `false` · Issue #12393 · ocaml/ocaml · GitHub

The Unix module

It looks like Windows support will be improving, and that - in the long term - we need to provide practical-minded documentation that works for both Windows and Unix users.

dbuenzli · July 19, 2023, 9:33am

Don’t let yourself be fooled by a name, the Unix module is ill-named. It’s a carefully implemented OS abstraction library, emulating most of POSIX functions on Windows (and those that are not are documented on this page).

sabine · July 19, 2023, 9:37am

Ah sorry about that… reminds me of the only two hard things in computer science: naming things and cache invalidation.

cvine · July 19, 2023, 7:02pm

If you find someone who is, can you ask them to add to the list something I have mentioned before and learnt the hard way: the Unix.execv* functions in the standard distribution are not thread-safe, even though we now have domains and have had the Thread module for some time. This is not documented in the OCaml reference, even though users will assume otherwise by analogy with the underlying C functions.

This is not academic. The Lwt authors assumed them thread-safe: Lwt now automatically starts threads by default when encountering potentially blocking operations but uses Unix.execve in its Lwt_process module. This has the unfortunate feature that it will work under glibc but occasionally blow up under musl.

I am not suggesting that the code should necessarily be rewritten (maybe it can’t be) but it should be documented.

arbipher · July 19, 2023, 7:38pm

Out of curiosity, what is the usual workflow to update or propose a tutorial on OCaml.org?

nojb · July 19, 2023, 9:11pm

Sorry for the naïve question, but can you explain what you mean by thread-safe in this context?

Thanks,
Nicolas

cvine · July 19, 2023, 9:45pm

By “thread-safe” I mean that it complies with POSIX and does not give rise to random lockups. (It conditionally calls malloc, which is not allowed in the child process of a multi-threaded program.)

nojb · July 19, 2023, 10:22pm

I’m not sure I follow: which child process are we talking about? The result of calling exec* is to replace the current process with a different executable; no new process is created. Regardless, thread-safety problems in the unix library should be reported upstream: https://github.com/ocaml/ocaml/issues.

Cheers,
Nicolas

cvine · July 19, 2023, 10:59pm

I also don’t follow. Exec is applied after a fork in any real world program. In what circumstances would a program apply exec except after a fork? Lwt_process is a typical example. Can you set out the case to which you refer?

nojb · July 20, 2023, 4:57am

This is getting off-topic for the present thread, but just to round things up:

As mentioned, the issue of Unix.exec* allocating memory in multi-threaded programs should be reported upstream so that at least it can be documented.

On a related note, Unix.create_process was reimplemented on top of posix_spawn precisely to avoid this issue: https://github.com/ocaml/ocaml/pull/9573. Even when posix_spawn is not available, the fallback code in that function takes care not to allocate in the child process.

Cheers,
Nicolas

nojb · July 20, 2023, 5:08am

I took the liberty of filing a bug report myself: Unix.create_process_env might not be multi-thread safe · Issue #12395 · ocaml/ocaml · GitHub.

Cheers,
Nicolas

R_Huxton · July 20, 2023, 6:58am

My brief notes as an OCaml learner (but experienced dev).

“doesn’t return a file descriptor…instead…a channel” - the practical difference being what? Is it just that I can’t open a file in read/write mode?
There is quite a lot of “chat” before you get to examples / bullet-points. Maybe if you aren’t familiar with reading/writing files in other languages it gives vital context though.
Start with the “with_” example, because it is (a) shorter and (b) closes the channel correctly.
Follow the “with_” example with the steps it performs including the try/catch. This way learners will see why they want to use the short version if they can.
You need examples of writing/reading a file line-by-line immediately after that - it is likely the most common task a learner would attempt. It is unfortunate that there is no stdlib wrapper to make this less complex, but that example needs to be there.
No links to the relevant modules in the stdlib docs!
The examples for “Error Handling” don’t seem to close channels in the event of an exception. If I’ve understood correctly this could leak channels? You might need to move “Remembering to close channels” above this section to give context.
Maybe a note (near the bottom) about whether garbage-collection closes a channel or not?
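For reference, a sketch of what the line-by-line examples asked for above might look like. The function names are hypothetical and not taken from the tutorial. The first two use In_channel.with_open_text and Out_channel.with_open_text (OCaml 4.14 or later); the third spells out roughly the steps the with_ version performs, using Fun.protect so the channel is closed even if reading raises.

```ocaml
(* Hypothetical helpers; In_channel/Out_channel require OCaml 4.14+. *)

(* Write each string as one line; the channel is closed on return or raise. *)
let write_lines (path : string) (lines : string list) : unit =
  Out_channel.with_open_text path (fun oc ->
      List.iter (fun line -> Out_channel.output_string oc (line ^ "\n")) lines)

(* Read all lines; [In_channel.input_line] returns [None] at end of file. *)
let read_lines (path : string) : string list =
  In_channel.with_open_text path (fun ic ->
      let rec loop acc =
        match In_channel.input_line ic with
        | Some line -> loop (line :: acc)
        | None -> List.rev acc
      in
      loop [])

(* Roughly the same steps written out by hand: open, read in a function
   guarded by [Fun.protect], close in [finally]. *)
let read_lines_explicit (path : string) : string list =
  let ic = open_in path in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () ->
      let rec loop acc =
        match input_line ic with
        | line -> loop (line :: acc)
        | exception End_of_file -> List.rev acc
      in
      loop [])
```

Both readers let Sys_error escape if the file cannot be opened; wrapping the call in a match with an exception case is one way to turn that into a result.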

HTH

ygrek · July 20, 2023, 4:04pm

I would also mention that writing directly to the final file is a bad pattern, more so if one is overwriting an existing valid file (e.g. settings modified by the user from a UI, persistent state, etc.). If the filesystem runs out of space, or an unfortunate kernel crash happens in the middle of the write, the user is left with a partially written or empty file.
The proper way to do atomic file (over)writes is to write to a temporary file in the same directory, fsync, close, and rename to the final path, e.g. Devkit.Files.save_as.
And no, this is not a theoretical problem; it is actually a widespread mistake in many popular programs (ask me how I know).
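A rough sketch of that pattern using only the standard library. The name atomic_write is hypothetical, and this is not how Devkit.Files.save_as is implemented. One important assumption: the stdlib has no fsync, so this version only flushes to the OS; a durable version would call Unix.fsync on the underlying descriptor before closing.

```ocaml
(* Hypothetical helper: replace [path] with [contents] via a temporary file
   in the same directory, so readers see either the old or the new file.
   NOTE: no fsync here (stdlib only); durability across a crash would need
   Unix.fsync before the close. *)
let atomic_write (path : string) (contents : string) : unit =
  let tmp, oc =
    Filename.open_temp_file ~mode:[ Open_binary ]
      ~temp_dir:(Filename.dirname path) "save" ".tmp"
  in
  try
    Fun.protect
      ~finally:(fun () -> close_out_noerr oc)
      (fun () ->
        output_string oc contents;
        (* Flush explicitly so a write error (e.g. disk full) raises here
           instead of being swallowed by [close_out_noerr]. *)
        flush oc);
    (* Same directory, hence same filesystem: the rename replaces [path]. *)
    Sys.rename tmp path
  with e ->
    (* Best-effort cleanup of the temporary file, then re-raise. *)
    (try Sys.remove tmp with Sys_error _ -> ());
    raise e
```

Note that Filename.open_temp_file creates the file with restrictive default permissions, so the replaced file ends up with those unless ~perms is passed.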

xavierleroy · July 21, 2023, 7:51am

Think of a tail call. A launcher program prepares arguments and environment, then launches another program as its last action. You can find examples in “the real world”, whatever that means.
