Why is there an output_string but not a read_string function?

In the Pervasives module, there is an output_string function that writes a string to an output channel, even if the string is huge.

On the other hand, there is no read_string:in_channel -> string function that reads all the remaining text in an input channel (assuming it corresponds to a text file).

Is this deliberate? I would naively have thought that reading and writing are rather symmetrical operations.


What would the string be? The entire file? What if it’s too big or has no EOF?

It shouldn’t be hard to raise exceptions in those cases, should it? It’s not very difficult to write a small function that does that, but I’m surprised that it’s not built in.
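Raising an exception for the "too big" case can indeed be sketched in a few lines. This is a minimal sketch, not stdlib API: the name `read_string` and the optional `max_size` limit (16 MiB by default) are hypothetical choices. It only guards against size; it cannot help if the underlying read never returns.

```ocaml
(* Hypothetical [read_string]: read everything left on [ic], but raise
   [Failure] as soon as more than [max_size] bytes would be accumulated.
   Neither the name nor [max_size] comes from the stdlib. *)
let read_string ?(max_size = 16 * 1024 * 1024) (ic : in_channel) : string =
  let chunk_size = 4096 in
  let chunk = Bytes.create chunk_size in
  let buf = Buffer.create chunk_size in
  let rec loop () =
    (* [input] returns the number of bytes read, and 0 only at end of file *)
    let n = input ic chunk 0 chunk_size in
    if n > 0 then begin
      if Buffer.length buf + n > max_size then
        failwith "read_string: input exceeds max_size";
      Buffer.add_subbytes buf chunk 0 n;
      loop ()
    end
  in
  loop ();
  Buffer.contents buf
```

Usage would look like `read_string ~max_size:1_000_000 stdin`; the caller decides what "too big" means instead of the library guessing.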

I’m glad you asked! Someone motivated could definitely try to revive https://github.com/ocaml/ocaml/pull/640, which adds some helpers to the stdlib’s IO. Good luck!


I’m glad other people share my concerns too :slight_smile:

Indeed, this is quite useful when doing system programming.
Several libraries define such a function.
It is usually named string_of_file.
For example:

opam install ocaml-compiler-libs

Then use Misc.string_of_file.


It’s not just a matter of being hard: it is impossible to do “correctly”. The read operation on a file might block forever, e.g. for a file on an NFS mount whose connection went away.

I must point out that “compiler libs” (the set of modules that are part of the compiler implementation) currently give no guarantee about the stability of their API. You should only use them if you really need to and you know what you are doing.
So for a simple function such as string_of_file, which can be implemented independently of the compiler codebase, you really should use another library (containers, bos, base) instead of compiler libs.


Or we could try to get @c-cube’s PR merged into the stdlib.

Sure, that would be nice as well.

Serious question: is anyone motivated to take the PR over? (Not sure how easy that would be.)

[oh, I see you really -do- mean “read the whole file into a string”]

This might make sense to put in a bolt-on library, but why would one put it in the core?

  1. It’s trivial to implement with really_input, a fixed-size read buffer, and the Buffer module.
  2. For anybody who doesn’t know what they’re doing, it’s a loaded gun lying around.
  3. For anybody who does know what they’re doing, see #1.
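For the special case of a regular file, the really_input family even removes the need for a loop: ask for the channel’s length, then read exactly that many bytes. A sketch assuming OCaml >= 4.08 (for `Fun.protect`); the name `string_of_file_exact` is hypothetical. It is not suitable for pipes, sockets or terminals, where `in_channel_length` is not meaningful, and it assumes the file does not shrink while being read. For those cases a loop over plain `input` with a fixed-size buffer is needed, because `really_input` raises `End_of_file` when the last chunk is shorter than requested.

```ocaml
(* Read a whole regular file in one call. [in_channel_length] gives the
   file size; [really_input_string] then reads exactly that many bytes
   (raising End_of_file if fewer are available). *)
let string_of_file_exact (path : string) : string =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)  (* close even on exceptions *)
    (fun () -> really_input_string ic (in_channel_length ic))
```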

I found this code in my DAFT project:

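  let string_of_file fn =
    let buff_size = 1024 in
    (* accumulates the whole file *)
    let buff = Buffer.create buff_size in
    let ic = open_in fn in
    (* fixed-size scratch area for each read (not a line buffer, despite the name) *)
    let line_buff = Bytes.create buff_size in
    begin
      (* [input] returns the number of bytes read, and 0 only at end of file *)
      let was_read = ref (input ic line_buff 0 buff_size) in
      while !was_read <> 0 do
        Buffer.add_subbytes buff line_buff 0 !was_read;
        was_read := input ic line_buff 0 buff_size;
      done;
      close_in ic;
    end;
    Buffer.contents buff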
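A variant of the DAFT code above that closes the channel even when a read raises, using `Fun.protect` (OCaml >= 4.08). It also opens the file with `open_in_bin`, so the bytes come back unchanged on Windows; `open_in` is the right choice if newline translation is wanted. The 64 KiB buffer size is an arbitrary choice, not a recommendation from the original post.

```ocaml
(* Same chunked-read algorithm, but exception-safe: [Fun.protect] runs
   [close_in_noerr] whether the body returns normally or raises. *)
let string_of_file (fn : string) : string =
  let buff_size = 65536 in
  let ic = open_in_bin fn in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () ->
      let buff = Buffer.create buff_size in
      let chunk = Bytes.create buff_size in
      let rec loop () =
        match input ic chunk 0 buff_size with
        | 0 -> ()  (* end of file *)
        | n -> Buffer.add_subbytes buff chunk 0 n; loop ()
      in
      loop ();
      Buffer.contents buff)
```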

Yep. I used to have one in my “utils” library. But recently I started using the “Bos” library for “convenient file I/O” and it’s got nice verbs for things like this – file contents, dir contents, etc.

OK, I found bos:

https://erratique.ch/software/bos

I do think the symmetric operation to output_string is called input_line in the stdlib:

val input_line : in_channel -> string
    Read characters from the given input channel, until a
    newline character is encountered. Return the string of
    all characters read, without the newline character at the end.
    Raise [End_of_file] if the end of the file is reached
    at the beginning of line.
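`input_line` returns one line per call, so reading a whole channel with it still takes a loop, and the result is not a faithful copy of the input. A minimal sketch; the helper name `input_lines` is hypothetical:

```ocaml
(* Collect every remaining line of [ic] with [input_line], stopping at
   [End_of_file]. The [match ... with exception] form keeps the recursive
   call outside the handler, so the loop is tail-recursive. *)
let input_lines (ic : in_channel) : string list =
  let rec loop acc =
    match input_line ic with
    | line -> loop (line :: acc)
    | exception End_of_file -> List.rev acc
  in
  loop []
```

Because the newline is stripped, the inputs `"a\nb\n"` and `"a\nb"` both yield `["a"; "b"]`, which is one reason `input_line` is not really the inverse of `output_string`.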

Yeah, that’s some sweet code there.

I feel like I’m missing a punchline because nobody has mentioned core yet, but you can use In_channel.input_all if you install core.

https://ocaml.janestreet.com/ocaml-core/109.55.00/tmp/core_kernel/In_channel.html
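For comparison, the standard library itself gained an `In_channel` module in OCaml 4.14, with its own `input_all`. A sketch assuming OCaml >= 4.14 and that Core is not opened (otherwise `In_channel` refers to Core’s module instead):

```ocaml
(* Assumes OCaml >= 4.14 and the stdlib [In_channel] module, not Core's. *)
let string_of_file (path : string) : string =
  (* [with_open_bin] closes the channel whether [input_all] returns or raises *)
  In_channel.with_open_bin path In_channel.input_all
```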

A large group of OCaml devs had the same “wtf?” moments as you with respect to the standard library, and released an overlay.

The same thing exists in Stdio, also by Jane Street.

This is an obvious need. Memory is cheap, and if you can load the entire file into a string, it makes sense to do so.

There’s something like that in each and every stdlib extension or alternative, including batteries and containers! Often it’s better to load a file in its entirety, so it won’t change (e.g. if a source file won’t fit in memory, a compiler will have trouble processing it!).