How can I create an in_channel from a string?

Is there no function for creating an in_channel from a string? I found Stream.of_string and Lexing.from_string, but nothing for in_channels. Do I have to write the string out as a file to be able to open it as an in_channel?

May I ask what the use case is? Strings are immutable and fully loaded in memory, so at first glance it doesn’t make sense to open them as input channels…

I have a function that takes an in_channel. The function is used in several contexts. In one of those contexts, the values that it should be reading from the in_channel are constants known at compile time. I would like to pass the function an in_channel that I have created by providing constant values as a string.

If I needed a stream, I could use Stream.of_string. If I needed a lexbuf, I could use Lexing.from_string. But I need an in_channel, and I don’t see what I can use to create that.

This happens pretty often in many languages: I’ve encountered it in Golang, C++, and Java. Is it possible to modify the code that consumes the in_channel, e.g. by functorizing it with an argument that can be satisfied by a simple wrapper around an in_channel, and also by one around a string? I remember back in the day in Java, there were Streams and Readers, and the former had to be attached to something I/O related, whereas the latter could be backed by a string or buffer of some sort.
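
A minimal sketch of that functorized approach, assuming the consumer only needs to read lines. The names `READER`, `Channel_reader`, `String_reader`, and `Count_lines` are hypothetical and not from any library:

```ocaml
(* Hypothetical signature: only the operations the consumer needs. *)
module type READER = sig
  type t
  val input_line : t -> string option  (* None at end of input *)
end

(* Reader backed by a real in_channel. *)
module Channel_reader : READER with type t = in_channel = struct
  type t = in_channel
  let input_line ic = try Some (Stdlib.input_line ic) with End_of_file -> None
end

(* Reader backed by a string; the state is the list of remaining lines. *)
module String_reader : sig
  include READER
  val of_string : string -> t
end = struct
  type t = string list ref
  let of_string s =
    let pieces = String.split_on_char '\n' s in
    (* Drop the empty piece after a final newline, as input_line would. *)
    match List.rev pieces with
    | "" :: rest -> ref (List.rev rest)
    | _ -> ref pieces
  let input_line r =
    match !r with
    | [] -> None
    | line :: rest -> r := rest; Some line
end

(* The consumer, written once against READER. *)
module Count_lines (R : READER) = struct
  let count (r : R.t) : int =
    let rec loop n =
      match R.input_line r with
      | None -> n
      | Some _ -> loop (n + 1)
    in
    loop 0
end
```

`Count_lines (Channel_reader)` then serves the contexts that have a real channel, and `Count_lines (String_reader)` serves the one where the data is a compile-time constant.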

Interestingly though, in Go it should be easier because they have standardized on the Reader and Writer interfaces for pretty much everything.

I was actually going to suggest just using a variant, e.g.

type input = Channel of in_channel | String of string

let my_func = function
  | Channel inc -> ...
  | String s -> ...
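
To make the variant idea concrete, here is a small sketch under the assumption that the function just needs all the lines of its input. `read_lines` is a hypothetical name chosen for illustration:

```ocaml
type input = Channel of in_channel | String of string

(* Read every line from either kind of source. *)
let read_lines : input -> string list = function
  | Channel ic ->
    let rec loop acc =
      match input_line ic with
      | line -> loop (line :: acc)
      | exception End_of_file -> List.rev acc
    in
    loop []
  | String s ->
    (* Ignore the empty piece after a final newline, as input_line would. *)
    (match List.rev (String.split_on_char '\n' s) with
     | "" :: rest -> List.rev rest
     | pieces -> List.rev pieces)
```

A caller with constants known at compile time can then write `read_lines (String "1 2\n3 4\n")`, while other callers keep passing `Channel ic`.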

One nice thing about OCaml is that most of the standard library is written as a set of modules that (a) could have been written by normal users, and (b) are meant to be used by them, not merely as an aggregate standard library, but also in bits and pieces.

In Golang, that isn’t the case (or it wasn’t 4-5 years ago; gosh, I doubt it’s changed). Instead, for instance, the entire inside of the networking stack (which means sockets, TCP/IP, HTTP, GRPC) is all designed to be used only from its public interface.

  1. So for instance, GRPC provides a way to receive/process RPCs from a ServerSocket (socket/bind/listen/accept) but not a way for a user to provide their own accept-loop (and hence their own thread-pool, etc).
  2. Another example: the socketpair() function is exposed (so is fork/exec), but once you have a socketpair, there’s no way to start up GRPC comms on that pair. Nor is there a way to convert an integer into a socket and then onward into a GRPC comms endpoint. So a standard way of doing IPC from a “mamma process” to its “child” (make a socketpair, then fork/exec in the child; child uses the integer fd to establish a server-endpoint; mamma sends RPCs to child, etc) is unavailable.

[It’s been years and years, but] At least with Thrift, there was a way of doing this. But the Golang philosophy for building its libraries is “use it the way we tell you, peon”.

Ah, well.

As you discovered, you can’t.

A long time ago a proposal was made to make Java-like composable IO classes, but somehow it never caught on in the larger ecosystem (the fact that it used objects certainly has something to do with that).

More recently your problem would likely be solved by this “modular” IO RFC.

Personally I’m not super fond of it. With effects around the corner, I would prefer that the Stdlib simply provide a more general notion of bytes IO that revolves around functions.

Something like:

type buffer = { bytes : bytes; start : int; len : int }
type input = unit -> buffer option
type output = buffer option -> unit

With None consistently meaning end of input. The sub bytes specified by buffer values returned by input are owned by the receiver until the next call to input. The sub bytes specified by buffer given to output are owned by output until it returns.
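
As a rough illustration of how these types could be used (a sketch only; `input_of_string`, `input_of_channel`, `read_all`, and the default chunk size are assumptions made for the example, not part of any proposal):

```ocaml
type buffer = { bytes : bytes; start : int; len : int }
type input = unit -> buffer option

(* An input over a string: yields the whole content once, then None. *)
let input_of_string (s : string) : input =
  let pending = ref (String.length s > 0) in
  fun () ->
    if !pending then begin
      pending := false;
      Some { bytes = Bytes.of_string s; start = 0; len = String.length s }
    end else None

(* An input over a channel: refills one internal chunk on each call, so the
   returned buffer is only valid until the next call. *)
let input_of_channel ?(chunk_size = 4096) (ic : in_channel) : input =
  let chunk = Bytes.create chunk_size in
  fun () ->
    match Stdlib.input ic chunk 0 chunk_size with
    | 0 -> None
    | n -> Some { bytes = chunk; start = 0; len = n }

(* A consumer written once against [input]. *)
let read_all (i : input) : string =
  let b = Buffer.create 256 in
  let rec loop () =
    match i () with
    | None -> Buffer.contents b
    | Some { bytes; start; len } ->
      Buffer.add_subbytes b bytes start len;
      loop ()
  in
  loop ()
```

The channel-backed version reuses a single chunk, which is what the ownership rule above permits: the receiver may use the sub bytes only until it calls the input again.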

In case you need a workaround, perhaps you could look to the approach of using a socket described in the Stack Overflow question “In OCaml, how can I create an out_channel which writes to a string/buffer instead of a file on disk”?

(The asker there mentions Scanf.Scanning.from_string, but afaict, that does not actually create an in_channel of the desired sort, but instead a Scanf.Scanning-specific channel.)

I think this is a fine design. A few years ago, I was working with a JSON parser/prettyprinter for C++ written using STL. And it didn’t have support for I/O to/from an abstract object (I had a Thrift stream I wanted to read/write from); it only supported strings and C++ iostreams. But doing the necessary “lifting” to slide in a little abstraction that I could instantiate with a struct of my own to wrap the Thrift stream was … maybe a half hour’s work. I think that that kind of division of labor is a fine thing, partially because it meant that I provided only the functions that this JSON code needed, not the entirety of what users of iostreams might use in general.

And that’s been the case several times in OCaml code, too. It’s just not a hard enough or important enough problem to merit restructuring the entire standard library.

I’m sorry, I have to disagree. It’s a big flaw of the stdlib, and the
reason I opened that RFC is because extensible channels would be
incredibly useful everywhere. I’m not going to repeat all the use cases,
but there are good reasons why every modern language has generic
Reader/Writer interfaces, with Go and Rust being only two examples.

Look at Printf and tell me it wouldn’t be better to keep only one of the
three versions of each function. Look at recent issues on camlzip. I’ve
had to redo some sort of ad-hoc IO wrapper so many times it’s getting
tiresome.

The only problem is that I’ve run out of steam on implementing the RFC
because I have little free time currently, and even less motivation to
finish updating all the IO functions in the stdlib.

@c-cube, could you make a fork just for this PR? You could then base the PR on a branch, and allow other people to contribute with PRs to the fork so it’s not all on you. You could advertise the fork right here on discuss.

Shameless plug: you may try the redirect package (available on opam), which exposes (among some others) the following function:

val with_channel_from_string : string -> (in_channel -> 'a) -> 'a
(** [with_channel_from_string s f] evaluates [f chan] where [chan] is a channel
    from which [s] can be read. *)

(the implementation uses pipes, as suggested by the second answer in the link that @shonfeder posted)

Thank you, Thierry. The redirect package looks very useful, for more than just this one thing.

Using pipes let me build what I wanted:

let in_channel_of_string string =
  let (in_file_descr, out_file_descr) = Unix.pipe () in
  let in_channel = Unix.in_channel_of_descr in_file_descr in
  let out_channel = Unix.out_channel_of_descr out_file_descr in
  begin
    output_string out_channel string;
    flush out_channel;
    in_channel;
  end

I think this will not work if the string is larger than the pipe’s
buffer, because output_string will block. In addition, reading from
in_channel will not terminate because the pipe is not closed.

A fix can be to run the output_string in a new thread. (It’s what I did in redirect.)
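
For comparison, a different way to avoid both the blocking write and the unclosed pipe without threads is to go through a temporary file, using only the Stdlib. This is a sketch of that alternative, not how redirect does it; `with_in_channel_of_string` is a hypothetical name, and `Fun.protect` assumes OCaml 4.08 or later:

```ocaml
(* Hypothetical helper: runs [f] on a channel from which [s] can be read,
   backed by a temporary file that is removed afterwards. *)
let with_in_channel_of_string (s : string) (f : in_channel -> 'a) : 'a =
  let path, oc =
    Filename.open_temp_file ~mode:[Open_binary] "in_channel_of_string" ".tmp"
  in
  Fun.protect
    ~finally:(fun () -> try Sys.remove path with Sys_error _ -> ())
    (fun () ->
       (* Write the whole string and close, so the reader sees end of file. *)
       (try
          output_string oc s;
          close_out oc
        with e ->
          close_out_noerr oc;
          raise e);
       let ic = open_in_bin path in
       Fun.protect
         ~finally:(fun () -> close_in_noerr ic)
         (fun () -> f ic))
```

The string size is then limited by disk space rather than by the pipe buffer, at the cost of touching the file system.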

The small-string limitation is not a problem for me because I know the string length and it is small, but I do need to mention it in the documentation. And thank you for pointing out the bug: I need to close out_channel, not just flush it.

let in_channel_of_string string =
  let (in_file_descr, out_file_descr) = Unix.pipe () in
  let in_channel = Unix.in_channel_of_descr in_file_descr in
  let out_channel = Unix.out_channel_of_descr out_file_descr in
  begin
    output_string out_channel string;
    close_out out_channel;
    in_channel;
  end

Sorry to bump that, but you relinked to that from somewhere else :). The input type is bad because it would allocate one byte for each byte in the input (a bit more even).

If you want function based compositional IOs, consider this, which works with existing read/write primitives:

type input = bytes -> int -> int -> int
type output = bytes -> int -> int -> unit
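
A short sketch of how such an `input` could be backed by either a channel or a string. It assumes, following the convention of `Stdlib.input`, that a result of 0 means end of input; the helper names are made up for the example:

```ocaml
(* buf -> offset -> max length -> number of bytes read (0 assumed to mean end) *)
type input = bytes -> int -> int -> int

(* The existing primitive already has the right shape. *)
let input_of_channel (ic : in_channel) : input = Stdlib.input ic

(* A string-backed input that copies into the caller's buffer. *)
let input_of_string (s : string) : input =
  let pos = ref 0 in
  fun buf off len ->
    let n = min len (String.length s - !pos) in
    Bytes.blit_string s !pos buf off n;
    pos := !pos + n;
    n

(* A consumer that owns its buffer and drains any input. *)
let read_all (i : input) : string =
  let chunk = Bytes.create 4096 in
  let out = Buffer.create 256 in
  let rec loop () =
    match i chunk 0 (Bytes.length chunk) with
    | 0 -> Buffer.contents out
    | n -> Buffer.add_subbytes out chunk 0 n; loop ()
  in
  loop ()
```

Here the consumer allocates one buffer up front and the producer never allocates per call.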

For each buffer in the input :–)

Your proposal is a different perspective: the buffer is owned by the consumer and you assume it knows how much data it wants.

Oh, you mean it’d re-emit the same bytes later. That’s interesting, thank you, I’ll ponder about that. Have you thought of having input be idempotent and couple this with consume: input -> int -> unit? This solves the problem of asking for a lot of data upfront but just consuming what you need (e.g. to read a line of input, then read n bytes, then a line, etc. as typically happens in HTTP and other protocols).
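
One possible reading of that idea, sketched with a record so that the peeking function and `consume` share state. The record layout, `peek`, `input_of_string`, and `read_line` are all assumptions made for illustration, and `read_line` only handles the case where the whole line is already in the current buffer:

```ocaml
type buffer = { bytes : bytes; start : int; len : int }

(* Hypothetical pairing of an idempotent [peek] with an explicit [consume]. *)
type input = {
  peek : unit -> buffer option;  (* same answer until [consume] is called *)
  consume : int -> unit;         (* drop the first n bytes of the last peek *)
}

let input_of_string (s : string) : input =
  let data = Bytes.of_string s in
  let pos = ref 0 in
  let peek () =
    let len = Bytes.length data - !pos in
    if len = 0 then None else Some { bytes = data; start = !pos; len }
  in
  let consume n =
    if n < 0 || n > Bytes.length data - !pos then invalid_arg "consume";
    pos := !pos + n
  in
  { peek; consume }

(* Read one line, consuming only that line and its newline. *)
let read_line (i : input) : string option =
  match i.peek () with
  | None -> None
  | Some { bytes; start; len } ->
    let stop =
      match Bytes.index_from_opt bytes start '\n' with
      | Some j when j < start + len -> j
      | _ -> start + len
    in
    let line = Bytes.sub_string bytes start (stop - start) in
    i.consume (min len (stop - start + 1));
    Some line
```

After `read_line`, the rest of the buffer is still available to the next reader, which is the "read a line, then read n bytes" pattern mentioned above.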