On the design of iostream

The design of an iostream type that offers an imperative programming interface for I/O isn’t going to be able to hide its side effects, and the side effects may be different depending on the implementation of the underlying stream.

One example of this is handling non-blocking I/O operations. If the underlying stream is an In_channel that has been opened non-blocking, then a read may raise Sys_blocked_io. If the underlying stream is a Unix.file_descr that has been opened non-blocking, then a read may raise Unix.Error (Unix.EAGAIN, _) or Unix.Error (Unix.EWOULDBLOCK, _). And if the underlying stream is a bytes, then it’s an open question what happens when multiple domains are contending for mutually exclusive access to the mutable octets. Do they block on a Mutex.t even though the iostream is meant to represent a non-blocking stream? Does it try to acquire the Mutex.t and raise some exception if it can’t do so without blocking? Which exception would that be?
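A minimal sketch of the exception problem, assuming a hypothetical record-of-closures input type (not iostream's actual definition). The `blocked_after` backend is also hypothetical: it simulates a non-blocking In_channel by raising the standard `Sys_blocked_io`. The abstract type hides which backend is underneath, but not which exception that backend uses for "would block".

```ocaml
(* Hypothetical abstract input stream: only a read function is exposed. *)
type input = { read : bytes -> int -> int -> int }

(* Backend 1: an in-memory string; reads never block. *)
let of_string (s : string) : input =
  let pos = ref 0 in
  { read = (fun buf off len ->
      let n = min len (String.length s - !pos) in
      Bytes.blit_string s !pos buf off n;
      pos := !pos + n;
      n) }

(* Backend 2 (simulated): like a non-blocking channel with no data
   available after [k] octets, so further reads raise [Sys_blocked_io]. *)
let blocked_after (k : int) (inner : input) : input =
  let seen = ref 0 in
  { read = (fun buf off len ->
      if !seen >= k then raise Sys_blocked_io
      else begin
        let n = inner.read buf off (min len (k - !seen)) in
        seen := !seen + n;
        n
      end) }

(* A caller that wants to treat "would block" uniformly must know each
   backend's exception. A Unix-backed stream would raise
   Unix.Error (EAGAIN, _, _) instead, which this function does not catch. *)
let try_read (i : input) (buf : bytes) : int option =
  match i.read buf 0 (Bytes.length buf) with
  | n -> Some n
  | exception Sys_blocked_io -> None
```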

Other side effects can be important as well. Consider streams that read from a pipe or a socket rather than a file at rest. The operation of receiving from the pipe consumes octets from buffers held by the OS, which may have the side effect of signaling to the correspondent that it may resume sending. The timing of such signals may have important downstream consequences in a system of concurrently communicating processes.

My take is that an imperative programming interface to I/O is required at the lowest levels of an application stack, precisely at the points where the actions and events represented by operations on an I/O stream often should not be hidden behind an abstract type. And we already have a good standard functional programming interface for dealing with streams of data: Stdlib.Seq.

The trick that I think a good I/O library needs to perform is to enable programming in the boundary between 'a Seq.t and Unix.file_descr (and its nieces In_channel.t and Out_channel.t).
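One way to picture that boundary is an adapter from any "read into a buffer" function to a `string Seq.t`. This is a sketch under stated assumptions: `seq_of_reader`, `seq_of_in_channel` and `reader_of_string` are hypothetical names; only `Stdlib.input` and the `Seq` type are real. Note that the resulting sequence is ephemeral, since forcing it consumes octets from the source.

```ocaml
(* Turn a read function into a sequence of chunks. The sequence is
   ephemeral: each step consumes octets from the underlying source,
   so it should be traversed only once. *)
let seq_of_reader ?(chunk_size = 4096) (read : bytes -> int -> int -> int)
    : string Seq.t =
  let buf = Bytes.create chunk_size in
  let rec next () =
    let n = read buf 0 chunk_size in
    if n = 0 then Seq.Nil (* 0 means end of input *)
    else Seq.Cons (Bytes.sub_string buf 0 n, next)
  in
  next

(* Stdlib.input has exactly the required shape for an in_channel. *)
let seq_of_in_channel ?chunk_size (ic : in_channel) : string Seq.t =
  seq_of_reader ?chunk_size (input ic)

(* An in-memory reader, handy for testing the adapter. *)
let reader_of_string (s : string) : bytes -> int -> int -> int =
  let pos = ref 0 in
  fun buf off len ->
    let n = min len (String.length s - !pos) in
    Bytes.blit_string s !pos buf off n;
    pos := !pos + n;
    n
```

Each chunk is a freshly allocated string (`Bytes.sub_string`), which is the price of the functional interface.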

And the idea is that it’s none of your business and you shouldn’t need to ask.

And so your system has nothing to do with Haskell type classes, Rust traits or Golang interfaces, so don’t try to sell it as if it were similar. When you use a trait in Rust on a socket value, you still know that you have a socket. That’s not the case with your design.

The system I advocate is not something that I invented myself; it’s the design at the heart of the modular implicits proposal, but instead of the methods being instantiated implicitly (which would be the compiler’s job, as in Haskell, Rust or Golang), you have to pass the methods module explicitly at the call site.

What they did in those languages, if I had to translate it into OCaml (and that’s what is done in the modular implicits proposal), is that they don’t pack their value with its methods into an object, as in this API:

val foo : packed ('a meth * 'a) -> bar

but they unpack the object and curry the signature, as in this API:

val foo : 'a meth -> 'a -> bar
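Both shapes can be written in today's OCaml. In this sketch `meth`, `packed`, `greet_packed` and `greet_unpacked` are hypothetical names; the packed form uses a GADT existential. With the packed form nothing of type `'a` can escape the match, because `'a` is hidden; with the curried form the caller keeps the concrete type.

```ocaml
(* Hypothetical dictionary of methods for a type ['a]. *)
type 'a meth = { write : 'a -> string -> unit }

(* Packed: value and methods hidden together behind an existential. *)
type packed = Pack : 'a meth * 'a -> packed

(* The result cannot mention ['a]: it is existential inside [Pack]. *)
let greet_packed (p : packed) : unit =
  match p with Pack (m, x) -> m.write x "hello"

(* Curried: the dictionary is a separate argument, ['a] stays visible. *)
let greet_unpacked (m : 'a meth) (x : 'a) : 'a =
  m.write x "hello";
  x

let buffer_meth : Buffer.t meth = { write = Buffer.add_string }
```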

In such a way that you can use a socket as a stream without forgetting that it’s a socket. The only thing those languages add is that the methods module parameter is instantiated by the compiler. But in the end you really get a similar system, which is clearly not the case with your design.

This is just wrong 🙂

Iostream is the exact equivalent of Go’s io interfaces (Reader, Writer). If you take such an interface you have to get outside the type system, via reflection, to get the concrete thing underneath.

It’s also exactly like Rust’s traits when they’re used via trait objects. A &mut dyn Read is just that, and you know nothing of the concrete type underneath.

Passing a type alongside a set of functions can also be useful… But then you have to carry this type parameter everywhere you use the type. Sometimes an existential is useful too.


I want to expand on this point, which makes a lot of sense to me. A seq (of strings, presumably) is a functional, high level interface to represent IO streams, indeed.

Iostreams here are lower level and, indeed, more imperative. The usefulness of going low level is efficiency: you often will only do copies, not allocations, with Iostreams. Even if you transfer GBs of data you’re not going to allocate GBs. Whether it’s worth it for you or not is application dependent.

It’s a bit similar to how we have printf and Buffer.t and not just 'a -> char list. Printing efficiently is also somewhat imperative in places.
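To illustrate the "copies, not allocations" point, here is a sketch assuming minimal hypothetical `input`/`output` record types (not iostream's real definitions). The copy loop reuses one buffer, so it allocates no intermediate strings however much data flows through it.

```ocaml
(* Hypothetical minimal stream types. *)
type input = { read : bytes -> int -> int -> int }
type output = { write : bytes -> int -> int -> unit }

(* Copy everything from [ic] to [oc] through a single reusable buffer.
   The loop itself allocates no intermediate strings per chunk. *)
let copy ?(buf = Bytes.create 65536) (ic : input) (oc : output) : unit =
  let len = Bytes.length buf in
  let rec loop () =
    let n = ic.read buf 0 len in
    if n > 0 then begin
      oc.write buf 0 n;
      loop ()
    end
  in
  loop ()

let input_of_string (s : string) : input =
  let pos = ref 0 in
  { read = (fun buf off len ->
      let n = min len (String.length s - !pos) in
      Bytes.blit_string s !pos buf off n;
      pos := !pos + n;
      n) }

let output_of_buffer (b : Buffer.t) : output =
  { write = (fun buf off len -> Buffer.add_subbytes b buf off len) }
```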

Some points I’d comment on:

the side effects may be different depending on the implementation of the underlying stream.

That’s true, and the design is oriented towards exceptions (like Unix) and either blocking IOs or eio-style cooperative concurrency. With effects you never need to mention EAGAIN: you can just block the fiber, so it will never leak.

With OCaml 5, blocking (the current thread or fiber) is the portable, universal synchronization primitive for IOs that you were mentioning :). It’s also how backpressure will manifest (if you try to write to a stream too fast, it can just block you).

And if the underlying stream is a bytes, then it’s an open question what happens when multiple domains are contending for mutually exclusive access to the mutable octets

Yeah, that’s just a bug. The same thing is true with a seq if multiple domains try to access it, if it’s backed by anything impure. Generally speaking we don’t have (yet) the type system to prevent such wrong uses, so locking is left to the user. Note that the compositionality of iostream means you can take a stream and wrap it into a mutex-protected stream quite easily (like std::cout famously is in C++).
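A sketch of that wrapping, assuming OCaml 5 (where `Mutex` and `Domain` are part of the standard library) and a hypothetical `output` record type; `synchronized` and `with_lock` are illustrative names, not iostream's API. The wrapper has the same type as the stream it protects, which is what makes it compose.

```ocaml
(* Hypothetical output stream type. *)
type output = { write : bytes -> int -> int -> unit; flush : unit -> unit }

(* Run [f] while holding [m], releasing it even if [f] raises. *)
let with_lock (m : Mutex.t) (f : unit -> 'a) : 'a =
  Mutex.lock m;
  Fun.protect ~finally:(fun () -> Mutex.unlock m) f

(* Wrap a stream so every operation runs under one mutex. *)
let synchronized (o : output) : output =
  let m = Mutex.create () in
  { write = (fun buf off len -> with_lock m (fun () -> o.write buf off len));
    flush = (fun () -> with_lock m o.flush) }
```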


I’m not really sure that’s the case. When you define a function that takes a Reader/Writer interface as input, you certainly don’t know the argument’s type, and you have to use reflection if you want to know it. But when you call this function on your value you don’t lose its typing information (if it was a socket, you still know it’s a socket).

Be assured that in OCaml, if you turn a Buffer.t into an output stream so you can pass it to a function, the buffer is still there, unchanged, with the same type 🙂

Sure, it is still there, unchanged, but you have to keep two versions of your buffer: one as an output stream and one as a buffer (or you have to play with the getter-setter pattern of OOP).

Basically, when you lift your buffer into an output stream you’re upcasting it. I prefer this upcasting to be done locally (only in the scope of a function call) rather than globally. I prefer to write something like this:

let b = Buffer.create 1024 in
foo (b :> out); (* here I use b as an output stream *)
bar (b :> out); (* idem *)
baz b (* here I use b as a buffer, with a method that an output stream doesn't have *)

Basically, the generic function make : 'a meth -> 'a -> t is equivalent to the upcast b :> out. And if you have a function with this signature: foo : 'a meth -> 'a -> foo, you are doing the upcast only inside of foo, and you can still use your buffer as a buffer outside of foo. But if your function has this signature: foo : t -> foo, you have to upcast your buffer once and for all in order to use foo, and you have to keep a copy of your buffer if you just want to use it as such.
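A sketch of this equivalence, assuming a hypothetical record of closures `out` standing in for the object type and a hypothetical `meth` dictionary; `make`, `foo` and `foo'` are illustrative. Each call to `make` allocates a small record and closure, and the buffer keeps its `Buffer.t` type throughout.

```ocaml
(* Hypothetical dictionary of output methods for a type ['a]. *)
type 'a meth = { add : 'a -> string -> unit }

(* Hypothetical "object-like" output stream: the concrete type is hidden. *)
type out = { write : string -> unit }

(* [make] plays the role of the upcast [(b :> out)]. *)
let make (m : 'a meth) (x : 'a) : out = { write = m.add x }

let buffer_meth : Buffer.t meth = { add = Buffer.add_string }

(* A function that only needs an output stream. *)
let foo (o : out) : unit = o.write "foo;"

(* Dictionary-passing variant: the "upcast" happens inside [foo']. *)
let foo' (m : 'a meth) (x : 'a) : unit = foo (make m x)
```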

In other words:

foo (module Bytes) b

is the real equivalent of:

foo (b :> out)

which is what Golang, Haskell and Rust do with their system.
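With first-class modules, the `foo (module ...) b` form can be written today. In this sketch the module type `OUT` is an assumption, and `Buffer` stands in for `Bytes` because `OUT` needs an `add_string` function, which `Bytes` lacks. The locally abstract type `a` keeps the caller's concrete type visible.

```ocaml
(* Hypothetical "methods module" signature. *)
module type OUT = sig
  type t
  val add_string : t -> string -> unit
end

(* The dictionary is passed explicitly; the concrete type [a] is kept,
   so [foo] can hand the same value back to the caller. *)
let foo (type a) (module M : OUT with type t = a) (x : a) : a =
  M.add_string x "hello";
  x
```

Usage: `foo (module Buffer) b` returns a value still typed `Buffer.t`.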

Why hello, it’s been 4 months. You can upcast locally if you want, it works fine with iostream.

which is what Golang, Haskell and Rust do with their system.

In Go, it depends on whether the callee expects an interface, no? People carry around interfaces instead of concrete types all the time. You couldn’t even write the generic version foo (module Bytes) b until a few years ago.

In Rust, you have the choice between:

fn foo<R>(r: &mut R) where R: io::Read { ... }

and

fn foo(r: &mut dyn io::Read) { … }

and both have different tradeoffs (code bloat vs one indirection at runtime). The latter is the equivalent of what iostream offers, partly because I find the former really cumbersome to pass around, since all functions start needing to pass an additional type parameter and dictionary.

Sorry for the delay, but I was recently reading the source code of Riot’s IO and it reminded me of this discussion. I can, indeed, upcast locally, but that means I would have to allocate an object at each function call.

Yes, but I believed you wanted to do something like Go interfaces. In Go, if a function expects an interface, the caller can give it a bytes (if this type satisfies the interface in question).

That’s indeed the problem with the OCaml version: the dictionary has to be passed explicitly, whereas it is done implicitly in other languages. It’s the problem that modular implicits try to solve.

Yes, Riot does something similar (but using first-class modules, also a valid choice). It’s almost like this is a real use case that people want.

It’s the problem that modular implicits try to solve.

It’d be lovely to have that. However, in 2024, modular implicits are mostly sci-fi and I can’t write code that relies on them. This really should be solvable with the tools we have today (or that we had 15 years ago, to be honest).


For sure, it’s a use case that people want. And this solution (I mean yours) is already described in the first edition of Real World OCaml, in the chapter about first-class modules. It’s just that they don’t use an object type (as you do) but a first-class module type that is isomorphic to an object.

It’s solvable with the tools we have, even 15 years ago, since, if I remember correctly, first-class modules arrived around 2010 (OCaml 3.12). And it was also solvable from the very beginning, since modules (not the first-class version) are made for this: if you take a set of functions that take a first-class module and factor out this module, you get a plain functor, i.e. a function that takes a module and returns a set of functions (packed in a module). Algebraically, you have (A -> a -> b) * (A -> c -> d) = A -> ((a -> b) * (c -> d)); the left side is two functions with first-class modules, and the right side is a functor that returns two functions without first-class modules.
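The factorization can be checked directly in code. This sketch assumes a hypothetical `OUT` signature and illustrative functions `greet` and `shout`: first as two functions each taking the dictionary as a first-class module, then as one functor `Make` returning both functions.

```ocaml
module type OUT = sig
  type t
  val add_string : t -> string -> unit
end

(* Left side: (A -> a -> b) * (A -> c -> d), two functions that each
   take the dictionary as a first-class module. *)
let greet (type a) (module M : OUT with type t = a) (x : a) : unit =
  M.add_string x "hello"

let shout (type a) (module M : OUT with type t = a) (x : a) : unit =
  M.add_string x "HEY"

(* Right side: A -> ((a -> b) * (c -> d)), the dictionary factored out
   into a functor that returns both functions. *)
module Make (M : OUT) = struct
  let greet (x : M.t) : unit = M.add_string x "hello"
  let shout (x : M.t) : unit = M.add_string x "HEY"
end

(* Applying the functor once; the results work on Buffer.t directly. *)
module Buffer_ops = Make (Buffer)
```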

The problem I have with your solution is its cost/benefit ratio: I lose really valuable typing information (very high value) just to avoid explicitly passing a dictionary (a weak cost).

The only benefit I see in an object-like value is when you want to put distinct kinds of output streams in a container (say a list). To do so you have to upcast each value to a common supertype, and this object type can be used for that (that’s the reason they use this solution in Real World OCaml).
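A sketch of that container use case in the first-class-module style described above, where a value is packed with its methods. `OUT`, `OUT_INSTANCE`, `pack`, `write_all` and the `Counter` sink are all hypothetical names chosen for illustration.

```ocaml
module type OUT = sig
  type t
  val add_string : t -> string -> unit
end

(* A value packed together with its methods; isomorphic to an object. *)
module type OUT_INSTANCE = sig
  module M : OUT
  val this : M.t
end

(* The "upcast": hide the concrete type behind OUT_INSTANCE. *)
let pack (type a) (module M : OUT with type t = a) (x : a)
    : (module OUT_INSTANCE) =
  (module struct
    module M = M
    let this = x
  end)

(* Distinct kinds of sinks can now live in the same list. *)
let write_all (outs : (module OUT_INSTANCE) list) (s : string) : unit =
  List.iter (fun (module I : OUT_INSTANCE) -> I.M.add_string I.this s) outs

(* A second, hypothetical sink that only counts octets. *)
module Counter = struct
  type t = int ref
  let add_string (c : t) (s : string) : unit = c := !c + String.length s
end
```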

I’m really weary of this thread. Your use case is fine, mine is fine too, and nobody is using iostream anyway so you’re not at risk of having it imposed on you by ecosystem pressures.

For me, hiding the types is a feature: I don’t want to thread additional type parameters everywhere, and I want to pick the implementation of these things at runtime based on CLI flags or other things anyway. I just want an “input channel”, blissfully unaware of what’s underneath, because I value this kind of modularity. It’d be nice to have an easy design that allows for both choices without code duplication (like Rust does with trait objects/bounded polymorphism), but in OCaml there’s not really as easy a solution.
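Picking a backend at runtime is where a hidden-type stream is convenient. A sketch, assuming a hypothetical `out` record and hypothetical flag values; in a real program the flag might come from `Sys.argv`. Downstream code such as `report` only ever sees `out`.

```ocaml
(* Hypothetical output stream with the concrete type hidden. *)
type out = { write : string -> unit; flush : unit -> unit }

let of_buffer (b : Buffer.t) : out =
  { write = Buffer.add_string b; flush = ignore }

let of_out_channel (oc : out_channel) : out =
  { write = output_string oc; flush = (fun () -> flush oc) }

(* Choose the implementation at runtime; flag names are illustrative. *)
let output_of_flag (flag : string) : out =
  match flag with
  | "stdout" -> of_out_channel stdout
  | "stderr" -> of_out_channel stderr
  | "null" -> { write = ignore; flush = ignore }
  | other -> invalid_arg ("unknown output: " ^ other)

(* Downstream code is unaware of what is underneath. *)
let report (o : out) (msg : string) : unit =
  o.write msg;
  o.write "\n";
  o.flush ()
```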

Cheers!


Sorry if I upset you. I hadn’t realized that hiding type information was a feature.

It is, with plain functors. But given the use case you described, you will instantiate it with only one module (the one for the common supertype: the object type, the Golang interface type), hence you don’t need the functorized version.