Eio 0.1 - effects-based direct-style IO for OCaml 5

I’m still not quite sure what you mean about multiple uses of run. Running two event loops in a single domain will never work (Lwt will raise an exception if you try, and Eio probably should too).

To run a new event loop in a new domain, you can spawn one using env. That ensures that the new loop will be of the same type, and so the resources from one will work in the other. e.g.

let () =
  Eio_main.run @@ fun env ->
  Eio.Domain_manager.run env#domain_mgr (fun () ->
      Eio.Flow.copy_string "Hello, from a new domain!\n" env#stdout
    )

We need to check a bit to make sure everything is thread-safe. I’m a bit nervous about sharing FDs across domains because if one domain closes an FD while another is using it, there is a possible race where the second domain might access an unrelated FD that just got opened with the same number. However, given that the stdlib already has a much worse version of this problem even within a single domain, we can probably live with it for now!

[ Example of implementing directory sandboxing using nested effect handlers ]

Interesting. I suspect this won’t quite work if you try to pass a directory from one fibre to another, though, since only the original fibre will have the appropriate handler in its stack.

That’s not to say that the capability approach is a bad design though. On the contrary, I think that it might combine very nicely with some of the work we’ve been doing at Jane Street on local allocations, allowing us to have safe effect handlers without/before having a full-blown effect system in the language. I’m very interested in exploring this direction.

I remember hearing about some local allocations stuff in the Signals and Threads podcast. That would be very useful! We have several APIs where we pass a slice of a buffer to a callback, and ask users not to continue using it after the callback returns. Would be great to ensure that statically.
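To make the hazard concrete, here is a minimal sketch using only the stdlib. `iter_chunks` is a made-up function for illustration (not an Eio API): it reuses one internal buffer for every callback, so a callback that keeps the buffer sees it change underneath it.

```ocaml
(* Hypothetical API: calls [fn buf len] for each chunk of [data],
   reusing a single internal buffer for every call. *)
let iter_chunks ~chunk_size data fn =
  let buf = Bytes.create chunk_size in
  let rec loop off =
    if off < String.length data then begin
      let n = min chunk_size (String.length data - off) in
      Bytes.blit_string data off buf 0 n;
      fn buf n;          (* [buf] is only valid until [fn] returns *)
      loop (off + n)
    end
  in
  loop 0

let () =
  let leaked = ref Bytes.empty in
  let copied = ref "" in
  iter_chunks ~chunk_size:4 "abcdwxyz" (fun buf n ->
      if !copied = "" then begin
        leaked := buf;                        (* bug: keeps the shared buffer *)
        copied := Bytes.sub_string buf 0 n    (* fine: takes a private copy *)
      end);
  (* The leaked buffer now holds the second chunk, not the first. *)
  Printf.printf "leaked=%s copied=%s\n" (Bytes.to_string !leaked) !copied
```

Nothing in the types stops the `leaked := buf` line today, which is the kind of mistake a static check could rule out.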

It still requires users to understand class syntax to read the API…

odoc hides the class definition by default, which is useful here. Possibly it needs a comment saying “You don’t need to read this unless you’re implementing the API yourself” or something?

e.g. in odoc you see this:

class virtual source : object ... end

val read : #source -> Cstruct.t -> int

So the only things you really need to know are:

  1. class foo = ... can be treated as type foo, and
  2. #source can be treated as just source.

However, we do make use of row-polymorphism. For example, in addition to source and sink, we have two_way, which includes both APIs. You have to know that the # allows you to call read on a socket, even though it looks at first like a separate type.
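As a rough illustration of what the `#` buys you, here is a self-contained sketch. The class types and the `loopback` helper are simplified stand-ins invented for this example, not Eio’s real definitions:

```ocaml
(* Simplified, hypothetical flow types. *)
class type source = object
  method read : bytes -> int   (* fill the buffer, return the number of bytes read *)
end

class type sink = object
  method write : string -> unit
end

(* A two-way flow offers both sets of methods. *)
class type two_way = object
  inherit source
  inherit sink
end

(* [#source] means "any object with at least the methods of [source]". *)
let read (src : #source) buf = src#read buf
let write (dst : #sink) data = dst#write data

(* An in-memory loopback standing in for a socket:
   reads return whatever was written earlier. *)
let loopback () : two_way =
  let pending = Buffer.create 16 in
  object
    method write data = Buffer.add_string pending data
    method read buf =
      let n = min (Bytes.length buf) (Buffer.length pending) in
      Buffer.blit pending 0 buf 0 n;
      let rest = Buffer.sub pending n (Buffer.length pending - n) in
      Buffer.clear pending;
      Buffer.add_string pending rest;
      n
  end

let () =
  let sock = loopback () in
  write sock "ping";               (* accepted: a two_way has [write] *)
  let buf = Bytes.create 16 in
  let n = read sock buf in         (* accepted: a two_way also has [read] *)
  print_endline (Bytes.sub_string buf 0 n)
```

Both `read` and `write` accept `sock` without any cast, because `two_way` has at least the methods each function asks for.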

I would note though that Python has objects, methods and classes and is regularly recommended as a first language for children. I’m not saying this is a good thing (objects shouldn’t be first choice if something else will do), but it’s not an advanced topic.

and to decode the more difficult type errors from object types.

Using functions to access objects seems to avoid that problem. Here’s an example, where we try to write to stdin instead of stdout:

let () =
  Eio_main.run @@ fun env ->
  let dst = Eio.Stdenv.stdin env in
  Eio.Flow.copy_string "Hello!\n" dst
                                  ^^^
Error: This expression has type Eio.Flow.source
       but an expression was expected of type #Eio.Flow.sink
       The first object type has no method copy

Taking a method call on every operation also seems like an unnecessary cost.

I did some benchmarking, comparing various schemes, in talex5/flow-tests on GitHub. In particular, this compared Conduit 3 (using first-class modules and GADTs) with objects. The conclusion was that for accessing OS resources the speed of a method call hardly matters. In fact, Conduit 3 was slower, but for other reasons, I think. Here’s a simple benchmark with my own GADT version:

module Object = struct

  class type source =
    object
      method read : bytes -> int -> int -> unit
      method close : unit
    end

  let of_channel ch =
    object (_ : source)
      method read buf off len = really_input ch buf off len
      method close = close_in ch
    end

  let read (source : #source) = source#read
end

module Gadt = struct
  module type SOURCE = sig
    type t

    val read : t -> bytes -> int -> int -> unit
    val close : t -> unit
  end

  type source = Source : (module SOURCE with type t = 'a) * 'a -> source

  module Channel_source = struct
    type t = in_channel

    let read = really_input
    let close = close_in
  end

  let of_channel ch =
    Source ((module Channel_source), ch)

  let read (Source ((module Source), source)) buf off len =
    Source.read source buf off len 
end

let time_object ch =
  let source = Object.of_channel ch in
  let buf = Bytes.create 4096 in
  let t0 = Unix.gettimeofday () in
  for _i = 1 to 1_000_000 do
    Object.read source buf 0 4096
  done;
  let t1 = Unix.gettimeofday () in
  Printf.printf "Time with object: %.3f\n" (t1 -. t0)

let time_gadt ch =
  let source = Gadt.of_channel ch in
  let buf = Bytes.create 4096 in
  let t0 = Unix.gettimeofday () in
  for _i = 1 to 1_000_000 do
    Gadt.read source buf 0 4096
  done;
  let t1 = Unix.gettimeofday () in
  Printf.printf "Time with module: %.3f\n" (t1 -. t0)

let () =
  let zero = open_in "/dev/zero" in
  time_gadt zero;
  time_object zero

$ dune exec -- ./test.exe
Time with module: 0.243
Time with object: 0.243

This is for reading from /dev/zero, which is very fast for the kernel. If you’re reading from a file or socket then obviously the kernel will be doing more work and the speed benefit (if any) will be smaller.

However, we also want to use sub-types, which is much easier with objects. As well as source, sink and two_way, some flows can be closed while others can’t. File sources should also allow pread, and sockets should allow send_msg. When using io_uring, you should be able to get the file descriptor, etc.
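Here is a small sketch of how that kind of layering could look with objects. The class types, the `pread` signature and the string-backed `file_of_string` are assumptions made up for this example; Eio’s actual hierarchy is different:

```ocaml
(* Hypothetical class types, for illustration only. *)
class type source = object
  method read : bytes -> int
end

class type closable = object
  method close : unit
end

(* A file source is a source that can also be closed and read at an offset. *)
class type file_source = object
  inherit source
  inherit closable
  method pread : file_offset:int -> bytes -> int
end

let read (s : #source) buf = s#read buf
let close (c : #closable) = c#close
let pread (f : #file_source) ~file_offset buf = f#pread ~file_offset buf

(* A string-backed stand-in for a real file. *)
let file_of_string data : file_source =
  let pos = ref 0 and closed = ref false in
  (* Copy up to [Bytes.length buf] bytes starting at [off]; returns 0 at end of data. *)
  let read_at off buf =
    if !closed then invalid_arg "file is closed";
    let n = max 0 (min (Bytes.length buf) (String.length data - off)) in
    if n > 0 then Bytes.blit_string data off buf 0 n;
    n
  in
  object
    method pread ~file_offset buf = read_at file_offset buf
    method read buf =
      let n = read_at !pos buf in
      pos := !pos + n;
      n
    method close = closed := true
  end

let () =
  let f = file_of_string "hello world" in
  let buf = Bytes.create 5 in
  let n = read f buf in                    (* usable as a plain source *)
  print_endline (Bytes.sub_string buf 0 n);
  let n = pread f ~file_offset:6 buf in    (* and as a file source *)
  print_endline (Bytes.sub_string buf 0 n);
  close f
```

Passing something that is only a `source` to `pread` or `close` would be rejected at compile time, while a `file_source` can be handed to any function that just wants a `#source`. Doing the same with the GADT encoding would need a separate module type and wrapper for each combination of features.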