An amusing use of first-class modules: reading from plaintext and compressed files

I was recently trying to write a thing in Rust, and having problems, so I wrote the same thing in OCaml, just to make sure that it was doable. I thought I’d post about it, because maybe it’s an example of what we’ll find more tractable, once we have modular implicits.

The problem: I have both compressed and plaintext files, and I want to run a function over the uncompressed contents. I’d like a combinator that I can apply to the filename and the function, that will do the work of opening the file, calling the function, closing the file, etc.

This isn’t so hard.

  1. define a type of READER (and two instances for plaintext and gzipped). This is the equivalent of Rust’s “io::BufRead”.
module type READER =
  sig
    type in_channel
    val open_in : string -> in_channel
    val input_char : in_channel -> char
    val close_in : in_channel -> unit
  end
let stdreader = (module Stdlib : READER) ;;
let gzreader = (module Gzip : READER) ;;
  2. then define a type of “in channel user” (“ICUSER”) and the generic version of it
module type ICUSER = sig
  type in_channel
  val use_ic : in_channel -> unit
end
module type GENERIC_ICUSER = functor (R : READER) -> (ICUSER with type in_channel = R.in_channel)
  3. then define our function that takes a generic in_channel, and uses it – “Cat”
module Cat(R : READER) : ICUSER with type in_channel = R.in_channel = struct
  type in_channel = R.in_channel
  let use_ic ic =
    let rec rerec () =
      match R.input_char ic with
      | c -> print_char c ; rerec ()
      | exception End_of_file -> ()
    in rerec ()
end
  4. And then write our “with_input_file” function, that takes a filename, the function from #3, and applies it to either a normal in_channel, or one produced from a gzip-reader.
let with_input_file fname (module R : GENERIC_ICUSER) =
  let (module M : READER) =
    if Fpath.(fname |> v |> has_ext "gz") then
      gzreader
    else stdreader in
  let open M in
  let ic = M.open_in fname in
  let module C = R(M) in
  try let rv = C.use_ic ic in close_in ic ; rv
  with e -> close_in ic ; raise e

And now we can use it:

with_input_file "/etc/passwd" (module Cat) ;;
with_input_file "foo.gz" (module Cat) ;;

Easy-peasy. I don’t remember enough about the modular implicits proposal to know whether this can be cast in the language supported there, so I suppose I should get some version of that code (or the newer versions from others) up and running, and see if this can be made to work.
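
For anyone who wants to try the four steps without installing camlzip or fpath, here is a dependency-free sketch of the same shape. It is not the code from the post: `Upper` is a hypothetical stand-in for `Gzip` (same interface, different channel type), `Collect` is a hypothetical stand-in for `Cat` that appends to a buffer instead of printing, `Filename.check_suffix` replaces the Fpath test, and `Fun.protect` (OCaml 4.08 or later) replaces the try/with.

```ocaml
module type READER = sig
  type in_channel
  val open_in : string -> in_channel
  val input_char : in_channel -> char
  val close_in : in_channel -> unit
end

module type ICUSER = sig
  type in_channel
  val use_ic : in_channel -> unit
end

module type GENERIC_ICUSER =
  functor (R : READER) -> (ICUSER with type in_channel = R.in_channel)

(* Hypothetical stand-in for Gzip: wraps a Stdlib channel and upper-cases
   every character, so its channel type differs from Stdlib's. *)
module Upper : READER = struct
  type in_channel = { ic : Stdlib.in_channel }
  let open_in name = { ic = Stdlib.open_in name }
  let input_char c = Char.uppercase_ascii (Stdlib.input_char c.ic)
  let close_in c = Stdlib.close_in c.ic
end

let stdreader = (module Stdlib : READER)
let upreader = (module Upper : READER)

(* Hypothetical stand-in for Cat: collects characters into a buffer. *)
let collected = Buffer.create 64

module Collect (R : READER) : ICUSER with type in_channel = R.in_channel = struct
  type in_channel = R.in_channel
  let use_ic ic =
    let rec loop () =
      match R.input_char ic with
      | c -> Buffer.add_char collected c; loop ()
      | exception End_of_file -> ()
    in
    loop ()
end

let with_input_file fname (module U : GENERIC_ICUSER) =
  (* Pick the reader from the file suffix (".up" is an arbitrary choice). *)
  let (module M : READER) =
    if Filename.check_suffix fname ".up" then upreader else stdreader in
  let module C = U (M) in
  let ic = M.open_in fname in
  (* Close the channel whether use_ic returns or raises. *)
  Fun.protect ~finally:(fun () -> M.close_in ic) (fun () -> C.use_ic ic)
```

Calling `with_input_file path (module Collect)` then fills `collected` with the file contents, upper-cased when the path ends in ".up".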

Can’t we get rid of the GENERIC_ICUSER requirement and just ask for functions that take a packed module of type READER?

By that I mean the signature of with_input_file becomes string -> ((module READER) -> 'a) -> 'a

It’s a good question, and as a newbie user of first-class modules, I don’t know the typing rules well enough to answer. But I did try:

let with_input_file' fname f =
  let (module M : READER) =
    if Fpath.(fname |> v |> has_ext "gz") then
      gzreader
    else stdreader in
  let open M in
  let ic = M.open_in fname in
  f (module M : READER) ic

and got

File "ioabs.ml", line 96, characters 24-26:
96 |   f (module M : READER) ic
                             ^^
Error: This expression has type M.in_channel
       but an expression was expected of type 'a
       The type constructor M.in_channel would escape its scope

ETA: I remember in the modular implicits paper, that there was a lot of wrappering code in structs (that didn’t start off in structs). I wonder if that’s evidence that you really do have to “push up” code to the module level in order to make it work.

You don’t need modular implicits to simplify your code. Your packed module type is equivalent to:

type channel = { input_char: unit -> char; close_in: unit -> unit }
type channel_generator = string ->  channel

We could go fancy and manifest the type with an existential

type 'a channel = 
  { open_fn: string -> 'a; input_char: 'a -> char; close_in: 'a -> unit }
type chan = Any: 'a channel -> chan

but the main advantage of this is to illustrate that you never use the non-existentially quantified 'a channel. This means that, in the current version of your code, modular (explicits or) implicits are not a good fit: we are not selecting a module to provide functions for a type; we have an object (i.e. an existentially quantified record) with some hidden inner type that we never need to know.
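
To make the existential version concrete, here is a minimal sketch of how such a value could be built and consumed. The names `std_chan` and `read_all` are hypothetical (they do not appear in the thread), only the Stdlib reader is shown so the example has no dependencies, and `Fun.protect` assumes OCaml 4.08 or later.

```ocaml
type 'a channel =
  { open_fn : string -> 'a; input_char : 'a -> char; close_in : 'a -> unit }

type chan = Any : 'a channel -> chan

(* Hypothetical instance backed by the standard library. *)
let std_chan : chan =
  Any { open_fn = Stdlib.open_in;
        input_char = Stdlib.input_char;
        close_in = Stdlib.close_in }

(* Read a whole file through whichever reader is packed in [c].
   The hidden channel type never appears in the result type. *)
let read_all (c : chan) (fname : string) : string =
  match c with
  | Any r ->
    let ic = r.open_fn fname in
    let buf = Buffer.create 64 in
    let rec loop () =
      match r.input_char ic with
      | ch -> Buffer.add_char buf ch; loop ()
      | exception End_of_file -> ()
    in
    Fun.protect ~finally:(fun () -> r.close_in ic) loop;
    Buffer.contents buf
```

The consumer only ever sees `chan`, which is the point being made above: nothing outside the match needs to know what `'a` was.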

Sure, I can always convert the thing into an “object” with a functional API. We’ve been doing that forever, coding things that look weakly like objects, as records with fields that are all functions. But I wanted to retain the data-type-like feel of the code. In any case, this wasn’t a real problem, but rather something I transliterated from Rust to OCaml, to see how it would feel in OCaml.

ETA: And this was a tiny case; the general thing I was trying to express, was that it seemed like Rust’s “traits” (aka modular implicits) contributed to a certain succinctness. That was all.

It’s always fun to bust out the big guns, gotta admit :P

I think it’s kind of counter-productive to want an in_channel type at all. This is what I’ve been doing, more and more:

module type INPUT = sig
  val read_char : unit -> char
  val read : bytes -> int -> int -> int
  val close : unit -> unit
end

type input = (module INPUT)

let open_file (filename:string) : input =
  let ic = open_in filename in
  (module struct
    let read_char() = input_char ic
    let read = input ic
    let close() = close_in ic
  end)


let do_sth (module IN:INPUT) =
  IN.read_char ();
  IN.read …

This behaves like classic objects in other languages and there’s no complicated typing going on (what with each implementation having its own channel type).
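
A filled-in sketch of this pattern, to show two implementations sharing the single `input` type. The `of_string` constructor and the `count_chars` consumer are hypothetical additions for illustration, and `read` is left out to keep it short.

```ocaml
module type INPUT = sig
  val read_char : unit -> char
  val close : unit -> unit
end

type input = (module INPUT)

(* File-backed input: the channel is captured by the closure. *)
let open_file (filename : string) : input =
  let ic = open_in filename in
  (module struct
    let read_char () = input_char ic
    let close () = close_in ic
  end)

(* Hypothetical second implementation: reads from an in-memory string. *)
let of_string (s : string) : input =
  let pos = ref 0 in
  (module struct
    let read_char () =
      if !pos >= String.length s then raise End_of_file;
      let c = s.[!pos] in
      incr pos;
      c
    let close () = ()
  end)

(* Works on any input, whatever state it hides. *)
let count_chars (module IN : INPUT) : int =
  let rec go n =
    match IN.read_char () with
    | _ -> go (n + 1)
    | exception End_of_file -> n
  in
  Fun.protect ~finally:IN.close (fun () -> go 0)
```

Both `open_file path` and `of_string "abc"` can be passed to `count_chars`, with no channel type visible in any signature.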

You got me curious: what’s the reason for using a first-class module here instead of a record or an object?

Of course!

  • compared to records, I find first-class modules to be a lot more convenient for this use case. I still use records for data, but a record of functions is often less convenient. For example, modules allow you to use include, and they directly handle down-casting as a way to hide internal state (whereas for records you need to close over values created before the record). Module types are also structural, so I don’t need to worry about disambiguation, whereas records need more care there. In terms of performance both seem exactly the same, from my toy benchmarks.
  • compared to objects, first-class modules are a bit less convenient (no runtime-free cast, no true inheritance/mixin), but LSP and other tools are fragile with objects. In addition, invoking an object method seems to be roughly twice as slow as a record/module field access; I suppose it’s because the latter is just an access via offset. That’s on a micro benchmark so in reality it might be worse.

I was trying to emulate the use of Rust traits, by writing code that could have been inferred by the modular-implicits proposed compiler. At least, as I understand it (I’m buried in Rust, so I can’t take the time to understand it well, sadly).

I managed to make it work… But you’d need a forall type at least… so it’s not much of an ergonomic improvement over passing a packed module:

let with_input_file fname
      (f : <run : 'ch. (module READER with type in_channel = 'ch) -> 'ch -> 'a>)
=
  let (module M) =
    if Fpath.(fname |> v |> has_ext "gz") then
      gzreader
    else stdreader in
  f#run (module M) (M.open_in fname)
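
For reference, a caller of a function with this signature might look like the sketch below. It is self-contained and uses only the Stdlib reader (the gzip branch is dropped to avoid the camlzip and fpath dependencies). `count_chars` is a hypothetical example, and wrapping the call in `Fun.protect` to close the channel is an addition that the version above does not have.

```ocaml
module type READER = sig
  type in_channel
  val open_in : string -> in_channel
  val input_char : in_channel -> char
  val close_in : in_channel -> unit
end

(* Only the Stdlib reader, to keep the sketch dependency-free. *)
let stdreader = (module Stdlib : READER)

let with_input_file fname
    (f : < run : 'ch. (module READER with type in_channel = 'ch) -> 'ch -> 'a >)
  : 'a =
  let (module M : READER) = stdreader in
  let ic = M.open_in fname in
  (* Assumption: the caller wants the channel closed afterwards. *)
  Fun.protect ~finally:(fun () -> M.close_in ic)
    (fun () -> f#run (module M) ic)

(* Hypothetical caller: the object carries a method that is polymorphic
   in the channel type, which is what keeps that type from escaping. *)
let count_chars fname : int =
  with_input_file fname
    (object
      method run : type ch.
        (module READER with type in_channel = ch) -> ch -> int =
        fun (module R : READER with type in_channel = ch) ic ->
          let rec go n =
            match R.input_char ic with
            | _ -> go (n + 1)
            | exception End_of_file -> n
          in
          go 0
    end)
```

The call site has to spell out the `type ch.` annotation on the method, which is the ergonomic cost mentioned above.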