Is there any way to create an in_channel?

Some packages are hardcoded to use in_channel (such as jsonm and ezjsonm @avsm ) even though they only use the input function for buffered I/O. I would like to read from a gzipped file and am looking at using camlzip. But camlzip creates its own Gzip.in_channel type and there seems to be no way to interface the two. It looks to me like the only constructors of Pervasives.in_channel are external C functions. So is the only way to connect the two to go through a temporary file or through a string?

A temporary file seems like the simplest option to me. Filename.temp_file "name" ".ext" creates a fresh empty file and returns its unique name, something like /tmp/nameabc123.ext (the suffix is appended verbatim, so include the dot).
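
For illustration, here is a minimal stdlib-only sketch of the temp-file route that also cleans up when something raises. The helper name with_temp_channel and its labelled arguments are made up for this example, and Fun.protect assumes OCaml 4.08 or later.

```ocaml
(* Hypothetical helper: write data to a temp file via [fill], then hand
   [use] a plain in_channel opened on that file. *)
let with_temp_channel ~prefix ~suffix
    (fill : out_channel -> unit) (use : in_channel -> 'a) : 'a =
  let path = Filename.temp_file prefix suffix in
  Fun.protect
    (* Remove the temp file whether or not [fill] or [use] raised. *)
    ~finally:(fun () -> try Sys.remove path with Sys_error _ -> ())
    (fun () ->
      let oc = open_out_bin path in
      Fun.protect
        ~finally:(fun () -> close_out_noerr oc)
        (fun () -> fill oc ; close_out oc) ;
      let ic = open_in_bin path in
      Fun.protect
        ~finally:(fun () -> close_in_noerr ic)
        (fun () -> use ic))
```

Here fill would be whatever writes the decompressed bytes, and use would be the consumer that insists on an in_channel.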

Otherwise you can use decompress, which implements gzip over a stream. It is usable as long as you can pass the contents as a bigstring. This function is probably what you want.

@dinosaure The problem is ezjsonm accepts only in_channel and mustache (the ultimate consumer) is compatible with ezjsonm. Otherwise having it over a stream like decompress is nice. I assume that it is not hard to wrap a stream around camlzip either.

@yawaramin Thanks for the temp_file tip. Seems like that is the way to go. It seems a tad sad but it should get the job done.

Heh: I’m not seriously suggesting this, but maybe a socketpair(2), and then use a thread to write into one end while reading from the other?

Haha. With that amount of effort it is probably easier just to change the offending package source to use stream instead of in_channel and create a PR. :slight_smile: The temp file approach, while inelegant, took only a few minutes (likely an overestimate) to implement.

I wonder if you could wrap the whole thing in a functor and parameterize by the type in_channel (and operations) ? Might be less invasive than changing to a stream.

Also, can you point me at what this stream type is? I don’t recognize it (but then, I’m olld and maybe just not keepin’ up with the kidz).

I agree that parameterizing over an in_channel type (with just the needed input function) is the standard way to go. I’ve only come across the streams concept in tutorials. It seems like a higher-order, readily packaged way to achieve the same thing.
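
To make the functor idea concrete, here is a rough stdlib-only sketch. All module names (INPUT, Make_reader, Std_input, String_input) are invented for illustration; a real library such as jsonm would put its decoder where read_all is.

```ocaml
(* The only things the consumer needs: a channel type and [input]. *)
module type INPUT = sig
  type in_channel
  val input : in_channel -> Bytes.t -> int -> int -> int
end

(* Hypothetical consumer, written once against the signature. *)
module Make_reader (I : INPUT) = struct
  let read_all (ic : I.in_channel) : string =
    let chunk = Bytes.create 4096 in
    let acc = Buffer.create 4096 in
    let rec loop () =
      match I.input ic chunk 0 (Bytes.length chunk) with
      | 0 -> Buffer.contents acc (* 0 bytes read means end of input *)
      | n -> Buffer.add_subbytes acc chunk 0 n ; loop ()
    in
    loop ()
end

(* Instance for ordinary channels (Stdlib needs OCaml >= 4.07). *)
module Std_input = struct
  type in_channel = Stdlib.in_channel
  let input = Stdlib.input
end

(* Instance reading from an in-memory string, handy for testing. *)
module String_input = struct
  type in_channel = { data : string; mutable pos : int }
  let of_string data = { data; pos = 0 }
  let input ic buf off len =
    let n = min len (String.length ic.data - ic.pos) in
    Bytes.blit_string ic.data ic.pos buf off n ;
    ic.pos <- ic.pos + n ;
    n
end

module From_channel = Make_reader (Std_input)
module From_string = Make_reader (String_input)
```

Since camlzip's Gzip module exposes an in_channel type and an input function of the same shape, something like Make_reader (Gzip) should also type-check, though that is an assumption I have not compiled here.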

Ohhhh, you mean -those- streams. OK, I know them. Indeed, I’m a fierce fanatic of that way of writing parsers. And certainly I think it’s wonderful stuff. But OTOH, it’s definitely deprecated in the OCaml community. So you might want to not use Stream.t for that simple reason.

Again, I’m not saying I think it’s a bad idea: just that … for widest acceptance, you probably want to stick to using modules that are widely-accepted.

Just a thought.

Oh! I am not up with the lore so it’s good to know. Thanks for the advice!

There is a language-change proposal by @c-cube to make precisely this use-case possible, modular IO.

If going through a string is fine, you could use ezgzip (which wraps camlzip I believe).

@roddy I ended up adapting camlzip's example and used a temporary file:

let gunzip src tgt =
  let ic = Gzip.open_in src in
  let oc = open_out_bin tgt in
  let bl = 1024 * 1024 in
  let buffer = Bytes.create bl in
  let rec decompress () =
    let n = Gzip.input ic buffer 0 bl in
    if n = 0 then () else (output oc buffer 0 n ; decompress ())
  in
  decompress () ; Gzip.close_in ic ; close_out oc

let json_from_gz gz =
  let tmp = Filename.temp_file "Some" ".json" in
  gunzip gz tmp ;
  let ic = open_in_bin tmp in
  let json = Ezjsonm.from_channel ic in
  close_in ic ; Sys.remove tmp ; json

You don’t have to use ezjsonm. You could just use the low-level interface of jsonm, which allows you to feed input manually.

If you want to stick to channels but avoid the temp file, you could also use Unix.pipe.
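
A rough sketch of the Unix.pipe route, assuming the unix and threads.posix libraries are linked. The helper name with_pipe_channel is hypothetical. A separate thread is needed because a pipe only buffers a limited amount of data, so writing everything before reading can block.

```ocaml
(* Hypothetical helper: [produce] writes into one end of a pipe from a
   separate thread while [consume] reads a real in_channel on the other. *)
let with_pipe_channel
    (produce : out_channel -> unit) (consume : in_channel -> 'a) : 'a =
  let rd, wr = Unix.pipe () in
  let ic = Unix.in_channel_of_descr rd in
  let oc = Unix.out_channel_of_descr wr in
  let writer =
    Thread.create
      (fun () ->
        (* Closing [oc] flushes it and signals end-of-file to the reader.
           An exception in [produce] only terminates this thread. *)
        Fun.protect
          ~finally:(fun () -> close_out_noerr oc)
          (fun () -> produce oc))
      ()
  in
  (* Discard unread data so the writer can never block forever. *)
  let drain () =
    let buf = Bytes.create 4096 in
    try while input ic buf 0 4096 > 0 do () done with _ -> ()
  in
  Fun.protect
    ~finally:(fun () -> drain () ; Thread.join writer ; close_in_noerr ic)
    (fun () -> consume ic)
```

The producer here could be a decompression loop that writes to oc instead of to a file, with the in_channel-only function passed as consume.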

Thanks for the tip about `Manual. That seems like the right way to go without needing a temp file. Although it does require reaching down to jsonm it does not seem particularly onerous given the examples. I will publish the final code here if I get it to work.

@rgrinberg Here is what I ended up with using jsonm's low level manual interface:

let json_from_gz gz =
  let ic = Gzip.open_in gz in
  let bl = 1024 * 1024 in
  let buffer = Bytes.create bl in
  let decoder = Jsonm.decoder `Manual in
  let rec decode d =
    match Jsonm.decode d with
    | `Await ->
        let n = Gzip.input ic buffer 0 bl in
        Jsonm.Manual.src d buffer 0 n ; decode d
    | `Lexeme l -> l
    | _ -> assert false
  in
  let rec value v k d =
    match v with
    | `Os -> obj [] k d
    | `As -> arr [] k d
    | (`Null | `Bool _ | `String _ | `Float _) as v -> k v d
    | _ -> assert false
  and arr vs k d = match decode d with `Ae -> k (`A (List.rev vs)) d | v -> value v (fun v -> arr (v :: vs) k) d
  and obj ms k d =
    match decode d with
    | `Oe -> k (`O (List.rev ms)) d
    | `Name n -> value (decode d) (fun v -> obj ((n, v) :: ms) k) d
    | _ -> assert false
  in
  let json = value (decode decoder) (fun v _ -> v) decoder in
  Gzip.close_in ic ; json

And just to be sure that the `Await branch actually works as expected, I tested with a much smaller buffer size to ensure it is exercised.