Some packages are hardcoded to use in_channel (such as jsonm and ezjsonm, @avsm) even though they only use the input function for buffered IO. I would like to read from a gzipped file and am looking at using camlzip. But camlzip creates its own Gzip.in_channel type and there seems to be no way to interface the two. It looks to me like the only constructors of Pervasives.in_channel are external C functions. So is the only way to connect the two to go through a temporary file or through a string?
Temporary file seems like the simplest option to me. Filename.temp_file "name" ".ext" will create an empty file with a unique name, something like /tmp/nameabc123.ext (the suffix is appended as-is, so include the dot).
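As a rough sketch of the temporary-file route, the helper below writes already-decompressed contents to a temp file, hands a plain in_channel to a consumer, and cleans up afterwards. The name with_temp_channel and the "data" / ".tmp" prefix and suffix are made up for illustration; only standard-library calls are used, and the decompression step itself is left out.

```ocaml
(* Hypothetical helper: expose [contents] to a consumer that only accepts an
   [in_channel], by going through a temporary file. *)
let with_temp_channel (contents : string) (consume : in_channel -> 'a) : 'a =
  (* [Filename.temp_file] creates an empty file with a unique name. *)
  let tmp = Filename.temp_file "data" ".tmp" in
  Fun.protect
    ~finally:(fun () -> Sys.remove tmp)
    (fun () ->
      (* Write the contents out and close the file before reading it back. *)
      let oc = open_out_bin tmp in
      Fun.protect
        ~finally:(fun () -> close_out oc)
        (fun () -> output_string oc contents) ;
      (* Reopen the same file as an ordinary input channel. *)
      let ic = open_in_bin tmp in
      Fun.protect
        ~finally:(fun () -> close_in_noerr ic)
        (fun () -> consume ic))

(* Usage: any function taking an [in_channel] can be the consumer. *)
let first_line = with_temp_channel "hello\nworld\n" input_line
```

Using Fun.protect means the file is removed even if the consumer raises, which a straight-line version would not guarantee.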
Otherwise you can use decompress, which implements GZip over a stream. It is usable as long as you can pass the contents as a bigstring. This function is probably what you want.
@dinosaure The problem is that ezjsonm accepts only in_channel and mustache (the ultimate consumer) is compatible with ezjsonm. Otherwise having it over a stream like decompress is nice. I assume that it is not hard to wrap a stream around camlzip either.
@yawaramin Thanks for the temp_file tip. Seems like that is the way to go. It seems a tad sad but it should get the job done.
Heh: I’m not seriously suggesting this, but maybe a socketpair(2), and then use a thread to write into one end while reading from the other?
Haha. With that amount of effort it is probably easier just to change the offending package source to use stream instead of in_channel and create a PR. The temp file approach, while inelegant, took only a few minutes (likely an overestimate) to implement.
I wonder if you could wrap the whole thing in a functor and parameterize by the type in_channel (and operations)? Might be less invasive than changing to a stream.
Also, can you point me at what this stream type is? I don’t recognize it (but then, I’m old and maybe just not keepin’ up with the kidz).
I agree that parameterizing over an in_channel type (with just the needed input function) is the standard way to go. I’ve only come across the streams concept in tutorials. It seems like a higher-order, readily packaged way to achieve the same thing.
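To make the functor idea concrete, here is a minimal sketch under a few assumptions: the names INPUT, Reader, Channel_input and String_input are invented for this example, and the only operation required is an input function with the same argument shape as the standard one (the Gzip.input call used elsewhere in this thread appears to have that shape too, so a gzip instance should look much like Channel_input).

```ocaml
(* The only capability the reader needs: an [input]-style function with the
   same shape as [Stdlib.input]. *)
module type INPUT = sig
  type t
  val input : t -> bytes -> int -> int -> int
end

(* Hypothetical consumer, written once against [INPUT]. *)
module Reader (I : INPUT) = struct
  (* Read until [input] returns 0 and return everything as a string. *)
  let read_all ?(chunk = 4096) (src : I.t) : string =
    let buf = Buffer.create chunk in
    let bytes = Bytes.create chunk in
    let rec loop () =
      let n = I.input src bytes 0 chunk in
      if n > 0 then (Buffer.add_subbytes buf bytes 0 n ; loop ())
    in
    loop () ;
    Buffer.contents buf
end

(* Instance for ordinary channels. *)
module Channel_input = struct
  type t = in_channel
  let input = Stdlib.input
end

(* Instance for an in-memory string with a read cursor, standing in for any
   other source that is not an [in_channel]. *)
module String_input = struct
  type t = { data : string; mutable pos : int }
  let of_string data = { data; pos = 0 }
  let input t dst off len =
    let n = min len (String.length t.data - t.pos) in
    Bytes.blit_string t.data t.pos dst off n ;
    t.pos <- t.pos + n ;
    n
end

module Channel_reader = Reader (Channel_input)
module String_reader = Reader (String_input)

(* Usage: a deliberately tiny chunk size forces several refills. *)
let text = String_reader.read_all ~chunk:3 (String_input.of_string "hello, world")
```

The consumer never mentions in_channel, so supporting a new source only requires another small module that satisfies INPUT.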
Ohhhh, you mean -those- streams. OK, I know them. Indeed, I’m a fierce fanatic of that way of writing parsers. And certainly I think it’s wonderful stuff. But OTOH, it’s definitely deprecated in the OCaml community. So you might want to not use Stream.t for that simple reason.
Again, I’m not saying I think it’s a bad idea: just that … for widest acceptance, you probably want to stick to using modules that are widely-accepted.
Just a thought.
Oh! I am not up with the lore so it’s good to know. Thanks for the advice!
There is a language-change proposal by @c-cube to make precisely this use-case possible, modular IO.
If going through a string is fine, you could use ezgzip (which wraps camlzip I believe).
@roddy I ended up adapting camlzip's example and used a temporary file:
let gunzip src tgt =
  let ic = Gzip.open_in src in
  let oc = open_out_bin tgt in
  let bl = 1024 * 1024 in
  let buffer = Bytes.create bl in
  let rec decompress () =
    let n = Gzip.input ic buffer 0 bl in
    if n = 0 then () else (output oc buffer 0 n ; decompress ())
  in
  decompress () ; Gzip.close_in ic ; close_out oc

let json_from_gz gz =
  (* The suffix is appended verbatim, so the dot has to be part of it. *)
  let tmp = Filename.temp_file "Some" ".json" in
  gunzip gz tmp ;
  let ic = open_in_bin tmp in
  let json = Ezjsonm.from_channel ic in
  close_in ic ; Sys.remove tmp ; json
You don’t have to use ezjsonm. You could just use the low-level interface of jsonm, which allows you to feed input manually.
If you want to stick to channels but avoid the temp file, you could also use Unix.pipe.
Thanks for the tip about `Manual. That seems like the right way to go without needing a temp file. Although it does require reaching down to jsonm, it does not seem particularly onerous given the examples. I will publish the final code here if I get it to work.
@rgrinberg Here is what I ended up with using jsonm's low-level manual interface:
let json_from_gz gz =
  let ic = Gzip.open_in gz in
  let bl = 1024 * 1024 in
  let buffer = Bytes.create bl in
  let decoder = Jsonm.decoder `Manual in
  (* Return the next lexeme, refilling the decoder from the gzip channel on
     `Await. A read of 0 bytes tells jsonm that the input has ended. *)
  let rec decode d =
    match Jsonm.decode d with
    | `Await ->
        let n = Gzip.input ic buffer 0 bl in
        Jsonm.Manual.src d buffer 0 n ; decode d
    | `Lexeme l -> l
    | _ -> assert false (* `End or `Error: not handled here *)
  in
  (* Build the JSON value in continuation-passing style. *)
  let rec value v k d =
    match v with
    | `Os -> obj [] k d
    | `As -> arr [] k d
    | (`Null | `Bool _ | `String _ | `Float _) as v -> k v d
    | _ -> assert false
  and arr vs k d =
    match decode d with
    | `Ae -> k (`A (List.rev vs)) d
    | v -> value v (fun v -> arr (v :: vs) k) d
  and obj ms k d =
    match decode d with
    | `Oe -> k (`O (List.rev ms)) d
    | `Name n -> value (decode d) (fun v -> obj ((n, v) :: ms) k) d
    | _ -> assert false
  in
  let json = value (decode decoder) (fun v _ -> v) decoder in
  (* Close the gzip channel once the value has been read. *)
  Gzip.close_in ic ; json
And just to be sure that the `Await branch actually works as expected, I tested with a much smaller buffer size to ensure it is exercised.