Rpmfile library v0.3.0 with new Eio-based reader

Hello everyone :wave:

Today I want to tell you about the new version of the Rpmfile library. Rpmfile is a library for reading metadata from RPM packages. Originally, Rpmfile’s parser (reader) used Angstrom. The new release adds a modern Eio-based reader.

Overall, the project is now split into four packages: rpmfile, which contains signatures and implementation-independent functions; rpmfile-unix, with the original Angstrom parser; and rpmfile-eio (together with rpmfile-cli), written using Eio.

My experience porting to Eio

Eio is a fantastic effect-based I/O library for a more modern age in multicore OCaml. I think it takes the best ideas from the ecosystem. For example, the built-in Buf_read and Buf_write modules implement ideas from the Angstrom and Faraday libraries. The APIs map almost one-to-one, so much of the port was copy-paste.

But, of course, not everything is perfect. Unlike the Angstrom.parse_ function, the Buf_read.parse function assumes I want to read the whole stream up to the end of input.

A snippet of the source code:

let parse ?initial_size ~max_size p flow =
  let buf = of_flow flow ?initial_size ~max_size in
  format_errors (p <* end_of_input) buf
  (*               ^^^^^^^^^^^^^^^           
                    0_0 nice (not)        *)

So I had to rewrite this function myself in a form similar to Angstrom.Consume.Prefix.
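
A minimal sketch of what such a prefix-style parse could look like, assuming only the public Eio.Buf_read functions of_flow and format_errors. The name parse_prefix is hypothetical, and this is not necessarily how rpmfile-eio does it:

```ocaml
(* Hypothetical helper: like Eio.Buf_read.parse, but without requiring
   end_of_input, so only a prefix of the stream has to match. *)
let parse_prefix ?initial_size ~max_size p flow =
  let buf = Eio.Buf_read.of_flow flow ?initial_size ~max_size in
  Eio.Buf_read.format_errors p buf

(* Usage: read a 2-byte prefix and leave the rest of the flow unparsed. *)
let () =
  let flow = Eio.Flow.string_source "abcdef" in
  match parse_prefix ~max_size:1024 (Eio.Buf_read.take 2) flow with
  | Ok s -> print_endline s
  | Error (`Msg m) -> prerr_endline m
```

The only change from the snippet above is dropping `<* end_of_input`, so trailing bytes after the parsed prefix are no longer an error.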

Is it a signed or unsigned integer?

BE.uint16 and similar functions return signed ints, even though their names have the u prefix for some reason.

And a few other differences

  • Angstrom.advance is skip
  • Angstrom.pos is consumed_bytes
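
A small sketch of the two renamed functions on the Eio side, assuming Eio.Buf_read.of_string, skip and consumed_bytes behave as documented (the input string is just an illustration):

```ocaml
(* Skip some bytes, then ask how far into the input we are. *)
let pos =
  let r = Eio.Buf_read.of_string "abcdef" in
  Eio.Buf_read.skip 2 r;           (* Angstrom: advance 2 *)
  Eio.Buf_read.consumed_bytes r    (* Angstrom: pos *)

let () = Printf.printf "consumed %d bytes\n" pos
```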

P.S.

Thanks for your attention!

The u means it interprets the thing being parsed as an unsigned integer:

utop # Eio.Buf_read.(parse_string_exn BE.uint16) "\xff\xff";;
- : int = 65535

(not -1)
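
The same distinction can be seen with the standard library's Bytes accessors. This is only an illustration using stdlib functions, not Eio's implementation:

```ocaml
let raw = Bytes.of_string "\xff\xff"

(* Interprets the two bytes as an unsigned 16-bit big-endian integer. *)
let as_unsigned = Bytes.get_uint16_be raw 0

(* Interprets the same two bytes as a signed (sign-extended) integer. *)
let as_signed = Bytes.get_int16_be raw 0

let () = Printf.printf "unsigned: %d, signed: %d\n" as_unsigned as_signed
```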

Note that you might find some differences to Faraday’s behaviour, as I found a few bugs while converting it for Eio:

How wrong I was… I found a get_int16_be function call in the Buf_read.uint16 implementation and was mistaken.

@talex5, but BE.uint32 and BE.uint64 do return signed values:

# Printf.sprintf "%lx" (-123456l);;
- : string = "fffe1dc0"

# Eio.Buf_read.(parse_string_exn BE.uint32) "\xff\xfe\x1d\xc0";;
- : int32 = -123456l

any_int32 in Angstrom is uint32 in Buf_read. They call the same function from Bigstringaf (Bigstringaf.get_int32_be).

OCaml doesn’t have unsigned integers (see ocaml/ocaml pull request #1201, “Unsigned 32-bit and 64-bit integers” by yallop), and the convention is to use e.g. int64 for both signed and unsigned 64-bit ints:

I tend to think of fixed-width integer types (int32, int64) as bit vectors that can be used as signed integers, unsigned integers, or tuples of bits, depending on the operations applied to them.

This does mean you have to be very careful about how you use them, e.g. use an unsigned format when printing them, avoid polymorphic compare, etc.
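
A short sketch of those precautions using only the standard library (OCaml 4.08 or later is assumed for the Int32.unsigned_* functions), with the value from the example above:

```ocaml
(* Bit pattern 0xfffe1dc0, as returned by BE.uint32 in the example above. *)
let x = -123456l

(* %lu formats an int32 as unsigned; %ld would format it as signed. *)
let printed = Printf.sprintf "%lu" x

let () =
  print_endline printed;
  (* Polymorphic compare treats the value as signed... *)
  assert (compare x 1l < 0);
  (* ...while Int32.unsigned_compare treats it as unsigned. *)
  assert (Int32.unsigned_compare x 1l > 0);
  (* The unsigned value fits in a native int on 64-bit platforms only. *)
  match Int32.unsigned_to_int x with
  | Some n -> Printf.printf "as int: %d\n" n
  | None -> print_endline "does not fit in int on this platform"
```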