Bit sequences in OCaml


#1

I’m looking for an implementation of arbitrary-length, immutable bit sequences in OCaml. I don’t care much about efficiency. I would like to be able to convert from and to various built in types, e.g.

of_int : int -> t
to_int_exn : t -> int
...

Finally, I would like to be able to perform standard bitwise operations on the data structure, e.g.

set_exn : t -> int -> bool -> t
or : t -> t -> t
and : t -> t -> t
xor : t -> t -> t
invert : t -> t
...

Obviously, I could roll my own implementation and use something like

type t = bool list

Before I reinvent the wheel though, I was wondering if there is already a library for this purpose? I was a bit shocked how little OCaml seems to offer when it comes to bit and byte level manipulation…
Thanks!


#2

#3

This looks almost perfect, except that these bitvectors are mutable.


#4

Is that a big problem ? You can use it immutably. Worst case, just encapsulate it in an immutable API.


#5

In the interest of learning, I would propose a somewhat direct approach, playing around with modulo and bitwise ops.


(* How many elements we need in the list/array (each with 31-positions) *)
let how_many len = (
  (len / 31) + (match len mod 31 with 0 -> 0 | _ -> 1)
)

(* Where (in the inner list/array) is the element describing the position of the bit n *)
let whereis_bit n = (
  let n = n + 1 in
  (n / 31) + (match n mod 31 with 0 -> 0 | _ -> 1) - 1
)

And how to use them to solve the problem with an array:


let test_validate_len_inner_lists () = (
  Alcotest.(check int "") 2 (how_many_lists 62);
  Alcotest.(check int "") 3 (how_many_lists 63);
)

let test_bits_by_index () = (
  Alcotest.(check int "") 0 (whereis_bit 0);
  Alcotest.(check int "") 0 (whereis_bit 10);
  Alcotest.(check int "") 0 (whereis_bit 30);
  Alcotest.(check int "") 1 (whereis_bit 31);
  Alcotest.(check int "") 1 (whereis_bit 61);
  Alcotest.(check int "") 2 (whereis_bit 62);
  Alcotest.(check int "") 2 (whereis_bit 92);
  Alcotest.(check int "") 3 (whereis_bit 93);
)


let set_bit ~(i:int) (data: int array) = (
  let where = whereis_bit i in
  if where > Array.length data - 1 then
    failwith "Not enough space"
  else (
    data.(where) <- data.(where) lor (1 lsl (i mod 31));
  );
)

let get_bit ~(i:int) (data: int array) = (
  let where = whereis_bit i in
  if where > Array.length data - 1 then
    failwith "Not enough space"
  else (
    let n = (1 lsl (i mod 31)) in
    (data.(where) land n) = n
  )
)

let test_sequence () = (
  let seq = [| 0; 0; 0 |] in
  set_bit ~i:0 seq;
  set_bit ~i:31 seq;
  set_bit ~i:32 seq;
  set_bit ~i:62 seq;

  Alcotest.(check bool "i = 0") true (get_bit ~i:0 seq);
  Alcotest.(check bool "i = 1") false (get_bit ~i:1 seq);

  Alcotest.(check bool "i = 31") true (get_bit ~i:31 seq);
  Alcotest.(check bool "i = 32") true (get_bit ~i:32 seq);
  Alcotest.(check bool "i = 35") false (get_bit ~i:35 seq);

  Alcotest.(check int "") 1 seq.(0);
  Alcotest.(check int "") 3 seq.(1);
  Alcotest.(check int "") 1 seq.(2);
)

The code is efficient, and you could easily guarantee immutability by either converting the array to list, or using a list directly. I don’t like the idea of using a list here because array gives access in constant. So I would maybe hide the type into a restricted API.


#6

It’s only a problem insofar as any unsafe API (i.e., an API that does not enforce the invariants it relies on) is a problem. Of course one can live with unsafe languages and APIs, I’d just rather not if I don’t have to.

Thanks for the pointer though! Exposing things via an immutable API would of course be an option.


#7

I’ve used big_ints as immutable arbitrary length bit vectors before. Bit of a hack, but it works well.


#8

I wrote a library including an immutable bit string type some time ago. I never published in opam and please don’t consider it actively maintained. The bit manipulation is done in C and should be fairly well optimised, but that means also it can’t be used in JavaScript (or Mirage?) in its current form. I used it for representing IP addresses, sets of subnets, and maps from subnets.


#9

Sweet, that looks perfect!