I’m looking for an implementation of arbitrary-length, immutable bit sequences in OCaml. I don’t care much about efficiency. I would like to be able to convert from and to various built in types, e.g.
of_int : int -> t
to_int_exn : t -> int
...
Finally, I would like to be able to perform standard bitwise operations on the data structure, e.g.
set_exn : t -> int -> bool -> t
or : t -> t -> t
and : t -> t -> t
xor : t -> t -> t
invert : t -> t
...
Obviously, I could roll my own implementation and use something like
type t = bool list
Before I reinvent the wheel though, I was wondering if there is already a library for this purpose? I was a bit shocked how little OCaml seems to offer when it comes to bit and byte level manipulation…
Thanks!
In the interest of learning, I would propose a somewhat direct approach, playing around with modulo and bitwise ops.
(* How many elements we need in the list/array (each with 31-positions) *)
let how_many len = (
(len / 31) + (match len mod 31 with 0 -> 0 | _ -> 1)
)
(* Where (in the inner list/array) is the element describing the position of the bit n *)
let whereis_bit n = (
let n = n + 1 in
(n / 31) + (match n mod 31 with 0 -> 0 | _ -> 1) - 1
)
And how to use them to solve the problem with an array:
let test_validate_len_inner_lists () = (
Alcotest.(check int "") 2 (how_many_lists 62);
Alcotest.(check int "") 3 (how_many_lists 63);
)
let test_bits_by_index () = (
Alcotest.(check int "") 0 (whereis_bit 0);
Alcotest.(check int "") 0 (whereis_bit 10);
Alcotest.(check int "") 0 (whereis_bit 30);
Alcotest.(check int "") 1 (whereis_bit 31);
Alcotest.(check int "") 1 (whereis_bit 61);
Alcotest.(check int "") 2 (whereis_bit 62);
Alcotest.(check int "") 2 (whereis_bit 92);
Alcotest.(check int "") 3 (whereis_bit 93);
)
let set_bit ~(i:int) (data: int array) = (
let where = whereis_bit i in
if where > Array.length data - 1 then
failwith "Not enough space"
else (
data.(where) <- data.(where) lor (1 lsl (i mod 31));
);
)
let get_bit ~(i:int) (data: int array) = (
let where = whereis_bit i in
if where > Array.length data - 1 then
failwith "Not enough space"
else (
let n = (1 lsl (i mod 31)) in
(data.(where) land n) = n
)
)
let test_sequence () = (
let seq = [| 0; 0; 0 |] in
set_bit ~i:0 seq;
set_bit ~i:31 seq;
set_bit ~i:32 seq;
set_bit ~i:62 seq;
Alcotest.(check bool "i = 0") true (get_bit ~i:0 seq);
Alcotest.(check bool "i = 1") false (get_bit ~i:1 seq);
Alcotest.(check bool "i = 31") true (get_bit ~i:31 seq);
Alcotest.(check bool "i = 32") true (get_bit ~i:32 seq);
Alcotest.(check bool "i = 35") false (get_bit ~i:35 seq);
Alcotest.(check int "") 1 seq.(0);
Alcotest.(check int "") 3 seq.(1);
Alcotest.(check int "") 1 seq.(2);
)
The code is efficient, and you could easily guarantee immutability by either converting the array to list, or using a list directly. I don’t like the idea of using a list here because array gives access in constant. So I would maybe hide the type into a restricted API.
It’s only a problem insofar as any unsafe API (i.e., an API that does not enforce the invariants it relies on) is a problem. Of course one can live with unsafe languages and APIs, I’d just rather not if I don’t have to.
Thanks for the pointer though! Exposing things via an immutable API would of course be an option.
I wrote a library including an immutable bit string type some time ago. I never published in opam and please don’t consider it actively maintained. The bit manipulation is done in C and should be fairly well optimised, but that means also it can’t be used in JavaScript (or Mirage?) in its current form. I used it for representing IP addresses, sets of subnets, and maps from subnets.
If you google for “functional bitfield library ocaml” you should find this project:
If some functions are missing just ask me, I will be happy to add it!
I’m surprised you were not able to find it, it’s there on GitHub since 2013, and since May 4, 2019 it’s in opam: https://opam.ocaml.org/packages/funbits/
I assume you’re not planning on dealing with sparsity? So if bit number N is on, the bitset will take space O(N) bits? In that case, why not (heh) try using Zarith and arbitrary-precision integers? Just a thought!
BAP Bitvector implements modular arithmetic on top of Zarith.
Bitvec a more efficient but not yet released implementation of modular arithmetic on top of Zarith.
The Zarith library is like a standard de-facto in the research community. It is extremely effective, with lots of low-level optimizations. When you bitvector fits into the native int, it will use use the native int representation. All operations for small vectors. are implemented in assembly. For longer bitvectors, the GNU GMP library is used, which is also very efficient. Of course, bitvectors are all immutable.