Bit sequences in OCaml

smolkaj · March 12, 2019, 6:31pm

I’m looking for an implementation of arbitrary-length, immutable bit sequences in OCaml. I don’t care much about efficiency. I would like to be able to convert from and to various built in types, e.g.

of_int : int -> t
to_int_exn : t -> int
...

Finally, I would like to be able to perform standard bitwise operations on the data structure, e.g.

set_exn : t -> int -> bool -> t
or : t -> t -> t
and : t -> t -> t
xor : t -> t -> t
invert : t -> t
...

Obviously, I could roll my own implementation and use something like

type t = bool list

Before I reinvent the wheel though, I was wondering if there is already a library for this purpose? I was a bit shocked how little OCaml seems to offer when it comes to bit and byte level manipulation…
Thanks!

Drup · March 12, 2019, 6:59pm

smolkaj · March 12, 2019, 7:58pm

This looks almost perfect, except that these bitvectors are mutable.

Drup · March 12, 2019, 8:21pm

Is that a big problem ? You can use it immutably. Worst case, just encapsulate it in an immutable API.

gersonmoraes · March 12, 2019, 9:20pm

In the interest of learning, I would propose a somewhat direct approach, playing around with modulo and bitwise ops.


(* How many elements we need in the list/array (each with 31-positions) *)
let how_many len = (
  (len / 31) + (match len mod 31 with 0 -> 0 | _ -> 1)
)

(* Where (in the inner list/array) is the element describing the position of the bit n *)
let whereis_bit n = (
  let n = n + 1 in
  (n / 31) + (match n mod 31 with 0 -> 0 | _ -> 1) - 1
)

And how to use them to solve the problem with an array:


let test_validate_len_inner_lists () = (
  Alcotest.(check int "") 2 (how_many_lists 62);
  Alcotest.(check int "") 3 (how_many_lists 63);
)

let test_bits_by_index () = (
  Alcotest.(check int "") 0 (whereis_bit 0);
  Alcotest.(check int "") 0 (whereis_bit 10);
  Alcotest.(check int "") 0 (whereis_bit 30);
  Alcotest.(check int "") 1 (whereis_bit 31);
  Alcotest.(check int "") 1 (whereis_bit 61);
  Alcotest.(check int "") 2 (whereis_bit 62);
  Alcotest.(check int "") 2 (whereis_bit 92);
  Alcotest.(check int "") 3 (whereis_bit 93);
)


let set_bit ~(i:int) (data: int array) = (
  let where = whereis_bit i in
  if where > Array.length data - 1 then
    failwith "Not enough space"
  else (
    data.(where) <- data.(where) lor (1 lsl (i mod 31));
  );
)

let get_bit ~(i:int) (data: int array) = (
  let where = whereis_bit i in
  if where > Array.length data - 1 then
    failwith "Not enough space"
  else (
    let n = (1 lsl (i mod 31)) in
    (data.(where) land n) = n
  )
)

let test_sequence () = (
  let seq = [| 0; 0; 0 |] in
  set_bit ~i:0 seq;
  set_bit ~i:31 seq;
  set_bit ~i:32 seq;
  set_bit ~i:62 seq;

  Alcotest.(check bool "i = 0") true (get_bit ~i:0 seq);
  Alcotest.(check bool "i = 1") false (get_bit ~i:1 seq);

  Alcotest.(check bool "i = 31") true (get_bit ~i:31 seq);
  Alcotest.(check bool "i = 32") true (get_bit ~i:32 seq);
  Alcotest.(check bool "i = 35") false (get_bit ~i:35 seq);

  Alcotest.(check int "") 1 seq.(0);
  Alcotest.(check int "") 3 seq.(1);
  Alcotest.(check int "") 1 seq.(2);
)

The code is efficient, and you could easily guarantee immutability by either converting the array to list, or using a list directly. I don’t like the idea of using a list here because array gives access in constant. So I would maybe hide the type into a restricted API.

smolkaj · March 12, 2019, 9:32pm

It’s only a problem insofar as any unsafe API (i.e., an API that does not enforce the invariants it relies on) is a problem. Of course one can live with unsafe languages and APIs, I’d just rather not if I don’t have to.

Thanks for the pointer though! Exposing things via an immutable API would of course be an option.

ifazk · March 12, 2019, 10:30pm

I’ve used big_ints as immutable arbitrary length bit vectors before. Bit of a hack, but it works well.

paurkedal · March 15, 2019, 7:36am

I wrote a library including an immutable bit string type some time ago. I never published in opam and please don’t consider it actively maintained. The bit manipulation is done in C and should be fairly well optimised, but that means also it can’t be used in JavaScript (or Mirage?) in its current form. I used it for representing IP addresses, sets of subnets, and maps from subnets.

smolkaj · March 15, 2019, 1:44pm

Sweet, that looks perfect!

fccm · May 17, 2019, 4:43am

If you google for “functional bitfield library ocaml” you should find this project:

If some functions are missing just ask me, I will be happy to add it!
I’m surprised you were not able to find it, it’s there on GitHub since 2013, and since May 4, 2019 it’s in opam:
https://opam.ocaml.org/packages/funbits/

Chet_Murthy · May 17, 2019, 5:08am

I assume you’re not planning on dealing with sparsity? So if bit number N is on, the bitset will take space O(N) bits? In that case, why not (heh) try using Zarith and arbitrary-precision integers? Just a thought!

ivg · May 17, 2019, 1:24pm

You may find the following libraries interesting

OCaml Zarith a very efficient implementation of multi-precision integer arithmetics.
OCaml Big_int less efficient implementation which was for a long time part of OCaml distribution.
Bignum - a Core Flavored wrapper around Zarith.
BAP Bitvector implements modular arithmetic on top of Zarith.
Bitvec a more efficient but not yet released implementation of modular arithmetic on top of Zarith.

The Zarith library is like a standard de-facto in the research community. It is extremely effective, with lots of low-level optimizations. When you bitvector fits into the native int, it will use use the native int representation. All operations for small vectors. are implemented in assembly. For longer bitvectors, the GNU GMP library is used, which is also very efficient. Of course, bitvectors are all immutable.

Topic		Replies	Views
Scala Vector-like structure in OCaml Learning datastructures	11	5057	February 21, 2018
Several Ocaml questions Learning	3	1240	December 19, 2017
Code review: chapter 14 exercises of ORA book Learning	7	778	June 20, 2019
Getting the n-th bit of a integer Learning help	5	1045	October 21, 2021
Type for mutable sets Learning	4	697	May 31, 2022

Bit sequences in OCaml

Related topics