 # How to compare Lists

How can I compare more than two lists and return the longest from them? (Is it possible when I do not know an exact number of lists?)

you can save both the current longest list and its length so you only need to iterate once.

Naive idea: pass a list of lists, recursively compute the length of each element and save the max into an accumulator variable.

You mean you need to find the longest list from a list of lists? Then you just need to find the max element, e.g.,

``````(* file demo.ml *)
open Base

let longest = List.max_elt ~compare:(fun x y ->
compare (List.length x) (List.length y))
``````

To build use (once)

``````dune init exe demo --libs base
``````

To run use

``````dune exec ./demo.exe
``````
1 Like

Maybe you want to compute lengths lazily, for instance if you have one list that is much longer than the others. Essentially, you’d want a generalization of `List.compare_lengths`. You may do it like this.

``````(* Return the index of the (first) longest list. *)
let longest_list : 'a list list -> int =
let pop_one tail =
begin match tail with
| (_, [])    -> None
| (i, _::tl) -> Some (i, tl)
end
in
let rec longest tails =
begin match tails with
(* no candidate remaining: won’t happen *)
| [] ->
assert false
(* only one candidate remaining: we have a winner *)
| (i, _) :: [] ->
i
(* at least two candidates remaining: continue selection *)
| (i, _) :: _ ->
(* eliminate empty candidates, pop an element from others: *)
begin match List.filter_map pop_one tails with
(* if we have just eliminated all remaining candidates, then
* then they were all the same length; we return the first one
* (not fair!) *)
| []     -> i
(* otherwise, continue selection *)
| tails' -> longest tails'
end
end
in
fun lists ->
begin match lists with
| [] -> raise Not_found (* cannot find the longest of no lists *)
| _  -> longest (List.mapi (fun i li -> (i, li)) lists)
end
``````

The idea of this code is to repeatedly pop an element from each list, eliminating tails as soon as they become empty. When only one tail remains, that is the tail of your longest list. In order to recover the full list at that step, you need to have paired the tails with some bit of info. Here I pair them with the index of their corresponding list in the list of lists but, depending on your needs, could pair it with the list itself (using `(fun _ li -> (li, li))`).

Wouldn’t you take advantage of `List.compare_lengths` here? (In the standard library since OCaml 4.05).

``````open Base

let longest = List.max_elt ~compare:List.compare_lengths
``````

Thank you everyone <3

I am not sure that this function is in the `Base`'s `List`, iirc it is not.

Should be then available as `Caml.List.compare_lengths`.

It’s worth noting, @Maelan, that while worrying about the cost of `List.length`, the function you’ve introduced to replace it does masses of allocation! You can generalise `List.compare_lengths` without needing Base; `max_elt` is just a fold:

``````let biggest_list = function
| [] -> invalid_arg "biggest_list"
| lists ->
let f max list =
if List.compare_lengths list max > 0 then
list
else
max
in
List.fold_left f [] lists
``````

You can get a sense of the allocations going on in your function by compiling code with `-dlambda` and looking for `makeblock` calls (there’s always one at the end of a module for the module itself): e.g. the above `biggest_list`:

``````(let
(biggest_list/80 =
(function lists/82
(if lists/82
(let
(f/83 =
(function max/85 list/86
(if
(>
(apply (field 1 (global Stdlib__list!)) list/86 max/85)
0)
list/86 max/85)))
(apply (field 22 (global Stdlib__list!)) f/83 0a lists/82))
(apply (field 0 (global Stdlib!)) "biggest_list"))))
``````

compared, for example, with your `pop_one`:

``````(let
(pop_one/81 =
(function tail/83
(let (*match*/161 =a (field 1 tail/83))
(if *match*/161
(makeblock 0
(makeblock 0 (field 0 tail/83) (field 1 *match*/161)))
0a))))
``````
1 Like

I would advise against mixing two standard libraries. I also strive for simplicity and readability and don’t think that novice users need to be overwhelmed by different variants of libraries and premature optimizations of such a simple function.

1 Like

Indeed! My function minimizes the number of cons cells read (so has the right asymptotic complexity), but it does keep allocating lists and pairs (options do not really matter, I hope they are optimized away with enough inlining). We can avoid it using mutable fields and a mutable linked list, but… at that point I don’t think there is any advantage over your function.

Initially I thought that, maybe, folding with `List.compare_lengths` might have a worse complexity, but now that I think about it, I realize it is just a factor 2*. So your allocation-less function is much better.

[*: If L is the length of the longest list, and ℓ1, ℓn are the lengths of all lists but the longest one, then the number of cons cells read by my function is ℓ1 + … + ℓn + max(ℓ1, …, ℓn), while with your function it is 2ℓ1 + … + 2ℓn, because `List.compare_lengths xs ys` reads `2 * min (length xs) (length ys)` cons cells.]

Your complexity analysis ignores the call to `List.filter_map` in your function - it builds a fresh list of lists each time with one element removed from each list (and empty lists then filtered). That’s expensive - much more expensive than traversing the lists.

A micro-benchmark: two random lists of lists, both with a total of million items (i.e. same number of cons cells). `l1` has 40730 lists where the largest individual list is 49 elements long; `l2` has just 35 lists where the largest individual list is 49813 elements. Calling that 100 times with each function gives something comparable for me.

In bytecode, `biggest_list` takes roughly 2.5 seconds on `l1` and 1.5s on `l2` and performs no allocations so is unimpeded by the GC. `longest_list` takes 21s on `l1` (and does 4389 minor collections and 200 major collections) and 6s on `l2` (4264 minor collections and 0 major collections).

Native code doesn’t affect the relative performance of the functions, and neither benefits from flambda with -O3.