How to compare Lists

Luka_Karoiani · May 7, 2021, 10:10am

How can I compare more than two lists and return the longest from them? (Is it possible when I do not know an exact number of lists?)

LdBeth · May 7, 2021, 11:16am

you can save both the current longest list and its length so you only need to iterate once.

drbb · May 7, 2021, 11:53am

Naive idea: pass a list of lists, recursively compute the length of each element and save the max into an accumulator variable.

ivg · May 7, 2021, 12:34pm

You mean you need to find the longest list from a list of lists? Then you just need to find the max element, e.g.,

(* file demo.ml *)
open Base

let longest = List.max_elt ~compare:(fun x y ->
    compare (List.length x) (List.length y))

To build use (once)

dune init exe demo --libs base

To run use

dune exec ./demo.exe

Maelan · May 8, 2021, 10:53am

Maybe you want to compute lengths lazily, for instance if you have one list that is much longer than the others. Essentially, you’d want a generalization of List.compare_lengths. You may do it like this.

(* Return the index of the (first) longest list. *)
let longest_list : 'a list list -> int =
  let pop_one tail =
    begin match tail with
    | (_, [])    -> None
    | (i, _::tl) -> Some (i, tl)
    end
  in
  let rec longest tails =
    begin match tails with
    (* no candidate remaining: won’t happen *)
    | [] ->
        assert false
    (* only one candidate remaining: we have a winner *)
    | (i, _) :: [] ->
        i
    (* at least two candidates remaining: continue selection *)
    | (i, _) :: _ ->
        (* eliminate empty candidates, pop an element from others: *)
        begin match List.filter_map pop_one tails with
        (* if we have just eliminated all remaining candidates, then
         * then they were all the same length; we return the first one
         * (not fair!) *)
        | []     -> i
        (* otherwise, continue selection *)
        | tails' -> longest tails'
        end
    end
  in
  fun lists ->
    begin match lists with
    | [] -> raise Not_found (* cannot find the longest of no lists *)
    | _  -> longest (List.mapi (fun i li -> (i, li)) lists)
    end

The idea of this code is to repeatedly pop an element from each list, eliminating tails as soon as they become empty. When only one tail remains, that is the tail of your longest list. In order to recover the full list at that step, you need to have paired the tails with some bit of info. Here I pair them with the index of their corresponding list in the list of lists but, depending on your needs, could pair it with the list itself (using (fun _ li -> (li, li))).

Maelan · May 8, 2021, 11:08am

Wouldn’t you take advantage of List.compare_lengths here? (In the standard library since OCaml 4.05).

open Base

let longest = List.max_elt ~compare:List.compare_lengths

Luka_Karoiani · May 8, 2021, 12:26pm

Thank you everyone <3

ivg · May 8, 2021, 6:13pm

I am not sure that this function is in the Base's List, iirc it is not.

Leonidas · May 10, 2021, 8:15am

Should be then available as Caml.List.compare_lengths.

dra27 · May 10, 2021, 10:53am

It’s worth noting, @Maelan, that while worrying about the cost of List.length, the function you’ve introduced to replace it does masses of allocation! You can generalise List.compare_lengths without needing Base; max_elt is just a fold:

let biggest_list = function
| [] -> invalid_arg "biggest_list"
| lists ->
    let f max list =
      if List.compare_lengths list max > 0 then
        list
      else
        max
    in
    List.fold_left f [] lists

You can get a sense of the allocations going on in your function by compiling code with -dlambda and looking for makeblock calls (there’s always one at the end of a module for the module itself): e.g. the above biggest_list:

(let
  (biggest_list/80 =
    (function lists/82
      (if lists/82
        (let
          (f/83 =
             (function max/85 list/86
               (if
                 (>
                    (apply (field 1 (global Stdlib__list!)) list/86 max/85)
                   0)
                 list/86 max/85)))
          (apply (field 22 (global Stdlib__list!)) f/83 0a lists/82))
        (apply (field 0 (global Stdlib!)) "biggest_list"))))

compared, for example, with your pop_one:

(let
  (pop_one/81 =
     (function tail/83
        (let (*match*/161 =a (field 1 tail/83))
           (if *match*/161
             (makeblock 0
               (makeblock 0 (field 0 tail/83) (field 1 *match*/161)))
             0a))))

ivg · May 10, 2021, 2:12pm

I would advise against mixing two standard libraries. I also strive for simplicity and readability and don’t think that novice users need to be overwhelmed by different variants of libraries and premature optimizations of such a simple function.

Maelan · May 10, 2021, 4:12pm

Indeed! My function minimizes the number of cons cells read (so has the right asymptotic complexity), but it does keep allocating lists and pairs (options do not really matter, I hope they are optimized away with enough inlining). We can avoid it using mutable fields and a mutable linked list, but… at that point I don’t think there is any advantage over your function.

Initially I thought that, maybe, folding with List.compare_lengths might have a worse complexity, but now that I think about it, I realize it is just a factor 2*. So your allocation-less function is much better.

[*: If L is the length of the longest list, and ℓ1, ℓn are the lengths of all lists but the longest one, then the number of cons cells read by my function is ℓ1 + … + ℓn + max(ℓ1, …, ℓn), while with your function it is 2ℓ1 + … + 2ℓn, because List.compare_lengths xs ys reads 2 * min (length xs) (length ys) cons cells.]

dra27 · May 10, 2021, 7:13pm

Your complexity analysis ignores the call to List.filter_map in your function - it builds a fresh list of lists each time with one element removed from each list (and empty lists then filtered). That’s expensive - much more expensive than traversing the lists.

A micro-benchmark: two random lists of lists, both with a total of million items (i.e. same number of cons cells). l1 has 40730 lists where the largest individual list is 49 elements long; l2 has just 35 lists where the largest individual list is 49813 elements. Calling that 100 times with each function gives something comparable for me.

In bytecode, biggest_list takes roughly 2.5 seconds on l1 and 1.5s on l2 and performs no allocations so is unimpeded by the GC. longest_list takes 21s on l1 (and does 4389 minor collections and 200 major collections) and 6s on l2 (4264 minor collections and 0 major collections).

Native code doesn’t affect the relative performance of the functions, and neither benefits from flambda with -O3.

Topic		Replies	Views
Some embarrassing code Learning	4	1835	August 7, 2017
Need help please about some exercices Learning	8	825	September 13, 2022
Traversing matrix Learning	5	1259	May 10, 2020
I need help with ocaml Learning announce	1	757	April 29, 2019
How do I reduce complexity of this algorithm that take first and last element? Learning	27	981	February 16, 2024

How to compare Lists

Related topics