How can I compare more than two lists and return the longest from them? (Is it possible when I do not know an exact number of lists?)
you can save both the current longest list and its length so you only need to iterate once.
Naive idea: pass a list of lists, recursively compute the length of each element and save the max into an accumulator variable.
You mean you need to find the longest list from a list of lists? Then you just need to find the max element, e.g.,
(* file demo.ml *) open Base let longest = List.max_elt ~compare:(fun x y -> compare (List.length x) (List.length y))
To build use (once)
dune init exe demo --libs base
To run use
dune exec ./demo.exe
Maybe you want to compute lengths lazily, for instance if you have one list that is much longer than the others. Essentially, you’d want a generalization of
List.compare_lengths. You may do it like this.
(* Return the index of the (first) longest list. *) let longest_list : 'a list list -> int = let pop_one tail = begin match tail with | (_, ) -> None | (i, _::tl) -> Some (i, tl) end in let rec longest tails = begin match tails with (* no candidate remaining: won’t happen *) |  -> assert false (* only one candidate remaining: we have a winner *) | (i, _) ::  -> i (* at least two candidates remaining: continue selection *) | (i, _) :: _ -> (* eliminate empty candidates, pop an element from others: *) begin match List.filter_map pop_one tails with (* if we have just eliminated all remaining candidates, then * then they were all the same length; we return the first one * (not fair!) *) |  -> i (* otherwise, continue selection *) | tails' -> longest tails' end end in fun lists -> begin match lists with |  -> raise Not_found (* cannot find the longest of no lists *) | _ -> longest (List.mapi (fun i li -> (i, li)) lists) end
The idea of this code is to repeatedly pop an element from each list, eliminating tails as soon as they become empty. When only one tail remains, that is the tail of your longest list. In order to recover the full list at that step, you need to have paired the tails with some bit of info. Here I pair them with the index of their corresponding list in the list of lists but, depending on your needs, could pair it with the list itself (using
(fun _ li -> (li, li))).
Wouldn’t you take advantage of
List.compare_lengths here? (In the standard library since OCaml 4.05).
open Base let longest = List.max_elt ~compare:List.compare_lengths
Thank you everyone <3
I am not sure that this function is in the
List, iirc it is not.
Should be then available as
It’s worth noting, @Maelan, that while worrying about the cost of
List.length, the function you’ve introduced to replace it does masses of allocation! You can generalise
List.compare_lengths without needing Base;
max_elt is just a fold:
let biggest_list = function |  -> invalid_arg "biggest_list" | lists -> let f max list = if List.compare_lengths list max > 0 then list else max in List.fold_left f  lists
You can get a sense of the allocations going on in your function by compiling code with
-dlambda and looking for
makeblock calls (there’s always one at the end of a module for the module itself): e.g. the above
(let (biggest_list/80 = (function lists/82 (if lists/82 (let (f/83 = (function max/85 list/86 (if (> (apply (field 1 (global Stdlib__list!)) list/86 max/85) 0) list/86 max/85))) (apply (field 22 (global Stdlib__list!)) f/83 0a lists/82)) (apply (field 0 (global Stdlib!)) "biggest_list"))))
compared, for example, with your
(let (pop_one/81 = (function tail/83 (let (*match*/161 =a (field 1 tail/83)) (if *match*/161 (makeblock 0 (makeblock 0 (field 0 tail/83) (field 1 *match*/161))) 0a))))
I would advise against mixing two standard libraries. I also strive for simplicity and readability and don’t think that novice users need to be overwhelmed by different variants of libraries and premature optimizations of such a simple function.
Indeed! My function minimizes the number of cons cells read (so has the right asymptotic complexity), but it does keep allocating lists and pairs (options do not really matter, I hope they are optimized away with enough inlining). We can avoid it using mutable fields and a mutable linked list, but… at that point I don’t think there is any advantage over your function.
Initially I thought that, maybe, folding with
List.compare_lengths might have a worse complexity, but now that I think about it, I realize it is just a factor 2*. So your allocation-less function is much better.
[*: If L is the length of the longest list, and ℓ1, ℓn are the lengths of all lists but the longest one, then the number of cons cells read by my function is ℓ1 + … + ℓn + max(ℓ1, …, ℓn), while with your function it is 2ℓ1 + … + 2ℓn, because
List.compare_lengths xs ys reads
2 * min (length xs) (length ys) cons cells.]
Your complexity analysis ignores the call to
List.filter_map in your function - it builds a fresh list of lists each time with one element removed from each list (and empty lists then filtered). That’s expensive - much more expensive than traversing the lists.
A micro-benchmark: two random lists of lists, both with a total of million items (i.e. same number of cons cells).
l1 has 40730 lists where the largest individual list is 49 elements long;
l2 has just 35 lists where the largest individual list is 49813 elements. Calling that 100 times with each function gives something comparable for me.
biggest_list takes roughly 2.5 seconds on
l1 and 1.5s on
l2 and performs no allocations so is unimpeded by the GC.
longest_list takes 21s on
l1 (and does 4389 minor collections and 200 major collections) and 6s on
l2 (4264 minor collections and 0 major collections).
Native code doesn’t affect the relative performance of the functions, and neither benefits from flambda with -O3.