Continuing the discussion from Seq vs List, optimization:
The above discussion is some years old and discusses a specific case. I wonder about list vs Seq.t on a more general level (and not sure if that old thread still reflects current OCaml with respect to performance).
When writing functions that process an ordered βlistβ of elements, I sometimes feel unsure whether I should use list or Seq.t (see Seq in the manual).
I understand that a list will reside completely in memory while a sequence may be lazy. So when I donβt want to keep all elements in memory (or if they havenβt even been generated by the time I start processing the first element), I will have to use Seq.t or some similar structure or mechanism.
Is it correct that OCaml distinguishes between list and Seq.t because it is not lazy by default? In Haskell, for example, both cases are covered by the same data type, right? In OCaml I have to make that decision myself, right?
I experimented a bit with speed by creating a list/sequence of 10,000 elements and consuming them. I have looked at three cases:
- A list where I consume the elements in the same order as they have been created
- by reversing it explicitly using
List.revafter adding all the elements, - by using non-tail-call recursion (accepting stack growth),
- by using the Tail Modulo Constructor transformation.
- by reversing it explicitly using
- A list where process the elements in reverse order of their creation (i.e. not reversing it).
- A sequence.
This is the code I ran:
let consume_list : int list -> int =
let rec aux acc = function
| [] -> acc
| x :: xs -> aux (acc + x) xs
in
aux 0
let consume_seq : int Seq.t -> int =
let rec aux acc seq = match Seq.uncons (seq) with
| None -> acc
| Some (x, xs) -> aux (acc + x) xs
in
aux 0
let provide_list_rev : unit -> int list =
let rec aux acc = function
| i when i < 10000 -> aux (i :: acc) (i + 1)
| _ -> List.rev acc
in
fun () -> aux [] 0
let provide_list_no_tail_call : unit -> int list =
let rec aux = function
| i when i < 10000 -> i :: aux (i + 1)
| _ -> []
in
fun () -> aux 0
let provide_list_tail_mod_cons : unit -> int list =
let[@tail_mod_cons] rec aux = function
| i when i < 10000 -> i :: aux (i + 1)
| _ -> []
in
fun () -> aux 0
let provide_list_lifo : unit -> int list =
let rec aux acc = function
| i when i < 10000 -> aux (i :: acc) (i + 1)
| _ -> acc
in
fun () -> aux [] 0
let provide_seq : unit -> int Seq.t =
let rec aux i () = match i with
| i when i < 10000 -> Seq.Cons (i, aux (i + 1))
| _ -> Seq.Nil
in
fun () -> aux 0
let () =
let args = ref [] in
Arg.parse [] (fun arg -> args := arg::!args) "";
let func = match !args with
| ["1"] -> fun () -> ignore @@ consume_list @@ provide_list_rev ()
| ["2"] -> fun () -> ignore @@ consume_list @@ provide_list_no_tail_call ()
| ["3"] -> fun () -> ignore @@ consume_list @@ provide_list_tail_mod_cons ()
| ["4"] -> fun () -> ignore @@ consume_list @@ provide_list_lifo ()
| ["5"] -> fun () -> ignore @@ consume_seq @@ provide_seq ()
| _ -> failwith "Unexpected command line argument"
in
for i = 1 to 100000 do
func ()
done
Guess what is fastest on my machine, using ocamlopt -O2 and taking the median of three runs each? (click below to uncover)
- List and consume in same order as creation
- Case #1 using
List.rev: 7.59 seconds - Case #2 using non-tail-call recursion: 5.53 seconds
- Case #3 using tail-modulo-cons: 6.75 seconds
- Case #1 using
- Case #4 using list in last-in-first-out order: 3.28 seconds
- Case #5 using a sequence: 4.45 seconds
Of course, these results may be non-representative in real-world scenarios and may also depend on data-size etc. And I didnβt use a sophisticated benchmarking framework, so donβt take these numbers too seriously.
But interestingly I see no clear performance advantage of list over Seq.t, unless the order of processing the created elements doesnβt matter or I can create or process the list in reverse order, such that List.rev or a similar mechanism becomes unnecessary.
Does that mean I should use Seq.t more often in cases where I donβt need a persistent list?
I have also been wondering about this:
Are there some downsides of Seq that I should know about, @dbuenzli? My rough benchmark above makes me feel like Seq.t is a good alternative to list in many cases (even though the syntax is sometimes a bit more confusing, using Seq.Cons and Seq.uncons).