What's the fastest way to concatenate strings?

So I recently made https://github.com/AriaFallah/ppx_str, and for the actual implementation, I’m breaking the string into a list and then using String.concat to join the final string together. What I’m wondering though is if there’s a faster way to join the strings together.

1 Like

My guess is that using a Buffer should be faster.

From general principles string concatenation is quadratic, and Buffer provides

accumulative concatenation of strings in quasi-linear time

to quote the documentation

1 Like

I wouldn’t bet on it in this case. String.concat is actually pretty efficient as it computes the length of the result from its argument (a list) and then blits the list items one by one. So there isn’t any allocation of intermediate results.

3 Likes

Not sure if your goal but, if the speed of string concatenation is your main concern, you may want to use a different data structure such as ropes.

1 Like

Hmm this would be kinda overkill.


So from the above comments is there no benchmark that compares ^ vs Buffer vs String.concat vs whatever else, and can decisively say one is better than the other?

Buffer is nice if you want to concatenate a lot of strings, the number and size of which are unknown. If you already have a list of strings that you concatenate to get your result, I don’t see how String.concat wouldn’t be the best.

3 Likes

Bit the bullet and wrote an actual benchmark. It’s by no means scientific, but it’s better than the speculation we have here. Results show that String.concat is the fastest

5 strings x 10000 length
┌───────────────┬──────────┬─────────┬───────────┬──────────┬────────────┐
│ Name          │ Time/Run │ mWd/Run │  mjWd/Run │ Prom/Run │ Percentage │
├───────────────┼──────────┼─────────┼───────────┼──────────┼────────────┤
│ ^ operator    │   5.59us │ 379.00w │ 1_506.43w │    0.43w │    100.00% │
│ String.concat │   2.30us │         │   627.00w │          │     41.13% │
│ Buffer        │   5.03us │   9.00w │ 1_254.00w │          │     89.95% │
└───────────────┴──────────┴─────────┴───────────┴──────────┴────────────┘

5 strings x 3 length
┌───────────────┬──────────┬─────────┬────────────┐
│ Name          │ Time/Run │ mWd/Run │ Percentage │
├───────────────┼──────────┼─────────┼────────────┤
│ ^ operator    │ 112.33ns │  13.00w │    100.00% │
│ String.concat │  76.13ns │   3.00w │     67.78% │
│ Buffer        │  86.06ns │  15.00w │     76.61% │
└───────────────┴──────────┴─────────┴────────────┘

1000 strings x 1000 length
┌───────────────┬──────────────┬─────────┬─────────────┬──────────┬────────────┐
│ Name          │     Time/Run │ mWd/Run │    mjWd/Run │ Prom/Run │ Percentage │
├───────────────┼──────────────┼─────────┼─────────────┼──────────┼────────────┤
│ ^ operator    │ 227_268.35us │ 378.89w │ 62_564.12kw │    0.11w │    100.00% │
│ String.concat │     421.89us │         │    125.00kw │          │      0.19% │
│ Buffer        │     965.91us │   9.35w │    250.00kw │          │      0.43% │
└───────────────┴──────────────┴─────────┴─────────────┴──────────┴────────────┘

1000 strings x 3 length
┌───────────────┬──────────┬────────────┬─────────────┬──────────┬────────────┐
│ Name          │ Time/Run │    mWd/Run │    mjWd/Run │ Prom/Run │ Percentage │
├───────────────┼──────────┼────────────┼─────────────┼──────────┼────────────┤
│ ^ operator    │ 444.82us │ 88_404.00w │ 100_920.97w │   74.97w │    100.00% │
│ String.concat │  16.43us │            │     377.01w │          │      3.69% │
│ Buffer        │  15.46us │      9.00w │     754.00w │          │      3.48% │
└───────────────┴──────────┴────────────┴─────────────┴──────────┴────────────┘

10000 strings x 3 length
┌───────────────┬──────────┬─────────┬──────────┬────────────┐
│ Name          │ Time/Run │ mWd/Run │ mjWd/Run │ Percentage │
├───────────────┼──────────┼─────────┼──────────┼────────────┤
│ String.concat │ 173.52us │         │   3.75kw │    100.00% │
│ Buffer        │ 154.85us │   9.03w │   7.50kw │     89.24% │
└───────────────┴──────────┴─────────┴──────────┴────────────┘
3 Likes

Cool. That’s pretty much what I was expecting. String.concat is nice but is has to traverse the list of strings twice, once to compute the resulting string length, and then to do the blitting.

If there are a lot of small strings, Buffer wins (a little)

2 Likes

the Buffer module works like String.concat but by over-allocating by doubling the buffer size and copying every time you resize:

type t =
 {mutable buffer : bytes;
  mutable position : int;
  mutable length : int;
  initial_buffer : bytes}

So while the run time complexity can be amortized, it performs pretty bad in some cases; if you mispredict the original size you always end up over-allocating and copying.
Re-reading the code I think it has a buffer overflow in there, too. I try to avoid it when possible.

@cfcs if that is the case, please send a PR to fix it.

2 Likes

It is not surprising that String.concat is the fastest one to concat strings from a list, it is built for it! :slight_smile:

However, I would imagine, one would want to compare its performance in some trickier situation. One thing that comes to mind is constructing a string from an escaped string while lexing. In this case, to use String.concat, you would have to cons each smaller string to a list, once you’re done you need to reverse it, then String.concat would have to iterate the list once more to calculate the sum of the lengths of strings to allocate the buffer for the result string, and then iterate once more to blit the buffer.

Now consider ropes, like the one in Core. It’s a pretty simple data structure. In the lexing example, you would append each small string to the rope tree, and as you do it, rope maintains the full length of the rope, so you don’t need to iterate to get it. Once you’re done constructing the rope, it has the length of the rope, which is used to allocate the buffer, and then blit the buffer in one go. In this case you would have to iterate once, and not three times, like in the case of String.concat.

If I have time in the evening I will add Rope to your benchmark. In your case String.concat gets its list constructed “for free”, so the results might be either way.

2 Likes

I added Core_kernel.Rope to the benchmark. I’ve also changed how strings are “fed”. The benchmark function used to take a list of strings (which I think is unfair). I’ve changed it to an array, so the benchmark takes into account construction and reversal of the list in case of String.concat. Still, the results are mixed.

~/Developer/ocaml-str-concat-benchmark   
$ ./_build/default/bench.exe 
5 strings x 10000 length
Estimated testing time 40s (4 benchmarks x 10s). Change using -quota SECS.
┌───────────────┬──────────┬─────────┬───────────┬──────────┬────────────┐
│ Name          │ Time/Run │ mWd/Run │  mjWd/Run │ Prom/Run │ Percentage │
├───────────────┼──────────┼─────────┼───────────┼──────────┼────────────┤
│ ^ operator    │  11.85us │ 379.00w │ 1_506.48w │    0.48w │    100.00% │
│ String.concat │   4.64us │  30.00w │   627.03w │          │     39.18% │
│ Buffer        │  11.24us │  10.00w │ 1_254.00w │          │     94.87% │
│ Rope          │   6.28us │  81.01w │   627.05w │          │     52.96% │
└───────────────┴──────────┴─────────┴───────────┴──────────┴────────────┘
5 strings x 3 length
Estimated testing time 40s (4 benchmarks x 10s). Change using -quota SECS.
┌───────────────┬──────────┬─────────┬────────────┐
│ Name          │ Time/Run │ mWd/Run │ Percentage │
├───────────────┼──────────┼─────────┼────────────┤
│ ^ operator    │ 172.21ns │  13.00w │     68.50% │
│ String.concat │ 206.19ns │  33.00w │     82.01% │
│ Buffer        │ 139.69ns │  16.00w │     55.56% │
│ Rope          │ 251.41ns │  84.00w │    100.00% │
└───────────────┴──────────┴─────────┴────────────┘
1000 strings x 1000 length
Estimated testing time 40s (4 benchmarks x 10s). Change using -quota SECS.
┌───────────────┬──────────────┬────────────┬─────────────┬───────────┬────────────┐
│ Name          │     Time/Run │    mWd/Run │    mjWd/Run │  Prom/Run │ Percentage │
├───────────────┼──────────────┼────────────┼─────────────┼───────────┼────────────┤
│ ^ operator    │ 455_022.26us │    379.00w │ 62_564.12kw │     2.08w │    100.00% │
│ String.concat │     985.36us │  6_000.29w │    125.72kw │   713.81w │      0.22% │
│ Buffer        │   2_220.14us │      9.99w │    250.00kw │           │      0.49% │
│ Rope          │   1_075.99us │ 14_011.86w │    126.19kw │ 1_190.80w │      0.24% │
└───────────────┴──────────────┴────────────┴─────────────┴───────────┴────────────┘
Benchmarks that take 1ns to 100ms can be estimated precisely. For more reliable 
estimates, redesign your benchmark to have a shorter execution time.
1000 strings x 3 length
Estimated testing time 40s (4 benchmarks x 10s). Change using -quota SECS.
┌───────────────┬──────────┬────────────┬─────────────┬──────────┬────────────┐
│ Name          │ Time/Run │    mWd/Run │    mjWd/Run │ Prom/Run │ Percentage │
├───────────────┼──────────┼────────────┼─────────────┼──────────┼────────────┤
│ ^ operator    │ 922.56us │ 88_404.00w │ 100_924.26w │   78.26w │    100.00% │
│ String.concat │  45.71us │  6_000.14w │     431.19w │   54.19w │      4.95% │
│ Buffer        │  22.16us │     10.00w │     754.01w │          │      2.40% │
│ Rope          │  45.13us │ 14_011.54w │     514.67w │  137.67w │      4.89% │
└───────────────┴──────────┴────────────┴─────────────┴──────────┴────────────┘
Benchmarks that take 1ns to 100ms can be estimated precisely. For more reliable 
estimates, redesign your benchmark to have a shorter execution time.
10000 strings x 3 length
Estimated testing time 30s (3 benchmarks x 10s). Change using -quota SECS.
┌───────────────┬──────────┬─────────────┬──────────┬──────────┬────────────┐
│ Name          │ Time/Run │     mWd/Run │ mjWd/Run │ Prom/Run │ Percentage │
├───────────────┼──────────┼─────────────┼──────────┼──────────┼────────────┤
│ String.concat │ 520.63us │  60_001.52w │   9.00kw │   5.25kw │     58.82% │
│ Buffer        │ 251.95us │      10.04w │   7.50kw │          │     28.46% │
│ Rope          │ 885.14us │ 140_016.44w │  18.22kw │  14.47kw │    100.00% │
└───────────────┴──────────┴─────────────┴──────────┴──────────┴────────────┘
Benchmarks that take 1ns to 100ms can be estimated precisely. For more reliable 
estimates, redesign your benchmark to have a shorter execution time.
2 Likes
Code
module Bench = Core_bench.Bench
open Core.Std

module Rope = Core.Std.Rope

let main () =
let short_str = “str” in
let long_str = String.make 1000 ‘c’ in

let short_list_long_str = Array.init 5 ~f:(fun _ -> long_str) in
let short_list_short_str = Array.init 5 ~f:(fun _ -> short_str) in
let long_list_long_str = Array.init 1000 ~f:(fun _ -> long_str) in
let long_list_short_str = Array.init 1000 ~f:(fun _ -> short_str) in
let very_long_list_short_str = Array.init 10000 ~f:(fun _ -> short_str) in

let add_strs array = fun () ->
let s : string = Array.fold_right ~init:"" ~f:(fun a b -> a ^ b) array in
s |> ignore
in

let concat_strs array = fun () ->
let l = Array.fold_right ~init:[] ~f:List.cons array in
let l = List.rev l in
let s : string = String.concat ~sep:"" l in
s |> ignore
in

let buffer_strs size array = fun () ->
let buffer = Buffer.create size in
Array.fold_right ~init:() ~f:(fun s () -> Buffer.add_string buffer s) array;
let s : string = Buffer.contents buffer in
s |> ignore
in

let rope_strs array = fun () ->
let rope = Array.fold_right ~init:Rope.(of_string “”) ~f:(fun a b -> Rope.(of_string a ^ b)) array in
let s : string = Rope.to_string rope in
s |> ignore
in

(* let concat_fair_strs lst = fun () -> String.concat ~sep:"" lst |> ignore in *)

print_endline “5 strings x 10000 length”;
Core.Command.run (Bench.make_command [
Bench.Test.create ~name:"^ operator"
(add_strs short_list_long_str);
Bench.Test.create ~name:“String.concat”
(concat_strs short_list_long_str);
Bench.Test.create ~name:“Buffer”
(buffer_strs 5000 short_list_long_str);
Bench.Test.create ~name:“Rope”
(rope_strs short_list_long_str);
]);

print_endline “5 strings x 3 length”;
Core.Command.run (Bench.make_command [
Bench.Test.create ~name:"^ operator"
(add_strs short_list_short_str);
Bench.Test.create ~name:“String.concat”
(concat_strs short_list_short_str);
Bench.Test.create ~name:“Buffer”
(buffer_strs 15 short_list_short_str);
Bench.Test.create ~name:“Rope”
(rope_strs short_list_short_str);
]);

print_endline “1000 strings x 1000 length”;
Core.Command.run (Bench.make_command [
Bench.Test.create ~name:"^ operator"
(add_strs long_list_long_str);
Bench.Test.create ~name:“String.concat”
(concat_strs long_list_long_str);
Bench.Test.create ~name:“Buffer”
(buffer_strs 1000000 long_list_long_str);
Bench.Test.create ~name:“Rope”
(rope_strs long_list_long_str);
]);

print_endline “1000 strings x 3 length”;
Core.Command.run (Bench.make_command [
Bench.Test.create ~name:"^ operator"
(add_strs long_list_short_str);
Bench.Test.create ~name:“String.concat”
(concat_strs long_list_short_str);
Bench.Test.create ~name:“Buffer”
(buffer_strs 3000 long_list_short_str);
Bench.Test.create ~name:“Rope”
(rope_strs long_list_short_str);
]);

print_endline “10000 strings x 3 length”;
Core.Command.run (Bench.make_command [
Bench.Test.create ~name:“String.concat”
(concat_strs very_long_list_short_str);
Bench.Test.create ~name:“Buffer”
(buffer_strs 30000 very_long_list_short_str);
Bench.Test.create ~name:“Rope”
(rope_strs very_long_list_short_str);
])

let () = main ()

@keleshev could you make a pull request to the repo: https://github.com/AriaFallah/ocaml-str-concat-benchmark?

1 Like

PR: https://github.com/AriaFallah/ocaml-str-concat-benchmark/pull/1