Tyxml performance

katel0k · June 7, 2025, 11:41am

Hello! I’m writing my diploma, where I’m profiling and comparing different solutions for templating HTML. My eye fell on tyxml, and I have found an interesting quirk. Simple benchmark for generating lots of more or less identical divs gave me this graph for times. Ox corresponds to amount of divs and Oy to time spent rendering result. It does that for smaller values as well (it is poorly shown on the graph), first such spikes I found at 17 and 25 divs. I dont think that there is anything special about these values, rather the spacing between them. If you run the snippet below, first 8 times are okay and then there is a spike. Then another (although smaller) spike after 8 more lines. Tbh I’m mostly baffled as to what is going on:) Graph was built using the same code, just for more values, and rerun several times with averaging.

I’m looking for possible explanations of this behaviour, and maybe if It can (or even should) be fixed? Of course I’m not excluding the possibility of my code being flawed, so here is the snippet that I’m using for testing

let repeating_value ~is_colored = Tyxml.Html.(
  div ~a:[a_class
    (["repeating_value"] @ (if is_colored then ["repeating_values-colored"] else [""]))
    ]
)

let rec _stress_render volume ind =
  let open Tyxml.Html in
    if ind < volume then
      (repeating_value ~is_colored:(ind mod 2 == 0) [txt (string_of_int ind)]) :: (_stress_render volume (ind + 1))
    else
      []

let stress_render volume = Tyxml.Html.(
  html
  (head (title @@ txt "Stress") [])
  (body [
    div (_stress_render volume 0)
  ])
)

let st volume = (Format.asprintf "%a" @@ Tyxml.Html.pp ()) (stress_render volume)

let time f x =
  let t = Sys.time() in
  let fx = f x in
  Printf.printf "%f\n" (Sys.time() -. t);
  fx

let () =
  time st 17 |> ignore;
  time st 17 |> ignore;
  time st 17 |> ignore;
  time st 17 |> ignore;
  time st 17 |> ignore;
  time st 17 |> ignore;
  time st 17 |> ignore;
  time st 17 |> ignore;
  time st 17 |> ignore;
  time st 17 |> ignore;
  time st 17 |> ignore;
  time st 17 |> ignore;
  time st 17 |> ignore;
  time st 17 |> ignore;
  time st 17 |> ignore;
  time st 17 |> ignore;
  time st 17 |> ignore;
  time st 17 |> ignore;
  time st 17 |> ignore;
  time st 17 |> ignore;

Kakadu · June 7, 2025, 12:32pm

The first guess that comes to my mind, is that is is somehow related to memory allocation. It would be a good idea to write a benchmark using a special library (for example, GitHub - Chris00/ocaml-benchmark: Benchmarking module for OCaml) which could measure both time and allocated memory

EmileTrotignon · June 7, 2025, 2:19pm

Format is a pretty complicated library, something could be happening there

You could use a regular unix profiler to find where the time is spent.

yawaramin · June 7, 2025, 4:54pm

Couple of suggestions, which may not be that important but just as a matter of personal style and preference:

I would do:

("repeating_value" :: if is_colored then ["repeating_values-colored"] else [])

Whenever possible, I would use :: instead of @.

let st volume = (Format.asprintf "%a" @@ Tyxml.Html.pp ()) (stress_render volume)

I would do:

let pp_html = Tyxml.Html.pp ()
let st volume = Format.asprintf "%a" pp_html (stress_render volume)

Whenever possible, lift up values that will always be the same in every invocation of the function, out of the function.

Finally, a small note: ind mod 2 == 0, in this particular case this works fine but usually you should be careful not to use == unless you actually mean to check that the memory address of two values is the same.

katel0k · June 8, 2025, 2:49pm

I have tried to

Just slapped latency and throughput functions from it. However I am struggling to interpret the results. For this run

let () =
  let s = latency1 ~repeat:5 1000L st x in
    print_gc s;
    print_newline ();
  let s = throughput1 ~repeat:5 1 st x in
    print_gc s;
    print_newline ();

I get such results:

Latencies for 1000 iterations of "" (5 runs):
[run 1000 times]:  1.26 WALL ( 1.25 usr +  0.00 sys =  1.26 CPU, minor = 5298.56 MB, major = 180.14 MB) @ 796.03/s (n=1000)
   1.22 WALL ( 1.22 usr +  0.00 sys =  1.22 CPU, minor = 5298.57 MB, major = 180.14 MB) @ 818.02/s (n=1000)
   1.22 WALL ( 1.22 usr +  0.00 sys =  1.22 CPU, minor = 5298.56 MB, major = 180.14 MB) @ 819.67/s (n=1000)
   1.22 WALL ( 1.22 usr +  0.00 sys =  1.22 CPU, minor = 5298.57 MB, major = 180.14 MB) @ 819.65/s (n=1000)
   1.22 WALL ( 1.22 usr +  0.00 sys =  1.22 CPU, minor = 5299.48 MB, major = 180.14 MB) @ 820.54/s (n=1000)
 minor_allocs/iter major_allocs/iter promoted/iter
           5.30 MB         180.14 kB     215.26 kB

Throughputs for "" running 5 times for at least 1 CPU second:
[run >   1 secs]:  1.05 WALL ( 1.05 usr +  0.00 sys =  1.05 CPU, minor = 4561.48 MB, major = 155.10 MB) @ 817.98/s (n=861)
   1.05 WALL ( 1.05 usr +  0.00 sys =  1.05 CPU, minor = 4562.40 MB, major = 155.10 MB) @ 819.55/s (n=861)
   1.05 WALL ( 1.05 usr +  0.00 sys =  1.05 CPU, minor = 4561.48 MB, major = 155.10 MB) @ 820.86/s (n=861)
   1.05 WALL ( 1.05 usr +  0.00 sys =  1.05 CPU, minor = 4562.40 MB, major = 155.10 MB) @ 819.61/s (n=861)
   1.05 WALL ( 1.05 usr +  0.00 sys =  1.05 CPU, minor = 4561.48 MB, major = 155.10 MB) @ 818.61/s (n=861)
 minor_allocs/iter major_allocs/iter promoted/iter
           5.30 MB         180.14 kB     215.24 kB

Could you please tell me what does it mean/where I can read up on it?
Another observation is that if I add old time function call right after those tests, this behaviour is not observed. So my guess is that those spikes correspond to memory allocations for garbage collector.
Another thing is that right before the spike, the opposite happens - code runs several times faster. I have tried plotting time spent per test relative to previous test and got this graph. Values below 1 correspond to the speedup spikes. Interestingly, frequency of spikes decreases with size, which is also unexplained.

conroj · June 8, 2025, 4:40pm

The reason why folks are recommending a benchmarking library like ocaml-benchmark is that the author has already done work to avoid common sources of noise. For example, you can see here how the library forces a GC collection in between benchmark iterations to reduce the performance impact of automatic collections.

Drup · July 23, 2025, 3:41pm

Interesting !

Format.asprintf emits inside a buffer. IIRC, This buffer is expanded by a constant factor (it’s not an exponential doubling) every time its end is reached, which incurs a copy of the content. I would not be surprised at all if this is what you are seeing.

To confirm that, you could try emiting directly to a chan to /dev/null.

Topic		Replies	Views
Generating HTML: TyXML vs Mustache Ecosystem	7	2196	January 10, 2021
Convert HTML to TyXML Learning tyxml	9	962	September 27, 2023
Htmx/htmlact web development approach Community	51	6613	June 12, 2022
> 1s lag on https://github.com/janestreet/incr_dom/tree/master/example/ts_gui Learning	20	905	May 20, 2023
Mysterious performance difference between close to identical programs Learning multicore	8	1252	January 17, 2022

Tyxml performance

Related topics