Performance of `Printf.sprintf`

With core_bench I discovered a hot path in my code: formatting a date to a string was rather slow with Calendar. I went searching for faster implementations and found Ptime and Core's strftime.
The conclusion is that Core is by far the fastest, so I would like to use that. The problem is that I also need Windows support, so Core seems to be out.

So, because I only need fixed and basic formatting, I was thinking of just rolling my own implementation:

let format ({tm_year; tm_mon; tm_mday; tm_hour; tm_min; tm_sec; _} : Unix.tm) =
  Printf.sprintf "%04d-%02d-%02dT%02d:%02d:%02dZ"
    (1900 + tm_year) (tm_mon + 1) tm_mday tm_hour tm_min tm_sec

But this is still 3-4 times slower than Core. I'm not sure why that is; maybe because Core delegates to the native C strftime?
Still, I had not expected such a large difference.

Is there anything I can do to close the gap?

Or is there an option with dune to use Core on linux and mac, but my own implementation on windows?



Yes, Core just calls the C strftime function (source). So you could presumably do something similar on Windows. I don't know what the Windows C stubs would look like, unfortunately.
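On the dune question: one way to swap implementations per platform is dune's `(select ... from ...)` construct, which picks a source file depending on which libraries are available. A hedged sketch (library, module, and file names are hypothetical), assuming the Core-backed file only builds where `core_unix` is installed:

```dune
(library
 (name date_fmt)
 (libraries
  (select date_format.ml
   from
   ;; If core_unix resolves, use the Core-backed implementation ...
   (core_unix -> date_format.core.ml)
   ;; ... otherwise fall back to the portable hand-rolled one.
   (-> date_format.portable.ml))))
```

Note that `select` keys on library availability rather than on the OS directly; if you need a genuinely OS-conditional rule, dune also exposes an `%{os_type}` variable usable with `enabled_if`.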

Are there possible optimisations to my code that could get the performance closer to strftime? I tested building the string myself with concatenation, but that is similar to or slower than sprintf.
To be honest, I had expected the difference to be a bit smaller. It's not too bad, but I'm formatting millions of dates, so it adds up.
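For context, the concatenation variant I tested looked roughly like this (a reconstruction, not the exact code; `pad2` is a hypothetical helper):

```ocaml
(* Zero-pad a number below 100 to two characters. *)
let pad2 n = if n < 10 then "0" ^ string_of_int n else string_of_int n

(* Build the timestamp by plain concatenation. Every ^ allocates a fresh
   intermediate string, which is presumably why this is no faster. *)
let format_concat y mo d h mi s =
  string_of_int y ^ "-" ^ pad2 mo ^ "-" ^ pad2 d
  ^ "T" ^ pad2 h ^ ":" ^ pad2 mi ^ ":" ^ pad2 s ^ "Z"
```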

Did you try to create a Bytes.t and write in-place the date? Your format looks like it is mostly fixed size, and you can probably afford a slow branch for years outside of the [1000,9999] interval.

I haven't tried with bytes, so I will try that. I'm not sure what you mean by writing the date in-place, though. I'm only interested in dates in the 2015-2025 range.

I meant something like:

let memory = Bytes.of_string "2015-12-31T23:58:59Z"
let set pos int = Bytes.unsafe_set memory pos (Char.chr (48 + int))
let print year month day h m s =
  set 2 (year / 10 mod 10); set 3 (year mod 10); (* "20__" prefix stays fixed *)
  set 5 (month / 10); set 6 (month mod 10);
  set 8 (day / 10); set 9 (day mod 10);
  set 11 (h / 10); set 12 (h mod 10);
  set 14 (m / 10); set 15 (m mod 10);
  set 17 (s / 10); set 18 (s mod 10);
  Bytes.to_string memory

Wow, thank you very much. This is even 4 times faster than the C strftime implementation, so it is exactly what I need.

It's also very good for me to learn that even plain string concatenation can be slower than writing into a Bytes.t. I didn't think it would make this much of an impact here.


This is a nice demonstration of a general optimization principle: it is *amazing* how often "allocation avoidance" can speed up programs. Just amazing. And this holds across many different languages, but especially in GC'd languages.

A useful way to profile programs is to use a "clock" that counts "bytes allocated", and then perform the same analyses you would if the clock were "seconds elapsed". Finding allocation hotspots, in short.
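In OCaml specifically, `Gc.allocated_bytes` makes such an allocation clock easy to set up; a minimal sketch (the `allocated_by` wrapper is my own, not a library function):

```ocaml
(* Run [f x] once and report how many bytes it allocated, using the GC's
   running counter of minor- plus major-heap allocation. *)
let allocated_by f x =
  let before = Gc.allocated_bytes () in
  ignore (Sys.opaque_identity (f x));
  Gc.allocated_bytes () -. before

(* Example: how much does one sprintf call allocate? *)
let () =
  Printf.printf "sprintf: %.0f bytes\n"
    (allocated_by (fun n -> Printf.sprintf "%04d" n) 2025)
```

`Sys.opaque_identity` keeps the compiler from optimising the call away, so the measurement reflects the real work.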


Not exactly the OP's question, but the following post may be of interest as well:


Obligatory mention of pmpa:


w00t! Thank you, ygrek! I don’t need it today, but … I’m sure I will sooner or later!