Generating a map at compile time?

I have a program that programmatically defines a very large map (specifically a hash table, though this could be changed). However, the result is always the same (i.e., there is no runtime input that changes the contents of the map). For many runs, simply instantiating this map takes an appreciable percentage of the total runtime (read: up to half!).

Is there a good (or any) way to define this map such that it is compiled into the binary rather than constructed at startup every time? It seems like a problem others may have run into, but I have been unable to find anything.

3 Likes

This is not exactly what you are asking, but you can use Marshal to save the already-constructed map as a binary blob, either in a file or as literal data in your program, and read it back at runtime.
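A minimal round-trip sketch of that suggestion, with a small association list standing in for the real map (in practice you would write the blob to a file or embed it, rather than keep it in the same program):

```ocaml
(* Minimal Marshal round-trip sketch. The association list is a
   placeholder for the real precomputed table; to_string/from_string
   are stand-ins for writing to and reading from a file. *)
let blob : string =
  Marshal.to_string [ (1, "one"); (2, "two") ] []

let revived : (int * string) list =
  Marshal.from_string blob 0
```

Note that `Marshal.from_string` is untyped (`'a`), so the annotation on `revived` is load-bearing: unmarshalling at the wrong type is undefined behaviour.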

Cheers,
Nicolas

2 Likes

Is the computation of the data the problem, or the construction? If the latter, you could run the computation and dump out the expression for the result with deriving.show, then paste that back into your code, perhaps via a cppo include and some text massaging. But I suspect even the construction is too expensive?

You seem to be asking for something equivalent to gperf - which precomputes a hash table at compile time. I don't know of any such tool that exists already for OCaml, but it wouldn't be overly difficult to create one.


I have done this in the past with mixed results. It's rather straightforward to compute a perfect hash table; I followed the scheme described in the "Compress, Hash, and Displace" paper. The slight problem was that I struggled to compete on lookup performance without some effort. If you precompute a hash table, you need to ensure the hashing routine used at compile time is the same one used at runtime. I opted to avoid the builtin hash function because of stray fears that the compiler's runtime hash function might differ from the compiled program's, due to some modifications I was doing (these concerns are unwarranted in general, though, as the compiler's switch will link your programs against the same runtime).

For example, when writing your own hash routine for strings, you may be tempted to work at the granularity of individual chars. However, the runtime representation of strings' contents in OCaml is an array of words, so you could hash at that granularity and it'd be more efficient.
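As an illustration of the word-granularity idea, here is a hypothetical FNV-1a-style hash that consumes eight bytes at a time via `String.get_int64_le` (OCaml ≥ 4.13), falling back to per-char mixing for the tail. The constants are the standard 64-bit FNV parameters; mixing whole words at once is a variation on, not a faithful copy of, byte-wise FNV-1a:

```ocaml
(* Hypothetical word-at-a-time string hash (FNV-1a-style constants).
   Consumes 8 bytes per iteration, then finishes the tail per char. *)
let hash64 (s : string) : int64 =
  let prime = 0x100000001b3L in
  let mix h x = Int64.mul (Int64.logxor h x) prime in
  let len = String.length s in
  let h = ref 0xcbf29ce484222325L in  (* FNV-1a 64-bit offset basis *)
  let i = ref 0 in
  while !i + 8 <= len do
    h := mix !h (String.get_int64_le s !i);
    i := !i + 8
  done;
  while !i < len do
    h := mix !h (Int64.of_int (Char.code s.[!i]));
    incr i
  done;
  !h
```

Whatever routine you pick, the essential constraint from the paragraph above still applies: the generator and the runtime lookup must use the exact same function.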


Concretely, to do this, you would probably want to use dune's rule feature. That way you can define a module as being generated by an action. For example:

(rule (target table.ml) (deps (:gen ../gen/gen.exe))
  (action (with-stdout-to %{target} (run %{gen}))))
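The `gen.exe` in that rule can be an ordinary OCaml program that prints the generated module's source to stdout. A hypothetical sketch (the table contents here are placeholders for the real precomputed data):

```ocaml
(* Hypothetical gen.ml for the rule above: prints the source of a
   table.ml module that fills a Hashtbl at module-init time. *)
let emit (out : Buffer.t) =
  Buffer.add_string out
    "let table : (int, string) Hashtbl.t = Hashtbl.create 64\n";
  for i = 0 to 9 do
    (* %S emits a properly escaped OCaml string literal *)
    Buffer.add_string out
      (Printf.sprintf "let () = Hashtbl.replace table %d %S\n"
         i (string_of_int (i * i)))
  done

let () =
  let buf = Buffer.create 1024 in
  emit buf;
  print_string (Buffer.contents buf)
```

All of the expensive computation then happens at build time, and the generated `table.ml` is compiled like any hand-written module.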

If you want to produce a dynamically-determined number of modules, you can use dynamic_include to generate an entire library defined by dynamically-generated modules. This workflow is more complex, but very useful in other scenarios.


My understanding is that constant data tends to be relocated by the OCaml compiler - in the sense that a literal [1;2;3] would require no heap allocations (it'd be emitted as constant data and referred to by labels in the resultant assembly?). Therefore, you may get quite far with a hard-coded ADT-based representation of the data (as suggested above). Otherwise, you can probably encode what you want as string constants (as I believe some lexer and parser generators may do).

2 Likes

I think gperf is primarily concerned with minimizing runtime performance of lookup, no? It's certainly similar, but my concern primarily is with computing the entries ahead of time (~25k items generated by ~2k lines of OCaml).

My belief is that it is the computation of the entries, though I may find that constructing a table this large is simply always slow…

If you can precompute the entries, you can precompute a table that you hardcode into your program. I don't know of such a tool that does this already, but it falls within the realm of what gperf (and others) do.

You should probably run some benchmarks, but both of the areas of concern - computing the entries and populating the table - are mitigated by precomputing tables that are just mapped into memory when you run your program. Of course, precomputing things doesn't mitigate problems with your hashing and equality functions (if these are slow/degenerate, your lookups will be slow). Resolving symbols from ELF files uses a precomputed hash table, for example (it's fairly common).

EDIT: I expect that the Hashtbl population code is fairly efficient in the grand scheme of things. To that end, you could even generate OCaml code that just populates a Hashtbl at module init time (assuming you conclude that computing the entries is what's dominating the runtime). Here's a simple example of doing this with dune.

1 Like

The features under development in the MacoCaml system in the modular macros project will solve this problem, but it'll be a little while before they're ready for general use. Once they are, you'll be able to define a (type-safe) compile-time function that generates the data structure during compilation and inserts it into the compiled code.

5 Likes

As has already been suggested, you can generate the hash table and marshal it to a file. And if you use ppx_blob, you can embed the marshalled representation directly in the final program.

Generating …

let hashtable : (int, string) Hashtbl.t = Hashtbl.create 2000

let () =
  for i = 0 to 1000 do
    Hashtbl.replace hashtable i (string_of_int i)
  done

let out_bin = open_out_bin "hashtable.blob"
let _ = Marshal.to_channel out_bin hashtable []
let _ = close_out out_bin

Using …

let blob = [%blob "hashtable.blob"]
let hashtable : (int, string) Hashtbl.t = Marshal.from_string blob 0

Perhaps I'm misunderstanding something (I'm quite new to using Marshal), but my first attempt at doing this, to get a sense of its validity as a solution, led to a cryptic error when I run the resulting executable:

  (Failure
   "input_value: unknown code module 733030F9B8C976DCA4672E5AF5C0750A")

Okay, if I try marshalling and unmarshalling an association list rather than the hashtable directly, I don't get the above Failure. Perhaps something about Core's Hashtbl doesn't like being marshalled directly.
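A minimal sketch of that workaround (the path and the int/string types here are illustrative placeholders): marshal a plain association list, which contains no functional values, then rebuild a Stdlib Hashtbl at load time.

```ocaml
(* Sketch of the alist workaround: Marshal a plain association list,
   then rebuild a Stdlib Hashtbl on load. Types/path are placeholders. *)
let save path (alist : (int * string) list) =
  let oc = open_out_bin path in
  Marshal.to_channel oc alist [];
  close_out oc

let load path : (int, string) Hashtbl.t =
  let ic = open_in_bin path in
  let alist : (int * string) list = Marshal.from_channel ic in
  close_in ic;
  Hashtbl.of_seq (List.to_seq alist)
```

`Hashtbl.of_seq` (4.07+) does the startup conversion in one pass; this is the from_alist cost mentioned above.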

That said, even with the cost of a from_alist, this experiment is enough to confirm that this seems like a good idea in my application: the resulting binary only spends ~10% of its time loading the module with the large map, compared to ~50% before.

Figuring out a build structure that makes use of this will certainly be a bit more difficult with dune, because the items in this map aren't just OCaml primitives but my own types, and I think dune prevents a module from being both in a library and in an executable?

Something that would automate this (like @yallop's modular macros link) would certainly be very nice!

Have you considered making it a gigantic match statement instead? I think I've seen people do that to implement really crazy decoding protocols efficiently (even if the code looks insane).
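A toy sketch of the idea, with three placeholder arms standing in for the thousands a generator would emit; the mapping is baked into code, so no table exists in memory at all:

```ocaml
(* Toy "giant match" lookup: a generator would emit one arm per entry.
   These three arms are illustrative placeholders. *)
let lookup = function
  | 0 -> Some "zero"
  | 1 -> Some "one"
  | 2 -> Some "two"
  | _ -> None
```

For dense integer keys the compiler can turn this into a jump table; for sparse or string keys it becomes a decision tree, which may or may not beat a hash lookup.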

2 Likes

That it fails to unmarshal a hashtable is quite strange to me. I've seen it done.

That being said, my use case was "pre-parsing" some big XML files. The speed gains were spectacular.

Update: Sorry, I had overlooked that you are speaking about the Core hashtable. The Stdlib hashtable should be safe in that regard.

Base hash tables cannot be marshaled because they contain functional values. Ignoring the

Fatal error: exception Invalid_argument("output_value: functional value")

error is generally not a good option, because the Marshal.Closures flag only allows circumventing this restriction when the marshalling and demarshalling happen in the same executable.

2 Likes

Another alternative similar in spirit to ppx_blob is ocaml-crunch, which can be used to pack a file(-system) into the final executable.

Update: The suggestions to use Marshal, ppx_blob, and custom dune rules to achieve this seem very well placed. The build is very transparent, and the payoff in terms of runtime does seem to be quite significant. Even things like printing the help text end up loading this module, so it was expensive even in the least useful of cases beforehand.

There definitely are some pain points:

  • I needed to do a good bit of refactoring to move just the code that generates this table into an executable. Especially since it is essentially one big implementation detail, it wasn't factored very well in terms of its dependencies on the library it was living in within my project.
  • This also provoked me to move both the generating code and the place that holds the result into a new folder/dune library compared to its previous location. Unfortunately, code you want shared between these two needs to live somewhere else, due to dune's prohibition on a module appearing in both a library and an executable in the same folder. If anyone knows why that's the case, I'd love to hear it.
  • As noted above, Marshal and Core's Hashtbl don't mix, so the thing I'm actually storing is an association list. Presumably that leaves some performance on the table, as it must be converted at startup, not just un-Marshaled.
  • I'm concerned about the mental overhead this adds for new or unfamiliar developers. In particular, things like "go to definition" take you to the [%blob ...] line, which doesn't help all that much.

So, a far cry from the ideal of "some magic annotation or macro that just tells my value to become compile-time", or something like gperf, but not too bad!

I'm still interested in hearing alternative ideas if people have any. If I end up actually going down this path for implementation, I'll include my fully worked example as a link in this post at some point.

1 Like

I had never previously considered this, but it seems like it could be reasonable. I do more than just straight lookups (including dumping it as an alist or a string under certain situations), so it would need to be done a bit carefully. Actually looking things up in this list isn't that big a bottleneck for us, so it's probably not worth the additional engineering effort at this time.

1 Like

I tried this approach once, generating a giant match statement of (int → unicode grapheme types), but I found it was pretty slow at performing the lookups compared to a Hashtbl generated at runtime.

I had a 2 second lookup slowdown with the match statement compared to the Hashtbl (invoking lookup many times in a hot loop).

It's possible your results will vary, and I would be happy to acknowledge that I was doing something wrong or suboptimal, but I would personally prefer to stay away from this approach because of that experience.

1 Like

You might also want to wrap the whole thing in a lazy so you only pay the loading cost if needed.
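A minimal sketch of that, with an in-memory stand-in for the real unmarshalling step:

```ocaml
(* Defer the load behind a lazy: the table is only built on first
   lookup, so code paths that never touch it (e.g. printing --help)
   pay nothing. The of_seq call is a stand-in for Marshal.from_string. *)
let table : (int, string) Hashtbl.t Lazy.t =
  lazy (Hashtbl.of_seq (List.to_seq [ (1, "one"); (2, "two") ]))

let find key = Hashtbl.find_opt (Lazy.force table) key
```

`Lazy.force` memoizes, so the load cost is paid at most once per run.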

1 Like

We precompute some Unicode stuff and put it through Marshal: Precompute unicode tables to reduce exe size by SkySkimmer · Pull Request #18687 · coq/coq · GitHub
This is to reduce executable size, not for speed.

1 Like

the Marshal.Closures flag only allows circumventing this restriction when the marshalling and demarshalling happen in the same executable.

I do not have a use-case for that, but I'm just curious about a hypothetical.

Case a) I call my executable with the -dump option. It builds the hashtable, dumps it to a file, and immediately exits.

Case b) When I then call my executable normally, it loads the dumped hashtable. Will that work?