I have a program that programmatically defines a very large map (specifically a hashtable, though this could be changed). However, the result is always the same (i.e., there is no runtime input that changes the contents of the map). For many runs, simply instantiating this map takes an appreciable percentage of the total runtime (read: up to half!).
Is there a good (or any) way to define this map such that it is just compiled into the binary rather than constructed at startup every time? It seems like a problem others may have run into, but I have been unable to find anything.
This is not exactly what you are asking, but you can use Marshal to save the already constructed map as a binary blob in a file or as literal data in your program and read it back at runtime.
Is the computation of the data the problem, or the construction? If the latter, you could run the computation and dump out the expression for the result with deriving.show, then paste that back into your code, perhaps via a cppo include and some text massaging. But I suspect even the construction is too expensive?
You seem to be asking for something equivalent to gperf - which precomputes a hash table at compile time. I don't know of any such tool that exists already for OCaml, but it wouldn't be overly difficult to create one.
I have done this in the past with mixed results. It's rather straightforward to compute a perfect hash table. I followed the scheme described in the "Compress, Hash, and Displace" paper. The slight problem I had was that I struggled to compete on lookup performance without some effort. If you precompute a hash table, you need to ensure the hashing routine used at compile time is the same one used at runtime. I opted to avoid using the builtin hash function because of stray fears about the compiler's runtime hash function differing from the compiled program's hash function - due to some modifications I was doing (these concerns are unwarranted in general, though, as the compiler's switch will link your programs against the same runtime).
For example, when writing your own hash routine: for hashing strings, you may be tempted to do that at the granularity of individual chars. However, the runtime representation of strings' contents in OCaml is an array of words, so you could do it at that granularity and it'd be more efficient.
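To illustrate, here is a minimal sketch of chunked string hashing using String.get_int64 (available since OCaml 4.13), which reads 8 bytes at a time instead of one char at a time. The FNV-style mixing constants are purely illustrative, not from any particular production hash:

```ocaml
(* Sketch: hash a string 8 bytes at a time, with a byte-at-a-time tail.
   The constants are FNV-1a's, used here only as an example mixer. *)
let hash64 (s : string) : int64 =
  let len = String.length s in
  let h = ref 0xcbf29ce484222325L in        (* FNV-1a offset basis *)
  let mix h x = Int64.mul (Int64.logxor h x) 0x100000001b3L in
  let i = ref 0 in
  while !i + 8 <= len do
    h := mix !h (String.get_int64 s !i);    (* one 64-bit chunk *)
    i := !i + 8
  done;
  while !i < len do
    h := mix !h (Int64.of_int (Char.code s.[!i]));  (* leftover bytes *)
    incr i
  done;
  !h
```

The main-loop body runs len/8 times rather than len times, which is where the win over per-char hashing comes from.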
Concretely, to do this, you would probably want to use dune's rule feature. That way you can define a module as being generated by an action. For example:
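A minimal sketch of such a rule (file and executable names here are made up for illustration):

```dune
; Build a generator executable, then run it to produce table.ml,
; which the rest of the library uses like any hand-written module.
(executable
 (name gen))

(rule
 (targets table.ml)
 (deps gen.exe)
 (action
  (with-stdout-to table.ml
   (run ./gen.exe))))
```

Anything the generator prints to stdout becomes the contents of table.ml at build time.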
If you want to produce a dynamically-determined number of modules, you can use dynamic_include to generate an entire library defined by dynamically-generated modules. This workflow is more complex, but very useful in other scenarios.
My understanding is that constant data tends to be relocated by the OCaml compiler - in the sense that a literal [1;2;3] would require no heap allocations (it'd be emitted as constant data and referred to by labels in the resultant assembly?). Therefore, you may get quite far with a hard-coded ADT-based representation of the data (as suggested above). Otherwise, you can probably encode what you want as string constants (as I believe some lexer and parser generators may do).
I think gperf is primarily concerned with minimizing runtime performance of lookup, no? It's certainly similar, but my concern primarily is with computing the entries ahead of time (~25k items generated by ~2k lines of OCaml).
My belief is that it is the computation of the entries, though I may find that constructing a table this large is simply always slow…
If you can precompute the entries, you can precompute a table that you hardcode into your program. I don't know of such a tool that does this already, but it falls within the realm of what gperf (and others) do.
You should probably run some benchmarks, but both of the areas of concern - computing the entries and populating the table - are mitigated by precomputing tables that are just mapped into memory when you run your program. Of course, precomputing things doesn't mitigate problems with your hashing and equality functions (if these are slow/degenerate, your lookups will be slow). Resolving symbols from ELF files uses a precomputed hash table, for example (it's fairly common).
EDIT: I expect that the Hashtbl population code is fairly efficient in the grand scheme of things. To that end, you could even generate OCaml code that just populates a Hashtbl at module init time (assuming you conclude that computing the entries is what's dominating the runtime). Here's a simple example of doing this with dune.
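For instance, the generator could just print a module that fills the table at init time. A toy sketch (the real generator would compute the actual ~25k entries; everything here is invented for illustration):

```ocaml
(* gen.ml (hypothetical): emit the text of an OCaml module that
   populates a Hashtbl when the module is initialised. *)
let generate () : string =
  let b = Buffer.create 1024 in
  Buffer.add_string b
    "let table : (int, string) Hashtbl.t = Hashtbl.create 25_000\n";
  Buffer.add_string b "let () =\n";
  (* Toy entries; a real generator would run the expensive computation
     here, once, at build time. *)
  for i = 0 to 4 do
    Buffer.add_string b
      (Printf.sprintf "  Hashtbl.replace table %d %S;\n" i (string_of_int i))
  done;
  Buffer.add_string b "  ()\n";
  Buffer.contents b

let () = print_string (generate ())
```

Wired up with a dune rule that redirects this program's stdout into a .ml file, the expensive entry computation happens at build time and the shipped binary only pays for the cheap Hashtbl.replace loop.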
The features under development in the macocaml system in the modular macros project will solve this problem, but it'll be a little while before they're ready for general use. Once they are, you'll be able to define a (type-safe) compile-time function that generates the data structure during compilation and inserts it into the compiled code.
As has already been suggested, you can generate the hashtable and marshal it to a file. And if you can use ppx_blob, you can embed the marshalled representation directly in the final program.
Generating …
let hashtable : (int, string) Hashtbl.t = Hashtbl.create 2000

let () =
  for i = 0 to 1000 do
    Hashtbl.replace hashtable i (string_of_int i)
  done

let out_bin = open_out_bin "hashtable.blob"
let _ = Marshal.to_channel out_bin hashtable []
let _ = close_out out_bin
Using …
let blob = [%blob "hashtable.blob"]
let hashtable : (int, string) Hashtbl.t = Marshal.from_string blob 0
Perhaps I'm misunderstanding something (I'm quite new to using Marshal), but my first attempt at doing this to try to get a sense of its validity as a solution led to a cryptic error when I try to run my resulting executable:
Okay, if I try marshalling and unmarshalling on an associative list rather than the hashtable directly, I don't get the above Failure. Perhaps something about the core Hashtbl doesn't like being marshalled directly.
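That workaround can be packaged into a small round-trip pair. A sketch (using the stdlib Hashtbl and toy int/string types; a real version would use your own key/value types):

```ocaml
(* Marshal via an association list, since marshalling some hashtable
   implementations directly can fail (e.g. if they capture closures). *)
let to_blob (tbl : (int, string) Hashtbl.t) : string =
  let alist = Hashtbl.fold (fun k v acc -> (k, v) :: acc) tbl [] in
  Marshal.to_string alist []

let of_blob (blob : string) : (int, string) Hashtbl.t =
  let alist : (int * string) list = Marshal.from_string blob 0 in
  let tbl = Hashtbl.create (List.length alist) in
  List.iter (fun (k, v) -> Hashtbl.replace tbl k v) alist;
  tbl
```

The of_blob conversion loop is the startup cost you pay for not marshalling the table itself.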
That said, even with the cost of a from_alist, this experiment is enough to confirm that this seems like a good idea in my application, because the resulting binary only spends ~10% of its time loading the module with the large map, compared to ~50% before.
Figuring out a build structure that makes use of this will certainly be a bit more difficult with dune, because the items in this map aren't just OCaml primitives but my own types, and I think dune prevents a module from being in both a library and an executable?
Something that would automate this (like @yallop's modular macros link) would certainly be very nice!
Have you considered making it a gigantic match statement instead? I think I've seen people do that to implement really crazy decoding protocols efficiently (even if the code looks insane).
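For concreteness, this is the shape being suggested, in miniature; a generated version would have thousands of arms rather than these toy ones:

```ocaml
(* Toy sketch of the "giant generated match" approach: the compiler
   turns a dense match like this into a jump table / decision tree. *)
let lookup = function
  | 0 -> Some "zero"
  | 1 -> Some "one"
  | 2 -> Some "two"
  | _ -> None
```

The appeal is that the whole table lives in the code segment with no startup construction at all; the trade-off, as discussed below, is that lookup speed relative to a Hashtbl is not guaranteed.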
Marshalling closures is generally not a good option, because the Marshal.Closures flag only allows you to circumvent this restriction when the marshalling and demarshalling happen in the same executable.
Update: The suggestions to use Marshal, ppx_blob, and custom dune rules to achieve this seem very well placed. The build is very transparent, and the payoff in terms of runtime does seem to be quite significant. Even things like printing the help text end up loading this module, so it was expensive even in the least useful of cases beforehand.
There definitely are some pain points:
I needed to do a good bit of refactoring to move just the code that generates this table into an executable. Especially since it is essentially one big implementation detail, it wasn't factored very well in terms of its dependencies on the library it was living in.
This also provoked me to move both the generating code and the place that holds the result into a new folder/dune library compared to its previous location. Unfortunately, code you want shared between these two needs to live somewhere else, due to dune's prohibition on a module appearing in both a library and an executable in the same folder. If anyone knows why that's the case, I'd love to hear it.
As noted above, Marshal and Core's Hashtbl don't mix, so the thing I'm actually storing is an associative list. Presumably that is leaving some performance on the table, as it must be converted at startup, not just un-marshalled.
I'm concerned about the mental overhead this adds for new or unfamiliar developers. In particular, things like "go-to definition" take you to the [%blob ...] line, which doesn't help all that much.
So, a far cry from the ideal of "some magic annotation or macro that just tells my value to become compile time", or something like gperf, but not too bad!
I'm still interested in hearing alternative ideas if people have any. If I end up actually going down this path for implementation, I'll include my fully worked example as a link in this post at some point.
I had never previously considered this, but it seems like it could be reasonable. I do more than just straight lookups (including dumping it as an alist or a string under certain situations), so it would need to be done a bit carefully. Actually looking things up in this list isn't that big a bottleneck for us, so it's probably not worth the additional engineering effort at this time.
I tried this approach once, generating a giant match statement of (int → unicode grapheme types), but I found it was pretty slow at performing the lookups compared to a Hashtbl generated at runtime.
I had a 2 second lookup slowdown with the match statement compared to the Hashtbl (invoking lookup many times in a hot loop).
It's possible your results will vary, and I would be happy to acknowledge that I was doing something wrong or suboptimal, but I would personally prefer to stay away from this approach because of that experience.