I’ve been wanting to port a spatial simulation to from OCaml 4 to parallel OCaml 5. I’m looking for some advice how to do it best.
From a high level perspective, the simulation loop updates the positions of N agents (walkers) in a 2D rectangular space (arena). Position updates are one major computational cost. Each walker may collide with another one, and this collision is handled by the code.
The naive N^2 scaling of collision detection is avoided by using neighbor lists. That is, the domain is divided in K x K subarenas, and each subarena has an associated list of contained walkers. On each step of the loop, a walker checks for collisions only with walkers in it’s own or neighboring subarenas; also each subarena member list needs to be updated by implementing transitions of walkers to a neighboring subarena. These steps scale as N. Still, the other significant part of the cost comes from collision detection.
There is a natural unit of granularity: the subarena. I thought I’d assign one
Domain.t to each and let the position updates of walkers within it be computed in this worker domain in parallel. The walker transfers and collision detection will require communication between domains.
However, the number of subarenas is a tuning parameter of the algorithm, and I can’t simply fix it to the number of cores. For a large system, I can easily be fastest to run 100 subarenas on the current sequential version of the code, while my machine has much less cores. So maybe it’s wrong to try to assign each subarena its own
Another question is that of inter-subarena communication. Is it best to keep the member lists of walkers in each subarena in the global heap, or should those be in thread-local storage, and walker transfers are passed as messages through appropriate domainslib channels? The walker positions are the most frequently updated data. Currently each walker is represented by a record with a unique ID and a mutable float subrecord for position, and member lists are hash tables using keyed by ID.
If possible with such a high-level description, I would appreciate good performance / program structuring advice!