Is it safe to use a global hash table in an ocaml-5 parallel program?

I don’t think this is a useful approach. For instance, the C++ standard takes a much stronger stance on this topic:

  • “A C++ standard library function shall not directly or indirectly modify objects accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function’s non-const arguments, including this.”
  • “Implementations are required to avoid data races when the contents of the contained object in different elements in the same container, excepting vector<bool>, are modified concurrently.”

Sure, this precludes having splay trees as containers in the C++ standard library, but I don’t think users care.

As for 'a Map.Make(String).t Atomic.t, I don’t see how it helps, since using Atomic.get (Atomic.make x) is the same as directly using x. The only thing Atomic guarantees is that the reference to the data structure is accessed atomically, but it does not guarantee anything about the atomicity of accesses to the data structure itself. Did you have something else in mind?

There’s also mcavl for lock-free sets and maps (… shameless plug ^^')

I believe multicore efficient hash tables are still a work in progress though, I’m only aware of the promising stuff happening in the paradict repo :slight_smile: . Anyone interested should also check the lockfree project in the future as they are likely to add hash tables to their repertoire…

(In the meantime, the good news is that a standard Hashtbl with a Mutex is surprisingly fast for most applications; benchmark it before considering alternatives!)

1 Like

@gadmm your point certainly has merit, but then I think the right course of action is to suggest a different specification/documentation, rather than ask various questions about the possibility of not respecting the current specification.

@silene the C++ rules sound like they ask data-structure authors to respect separation in the sense of my issue mentioned above. This is nice / interesting, but I guess it also requires complex (and possibly slower) implementations.

Regarding a 'a StringMap.t Atomic.t, this type lets you safely update the map concurrently, and ensures that when you read it you synchronize with other writer domains, as you would expect. I think that I don’t understand what you mean by “it does not guarantee anything about the atomicity of accesses to the data structure itself”; maps are immutable, so what atomicity are we talking about? (Surely you do not expect guarantees on access to the elements of the map.)

The mcavl performance graphs have a y-axis that does not start at 0. How dare you? :slight_smile:

1 Like

In their defense, I think I used the same plotting library, and this is what it produces by default; otherwise one has to set the axis ranges manually, which requires some coding.

What I mean is that the following code is not thread-safe (k and v stand for some key and value to insert):

let y = Atomic.get x in   (* read the current map *)
let z = Map.add k v y in  (* build an updated map *)
Atomic.set x z            (* write it back: any update made by another
                             domain between the get and the set is lost *)

If there was a function Atomic.modify : ('a -> 'a) -> 'a Atomic.t -> unit, it would be a different matter. But you cannot expect users to write such a function.
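(For illustration, such a function can be sketched as a compare-and-set retry loop; note that the update function may run several times under contention, so it must be safe to re-run, which is fine for pure updates to an immutable map:)

```ocaml
(* Sketch of a modify function as a compare-and-set retry loop.
   Under contention, f may be called several times, so it must be
   free of side effects (fine for updating an immutable map). *)
let rec modify (f : 'a -> 'a) (r : 'a Atomic.t) : unit =
  let old_v = Atomic.get r in
  let new_v = f old_v in
  if not (Atomic.compare_and_set r old_v new_v) then modify f r
```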

Ah, you remind me of Atomic.modify, Atomic.modify_get by gasche · Pull Request #10798 · ocaml/ocaml · GitHub .

(Still I am not sure what is wrong with this type I proposed, maybe your feedback is “we need better memory support for atomic”?)

What is “wrong” with this type is that it still requires a full-blown synchronization from the user (e.g., mutexes) to be made thread-safe. So, at this point, using Atomic becomes kind of useless. You could directly use a map or even a hashtable instead.

@gasche Apologies for the noise. Upon re-reading the thread again, it seems you have answered my query above in one of your earlier replies.

No need to apologise! In fact the previous answer was on a different question. In presence of potential mutations, both mutating and non-mutating operations need to synchronize. But, as @gadmm hints at, if you know that the hashtable is permanently immutable (learning this may require synchronization), then reads don’t need synchronization.

@silene: right, I see your point now. I had a CAS-loop in mind. The problem is that you have to architect your code to be able to replay whatever modification you wanted on the new version, which is non-trivial. So I agree that, in the absence of a library-provided modify function on atomics (and possibly even in their presence), a mutex is a simpler approach. Has anyone published a type 'a locked = 'a * Mutex.t helper library yet, with a wrapper over Mutex.protect?
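(A minimal sketch of such a helper, assuming Mutex.protect from OCaml ≥ 5.1; all the names here are made up:)

```ocaml
(* Hypothetical sketch of a locked-value helper over Mutex.protect
   (available since OCaml 5.1). A record is used instead of a bare
   pair so the value can be updated in place under the lock. *)
type 'a locked = { mutex : Mutex.t; mutable value : 'a }

let locked v = { mutex = Mutex.create (); value = v }

(* Run f with exclusive access to the protected value. *)
let with_lock l f = Mutex.protect l.mutex (fun () -> f l.value)

(* Replace the protected value under the lock. *)
let update l f = Mutex.protect l.mutex (fun () -> l.value <- f l.value)
```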

2 Likes

Unsynchronized accesses to hash tables are a programming error. […] Thus, concurrent accesses to hash tables must be synchronized (for instance with a {!Mutex.t}).

Well, to me it reads like the typical warning against races, and therefore one can do concurrent reads if there was a synchronization with prior writes. Then it is a matter of knowing which operations mutate the data structure and which ones do not, at which point it becomes a problem of missing documentation rather than of going against the specification.

Be careful what you wish for, because if you start using language that blames users for “not respecting the current specification”, they might become more demanding of a specification that makes OCaml a useful and practical language (starting with documentation).

I’m not sure I follow: of course improving the documentation/specification sounds like a good idea and we should do it.

(Going on a tangent: we should also be realistic about the fact that people frequently infer specifications by testing current implementations, so documenting rules clearly does not suffice to make issues go away. The concurrent race-finding / schedule testing tools are invaluable here.)

CCLock (containers-thread.CCLock) has been there for a while :slight_smile:

3 Likes

I went ahead and ran a small core_bench benchmark of Hashtbl.find with and without a mutex, comparing both against a compile-time assoc function.

$ cat dune
(executable
 (name bench_hashtbl)
 (libraries
  core
  core_unix.command_unix
  core_unix.time_stamp_counter
  core_bench))

$ cat bench_hashtbl.ml
open Core_bench

let kv = function
  | "a" -> "val a"
  | "b" -> "val b"
  | "c" -> "val c"
  | _ -> assert false

let assoc () =
  let _ = kv "a" in
  let _ = kv "b" in
  let _ = kv "c" in 
  ()

let (h, m) = (Hashtbl.create 10, Mutex.create ())
let () = 
  [("a", "val a"); ("b", "val b"); ("c", "val c")] 
  |> List.to_seq
  |> Hashtbl.add_seq h

let find_m k = 
  Mutex.lock m;
  let v = Hashtbl.find h k in
  Mutex.unlock m;
  v

let hashtbl_m () =
  let _ = find_m "a" in
  let _ = find_m "b" in
  let _ = find_m "c" in
  ()

let hashtbl () =
  let _ = Hashtbl.find h "a" in
  let _ = Hashtbl.find h "b" in
  let _ = Hashtbl.find h "c" in
  ()

let () =
  Command_unix.run
    (Bench.make_command
       [
         Core_bench.Bench.Test.create ~name:"assoc" assoc;
         Core_bench.Bench.Test.create ~name:"hashtbl_m" hashtbl_m;
         Core_bench.Bench.Test.create ~name:"hashtbl" hashtbl;
       ])

Running dune exec bench_hashtbl/bench_hashtbl.exe --profile release gives the following results:

Estimated testing time 30s (3 benchmarks x 10s). Change using '-quota'.
┌───────────┬──────────┬────────────┐
│ Name      │ Time/Run │ Percentage │
├───────────┼──────────┼────────────┤
│ assoc     │  10.17ns │     10.33% │
│ hashtbl_m │  98.52ns │    100.00% │
│ hashtbl   │  56.07ns │     56.91% │
└───────────┴──────────┴────────────┘

So Hashtbl.find with a mutex is about 75% slower than without (98.52ns vs 56.07ns)?

On a small hashtable (= good locality) and small keys (= fast hashing/equality), apparently yes! But how much time is the application spending on Hashtbl.find for the slowdown to be an issue? (it’s so fast that it’s hopefully a small percentage of the runtime)

Another benchmark doing a bit of extra work:

let size = 1_000_000

let random_key () = string_of_int (Random.int size)

let (h, m) = (Hashtbl.create 10, Mutex.create ())
let () = 
  for i = 0 to size - 1 do
    Hashtbl.add h (string_of_int i) i
  done

let find_m k = 
  Mutex.lock m;
  Fun.protect
    (fun () -> Hashtbl.find h k)
    ~finally:(fun () -> Mutex.unlock m)

let assoc () = ignore (random_key ())
let hashtbl_m () = ignore (find_m (random_key ()))
let hashtbl () = ignore (Hashtbl.find h (random_key ()))
┌───────────┬──────────┬─────────┬────────────┐
│ Name      │ Time/Run │ mWd/Run │ Percentage │
├───────────┼──────────┼─────────┼────────────┤
│ assoc     │  71.33ns │   2.00w │     14.01% │
│ hashtbl_m │ 509.26ns │  10.02w │    100.00% │
│ hashtbl   │ 484.46ns │   2.00w │     95.13% │
└───────────┴──────────┴─────────┴────────────┘

But yes, this will get worse if you are hammering the poor table with multiple domains… I’m curious about multiple-readers single-writer locks now :stuck_out_tongue:
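(For the curious, such a lock can be sketched from a Mutex and a Condition; this naive version can starve writers under a steady stream of readers and is for illustration only:)

```ocaml
(* A naive multiple-readers single-writer lock built from a Mutex and
   a Condition, for illustration only (writers can starve). *)
module Rw_lock = struct
  type t = {
    m : Mutex.t;
    c : Condition.t;
    mutable readers : int;   (* number of active readers *)
    mutable writer : bool;   (* is a writer active? *)
  }

  let create () =
    { m = Mutex.create (); c = Condition.create ();
      readers = 0; writer = false }

  let read_lock t =
    Mutex.lock t.m;
    while t.writer do Condition.wait t.c t.m done;
    t.readers <- t.readers + 1;
    Mutex.unlock t.m

  let read_unlock t =
    Mutex.lock t.m;
    t.readers <- t.readers - 1;
    if t.readers = 0 then Condition.broadcast t.c;
    Mutex.unlock t.m

  let write_lock t =
    Mutex.lock t.m;
    while t.writer || t.readers > 0 do Condition.wait t.c t.m done;
    t.writer <- true;
    Mutex.unlock t.m

  let write_unlock t =
    Mutex.lock t.m;
    t.writer <- false;
    Condition.broadcast t.c;
    Mutex.unlock t.m
end
```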


@gasche > :smiley: Woups thanks, I’m bad at gnuplot! (it should be fixed, modulo caching)

1 Like

From the interface’s function signatures and the documentation, I really have no idea of how Domain.DLS is supposed to be used.

If Domain.DLS is used to store a single value per domain (as you just said), then why is there a damn key needed to retrieve that value?

set should have type: 'a → unit.
get should have type: unit → 'a.

If each domain needs an implicit key that other domains are not using, then there is already Domain.self () that the implementation can use.

Domain.DLS is a distraction; it’s unlikely to be what you are looking for… The API clearly indicates that an 'a key is like an 'a ref, except that the stored value is specific to each domain (rather than shared). If DLS wasn’t available, you would have to implement something like this pseudo-code to replicate its functionality:

type 'a key = { init : (unit -> 'a) ; values : 'a InfiniteSparseArray.t }

let new_key init = { init ; values = InfiniteSparseArray.create 0 }

let set k v = k.values.((Domain.self () :> int)) <- v

let get k =
  try k.values.((Domain.self () :> int))
  with Not_found -> let v = k.init () in set k v ; v

(but DLS does more stuff that can’t be implemented in user-space)
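(For reference, here is a minimal sketch of the real API in action; each domain lazily gets its own copy of the state via the init function:)

```ocaml
(* Minimal sketch of the actual Domain.DLS API: each domain gets its
   own counter, created by the init function on first access. *)
let counter : int ref Domain.DLS.key =
  Domain.DLS.new_key (fun () -> ref 0)

let bump () =
  let r = Domain.DLS.get counter in
  incr r;
  !r
```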

As to why it exists, well there are some modules like Random that use global state that shouldn’t be shared with other domains. As a dirty example, consider a module that provides a shell-like implicit current path depending on previous commands:

module Shell = struct
  let cwd = ref []

  let current_working_directory () = String.concat "/" ("" :: List.rev !cwd)

  let rec change_path new_path cur_path = match new_path, cur_path with
    | [], _ -> cur_path
    | ".." :: new_path, _ :: cur_path -> change_path new_path cur_path
    | ("" | "..") :: new_path, _ -> change_path new_path []
    | dir :: new_path, _ -> change_path new_path (dir :: cur_path)

  let cd new_path =
    cwd := change_path (String.split_on_char '/' new_path) !cwd
end

let test () =
  Shell.cd "/foo/bar" ;
  Shell.cd "../qux" ;
  Shell.current_working_directory () (* "/foo/qux" *)

The cwd reference is not thread-safe so we can’t use this module safely from multiple domains:

  • It’s not a good idea to switch the ref to a shared Atomic.t as we don’t want domains to randomly mess with the current path of another domain
  • We may not be able to break the API to require cd and current_working_directory to take an extra argument for the mutable state (which we could argue would have been a better design in the first place)
  • But we can use a Domain.DLS.key for the cwd reference and get reasonable semantics :slight_smile:
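To make the last point concrete, here is a sketch of the same module with cwd moved into domain-local storage (same change_path logic as above, module renamed to keep it distinct):

```ocaml
(* Sketch: the Shell module with its state in Domain.DLS, so each
   domain sees its own current path. *)
module Shell_dls = struct
  let cwd : string list Domain.DLS.key =
    Domain.DLS.new_key (fun () -> [])

  let current_working_directory () =
    String.concat "/" ("" :: List.rev (Domain.DLS.get cwd))

  let rec change_path new_path cur_path = match new_path, cur_path with
    | [], _ -> cur_path
    | ".." :: new_path, _ :: cur_path -> change_path new_path cur_path
    | ("" | "..") :: new_path, _ -> change_path new_path []
    | dir :: new_path, _ -> change_path new_path (dir :: cur_path)

  let cd new_path =
    Domain.DLS.set cwd
      (change_path (String.split_on_char '/' new_path)
         (Domain.DLS.get cwd))
end
```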
2 Likes

Yes, this seems to be the case – that you have no idea what Domain.DLS is doing and how it is supposed to be used. I will try to see if we can improve the Domain.DLS documentation to avoid the confusion. (I agree that it could be documented better.)

So is it fair to summarize DLS like this?

“DLS is a shim for backwards compatibility. It should not be used for new code. It allows using existing libraries that use global state in a concurrent program, such that the global state the library sees within each domain is local to that domain.”

I don’t really agree with “it should not be used for new code”. DLS is designed to store domain-local global mutable state. In general, it is better to avoid global mutable state, but I can see how new code could still make this choice for understandable reasons, and would naturally use DLS for this purpose.

In the standard library, DLS is used:

  • In Format, to store mutable state associated with the global values Format.std_formatter and Format.err_formatter respectively. There are a lot of discussions to be had about how one should ideally write concurrent programs that write to standard/error output, and maybe the choices made by Format could be improved, but deep down I think that it is reasonable for a library (past or future) to expose global system resources like “the standard output” as global state.
  • in Random, because we got used to a global-state API (Random.int etc.) with a global random generator state. If this were to be done again, I would just use Random.State as the main module and force people to pass their random state around explicitly. (One point against DLS for new code.)
  • in Hashtbl, we use random state inside the hash table implementation as a supposed protection against collision attacks. I think that it would be inconvenient to require users to pass their random state explicitly to initialize a hash table, and the idea of hiding it as global mutable state sounds very reasonable to me.
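(For illustration, the explicit-state style suggested for Random looks like this with the existing Random.State API:)

```ocaml
(* Sketch: explicit random-state passing with the existing
   Random.State API; a fixed seed makes runs reproducible. *)
let make_state () = Random.State.make [| 42 |]

(* Roll a six-sided die using the caller-supplied state. *)
let roll st = 1 + Random.State.int st 6
```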

(On the other hand, passing random state around explicitly is necessary to get pixel-perfect reproducibility even on Hashtbl operations whose observable behavior unfortunately depends on the random state, such as the iterators. But I think this should be fixed instead by making all operations deterministic.)

2 Likes