Using thread_local memory in a C++ thread invoked from Multicore-OCaml

Hi folks,
Is there a way to use thread_local memory in a C++ thread invoked from Multicore-OCaml?

thread.ml

external spawnThread_O : unit -> unit = "spawnThread_C"

let main () =
  let t1 =
    Domain.spawn (fun _ ->
        spawnThread_O ();
        spawnThread_O ())
  in
  Domain.join t1

let () = main ()

thread_stubs.cpp

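#include <cstdlib>
#include <ctime>
#include <iostream>
#include <mutex>
#include <thread>
#include <unistd.h>
extern "C"
{
#include <caml/mlvalues.h>
#include <caml/memory.h>
}

using namespace std;

mutex log_lock;

// Each system thread gets its own Counter; its destructor runs when that
// thread exits and prints how often that thread incremented it.
struct Counter
{
    unsigned int c = 0;
    void increment() { ++c; }
    ~Counter()
    {
        std::cout << "Thread " << this_thread::get_id() << " was called "
                  << c << " times" << std::endl;
    }
};

thread_local Counter c;

// Runs on the newly created thread, so it uses that thread's copy of c.
void threadTask()
{
    c.increment();
    usleep(rand() % 1000);
}

// The OCaml external has type unit -> unit, so the stub receives one argument.
extern "C" CAMLprim value spawnThread_C(value unit)
{
    CAMLparam1(unit);
    srand(time(0));
    thread t = thread(threadTask); // a brand-new thread on every call
    t.join();
    CAMLreturn(Val_unit);
}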

In the code snippets given above, I am expecting the following output:

Thread 140398209472256 was called 1 times
Thread 140398209472256 was called 2 times

However, I am getting the following output:

Thread 140398209472256 was called 1 times
Thread 140398209472256 was called 1 times

Is there a way to ensure that the thread_local Counter c retains its value when spawnThread_O() is invoked again by the same domain?

Thanks in advance.

Note that you are creating a brand-new thread in spawnThread_C, so the counter starting at 0 in threadTask is the expected behavior, since the counter is a thread-local object. This has nothing to do with OCaml; you would get exactly the same behavior with any other language that calls spawnThread_C.
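To see that OCaml plays no role here, the following minimal standalone sketch (no OCaml runtime; the names counter, spawnThread and demo are made up for illustration) calls a spawning function twice from the same thread. Each call creates a new thread whose thread_local copy starts at 0, so both calls observe a count of 1.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <thread>

// Each system thread gets its own copy of this counter, starting at 0.
thread_local unsigned int counter = 0;

// Runs on the new thread: bumps that thread's copy and records its value.
void threadTask(unsigned int &seen)
{
    ++counter;
    seen = counter;
}

// Hypothetical stand-in for spawnThread_C: creates a brand-new thread on
// every call and returns the value that thread's counter reached.
unsigned int spawnThread()
{
    unsigned int seen = 0;
    std::thread t(threadTask, std::ref(seen));
    t.join(); // join() makes the write to `seen` visible here
    return seen;
}

// Calls spawnThread twice from the same thread, as the OCaml domain does.
void demo()
{
    unsigned int first = spawnThread();
    unsigned int second = spawnThread();
    assert(first == 1 && second == 1); // each new thread starts from 0
    std::cout << first << " " << second << std::endl; // prints "1 1"
}
```

The calling thread's own copy of counter is never touched and stays at 0.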

To share the same counter across all the threads created by the domain, you need the counter to be local to the domain, not to the created thread. For example, it might be done as follows:

thread_local Counter c;

void threadTask(Counter &c) {
    ...
}

extern "C" CAMLprim value spawnThread_C() {
    thread t = thread(threadTask, std::ref(c)); // std::ref: std::thread copies its arguments otherwise
    ...
}

Obviously, this is a rather poor solution, since the code now creates an unused counter per thread.
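Here is a standalone sketch of this suggestion, assuming the stub is replaced by a hypothetical spawnThread() so that it runs without OCaml. std::ref is required because std::thread copies its arguments, and a temporary copy cannot bind to a Counter& parameter. Two calls from the same thread should return 1 and then 2, while a different calling thread starts again from 1.

```cpp
#include <cassert>
#include <functional>
#include <thread>

struct Counter
{
    unsigned int c = 0;
    void increment() { ++c; }
};

// One counter per *calling* thread (in the OCaml setting, the domain's thread).
thread_local Counter c;

// Runs on the new thread, but increments the caller's counter.
void threadTask(Counter &counter)
{
    counter.increment();
}

// Hypothetical stand-in for spawnThread_C without the OCaml runtime.
unsigned int spawnThread()
{
    // std::ref(c) is evaluated on the caller, so it refers to the caller's copy.
    std::thread t(threadTask, std::ref(c));
    t.join();
    return c.c; // read on the calling thread after join()
}
```

Because each call joins before returning, the accesses to c never overlap here; running several such threads at once would need synchronisation, as discussed below.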

@silene Access to c would not be synchronised, though.

There is a notion of domain-local state (on the OCaml side), but you need to hold the domain lock to access it.

Definitely. But that is not an issue specific to my suggestion. That will be true of any solution, since c is supposed to be shared between several system threads.

That depends on how you understand the original question.

@hemendra I think your question mixes up domains and threads. A domain is a collection of threads synchronised through a domain lock (only one thread runs at a time). This lets them access a domain-local storage (DLS) without additional synchronisation. The spawned threads are not part of the domain, since they run without holding the domain lock. Is this what you intend?

From this, one could think of storing the counter in the DLS, but as an atomic integer so that it can be accessed concurrently. However, since the DLS lives on the OCaml side, it is hard to access it safely from a C thread that does not hold the lock of any domain. One solution is to malloc an atomic_int and store the pointer to it in the DLS (as an out-of-heap pointer, e.g. with tagging).
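A rough C++ sketch of the tagging part, assuming OCaml's value is a pointer-sized word; value_t, makeCounter, tagPointer and untagPointer are hypothetical names, and storing the tagged word in the DLS (done on the OCaml side) is not shown. The counter is allocated with new here rather than malloc. Its address is at least 2-byte aligned, so the low bit is free; setting it makes the word look like an immediate integer, which the GC does not follow.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumption: stand-in for OCaml's `value`, so this compiles without the
// OCaml headers.
using value_t = std::uintptr_t;

// Allocate the shared counter outside the OCaml heap.
std::atomic<unsigned int> *makeCounter()
{
    return new std::atomic<unsigned int>(0);
}

// Set the low bit so the word looks like an immediate integer to the GC.
value_t tagPointer(std::atomic<unsigned int> *p)
{
    value_t raw = reinterpret_cast<value_t>(p);
    assert((raw & 1) == 0); // the allocation is at least 2-byte aligned
    return raw | 1;
}

// Clear the low bit to recover the original pointer.
std::atomic<unsigned int> *untagPointer(value_t v)
{
    return reinterpret_cast<std::atomic<unsigned int> *>(v & ~value_t(1));
}
```

A C thread given the tagged word (or the untagged pointer) can then increment the counter without touching the OCaml heap; the counter must be freed explicitly once no thread uses it.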

Another solution (I think the simplest), since the number of domains is meant to be small, is to allocate an array atomic_int[Max_domains] (cf. caml/config.h) and pass the value of Caml_state->id to the threads to serve as an index.
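A sketch of the array approach, with the OCaml runtime left out so it runs standalone. kMaxDomains is a hypothetical stand-in for Max_domains (its value and location may differ across OCaml versions), and domainId stands for the Caml_state->id that the stub would read while still holding the domain lock and then pass to the thread by value.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical stand-in for Max_domains from caml/config.h.
constexpr std::size_t kMaxDomains = 128;

// One counter per domain, shared by every thread that domain spawns.
std::array<std::atomic<unsigned int>, kMaxDomains> counters{};

// Runs on a spawned thread; only the index is passed, never OCaml state.
void threadTask(std::size_t domainId)
{
    assert(domainId < kMaxDomains);
    counters[domainId].fetch_add(1, std::memory_order_relaxed);
}

// Hypothetical stand-in for spawnThread_C: in the real stub, domainId would
// come from Caml_state->id.
unsigned int spawnThread(std::size_t domainId)
{
    std::thread t(threadTask, domainId);
    t.join();
    return counters[domainId].load(std::memory_order_relaxed);
}
```

Repeated calls with the same domain id accumulate (1, then 2), while another id starts from its own zeroed slot.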

@silene Thinking about it, the solution you propose might work if Counter::c is made atomic (with the drawback that you mention).
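Making the member atomic could look like the following sketch. spawnConcurrently is a hypothetical helper that starts several threads at once, all sharing the caller's counter. This is exactly the case where a plain unsigned int would be a data race. Relaxed ordering is assumed to be enough here because the count is only read after every thread has been joined.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Counter::c made atomic, so several spawned threads may increment the
// same caller-owned counter at the same time.
struct Counter
{
    std::atomic<unsigned int> c{0};
    void increment() { c.fetch_add(1, std::memory_order_relaxed); }
};

thread_local Counter c;

void threadTask(Counter &counter)
{
    counter.increment();
}

// Hypothetical helper: starts n threads without joining in between, all
// sharing the calling thread's counter, then waits for all of them.
unsigned int spawnConcurrently(unsigned int n)
{
    std::vector<std::thread> threads;
    for (unsigned int i = 0; i < n; ++i)
        threads.emplace_back(threadTask, std::ref(c));
    for (auto &t : threads)
        t.join();
    return c.c.load();
}
```

Calling spawnConcurrently(8) and then spawnConcurrently(4) from the same thread should yield 8 and then 12.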

@silene @gadmm Thanks for your suggestions.