I’m writing some code where I have a hash table of arbitrary objects, but I need a way to refer to them individually (so they can remove themselves from the hash table). Right now I’m using UUIDs to do this, but I’m wondering if there’s a better way?
I’m tempted to allocate a random one-byte object and then use Obj.tag for this, but I’m not sure how to force an allocation or how safe that would be.
The fact that it is thread safe is an implementation detail – no allocation occurs between the read and the increment, so no thread rescheduling can happen and the block executes “atomically”.
This however is a property of the current implementation of the runtime system not a guarantee provided by the language.
With the current implementation, there is no allocation between the test and the incr n in the simple reference, so I think it’s also thread-safe? (In a fragile way, arguably.)
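For concreteness, here is a minimal sketch of the counter being discussed (the names n and fresh_id are illustrative, not from the thread). The test-then-increment sequence allocates nothing, which is what makes it atomic under the current runtime – as noted above, an implementation detail rather than a language guarantee:

```ocaml
(* A fresh-ID generator from a plain int ref.  No allocation happens
   between the test and the [incr], so under the current OCaml runtime
   no other thread can be scheduled in between. *)
let n = ref 0

let fresh_id () =
  if !n = max_int then failwith "id space exhausted";  (* the "test" *)
  incr n;
  !n
```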
Heh, and to follow down this thread a few more klicks … once you have real SMP threading, this becomes the famous (in transaction processing) “serial number problem”. In the presence of high “client” contention (the threads wanting to get new serial numbers / unique ids), you end up allocating chunks of ids from the central counter, and each thread maintains a cache of allocated-but-unused ids from which it gets new ones. Then of course when/if a thread dies, you need a way of recovering those unused ids, so you can reuse them (assuming you want “no gaps” in the ID space). Of course, if you’re OK with gaps, you can make the ID a pair of an id for each client/thread and the value of a counter for each thread.
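A rough sketch of the chunk-allocation scheme described above, with made-up names (chunk, client, refill – none of these come from the thread). A central counter hands out blocks of ids, and each client serves requests from its private block, so the shared counter is touched only once per chunk of ids. In a real SMP setting the refill step would need to be mutex- or CAS-protected; that synchronization is elided here:

```ocaml
(* Central counter: the only shared mutable state.  Each refill would
   need locking (or an atomic fetch-and-add) under real SMP threading. *)
let chunk = 1024
let central = ref 0

(* Per-thread cache of allocated-but-unused ids. *)
type client = { mutable next : int; mutable limit : int }

let new_client () = { next = 0; limit = 0 }

(* Grab a fresh block of [chunk] ids from the central counter. *)
let refill c =
  c.next <- !central;
  central := !central + chunk;
  c.limit <- !central

(* Fast path: purely thread-local, no contention. *)
let fresh_id c =
  if c.next >= c.limit then refill c;
  let id = c.next in
  c.next <- c.next + 1;
  id
```

Note that if a client dies mid-chunk, the ids between its next and limit are lost – that is exactly the “recovering unused IDs” problem the paragraph above mentions, if you want a gap-free ID space.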
Hopefully when multicore arrives, it’ll provide support for the basic atomic types? That should limit the overhead of incr_and_get when one wants to get a new ID.
Uh, I guess. Having lived through this in Java (and being partially responsible for the debacle), I’d say that:
having good SMP/multicore support in a language runtime is a double-edged sword.
On the one hand, you don’t need to organize your system as multiple processes in order to fully exploit the hardware.
On the other hand, being able to use multiple cores (and sometimes sockets) in a single process brings its own troubles, for instance heavier SMP contention for hardware.
And programmers who would never have needed to know about (for instance) high-concurrency solutions to the serial number problem end up having to learn about them: where it used to be impossible to need such a thing, you can now need it inadvertently, because you used high concurrency without realizing its cost.
Concretely, you’re right that adding some low-level primitives for manipulating atomic values efficiently, by mapping down to the hardware atomic ops, is probably going to be necessary. But then programmers will need to understand those things, and they’re not trivial to understand – when to use ’em and when not to, what will and will not be affected. And then, well, people end up over-using the things, because that’s what people do, and you’ll end up needing lock-free data structures, and cache-oblivious algorithms, and … it’s a Pandora’s box.
I’m not saying that these things aren’t useful: they are, they are. But for MOST code, even most high-concurrency transaction-processing code, the best way to access SMP parallelism is via a kernel that does the stuff, and with most of the code being able to pretend it’s living in single-threaded environments.
I guess what I’m saying is: sure, SMP/multicore is great stuff. Try not to use it, if you don’t absolutely gotta. B/c you’re gonna get bit.
BTW, something else: there’s a difference between SMP and multicore. The former assumes symmetric access to memory, whereas the latter does not. And for many situations, in order to properly exploit multicore, you MUST have memory-bank-location-aware programs. And the most common way of doing this is to run separate processes on each socket/memory bank. I’ve seen applications in telecoms (SIP servers) where it was necessary to do this precise thing with the JVM (hence, already about as SMP-friendly as you can get in a GCed language) in order to fully exploit multi-socket hardware on blades. I mean, this wasn’t even on big-ass server motherboards: this was on -blades-, which typically don’t have a lotta sockets.
Yes, this can be a problem on 32-bit platforms (31-bit OCaml ints would deliver about 2 billion IDs before failing). If that’s an issue, using int64 instead of int should work, 2^64 being big enough in practice.
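The int64 variant of the counter might look like the sketch below (illustrative names). One caveat worth noting: Int64 values are boxed, so Int64.add allocates, which means the “no allocation between read and increment” atomicity trick from earlier in the thread no longer applies to this version:

```ocaml
(* Int64 counter: immune to 31-bit overflow, but each increment boxes a
   new Int64 value, so the allocation-free atomicity argument is lost. *)
let counter = ref 0L

let fresh_id () =
  counter := Int64.add !counter 1L;
  !counter
```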
I think the int64 version makes sense for now, since I think it would be easier for other people to understand, but might switch to the Oo.id version when OCaml has multithreading.
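For reference, the Oo.id version mentioned here is a one-liner: the runtime assigns every object a unique integer at allocation time, and Oo.id reads it back, so allocating a fresh empty object yields a fresh id:

```ocaml
(* Each [object end] allocation gets a runtime-assigned unique integer;
   Oo.id exposes it, giving a fresh id per call. *)
let fresh_id () = Oo.id (object end)
```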
So, a counter. One presumes that’ll get mutex-protected with the multicore runtime. I wonder if there’ll be replication of the counter onto each thread, with thread-id added in somehow …
Hopefully when multicore arrives, it’ll provide support for the basic atomic types?
An Atomic standard library module was recently added to standard OCaml and will be in OCaml 4.12 (see ocaml/stdlib/atomic.mli on trunk in the ocaml/ocaml GitHub repository). It offers operations such as atomic increment or compare-and-swap. This is the same interface as the Atomic module from Multicore OCaml; only the implementations differ. This way users can start coding against this library even before Multicore OCaml is fully merged.
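With that module, the ID counter from earlier in the thread can be written against the Atomic interface directly (OCaml ≥ 4.12). Atomic.fetch_and_add increments the counter and returns the previous value in one atomic step, so every caller gets a distinct id even under parallel access:

```ocaml
(* Fresh-ID generator on top of the stdlib Atomic module (OCaml 4.12+).
   [fetch_and_add] returns the value *before* the increment. *)
let counter = Atomic.make 0

let fresh_id () = Atomic.fetch_and_add counter 1
```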