Most efficient way to generate process-unique IDs?

I’m writing some code where I have a hash table of arbitrary objects, and I need a way to refer to each entry individually (so the objects can remove themselves from the hash table). Right now I’m using UUIDs for this.

I’m tempted to allocate a random one-byte object and then use Obj.tag for this, but I’m not sure how to force an allocation or how safe that would be.

Is there a better way to do this?

One way is to use Oo.id (object end).
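For instance, something like this (an untested sketch; the table and register names are made up for illustration). Oo.id returns each object’s unique integer id, so allocating a tiny dummy object gives a fresh process-unique id:

let fresh_id () : int = Oo.id (object end)

(* e.g. register a value under a fresh key, and hand the key back so the
   value can later remove itself from the table *)
let table : (int, string) Hashtbl.t = Hashtbl.create 16

let register v =
  let key = fresh_id () in
  Hashtbl.replace table key v;
  key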

Cheers,
Nicolás


Is there anything wrong with making your own counter?

let create_id =
  let n = ref 0 in
  fun () ->
    let id = !n in
    if id = -1 then
      failwith "Counter reached max value.";
    incr n;
    id

The counter version is not ideal because you could run out, but I guess with 63-bit OCaml ints on a 64-bit platform it doesn’t really matter in practice.

IIRC @nojb’s technique is thread-safe.


Shouldn’t the ref be thread-safe too? After all, the assumption is that only one thread at a time can modify it…

The fact that it is thread-safe is an implementation detail – no allocation occurs between the read and the increment, so no thread rescheduling can happen and the block executes “atomically”.

This, however, is a property of the current implementation of the runtime system, not a guarantee provided by the language.

With the current implementation, there is no allocation between the test and the incr n in the simple reference, so I think it’s also thread-safe? (In a fragile way, arguably.)
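If you’d rather not rely on that implementation detail, a rough, untested sketch of a Mutex-protected variant (assuming the threads library is linked) makes the guarantee explicit:

let create_id_locked =
  let n = ref 0 in
  let lock = Mutex.create () in
  fun () ->
    Mutex.lock lock;
    let id = !n in
    (* wrap-around check from the earlier version omitted for brevity *)
    incr n;
    Mutex.unlock lock;
    id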

Heh, and to follow down this thread a few more klicks … once you have real SMP threading, this becomes the famous (in transaction processing) "serial number problem". In the presence of high "client" demand (the threads wanting to get new serial numbers / unique ids), you end up allocating chunks of ids from the central counter, and each thread maintains a cache of allocated-but-unused ids from which it gets new ones. Then of course when/if a thread dies, you need a way of recovering those unused ids so you can reuse them (assuming you want no gaps in the id space). Of course, if you’re OK with gaps, you can make the id a pair of a per-client/thread id and the value of that thread’s own counter.
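For the curious, here is a rough, untested sketch of that chunked scheme; all the names (block_size, reserve_block, cache, fresh_id) are made up, and it assumes the threads library’s Mutex for the central counter:

let block_size = 1024

(* Central allocator: hands out blocks of ids under a mutex. *)
type allocator = { mutable next_block : int; lock : Mutex.t }

let global = { next_block = 0; lock = Mutex.create () }

let reserve_block () =
  Mutex.lock global.lock;
  let start = global.next_block in
  global.next_block <- start + block_size;
  Mutex.unlock global.lock;
  start

(* Per-thread cache of reserved-but-unused ids; it only touches the shared
   allocator when its current block is exhausted. *)
type cache = { mutable next : int; mutable limit : int }

let make_cache () = { next = 0; limit = 0 }

let fresh_id cache =
  if cache.next >= cache.limit then begin
    let start = reserve_block () in
    cache.next <- start;
    cache.limit <- start + block_size
  end;
  let id = cache.next in
  cache.next <- id + 1;
  id

(The recovery of unused ids when a thread dies is, of course, the hard part, and is not shown.)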

Loads of fun!

Hopefully when multicore arrives, it’ll provide support for the basic atomic types? That should limit the overhead of incr_and_get when one wants to get a new ID.

Uh, I guess. Having lived thru this in Java (and being partially responsible for the debacle), I’d say that:

  1. Having good SMP/multicore support in a language runtime is a double-edged sword.
  2. On the one hand, you don’t need to organize your system with multiple processes in order to fully exploit the hardware.
  3. On the other hand, being able to use multiple cores (and sometimes sockets) in a single process brings its own troubles, for instance heavier SMP contention for the hardware.
  4. And programmers who would never have had to know about (for instance) high-concurrency solutions to the serial number problem end up having to learn about them: instead of it being impossible to need such a thing, you can end up needing it inadvertently, because you used high concurrency without realizing its cost.

Concretely, you’re right that adding some low-level primitives to allow manipulating atomic values efficiently, by mapping down to the hardware atomic ops, is probably going to be necessary. But then programmers will need to understand those things, and they’re not trivial to understand – when to use ’em and when not to, what will and will not be affected. And then, well, people end up over-using these things, because that’s what people do, and you’ll end up needing lock-free data structures, and cache-insensitive algorithms, and … it’s a Pandora’s box.

I’m not saying that these things aren’t useful: they are, they are. But for MOST code, even most high-concurrency transaction-processing code, the best way to access SMP parallelism is via a kernel that does the stuff, and with most of the code being able to pretend it’s living in single-threaded environments.

I guess what I’m saying is: sure, SMP/multicore is great stuff. Try not to use it, if you don’t absolutely gotta. B/c you’re gonna get bit.

BTW, something else: there’s a difference between SMP and multicore. The former assumes symmetric access to memory, where the latter does not. And for many situations, in order to properly exploit multicore, you MUST have memory-bank-location-aware programs. And the most common way of doing this is to run separate processes on each socket/memory-bank. I’ve seen applications in telecoms (SIP servers) where it was necessary to do this precise thing with the JVM (hence, already about as SMP-friendly as you can get in a GCed language) in order to fully-exploit multi-socket hardware on blades. I mean, this wasn’t even on big-ass server motherboards: this was on -blades-, which typically don’t have a lotta sockets.


Yes, this can be a problem on 32-bit platforms (31-bit OCaml ints would deliver about 2 billion IDs before failing). If that’s an issue, using int64 instead of int should work, 2^64 being big enough in practice.
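A minimal, untested sketch of that Int64 variant; note that unlike the plain-int counter above, Int64.add allocates a boxed value, so the no-allocation argument about thread safety made earlier no longer applies:

let create_id64 =
  let n = ref 0L in
  fun () ->
    let id = !n in
    n := Int64.add id 1L;
    id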

Correct. Although I would add for our less experienced audience that not much is safe about using threads in the first place.

I think the int64 version makes sense for now, since I think it would be easier for other people to understand, but might switch to the Oo.id version when OCaml has multithreading.

I wonder how Oo.id is implemented. That is, I wonder how the value of %field1 of an object is filled-in.

For what it’s worth, here is what I see in 4.10.0. In obj.c there is this:

static value oo_last_id = Val_int(0);

CAMLprim value caml_set_oo_id (value obj) {
  Field(obj, 1) = oo_last_id;
  oo_last_id += 2;
  return obj;
}

And in camlinternalOO.ml you have things like this:

external set_id: 'a -> 'a = "caml_set_oo_id" [@@noalloc]

let create_object table =
  let obj = Obj.new_block Obj.object_tag table.size in
  Obj.set_field obj 0 (Obj.repr table.methods);
  Obj.obj (set_id obj)

So, a counter. One presumes that’ll get mutex-protected with the multicore runtime. I wonder if there’ll be replication of the counter onto each thread, with thread-id added in somehow …

You can actually see how it’s done on the multicore branch already:

Hopefully when multicore arrives, it’ll provide support for the basic atomic types?

An Atomic standard library module was recently added to standard OCaml and will be in OCaml 4.12: ocaml/stdlib/atomic.mli at trunk · ocaml/ocaml · GitHub . It offers operations such as atomic increment or compare-and-swap. This is the same interface as the Atomic module from Multicore OCaml, only the implementations differ. This way users can start coding against this library even before Multicore OCaml is fully merged.
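With that module, a fresh-id counter becomes a one-liner. A minimal sketch (assuming OCaml 4.12+ or the Multicore Atomic module):

let counter : int Atomic.t = Atomic.make 0

(* fetch_and_add returns the previous value, so each call yields a distinct
   id even with several threads/domains calling it concurrently. *)
let fresh_id () : int = Atomic.fetch_and_add counter 1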


Wow, this is new to me: type !'a t

Is the ! a new marker? Does it indicate mutability or something like that?