Actual Performance Costs of OOP Objects

Yes, I consider this split to be desirable anyway. If I’m writing an HTTP request parser, I want to accept many different sources of data (e.g. TCP socket, TLS decoder), but I don’t want to support multiple different user-provided buffering implementations.

If your workload is one integer addition, then yes you should avoid objects.

2 Likes

I didn’t think it would make an impact, and indeed it doesn’t (on my machine at least).

This is fascinating! May I ask what architecture your laptop has?

If your source is a buffer (e.g., a byte array), you don’t necessarily want another buffer on top of that. Leaving it up to the implementation makes sense – in an ideal OOP hierarchy, a user could choose which elements they want. But in this case, if the user chooses the direct OOP API (rather than buffering), they would presumably take a performance hit. This is not very reasonable IMO. Having to buffer against language features is… problematic to say the least.
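To make the split concrete, here is a minimal sketch of the shape being discussed (the names source, Buffered, and of_string are illustrative, not taken from any actual proposal): a single object type abstracts over data sources, and buffering is written once against that type rather than once per source.

class type source = object
  (* Read up to [len] bytes into [buf] at [off]; return the number of bytes
     read, or 0 at end of input. *)
  method read : bytes -> int -> int -> int
end

(* Buffering written once, on top of any [source]. *)
module Buffered = struct
  type t = { src : source; buf : bytes; mutable pos : int; mutable len : int }

  let create ?(size = 4096) src =
    { src; buf = Bytes.create size; pos = 0; len = 0 }

  let refill t =
    t.pos <- 0;
    t.len <- t.src#read t.buf 0 (Bytes.length t.buf)

  let read_byte t =
    if t.pos >= t.len then refill t;
    if t.len = 0 then None
    else begin
      let c = Bytes.get t.buf t.pos in
      t.pos <- t.pos + 1;
      Some c
    end
end

(* An in-memory source: the case above where adding [Buffered] on top would
   mean a buffer on top of a buffer. *)
let of_string s : source =
  let pos = ref 0 in
  object
    method read buf off len =
      let n = min len (String.length s - !pos) in
      Bytes.blit_string s !pos buf off n;
      pos := !pos + n;
      n
  end

A user who wants buffering wraps a source with Buffered.create; a user reading from of_string can skip the wrapper and the extra copy, at the cost of the direct method-call overhead discussed in this thread.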

On mine, the difference is huge:

  • record +1: 1.56s
  • record +y: 1.96s

Intel i7-8650U, so a Skylake architecture.

I get different results (Linux, Ryzen 5950X), where the overhead is bigger:

  • ref: <0.5s
  • record: 1.08s
  • object: 3.5s
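For readers who haven’t seen the benchmark earlier in the thread, here is a guess at the shape of the three variants being timed (a reconstruction, not the original code): each bumps a counter a billion times through a different dispatch mechanism.

(* "ref": direct access to the reference, no dispatch. *)
let bench_ref () =
  let x = ref 0 in
  for _ = 0 to 1000000000 do x := !x + 1 done;
  !x

(* "record": dispatch through closures stored in record fields. The
   "record +1" variant above hard-codes the constant, i.e.
   f = (fun _ -> x := !x + 1), while "record +y" adds the argument. *)
type ops = { f : int -> unit; get : unit -> int }

let bench_record () =
  let x = ref 0 in
  let o = { f = (fun y -> x := !x + y); get = (fun () -> !x) } in
  for _ = 0 to 1000000000 do o.f 1 done;
  o.get ()

(* "object": dispatch through a method call. *)
let bench_object () =
  let o =
    object
      val mutable x = 0
      method f y = x <- x + y
      method get = x
    end
  in
  for _ = 0 to 1000000000 do o#f 1 done;
  o#get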

That looks like the kind of results I would expect with flambda. Can you confirm whether it is enabled or not?
Comparisons using flambda are not very useful here, as this test presents a lot of optimisation opportunities that wouldn’t occur in a real program.

There’s a long thread of discussion (and a little acrimony, which seems out of place) about whether objects are an afterthought or not. I thought I’d weigh in, since I spent a number of years writing a large caml-light codebase, and then went off to spend well over a decade in the depths of Java, JVMs, and Java-based commercial products.

There’s a saying that a good language makes the perceived cost of using a feature commensurate with the overall undesirability of having people use it. A language that doesn’t make those two things line up is promising its users a world of trouble. For instance, Java makes concurrency trivial to use (“new Thread()” and “synchronized”, wheeee!) and, yeah, users get into a world of trouble. I think the way that OCaml has made objects available, but not too attractive, strikes about the right balance.

Your argument that “objects are an afterthought in OCaml, mostly there for a research paper” might be correct. But I’ll note that in the O-O world, the use of subtyping/inheritance (== “O-O”) has decreased pretty monotonically over time: in C++, with the rise of templates and large template libraries like the STL and Boost, most programmers rarely need to construct subclass hierarchies; more and more, O-O is a tool for building those templates.

I remember when O-O arrived in OCaml; I even used it for a couple of moderate-sized projects (one big one: a modular packet-sniffer/stream-reassembler/performance-analyzer, in 2001), before deciding that it was more trouble than it was worth. And that was when I was well within my decade of commercial Java systems hacking.

What am I trying to say? It isn’t obvious that spending a massive amount of energy on improving O-O would be the best bang for the buck for OCaml overall. I’m not even sure it’s wise to encourage programmers to use O-O when other paradigms suffice.

6 Likes

Just used dune with the default OCaml 4.13 compiler.

Here’s one aspect which hasn’t been mentioned in this thread yet:

This came up in “the real world” for me, where I help maintain a fairly large binary that had lots of classes linked into it.

1 Like

I know @bluddy started this discussion about the performance of OOP, but I have been following the discussion leading up to it, which was about a more expressive I/O layer in OCaml. OOP is one possible answer to that, but not necessarily the only solution.

To focus on the I/O problem, it seems to me:

  1. If we used OOP for this, in many cases the overhead of a method call would be completely dwarfed by the cost of performing I/O.
  2. Most common uses of I/O pull out, or put in, fairly large chunks of data from the channel to avoid making too many calls into that layer, even if buffered and even if the underlying structure is an in-memory string. So, again, the cost of a method call seems unlikely to be a large overall cost in the program.
  3. There are various programmer sentiments that, rightly or wrongly, will impact one’s reaction to the I/O layer of OCaml being objects. Personally, I have a possibly irrational distaste for objects (in general I don’t like subtyping, as I believe it makes programs harder to understand), so a module-based implementation “feels” better to me even if it uses objects underneath. I have my own I/O library that I use for async code, implemented similarly to what @talex5 has done, except I use a record of methods underneath a module that handles the buffering.
  4. If the primary argument against an OO low-level layer, with a Buffered module and other modules above it, is that in the case of an in-memory representation we end up paying a double-copy cost, that might be a reasonable cost to pay for an interface people like a bit more.
  5. I’m not sure how much any of this matters, given that a lot of OCaml code lives inside async libraries, which have their own I/O primitives that would not, I think, work with this interface, given that the types would be different.

One final thought: I don’t know the cost of looking up a function through a first-class module. But this strikes me as maybe a good compromise, in that with first-class modules you always have an escape hatch if performance matters. That is, imagine you had a Buffered_stream module backed by some kind of I/O layer built on a dispatch table, and it turns out you’re mostly doing I/O on an in-memory representation and that is too slow. For that performance-sensitive code, you could pass in a module with your reduced-copy implementation that matches Buffered_stream for those specific cases, and pass in the standard Buffered_stream implementation for non-in-memory situations. Yes, it’s more verbose, but the amount of code that needs this is probably quite low, so maybe that is a fair balance.
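A minimal sketch of that escape hatch (BUFFERED_STREAM, In_memory, and count_bytes are hypothetical names, not a real proposal): the consumer is parameterized over a first-class module, so a performance-sensitive call site can pass a reduced-copy implementation while everything else uses a generic one.

module type BUFFERED_STREAM = sig
  type t
  val read_byte : t -> int option
end

(* Generic implementation, e.g. backed by a dispatch-table I/O layer. *)
module Channel_stream : BUFFERED_STREAM with type t = in_channel = struct
  type t = in_channel
  let read_byte ic = try Some (input_byte ic) with End_of_file -> None
end

(* Reduced-copy implementation for the in-memory case. *)
module In_memory : sig
  include BUFFERED_STREAM
  val of_string : string -> t
end = struct
  type t = { data : string; mutable pos : int }
  let of_string data = { data; pos = 0 }
  let read_byte t =
    if t.pos >= String.length t.data then None
    else begin
      let b = Char.code t.data.[t.pos] in
      t.pos <- t.pos + 1;
      Some b
    end
end

(* The consumer takes whichever implementation the call site provides. *)
let count_bytes (type s) (module S : BUFFERED_STREAM with type t = s) stream =
  let rec go n =
    match S.read_byte stream with
    | Some _ -> go (n + 1)
    | None -> n
  in
  go 0

let () =
  let n = count_bytes (module In_memory) (In_memory.of_string "hello") in
  Printf.printf "%d bytes\n" n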

There is nothing magical about first-class modules. OCaml needs to create an adapter module on the fly if there is a discrepancy between the interface of the original module and the expected one. For example, the micro-benchmark above can be adapted as follows. And, as expected (one allocation per call), it is much, much slower than the object-oriented version (at least on my laptop).

module type A = sig
  val f : int -> unit
  val get : unit -> int
end

module B = struct
  let x = ref 0
  let get () = !x
  let f y = x := !x + y
end

(* [call] takes a packed module and unpacks it on every invocation. *)
let call (module M : A) y =
  M.f y

let () =
  let module M = B in
  for _ = 0 to 1000000000 do
    (* [M]'s signature differs from [A] (extra [x], different field order),
       so packing [(module M)] allocates a fresh adapter block each time. *)
    call (module M) 1
  done;
  Printf.printf "result = %d\n" (B.get ())

3 Likes

I don’t have my home laptop with me to run this. Could you add the comparison times?

  • plain: 1.33s
  • record: 1.96s
  • object: 2.64s
  • module: 4.27s

You should get slightly better performance if you define call as:

let call (m : (module A)) y =
  let module M = (val m) in
  M.f y

or

let call y (module M : A) =
  M.f y

The reason is that (module M) in a function argument prevents un-currying the rest of the arguments, leading to extra closure allocations and function calls. These extra operations can be optimised away by Flambda with -O2 or -O3, but the non-flambda compiler will be slower.
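Roughly, the two-argument version with the module pattern first is forced into the following shape (a sketch of the idea, not actual compiler output):

(* The unpacking happens before [y] is available, so every call to
   [call_desugared] must allocate the intermediate closure over [M]. *)
let call_desugared m =
  let module M = (val m : A) in
  fun y -> M.f y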
In fact, you would probably have a fairer comparison if you had written your code as:

for i = 0 to 1000000000 do
  let module M_A : A = M in
  M_A.f 1
done;

The call function introduces overhead that is not present in the other versions.

I agree. But unfortunately, OCaml is sufficiently smart to inline the call in that case. That is why I used the call proxy.

Indeed, the performance is better. And since OCaml inlines the call to call (but not the one to M.f), this better reflects the actual cost of first-class modules.

  • plain: 1.33s
  • record: 1.96s
  • object: 2.64s
  • module: 3.05s

@vlaviron The code is written with let module M = B in ... call (module M); if we wrote call (module B) ... instead, I would have naively expected the (module B) coercion to be recognized as a constant and be allocated statically. But this doesn’t work, and I think it is because the module field reads are treated as mutable loads, probably due to the compiler scheme for recursive modules.

Intuitively we should be able to do better here, because (1) B is not recursive, and (2) in the general case, all code that is not lexically inside the mutual-recursion nest of B will never be able to observe mutation, and should be able to treat those loads as immutable. (It’s not exactly an “initializing writes” scenario, but B could be considered redefined as an immutable structure after its recursive declaration.) What do you think?

(I haven’t tried to dig into the actual code, so this is mostly (informed) speculation)

I believe that recursive modules are unrelated; when you reach Lambda, all field reads are mutable (you might remember that I’ve tried to push for changes to that in the past). The only information that the middle-end can recover is which allocations are immutable or not, and indeed a module block allocation is always immutable. In particular, with Flambda I would expect the coercion itself to disappear.
Note that let module M = B in ... call (module M), without any coercion, should be equivalent to ... call (module B). There are some subtleties if the module definition is at toplevel, but that isn’t the case here.
So my guess for the reason why the module isn’t allocated statically is that Closure has limits on the shape of constants that it can statically allocate (it’s possible that it can’t statically allocate blocks containing closures except in fields of the global module, or something like that).

That is clearly an unfair comparison :slightly_smiling_face: akin to creating a new object in every iteration of the loop. Normally one would unpack the module outside the loop and call its functions in the loop, which amounts to a record access.
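Concretely, reusing A and B from the benchmark above, the hoisted version looks like this (a sketch of the fairer comparison):

let () =
  let m = (module B : A) in   (* pack (and coerce) once *)
  let module M = (val m) in   (* unpack once, outside the loop *)
  for _ = 0 to 1000000000 do
    M.f 1                     (* a plain call; no per-iteration allocation *)
  done;
  Printf.printf "result = %d\n" (B.get ())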

4 Likes

The internals of liquidsoap rely on objects for its sources. Historically, I believe this was in order to be able to implement inheritance patterns, from a general source API down to specialized operators.

For us, the performance bottleneck is not on the OO side, so we’re not too worried about it. However, I do believe that the object paradigm is being phased out of programming languages in general. It feels like an antiquated concept that is not particularly well suited to static analysis, manual debugging, or code reasoning in general.

In OCaml, modules, especially now with first-class modules, provide much better tools for most of the tasks you could use objects for. In this context, I am not much worried about the lack of development of objects. I don’t see it as an important feature of the language.

2 Likes