Actual Performance Costs of OOP Objects

I’d like to understand what the actual performance implications are of OCaml objects. Everyone likes to denigrate them, but what are their performance characteristics really like (and why)?

The implementation of objects is (necessarily) quite heavy - see Objects in Interfacing with C in the manual. Method dispatch is slower and objects themselves take more space. Neither of these is a killer problem, but it does mean that it’s often not worth using objects when other parts of the language can achieve the same end.

4 Likes

It’s slower than accessing a record field because the offset of a particular method is not known statically (it depends on how the object was built). There is some caching, but it is not as aggressive or optimized as in OO-heavy languages.
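To illustrate (a toy sketch, not code from the runtime): two objects can both satisfy the same type while keeping the method at different positions in their method tables, so the position can’t be a compile-time constant:

(* Both objects satisfy < m : int; .. >, but their method suites differ,
   so the slot for m sits at a different position in each table. *)
let o1 = object method m = 1 end
let o2 = object method a = 0 method m = 2 method z = 3 end

(* The call must locate m at runtime for each receiver. *)
let call o = o#m
let () = Printf.printf "%d %d\n" (call o1) (call o2)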

It’s hard to answer this question in isolation. It depends a lot on the specific use-case (inheritance or not, “closed” object or not, etc.), and on how critical method-lookup performance is in the broader application. The performance overhead compared to record field lookup is large (I’d guess at least 10x in many cases), but in most codebases it’s only a negligible part of the runtime. (I think in particular in the context of Eio, performance was not really a relevant argument.)

For more precise information, you should run benchmarks :slight_smile:

My rule of thumb is that if objects are used heavily, you are going to see some cost (to be weighed against the benefits of this programming style, in particular the extensibility/openness of row types); otherwise there is probably nothing to worry about (just look at the profiles).

For example: the visitors package uses objects to implement AST traversals. If visitors-derived traversals over large values are on the critical path of your application, you will probably observe object performance. If you just use it in passing, it’s fine. For example, if we used visitors in the compiler, it would probably be fine, because structural mapping/folding is only a fraction of the work we do. On the other hand, the Gilian project had a large object-related performance regression because it uses this kind of objectful AST traversal in performance-critical paths. (To implement formula substitutions in a symbolic analyser, if I remember correctly.)

3 Likes

I’d like to start by mentioning the one point where objects outperform everything else: subtyping is free. So if you have some code that by design relies on some form of subtyping, then objects could (in theory) outperform versions based on manual casting or first-class modules.
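For instance (a toy sketch; the shape/circle names are invented): an upcast with :> is erased at compile time, so no wrapper is allocated at runtime:

type shape = < area : float >

class circle r = object
  method area = Float.pi *. r *. r
  method radius = r
end

(* The coercion is the identity at runtime: same pointer, narrower type. *)
let shapes : shape list = [ (new circle 1.0 :> shape) ]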

But objects are very expensive to allocate (much more than the amount of memory they take would imply), and method calls are strictly more expensive than a field load and a function call.
And their worst property in my opinion is that they’re completely opaque to the optimiser. Simple things like (object method m = 42 end)#m will be compiled to a bunch of CamlinternalOO function calls, then an indirect call to a function that will eventually return 42. This may look like a silly example, but this kind of optimisation is very important when your code uses lots of small, single-purpose functions that get inlined by the compiler. Add to this the fact that method calls can never be inlined anyway, and this style (lots of small functions) becomes very expensive in an object-based project compared to all the alternatives.

2 Likes

The main historical bottleneck in the compiler is the type-checker, which indeed wouldn’t be badly impacted by using objects for traversal. But many other parts of the compiler are basically a single pass over the whole term, doing very simple operations (typically, all the passes in Simplif), so I think that if we used objects for traversal of all our terms we would get a very noticeable slowdown. (My random guess, based on my work on Selectgen, is that we would see at least a 50% slowdown on those parts, which, taking the whole compilation process into account, would mean somewhere between a 10% and 25% slowdown. It’s just a guess, not a benchmark, though.)

1 Like

I can’t stress this enough. I think that focusing on the fact that objects are slower than records and functions completely misses the point. Even if they were 10x or 20x slower, it may not matter at all for many workloads.

If objects are the most natural way to express your code (i.e. using subtyping, open recursion, inheritance, etc.), I think you should not hesitate to use them. Only later, if their use is identified as a performance bottleneck, should alternatives be considered (as with any other code).
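For instance, “open recursion” means a method can call a sibling method through self, and a subclass can override just that sibling (a small sketch, with invented class names):

class animal = object (self)
  method name = "animal"
  method describe = "a " ^ self#name   (* open recursion through self *)
end

class cat = object
  inherit animal
  method! name = "cat"
end

let () = print_endline (new cat)#describe   (* prints "a cat" *)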

There are some good reasons not to abuse objects: e.g. their typing theory is more complex than the rest of the core language. But performance is not one of them, in my view.

Cheers,
Nicolas

1 Like

So the takeaway I get here is that if you do have a critical path in your code that uses objects, you’re better off recreating them as records of functions and passing the record in as this. That’s quite unfortunate, and it makes me wish objects were implemented the standard way rather than the way they were in OCaml.

Regarding the makeshift solution, would it make sense to have the functions in the record hold this in their closure via partial application?
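Something like this, perhaps (my own sketch; counter and make_counter are invented names):

(* Variant 1: pass the record explicitly, as a 'this' parameter. *)
type counter = { mutable n : int; add : counter -> int -> unit }

let c = { n = 0; add = (fun self y -> self.n <- self.n + y) }
let () = c.add c 5

(* Variant 2: the closures capture the state, so no 'this' argument
   is needed at the call site. *)
type counter2 = { add2 : int -> unit; get : unit -> int }

let make_counter () =
  let n = ref 0 in
  { add2 = (fun y -> n := !n + y); get = (fun () -> !n) }

let c2 = make_counter ()
let () = c2.add2 5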

(And yes, I get that the performance impact may not be that big in some cases, but I’m talking about cases where we know for certain that we’re in the critical path).

1 Like

Another question: I understand that there’s some caching when you call a method, so that the next time that call site runs, the method can be resolved faster. How can this work when the same call site may receive objects of different types? Does the call site compare the object id first before using the cache?

Slower than what? More space than what? Are we comparing using objects to calling functions in a module known at compile-time, to using a record with one field for each method, or to using a GADT, for example?

The link says:

The first field of the block refers to the object’s class and associated method suite, in a format that cannot easily be exploited from C. The second field contains a unique object ID, used for comparisons. The remaining fields of the object contain the values of the instance variables of the object.

This is exactly what you would expect (except possibly the ID). The benchmark I gave earlier compared against a GADT:

type source = Source : (module SOURCE with type t = 'a) * 'a -> source

So here, the first field is the method table (module) and the second is a pointer to the record with the instance variables. We’ve lost the ID field, but gained an extra block and an extra indirection.
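For reference, a self-contained version of that encoding could look like this (the contents of SOURCE are invented here for illustration):

module type SOURCE = sig
  type t
  val read : t -> bytes -> int
end

type source = Source : (module SOURCE with type t = 'a) * 'a -> source

(* Dispatch: unpack the module (the "method table") and apply it to the
   packed state. *)
let read src buf =
  match src with
  | Source ((module S), v) -> S.read v buf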

But method lookup is slower (a search on a pre-computed label hash rather than a static record index).

There are some other benchmarks at GitHub - talex5/flow-tests: Just for testing, and there are also some notes at eio/doc/rationale.md at main · ocaml-multicore/eio · GitHub.

My thoughts are:

  • Avoiding objects is a benefit for speed-critical code, especially if you know the target statically.
  • If you need sub-types, objects will be a good choice.
  • For most uses, the overhead won’t matter compared to whatever other processing you’re doing.
4 Likes

Based on your rationale.md file, it looks like we got to this issue dealing with the same problem: OCaml lacks a good way to deal with streams of data. I’m currently dealing with this on a personal project, and it’s quite an annoyance. C++ has streams, which are subclassed into FileStreams, OutputStreams, InputStreams, StringStreams and what not (this isn’t meant to be 100% accurate, but you get the idea). OCaml completely lacks a notion of a data stream (I guess you call it Flow) and a way of abstracting over it. The OOP model is a perfect fit here, as we need an easily extensible type.

So am I to understand that you used OOP Flows in eio?

Each object comes with its own map from method labels to closures. The map is implemented as an array of two-word cells, one word for the key and one word for the value. Keys are ordered, so looking up a method from its label is a dichotomic (binary) search.
For each method call in the generated code, a global cache entry is created that holds the last index found by dichotomic search at this particular call site. The next dichotomic search can be skipped if the index from the cache points to the correct label. This happens fairly often if you don’t have too many different classes/raw objects implementing the same signature. But that’s also all the cache saves you: a few iterations of a short loop.
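To make that concrete, the mechanism is roughly this (an illustrative sketch, not the actual CamlinternalOO code; it assumes the label is present in the table):

(* A method table: (label, closure) cells sorted by label, plus a
   one-entry cache per call site. *)
let lookup (table : (int * Obj.t) array) (label : int) (cache : int ref) =
  let i = !cache in
  if i < Array.length table && fst table.(i) = label then
    snd table.(i)                      (* cache hit: skip the search *)
  else begin
    let lo = ref 0 and hi = ref (Array.length table - 1) in
    while !lo < !hi do                 (* dichotomic search *)
      let mid = (!lo + !hi) / 2 in
      if fst table.(mid) < label then lo := mid + 1 else hi := mid
    done;
    cache := !lo;                      (* remember for the next call *)
    snd table.(!lo)
  end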

2 Likes

Yes, flows are objects. But typically you then wrap a flow in a buffered reader, which isn’t an object. So reading a character or line doesn’t involve a method call, unless the buffer is empty and it needs to refill it (see Buffering and Parsing).
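The pattern is roughly this (a simplified sketch, not the actual Buf_read; the read_into method is invented here):

(* The hot path (read_char) only touches the record; the method call
   happens only when the buffer runs dry. *)
type buf_read = {
  src : < read_into : bytes -> int >;
  buf : bytes;
  mutable pos : int;
  mutable len : int;
}

let refill t =
  t.pos <- 0;
  t.len <- t.src#read_into t.buf      (* the only method call *)

let read_char t =
  if t.pos >= t.len then refill t;
  if t.len <= 0 then None
  else begin
    let c = Bytes.get t.buf t.pos in
    t.pos <- t.pos + 1;
    Some c
  end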

Also, we sometimes optimise away the method calls. For example, you can ask to copy from any source flow to any sink, but if the source and sink are both wrapping Unix file descriptors then it will automatically extract them and do the bulk of the copy directly.
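Conceptually, something like this (an invented sketch, not the real interface; the fd probe method is hypothetical):

let copy (src : < read_into : bytes -> int; fd : Unix.file_descr option >)
         (dst : < write : bytes -> int -> unit; fd : Unix.file_descr option >) =
  match src#fd, dst#fd with
  | Some in_fd, Some out_fd ->
      (* Both ends wrap real file descriptors: copy directly, with no
         method calls per chunk. (A real implementation would also
         handle partial writes, or use sendfile.) *)
      let buf = Bytes.create 65536 in
      let rec loop () =
        let n = Unix.read in_fd buf 0 (Bytes.length buf) in
        if n > 0 then (ignore (Unix.write out_fd buf 0 n); loop ())
      in
      loop ()
  | _ ->
      (* Generic path: go through the object interface. *)
      let buf = Bytes.create 4096 in
      let rec loop () =
        let n = src#read_into buf in
        if n > 0 then (dst#write buf n; loop ())
      in
      loop ()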

1 Like

I see. So the buffering (which would normally be optional, and part of the flow) is externalized, and serves as a way to avoid most method calls. I totally understand why you did it, but it’s kinda sad that we have to step around a half-baked language feature. Presumably there’s a lot of optimization OCaml could do here, including adding a strict-object option to the language that would only allow objects created from classes, thus enabling proper compilation methods.

Did you by any chance measure the performance impact without buffering? In theory, repeated calls to the same method shouldn’t be that bad, right?

As a side comment, it seems like this Flow business should be spun out to another library.

Thank you! I’ve been seeking a proper explanation for a long time. However, given that this short loop is an integral part of method dispatch, I don’t think the cost it saves is minor. We’re getting to Python-level costs here.

1 Like

Just to test things out myself, I compared 3 versions of a loop:

(* plain *)
let x = ref 0;;

for i = 0 to 1000000000 do
  x := !x + 1
done;;
Printf.printf "result = %d\n" !x 

(* OOP *)
let x =
  object (self)
    val mutable x = 0

    method add y =
      x <- y + 1

    method print =
      Printf.printf "result = %d\n" x
  end;;

for i = 0 to 1000000000 do
  x#add 1
done;;
x#print;;

(* record-based *)
type r = {
  mutable x: int;
  f: r -> unit;
};;

let v = {
  x = 0;
  f = (fun r -> r.x <- r.x + 1);
};;

for i = 0 to 1000000000 do
  v.f v
done;;

Printf.printf "result = %d\n" v.x

All files were compiled with ocamlopt, no flambda.

Times:
plain version: 1.95s
record version: 2.2s
oop version: 5.5s

So yes, it definitely seems like it’s worth emulating objects with records in high-performance regions of the code. And no, method caching doesn’t really help.

That’s a depressingly low bar for a compiled language…

You’ll probably want to use (Sys.opaque_identity v).f v, if you want it closer to the performance you’ll get in real code. Otherwise the function might get inlined.

3 Likes

I doubt it does without flambda…? But I’ll try.

I just checked and there’s no impact without flambda.

I just realized I had a ‘bug’ in my object version as I wasn’t actually incrementing self.x. If I change the mistaken line to x <- x + y, we go up to 6.3s for the object-based version!!!

I never worked on objects directly myself but I still find it a bit vexing to see them described as a “half-baked language feature”.

You assume that the design here is driven by the cost of method calls, but we don’t know that. From a distance it looks like @talex5 chose a two-level design, where the two layers solve different problems, and objects happen to be useful for one layer (some convenience around two_way, I suppose; flows could probably also be built without object types if one wanted) but not the other.

So yes, it definitely seems like it’s worth emulating objects with records in high performance regions of the code. And no, method caching doesn’t really help.

Wait, your benchmark shows that even in a super-tight loop the overhead of method dispatch is at worst 2.5x, and you react like it’s a huge cost? This is not a huge difference! It means that as soon as your code does something 100x slower than incrementing an integer (which is: most other things), adding objects in this way will be at most a 2.5% slowdown, which is not noticeable for most sections of the code – only those, again, that are performance-critical.
(Note: you claim that caching doesn’t help here, but it probably does.)

(But: I’m not sure that your baseline is correct; it would probably be noticeably faster with a local reference instead of a global variable.)

What’s the next feature to be criticized because it’s slower than not doing anything? We can show you micro-benchmarks where exceptions incur a 2.5x overhead just as well; should we get rid of exceptions? Effects, lots of overhead, gone? Functors should never be used! int option has Python-level costs compared to encoding invalid values as -1!

7 Likes

That’s true. That’s how I read @talex5’s explanation here.

Yes, I’ve been focusing on performance-critical areas. I said as much.

The question is what the alternatives are. If objects have a record-based encoding that works almost as well and is 2.5x faster (and I agree that it’s just a microbenchmark, but it reflects a basic issue), then yes, I’ll just use that, thank you very much. If effects had a serious performance overhead compared to the alternative, which is monadic code (and I don’t believe they do – quite the contrary, in fact), then yes, we should generally avoid them. Functors enable things that cannot be expressed differently, but I would still expect their performance cost to be minimized, or they wouldn’t be worth using in production code. The same applies to option: obviously it provides type safety for which there is no alternative, but it’s also true that in many cases the extra allocation is painful, and the language should try to minimize the cost of using such an essential feature.

Bottom line: OCaml is a compiled language and it competes in a space where performance matters. We’re not python. Our competitors are constantly improving their performance, and we need to do the same.

AFAICT the object model is half-baked, in the sense that performance issues were not sufficiently considered. To me, it seems like a feature that was put into the language in order to publish a research paper, rather than carrying out the obvious move of implementing a high performance OOP system as is available in other compiled languages. And the evidence is that the ecosystem avoids this feature like the plague.