@vlaviron The code is written with let module M = B in ... call (module M); if we wrote call (module B) ... instead, I would have naively expected the (module B) coercion to be recognized as a constant and be allocated statically. But this doesn’t work, and I think it is because the module field reads are treated as mutable loads, probably due to the compiler scheme for recursive modules.
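For concreteness, a minimal sketch of the two variants (the signature S and the body of B are made up for illustration):

module type S = sig val f : int -> int end
module B = struct let f x = x + 1 end

let call (module M : S) = M.f 0

(* Variant 1: bind a module alias, then pack it. *)
let v1 = let module M = B in call (module M)

(* Variant 2: pack B directly; one might expect this coercion to be
   recognized as a constant and allocated statically. *)
let v2 = call (module B)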
Intuitively we should be able to do better here, because (1) B is not recursive, and (2) in the general case, all code that is not lexically inside the mutual-recursion nest of B will never be able to observe mutation, and should be able to treat those loads as immutable. (It’s not exactly an “initializing writes” scenario, but B could be considered redefined as an immutable structure after its recursive declaration.) What do you think?
(I haven’t tried to dig into the actual code, so this is mostly (informed) speculation)
I believe that recursive modules are unrelated; when you reach Lambda, all field reads are mutable (you might remember that I’ve tried to push for changes to that in the past). The only information that the middle-end can recover is which allocations are immutable or not, and indeed a module block allocation is always immutable. In particular, with Flambda I would expect the coercion itself to disappear.
Note that let module M = B in ... call (module M), without any coercion, should be equivalent to ... call (module B). There are some subtleties if the module definition is at toplevel, but that isn’t the case here.
So my guess for the reason why the module isn’t allocated statically is that Closure has limits on the shape of constants that it can statically allocate (it’s possible that it can’t statically allocate blocks containing closures except in fields of the global module, or something like that).
That is clearly an unfair comparison akin to creating a new object in every iteration of the loop. Normally one would unpack the module outside the loop and call its functions in the loop, which amounts to a record access.
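For instance, the fairer version of the loop might look something like this (the Counter signature and names are hypothetical):

module type Counter = sig val step : int -> int end

let bench (m : (module Counter)) n =
  let module C = (val m) in   (* unpack the module once, outside the loop *)
  let acc = ref 0 in
  for _ = 1 to n do
    acc := C.step !acc        (* in the loop this is a field access plus a call *)
  done;
  !acc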
The internals of Liquidsoap rely on objects for its sources. Historically, I believe this was in order to be able to implement inheritance patterns from a general source API to specialized operators.
For us, the performance bottleneck is not on the OO side, so we are not too worried about it. However, I do believe that, in general, the object paradigm is phasing out of programming languages. It feels like an antiquated concept that is not particularly well suited to static analysis, manual debugging, or code reasoning in general.
In OCaml, modules, especially now with first-class modules, provide much better tools for most of the tasks you could use objects for. In this context, I am not much worried about the lack of development of objects. I don’t see them as an important feature of the language.
The essential question is whether there exist any fundamental, theoretical limitations on the possible performance of objects. Is there some literature available on the subject?
I feel that there should be no fundamental limitations here, since objects should in many cases be representable through the ordinary language constructs. Though, that is only a guess.
Hmm… if we start considering memory bandwidth, how hopelessly multicore current CPU designs have become, and the fact that the minimum power-dissipation threshold is pretty high in current designs anyway, then we could arrive at the conclusion that we might as well increase the core count for many of the problems where we started using objects in performance-critical sections. There are relatively few problems where latency at the micro level starts to matter, and something tells me those could be solved in a somewhat more static manner anyway, through records.
Objects are there to add dynamism to the expression. Currently, going the slightly more static route of records feels more like premature optimization (which is "the root of all evil", as programmer folklore has it).
Curiously, the results on a pretty old Phenom II are:

ref      result = 1000000001   dt = 1.758s
object   result = 1000000001   dt = 6.073s
record   result = 1000000001   dt = 3.078s
module   result = 1000000001   dt = 1.881s
I changed the ref cell case to
(Sys.opaque_identity x) := !x + 1
Without the opaque identity, the measured time is about 0.8s.
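For reference, a minimal reconstruction of the ref-cell loop; the original harness isn't shown, so the exact setup here is an assumption:

let () =
  let x = ref 1 in
  for _ = 1 to 1_000_000_000 do
    (* opaque_identity keeps the compiler from optimising the ref away *)
    (Sys.opaque_identity x) := !x + 1
  done;
  Printf.printf "result = %d\n" !x   (* prints 1000000001 *)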
Curiously, the object case is not egregiously slow.
There are some problems here where multiple inheritance is desirable, and the easy unification of object types (the tagless, variant-less subtyping) is very useful. The complex properties involved, and the relationships among them, are naturally representable with objects.
I’m trying to work out how to implement a similar concept with modules only. However, as @Chet_Murthy notes, the solution seems to converge on some form of code generation (of which templates are an instance).
Are there any serious arguments against the extensive use of ppx for these purposes? With records and relationships, much of the problem is indeed syntactic in nature; the semantic part is rather the interface itself, something that is easily checked by a compiler.
Well, OO is quite expressive. However, it seems to me that, interestingly, there may exist an approach to handling even inheritance through module-level constructs (like functors). Still, even with the module system, a significant part of the problem space remains to be solved somehow, and the solution appears to lie in some form of code generation or templating.
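For example, a rough sketch of "inheritance" through a functor could look like this (all names are made up):

module type BASE = sig
  type t
  val describe : t -> string
end

(* "Derive" from any BASE by overriding one operation. *)
module MakeLoud (B : BASE) : BASE with type t = B.t = struct
  type t = B.t
  let describe x = String.uppercase_ascii (B.describe x)
end

module Point = struct
  type t = { x : int; y : int }
  let describe { x; y } = Printf.sprintf "(%d, %d)" x y
end

module LoudPoint = MakeLoud (Point)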
I suppose that if we could solve the following three problems without objects, in a seamless fashion, then there would be an argument that objects are not really needed:
1. A natural representation of many distinct yet related records with overlapping fields (see the sketch after this list).
2. An implementation of the visitor approach to data traversal.
3. A natural representation of interfaces to databases.
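As a quick sketch of the first point: OCaml object types already unify structurally through the row variable, with no common declared type (the geometric names are made up):

let circle = object
  method x = 0.0
  method y = 0.0
  method radius = 1.0
end

let square = object
  method x = 1.0
  method y = 2.0
  method side = 3.0
end

(* Accepts any object providing at least the x and y methods. *)
let norm (p : < x : float; y : float; .. >) =
  sqrt ((p#x *. p#x) +. (p#y *. p#y))

let () = Printf.printf "%f %f\n" (norm circle) (norm square)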
It is worth noting that F# has objects. I wonder what the performance level of their implementation is…
What is their relative memory consumption?
In principle, we could even store records, and instantiate objects referencing one or more records whenever a more sophisticated interface is desired, as a special-case optimization. Still, if the object data representation were compact enough, then even that would not be necessary.
F#'s object dispatch is in the JVM/V8 tier of optimisation effort. While the F# compiler itself doesn't have as many resources behind it as C#'s, both target the CLR, which is an OO runtime, so every data type in F# is implemented on top of the underlying OO type system and benefits from the same dispatch optimisations as C# code.
Also note that the CLR’s OO type system is nominal, with no built-in support for structural interfaces as in OCaml; nominal dispatch is easier to optimise.
I wonder whether OCaml could have targeted an existing compiler backend, such as LLVM. Are there any fundamental obstacles to that? If it was attempted already, why did it not take off? It seems like a good approach to me, since the language core could then be developed with much less effort, and the heavy lifting of portability and environment integration could be taken care of by the other project.
What difference does it make, given that types are erased when OCaml programs are compiled? My mental picture, as a programmer, is that OCaml programs are transformed into some intermediate representation that I can think of as a Lisp expression, which is then compiled and optimized using both its own structural data and information propagated from earlier, type-aware compilation passes. My hope is that this picture is not too far from reality.
My mental picture of an ideal object-oriented implementation is that objects are expressed largely in terms of existing language primitives behind the scenes, and that the compiler elides most of the object structure whenever it can, especially within a single program with whole-program optimization.
LLVM is a much lower level target than the JVM or CLR, it would just replace the Cmm-to-native-code pass. LLVM doesn’t provide an object model nor dispatch, only optimisation passes and codegen for procedures; it’s still up to you to architect a fast object model.
When types declare their supertypes upfront, the implementation can just precompute method tables for every type-supertype pair, each method being assigned a constant offset, which makes it easier to generate fast code for dispatch (as in C++, Java, C#).
This strategy gets harder without nominal static typing, because you’d need to take the closure of all possible supertypes for every data type instead, which is infeasible. So you either lazily compute method tables at runtime (like Go), or, most commonly, you do a method lookup on each call and add a caching layer (OCaml, Objective-C and most dynamic OO languages).
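As a rough OCaml rendering of the table-based strategy (purely illustrative; none of these compilers literally work this way):

type 'self vtable = {
  area : 'self -> float;    (* slot 0: offset fixed at compile time *)
  perim : 'self -> float;   (* slot 1 *)
}

(* Pair a value with its table; the existential hides the concrete type. *)
type shape = Shape : 'a * 'a vtable -> shape

let circle_vt = {
  area = (fun r -> Float.pi *. r *. r);
  perim = (fun r -> 2.0 *. Float.pi *. r);
}

(* Dispatch is one load at a constant offset plus an indirect call;
   no lookup or caching is involved. *)
let area s = match s with Shape (x, vt) -> vt.area x

let () = Printf.printf "%f\n" (area (Shape (1.0, circle_vt)))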
Even if OCaml can completely erase types, its type system still affects the available implementation strategies. That wouldn’t be so bad with some more optimisation work around devirtualisation, but that won’t happen if no one is interested.
But in the context of the whole program, or perhaps even the whole project, we have a finite (and known) set of types involved. Therefore it is always feasible to precompute the static invocation tables within a given project.
What slightly baffles me is why the compiler does not emit extra information to enable such optimization across modules.
The more fundamental question to me is as follows. My impression is that it should be feasible to implement efficient objects in OCaml; it is a matter of optimization design in the compiler. The context is whole-project and whole-program optimization, since otherwise we face the challenge of an unbounded number of possible supertypes. What is your perspective on that?
The compiler needs to make a decision about the allocation of methods, and this depends on the entire set of source texts fed into it. Now, suppose the compiler generates the binary with types mostly erased, and alongside it an annotation file that records enough of the decisions the compiler made to allow static allocation of methods when compiling any further valid input. The fuller type information would then live in that additional annotation artefact.
There are always trade-offs involved. How many resources are you willing to spend to find and manage all those method tables? You could try to be precise and spend a long time during compilation tracking data flow across the program, or you could do it conservatively and risk ending up with too many method tables bloating the executable, most of which would be unused and could slow down table lookups.
In contrast, method lookups or lazy tables are fast to compile, lean in executable size, and their costs are predictable; OCaml in general isn’t a high-throughput language anyway [1]. But even without switching to method tables, I’d bet method calls in OCaml could be optimised further through (profile-guided?) devirtualization and better caching.
But ultimately, a more efficient implementation hasn’t happened yet, not because it’s infeasible, but because having to put the effort in is also an engineering trade-off.
[1] I mean, the whole ecosystem is still using list, which is clearly a functional stack, as a substitute for an immutable array, and copying substrings instead of passing slices around… and I’m supposed to worry about a few extra microseconds for the occasional object dispatch?
What about the size of the attributes contained in an object? Is it comparable to the size of the corresponding record (i.e. a record containing all these fields)?
Good question! The representation of objects in OCaml mentions the usual block header, plus two extra words for the object’s class pointer and its ID, and then the instance fields. So an object takes two more words than a record with the same fields.
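A quick way to check is with the unsafe Obj module (the output is implementation-specific; this assumes the usual 64-bit runtime):

type r = { a : int; b : int }

let record = { a = 1; b = 2 }

let o = object
  val a = 1
  val b = 2
  method sum = a + b
end

let () =
  (* A two-field record occupies 2 words (plus the header). *)
  Printf.printf "record: %d words\n" (Obj.size (Obj.repr record));
  (* The object stores the class pointer, the object ID, and then the
     two instance variables, so we expect 4 words. *)
  Printf.printf "object: %d words\n" (Obj.size (Obj.repr o))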
Most OO languages use a similar object header, but not on top of a separate block header like the one OCaml places on every block. Maybe it would be possible to compact those into 2 or 3 words.
But the maths change if you compare an object type to a record of closures: the latter takes one allocation for the record plus one for each closure, and if multiple closures capture the same variable, each duplicates the captured field. (I think this is also the representation of first-class modules?)
In contrast, an object will allocate a single block in which all methods share their captured variables as instance fields. The trade-offs here are the same as in F# between records of closures and interface types, just with slower dispatch.
Also, using mutable state with the closures will force you to allocate more ref blocks and capture them, whereas the object can simply place mutable fields in the same allocation.
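A small sketch of that difference, with a hypothetical counter type:

type counter = { bump : unit -> unit; get : unit -> int }

(* Record of closures: the state needs its own ref block, which both
   closures then capture. *)
let make_record_counter () =
  let state = ref 0 in
  { bump = (fun () -> incr state); get = (fun () -> !state) }

(* Object: the mutable field lives directly in the object's block. *)
let make_object_counter () = object
  val mutable state = 0
  method bump = state <- state + 1
  method get = state
end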
The part about captured variables isn’t completely true, actually.
First, with regular closures you can share the free variables if you put all the functions in a mutually recursive definition.
Second, the method array for objects is a record full of closures, so they can still capture variables independently. There are various tricks used to share free variables in common cases, so in practice most of the free variables captured by methods will end up stored in the object rather than the closure, but it’s not always the case (and it involves more indirections than your explanation suggests).
Oh sorry, I must have been misled by my quick-and-dirty experiments with Obj before answering.
Does this involve yet another allocation? If so I don’t see the improvement, really.
But they presumably take the instance object as an argument, right? So closed-over variables have no reason to become part of the closure instead of the object (except for, maybe, variables that are constant across all instances of the same underlying class), and the method array can be allocated upfront and shared. Please don’t tell me OCaml is even worse at compiling this stuff than I already knew.
type r = { f : int -> int; g : int -> int }
let captured = ...
(* Version 1:
- One closure for f
- One closure for g
- One record
*)
let f x = x + captured
let g x = x * captured
let r = { f; g; }
(* Version 2:
- One closure for f and g
- One record
*)
let rec f x = x + captured
and g x = x * captured
let r = { f; g; }
It depends on where the variables are bound.
If a variable is bound outside the class, then there are two possibilities:
- If the class is at toplevel, the variable will be captured by closures individually. With the native compiler this case doesn’t really exist, as most toplevel variables are compiled into static expressions (which don’t need to be part of any closure).
- If the class is not at toplevel, the variable is captured in a class environment. When creating objects for this class, the class environment is stored in the object (as an anonymous instance variable). Methods then access the variable through this environment.
If the variable is bound inside the class (for example, it is a class parameter), then it is registered as an instance variable and stored inside the object directly.
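A rough illustration of these cases (the class and names are made up):

let outer = 42    (* bound outside the class; at toplevel the native
                     compiler compiles this into a static expression *)

class c (param : int) = object
  val stored = param          (* bound inside the class: stored directly
                                 in the object as an instance variable *)
  method sum = stored + outer
end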
Your intuition is mostly correct, but OCaml objects and classes are complex and full of corner cases, so it is very hard to be sure that your code is compiled decently.