If they were representable through proper primitives, then the optimization would come for free. I also have a feeling that a lot is already being done towards whole-program optimization, at least in the flambda-xxx variants of the compiler. Please correct me if I'm wrong (the feeling is more of a guess).
Nothing written down, but as @s.t.s said above the first step is to delay compilation to allow the optimisation passes to see the OO operations as dedicated primitives. After that, some additional work is needed to actually find out what we can optimise and estimate the benefits, but this part can be done incrementally.
Delaying compilation is hard because the compilation scheme was mostly undocumented (there's now a PR to add documentation, but writing it required reverse-engineering the current implementation). In addition, pushing it down into the backend will require code in both the bytecode and native backends to handle objects, which has a maintenance cost that we're not looking forward to.
The flambda versions are slightly more convenient for whole-program optimisation, and there had been a prototype at some point to do whole-program optimisations with a flambda-based compiler, but there were too many drawbacks to actually integrate the feature properly, and there hasn’t been any work in this direction since.
Note that we do have cross-module optimisations already (with and without flambda), which cover many of the use cases for whole-program compilation, with a few notable exceptions like whole-program dead code elimination.
What are the obstacles? Both the essential ones (in principle and in theory) and the ones reflecting the current state of the implementation and related pragmatic considerations.
The approach I had in mind would be to at least inline whatever can be inlined across the whole program. This should be more than enough for the use case of objects as extensible records.
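To make the "objects as extensible records" use case concrete, here is a minimal sketch (the names `describe`, `alice` and `bob` are purely illustrative). `describe` accepts any object with at least `name` and `age` methods, thanks to the open row `..`, so a "wider" object is accepted too. Because `describe` cannot know the concrete object it receives, each `r#name` call is, roughly speaking, compiled to a dynamic method lookup keyed by a hash of the method name; the hope expressed above is that a whole-program inliner seeing both `describe` and its call sites could specialise such calls away.

```ocaml
(* Objects used as extensible records: each "field" is a method.
   The [..] row variable lets [describe] accept any object that has
   at least [name] and [age], including objects with extra methods. *)
let describe (r : < name : string; age : int; .. >) =
  Printf.sprintf "%s (%d)" r#name r#age

(* A plain "record" with exactly two fields. *)
let alice = object
  method name = "Alice"
  method age = 30
end

(* An "extended record": the same fields plus an extra one. *)
let bob = object
  method name = "Bob"
  method age = 25
  method email = "bob@example.com"
end

let () =
  print_endline (describe alice);
  print_endline (describe bob)
```

No coercion is needed at the call sites: row polymorphism alone makes `describe` accept both objects, which is what makes objects attractive as extensible records despite the runtime cost of method dispatch.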
Note that the extensible records problem is also related to the problem of providing an interface to databases without relying too heavily on code generation.
Are you referring to an approach where there's an external map into sets of ids or even records? That is, an approach based on maps and records in the style of a relational database? Or are you suggesting using an actual relational database, like sqlite or pgsql?
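For illustration, here is a minimal sketch of one possible reading of the first option: a base "table" of records keyed by id, with extra fields stored in a separate map keyed by the same ids, and extension achieved by joining the two. All names (`IntMap`, `person`, `contact`, `people`, `contacts`) are hypothetical and this is not meant to describe the approach from the earlier post.

```ocaml
module IntMap = Map.Make (Int)

(* Hypothetical base "table": one record per id. *)
type person = { name : string; age : int }

(* Hypothetical extension "table": extra fields live in a separate map
   keyed by the same ids, like a second table sharing a primary key. *)
type contact = { email : string }

let people =
  IntMap.empty
  |> IntMap.add 1 { name = "Alice"; age = 30 }
  |> IntMap.add 2 { name = "Bob"; age = 25 }

let contacts = IntMap.singleton 2 { email = "bob@example.com" }

(* Left join: every person, paired with their contact record if any. *)
let with_contacts =
  IntMap.mapi (fun id p -> (p, IntMap.find_opt id contacts)) people

(* Inner join: only ids present in both tables. *)
let joined =
  IntMap.merge
    (fun _id p c ->
      match (p, c) with
      | Some p, Some c -> Some (p, c)
      | _ -> None)
    people contacts
```

Each extension is an ordinary, statically typed record, so no objects or code generation are involved; the price is that the "extended record" only exists as the result of an explicit join.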
Note: the question is partially inspired by this page.