Returning an object with additional fields

vlaviron · September 30, 2024, 11:22pm

If you translate classes to modules, inheritance is expressed through functors (ancestors are the functor parameters). There is no convenient equivalent to the self parameter, but you can encode it manually if you want (add a self parameter of the right type to each function in the module) or wrap the functions in references to make them overridable.
The compilation of objects and classes actually does something similar (the OCaml runtime itself has no particular support for OO features), so you wouldn’t necessarily be losing on performance.
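
To make the idea above concrete, here is a minimal sketch of the functor-plus-references encoding. All the names (ANIMAL, Base, Make_dog, sound, describe) are hypothetical and only serve the illustration; this is one possible encoding, not what the compiler generates.

```ocaml
(* All names below are hypothetical, chosen only for illustration. *)
module type ANIMAL = sig
  val sound : (unit -> string) ref   (* overridable "method", stored in a reference *)
  val describe : unit -> string      (* calls [sound] through the reference *)
end

(* Generative functor: each application creates fresh references. *)
module Base () : ANIMAL = struct
  let sound = ref (fun () -> "...")
  let describe () = "it says " ^ !sound ()
end

(* "Inheritance": the ancestor is the functor parameter. *)
module Make_dog (Parent : ANIMAL) : ANIMAL = struct
  include Parent
  (* Override the inherited "method" by updating the reference. *)
  let () = sound := (fun () -> "woof")
end

module Base_instance = Base ()
module Dog = Make_dog (Base_instance)

let () = print_endline (Dog.describe ())
```

Note that Dog shares its references with Base_instance, so the override is visible through both modules. That sharing is the price of this simple encoding.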

s.t.s · September 30, 2024, 11:26pm

This is a rather interesting perspective.

So, what is your vision of data representation? Would a record be represented as follows?

module type ITEM = sig
  (* ... within some module ... *)
  val weight : float
  val sprite : Sprite.t
  val inventory_width : int
  val inventory_height : int
  (* etc. *)
end

Now, how would we actually implement self? Or, do we need explicit variants (outside the module datatype), like in the following code?

type t =
  | Generic_item of (module GENERIC_ITEM)
  | Special_item of (module SPECIAL_ITEM)
  (* etc. *)

vlaviron · October 1, 2024, 7:39am

Records should be represented as OCaml records, not modules… But if you want to live completely in the module world, that’s the way to go (although if you want some of these fields to be mutable you will need to wrap them in references).

If you only need self to access a few specific methods, you can have a specific module type for the things you need and use that as the self type:

module type Self_type = sig
  val foo : int (* data *)
  val bar : int -> unit (* method *)
end
module type ITEM = sig
  (* whatever you need here *)
  val foobar : (module Self_type) -> unit
end
module Item1 = struct
  let foo = 0
  let bar x = print_int x
  let foobar (module Self : Self_type) = Self.bar Self.foo
end

let () = Item1.foobar (module Item1 : Self_type)

It’s possible to handle more complex cases this way, but you’re going to stretch the limits of the module type system (I tried a version where the self type contains methods that take a self parameter too, and it kind of works but requires -rectypes and eventually triggered a stack overflow in the compiler when I tried to push it a bit further).
In conclusion, if you really need the full power of object-oriented programming, you should just use objects and classes. But if you don’t actually need the full genericity, objects will still make you pay for the full costs so you may prefer using a more suitable paradigm.

s.t.s · October 1, 2024, 7:45am

No, I don’t want to live completely in the module world. Currently, it seems to me that extensible records (by extensible, I really mean not having to redefine the same fields for largely overlapping records) are only achievable through objects and classes. And it is unclear to me why objects are apparently so disliked within important segments of the OCaml community. The only way around objects that I see is indeed to use records and to generate code. The question is why generate what is essentially a portion of object and class functionality, when that functionality is already present?

My attempt at considering first-class modules as records is actually an attempt to solve the extensibility problem, that with objects and classes I could solve through inheritance at once. So far, I don’t see how such approach could be made workable.
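
For reference, the inheritance-based solution I have in mind looks roughly like the sketch below. The class and method names (item, container, weight, capacity) are made up for this post, not taken from my actual code.

```ocaml
(* Hypothetical classes: [container] reuses the fields of [item] via inheritance. *)
class item (weight : float) = object
  method weight = weight
end

class container (weight : float) (capacity : int) = object
  inherit item weight
  method capacity = capacity
end

(* Accepts any list of objects that have a [weight] method. *)
let total_weight items =
  List.fold_left (fun acc i -> acc +. i#weight) 0.0 items

let () =
  let a = new item 1.5 in
  let b = new container 2.0 10 in
  (* Lists are homogeneous, so [b] is coerced to the smaller type. *)
  Printf.printf "%.1f\n" (total_weight [a; (b :> item)])
```

The shared field is declared once in item, and container picks it up without any repetition.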

So, the only cost of using objects currently is performance, under the currently available compilation scheme? Is that correct?

If so, then I wonder if advanced options, like flambda or flambda-2 might actually inline all method calls that are known statically? If yes, then objects would be fantastically useful and efficient.

While working on this problem, I’m little by little starting to work in a way that matches the visitors paradigm pretty well, which @lukstafi mentioned here. Basically, I define classes that contain attributes and interconnected properties in a relatively dynamic fashion, and that are used as a layer for interacting with larger data structures defining relationships between essential game entities. This approach is turning out to be fantastically useful. Given that, I don’t understand the apparent condemnation that objects and classes receive within the aforementioned segments of the OCaml community. I don’t see how these problems could have been solved without objects or without an unreasonable amount of code generation (all without the semantic benefits of the object system).

octachron · October 1, 2024, 8:12am

Objects are not disliked, they are merely considered unnecessary in many situations or too slow in others (after all, they have essentially the same performance characteristics as the hmap that you dislike so much), but if you have a solution that works for you with objects you should not hesitate to use them.

vlaviron · October 1, 2024, 8:33am

No, method calls cannot be known statically with the current compilation scheme (that’s part of what I meant when I said that you always pay the full costs with objects).

s.t.s · October 1, 2024, 8:40am

Even when the methods are all invoked in the same project? Within the same linking set? That feels a bit limiting. I know some things about the caml abstract machine, and the data representation in OCaml, and I do not see any reason why such optimization would not be possible in principle, at least if the relevant optimizations are done behind the wall of the public interface of a single library.

I’ll try to benchmark the performance against record access at some point later; thankfully JS’s Core has made such things much simpler. I guess that, unless we’re dealing with absolutely performance-critical code (where the cost of generating code for static structures might well be justified), we wouldn’t see anything worse than the equivalent of a classical C access-by-pointer cost.

threepwood · October 1, 2024, 8:41am

It’s simply because defining and accessing them is nicer and the lack of subtyping is a plus when you do not need it since it allows for more robust typing / more informative errors.

You say you need “extensible records”, but you also say you know everything at compile time and the structure is not very complicated, so it sounds like you do not need extensible records? If you just work out a hierarchy of parametric record types that can represent all your kinds of entities with different parameters you do not need accessors, the only boilerplate is type definitions. Functions that only touch part of the content can be polymorphic over the other type parameters.

dbuenzli · October 1, 2024, 8:51am

Personally I highly dislike them because they are one more thing to learn and remember about the language for, as you point out, little in return.

I certainly learned the O at the beginning of the century when I got into the language but quickly completely forgot about them. They eventually returned as terribly unergonomic phantom types in js_of_ocaml, but I got rid of that too.

At the beginning of this decade I thought their row polymorphism would be good for typing my relational database interactions, but that failed too (you get all sorts of loss of polymorphism problems and eventually you realize it’s not what you want, what you want in this context is first class record fields).

So yes I dislike objects because they add complexity and choice in an already complex language.

s.t.s · October 1, 2024, 8:52am

It seems that the idea of using first-class modules as records (that is, with vals representing the individual fields of the record to which the module corresponds) is also a contentious point.

Things already become cumbersome and repetitive enough at this level, that the question of code generation emerges. The question is whether the light-weight self-contained ppx is a good approach here or not.

I don’t see how. If I have one record type with 4 fields and another record type with 5 fields, I’ll need different functions to process things.

I could construct the bundle of processing functions for all record types by generating code. However, besides being cumbersome, I would be generating combinatorially more functions than I would need if I used, e.g., objects… That pollutes the binary’s code space.

s.t.s · October 1, 2024, 8:54am

Thank you for sharing your insight.

How would you approach a problem of defining a series of largely overlapping record types?

dbuenzli · October 1, 2024, 8:58am

I would simply define a record for the shared parts and have a field for them in the records that share them.
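
A minimal sketch of what I mean, with made-up types (common, weapon, potion) standing in for your game entities:

```ocaml
(* Hypothetical types: [common] holds the fields every kind of item shares. *)
type common = { name : string; weight : float }

type weapon = { common : common; damage : int }
type potion = { common : common; heal : int }

(* Written once against the shared part, usable for every aggregate record. *)
let describe (c : common) = Printf.sprintf "%s (%.1f kg)" c.name c.weight

(* The annotations pick the right record type for the reused [common] label. *)
let sword : weapon = { common = { name = "sword"; weight = 3.0 }; damage = 7 }
let tonic : potion = { common = { name = "tonic"; weight = 0.5 }; heal = 20 }

let () =
  print_endline (describe sword.common);
  print_endline (describe tonic.common)
```

The shared fields are defined once, and the cost is one extra field access at the use sites.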

vlaviron · October 1, 2024, 9:01am

That’s the curse of objects: they’re not first-class citizens, they’re emulated using type-unsafe low-level primitives. The optimising compiler has no chance to actually recover the intended object structure.
If we changed the way we handle objects in the compiler, a lot more could be done, but it’s a huge amount of work, and there is very little interest both in the core developers group and in the general community for this.

s.t.s · October 1, 2024, 9:03am

That means that different combinations of common records would be accessed somewhat differently in the code that uses these records.

Why not use one of the variants or fields ppxs from ppx_jane? They break up records into functions over fields. You may just define each of the overlapping records (yes, duplicating code), and then program against the interface (expressed in terms of functions) generated by the ppx. Why introduce the extra fields and indirection?

s.t.s · October 1, 2024, 9:10am

I see. This is a compelling argument.

So, what is the solution to the problem of overlapping records? @dbuenzli mentioned an approach based on nesting records (defining the common subsets of fields as individual records, and referencing those from aggregate records). However, that does not feel like a very regular design to me. The remaining options are either to use first-class modules as records (and, as noted in our discussion, issues already arise from trying to model the requisite object functionality within the confines of the current type system), or perhaps to consider the records and fields ppxs from ppx_jane. I have just started learning the latter facilities.

I’m pretty sure that I’m not the only one encountering this problem, and perhaps someone already solved it, and even developed a decent ppx code generator, that is not over-engineered (since this problem is very easy to utterly over-engineer).

threepwood · October 1, 2024, 9:25am

Just put the non-shared fields in a record type and make it a parameter.

type a_data = { ... }
type b_data = { ... }
type 'd entity = { id : Id.t; data : 'd }
let do_thing_with_id { id; data } = { id = Id.do_thing id; data }
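
Here is a self-contained variant of the same sketch that actually compiles. The Id module, the field names hp and label, and the function bump_id are placeholders I made up; substitute your own.

```ocaml
(* Hypothetical stand-in for an identifier module. *)
module Id = struct
  type t = int
  let next id = id + 1
end

(* Placeholder payloads for two kinds of entity. *)
type a_data = { hp : int }
type b_data = { label : string }

(* Shared fields live in [entity]; the non-shared part is the parameter ['d]. *)
type 'd entity = { id : Id.t; data : 'd }

(* Polymorphic in ['d]: it only touches the shared part. *)
let bump_id (e : 'd entity) : 'd entity = { e with id = Id.next e.id }

let () =
  let a = bump_id { id = 1; data = { hp = 10 } } in
  let b = bump_id { id = 7; data = { label = "door" } } in
  Printf.printf "%d %d %s\n" a.id a.data.hp b.data.label
```

A single bump_id serves every entity kind, so no per-record function needs to be generated.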

s.t.s · October 1, 2024, 9:32am

How to create a union of fields from various xxx_data types in a natural fashion?

 type a_data = { ... }
 type b_data = { ... }
 type c_data = { ... }
 type d_data = { ... }

 type data_01 = { (* should have fields from b_data and c_data *) }
 type data_02 = { (* should have fields from d_data and e_data *) }
 type data_03 = { (* should have fields from b_data and a_data *) }

The only approach I see currently is to manually define the fields by copy-pasting.

As for assigning id and sid, as was asked in the initial post of this thread, I agree with your approach. In fact, even with objects, I’m ending up with a similar method: instead of extending the original attributes object, I create a class that holds a reference to another.

vlaviron · October 1, 2024, 10:01am

type a_data = { ... }
type b_data = { ... }
type c_data = { ... }
type d_data = { ... }

type data_01 = { (* should have fields from b_data and c_data *)
  b_data : b_data;
  c_data : c_data;
  ...
}
type data_02 = { (* should have fields from d_data and e_data *)
  d_data : d_data;
  e_data : e_data;
  ...
}
type data_03 = { (* should have fields from b_data and a_data *)
  b_data : b_data;
  a_data : a_data;
  ...
}

It involves some indirections, particularly if your hierarchy is deep, but that’s the natural way of doing this in OCaml.
If you use modules instead of records, you also have access to the include syntax, which allows you to have all fields at the same level (and handle duplicates decently). That said, module types are more complex than record types, so if, for example, you need to make your types parametric, it’s probably going to be easier with records.
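
As a rough illustration of the include approach, with made-up signatures and values (B_DATA, C_DATA, DATA_01, Item and their fields are all hypothetical):

```ocaml
(* Hypothetical "record types" expressed as module types. *)
module type B_DATA = sig val weight : float end
module type C_DATA = sig val width : int val height : int end

(* All fields end up at the same level thanks to [include]. *)
module type DATA_01 = sig
  include B_DATA
  include C_DATA
  val name : string
end

module B = struct let weight = 2.5 end
module C = struct let width = 2 let height = 3 end

module Item : DATA_01 = struct
  include B
  include C
  let name = "crate"
end

(* A function over the C_DATA "fields" only; any module that has them can be passed. *)
let area (module M : C_DATA) = M.width * M.height

let () = Printf.printf "%s: %d\n" Item.name (area (module Item : C_DATA))
```

Packing Item at type C_DATA forgets the other fields, which is the module-level counterpart of passing a larger record where only a subset is needed.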

s.t.s · October 1, 2024, 10:03am

Are there any good arguments available against using modules as records?

Is there a code generation solution that I may consider here as an alternative?

(One possibility may be to represent the record as a bundle of functions using ppx_jane, and then use module inclusion and copy-pasting for enumerating fields in the type itself.)

Kakadu · October 1, 2024, 10:57am

@vlaviron Are there any written-down notes on how objects could be improved?
I always thought that objects in OCaml are cursed to be slow, because devirtualization requires either whole-program optimization or C++-like CRTP patterns.
