Returning an object with additional fields

I’m trying to implement an approach where several related record types are inserted into a data structure. Since the record types are related, I am using classes and inheritance to form the desired sets of fields.

The data structure works as follows:

  type t (* data structure *)
  type id (* basically, a key *)
  module R : sig
    class record1 : object (* ... *) end
    class record2 : object (* ... *) end
    class record3 : object (* ... *) end
    (* etc. etc. etc. *)
  end
  (* tagged variant of the various object types
     representing the various records *)
  type item =
    | Record1 of R.record1
    | Record2 of R.record2
    | Record3 of R.record3
    (* etc. etc. etc. *)
  val get : t -> id -> item
  (* let's assume a persistent data structure ... *)
  val add : t -> id -> item -> t

The problem that I come across is that of inserting a value. I can initialize the object in whatever way I need and insert it into the structure with a given id. With get, I can retrieve the same set of fields back. So far so good.

However, I want slightly different semantics: I would like the object returned by get to carry its own id. This means that the item being inserted should be somewhat different from the item received back.

  val get : t -> id -> item'
  (* let's assume a persistent data structure ... *)
  val add : t -> id -> item -> t

Disregarding variant tags, the difference between item' and item is that item' would carry additional fields. At the level of object types that might actually be invisible, since object types only mention methods; at the level of class types, the extra fields are of course reflected.
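
To illustrate, here is a small sketch (the class names are hypothetical): adding an instance variable does not change the object type, which lists only methods, while the class type does record the new field.

  class record1 = object
    method present = "record1"
  end

  class record1_with_id (id : int) = object
    inherit record1
    val id = id  (* extra state, deliberately not exposed as a method *)
  end

  (* Both [new record1] and [new record1_with_id 1] have the object type
     < present : string >; only the class type of [record1_with_id]
     mentions [val id : int]. *)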

Now, the question is: how do I form item' from item in a generic fashion? I do not wish to manually code a family of classes with additional fields, defined along the lines of

  class record1' (id : Id.t) = object
    inherit record1
    val id = id
  end
  class record2' (id : Id.t) = object
    inherit record2
    val id = id
  end
  (* ... *)

How may I construct a new object with additional attributes from a given object?

A more general question: how should I approach the “boilerplate” code that emerges in this context? Is code generation the only solution?

Alternative Approaches?

I would not mind considering how to solve this problem without involving objects. I could use first-class modules as records; however, in practice, packing and unpacking seem rather unwieldy, and it appears that explicit variant tagging is required to store such first-class-module records in any collection. I also don’t see how my problem would be resolved: with modules in general, the signatures need to be fully enumerated.

To my eye, this is insufficient information to know what conceptual or practical problem you’re facing, since you could simply have type item' = metadata * item and there’s no boilerplate in defining either.
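
For concreteness, the suggestion amounts to something like the following sketch (reusing the Id module and item type from the opening post; the metadata contents are placeholders):

  type metadata = { id : Id.t }
  type item' = metadata * item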

You’re correct that it is possible to return a tuple. However, the structure is more complicated: the items being returned conceptually form a rather complex graph. There are also several IDs involved (sid, vid, and id, the latter being a unique integer, as well as ord, which represents a global numerical ordering of items). Conceptually, these IDs are inseparable from the record (not to be conflated with OCaml’s language construct of the same name), and thus it is natural for them to be actual fields in the actual representation of the record within the language. Fields in the other records (note that there are several categories of records) and in other data structures forming the context refer to these IDs.

For example, an item could have several research dependencies, and a work order could produce several item types. The nature of the dependencies is specific enough that they belong in the records themselves (I hope this does not raise questions). The question that arises naturally is why the record itself should not then contain all these IDs.

The situation I described in the original post arose from the following observation. Say I have some item type with multiple fields:

class some_item
  ~weight ~inventory_weight ~inventory_height ~inventory_width
  (* etc. *) =
object
  val weight : int = weight
  val i_weight : int = inventory_weight
  val i_width : int = inventory_width
  val i_height : int = inventory_height
  (* etc. etc. etc. *)
end

It may have a more interesting initialization, with even less trivial sets of parameters. Now, consider the question of what the signature of a function that inserts these specifications into a global context should look like.

It may be along the lines of

  type t (* universe *)
  val insert_some_item : t -> weight:int -> inventory_weight:int ->
    inventory_height:int -> inventory_width:int -> Sid.t -> Vid.t -> t

In this case, insert_some_item might as well take care of generating and validating the desired set of IDs, instantiating an object as a record with all the desired IDs included, and adjusting the global context (the “universe” t in our example) data structure to map to this newly instantiated record accordingly.

But this approach is not optimal for a number of reasons. Instead, it makes more sense to approach the instantiation like this:

  (* type item is a class type here *)
  val insert_item : t -> item -> Sid.t -> Vid.t -> t

The behavior of insert_item should be as follows: generate and validate the desired set of IDs, instantiate a record that contains all the same attributes as item plus fields representing the IDs (specifically id, vid, and sid), and adjust the global context data structure to map to this newly instantiated record accordingly. This way, the signature of insert_item stays generic.
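
Since OCaml cannot add fields to an already constructed object, one way to keep insert_item generic is a single polymorphic wrapper class that pairs whatever item is supplied with the generated IDs. A hedged sketch (the class name is hypothetical; it assumes the Id, Sid and Vid modules mentioned above):

  class ['a] with_ids ~id ~sid ~vid (item : 'a) = object
    method id : Id.t = id
    method sid : Sid.t = sid
    method vid : Vid.t = vid
    (* the wrapped record keeps its own interface *)
    method item : 'a = item
  end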

The advantage of such a methodology is also that sometimes the specific IDs cannot be supplied externally; only some prototypes of them can be given, to be resolved later by the routine responsible for insertion. That is, a consistent set of IDs may only exist for a whole collection of items and would not make sense for a single item. The following API should thus be more accurate:

  (* global context representing a consistent state *)
  type t
  (* cursor into the global context, allowing incremental
     insertion of items *)
  type cursor
  val get_cursor : t -> cursor
  (* the IDs are private types, so we may only provide protos
     from which to generate IDs *)
  val insert_item : cursor -> item -> Sid.proto -> Vid.proto -> cursor
  (* validate the set of items being inserted;
     if valid, generate a new state *)
  val seal : cursor -> t option

In either case, the global context should contain a record that is largely similar to the one supplied as item. The difference between the class some_item and the actual class type of the item being inserted in the global context is the presence of additional fields representing the IDs.

Would you argue against representation as objects? Why?

Objects represent a finite set of behaviors with potentially some hidden state. If the hidden state is all you are interested in, objects are not a good fit. Moreover, since your overall description is mostly untyped, it sounds like a heterogeneous map would be a better fit (see Hmap by Erratique, or Octachron/orec on GitHub, open records implemented using a map over a universal type, for examples of those).
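
For a sense of what that looks like, a minimal sketch in the Hmap style (treat the exact key-creation and lookup names as an assumption):

  let weight : int Hmap.key = Hmap.Key.create ()
  let label : string Hmap.key = Hmap.Key.create ()

  let item =
    Hmap.empty
    |> Hmap.add weight 12
    |> Hmap.add label "crate"

  let () =
    match Hmap.find weight item with
    | Some w -> Printf.printf "weight: %d\n" w
    | None -> print_endline "no weight recorded"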

I wanted to object, in an anticipatory fashion, specifically against the Universal_map (in core) or Hmap in my original post. What I seek is a lightweight approach to constructing a network of records composed from a well-defined set of largely overlapping fields. Lightweight also means efficient; thus, in principle, a multitude of record types with similar fields is an option, but that would require some code generation.

I’m trying to find a solution that would not require me to develop code generation routines on my own, and would rely on existing tools within the OCaml ecosystem (anything from JS would be admissible, and to a lesser extent anything on opam).

Or this example from the standard library :–)

Right — and thank you for sharing this. That is a technique used to implement so-called “heterogeneous dictionaries”, which frankly I don’t understand the need for in cases where the possible combinations of fields are known at compile time.

The following uncertainty (note the _ option) is completely unnecessary and undesirable; I already know the set of available fields at compile time.

  val find : 'a key -> t -> 'a option
  (** [find k d] is the binding of [k] in [d], if any. *)

Also, the fact that the hetero-map uses a hash table underneath is a problem. It’s just not the functionality I seek; I appreciate the more direct and efficient access given by (first-class) modules, objects, and records…

Honestly, I’m not exactly sure I understand what you are trying to achieve. But if you are trying to store multiple types of values that share a common interface in a data structure, then you could use an existential type. Something like this:

module type SIZED = sig
  type t
  val width : t -> int
  val height : t -> int
end

module Store : sig
  type item = Item : (module SIZED with type t = 'a) * 'a -> item
  type id = int
  type t
  val empty : t
  val item : (module SIZED with type t = 'a) -> 'a -> item
  val add : t -> id -> item -> t
  val get : t -> id -> item option
end = struct
  type item = Item : (module SIZED with type t = 'a) * 'a -> item
  type id = int
  type t = (id * item) list
  let empty = []
  let item m v = Item (m, v)
  let add store id item = (id, item) :: store
  let get store id = List.assoc_opt id store
end

In contrast to your item type, when you retrieve your values you don’t have to match on a tag. However, you are only allowed to use them through the common interface.
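
A hypothetical usage sketch for the Store module above; the stored value can only be used through the SIZED interface it was packed with:

  let area store id =
    match Store.get store id with
    | None -> None
    | Some (Store.Item (m, v)) ->
        let module S = (val m) in
        Some (S.width v * S.height v)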

Am I storing a tuple (as item) in this approach, i.e. a reference to a module and a value? Why not just the module as a record (with fields being vals)?

What do you mean? The mental image I’m getting here is along the lines of “only within a designated monad”, which I assume is clearly not what you had in mind…

I could in theory try to have a generically stored module return its richer representation; that is not an issue here. The issue is the instantiation of the IDs within the record itself, not outside it.

Yes, it’s a tuple, because you don’t want to represent your values by modules. In some contexts you may be perfectly fine using plain values and the module acting upon them, as you naturally always do.

Let’s leave monads out of the discussion :–) I mean that the existential says: there exists a value on which the SIZED interface can be used. So when you unpack the existential after a get that’s the only thing you will be allowed to use on the value.

In my case, I would need lots of very similar records.

  item_weight : int
  item_cost : int
  item_volume : float
  item_width : int
  item_height : int
  (* etc. etc. etc. *)

Some of the attributes make sense for some items, but not others.
We may have a finite set of item classes (say, ten, for the sake of example) that have their corresponding sets of attributes.

Now, I could define these ten record types in separate modules. I could probably use one of JS’s ppx libraries to allow records to be represented as functions, as in

  type t (* record type *)
  val item_weight : t -> int
  (* etc. *)

And then I could define some operations on records through these functions, and have the module that operates on them be fixed.
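
A hypothetical sketch of what I mean by operating through these accessor functions: the operating module is written once against a small signature and stays fixed (the names are placeholders).

  module type ITEM = sig
    type t
    val item_weight : t -> int
  end

  module Shipping (I : ITEM) = struct
    (* generic operation, usable with any record type exposing the accessors *)
    let total_weight items =
      List.fold_left (fun acc x -> acc + I.item_weight x) 0 items
  end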

I thought that by using modules, I could form the fields by including other modules that contain subsets of attributes. Thus, I could define, say, six subsets of attributes and then define ten modules by including these subsets (a fuller sketch follows below). The representation would have been

  val item_weight : int
  (* etc. etc. *)
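
A sketch of the include-based composition I have in mind (all names are placeholders): attribute subsets live in small signatures, and each of the ten item modules is assembled from them.

  module type DIMENSIONS = sig
    val item_width : int
    val item_height : int
  end

  module type WEIGHT = sig
    val item_weight : int
  end

  module Crate : sig
    include DIMENSIONS
    include WEIGHT
  end = struct
    let item_width = 2
    let item_height = 3
    let item_weight = 10
  end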

This is an approach that, as I understand it, you object to, correct?

If I do not take that approach and instead go through records, then I end up with a lot of repetitive code. How do I solve that problem? Through code generation?

I see. In this case, why store a reference to a module, and not just a variant?

Then again, for a common API (and in storing objects we end up with this constraint), we might actually use a module reference, if it returns a generic type.

I mean, if SIZED has a val encode : t -> External_encoding.t, then we won’t have any type variables escaping, correct?

So, SIZED might as well be anything that operates on the set of fields?

Write some code and see how it goes. You could also factor common fields into a record and then use that record as a field of your other records.
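
A minimal sketch of that factoring, with placeholder names: the shared fields go into one record, which is then reused as a field of the specific records.

  type dimensions = { width : int; height : int }

  type crate = { crate_dims : dimensions; crate_weight : int }
  type shelf = { shelf_dims : dimensions; shelf_capacity : int }

  let crate_area c = c.crate_dims.width * c.crate_dims.height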

The quest to get rid of boilerplate at all costs is often not worth pursuing (as is using ppx to derive accessors for your record fields…). People often forget that their metageneration monstrosities need to be understood and maintained. A few years later, it’s often more convenient to go through a bit of boilerplate and easily understand what is happening :–)

How would you approach a similar problem? So far, I have attempted going through classes, and the only problem I have encountered is the snag I described in the original post.

I think some lightweight, even local, meta-generation is possible these days. But is that really the right approach? To generate concrete modules with concrete fields, and then somehow tie them together through concrete variants?

Honestly I’m still not sure I understand what you are trying to do. But perhaps: use a relational database :–)

A database would be waaa-aay-ay-y too heavy-weight for this problem.

An obvious solution exists along the lines of creating a class corresponding to item' that holds a reference to an object of a class corresponding to item. Is this a good approach from the standpoint of the classic OO school of thought? And also from the standpoint of the OCaml school of thought? Once again, the question is more about correctness than anything else.

  (* assuming a class type [item] with a [present] method, and the Id, Sid,
     and Vid modules from before *)
  class item' ~id ~sid ~vid (item : item) =
    object
      val id : Id.t = id
      val sid : Sid.t = sid
      val vid : Vid.t = vid
      val item = item
      method present = item#present
    end

That instance could be created in the right context, within the innards of the larger graph that operates with the IDs.

Since the interface to item is relatively uniform, it’s fairly trivial to implement it in item' by delegating to the wrapped item. Direct access to the fields is sacrificed, but I suppose all relevant operations on the fields could be encapsulated in item, which may be an even more correct approach from the standpoint of abstraction. What do you think about that?

In principle, if objects could be optimized properly in OCaml (even if only by inlining), we might as well have objects as the default approach to implementing extensible records in OCaml.

Like previous people I don’t have a full solution as it is not clear what your problem is exactly, but I think I would use nested parametric records (to spell out something @dbuenzli suggests):

type 'a metadata = {
  id : Id.t;
  extra : 'a;
}

type ('a, ...) record = {
  metadata : 'a metadata;
  ...
}

You can even do this with records of functions to enforce interfaces (but of course more sophisticated solutions involving existential types can be combined with the above).
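
A hedged sketch of the “records of functions” idea, with placeholder names: the interface is itself a record value that travels alongside the data.

  type 'a sized = {
    width : 'a -> int;
    height : 'a -> int;
  }

  let area ops v = ops.width v * ops.height v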

The previous post suggests a similar approach, but with objects. Consider the question: why would one prefer records (for data) rather than objects and classes?

If you’re used to OO paradigms, you can use objects and classes, but if you don’t need the specific features of objects (mostly subtyping), records are better in every way. Even if you need subtyping, depending on the context, first-class modules might actually be better.

It seems that on this forum the question of using first-class modules as extensible records is a bit contentious. What is your argument for taking this approach?

With classes, it’s pretty easy to cherry-pick fields from superclass definitions… With modules it’s somewhat more challenging, since any operation would require an explicit module signature, and I struggle to define a val self that would return the full version of the subclass module, so I’m limited to explicitly constructing variants in order to store the different modules.