Objects use cases in OCaml

oop
objects

#2

I have seen on regular occasions fairly advanced OCaml programmers write something like this:

module type S = sig
  type t
  val do_thing : t -> t -> t
  val to_string : t -> string
  val other_function : ....
end

type 'a state = ('a * (module S with type t = 'a))

If you are tempted to use that pattern on a large scale at some point in your OCaml programming life, just use objects. :)
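To make the comparison concrete, here is a minimal sketch of both encodings side by side. Everything beyond the signature above is an assumption made for illustration: `Int_ops`, `show_state` and `counter` are hypothetical names, the elided `other_function` is left out, and the object version simplifies `do_thing` to take a plain `int` so that it sidesteps binary methods.

```ocaml
(* Cut-down version of the signature above (other_function omitted). *)
module type S = sig
  type t
  val do_thing : t -> t -> t
  val to_string : t -> string
end

(* The pattern in question: a value paired with the module that operates on it. *)
type 'a state = 'a * (module S with type t = 'a)

(* Hypothetical implementation used only for this illustration. *)
module Int_ops = struct
  type t = int
  let do_thing = ( + )
  let to_string = string_of_int
end

let int_state : int state = (1, (module Int_ops : S with type t = int))

(* Every use site has to unpack the module before calling into it. *)
let show_state (type a) ((v, m) : a state) =
  let module M = (val m : S with type t = a) in
  M.to_string (M.do_thing v v)

(* Object encoding: the state lives inside the object, nothing to unpack.
   Simplification: do_thing takes an int, not another object. *)
let rec counter n = object
  method do_thing k = counter (n + k)
  method to_string = string_of_int n
end

let from_module = show_state int_state
let from_object = ((counter 1)#do_thing 1)#to_string
```

Both `from_module` and `from_object` evaluate to `"2"`; the difference is only in how much plumbing the call sites need.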


#3

I’d give the opposite advice. I basically never use objects, and I think first class modules rightly absorb most of the use-cases that would otherwise lead one towards objects, and that seems preferable to me.

All in, I’d prefer if OCaml didn’t have an object subsystem, though really swallowing all of the uses of objects would require some more sophistication at the module-system level.

y


#4

That reminds me of that quote by Joe Armstrong on IBM and OO programming:

Isn’t that the same thing that happened with the “O” in OCaml?


#5

Two fairly common use cases that I encountered:

  • bindings to object-oriented libraries, e.g. Gtk and various JavaScript libraries
  • visitors/traversers, here even Jane Street’s ppx_traverse uses objects

#6

Amusingly, ppx_traverse is close to the only spontaneous use of OCaml objects in our libraries. @Jeremie Dimino could probably explain more about why objects are the right choice here, because it’s not obvious to me.

y


#7

Open recursion? If you want to extend and modify the behavior of an existing piece of code with a minimum of fuss, objects are a good choice.
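A minimal sketch of what that looks like with classes. The `expr` type and the class names `map` and `double_nums` are hypothetical and only serve the illustration; this is not taken from ppx_traverse.

```ocaml
(* Hypothetical expression type used only for this illustration. *)
type expr =
  | Num of int
  | Add of expr * expr
  | Neg of expr

(* A generic mapper: every recursive call goes through [self],
   so subclasses can intercept any node. *)
class map = object (self)
  method expr e =
    match e with
    | Num _ -> e
    | Add (a, b) -> Add (self#expr a, self#expr b)
    | Neg a -> Neg (self#expr a)
end

(* Override a single case; the inherited traversal picks it up
   automatically because [self#expr] is late-bound. *)
class double_nums = object
  inherit map as super
  method! expr e =
    match e with
    | Num n -> Num (2 * n)
    | _ -> super#expr e
end

let doubled = (new double_nums)#expr (Add (Num 1, Neg (Num 2)))
(* doubled = Add (Num 2, Neg (Num 4)) *)
```

The subclass only states what changes; the traversal code in `map` is reused untouched.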


#8

@Yaron_Minsky When we initially wrote this code, we started with records. However the code looked like manual inheritance and it was very easy to call the wrong callback. With objects the code was much clearer.


#9

Additionally the syntax for record of functions is a bit ugly compared to the syntax of objects.


#10

Another argument: if a visitor handling a superset of node types has to be passed to a function that only needs to apply it to a subset of those types, objects need no coercion. First-class modules need re-packing in those cases, and records need re-allocation. But of course with objects there’s then the cost of method lookup and no method inlining…

module type S = sig val x : int end;;
module type S' = sig val x : int val y : int end;;
let f (module M : S) = ();;
module M : S' = struct let x = 5 let y = 6 end;;
let m = (module M : S');;
f m;;
(*Error: This expression has type (module S')
       but an expression was expected of type (module S)*)
f (m :> (module S));;
(* Error: Type (module S') is not a subtype of (module S) *)
f (let (module M) = m in (module M));; (* OK *)
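For contrast, a sketch of the object counterpart of the session above. The names `f`, `g` and `o` are hypothetical; the point is only that an open object type accepts the larger object as is.

```ocaml
(* [f] only uses method [x]; its inferred type is
   < x : int; .. > -> int, where ".." stands for "any other methods". *)
let f o = o#x + 1

(* An object with a superset of the methods [f] needs. *)
let o = object
  method x = 5
  method y = 6
end

(* Accepted as is: no coercion, no re-packing, no re-allocation. *)
let r = f o

(* If the parameter type is closed, an explicit coercion is required,
   but unlike for first-class modules it is accepted by the type checker. *)
let g (o : < x : int >) = o#x
let r' = g (o :> < x : int >)
```

Here `r` is `6` and `r'` is `5`.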

#11

BTW, regarding performance, we found that a lot of the cost of ppx rewriters came from the fact that they were applied as separate whole passes. Merging them all into one pass cut the time spent in preprocessing quite a bit. We have a blog post about it: https://blog.janestreet.com/ppx_core-context-free-rewriters-for-better-semantic-and-faster-compilation/


#12

François Pottier’s visitors package makes extensive use of objects. https://gitlab.inria.fr/fpottier/visitors


#13

When OCaml first introduced its object system, it was, at least in my memory, at around the peak of the popularity of OO as a language design feature (mid to late 90s). OO was considered the holy grail that would solve all issues of reusability. That’s what we basically got shoved down our throats in freshman classes.

But when I actually tried to implement more complex OO applications, something never felt right to me. I started reading books on language design and downloaded OCaml as well as other “academic” languages. The fact that it also supported OO gave me confidence that I would be able to use my OO skills while learning about functional programming. Needless to say, it took only a few months before I came to the conclusion that using algebraic datatypes, first class functions, and modules was a much better way to design applications.

I tried once in a while to use objects when it seemed convenient. In all cases I eventually regretted this choice. As far as OO type systems go, OCaml is much better designed than just about any alternative. Nevertheless, I still wouldn’t waste a tear if the OO part were removed. That said, it helped get me interested in OCaml back in the day.


#14

As a side note, I only use the to_string name when there is only one, obvious way to convert t to a string. When there are several functions that might be called “to_string”, I prefer to call none of them “to_string” and force myself to find more descriptive names for each.

The OO philosophy of “reusing as much as possible” the to_string method name actually makes users spend more time browsing the manual to see what “to_string” really does, IMHO.


#15

I’ve had some success using class types in the constraint expressions for shadow type parameters in abstract types representing objects of foreign language types where class inheritance is a thing.


#16

Is the following recommendation from ocaml.org still valid and complete?
https://ocaml.org/learn/tutorials/guidelines.html#How-to-choose-between-classes-and-modules
(if it is not, this would be a great chance to update it, because ocaml.org is a main entry point for newcomers, where very good material sits alongside ambiguous or outdated information).

How to choose between classes and modules
You should use OCaml classes when you need inheritance, that is, incremental refinement of data and their functionality.

You should use conventional data structures (in particular, variant types) when you need pattern-matching.

You should use modules when the data structures are fixed and their functionality is equally fixed or it’s enough to add new functions in the programs which use them.

@mmottl (and others who feel concerned): Can you describe your experience of turning from OOP to FP? Can you tell us explicitly what one really has to change in one’s mindset and toolset?
Today, if you look at an “(unusually) well designed” UML model (especially one constrained with clear OCL), can you see any limitations of OCaml for implementing it? Or does it always appear clear that OCaml modules will do the job?

We could distinguish two different contexts:
Case 1: the model is quite definitive and only satellite classes will be added, or the global implementation will continuously be improved (new libs…).
Case 2 (more usual): it is already known that the core model will evolve, progressively but rather drastically (think about the case of a merging between two companies).


#17

It’s worth mentioning that objects are useful for modeling existing object-oriented APIs. I think BuckleScript and js_of_ocaml both use them to model the DOM API. Lablgtk also exposes the Gtk API directly in OCaml, and I think it was part of the incentive for researching and adding objects to OCaml.

I also really like the structural polymorphism that it adds: being able to write a function that doesn’t care about the type of a particular piece of data, so long as it exposes a certain set of methods. The cool part is that nowhere do you have to specify the set of methods: OCaml can infer it. I worked on a small project that plays with this a bit, called Orb.
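A small sketch of that inference at work. The function `describe` and the two objects are hypothetical and unrelated to Orb.

```ocaml
(* No annotation anywhere: OCaml infers
   val describe : < age : int; name : string; .. > -> string *)
let describe x = x#name ^ " (" ^ string_of_int x#age ^ ")"

(* Two unrelated objects, no shared class or declared interface. *)
let person = object
  method name = "Ada"
  method age = 36
  method email = "ada@example.org"
end

let pet = object
  method name = "Rex"
  method age = 3
end

(* Both are accepted because each exposes [name] and [age]. *)
let descriptions = [ describe person; describe pet ]
```

The list evaluates to `["Ada (36)"; "Rex (3)"]`; the extra `email` method on `person` is simply ignored by `describe`.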


#18

My main initial reason for switching away from OOP was the observation that algebraic datatypes + pattern matching proved to be greatly superior in clarity and conciseness. Later I realized that the module system was better at expressing relations between different types and allowed for stronger abstraction guarantees. Overall, FP code was simply much easier to reason about.

As for changing your mindset, convince yourself by implementing some frequently used data structures (i.e. the nitty-gritty details) with either objects or algebraic datatypes. Then practice big-picture programming by e.g. refactoring code either using the object system or the module system (functors). With some practical experience most people realize soon that OOP rarely if ever delivers better results.

I never look at UML models. Back in my days as a student, CASE (Computer Aided Software Engineering) was all the rage so all kinds of graphical formalisms (including UML) were pushed on us, invented to delude business managers that all it takes to develop great software is to make some beautiful drawings that the computer would then magically implement. Finally they could get rid of all those overcompensated computer nerds! When this didn’t quite work, they forced cheap imported programmers to robotically implement them, with predictable outcomes for delivering quality products in a timely manner. Eventually, CASE fell out of vogue.

AFAIK, today UML is mostly used sloppily and pretty much exclusively within the OO community to facilitate the communication of design ideas. Nothing wrong with that. But more “rigorous” UML doesn’t seem to add much value. I’m not aware of graphical formalisms having any traction in the FP community. Academic research associated with FP is heavily invested in formal methods (type systems, automated theorem proving) to automatically implement (or at least help implement) provably correct software. The advanced static type systems modern FP languages like OCaml offer are essentially an intermediate step towards that goal.


#19

I note that even in languages like C++, inheritance is not used nearly as much as it was when the language was conceived. Instead, polymorphism is usually achieved with the template system.


#20

Classes provide open recursion with late binding and a row-polymorphic self type. Taken separately, each of these facilities is also provided by other language mechanisms that are more straightforward and easier to use and understand. So, following Occam’s razor, it is better to use the lightest tool that does the job. Let me expand on this definition before going any further.

  1. Open recursion is a technique in which a recursive function calls itself not directly, but via an explicit parameter, e.g.,

let fact self n = if n = 0 then 1 else n * self (n - 1)

The fixpoint combinator can be used to tie the knot, e.g.,

let rec fix f n = f (fix f) n
let fact n = fix fact n

So, as you can see, there is no need to use classes for open recursion, as you can use explicit functions or records for that (as is done in OCaml’s AST rewriters).
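A minimal sketch of the record variant, loosely in the style of an AST mapper. The `expr` type, the `mapper` record and `strip_neg` are hypothetical names made up for the illustration, not the actual compiler API.

```ocaml
(* Hypothetical expression type used only for this illustration. *)
type expr = Num of int | Add of expr * expr | Neg of expr

(* A record of functions; each one receives the "current" mapper
   explicitly, which plays the role of [self]. *)
type mapper = { expr : mapper -> expr -> expr }

let default_mapper = {
  expr = (fun self e ->
    match e with
    | Num _ -> e
    | Add (a, b) -> Add (self.expr self a, self.expr self b)
    | Neg a -> Neg (self.expr self a));
}

(* "Inheritance" by hand: handle one case, delegate the rest to the
   default while still passing [self], so nested nodes are also rewritten. *)
let strip_neg = {
  expr = (fun self e ->
    match e with
    | Neg a -> self.expr self a
    | _ -> default_mapper.expr self e);
}

let stripped = strip_neg.expr strip_neg (Add (Neg (Num 1), Num 2))
(* stripped = Add (Num 1, Num 2) *)
```

Note the manual bookkeeping: passing `default_mapper` instead of `self` in the delegation would still type-check, but nested `Neg` nodes would silently be left untouched.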

  2. Late binding is a technique in which a function is not called directly, but via a slot that can be overridden at runtime by any other function with a matching type. Technically, this means that the self parameter is mutable. That allows an algorithm to call a method that is dispatched based on some runtime information. Besides the obvious solution with mutable records, it is possible to use recursive modules, e.g.,

type student = {age : int}

module type Student = sig
  type t
  val create : int -> t
  val age : t -> int
  val say : t -> string
end with type t := student

module Base(S : Student) : Student  = struct
  let create age = {age}
  let age s = s.age
  let say t = "I'm " ^ string_of_int (S.age t) ^ " years old\n"
end

module Older(S : Student) : Student = struct
  include S
  let age s = S.age s + 1
end

module Younger(S : Student) : Student = struct
  include S
  let age s = S.age s / 2
end

module rec S1 : Student = Base(Younger(S1))
let s1 = S1.(say@@create 20);; (* "I'm 10 years old\n" *)
module rec S2 : Student = Base(Older(S2))
let s2 = S2.(say@@create 20);; (* "I'm 21 years old\n" *)

As you can see, we can override methods and even explicitly specify the order of inheritance. Under the hood, the compiler allocates a function table for each operation in the interface and assigns each operation an implementation in the order of the functor applications. This is basically what happens with classes, so we nearly get OO with functors; however, we still have some limitations with respect to what classes can give us.
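For completeness, a sketch of the mutable-record solution to late binding mentioned at the start of this item. The record type `ops` and the value `table` are hypothetical names; the student type mirrors the one in the module example.

```ocaml
type student = { age : int }

(* The method table: each slot is mutable and can be swapped at runtime. *)
type ops = {
  mutable get_age : student -> int;
  mutable say : student -> string;
}

(* [say] reads the [get_age] slot at call time, not at definition time. *)
let rec table = {
  get_age = (fun s -> s.age);
  say = (fun s -> "I'm " ^ string_of_int (table.get_age s) ^ " years old");
}

let before = table.say { age = 20 }

(* Override one slot; [say] picks up the new behaviour without being touched. *)
let () = table.get_age <- (fun s -> s.age / 2)

let after = table.say { age = 20 }
```

`before` is `"I'm 20 years old"` and `after` is `"I'm 10 years old"`, although `say` itself was never redefined.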

  3. Polymorphic self type. As we can see from the previous example, we had to constrain the student type to some concrete implementation. But what if we need to keep it polymorphic and enable refinement (i.e., adding new operations or deleting constructors) of the self type in the derived classes? Thanks to the addition of private row types, we can actually do this with recursive modules (depending on what we want, we can use polymorphic variants or object types to denote row types). The object type example is provided in the OCaml manual; an example with a polymorphic variant is provided below:

module type Ops = sig
  type expr
  val eval : expr -> expr
  val show : expr -> string
end
type 'a expr0 = [`Num of int | `Plus of 'a * 'a]

module F(X : Ops with type expr = private [> 'a expr0] as 'a) = struct
  let () = print_string "eval F"
  type expr = X.expr expr0
  let eval = function
    | `Num _ as x -> x
    | `Plus (x,y) -> match X.eval x, X.eval y with
      | `Num m, `Num n -> `Num (m+n)
      | z -> `Plus z

  let show = function
    | `Num x -> string_of_int x
    | `Plus (x,y) -> "(" ^ X.show x ^ " + " ^ X.show y ^ ")"
end

module rec L : (Ops with type expr = L.expr expr0) = F(L)

So, using functors together with private row types, we can implement arbitrary class architectures. However, at some point, using classes directly becomes a cleaner and even more robust solution. For example, the class infrastructure in OCaml comes with some prebuilt mechanisms, like virtual methods whose implementation is checked at compile time (recursive functors will fail at runtime if any method is not implemented).
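A small sketch of the virtual-method check. The classes `shape` and `square` are hypothetical and chosen only to show where the compiler steps in.

```ocaml
(* A virtual class: [area] is declared but not implemented,
   while [describe] already relies on it through [self]. *)
class virtual shape = object (self)
  method virtual area : float
  method describe = Printf.sprintf "area = %.1f" self#area
end

(* A concrete subclass has to provide [area]. *)
class square side = object
  inherit shape
  method area = side *. side
end

let s = (new square 2.0)#describe

(* Both of the following are rejected by the type checker, not at runtime:

   class broken = object inherit shape end   (non-virtual class with a virtual method)
   let bad = new shape                       (a virtual class cannot be instantiated)
*)
```

Here `s` is `"area = 4.0"`, and a forgotten `area` is reported when the class is compiled rather than when `describe` is first called.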

So once your domain actually requires you to model complex and open recursive types, such that you need modularity to split the definitions of recursive algorithms for those types into different modules, and you have to keep the type polymorphic and open to extension (i.e., the hierarchy is not closed), then you may choose classes for that.

In the real world such hierarchies are very rare, and such design constraints are even rarer. A typical example is a complex and extensible recursive language, i.e., when you have to write an analysis for a language that is extensible (the set of branches in the AST is not closed). Concrete examples are camlp4 and camlp5, which are frameworks for writing extensible pretty printers and parsers.

However, when not only the language is extensible but the set of analyses (i.e., the set of operations applied to the language) is also meant to be extensible, you hit the so-called Expression Problem. In that case neither algebraic data types nor class hierarchies will work, and you have to switch to Object Algebras, aka the Tagless Final approach.
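A minimal module-based sketch of the tagless final idea, under the assumption of a toy language with numbers and addition. All names (`Expr`, `Eval`, `Show`, `Term`, `Expr_neg`, ...) are hypothetical.

```ocaml
(* The language is a signature of operations, abstract in the result type. *)
module type Expr = sig
  type repr
  val num : int -> repr
  val plus : repr -> repr -> repr
end

(* Each analysis is a module implementing the signature;
   adding a new one touches no existing code. *)
module Eval = struct
  type repr = int
  let num n = n
  let plus = ( + )
end

module Show = struct
  type repr = string
  let num = string_of_int
  let plus a b = "(" ^ a ^ " + " ^ b ^ ")"
end

(* A term is written once, against any interpretation. *)
module Term (E : Expr) = struct
  let value = E.(plus (num 1) (plus (num 2) (num 3)))
end

(* Extending the language: a new signature that includes the old one. *)
module type Expr_neg = sig
  include Expr
  val neg : repr -> repr
end

module Eval_neg = struct
  include Eval
  let neg n = -n
end

module Term_neg (E : Expr_neg) = struct
  include Term (E)            (* old terms are reused unchanged *)
  let negated = E.neg value
end

module T1 = Term (Eval)
module T2 = Term (Show)
module T3 = Term_neg (Eval_neg)
```

`T1.value` is `6`, `T2.value` is `"(1 + (2 + 3))"` and `T3.negated` is `-6`: a new operation (`Show`) and a new language construct (`neg`) were both added without editing earlier definitions.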


#21

@Drup suggested that the following is possibly inferior or more complex than using objects:

module type S = sig
  type t
  val do_thing : t -> t -> t
  val to_string : t -> string
  val other_function : ....
end

type 'a state = ('a * (module S with type t = 'a))

I see his point but as an alternative, I was thinking of doing something like this:


(* A basic, typical module definition. *)
module type S = sig
  type t
  val get : t -> unit -> bool
end

(* An implementation of [S] which simply returns its associated value. *)
module Id : S with type t = bool = struct
  type t = bool
  let get t () = t
end

(* In spirit of [S], but with [type t] stripped out of the signature. *)
module type S' = sig val get : unit -> bool end

(* Functor to convert an [S] to [S'], given a factory function for producing values of [type t]. *)
module Make(T : sig include S val create : unit -> t end) : S' =
struct
  let t = T.create ()
  let get = T.get t
end

(** Example instantiations of [S'] from [S]. *)
module Always_true = Make(struct include Id let create () = true end)
module Always_false = Make(struct include Id let create () = false end)

(* Example of a function aggregating over all the module instances. *)
let for_all l = List.for_all (fun (module T:S') -> T.get ()) l

(** Should return [false]. *)
let result = for_all [(module Always_true);(module Always_false)]
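For comparison, a sketch of how the same example might look with objects. The helper `constant` is a hypothetical name; this is one possible encoding, not necessarily what @Drup had in mind.

```ocaml
(* The state is captured by the object itself, so there is no [type t]
   to strip out and no functor to apply. *)
let constant b = object
  method get () = b
end

(* Works on any list of objects exposing [get : unit -> bool]. *)
let for_all l = List.for_all (fun o -> o#get ()) l

let result = for_all [ constant true; constant false ]
```

As in the module version, `result` is `false`.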