The shape design problem

Real World OCaml has one example implementation for a similar problem, but it missis a critical point: that area is pure but draw effectful, and maybe should be dealt with differently because of that. It also doesn’t highlight trade-offs between different designs. Maybe we can add different sample implementations to ocaml.org/learn?

Pretty theoretical point of view. :smiley: I was trying to improve the ergonomics of the module solution.

Do they still achieve polymorphism this way?

Edit: Looks like it: First-Class Modules - Real World OCaml

I think the two approaches (both employing first class modules conforming to an interface) are identical in effect. One keeps instance data in a wrapper module wrapping both the data and the functions to operate on it, and the other keeps them in a GADT with the existential type (which is the type of the instance data) as a bridge between the typing of the instance data and that of the first class module.

That’s as I understand it, but as I said these existential types have come as a bit of a surprise to me. However the answer to your question is I think “yes”.

You can read these forums as a mailing list (by enabling mailing-list mode). I find it more convenient that going to a web site.

The ergonomic problem comes from the existential wrapper, but it is only useful when you want to put different kind of shapes in the same container. Otherwise you could stick to jane street design use of first class modules.

What do you think of this ergonomic?

module type SHAPE = sig
  module type S = sig
    type t
    val area : t -> float
    val draw : t -> unit
  end

  (* here for ergonomic reason, it's important to note that this constructor
      has only one argument that is a pair *)
  type shape = Shape : ('a * (module S with type t = 'a)) -> shape

  (* two useful combinators to deal with `shape` type *)
  val to_shape : (module S with type t = 'a) -> 'a -> shape
  val with_shape : 
    f:((module S with type t = 'a) -> 'a -> 'b) ->
    'a * (module S with type t = 'a)  -> 'b
   
  (* here we use the jane street design of first-class modules *)
  val foo : (module S with type t = 'a) -> 'a -> int
  val bar : (module S with type t = 'a) -> 'a -> unit
end

(* now a signature for a type that implement the Shape interface *)
module type SHAPE_IMP = sig
  type t
  val area : t -> float
  val draw : t -> unit
  val make : unit -> t
end

(* an example of code usage with this design *)
module Code (Shape : SHAPE) (Point : SHAPE_IMP) (Rect : SHAPE_IMP) = struct
  open Shape

  let p1 : Point.t = Point.make ()
  let r1 : Rect.t = Rect.make ()

  let i : int = foo (module Point) p1
  let j : int = foo (module Rect) r1

  let () = bar (module Point) p1; bar (module Rect) r1

  let () = 
   (* mix of shapes of different kinds in a list *)
    let l : shape list = [
      to_shape (module Point) p1;
      to_shape (module Rect) r1;
    ] in
    (* that's where it's useful that the Shape constructor has only on argument *)
    List.iter (fun (Shape shape) -> with_shape ~f:bar shape) l
end

Edit: add some type annotations in the code usage example to make things clearer.

FWIW I once posted a few instructions on how to do that here.

I have no experience with the expression problem. When is code reuse through polymorphic variants really more appropriate ?

Oooh, yeah! I guess it would make it possible to add new data without touching the original definition? Jeez, so many different versions possible!

I guess polymorphic variants would make more sense in a library situation, where the original code cannot be changed?

Tagless Final is another thing, code is parametrized over interpretation of terms, and it’s not really possible to actually use first-class module because we need parametrized type (@ivg actually uses first-class modules in BAP, but he had to make some adaptations).

Since we’re talking about shapes, I’ll say the oldest book that illustrate this design is Euclid’s Elements of Geometry (especially book V and its theory of proportion). :smiley:

To be more serious (even if I’m really serious about Euclid :wink:), the principle is similar to the use of type classes in Haskell, traits in Rust or interface in Go. So, I think resources about this subject in these languages could be useful. And, it’s also pretty similar to the use of object in OOP design.

module type Shape = sig
  type t
  val area : t -> float
  val draw : t -> unit
end

type 'a shape_method = (module Shape with type t = 'a)

type 'a shape_object = 'a * 'a shape_method

One difference with an object is that the type 'a, in a 'a shape_object, is the type of its instance variable, and in an object system we cannot talk about that type. But here, we can split the type 'a and the method that operate on it: it’s the whole purpose of the module system to do so.

Edit: people will may be less scared, if I add some type aliases to the signature of the shape module

module type SHAPE = sig
  module type S = sig
    type t
    val area : t -> float
    val draw : t -> unit
  end

  type 'a shape_methods = (module S with type t= 'a)
  type 'a shape_object = 'a * 'a shape_methods
  type shape = Shape : 'a shape_object -> shape

  val to_shape : 'a shape_methods -> 'a -> shape
  val with_shape : f:('a shape_methods -> 'a -> 'b) -> 'a shape_object -> 'b
   
  val foo : 'a shape_methods -> 'a -> int
  val bar : 'a shape_methods -> 'a -> unit
end

I hope that the correspondence with OOP design will be clearer this way.

1 Like

For a practical use-case here, I would try to resist the temptation to reach for the big
guns — GADTS, first-class-modules, polymorphic variants etc.

Simple records come a long way. First, defining a Shape module (the type t
can be made abstract):

module Shape = struct
  type t = { area : float ; draw : unit -> unit }
  let make ~area ~draw = { area; draw }
  let draw_all = List.iter (fun s -> s.draw ())
end

Second, define the Shape instances as separate modules where each module exposes
a to_shape function:

module Point = struct
  type t = { x : float; y : float}
  let make ~x ~y = { x; y }
  let to_shape { x; y } =
    Shape.make ~area:0. ~draw:(fun () -> Printf.printf "Point (%f,%f)" x y)
end

module Rectangle = struct
  type t = { bottom_left : Point.t; bottom_right : Point.t }
  let make ~bottom_left ~bottom_right = { bottom_left; bottom_right }
  let to_shape { bottom_left; bottom_right } = failwith "TODO"
end

To create some shapes and bundle them together:

let my_point = Point.make ~x:10. ~y:20.

let my_rectangle =
  Rectangle.make
    ~bottom_left:(Point.make ~x:1. ~y:1.)
    ~bottom_right:(Point.make ~x:4. ~y:5.)

let my_shapes = [ Point.to_shape my_point; Rectangle.to_shape my_rectangle ]

let () = Shape.draw_all my_shapes

Sure, you need to lift each instance manually using the to_shape functions, but the solutions
where you pack a first-class module along with a value also require lifting.

3 Likes

Thanks ! Today I learned (a lot)

Also a nice solution! But I wouldn’t say polymorphic variants are “big gun”, they’re pretty basic, no? And I think their trade-offs are worth considering (will do one later, must just find time/brain space). But yes, it’s always important to keep in mind what is idiomatic and what is advanced type-hackery. ^^

Instead of to_shape, you can also do to_drawable or drawable_of_shape. Same with from/to SQL representation.

1 Like

I’ll be glad if someone can explain me what is intelligible in my solution. To me, it seems the most natural and clearest way to solve his problem.

Can someone explain me how programmers used to define their concepts? To me, that’s how I proceed.

In our case we don’t have a birch, a lime and a oak, but a point and a rectangle and abstracting the way they differ we want to define the notion of a shape.

Using a sum type seems clearly strange and clumsy to me : do we really define a tree in enumerating all the different species : birch, lime, oak… And each time we find a new tree, we add it to this enumeration? No, we compare them and try to find what they have in common.

In our case, the most plausible situation, as describe in the original message, is that we first have three compilations unit Point, Rectangle and Circle. That’s how the message begins:

and what they have in common in their behavior:

The question in then as follow:

At this point, we don’t still have the notion of shape (it should be find), and the most natural way to write this in OCaml is doing so:

(* I use toplevel module, but in practice in will be compilation unit *)
module Point : sig
  type t
  val make : int -> int -> t
  val area : t -> float
  val draw : t -> unit
end = struct
  type t = {x : int; y : int}

  let make x y = {x; y}
  let area {x; y} = 0.0
  let draw {x; y} = ()
end

module Rectangle : sig
  type t
  val make : Point.t -> Point.t -> t
  val area : t -> float
  val draw : t -> unit
end = struct
  type t = {bottom_left : Point.t; top_right : Point.t}

  let make p1 p2 = {bottom_left = p1; top_right = p2}
  let area _ = 3.0
  let draw _ = ()
end

module Circle : sig
  type t
  val make : Point.t -> float -> t
  val area : t -> float
  val draw : t -> unit
end = struct
  type t = { center : Point.t; radius : float}

  let make center radius = {center; radius}
  let area {center; radius} = Float.pi *. 2.0 *. radius
  let draw _ = ()
end

So, here is our starting point. And now, I ask my self what is the common part between these three modules: they all satisfy these following interface

module type SHAPE = struct
  type t
  val area : t -> float
  val draw : t -> unit
end

That’s why, at this stage, I use first-class module to define a shape as something that satisfy this interface, in the same way that Jane Street does in their libraries.

But then, there is this constraint:

So I need to define a type that contains all the possible shapes, abstracting away the underlying type of this particular one. That’s exactly what existential types are aimed for.

When I write:

type shape = Shape : ('a * (module Shape with type t = 'a) )-> shape

that really means a value is a shape if it can be drawn and has an area. In other words, if there exists functions area and draw on it.

When you have two incompatible types, say int and float, the way to put them in a unique one is to use a sum type:

type int_or_float = Int of int | Float of float

and you use the Int and Float constructor to construct such a value.

But when you have an infinity of possible incompatible types, in OCaml the most natural way to define this (infinite) sum is to use the existential GADT I defined. And then you will use this constructor to construct a shape value.

val to_shape : (module Shape with type t = 'a) -> 'a -> shape

What is really not intelligible in this code?

2 Likes

For my own part (I cannot speak for others), if you reread my post I think you will see that my posting to which you refer was not commenting on the way you propose to set out the overall structure of your code in order to provide a solution to the OP’s posting.

I was commenting on whether, given that structure (involving the combining of data with a first class module containing the functions to operate on the data), this is more intelligibly done using an existentially quantified type via a GADT in the style you have shown (and which was new to me) on the one hand, or using a further wrapper module in the style shown in Real World Ocaml on the other hand, given that the effect of either mechanism is the same. I expect they have roughly equivalent efficiency also.

On the wider structural question you raise, with the simple case originally put by the OP, I would probably do neither, and instead just use variants holding records as a payload. If an explicit supertype/subtype relationship was wanted with respect to the shapes (and it may not be) I would use polymorphic variants with the records.

1 Like

I forgot a negation in my message, I wanted to say not intelligible about my solution. But if I understand correctly what you say here:

you don’t find my solution not intelligible. I believed that you find the version in Real World OCaml (without existential) more intelligible than mine, and so that mine was not intelligible.

Just a problem of misunderstanding from me, sorry.

I think the issue is that, although I understand the principles of GADT’s (attaching a secondary type to a variant case which can be used for type deduction purposes) and I can use them successfully, I find the inner working of the GADT’s type system quite mysterious, particularly in the behavior of existentially quantified type variables provided in the constructor of a GADT. What the existentially quantified type is doing in your code is informing the compiler that the type of t in the first class module concerned is the same as the data type (which is the existential type). Fair enough, but there are other ways to do it.

This reminds me of template metaprogramming in C++ which is OK in moderation but used to excess can make code unintelligible. So I do find the approach in Real World Ocaml easier to grasp.

Ok, I see. For me existential are not difficult to grasp, and I find it’s just a trivial use case of GADT (the simplest one).

But, in the OP case, if you just use variants holdings records, how do you add a new shape? All the solution proposed to solve this question, except mine, are absolutely anti-modular by design: the complexity of the shape components is dispatched on all the compilation units. With mine you can remove the shape.ml compilation unit, that won’t change anything for the Point, Rectangle and Circle ones. You just have to add this file to the existing code base (the one with the simplest version for these modules) and there is a new shape component in the system. Among all the proposed solutions, I don’t see any other one with this property.

For programming in small, I will use plain old algebraic data types. For public libraries and large programs designed for change, I will use the dependency inversion principle and make sure that my high-level policy code (e.g., drawing facilities) doesn’t depend on the low-level implementation details.

Large vs. Small

First of all, let’s define what is programming in large and what is in small. The truth is that there is no definite answer, you have to develop some taste to understand where you should just stick with plain ADT (PADT) or where to unpack the heavy artillery of GADT and final tagless styles. There is, however, a rule of thumb that I have developed and which might be useful to you as well.

ADT is the detail of implementation

ADT shall never go into the public interface. Both PADT and GADT are details of implementation and should be hidden inside the module and never reach the mli file, at least the mli of a publically available code (i.e., a library that you plan to distribute and for which you install cmi/mli files). Exposing your data definitions is much like showing your public member values in C++ or Java.

The less the extent of an ADT in your codebase the easier it will be to maintain it. Indeed, notice even how hard and ugly it is to work in OCaml with ADT defined in other modules. So even the language resists this. And if the language resists then don’t force it.

With that said, 90% of your code should be small code with data types defined for the extent of a compilation unit, which, in general, should be less than 300-500 lines of code (ideally start with 100) and have a couple of internal modules. Basically, a compilation unit should have a size that easily fits into short-term memory.

Since programming in small is a more or less clear concept, let’s move forward.

Programming In Large

So you’re designing a large project, possibly with a public API, that will help lots of developers with different backgrounds and skills. And as always it should be delivered next week, well at least the minimal viable product. The upper management is stressing you but you still don’t want to mess things up and be cursed by the generations of software developers that will use your product.

One of the killer features of OCaml, and the main reason why I have switched to it a long time ago and use it since then, is that it ideally fits the above-described use case. It fully supports the programming in large style (like Java and Ada) but enables at the same easy prototyping and delivering MVPs the next day your manager asks (like Python). And if you have young students in your team, their enthusiasm will not leave the boundaries of the module abstraction. And if you are the manager you have excellent language (the language of mli files) that you can use to convey your design ideas to other team members and even to non-technical personnel.

So let’s do some prototyping and design, using the language of signatures. First of all, we shall decide which type should be designed for change and which should be regular data types. E.g., we probably don’t want to have a string data type with pluggable implementation. But we definitely know that our figures will change and more will be added, possibly by 3rd-party developers.

We finally decide that Point and Canvas will be the regular types (for Canvas we might regret our design decision). For prototyping, we will write the module type of our project. At the same time, we should also write documentation for each module and its items (functions, types, classes, etc). Writing documentation is an important design procedure. If you can’t describe in plain English what a module is doing then probably it is a wrong abstraction. But now, for the sake of brevity and lack of time we will skip this important step and unroll the whole design right away.

module type Project = sig

  module Point : sig
    type t
    val create : int -> int -> t
    val x : t -> int
    val y : t -> int
    val show : t -> string
  end

  module Canvas : sig
    type t
    val empty : t
    val text : t -> Point.t -> string -> unit
  end

  module Widget : sig
    type t
    val create : ('a -> Canvas.t -> unit) -> 'a -> t
    val draw : t -> Canvas.t -> unit
  end

  module type Figure = sig
    type t
    val widget : t -> Widget.t
  end

  module Rectangle : sig
    include Figure
    val create : Point.t -> Point.t -> t
  end

  module Circle : sig
    include Figure
    val create : Point.t -> int -> t
  end
end

Let’s discuss it a bit. The Canvas and Point types are pretty obvious, in fact this design just assumes that they are already provided by the third-party libraries, so that we can focus on our figures.

Now the Widget types. Following the dependency inversion principle, we decided to make our rendering layer independent of the particular implementation of the things that will populate it. Therefore we created an abstraction of a drawable entity with a rather weak definition of the abstraction, i.e., a drawable is anything that implements ('a -> Canvas.t -> unit) method. We will later muse on how we will extend this abstraction without breaking good relationships with colleagues.

Another point of view on Widget is that it defines a Drawable type class and that Widget.create is defining an instance of that class, so we could even choose a different naming for that, e.g.,

module Draw : sig 
  type t
  val instance : ('a -> Canvas.t -> unit) -> 'a -> t
  val render : t -> Canvas.t -> unit
end

The particular choice depends on the mindset of your team, but I bet that the Widget abstraction would fit better more people.

But let’s go back to our figures. So far we only decided that a figure is any type t that defines val widget : t -> Widget.t. We can view this from the type classes standpoint as that the figure class is an instance of the widget class. Or we can invoke Curry-Howard isomorphism and notice that val widget : t -> Widget.t is a theorem that states that every figure is a widget.

The rest should be trivial with these signatures, so let’s move forward and develop MVP,

module Prototype : Project = struct

  module Point = struct
    type t = {x : int; y : int}
    let create x y = {x; y}
    let x {x} = x
    let y {y} = y
    let show {x; y} = "(" ^ string_of_int x ^ ", " ^ string_of_int y ^ ")"
  end

  module Canvas = struct
    type t = unit
    let empty = ()
    let text _canvas _position = print_endline
  end

  module Widget = struct
    type t = Widget : {
        draw : 'obj -> Canvas.t -> unit;
        self : 'obj;
      } -> t

    let create draw self = Widget {self; draw}
    let draw (Widget {draw; self}) canvas = draw self canvas
  end

  module type Figure = sig
    type t
    val widget : t -> Widget.t
  end

  module Rectangle = struct
    type t = {ll : Point.t; ur : Point.t}
    let create ll ur = {ll; ur}
    let widget = Widget.create @@ fun {ll} canvas ->
      Canvas.text canvas ll "rectangle"
  end

  module Circle = struct
    type t = {p : Point.t; r : int}
    let create p r = {p; r}
    let widget = Widget.create @@ fun {p} canvas ->
        Canvas.text canvas p "circle"
  end
end

Et voila, we can show it to our boss and have some coffee. But let’s look into the implementation details to learn some new tricks. We decided to encode widget as an existential data type, no surprises here as abstract types have existential type and our widget is an abstract type. What is existential you might ask (even after reading the paper), well in OCaml it is a GADT that captures one or more type variables, e.g., here we have 'obj type variable that is not bound (quantified) on the type level, but is left hidden inside the type. You can think of existential as closures on the type level.

This approach enables us to have widgets of any types and, moreover, develop widgets totally independently and even load them as plugins without having to recompile our main project.

Of course, using GADT as encoding for abstract type is not the only choice. It even has its drawbacks, like we can’t serialize/deserialize them directly (though probably we shouldn’t) it has some small overhead.

There are other options that are feasible. For example, for a widget type, it is quite logical to stick with the featherweight design pattern and represent it as an integer, and store the table of methods in the external hash table.

We might also need more than one method to implement the widget class, which we can pack as modules and store in the existential or in an external hash table. Using module types enables us gradual upgrade of our interfaces and build hierarchies of widgets if we need.

When we will develop more abstract types (type classes), like the Geometry class that will calculate area and bounding rectangles for our figures, we might notice some commonalities in the implementation. We might even choose to implement dynamic typing so that we can have the common representation for all abstract types and even type casting operators, e.g., this is where we might end up several years later.

module Widget : sig
  type 'cls t

  module type S = sig
    type t
    ...
  end

  type 'a widget = (module S with type t = 'a)

  val create : 'a widget -> 'a -> 'a cls t

  val forget : 'a t -> unit t
  val refine : 'a Class.t -> unit t -> 'a cls t option
  val figure : 'a t -> 'a
end

We’re now using a module type to define the abstraction of widget, we probably even have a full hierarchy of module types to give the widget implementors more freedom and to preserve backward compatibility and good relationships. We also keep the original type in the widget so that we can recover it back using the figure function. yes, we resisted this design decision, because it is in fact downcasting, but our clients insisted on it. And yes, we implemented dynamic types, so that we can upcast all widgets to the base class unit Widget.t using forget, but we can still recover the original type (downcast) with refine, which is, obviously, a non-total function.

In BAP, we ended up having all these features as we represent complex data types (machine instructions and expressions). We represent instructions as lightweight integers with all related information stored in the knowledge base. We use dynamic typing together with final tagless style to build our programs and ensure their well-formedness, and we use our typeclass approach a lot, to enable serialization, inspection, and ordering (we use domains for all our data type). You can read more about our knowledge base and even peek into the implementation of it. And we have a large library of signatures that define our abstract types, such as semantics, values, and programs. In the end, our design allows us to extend our abstractions without breaking backward compatibility and to add new operations or new representations without even having to rebuild the library or the main executable. But this is a completely different story that doesn’t really fit into this posting.

Conclusions

We can easily see that our design makes it easy to add new behaviors and even extend the existing one. It also provisions for DRY as we can write generic algorithms for widgets that are totally independent of the underlying implementation. We have a place to grow and an option to completely overhaul our inner representation without breaking any existing code. For example, we can switch from a fat GADT representation of a widget to a featherweight pattern and nobody will notice anything, except (hopefully) improved performance. With that said, I have to conclude as it already took too much time. I am ready for the questions if you have any.

14 Likes

No question, just want to thank you for this fully detailed explanation that seems clear to me, especially the type directed design and its evolution over time.

That’s exactly what I had in mind when I designed the to_shape constructor family:

val to_shape : (module Shape with type t = 'a) -> 'a -> shape

It’s a theorem that states that any member of a Shape type classes is a shape (that’s indeed the mere definition of a shape).

The only thing I wanted to add is that you’re starting to play with fire when you’re ended there :slight_smile:

1 Like