What is the "right" way to add constraints on a type, to handle recursive structures with variants and to combine fragments of types?

I believe (and I’m quite sure) that the first problem, for/as a business architect (or whatever object architect) , is to define a “good” enough model (why, what, how) abstracted from any technology (PL, SW and HW architecture).
That’s how people see their world and work on on a daily basis, with often a heavy legacy and also a desire to improve or innovate.
Then, sooner or later, comes the discussion with the software architect (who knows how to choose a PL and the related developers), assuming here usual HW is enough.

In this context, this topic can be refined as follows:

  1. How to (correctly) encode in OCaml such a “biz” model (which is quite generic).
  2. Where to modify the model, if admitted by the biz architect, to take advantages of some powerful OCaml language features.

If a end user wants/demands some strange business rules enforced about weirdly asserted objects (that are obviously biased representations), a business architect will have little space to improve the model (he will do more next time, if still possible…).

I clarify again the context to avoid a great and passionate debate about FP vs. OOP and other PL.
It’s just about correctly using OCaml.

From now, I’ve been sufficiently working with OCaml to assert that OCaml is of great value (from my standpoint that should be shared at discuss.ocaml.org), and that I’ve chosen it to do my SW stuff.

So for this topic, let’s assume that there is a quite good model of real world as a starting point.
It’s all (and always) about things and their relations.
So ADT should do the job. As discussed before, invariants can be expressed and implemented in functions that enforce them (whether it’s nested in the function code or isolated as values used by function(s) is a matter of choice driven by modularity.

One side (and non blocking) point is the correct way to document the OCaml code about invariants (How should it be documented? With comments in the structure? With comments in the signature? Or as values that express invariants in the signature and the structure ? Obviously, all “biz” rules should be known and made visible for both end users and SW people ) This scoped point will be discussed later or as another topic.

The module system or the object system look great and I don’t see any reason so far to prefer one over the other. Once the (modules or classes/objects) structure is established, they both use functions which is the key feature of OCaml. They can both benefits from functors and of advanced language features. Maybe a new topic should be opened about When to prefer object or module Ocaml system?

One question raised in the example I’m currently working on will be:
How to make a SW decision about OCaml modules and the related main needs to create OCaml code in order to traverse all the trees to add/update/delete values? (which is a very generic need)

A topic already exists (about objects per se rather than objects vs. modules) at Objects use cases in OCaml

I don’t think it can be the case that your module exposes the polymorphic variants, if you applied your module signature. You don’t need to put explicit types on your functions, but as polymorphic variants works in different ways than ordinary variants, it is often a good idea to put explicit types on your functions. Polymorphic variants has structural subtyping semantics.

I think you should get some of the answers you are after from the ocaml manual and good books on ocaml; seems like you still need to get to know the semantics of basic stuff.
Have a look at Real World OCaml (even though that uses a different standard library than what comes with ocaml) e.g. https://v1.realworldocaml.org/v1/en/html/variants.html#polymorphic-variants

I also tried with simple variants. Based on the previous code, it was straightforward:

module type AUTOMATON = sig
type t
val create_auto : string -> string -> t
val add_st : t -> string -> string -> t

module Automaton: AUTOMATON = struct
  type t = string * region
  and region = Region of ( string 
                           * state list
                           * transition list )
  and state = State of string
                       * region list
  and transition = Transition of { name: string;
                                   source: state;
                                   target: state;
                                   event: string;
                                   guard: string

let create_auto an rn = an, Region (rn, [], [])

let add_st (a:t) rn sn :t =
let an,r = a in
let rn0, sl, tl = match r with Region u -> u in 
match rn0 = rn with
true ->  an, Region (rn, State(sn, [])::sl, tl )
| false -> failwith ("No such region "^rn^" in automaton of name "^an)

I could take benefit of inlined records.
I could also define only one mutually recursive variant type definition. As, on the other hand, the same type definition with polymorphic variants raised an Error (whether type t is placed in first or last position):

Characters 318-324:
* [ | region 
Error: The type constructor region
is not yet completely defined


Code for one polymorphic mutually recursive type definition
type region = [ | `Region of (string 
                  * state list
                  * transition list )
and state = [ | `State of string
              * region list
and transition = [ | `Transition of transition_data
and transition_data = { name: string;
                        source: state;
                        target: state;
                        event: string;
                        guard: string
and t = string (* automaton name *) 
        * [ | region 

Do you know if there is a way to get only one type definition with polymorphic variants? (if we consider it should be nice to have).
And what is the advantage of using polymorphic variants instead of simple variants? (beyond “we can do it”).

type to_include = [ `Include ]
type t = [ 
  | to_include 

This means that the cases of to_include is included in type t. I did this, because this is what your first variant definition did in example 2. When you write and between your types, you state that your types can recursively defined. Type t shouldn’t be included in the recursion.

The cool thing about polymorphic variants is primarily the structural subtyping. Where you can narrow the type with pattern-matching, and widen the type in expressions. This also depends on variance of e.g. functions, so sometimes you need type coercion. So you loose some type inference in the trade for structural subtyping.

Edit: You only ‘loose’ type inference in the sense that it’s a good idea to put explicit types on things in the prescence of subtyping, to get the errors that you want. Plus, because it’s neccesary to put explicit types on stuff where you want to say to ocaml that the variants can only be exactly one specific set of variants. Or in the case where you want to ‘open up’ a polymorphic variant type that was previously closed.

It will be difficult to answer this question: it is too large and it really depends on which kind of relations and constraints you want to express.

I don’t know anything about UML, but what I understand from the schema you gave:

is that you have a concept (actor) that is subdivided according to two distinct point of view. As I previously said, the words concept and type are synonyms. Here the first subdivision claims that physical person and company are subtypes of the actor type, and that this is a partition of this type.

So in this case, what you name class is what I name type and the relation you are interested in is the subtyping relation. But without knowing the definitions and the characteristics of these concepts, it will be difficult to tell you how we can encode this in OCaml.

As a side note:
I don’t agree with the meaning of this UML model because it is mainly a way of seeing things.
I will never write such a thing where “Actor” is put a superclass. Instead I will try to understand what is the essence of a Person (physical and legal/company) and define a generic Person with common attached properties such as its/his address.
In parallel, the Inner/External classification is purely a choice in organization. An external actor can become internal in the case of a merging (supplier is now inner…). Think about arbitrary inter-extra-intranet classification.
The OO modularity was featured here assuming all attributes and operations and relations of the Actor class are inherited by its sub classes.

Disregarding its meaning, this UML example, could be one example about how to encode it in OCaml, obviously with variants.
What about the following?

module Actor = struct
type person = Physical_person 
            | Company (* Legal_person *)

type inex = Inner_actor 
          | External_actor
type actor = { person: person;
               kind:   inex;

Instead, I would prefer to create a module Person and a module Address. And I can define in the module Person that a person has zero to N addresses if a physical person, and 1 to N addresses if a legal person.

module Person = struct
type person = Physical_person 
            | Company (* Legal_person *)

type role = Role_A 
          | Role_b
          | Role_c
          (* |  etc. 
             Roles can be defined as a combination of elementary roles. *)

type actor = { person:  person;
               role:    role
               address:  ( string * bool * Address.t ) list 
               (* use (shipping, invoicing), active, addresses *)

(* Module invariants (enforced by functions): 
   - a physical person has 0..N addresses
   - a company has 1..N addresses (if it wants to have a legal existence) *)

module Address = struct
type t = { (* Some generic address definition *) }

This is not very important to me, I’ve never used UML. What I wanted to do is to make some precision about the terminology we use. When you write such a thing:

Here the word classes seems similar to me to what a type is. For instance when we say:

  • All animals are mortal
  • All men are animals
  • So, all men are mortals

the concepts animal, mortal and men are types, and this syllogistic rule only says that the subtyping relation is transitive. Usually the subtyping relation is noted <: and we can translate this previous reasoning to:

  • animals <: mortal
  • men <: animals
  • so, men <: mortal

And if we are interested in values that belong to the type men, we can say:

  • men <: mortal (all men are mortal)
  • Socrates : men (Socrates is a men)
  • so, Socrates : mortal (Socrates is mortal)

here, the colon : is exactly the same we use in type annotation, for instance 1 : int (1 is an int).

Therefore, in OCaml you have types that denote concepts and you use functions to express relations and operations between them.

For this particular UML example, you can for instance define a module for addresses, one for person (physical person) and one for companies with all those modules that enforce invariants on your types. And then you can define your actor type this way:

type inex = Inner_actor | External_actor
type role = Role_a | Role_b (* | etc. *)

type actor =
  | Person of Person.t * role * inex
  | Company of Company.t * role * inex
1 Like

Roger, pardon my asking this, but why do you want to use OCaml when it seems you really want a different language? There is no shortage of programming languages and no shame in using one that suits the way you prefer to think about the world. If you want to use OCaml, then use OCaml. If you want a boat, don’t ask people how to make your car float by strapping plastic bags to the bottom, how to hook a sail to the roof rack, and how to use the wheels as rudders. Just get a boat instead. It’s clear you don’t want to think about programming the way OCaml encourages, and there’s no shame in this, but there is no point in trying to use a screwdriver as a an egg whisk.

The problem with classes (in any language) is that they mix and confuse implementation with abstraction. Basically, when you use the inheritance relation you’re invoking two orthogonal things at once, the inheritance of implementation (aka subclassing) and the refinement of type/interface (aka subtyping).

These are completely different things, that should be addressed differently. The implementation inheritance enables code reuse, and perhaps is the worst possible way to provide the code use, hence it is not even recommended in Java/C++ and other OO-focused languages. Fortunately, OCaml provides lots of options for code reuse, that includes first class functions, modules, functors, all powered with parametric polymorphism with principal and automatic type inference.

The refinement of a type is what you’re actually want to model (as code reuse is the implementation detail). When people say that we think in terms of objects, like that a bird is an animal, it doesn’t really mean that we think that birds inherit the implementation of animals and that there is a locomotion method implemented in all animals which is overridden by different implementations. No, we think in terms of phenomena which we observe and generalize into concepts. So that we can see that different objects and processes may exhibit common behaviors, so we have an opportunity to generalize them by creating an abstraction. That brings us to the next level of understanding on which we think already in terms of abstractions, which we can generalize even further, and so on until we reach the level of category theory, where we are no longer constrained with understanding :slight_smile:.

This modus of operandi is modeled in OCaml extremely well, as we have signatures to model different concepts, e.g., for example, we may observe that all animals are concrete objects which come in and out of existence, and that each Animal is an individual which we can somehow identify, so we model it with the signature that defines the set of operations that we believe, in domain of discourse of our model, will abstract an animal correctly:

module type Animal = sig 
   type t 
   val create : id -> t
   val id : t -> id
   val age : t -> int
   val is_alive : t -> bool

We can see, als, that some Animals are flying. In our model, we may decide that is worthwhile to distinguish them from other animals, e.g.,

module type Flying = sig 
    include Animal
    val fly : position -> unit world

So, we can see that Flying is an Animal, no matter the implementation or other details - we can always expect that any Flying Animal will exhibit the behavior of an animal (wrt to our model, again).

Once you will start implementing your hierarchies you will be happily surprised that OCaml will provide you with means to reuse the implementation. Like you may actually decide to reuse the Animal data type for all your animals (indeed, there is no need to reimplement the animal bookeeping in each module implementation). The good thing is that your choice of implementation details wouldn’t affect your hierarchies as they are completely orthogonal.

As a final note, OCaml uses the word type for whatever is called datatype in SML, data in Haskell, struct or class in C/C++ or Java, for a thing that describes a data type, i.e., the concrete implementation. There is nothing wrong with it, except that it brings confusion between types and abstractions, which usually deemed equal. In OCaml, abstractions are modeled with module types. So type animal is a concrete implementation (which could be abstracted by its signature, where the signature is the abstraction). Hence, when you have a UML diagram that models some entities, then those entities should map to module types, and operations, along with their constraints (in OCL) should map to functions and their types. So classes in UML are module types in OCaml.


You are welcome.
Obviously I didn’t use correct words to express clearly enough my ideas and requirements.
Let me pls. give you more information.
I first had a very sweet experience with simple OCaml code, and I immediately loved that PL. Then I had a difficult experience with the OCaml global documentation, where old and recent resources are mixed up with no indication (several build systems, text editor setup was not straight forward despite RWO setup files (or with other files), confusion with various undocumented stdlib replacement, an interesting quantity of packages but that requires to spend time to understand how to use them,…).
Eventually, I was happy with the OCaml refman that appears to newcomers as “outdated” whereas I think it looks like a scientific publication (and plain text search in the pdf or html is fine). Each time I spend some time in calmly reading this refman, I discover subtle information. Reading the BNF description of OCaml appeared to me of great value. There is a lack of examples. But I found that within various lectures and tutorials from US Universities (Cornell, UChicago, etc.) and from ocaml.org. RWO was interesting but it’s talking about too many things at the same time, telling how to do something in various ways, introducing all concepts and techniques, and telling at the same time not to use advanced features because it’s risky or whatsoever. So RWO was useful in my experience but it is designed to be read, then experience and then read again to really understand all of it (from a newcomer to OCaml standpoint).
Today, I can write OCaml code following the recommended way of doing (signature first with abstract type, then structure) and it compiles. I know how to fix errors.
Compared with all mainstream PL that I know more or less directly, or through discussions with other people that describe their nightmare, I’m very happy to look at some nice OCaml code I could write by myself without headaches (some time is just necessary to let things go through my mind). Pattern matching, tail recursion, options,… : all is concise and it’s easy to read or to get an overview of what is globally going on.
For most of my simple SW needs, OCaml is now the PL of choice.
When it comes to write a larger program, a very usual question is raised:
How could I reuse my code? For filtering values against pre-conditions, traversing trees thanks to pattern-matching, in order to finally “just do a simple side-effect action or a functional update” (create or update a record, a record field value,…). This is a lot of work and writing always the same things within a different context is time-consuming, error-prone (even if the type-checker tells us what is wrong, fortunately) and quite boring.

While answering to @ivg, I will try to describe in more details, from where I am (i.e. I can write OCaml code as a very common programmer - not a beginner and not an expert) how to use OCaml and its tools to be able to reuse as much and easily as possible existing code (signature and structure) to simplify implementation (I’m very confident in using the module types to specify rather complex models).
I believe this goal requires to understand advanced existing OCaml features and it is not my case.
BTW, I don’t know how you will receive the following, but pls. feel free to let me know:
by using metaphors (boat vs. car, screwdriver vs. egg whisk), you introduce a well-known bias that leads you inevitably to the (predefined) conclusion that OCaml doesn’t fit my needs, and then I should even go away and choose another PL.
Based on this discussion, I will try as much as possible to clarify my needs and requirements.
Obviously, I don’t want to be a (happy) OCaml programmer that simply writes the same “type of code” again and again, thinking that I’ll study advanced OCaml features when I have time or a big problem to solve. Instead, because I know how to code in “regular OCaml”, I want to understand all the power of OCaml as soon as possible.
I hope this discussion will be helpful for people with the same goals.

1 Like

Well, I guess “writing the same type of code again and again” leads to boredom or happiness depending on whether the writing takes a lot of time or a very short time.

If copy-and-paste were all that’s needed for code re-use, there would have been no need to invent OO.

In my view, one of the virtues of OCaml is that it has few “advanced features” and an easy learning curve, contray to other programming languages whose easy features are limited in usefulness.

One thing that is important (to me and to some people like @perry; see discussion about Base/Core where catching exception vs. returning option type was evoked) is to reduce as much as possible and ideally eliminate the need to copy-paste code and instead reuse existing modules/libs.
Because it will always happen that you are in a hurry, and that some lib or build system or editor problem will happen at the same time while doing copy-paste of code. Eventually, I believe that you will get the result that you aim at but with pain and cost.
To be more concrete, can you describe more in details a use case where copy-paste is the solution and where it is not ?

@ivg That is a really neat way of putting it. Can I quote this on other forums? And can you tell me whether this is an accurate characterization of what you said?

When I think about inheritance hierarchies and relationships between classes in OO, I’m actually confusing the means and ends. What I really want is to be able to generalize phenomena, which means code reuse, but I get fixated on object modeling instead. Sometimes they work and produce the right abstraction, but often things get muddled.

Taking this to the extreme, it is always best to think of entities in complete isolation - avoid exercises in ontology as much as possible. Except when you want code reuse - then work backwards in how types may relate to each other based on what code we might want to reuse.

This is easier in OCaml because we have many ways of reusing code: functions can be polymorphic, modules can be included in other modules, and we can even parameterize modules with functors.

Terribly recorded video, but great content. Evan walks us through a user interface that has a couple of form elements with similar-looking behavior. Our first instinct would be abstract them, but he shows why and how things that look same could be deeply different. It is a shame I don’t remember the specifics, but one thing I took away was to feel ok and leave repetitive code when they don’t abstract well - it could be the characteristic of the domain itself rather than a failure in programming.

1 Like

Sure, feel free)

That is not what I was trying to say, but it is a plausible way of thinking, at some meta level of thinking. I will return to the code reuse a little bit later. I didn’t have time to write a short answer, so I wrote a long one instead :slight_smile:

The idea of modeling is that we’re trying to build a mathematical model that suits our needs and represents objects in the universe in some expected way. Than enables mathematical reasoning and proving and verification of certain properties of the model. If the PL that we’re using allows it (and OCaml does) then an implementation of the model could be verified by the type checker.

When a system is designed the designer creates a model in some high-level specification language (sometimes just natural language), that basically says what entities exist, how do they interact, what properties they exhibit, and how they expected to be used. This model is used to communicate ideas between developers, to devise tests, etc, etc.

The model has nothing to do with the implementation, besides the fact, the implementation should match the specification of the model. This could be verified by testing. In OCaml, it could be also verified statically, to some extent, by the type checker.

Even more, modeling has nothing to do with the code reuse. Your implementation may use copy-pasting, or whatever method to do this. Trying to capture in the model implementation details is a bad idea, that will only frustrate the developers (not to say testers).

Now back to the code reuse. As a programmer, you interact with programs. And you may observe that some programs exhibit the same behavior so that we can generalize. But when you see this common behavior, you, in fact, are operating in the knowledge domain of programs, not in the domain of what your program is actually modeling. In other words, an object of your observation is a program. Not an object of a model which this program is trying to represent. For example, cars usually have a collection of N equal wheels. At the same time, a year consists of N equal days. However, there is nothing in common between class Year and class Car although they can definitely reuse some common code. Neither it would be a good idea to derive one or another from a container.

This brings us to an idea, that code reuse is an entity in the domain of a program analysis. The domain in which we reason about programs and their behavior. It is a worthwhile domain, where we study algorithms and data. In this domain, we have various mechanisms with variable availability in different languages, thus, in general, we have to design this in terms of a language. For example, in Java, we might use a visitor, in places where in OCaml we will use ADT or a first-class function. However, let’s try to think about code reuse without sticking to a particular language or paradigm. When you have several programs that behave mostly the same but differ in some detail, you may substitute these several programs with one program, by abstracting away the differences and creating a general program, that doesn’t depend on those details. Notice, that abstraction and generalization are dual concepts that always come together. In order to generalize something so that it could be applied to more than one thing, you have to create an abstraction that will describe the set of applicable things. For example, if we want to generalize a summation of a list of entities, we have to provide an abstraction of a sum of entities:

let rec sum add xs = 
  | [] -> raise Not_found
  | x :: xs -> List.fold_left add x xs 

We might see that our abstraction is insufficient as it doesn’t handle the empty sum case. So, in fact, we need some sentinel that will denote the empty, sum. Usually, it is called zero, e.g.,

let rec sum zero add xs = List.fold_left add zero xs

The abstraction that we just devised is called monoid, and there are quite a few algorithms that we can generalize using the monoid abstraction. In fact, product is a sum in monoid where zero is one and add is the multiplication.

In OCaml, we can create abstractions by … abstraction, aka creating a function. Surprisingly, not all languages provide these facilities. This is possible in OCaml due to the parametric polymorphism. Indeed, our sum function is polymorphic, as it could be used in different typing contexts. Or in other words, applied to values of different types, e.g., ints, floats, strings, sets of strings, etc. Not all abstractions actually require polymorphism, as sometimes builtin types of a language provide sufficient abstraction, e.g., you can model plenty of things with int. But in general, abstraction involved a typing. Basically, an abstraction denotes a set of operations applicable to some enitity. For example, monoid is described as

module type monoid = sig 
    type t 
    val add : t -> t -> t
    val zero : t

alternatively, you can just represent it as a pair of arguments to the function (as we did above), or as a record

type 'a monoid = {
    add : 'a -> 'a -> 'a;
    zero : 'a

or even as a class or object type.

In classical OOP languages, the only way to specify abstractions were classes, e.g.,

struct monoid {
    monoid zero() = 0;
    monoid add(monoid, monoid) = 0;

monoid sum(monoid, list<monoid> elts);

Unfortunately, OOP started to confuse abstraction with generalization from the first days of existence, thus they started to implement generic also using classes, e.g.,

struct base {
    base zero() = 0;
    base add(base, base) = 0;

struct monoid : base {
    monoid sum(list<monoid>);

struct Int : monoid {
    Int(int init) : value(init) {}
    Int zero() {return Int(0); }
    Int add(Int x, Int y) { return Int(x.value + y.value);}
    int value;

This all started to bring havoc, as soon as abstractions happen to become more complex. As now we started to have a problem - that subclassing is not subtyping, deal with variances and covariances and other notions that we should have to deal with. Fortunately, in OCaml we can use simple abstractions, or their big brothers functors (which are abstractions on structure level) to deal with it.

To summarize, code reuse should be applied to your code, not to the domain that you’re modeling. I.e., entities of the code reuse are not animals or plants, but programs and data. Albeit both domains are important, they should not be confused. And in the domain of programs, there are quite a few open question about how we can write programs better.


Thanks, the long post was a treat to read. Learned and clarified quite a few things.

In which way do you mean this? I don’t see Flying as an Animal any more than Animal is an InanimateObject:

module type InanimateObject = sig
  type t
  type id = int
  val create : id -> t
  val id : t -> id

Also, as it’s then only a structural relation between modules, and not a semantic one - then one cannot expect anything about the behaviour of a module that fits an interface. Expectation will only be possible by cultural norms - e.g. one expects that (using package containers as example) module M = CCMap.Make(CCInt) returns a 'a Map.t that does what one expects.
In OO, the semantic relationship between classes is more strictly encoded by the named subtyping hierarchy, than by this.

As @RogerT were talking about constraining data into being correctly structured, there is another point about interfaces worth discussing. I don’t think that module interfaces helps much to restrict wrong data from being formed, when looking at the intrinsic structure of an algorithm. Interfaces are more useful to help creating correct interactions with some set of types + functions over those types, what one could call extrinsic constraints, wrt. the set of algorithms and possible data. So it seems to me as if there are several levels one wants to solve these problems at, in OCaml at least.

This is wrt to some hypothetical model that we’re trying to represent in OCaml, e.g., if in our abstract model we say that Flying is an Animal that flies, then we reify it into the presented module types. I’m not trying to build the model of the observed world here. I’m just picking those two concepts as an example. Also, a model couldn’t be right or not. Model either works for a particular task or doesn’t. Like Newton’s model of the world is completely wrong, but works fine for many of us.

Modules describe abstractions. How many details do you need to describe an abstraction of your domain depends on your taste, appetite, and of course on your tools. OCaml provides propositional logic that covers most of the needs. You may ask for more powerful logic and either use language with more powerful type systems, like Coq and F*, or stick to testing. OCaml, in that sense, provides a nice middle ground, it provides a powerfull but still decidable (i.e., with automatic type inference) system to specify your constraints, as well as even more powerful higher-order logic which can express arbitrary constraints, which you, might or might not be able to check, and can either ignore/rely on tests/or use GADT to prove it.

Again, these are two different topics: how to represent a given model in OCaml, and how to check that the module implementation satisfies the module signature. Example, you can represent a non-empty list with total hd : 'a list -> 'a function using private or abstract type list, or you can indeed represent it using Church numerals encoded in GADT to represent the length. In the first case, you have to manually check that all functions in the module implementation preserve the invariants, in the second case this would be done by the compiler in the expense of very heavy implementation. The question that @RogerT asks is actually about abstractions, while the solution is applied to the implementation. Hence it doesn’t work. That’s, in simple words, what I’m trying to say.