What is the "right" way to add constraints on a type, to handle recursive structures with variants and to combine fragments of types?

I think I’m not using OCaml the best way, or the right way

On one hand, now I can write many expressions with all OCaml constructs to make a program that compiles.

On the other hand, I feel often weird because the OCaml expressions do the job but with apparently too much verbosity. And any modification regarding instances of types and relations between them add even more verbosity.
Or the type definitions are insufficiently constrained, and the required constraints should be injected by the functions in order to create an object of the desired “constrained type”.
It’s ok with a program of small size. But I fear it becomes very tricky with a program of medium or large size (in addition, we have to manage the consistency between the signature and the structure of many nested modules). Maybe there is a way to handle that.

What is the recommended way to handle constraints on types? (see example 1)
How to handle constrained recursive structures with variants? (see example 2)
And how to (simply) factorize and combine fragments of types? (which are knowledge fragments) (see examples 1 and 3)

Let’s see through examples.
At the very beginning, this knowledge can be expressed in terms of predicates, possibly grouped by subject (person, company, etc) and that shapes OCaml constructs such as a record for a person or a company:

a person is a human
a person has a name
a person as an ID card
a person has 1 spouse

a company has…
a company has…

How (or where) to add constraints to/in a OCaml type definition to get all predicates applied?

Example 1

type tree = Node of int * tree * tree
         | Leaf of int
let tree_constraint_NL = "The int value of a Node is in the range [1; 100]. 
                      AND The int value of a Leaf must be a prime number."

let t1 = Node (1, Node (1000, Leaf 3, Leaf 5), Node (3, Leaf 7, Leaf 16 )  (* ILLEGAL VALUES *)
val t1 : tree =   Node (1, Node (1000, Leaf 3, Leaf 5), Node (3, Leaf 7, Leaf 16)) 

The only place I see is in a helper function that enforces this constraint when it creates an instance of type tree. But it 's starting to embed knowledge in some piece of code whereas I would like to get this knowledge in a central place that should be types or “extended types” or “typed types”.
What is the recommended way?

Example 2
A (simple) state machine is composed of one or more regions that hold several states and transitions.
A state can hold one or more region. A transition has a pair of relations: one with a state as source and one with a different state as a target.
A transition is done when an event is received in a certain state and if a guard condition is satisfied (only when the even is received).
The two following expressions are expressing what I want but obviously don’t compile (Syntax error):

type automaton = Region of ( string * State list * Transition list ) list 
               | State of string * Region list (* Recursivity *)
               | Transition of {name: string; source: State; target: State; event: string; guard: string } (* NO *)
(* or *)
type automaton = Region of ( string * automaton:State list * automaton:Transition list ) list
               | State of string * automaton:Region list (* Recursivity *)
               | Transition of  {name: string; source: automaton:State; target: automaton:State; event: string; guard: string } (* NO *)

The following expresses the recursivity and compiles:

type automaton = Region of ( string * automaton list * automaton list ) list
| State of string * automaton list (* Recursivity *)
| Transition of {name: string; source: automaton; target: automaton; event: string; guard: string }

but now it’s also too much expressive/insufficiently constrained.
Again, I have the knowledge that a Region is composed of States and Transitions.
How can I apply that knowledge at the type level?

Finally, while writing this post, I remembered mutually recursive records as follows:

type state = { st_name: string;
               regions: region list}

and transition = { tr_name: string; 
                   source:  state; 
                   target:  state;
                   event:   string }

and region = { reg_name:     string; 
               states:       state list; 
               transitions:  transition list }

and automaton = { auto_name: string;
                  region:    region }
(* unique field names to avoid compiler warnings *)

# let a =
    let s1 = { st_name = "s1"; regions= [] } in 
    let s2 = { st_name = "s2"; regions= [] } in
    let s3 = { st_name = "s3"; regions= [] } in 
    let s4 = { st_name = "s4"; regions= [] } in
    let t1 = { tr_name = "t1"; source = s1; target = s2; event = "e1" } in
    let t2 = { tr_name = "t2"; source = s2; target = s3; event = "e2" } in
    let t3 = { tr_name = "t3"; source = s3; target = s2; event = "e3" } in
    let t4 = { tr_name = "t4"; source = s3; target = s4; event = "e4" } in
      { auto_name = "a1"; 
        region = { reg_name = "r1";
                   states =      [ s1; s2; s3; s4 ];
                   transitions = [ t1; t2; t3; t4 ] }

val a : automaton =
  {auto_name = "a1";
   region =
    {reg_name = "r1";
     states =
      [{st_name = "s1"; regions = []}; {st_name = "s2"; regions = []};
       {st_name = "s3"; regions = []}; {st_name = "s4"; regions = []}];
     transitions =
      [{tr_name = "t1"; source = {st_name = "s1"; regions = []};
        target = {st_name = "s2"; regions = []}; event = "e1"};
       {tr_name = "t2"; source = {st_name = "s2"; regions = []};
        target = {st_name = "s3"; regions = []}; event = "e2"};
       {tr_name = "t3"; source = {st_name = "s3"; regions = []};
        target = {st_name = "s2"; regions = []}; event = "e3"};
       {tr_name = "t4"; source = {st_name = "s3"; regions = []};
        target = {st_name = "s4"; regions = []}; event = "e4"}]}}

EDIT (hidden): previous erroneous definition with let... and construct that doesn’t compile… because the order of evaluation is not guaranteed.

Erroneous `let... and` definition
# let s1 = {st_name = "s1"; regions= [] }
and   s2 = {st_name = "s2"; regions= [] }
and   s3 = {st_name = "s3"; regions= [] }
and   s4 = {st_name = "s4"; regions= [] }

and t1 = {tr_name = "t1"; source = s1; target = s2; event = "e1"}
and t2 = {tr_name = "t2"; source = s2; target = s3; event = "e2"}
and t3 = {tr_name = "t3"; source = s3; target = s2; event = "e3"}
and t4 = {tr_name = "t4"; source = s3; target = s4; event = "e4"}

and a = 
{auto_name = "a1"; 
region = {reg_name = "r1";
states = [ s1; s2; s3; s4 ];
transitions = [ t1; t2; t3; t4 ] }

Characters 201-203:
  and t1 = { tr_name = "t1"; source = s1; target = s2; event = "e1" }
Error: Unbound value s1

It looks like the job is done. It’s verbose but quite close to a serialized specification with minimum syntax as follows:

automaton a1
  region r1
    states s1 s2 s3 s4
    transitions t1 s1 s2
                t2 s2 s3
                t3 s3 s2
                t4 s3 s4

I would like not to get this useless stuff (for me) which is mandatory (for the compiler) such as “empty regions in states” and all the quotation marks "; it’s not a problem here because still human readable, but it can represent a lot of noise with complex structures and quickly become not human readable).
Is there a way to define optional record fields like in functions?
I only know:

type state = { st_name: string;
               regions: region list option}

which very slightly improves readability with {st_name = "s1"; regions= None} instead of {st_name = "s1"; regions= []}

And what about the variants “that are nearly the same as a BNF declarations and are supposed to be of great help for parsing languages” ? (the examples above are very simple languages).
How to use variants to get the same as with mutually recursive records?

Example 3
Let’s take a very classical example of a person: a record can handle all details (name, ID_car number, etc.) and his address, first erroneously considered as part of him.
But there can be several addresses: for invoice, for delivery, a permanent address or one during holidays. So we see that we should hold separately personal information and address information.
Moreover, in the context of an order, the same person has a client_ID and several order_ID.
Obviously we cannot use a monolithic record (or a class for that).
Of course, a product type will handle each set and another product type will combine those types.

(* fields simplified *)
type person = { name: string }

type address = { city: string}

type client = {client_ID: string;
               client_type: person;
               client_address: address 
type product = { product_id:    string;
                 product_name:  string;
                 product_price: string; }

type order = { order_ID:    string;
               client_ID:   string;
               supplier_ID: string;
               order_date:  string;
               order_items: (product * int ) list; } 
               (* Constraint: A product can only appear one time *)

If the client can also be a company, I can create a variant that will handle the two cases

type company = { name:         string;
                 legal_form:   string;
                 address:      address; (* Only ONE address *)
                 tel:          string;  (* Only ONE telephone number *)
                 contact_role: (person * string) list; (* [{name="John"}, "logistics"; {name="Bill"}, "accounting"] *)

type client = Person of {client_ID: string;
                         client_type: person;
                         client_address: address 
            | Company of {client_ID: string;
                         client_type: company;
                         client_address: address 

The structure is becoming more and more complex when combining types fragments with nested product (and sum) types
Is it a correct way/style to write OCaml code?
Is there a better and more efficient way to hold such fragments and to combine them?

What is the cleaner and more efficient way to:
1/ handle constraints on types
2/ handle recursive structures with variants
3/ combine fragments of types


1 Like

I don’t know about the right way specifically but would consider the following: resist the urge to enforce all constraints statically in the types. Especially when starting in a language with a powerful type system one is tempted to do that.

Type safety is great but there are limits and some things are easier checked dynamically. A compiler constructs an abstract syntax tree from the input but still needs to check a lot of constraints dynamically: is every identifier defined, are expressions well typed? So I think it very natural to use types to limit the shape data can take but also rely on predicates to enforce further invariants.



your automaton example is precisely a case where variants are not that useful. Variants represent various cases for a given type. But here every construct has only one case:

  • an automaton is a (toplevel) region
  • a region is a set of states ans transitions
  • etc.

So your final definition is adequate; no variants are really needed.

On the contrary, a tree is either a leaf or a node, hence you have two cases (variants) you give a name to (with “constructors”).

Finally, you want in most cases to ensure that your data complies to some given invariants. As @lindig said, you should refrain to ensure too much in the type structure itself as it becomes rapidly hard to manage. A good alternative is to rely on the “abstract type” machinery: you define a signature for your type that keeps it abstract, and then in the implementation, you ensure that your data is compliant with the invariant.

module type TREE = sig 
  type t  (* abstract! *)
  val node : int -> t -> t -> t 
  val leaf : int -> t
  (* other operations *)

module Tree : TREE = struct
  type t = Node of int * t * t | Leaf of int
  let node i left right = 
    if 1 <= i && i <= 100 then Node (i, left, right)
    else failwith "The int value of a Node is in the range [1; 100]."
  let leaf i = 
    if is_prime i then Leaf i
    else failwith "The int value of a Leaf must be a prime number."

  (* other operations; you must ensure they preserve the invariant *)

utop# open Tree;;
utop # let rec sum = function
  | Leaf i -> i
  | Node (i, left, right) -> i + sum left + sum right;;
Error: Unbound constructor Leaf

Now you can only construct trees by calling the “constructors” node and leaf. So, provided all other operations preserve the invariant, there’s no way for trees to falsify the invariant.

One drawback of this approach is that you can’t pattern-match anymore on trees (except in the Tree module itself). Fortunately, OCaml also provides private types which allow pattern matching but still forbid to use constructors to build some data. You just have to make the type t known but private:

module type TREE = sig 
  type t = private Node of int * t * t | Leaf of int  (* private *)
  val node : int -> t -> t -> t 
  val leaf : int -> t
  (* other operations *)
module Tree : TREE = struct
  (* as before *)

utop # open Tree;;
utop # let rec sum = function
| Leaf i -> i
| Node (i, left, right) -> i + sum left + sum right;;
val sum : Tree.t -> int = <fun>

(* but this is still forbidden outside of Tree: *)
utop # let one_node i ileft iright = Node (i, Leaf ileft, Leaf iright);;
Error: Cannot create values of the private type Tree.t

Hope this helps!


Thinking about all I’ve been reading about OCaml, it seems that GADTs could be a part of the solution to the requirements I expressed.

This 2015 article shows how to add constraints on a variant definition: https://mads-hartmann.com/ocaml/2015/01/05/gadt-ocaml.html
I first didn’t fully understand it as I was learning OCaml basic constructs.

With ADTs:

type value =
  | Bool of bool
  | Int of int

type expr =
  | Value of value
  | If of expr * expr * expr
  | Eq of expr * expr
  | Lt of expr * expr

With GADTs:

type _ value =
  | Bool : bool -> bool value
  | Int :  int -> int value

type _ expr =
  | Value : 'a value -> 'a expr
  | If :    bool expr * 'a expr * 'a expr -> 'a expr
  | Eq :    'a expr * 'a expr -> bool expr
  | Lt :    int expr * int expr -> bool expr

and the eval function is reduced to only 6 lines (instead of 20 lines with ADTs; say 17 lines if we move the three end constructs), with drastically improved readability.

Details : eval function with ADTs and GADTs

with ADTs:

let rec eval: expr -> value = function
  | Value v -> v
  | Lt (x, y) -> begin match eval x, eval y with
      | Int x, Int y -> Bool (x < y)
      | Int _, Bool _
      | Bool _, Int _
      | Bool _, Bool _ -> failwith "Invalid AST"
  | If (b, l, r) -> begin match eval b with
      | Bool true -> eval l
      | Bool false -> eval r
      | Int _ -> failwith "Invalid AST"
  | Eq (a, b) -> begin match eval a, eval b with
      | Int  x, Int  y -> Bool (x = y)
      | Bool _, Bool _
      | Bool _, Int  _
      | Int  _, Bool _ -> failwith "Invalid AST"

with GADTs:

let rec eval : type a. a expr -> a = function
  | Value (Bool b) -> b
  | Value (Int i) -> i
  | If (b, l, r) -> if eval b then eval l else eval r
  | Eq (a, b) -> (eval a) = (eval b)
  | Lt (a,b) -> (eval a) < (eval b)

BTW, GADTs seems a powerful construct which is quite easy when creating the type, but writing the function seems more tricky. Do you know the exact meaning of the “signature” of a function over GADTs and how to read the type a. a expr -> a in the following function definition?

let rec eval : type a. a expr -> a = function

OCaml refman/GADTs uses the following syntax (the various examples give an idea but it’s still unclear):

let rec eval : type a. a term -> a = function

let rec sum : type a. a term -> _ = fun x ->

let rec eq_type : type a b. a typ -> b typ -> (a,b) eq option =

let get_dyn : type a. a typ -> dyn -> a option =

1 Like

Sure, you can do this with GADT but I think it would be simpler with your records.

(* some phantom types for the GADT *)
type state
type transition
type region
(* the automaton definition using GADT *)
type _ automaton =
    Region :
      (string * state automaton list * transition automaton list) list -> 
      region automaton
  | State : string * region list -> state automaton
  | Transition : string * string * string * state automaton *
      state automaton -> transition automaton

I read it “for some type a, take an a expr and return an a”. This annotation is necessary for the unification algorithm of the type checker. In your example, in the first case of the pattern matching the local type variable a will be unified with bool, in the second case with int, etc.

Batteries has a BatBounded module, which implements values that must fall within bounds.
Perhaps you’ll be able to figure out a good way to implement constraints by looking at that module’s code ?

Sure! Thanks.
I’m progressively switching to abstract types because this mechanism gives freedom for changing the implementation. And that helps enforce an invariant as you have just demonstrated.
I’m currently putting the constraints in functions and in modules (in fact I’ve no other choice).
A constraint can be related to the structure or the behaviour of instances of the defined typed, so it’s general enough.
I imagine some situations where I should tap into an existing business rules repository, or it is useful to define such a repository within an OCaml program where many constraints are used by many functions (so changing a constraint once is enough and requires no additional effort).

This makes me thinking about two things:

1/ I can very easily define such a set of predicates that can be used by functions.

2/ I feel that this can create some trouble at run-time:
For example, it seems not dangerous to change the constraint on Node value (from the Tree module) because the scope is quite limited:

   "The int value of a Node is in the range [1; 100]." 


   "The int value of a Node is in the range [10; 20]." 

But there may be some cases where several predicates are related and that could drastically change the behaviour of the program.
As the conditions used by OCaml functions are true or false, I think that the program will still compile. But the program could have a weird behaviour at run-time without crashing.
Do you agree with this assumption?

In general, there are type systems where types can be arbitrarily parameterized by values, but these sort of types, so-called “dependent types”, are not provided by OCaml’s type system, and they are too powerful for most purposes. You can do Turing equivalent computation at the type level once you have full dependent types, type inference and checking become very hard (and in some cases, no decision procedure can exist for it any longer), etc.

Wanting more powerful types is natural, but there are limits to what is practical given current understanding. That said, if you want to play with such things, Agda, Idris, Coq, etc. provide them. There is a reason people don’t use them every day, though.

I’m not talking about those advanced (and research) tools (as discussed previously, they are dedicated for very high end system/software that are not done in C/C++/Java or OCaml).
I’m just talking about - Real World OCaml programs - where we want to easily and cleanly define (algebraic data) structures:

  • that are rich enough
    I think they are: I can create as many records or variants combinations as I want to express all that than I need; IMHO the limit should my/our imagination because pattern matching can help a computer to traverse very big trees, especially if the trees are well defined/typed. It’s up to the program to handle any creation/search/update/deletion on such structures.

  • that stay manageable at any scale
    AFAIK it doesn’t seem to be the case because to understand all the knowledge embedded in an OCaml program, I need to have an overview of 1/ the structures type definition and 2/ all the constraints that are nested in functions.
    It looks like a solution would be some “documentation” of these constraints: either by highlighting constraints in the code, or by putting the constraints in a set/repository with possible additional documentation fields. Do you agree with that option?

On the basis of the powerful OCaml type system, I feel it would be valuable to put constraints into type declarations, or as close as possible to these declarations. Of course, as a simple OCaml programmer.

I just told you, you can’t put arbitrary constraints into type systems without requiring dependent types. It’s trivial to prove that arbitrary constraints mean that your types become far too powerful to do automated inference on and the like. Doing what you’re asking means using the sorts of type systems you find in Idris or what have you. It’s much easier just to use opaque types and have your constructor functions enforce constraints in executable code instead of in types.

However, you don’t have to believe me.

You’re asking too many questions :slight_smile: It is hard to address them all, so this may scare people away, my suggestion would be to split your question into several small and well-scoped questions.

Anyway, you are indeed using OCaml a little bit incorrectly because you’re confusing data and types. Records, tuples, and variants represent data, an algebra of data close under very few operations (basically union and set multiplication). If your field of knowledge could be encoded in terms of two this set operations, then you’re fine with algebraic data types.

For less simple cases, you need to start thinking differently and use different mechanisms. You shall invoke the Curry-Howard isomorphism and start thinking of types as of theorems. Where one type is a theorem, a collection of types (a signature) will become a theory. Once you’re in this paradigm it is quite easy to express your domain knowledge using types, e.g.,

module type System = sig 
  type human
  type company

  (* theorem: all humans have one name *)
  val name : human -> string

   (* theorem: all humans have and id card *)
   val name : human -> id

   (* all companies have a set of clients, possibly empty *)
   val clients : company -> human list

You can express complex relationships between different entities. You can employ subtyping by using type constructors, e.g.,

type 'a entity
type human
type company

(* all entities have an identity *)
val id : 'a entity -> id

(* a human with id is an entity *)
val human : id -> human entity

(* a company with id is an entity *)
val company : id -> company entity

Once you start to reason in signatures and writing your code on the module level you will uncover the whole power of OCaml. Remember: if you’re not using mli files, then you’re not programming OCaml.

Where a signature is a theory which is a set of theorems which can express many complex ideas, the module implementation, aka structure is a set of proofs. Constructive proofs. Since OCaml is not a theorem prover, but is more known as a programming language, it would be hard to force OCaml to verify all your proofs. Indeed, OCaml can try to verify some simple cases, as it knows some theories of algebraic data types, and GADT give enough power to prove stuff that in general requires dependent typing. You can even prove Russel Paradox in OCaml. But at the end of the day you will still find that some terms could not be verified by OCaml. For example, if we want to create a theory of nat1, i.e., natural strictly positive numbers, we can express it with a signature:

module type Nat1 = sig 
   type t

  (* nat1 is a subset of integers *)
  val of_int : int -> t option

  (* the set of nat1 is closed under addition *)
  val add : t -> t -> t

  (* but not under subtraction *)
  val sub : t -> t -> t option

Now, when we have to write the implementation it would be natural to use int as an underlying implementation. (You can of course use church numerals, and indeed prove the provided theorems, but it wouldn’t be practical or efficient). So the implementation would look like this:

module NatAsInt : Nat1 = struct 
   type t = int
   let of_int x = if x > 0 then Some x else None

   (* here the compiler does not have a proof that `x` and `y` are positive, 
       however, since the only valid way to obtain an instance of type t 
       _outside_ of our module is via `of_int` or `add`, and `sub` we can prove
      though not mechanically, that `x` and `y` are always positive here. *)
   let add x y = x + y

   (* the same is here, we know that they are positive as this is an invariant
      of the module *)
   let sub x y = if y >= x then None else Some (x - y)

Thanks for your refreshing answer.
I feel much more comfortable because I find sets and predicates expressed in OCaml. :slight_smile:

Maybe this function should be renamed as id to avoid overriding of the previous name and type id added in order to compile? This should give:

module type System = sig 
  type human
  type company
  type id    (* Added to be able to compile *)

  (* theorem: all humans have one name *)
  val name : human -> string

   (* theorem: all humans have and id card *)
   val id : human -> id      (* Value changed from `name` to `id` *)

   (* all companies have a set of clients, possibly empty *)
   val clients : company -> human list


Honestly, I’ve never used that kind of expression 'a <sometype>
How is it called?
It looks like those from @kantian:

(* some phantom types for the GADT *)
type state
type transition
type region
(* the automaton definition using GADT *)
type _ automaton =

First, I roughly understand that it allows to express things on two levels (human entity / company entity or state automaton / transition automaton / region automaton)
As I just discover it, I see it somehow as a workaround (in the above case of the automaton).
Because from this construct (kind of expression) how can I express a complex set, “class hierarchy”, etc. ? (say, “human can be male or female” ; “company can be composed of humans and animals and machines” etc.

Hmm… wait a minute!
Do you mean that I can handle (subject, predicate, object) this way?
Let’s see. I can express:

  • many entities (human, company, animal, etc.) as subjects.
  • many objects (id, name, color, Ocaml, etc.).
  • and many predicates as functions: (subject, predicate, object) val human : id -> human entity .

Am I right or confused?
(I deliver my thoughts directly without censorship).

I’m not sure I understand what you’re trying to say. It seems to me you’re saying that some expected properties of your structures may be more complex than plain invariants. Indeed, this may happen but:

  1. It happens in any program, it’s not specific to OCaml;
  2. As of today, I don’t think there’s any programming language (except a few academic experiments) featuring a type system able to specify this kind of property in a “natural” way and then check it automatically.

In short, I don’t know how to fully and correctly implement my OO model in OCaml, and I feel that I try to bend variant for that purpose.

Let me express simply the situation in other words:

1/ in OCaml I can express any individual class/thing/entity (designed with an OO approach):
I just have to pack up its attributes and operations in a record, a class or a module. A signature will mint it’s attributes and operations, and a structure will implement with functions the behaviour defined by the operations.
And I can express any invariants of that class/thing/entity that will be enforced as you demonstrated above within a module. There is no need for research tools.
We assume here that a class just knows how to create a relation with other classes by calling the relevant function.
However, implementing the inheritance/multiple inheritance is not clear for me.
With the OCaml object system, it’s easy to define that B inherits from A. Then B gets automatically all the features from A and can overrides some features.
How to do the same with the OCaml module system?
Hmm… multiple signature and structure include and redefinition/shadowing of “inherited” functions should do the job. Am I right?
So, as a side question, if this is exact:
what are the advantages of the object system? and when to use it

2/ In OCaml, I can create any kind of relation between any number of classes:
A product type such as a record or a tuple is fine for holding that. There is place for any properties such as the name of the relation, the role and the cardinality and multiplicity of each class involved in the relation and any other useful properties (types ans functions). And again any invariant of the relation can be enforced as desired.
Except for the case of inheritance relation has been pointed in 1/, the job should be done.

3/ In parallel, OCaml let me express trees with variant:
a simple or complex fixed size tree with a variant (i.e. for various cases as you mentioned above), or even a language with a recursive variant as well as some eval function to analyze any unlimited size expression which is an unlimited size tree.
Exactly as shown in RWO:Variants :

"Examples from RWO: variant and recursive variant"...

Various cases:

type basic_color =
  | Black | Red | Green | Yellow | Blue | Magenta | Cyan | White ;;

type color =
  | Basic of basic_color     (* basic colors *)
  | Bold  of basic_color     (* bold basic colors *)
  | RGB   of int * int * int (* 6x6x6 color cube *)
  | Gray  of int             (* 24 grayscale levels *)

Or a small language using recursive variants:

type 'a expr =
  | Base  of 'a
  | Const of bool
  | And   of 'a expr list
  | Or    of 'a expr list
  | Not   of 'a expr

BTW, I made a mistake when aswering @ivg that "I’ve never used that kind of expression 'a <sometype>". I’ve already played with small languages like the example from RWO or OCaml refman where " expr is parameterized by a polymorphic type 'a ".

The most famous example of unlimited size int binary tree defined with a recursive variant is probably:

type tree = Leaf of int 
          | Node of tree * tree

I feel uncomfortable with variant, somehow “obliged” to use it because those trees are very nice, and also because it’s very close to OO inheritance with property {disjoint}

So the point that motivated me to create this topic may be that:

  • I try with OCaml to define inheritance/multiple inheritance between things/classes with modules (because “first-class OCaml modules are mainly used over classes…”) and I don’t really know how to handle that with modules;
  • I struggle for using polymorphism in order to factorize as much as possible functions and don’t exactly know how to do.

These two points aim towards as much as possible factorization of things (and their relations, which are things).

Is it more clear?

PS: as a side note and as as a possible illustration, pls. find a class diagram from this site https://stackoverflow.com/questions/37112937/uml-modeling-how-to-represent-a-multi-way-inheritance-onto-a-class-diagram


UML can represent this with no problem. But, how you map it into a programming language is another matter. (And another SO question.) Many languages do not support multiple classification of instances, so you have to hammer a square peg into a round hole.

Can it be implemented in OCaml as follows?

type person = Physical_person 
            | Company
type inex = Inner_actor 
          | External_actor
type actor = { person: person;
               kind:   inex

and then the attributes/operations of the general Actor class should be injected in the sub classes via include in the module of each other class?

This is not surprising as OCaml isn’t an object oriented language. It does have an object system, but for good reasons it is rarely used.

Trying to bend a language to make it what it isn’t reminds me of the old saw, “the determined Fortran programmer can write Fortran code in any language.”

If you want to use Java or Python, they are out there, waiting to be used, just as if what you really want is Fortran, Fortran is out there. In general, it is not a good idea to try to make a language into something it isn’t. Don’t write Haskell the way you would write C code, say (and generally, the reverse is also true.) If you want to use OCaml, then read other people’s OCaml programs and learn how to write OCaml code in OCaml. (I started learning OCaml over a year ago; it will take me at least another year until I feel I have the style approximately right when I write things, though it might take much longer. It is important to be humble while learning, and to see why people do things the way they do, without presuming in advance they know less than you when they’ve done something far longer.)

If learning to do things the OCaml way doesn’t suit you (and there’s no reason it needs to!) then pick some language that does. The world is full of choices in the PL design space. Everything from Crystal to Julia to Agda is out there waiting for you to try it. Perhaps one of them suits your way of thinking better.

As for the software architectures you are discussing, generally, I find that when people start by assuming they need a complicated data model or what have you, they’ve probably taken a wrong turn. That is, of course, my taste — some others have found success with complicated code. That said, on more than one occasion, I’ve looked at some program, found it incomprehensible, and replaced it with a better program that was a fraction of the length, did far more, and was vastly simpler.

A real life example: Once, decades ago, I was hired by a large European bank to fix a broken many, many thousands of lines Perl program (yes, Perl) intended to merge two email directory databases every evening. The script had never worked correctly and was breaking badly. The author, long gone, had suffered from a syndrome I find all too common, which I refer to as “overeducation” — his program incorporated, among other things, an implementation of the Quine–McCluskey algorithm. I could not figure out why the program could possibly need an implementation of the Quine-McCluskey algorithm or what it was using it for; even littering the code with print statements didn’t elucidate the structure very well. I gave up and replaced the monstrosity it with a couple hundred lines of Perl (yes, Perl, at the time that was what I had) that worked far better, and to my knowledge, that code never broke in years of production.

Similarly, years and years before that, when I was a stupid and arrogant programmer of an embarrassingly young age, I was hired to write some input editing libraries for a full screen accounting application. The UI ran on terminals back in those days — real terminals, the sort with an RS-232 connector on the back and an 80x24 line display which is addressed with escape codes. I produced a couple thousand lines of spaghetti to do the work, and thought I’d created the Sistine Chapel. Luckily, there was a very, very old programmer around — I think he must have been 30, he was so very very old (would that I were 30 again) — and he taught me how state machines could be used to simplify code of this sort, and helped me to rewrite the mess. The code shrank by about 90%. It did much less when it was finished, and thus did far more.

The most successful software is very very often the software with the simplest possible design. However, finding the simplest possible design is really really hard. It is our task as programmers to learn, usually from bitter experience, how to make programs simpler and simpler, and not more and more complicated.

The best code is so simple that it has an inevitability to it — it becomes “obvious” that there is no better way to write the code. Generally, such code is extremely hard to write, and is only “obvious” in retrospect. You see the beauty and inevitability but not the long and bitter experience and struggle that went into creating the inevitable simplicity. (Generally such code is also not created through the use of class diagrams, “design patterns”, UML, etc.)

Mostly, you will never need all the elaborate castles in the sky your mind has imagined your program requires. Learn not to imagine them in the first place; don’t dream about complicated data structures at night or work out complicated architectures in the shower, they are almost always a mistake.

Someone has asked you to write a program. Probably your program can be written in a very straightforward style. Start with that, with the most obvious way to do things. Maybe once you’ve written it, you’ll discover it doesn’t quite do enough, or it needs some more generality — you can refactor the parts of your program that will need more generality at the point where you discover you require them. You will almost never correctly guess in advance when those times will be, so there’s no point in figuring them out in advance anyway. More likely you’ll discover once you wrote it that you wrote far too much and made it far too complicated, anticipating needs that never appeared, and you’ll yearn not for time to generalize the code but to remove the 2/3rds that proved utterly unnecessary once it was done.

Premature optimization is often said to be a sin, as you’re unlikely to guess what is performance critical in advance. Premature design is also a sin, because you won’t know what you will actually need in advance.

So all this is a roundabout way of saying this: you can probably find some way to achieve what you claim you want. I recommend you stop trying instead.


Like @ivg said, I think you’re speaking of too many things in one post…

Now, to put it bluntly, in the OO approach, data comes first; in functional languages, functions come first. So I think you should refrain from trying to map an OO design over an OCaml program. You may rather change your mindset and first adopt a functional design approach. Which means:

  1. Think first of what your program should do and how you would decompose every function into the composition of simpler ones.
  2. Types, that are specifications as @ivg showed clearly, will follow (“this function needs this kind of information”, “it’s meaningful from the applicative point of view to gather this and that pieces of information in a single one”, “a tree is either a leaf or a node”).
  3. Gathering types and function profiles into signatures will then enable abstraction and modularity, and all the good things they entail.
  4. Factorizing function declarations occurring into several signature into more abstract signatures.

Of course, in practice, things will not necessarily happen in that order, and you will iterate over your declarations.

Now there’s only one thing to do: practice!
Do not try to embrace all OCaml in one stroke and do not try to translate OO into functional.

PS: Incidentally, functional design was rejected and replaced by the OO approach, but I think it comes from the fact that people were not implementing it with functional languages (featuring abstract types and higher-order functions, which allow code reuse). Amusingly, OO design is now rejecting general inheritance a lot to replace it with interfaces and delegation (poor man’s function composition).

Do no try to mimic OO programming in OCaml: most of the the time objects (in OO programming languages) are used as poor man modules.

Here an example from C++ great practices. It’s about naming conventions, but it is useful to illustrate the problem with OO programming and the fundamental logical confusion in its design.

First, they define a struct this way:

struct Limit
    int32_t quantity;    // Good
    double  price;       // Good
    double  limitPrice;  // Bad - Duplicated word 'limit' with struct name
    bool    isActive;

Then they argue that in a struct all fields are public, but that a class has private data and provides many functions. So they define this class:

class MyClass
    double getTradingVolume() const;

    int32_t _quantity;
    double  _price;
    bool    _isActive;

double MyClass::getTradingVolume() const
    return _price * _quantity;

But what they really want is a very simple theory (with only one proposition) on the concept previously defined as a struct without revealing its definition. Id est what they really had in mind is this:

module MyClass : sig
  type t (* the definition is kept abstract *)
 (* the only one proposition on this type: 
  * every value of type `t` has a trading volume
  val trading_volume : t -> float
end  = struct
  type t = {
    quantity : int;
    price : float;
    is_active : bool;

  let trading_volume {quantity; price; _} = float_of_int quantity *. price

More generally, a type is just a concept (a concept whose values are constructed, i.e in other words a type is just a synonym for mathematical concept) and most of the time a class is just a product type (whose fields are kept abstract) with some functions on it: it’s just a theory about this concept, and in OCaml you have to use the module language to define theory on types.

In short, when you define a class in an OO language you do not only define a concept but at the same type you define a concept and a theory on it.

1 Like

Your example 2 could also be implemented with polymorphic variants:

type state = [
  | `State of string * region list

and region = [
  | `Region of (string * state list * transition list ) list

and transition_data = {
  name: string;
  source: state;
  target: state;
  event: string;
  guard: string

and transition = [
  | `Transition of transition_data

type automaton = [
  | state
  | region
  | transition

OO vs.functional approach:
It would require a whole site only to discuss that. I’m not an evangelist of any kind, so I can just tell you that I experienced and observed that a pure OO approach is very rarely observed because it requires a high ability in abstraction. So, usually OO models are not well designed (only the OO syntax is used), then the developers implement something that is simply not a good (OO) model, using many tweaks to partially satisfy initial and changing requirements. And when the discussion with DBAs occurs, the poor Object model is killed a little bit more.
Functional programming appears to me very positive because it encourages and even enforces (with the “abstract signature first, then structure” pattern) a good way of thinking at the same time about objects/things and their relations (i.e. functions from a thing to an other, or “theories” as @ivg mentioned).
On the other hand, OO tries to make people focus on objects but as it’s very hard (the way it’s done), people rapidly focus on what software users are doing with the poorly designed software objects. And “doing” is simply about (possibly inconsistent) operations (business processes), and not at all about pure functions.
One should admit that OCaml doesn’t enforce us to write with a correct OCaml style (see some instructive discussions between C programmers trying to write in OCaml and experienced OCaml programmer explaining functions, recursion…).
So this topic should be something like How to correctly code in OCaml a model that is all about things (classes) and relations (other things) and constraints on all of them?

About trying to mimic OOP
I’m not trying to mimic OOP. There is a model that is expressed in UML and that apparently looks like a conventional OO model. But in fact it’s richer than usually observed OO models (before the Java guys start to encode it) because there are much more constraints on all the stuff (classes, attributes, operations and on associations between all of that).
I feel fine with the functional approach even if I don’t know yet how to fully handle what I’ll try to explain in the following.

I do practice but when I can’t implement something which is clearly defined and that should be fine with ADT, I ask for more experienced people about how to do.
BTW, in many tutorials that I’ve been working on, the “signature first, then structure” pattern is not encouraged. There is so much to learn about constructs and good recursion that adding constraint may be discouraging?

Multiple inheritance/Multiple classification
I’m not experienced enough with OCaml but it should be solved with the OCaml module system with multiple includes or with the object system thanks to the multiple inheritance facility.
Do you agree on that?

Working on examples
In order to focus the discussion on a limited scope as suggested, I tried to implement some helpers with the polymorphic variants suggested by @rand.

1/ I modified the type definition of t. With only a region, this invariant is naturally enforced.

2/ I went into some trouble with types, polymorphic variants dancing in every directions with no abstract type at all returned (despite the module structure constrained by the module type signature has been correctly evaluated…?!). So to fine tune this simple implementation, I had to enforce types locally on functions in the structure, then I could add the module signature to enforce globally (I left the local “signature enforcement” in the code of the two functions after the AUTOMATON module signature was globally applied).

3/ I had for quite a while issues with the type of the automaton returned when playing with the functions of this module. It appeared not to be abstract. Eventually I fixed that without exactly knowing what I did (after several compiler restart…)

# let a1 = create_auto "Automaton1" "Region1"
val a1 : Automaton.t = ("Automaton1", `Region ("Region1", [], [])) 
(* instead of 
val a1 : Automaton.t = <abstr>        *)

# let a1 = create_auto "Automaton1" "Region1"
val a1 : Automaton.t = <abstr>
(* Finally ok )

What are the rules one should respect for having an abstract signature returned (as defined in module type signature)?

Here is the code:

module type AUTOMATON = sig
type t
val create_auto : string -> string -> t
val add_st : t -> string -> string -> t

module Automaton: AUTOMATON = struct
type region = [ | `Region of (string 
                              * state list
                              * transition list )
and state = [ | `State of string
                 * region list
and transition = [ | `Transition of transition_data
and transition_data = { name: string;
                        source: state;
                        target: state;
                        event: string;
                        guard: string
type t = string (* automaton name *) 
         * [ | region 
             (* "Invariant: 
                An automaton can only be composed of one region." *)

let create_auto an rn :t= an, `Region (rn, [], [])

let add_st (a:t) rn sn :t =
  let an,r = a in
  let rn0, sl, tl = match r with `Region u -> u in 
    match rn0 = rn with 
        true ->  an, `Region (rn, `State(sn, [])::sl, tl )
      | false -> failwith ("No such region "^rn^" in automaton of name "^an)

Is the way of coding in OCaml correct according to you?

For scaling from that very basic example to a real large program, what are your recommendations?


PS: I also worked on the implementation based on the mutually exclusive records. It was much more easy than with polymorphic variants.

I’m currently preparing an example with questions about how to design modules (what to put in them) when we have several things/classes and many to many/one to many relations between several number of modules. This way we will scope one thing after another.

perhaps because there is no such thing except in a few simple cases.