I think I’m not using OCaml the best way, or the right way
On one hand, now I can write many expressions with all OCaml constructs to make a program that compiles.
On the other hand, I feel often weird because the OCaml expressions do the job but with apparently too much verbosity. And any modification regarding instances of types and relations between them add even more verbosity.
Or the type definitions are insufficiently constrained, and the required constraints should be injected by the functions in order to create an object of the desired “constrained type”.
It’s ok with a program of small size. But I fear it becomes very tricky with a program of medium or large size (in addition, we have to manage the consistency between the signature and the structure of many nested modules). Maybe there is a way to handle that.
“IN SHORT”
What is the recommended way to handle constraints on types? (see example 1)
How to handle constrained recursive structures with variants? (see example 2)
And how to (simply) factorize and combine fragments of types? (which are knowledge fragments) (see examples 1 and 3)
Let’s see through examples.
At the very beginning, this knowledge can be expressed in terms of predicates, possibly grouped by subject (person, company, etc) and that shapes OCaml constructs such as a record for a person or a company:
a person is a human
a person has a name
a person as an ID card
a person has 1 spouse
…
a company has…
a company has…
How (or where) to add constraints to/in a OCaml type definition to get all predicates applied?
Example 1
type tree = Node of int * tree * tree
| Leaf of int
let tree_constraint_NL = "The int value of a Node is in the range [1; 100].
AND The int value of a Leaf must be a prime number."
let t1 = Node (1, Node (1000, Leaf 3, Leaf 5), Node (3, Leaf 7, Leaf 16 ) (* ILLEGAL VALUES *)
val t1 : tree = Node (1, Node (1000, Leaf 3, Leaf 5), Node (3, Leaf 7, Leaf 16))
The only place I see is in a helper function that enforces this constraint when it creates an instance of type tree. But it 's starting to embed knowledge in some piece of code whereas I would like to get this knowledge in a central place that should be types or “extended types” or “typed types”.
What is the recommended way?
Example 2
A (simple) state machine is composed of one or more regions that hold several states and transitions.
A state can hold one or more region. A transition has a pair of relations: one with a state as source and one with a different state as a target.
A transition is done when an event is received in a certain state and if a guard condition is satisfied (only when the even is received).
The two following expressions are expressing what I want but obviously don’t compile (Syntax error):
type automaton = Region of ( string * State list * Transition list ) list
| State of string * Region list (* Recursivity *)
| Transition of {name: string; source: State; target: State; event: string; guard: string } (* NO *)
(* or *)
type automaton = Region of ( string * automaton:State list * automaton:Transition list ) list
| State of string * automaton:Region list (* Recursivity *)
| Transition of {name: string; source: automaton:State; target: automaton:State; event: string; guard: string } (* NO *)
The following expresses the recursivity and compiles:
type automaton = Region of ( string * automaton list * automaton list ) list
| State of string * automaton list (* Recursivity *)
| Transition of {name: string; source: automaton; target: automaton; event: string; guard: string }
but now it’s also too much expressive/insufficiently constrained.
Again, I have the knowledge that a Region is composed of States and Transitions.
How can I apply that knowledge at the type level?
Finally, while writing this post, I remembered mutually recursive records as follows:
type state = { st_name: string;
regions: region list}
and transition = { tr_name: string;
source: state;
target: state;
event: string }
and region = { reg_name: string;
states: state list;
transitions: transition list }
and automaton = { auto_name: string;
region: region }
(* unique field names to avoid compiler warnings *)
# let a =
let s1 = { st_name = "s1"; regions= [] } in
let s2 = { st_name = "s2"; regions= [] } in
let s3 = { st_name = "s3"; regions= [] } in
let s4 = { st_name = "s4"; regions= [] } in
let t1 = { tr_name = "t1"; source = s1; target = s2; event = "e1" } in
let t2 = { tr_name = "t2"; source = s2; target = s3; event = "e2" } in
let t3 = { tr_name = "t3"; source = s3; target = s2; event = "e3" } in
let t4 = { tr_name = "t4"; source = s3; target = s4; event = "e4" } in
{ auto_name = "a1";
region = { reg_name = "r1";
states = [ s1; s2; s3; s4 ];
transitions = [ t1; t2; t3; t4 ] }
}
val a : automaton =
{auto_name = "a1";
region =
{reg_name = "r1";
states =
[{st_name = "s1"; regions = []}; {st_name = "s2"; regions = []};
{st_name = "s3"; regions = []}; {st_name = "s4"; regions = []}];
transitions =
[{tr_name = "t1"; source = {st_name = "s1"; regions = []};
target = {st_name = "s2"; regions = []}; event = "e1"};
{tr_name = "t2"; source = {st_name = "s2"; regions = []};
target = {st_name = "s3"; regions = []}; event = "e2"};
{tr_name = "t3"; source = {st_name = "s3"; regions = []};
target = {st_name = "s2"; regions = []}; event = "e3"};
{tr_name = "t4"; source = {st_name = "s3"; regions = []};
target = {st_name = "s4"; regions = []}; event = "e4"}]}}
EDIT (hidden): previous erroneous definition with let... and
construct that doesn’t compile… because the order of evaluation is not guaranteed.
Erroneous `let... and` definition
# let s1 = {st_name = "s1"; regions= [] }
and s2 = {st_name = "s2"; regions= [] }
and s3 = {st_name = "s3"; regions= [] }
and s4 = {st_name = "s4"; regions= [] }
and t1 = {tr_name = "t1"; source = s1; target = s2; event = "e1"}
and t2 = {tr_name = "t2"; source = s2; target = s3; event = "e2"}
and t3 = {tr_name = "t3"; source = s3; target = s2; event = "e3"}
and t4 = {tr_name = "t4"; source = s3; target = s4; event = "e4"}
and a =
{auto_name = "a1";
region = {reg_name = "r1";
states = [ s1; s2; s3; s4 ];
transitions = [ t1; t2; t3; t4 ] }
}
Characters 201-203:
and t1 = { tr_name = "t1"; source = s1; target = s2; event = "e1" }
^^
Error: Unbound value s1
It looks like the job is done. It’s verbose but quite close to a serialized specification with minimum syntax as follows:
automaton a1
region r1
states s1 s2 s3 s4
transitions t1 s1 s2
t2 s2 s3
t3 s3 s2
t4 s3 s4
EDIT:
I would like not to get this useless stuff (for me) which is mandatory (for the compiler) such as “empty regions in states” and all the quotation marks "; it’s not a problem here because still human readable, but it can represent a lot of noise with complex structures and quickly become not human readable).
Is there a way to define optional record fields like in functions?
I only know:
type state = { st_name: string;
regions: region list option}
which very slightly improves readability with {st_name = "s1"; regions= None}
instead of {st_name = "s1"; regions= []}
And what about the variants “that are nearly the same as a BNF declarations and are supposed to be of great help for parsing languages” ? (the examples above are very simple languages).
How to use variants to get the same as with mutually recursive records?
Example 3
Let’s take a very classical example of a person: a record can handle all details (name, ID_car number, etc.) and his address, first erroneously considered as part of him.
But there can be several addresses: for invoice, for delivery, a permanent address or one during holidays. So we see that we should hold separately personal information and address information.
Moreover, in the context of an order, the same person has a client_ID and several order_ID.
Obviously we cannot use a monolithic record (or a class for that).
Of course, a product type will handle each set and another product type will combine those types.
(* fields simplified *)
type person = { name: string }
type address = { city: string}
type client = {client_ID: string;
client_type: person;
client_address: address
}
type product = { product_id: string;
product_name: string;
product_price: string; }
type order = { order_ID: string;
client_ID: string;
supplier_ID: string;
order_date: string;
order_items: (product * int ) list; }
(* Constraint: A product can only appear one time *)
}
If the client can also be a company, I can create a variant that will handle the two cases
type company = { name: string;
legal_form: string;
address: address; (* Only ONE address *)
tel: string; (* Only ONE telephone number *)
contact_role: (person * string) list; (* [{name="John"}, "logistics"; {name="Bill"}, "accounting"] *)
}
type client = Person of {client_ID: string;
client_type: person;
client_address: address
}
| Company of {client_ID: string;
client_type: company;
client_address: address
}
The structure is becoming more and more complex when combining types fragments with nested product (and sum) types
Is it a correct way/style to write OCaml code?
Is there a better and more efficient way to hold such fragments and to combine them?
Conclusion
What is the cleaner and more efficient way to:
1/ handle constraints on types
2/ handle recursive structures with variants
3/ combine fragments of types
Thanks.