Practice: Making Invalid States Unrepresentable

type-system
learning

#1

I’m not sure if this is the place for this, but I wanted to start a discussion
that would end up being sort of a sprawling code review of a toy problem.

I’m just learning OCaml.

One of the exciting techniques I’ve read about in using a language that support
abstract data types is the fact that you can model your problem in a way that
the type system makes it impossible to represent invalid application states.

This is something that I’ve seen only a couple examples of and I want to try to
find more ways to learn how to do this better.

I’m working on a side project gathering a collection of coding puzzles to solve
in OCaml and I came up with a couple implementations of one that might be
approaching something manageable but I still feel like it’s falling short.

To take you through the iterations, I’m working on a way to generate a solution
to this simple logic puzzle involving a fox, a goose, a bag of corn, and a
river. You can only transport one animal at a time, and leaving the animals
alone on one side of the river will allow that animal to eat another animal (or
the corn).

I’ve come up with a couple ways to represent this that are varying degrees along
the spectrum of the code allowing for invalid states to still be possible.

(* naiive and error prone *)

type player = Fox | Goose | Corn | Person
type riverbank = Bank of player list
type boat = Boat player list
type game = riverbank * boat * riverbank

let first_game_state = (
    [Fox; Goose; Corn; Person],
    [],
    []
)

(* check if someone's getting eaten *)
let is_dangerous g = true (* would need to check list contents etc *)

let generate_solution first_state = [first_state]
(* would generate a list of states that get to the solution state *)

This is how I would probably model it in most languages, but there are a number
of invalid states you could create from this model. For example if the solution
generator’s second step looked like this:

(
    [Fox; Fox; Corn; ],
    [Person; Goose],
    [],
)

We know something’s wrong since there can only be one Fox in the game. Also I
think the act of iterating over a list of things isn’t that great. I would
rather be able to use some pattern matching.

The next data model I came up with has slots for each of the members. At this
point I noticed that a boat also has the constraint of only either being empty
or having a player or an animal.

type not_player = Fox | Goose | Corn | Empty
type fox_spot = HasFox | NoFox
type goose_spot = HasGoose | NoGoose
type corn_spot = HasCorn | NoCorn
type player_spot = HasPlayer | NoPlayer
type river_side = (fox_spot * goose_spot * corn_spot * player_spot)
type boat = EmptyBoat | Boat of player_spot * not_player
type game_state = river_side * boat * river_side

And some tests to verify safe and unsafe game states:

open OUnit
open Solutions.Foxgoosecorn

type step_result = Safe | Unsafe

let check_side side = match side with
    | (_, _, _, HasPlayer) -> Safe
    | (HasFox, HasGoose, _, NoPlayer) -> Unsafe
    | (_, HasGoose, HasCorn, NoPlayer) -> Unsafe
    | _ -> Safe

let check game = match game with
    (left, boat, right) -> match (check_side left, check_side right) with
    | (_, Unsafe) -> Unsafe
    | (Unsafe, _) -> Unsafe
    | _ -> Safe

let suite = "FoxGooseCorn">::: [
    "fox eats goose">:: (fun c ->
        assert_equal
                Unsafe
                (check (
                    (HasFox, HasGoose, NoCorn, NoPlayer),
                    Boat (HasPlayer, Corn),
                    (NoFox, NoGoose, NoCorn, NoPlayer)
                    ))
                );
            "safe state1">:: (fun c ->
                assert_equal
                Safe
                (check (
                    (HasFox, HasGoose, HasCorn, HasPlayer),
                    Boat (NoPlayer, Empty),
                    (NoFox, NoGoose, NoCorn, NoPlayer)
                    ))
                );
            "safe state2">:: (fun c ->
                assert_equal
                Safe
                (check (
                    (HasFox, NoGoose, HasCorn, NoPlayer),
                    Boat (HasPlayer, Goose),
                    (NoFox, NoGoose, NoCorn, NoPlayer)
                    ))
                );
]

This feels better to me, but I still can represent a character being in more
than one place at the same time:

    (
     (HasFox, NoGoose, HasCorn, NoPlayer),
     Boat (HasPlayer, Fox),
     (HasFox, NoGoose, NoCorn, NoPlayer)
    )

While this gives me a more declarative way to define unsafe states, it still
allows me to put a character in more than one place at a time, leading me to
this implementation:

type position = LeftBank | RightBank | Boat
type game_state = {
    fox: position;
    goose: position;
    corn: position;
    player: position
}

And rewriting the tests:

open OUnit
open Solutions.Foxgoosecorn

type step_result = Safe | Unsafe

let combine_check a b = match (a, b) with
    | (Safe, x) -> x
    | (Unsafe, _) -> Unsafe

let check_fox g = match g with
    | {
        player = p;
        fox = f;
       goose = g; _ }
       when f == g && f != p -> Unsafe
    | _ -> Safe

let check_goose g = match g with
    | {
        player = p;
        corn = c;
        goose = g; _ }
       when c == g && c != p -> Unsafe
    | _ -> Safe

let check game = combine_check (check_fox game) (check_goose game)

let suite = "FoxGooseCorn">::: [
    "fox eats goose">:: (fun c ->
        assert_equal
        Unsafe
        (check {
            player= Boat;
            fox= LeftBank;
            goose= LeftBank;
            corn= Boat
        })
    );

    "safe state1">:: (fun c ->
        assert_equal
        Safe
        (check {
            player= LeftBank;
            fox= LeftBank;
            goose= LeftBank;
            corn= LeftBank
        })
    );
    "safe state2">:: (fun c ->
        assert_equal
        Safe
        (check {
            player= Boat;
            fox= LeftBank;
            goose= Boat;
            corn= LeftBank
        })
    );
]

Since there’s a fixed set of characters in the game, I can limit the game state
to tracking their current position explicitly, and re-adapt the pattern matching
to check for unsafe states.

I’m not happy with the fact that I have to write a function to check each
character explicitly. Also, it’s still possible for several characters to be in
the boat at the same time, even though there should only be the character plus
an extra at most.

Is there a way to encode that constraint as well?

And my next question is, can I make an unsafe state – a valid state which results
in a failure, an illegal state?

I figure there is probably a completely different approach involved where the
type system doesn’t even look similar to the problem, and where there wouldn’t
even be work for the user to do in the puzzle to generate the intermediate
states, but what would it look like?


#2

Have you considered using Sets?


#3

Short introduction into the type theory

Types represent sets of values, even when you say type fox = Fox, you actually define an infinite set of indistinguishable foxes, where each new fox could be constructed with the Fox data constructor. A type checker proves, that if a given expression has type t, then all possible valuations of the expressions belong to the set t. Abstract types are existentials, and they represent a set of values, that are satisfying to some predicate. For example,

module Nice : sig 
     (* A nice number is an integer ending in 3 or 7 when written out in decimal *)
     type t 
     val of_int : int -> t option
end

If our Nice.of_int function is implemented correctly, then we may believe that all Nice.t are nice numbers. However, the only thing that the type system can guarantee for us, is that all values of type Nice.t were generated using the Nice.of_int function. Or in other words, is that for all x in Nice.t, there exists an integer y such that Nice.of_int y = Some x. Although this is nice (pun intended), it still leaves a lot burden of the correctness proof on the programmer, i.e., proving that of_int y is Some x iff x is nice. Thus, although abstract types are excellent mechanisms for extending our theories with arbitrary predicates, it is always much more convenient to model our problem in such way, that as much as possible properties are proved without relying on the proof by abstraction. Or, if it is (practically) impossible, then we should try to find existing well-verified abstractions, in which we believe, like Set and Map abstractions in the OCaml standard library. Even more ideal, would be to implement those abstractions in Coq, and extract them into OCaml.

Application to the problem

In order to (ab)use the type system, to prove certain properties of our domain, we need to represent the domain problem in terms of sets and sets inclusions. We will try to rely on Map and Set abstract types, that provide a proof of uniqueness.

We will represent a game as a set of configurations. Each configuration assigns a place for each player. We say that a configuration is valid if each player has one and only one role. From the set of valid configuration, we would like to find those configurations that satisfy some criteria (in our case that in the same place there shouldn’t be too conflicting players).

Note, that here we are separating the correctness problem - the property that we would like to verify during the program compilation, from the search problem, that we will perform when we will run the program. We are also cheating a little bit, by removing the statement that “there can be no more that one animal in the boat” from the correctness problem, by moving this into the search predicate.

Our first implementation will use the Map abstract type to represent the mapping from players to places:

type player = Fox | Corn | Goose
type place = Left | Boat | Right

module Players = Map.Make(struct 
    type t = player
    let compare = compare
end)

type conf = Conf of place Players.t
type game = Game of conf list (* alternatively -- a set of configurations *)
   
let conflicts place p1 p2 = match place, p1, p2 with
  | Boat, _ -> true (* only one player per boat *)
  | (Left|Right),Fox,Goose | (Left|Right),Goose,Fox -> true
  | (Left|Right),Goose,Corn | (Left|Right),Corn,Goose -> true
  | _ -> false

let players = [Fox,Goose,Corn]
let places = [Left;Boat; Right]

With such representation, we can generate all possible configurations, and then filter them using the conflicts predicate to find all possible solutions to our problem. The type system guarantees, that each player has only one place, since Map.find has type player -> Players.t -> place, and thus we can have only one place for the given player in a configuration. So we have the guarantee for the at most part. However, it doesn’t guarantee that a player has at least one place since the Players mapping is not total. Unfortunately, since the absence of value is represented with the Not_found exception, that bypasses the type system, it is not represented explicitly in the type. (That’s why we should try not to use exceptions, especially in public interfaces).

This solution could be an acceptable tradeoff if you have arbitrary (big) number of players. But in our case, we have an alternative solution, that could be better. We can represent each configuration, that is a mapping, with a function (since functions are mappings):

type player = Fox | Corn | Goose
type place = Left | Boat | Right

type conf = Conf of (player -> place)
type game = Game of conf list

let configurations = [
   (function _ -> Left);
   (function 
     | Fox -> Boat
     | Corn -> Left
     | Goose -> Right);
   ...
]

However, in this solution, we need to enumerate all possible configurations manually and refrain ourselves from using exceptions. We also can’t use the type system to prove that all our configurations are unique (since we can’t compare functions).


#4

A little bit but it felt like reaching for a common data structure wouldn’t solve all the problems I was having in modelling stuff in types.

For example if you had a set representing the left and right banks, you could still have a player exist on both banks at once

(
 PlayerSet.of_list [Fox],
 [],
 PlayerSet.of_list [Fox]
)

It’s like I need a global registry of all players to make sure they only appear in one place at a time, which is how I ended up with the type alias to a record in my final impl.


#5

Further to @ivg’s answer, you have two practical options in my mind:

  1. Allow the programmer to write that tuple you wrote, but then pass it through a validation algorithm
  2. Create a set of functions which can only produce valid sets

#6

You can also use private types in ocaml to model type constraints and/or implement type/module invariants. See this for more http://caml.inria.fr/pub/docs/manual-ocaml/extn.html#sec236.


#7

I’m not a master of the types, but from what you’re describing, I would say that your player type is incorrect because the inhabitants of the type cannot be safely interchanged while maintaining the state of the application valid. The same goes for the places. So, I would split all the players and places in their own types, making each type have only one inhabitant. From this, building only valid states boils down to making the type system solve by itself the problem.

That may be doable, but it will require using advanced typing tricks such as GADT and phantom types, well beyond basic practice of types. Moreover, the readability of the program will suffer.


#8

I think I agree with you, but which of the three implementations to are you referring to?

So kind of like my impl that has a location have a Present | NotPresent set of values for each player type, but taken further so each location * player combo has a named type?


#9

Sorry. I was referring to @ivg 's solution.

No, I’m trying well above my understanding, but that would start more something like:
The series of corn, goose, fox could be modelled with some kind of Peano integers with the relationship of predation:

    type corn = Corn
    type 'p eater = 'p -> unit
    type goose = corn eater
    type fox = goose eater


    type ('a, b') boat = 'a -> b'
    type ('a, 'b, 'c, 'd) right_bank = 'a -> 'b -> 'c -> 'd
    type ('a, 'b, 'c, 'd) left_bank = 'a -> 'b -> 'c -> 'd

with some condition witness types so that types 'n and 'n eater don’t appear in the lists.


#10

gotcha, so define types that just describe the directed graph relationships between the actors


#11

That’s what I was trying to explain in the intro section of my post. Types represent sets of values, not a value, thus trying to represent a singleton value wouldn’t be possible without requesting the linear typing. Thus type fox = Fox still defines a set of foxes, because a construction of a Fox doesn’t consume the Fox proposition, and it can be reused later, unlike in the linear typing where proposition can construct only one instance. So we just can’t represent this relation with the type system, that assumes all the substructural laws of logic. And what I’m trying to say, is that instead of trying to explain to the type system stuff that just doesn’t fit into its logic, we should try to rephrase our model using to the constructive propositional logic, i.e., that we should stick to the interpretation of simply-typed lambda calculus in cartesian-closed categories. And we can do this.

In our model we have a set of foxes F, a corn set C, and a set of goats G. We define a set of players as a disjoint set of this sets: P = F + C + G. The same way we define three sets of sites (places) S = L + B + R. Now we can express our problem as a set of configurations, where each configuration c(i) is a mapping of type P -> S, that is a surjection of P onto S. We also have an indicative function, that selects configurations that satisfy our search problem, PxP -> S, that basically selects a subset of the PxPxS set. Now, our search problem is to find all sequences of configurations, that start from the initial configuration c(0) (all are on the left bank) and ends with the final state c(1) (all are on the right bank). This is expressible in the ZF set theory, thus it is typable in simply typed lambda calculus (the second approach - that uses pure lambda functions for the relations). In practice, we need to use finite maps, to represent the surjection, that makes our P -> S mapping nontotal, i.e., it is P -> S + 0, so we are using abstract types to guarantee, that our P->S mapping is total, by checking this at runtime.