@ruyblast I think that the behavior you propose in your example does not satisfy @not_kill’s question, because the two patterns are not disjoint anymore (Node(Leaf, Leaf)
matches both). @not_kill is converting a pattern-matching in a logic program, think of a series of independent Prolog clauses, so it is important to have a semantics that does not depend on clause order. The pattern-matching algorithms that I know typically assume a top-to-bottom order. (But I am not an expert on pattern-matching compilation, so I may miss something obvious to make them non-order-independent.)
Last time I looked at this topic I thought that I didn’t know how to minimized as @not_kill would like to. But now, after thinking more about it from @ruyblast answer, I think I have a proposal.
Remark that pattern-matching decomposition is good at merging/sharing “common prefixes” during the traversal: during decomposition each submatrix corresponds to all reachable sub-patterns from a given prefix of the input. If at some point, all rows point to the same action, we can stop testing further (replace it with wildcard patterns). On the other hand, it is not good at finding “common suffixes”, where different inputs have the same continuation of tests to do. (Note: Indeed as @ruyblast suggests this is certainly addressed in pattern-matching compilation literature, for example I think that this is addressed specifically in the “Compiling pattern-matching to good decision trees” paper.)
In the example of the decomposition I proposed earlier:
Node(Node(_, _), Node(_,_)) -> 3
Node(Node(_, _), Leaf) -> 1
Node(Leaf, Node(_,_)) -> 2
Node(Leaf, Leaf) -> 1
Leaf -> 3
we have disjointness, but the two -> 1
cases share a common suffix: after matching the first argument of the node (which is different in each case), there is only a Leaf
test to do before returning 1
, which suggests that in fact some of the distinctions we made before were not useful.
My naive idea is to re-do a decomposition of the matrix in reverse order, from right to left, to merge common suffixes. If we get the same matrix, we stop simplifying. If we get a simpler matrix, we do another iteration (from left to right), etc., until we reach a fixpoint (each simplification strictly decreases the total size). On this example, the second decomposition goes as follows (it gives the expected, minimal result):
Node(Node(_, _), Leaf) -> 1
Node(Leaf, Node(_,_)) -> 2
Node(Leaf, Leaf) -> 1
Leaf -> 3
=>
Node(â–ˇ, â–ˇ):
Node(_, _) Node(_,_) -> 3
Node(_, _) Leaf -> 1
Leaf Node(_,_) -> 2
Leaf, Leaf -> 1
Leaf -> 3
=>
Node(â–ˇ, Node(â–ˇ, â–ˇ)):
Node(_, _) _ _ -> 3
Leaf _ _ -> 2
Node(â–ˇ, Leaf):
Node(_, _) -> 1
Leaf, -> 1
Leaf -> 3
=>
Node(â–ˇ, Node(_, _)):
Node(_, _) -> 3
Leaf -> 2
Node(_, Leaf) -> 1
Leaf -> 3
=>
Node(Node(_, _), Node(_, _)) -> 3
Node(Leaf, Node(_, _)) -> 2
Node(_, Leaf) -> 1
Leaf -> 3