Well, everybody knows that well-typed programs can’t go wrong and we should encode important properties of the domain in types but it seems that in real world it can lead to some complications and support difficulties. I’m talking about the compiler parse tree where tuples could have size < 2 but it is being checked in runtime. Also @gsg mentiones something like that in the last year thread:
Writing a predicate and asserting it will require either a pretty large amount of testing, or using a formal verification tools which will check that every execution path doesn’t break an assertion. Let’s imagine that this tool:
works reasonably fast, i.e. can analyze two functions f and bseparately getting two function summaries (it maybe not a precise term but I will stick with it) and construct a functional summary for composition of f and b only my composing two summaries (without reanalyzing f in context where an argument is known to be b)
precise: don’t under approximate functions which may lead to false positives
automatic: in case when we only want to synthesize formal description of written program (in a subset of OCaml) and prove it separately by hand we can use CFML.
useful for many programs: tells yes or no rather often, doesn’t allow SMT solver to hang too much
supports whole language hence can be useful in production (if we don’t want a whole language we should look at MoCHi.
Now everybody can start awakening. Will this hypothetical tool be useful? Am I missing any aspect that makes this approach not working in the real world? Is it worth spending time on this? I know, the task is rather ambitious but I know the guy who claims that he (kind of) solved compositionality problem in presence of high order functions and mutation by reduction to 1-st order Horn Clauses (that’s why I added kind of).
As far as I know, some of the complexity comes from specifying recursive functions and loops. For these, it is not trivial for the tool to infer the invariant (or the function summary as you call it), and in the general case it is even undecidable, if I recall correctly.
Consider the following simple recursive function:
let rec f i = if i <= 1 then 1 else f (i - 1) + f (i - 2)
Without an annotation written by the user, what would be the sumarry of f ? It might be enough for some programs to know that f x >= 1 (without any precondition on x), but that’s an over-approximation which may not be enough, in some cases, thus you might need at some point to prove that it computes the n-th fibonacci number, which I’m doubtful any tool would be able to guess/infer.
Well, Why3 has a by design inconvenience: it could be difficult to specify all what you want (inventing lemmas, specifying loop invariants). There is a little bit of hype in the world of SMT solvers nowadays. They add a lot of heuristics to make SMT solver generate invariants itself, invent right lemmas and prove something using this lemmas and return a decision about provided input with some useful lemmas that can be reused later. Maybe it will be a fun to add something like that into Why3 (if is it not yet done, I did only brief look). Maybe even a publishable fun.
And Imandra has very nice looking website but bound model checking under the hood is boring, induction proofs resemble Coq ones and the first demo not from the tutorial I tried has failed: bound model checking can’t prove this and [@@induction] can’t too. But some SMT solvers definitely has an heuristic for this! This is the demo query that I tried
Imandra uses some particular mix of induction and SMT under the hood (Zipperposition is not the same kind of prover). Why3 isn’t really “SMT” either, it’s a sophisticated system for generating verification conditions from ML, and then it can use SMT solvers or Coq to discharge proof obligations — I don’t know if they have tried to heuristically perform induction themselves or not.
Your example works if you define list_sum as follows:
With Imandra, we have a definitional principle built upon termination proofs. When a function is defined, we must prove it terminating (in general using ordinals - Imandra has as a well-founded relation the << function on ordinals up to epsilon_0). This guarantees that every extension by definition of Imandra’s logical world is a conservative extension.
A beautiful aspect of this (the idea going back to Boyer-Moore, and even McCarthy to some extent!) is that from every terminating recursion scheme, we can derive a sound induction scheme by ‘dualizing’ the relevant ordinal measure.
These induction schemes derived from functions are called ‘recursion induction’ or ‘functional induction’ schemes. We can instruct Imandra to use these with the [@@induct functional foo] attribute.
For example, to prove your theorem relating List.fold_left and List.fold_right, it’s not enough to just use structural induction on lists (xs). We need to use a functional induction scheme derived from the definition of List.fold_left, so that we account for the accumulator in the inductive hypothesis.
Here’s how you can do it:
verify (fun i xs ->
List.fold_left (+) i xs =
List.fold_right (+) xs ~base:i
) [@@induct functional List.fold_left];;
Notice in Imandra’s proof, we then make use of the following induction scheme (and then proceed to another sub-induction to prove an important derived fact about List.fold_right!), as we desired:
BTW, re @Kakadu’s statement that bounded model checking is boring: Such things are no doubt in the eye of the beholder, but I’ll tell you something that I think is much more boring: wasting time trying to prove false goals!
In my experience, if you’re verifying almost anything nontrivial, in the absence of powerful tools for searching for counterexamples (e.g., bounded model checking extended with fair recursive function unrolling), the biggest time sink lies in trying to prove false goals.
Automatic verification procedures that are ‘complete for counterexamples’ and, via reflection, allow you to compute with the found counterexamples in the same environment as the executable formal models (e.g., via Imandra’s generated CX modules) help tremendously in the iterative process of refining both a model and your conjectures about it until they’re actually true and worth investing the time to fully verify :-).
@grantpassmore Do you have a paper (a draft of paper) about what is happening under the hood during inductive proof construction.
I got your explanation about practical usefulness of bounded model checking. It’s interesting idea, thanks. I’m surrounded by folks who believe that computer-human interaction during proof construction will not make into the real world industry with mainstream languages (partially because very big code bases that are very well tested are difficult to abandon), so your way of thinking was sitting in the shadows.