7. Other Input Formats¶
Why3 can be used to verify programs written in languages other than WhyML. Currently, Why3 supports tiny subsets of C and Python, respectively coined micro-C and micro-Python below. These were designed for teaching purposes. They come with their own specification languages, written in special comments. These input formats are described below.
Why3 also supports the format MLCFG, which is an extension of WhyML allowing to define function bodies in an unstructured style, with goto statements.
Why3 also supports the format Coma (which stands for Continuation Machine). Coma is an Intermediate Verification Language implemented as a plugin in the Why3 platform.
Any Why3 tool (why3 prove, why3 ide, etc.)
can be passed a file with a suffix .c, .py, .coma or
.mlcfg, which triggers the corresponding input format.
7.1. micro-C¶
Micro-C is a valid subset of ANSI C. Hence, micro-C files can be passed to a C compiler.
7.1.1. Syntax of micro-C¶
Logical annotations are inserted in special comments starting
with //@ or /*@. In the following grammar, we
only use the former kind, for simplicity, but both kinds are allowed.
file ::=decl* decl ::=c_include|c_function|logic_declarationc_include ::= "#include" "<" file-name ">"
Directives #include are ignored during the translation to
Why3. They are allowed anyway, such that a C source code using
functions such as printf (see below) is accepted by a C compiler.
Function definition
c_function ::=return_typeidentifier "("params? ")"spec*blockreturn_type ::= "void" | "int" params ::=param(","param)* param ::= "int" identifier | "int" identifier "[]"
Function specification
spec ::= "//@" "requires"term";" | "//@" "ensures"term";" | "//@" "variant"term(","term)* ";"
C expression
expr ::= integer-literal | string-literal
| identifier | identifier ( "++" | "--" ) | ( "++" | "--" ) identifier
| identifier "[" expr "]"
| identifier "[" expr "]" ( "++" | "--")
| ( "++" | "--") identifier "[" expr "]"
| "-" expr | "!" expr
| expr ( "+" | "-" | "*" | "/" | "%" | "==" | "!=" | "<" | "<=" | ">" | ">=" | "&&" | "||" ) expr
| identifier "(" (expr ("," expr)*)? ")"
| "scanf" "(" '"%d"' "," "&" identifier ")"
| "(" expr ")"
C statement
stmt ::= ";"
| "return" expr ";"
| "int" identifier ";"
| "int" identifier "[" expr "]" ";"
| "break" ";"
| "if" "(" expr ")" stmt
| "if" "(" expr ")" stmt "else" stmt
| "while" "(" expr ")" loop_body
| "for" "(" expr_stmt ";" expr ";" expr_stmt ")" loop_body
| expr_stmt ";"
| block
| "//@" "label" identifier ";"
| "//@" ( "assert" | "assume" | "check" ) term ";"
block ::= "{" stmt* "}"
expr_stmt ::= "int" identifier "=" expr
| identifier assignop expr
| identifier "[" expr "]" assignop expr
| expr
assignop ::= "=" | "+=" | "-=" | "*=" | "/="
loop_body ::= loop_annot* stmt
| "{" loop_annot* stmt* "}"
loop_annot ::= "//@" "invariant" term ";"
| "//@" "variant" term ("," term)* ";"
Note that the syntax for loop bodies allows the loop annotations to be placed either before the block or right at the beginning of the block.
Logic declarations
logic_declaration ::= "//@" "function" "int" identifier "(" params ")" ";"
| "//@" "function" "int" identifier "(" params ")" "=" term ";"
| "//@" "predicate" identifier "(" params ")" ";"
| "//@" "predicate" identifier "(" params ")" "=" term ";"
| "//@" "axiom" identifier ":" term ";"
| "//@" "lemma" identifier ":" term ";"
| "//@" "goal" identifier ":" term ";"
Logic functions are limited to a return type int.
Logical term
term ::= identifier
| integer-literal
| "true"
| "false"
| "(" term ")"
| term "[" term "]"
| term "[" term "<-" term "]"
| "!" term
| "old" "(" term ")"
| "at" "(" term "," identifier ")"
| "-" term
| term ( "->" | "<->" | "||" | "&&" ) term
| term ( "==" | "!=" | "<" | "<=" | ">" | ">=" ) term
| term ( "+" | "-" | "*" | "/" | "% ) term
| "if" term "then" term "else term
| "let" identifier "=" term "in" term
| ( "forall" | "exists" ) binder ("," binder)* "." term
| identifier "(" (term ("," term)*)? ")"
binder ::= identifier
| identifier "[]"
7.1.2. Built-in functions and predicates¶
C code
scanfis limited to the syntaxscanf("%d", &x).printfis limited toprintf(string-literal, expr1, ..., exprn). The string literal should contain exactlynoccurrences of%d(not checked by Why3).rand()returns a pseudo-random integer in the range0toRAND_MAXinclusive.
Logic
int length(int a[])returns the length of arraya.int occurrence(int v, int a[])returns the number of occurrences of the valuevin arraya.
7.2. micro-Python¶
Micro-Python is a valid subset of Python 3. Hence, micro-Python files can be passed to a Python interpreter.
7.2.1. Syntax of micro-Python¶
Notation: In the grammar of micro-Python given below,
special symbols NEWLINE, INDENT,
and DEDENT mark an end of line, the beginning of a new
indentation block, and its end, respectively.
Logical annotations are inserted in special comments starting with #@.
file ::=decl* decl ::=py_import|py_constant|py_function|stmt|logic_declarationpy_import ::= "from" identifier "import" identifier ("," identifier)* NEWLINE
Directives import are ignored during the translation to
Why3. They are allowed anyway, such that a Python source code using
functions such as randint is accepted by a Python
interpreter (see below).
Constant definition
py_constant ::= "#@" "constant" NEWLINE identifier "=" expr NEWLINE
Function definition
py_function ::=logic_def? "def" identifier "("params? ")"return_type? ":" NEWLINE INDENTspec*stmt* DEDENT params ::=param(","param)* param ::= identifier (":"py_type)? return_type ::= "->"py_typepy_type ::= identifier ("["py_type(","py_type)* "]")? | "'" identifier
Function specification
logic_def ::= "#@" "function" NEWLINE spec ::= "#@" "requires"termNEWLINE | "#@" "ensures"termNEWLINE | "#@" "variant"term(","term)* NEWLINE
Python expression
expr ::= "None" | "True" | "False" | integer-literal | string-literal
| identifier
| identifier "[" expr "]"
| "-" expr | "not" expr
| expr ( "+" | "-" | "*" | "//" | "%" | "==" | "!=" | "<" | "<=" | ">" | ">=" | "and" | "or" ) expr
| identifier "(" (expr ("," expr)*)? ")"
| expr ("," expr)+
| "[" (expr ("," expr)*)? "]"
| expr "if" expr "else" expr
| "(" expr ")"
Python statement
stmt ::=simple_stmtNEWLINE | "if"expr":"suiteelse_branch| "while"expr":"loop_body| "for" identifier "in"expr":"loop_bodyelse_branch ::= /* nothing */ | "else:"suite| "elif"expr":"suiteelse_branchsuite ::=simple_stmtNEWLINE | NEWLINE INDENTstmtstmt* DEDENT simple_stmt ::=expr| "return"expr| identifier "="expr| identifier "["expr"]" "="expr| "break" | "#@" "label" identifier | "#@" ( "assert" | "assume" | "check" )termloop_body ::=simple_stmtNEWLINE | NEWLINE INDENTloop_annot*stmtstmt* DEDENT loop_annot ::= "#@" "invariant"termNEWLINE | "#@" "variant"term(","term)* NEWLINE
Logic declaration
logic_declaration ::= "#@" "function" identifier "(" params ")" return_type? ("variant" "{" term "}")? ("=" term)? NEWLINE
| "#@" "predicate" identifier "(" params ")" ("=" term)? NEWLINE
| "#@" "axiom" identifier ":" term NEWLINE
| "#@" "lemma" identifier ":" term NEWLINE
Note that logic functions and predicates cannot be given definitions.
Yet, they can be axiomatized, using toplevel assume statements.
Logical term
term ::= identifier
| integer-literal
| "None"
| "True"
| "False"
| "(" term ")"
| term "[" term "]"
| term "[" term "<-" term "]"
| "not" term
| "old" "(" term ")"
| "at" "(" term "," identifier ")"
| "-" term
| term ( "->" | "<->" | "or" | "and" | "by" | "so" ) term
| term ( "==" | "!=" | "<" | "<=" | ">" | ">=" ) term
| term ( "+" | "-" | "*" | "//" | "%" ) term
| "if" term "then" term "else term
| "let" identifier "=" term "in" term
| ( "forall" | "exists" ) param ("," param)* "." term
| identifier "(" (term ("," term)*)? ")"
7.2.2. Built-in functions and predicates¶
Python code
built-in function
powover integerslen(l)returns the length of listl.int(input())reads an integer from standard input.range(l, u)returns the list of integers fromlinclusive touexclusive. In particular,for x in range(l, u):is supported.randint(l, u)returns a pseudo-random integer in the rangeltouinclusive.
Logic
len(l)returns the length of listl.occurrence(v, l)returns the number of occurrences of the valuevin listl.
7.2.3. Limitations¶
Python lists are modeled as arrays, whose size cannot be modified.
7.3. MLCFG¶
The MLCFG language is an experimental extension of the regular WhyML language, in which the body of program functions can be coded using labelled blocks and goto statements. MLCFG can be used to encode algorithms which are presented in an unstructured fashion. It can also be used as a target for encoding unstructured programming languages in Why3, for example assembly code.
7.3.1. Syntax of the MLCFG language¶
The MLCFG syntax is an extension of the regular WhyML syntax. Every
WhyML declaration is allowed, plus an additional declaration of
program function of the following form, introduced by keywords let cfg:
let cfg f (x1: t1) ... (xn: tn): t
requires { Pre }
ensures { Post }
=
var y1: u1;
...
var yk: uk;
{ instructions; terminator }
L1 { instructions1; terminator1 }
...
Lj { instructionsj; terminatorj }
It defines a program function f, with the usual syntax for
its contract. The difference is the body, which is made of a sequence
of declarations of mutable variables with their types, an initial block,
composed of a zero or more instructions followed by a terminator, and a
sequence of other blocks, each denoted by a label (\(L_1 \ldots L_j\) above).
The instructions are semi-colon separated sequences of regular
WhyML expressions of type unit, excluding return or absurd
expressions, or code invariants:
a code invariant:
invariant I { t }where I is a name and t a predicate. It is similar to an assert expression, meaning that t must hold when execution reaches this statement. Additionally, it acts as a cut in the generation of VC, similarly to a loop invariant. See example below.
Each block is ended by one of the following terminators:
a
gotostatement:goto Lwhere L is one of the labels of the other blocks. It instructs to continue execution at the given block.a
switchstatement, of the formswitch (e) | pat1 -> terminator1 ... | patk -> terminatork end
a
returnstatement:return expran
absurdstatement: indicating that this block should be unreachable.
The extension of syntax is described by the following rules.
file ::=module* module ::= "module"identdecl* "end" decl ::= "let" "cfg"cfg_fundef| "let" "rec" "cfg"cfg_fundef("with"cfg_fundef)* | "scope"identdecl* "end" cfg_fundef ::=identbinder+ :typespec"="vardecl* "{"block"}"labelblock* vardecl ::= "var"ident* ":"type";" | "ghost" "var"ident* ":"type";" block ::= (instruction";")*terminatorlabelblock ::=ident"{"block"}" instruction ::=expr| "invariant"ident"{"term"}" terminator ::= | "return"expr| "absurd" | "goto"ident| "switch" "("expr")"switch_case* "end" switch_case ::= "|"pattern"->"terminator
7.3.2. An example¶
The following example is directly inspired from the documentation of the ANSI C Specification Language (See [BFM+18], Section 2.4.2 Loop invariants, Example 2.27). It is itself inspired from the first example of Knuth’s MIX language, for which formal proofs were first investigated by J.-C. Filliâtre in 2007 ([Fil07]), and also revisited by T.-M.-T. Nguyen in her PhD thesis in 2012 ([Ngu12], Section 9.5 Translation from a CFG to Why, page 115).
This example aims at computing the maximum value of an array of integers. Its code in C is given below.
/*@ requires n >= 0 && \valid(a,0,n);
@ ensures \forall integer j ; 0 <= j < n ==> \result >= a[j]);
@*/
int max_array(int a[], int n) {
int m, i = 0;
goto L;
do {
if (a[i] > m) { L: m = a[i]; }
/*@ invariant
@ 0 <= i < n && \forall integer j ; 0 <= j <= i ==> m >= a[j]);
@*/
i++;
}
while (i < n);
return m;
}
The code can be viewed as a control-flow graph as shown in Fig. 7.1.
![digraph G {
graph [bgcolor=lightgray; rankdir="TB"];
node [margin=0.05,shape=ellipse,style=filled,fillcolor="cyan"];
"entry" [pos="2,3!"];
"L" [pos="2,2!"];
"invariant" [pos="2,1!"];
"cond" [pos="2,0!"];
"do" [pos="0,1!"];
"exit" [pos="4,0!"];
"entry" -> "L" [label=" i <- 0"];
"L" -> "invariant" [label=" m <- a[i]"];
"invariant" -> "cond" [label=" i <- i+1"];
"cond" -> "exit" [label=" i >= n"];
"cond" -> "do" [label=" i < n"];
"do" -> "L" [label=" a[i] > m"; headport=e];
"do" -> "invariant" [label=" a[i] <= m"];
}](_images/graphviz-27fb1d6e15443ef5740f34deb31ffdd04481b0c6.png)
Fig. 7.1 Control-flow graph of the max_array function.¶
Below is a version of this code in the MLCFG language, where label
L corresponds to node L, label L1 to node invariant,
label L2 to node do.
let cfg max_array (a:array int) : (max: int, ghost ind:int)
requires { length a > 0 }
ensures { 0 <= ind < length a }
ensures { forall j. 0 <= j < length a -> a[ind] >= a[j] }
=
var i m: int;
ghost var ind: int;
{
i <- 0;
goto L
}
L {
m <- a[i];
ind <- i;
goto L1
}
L1 {
invariant i_bounds { 0 <= i < length a };
invariant ind_bounds { 0 <= ind < length a };
invariant m_and_ind { m = a[ind] };
invariant m_is_max { forall j. 0 <= j <= i -> m >= a[j] };
(* yes, j <= i, not j < i ! *)
i <- i + 1;
switch (i < length a)
| True -> goto L2
| False -> return (m, ind)
end
}
L2 {
switch (a[i] > m)
| True -> goto L
| False -> goto L1
end
}
The consecutive invariants act as a single cut in the generation of VCs.
7.3.3. Error messages¶
The translation from the MLCFG language to regular WhyML code may raise the following errors.
“cycle without invariant”: in order to perform the translation, any cycle on the control-flow graph must contain at least one
invariantclause. It corresponds to the idea that any loop must contain a loop invariant.“cycle without invariant (starting from I)”: same error as above, except that the cycle was not reachable from the start of the function body, but from the other
invariantclause named \(I\).“label L not found for goto”: there is a
gotoinstruction to a non-existent label.“unreachable code after goto”: any code occurring after a
gotostatement is unreachable and is not allowed.“unsupported: trailing code after switch”: see limitations below.
7.3.4. Current limitations¶
There is no way to prove termination.
New keywords
cfg,goto,switch, andvarcannot be used as regular identifiers anymore.Trailing code after
switchis not supported. In principle, it should be possible to have aswitchwith typeunitand to transfer the execution to the instructions after theswitchfor branches not containinggoto. This is not yet supported. A workaround is to place the trailing instructions in another block and pose an extragototo this block in all the branches that do not end with agoto.Conditional statements
if e then i1 else i2are not yet supported, but can be simulated withswitch (e) | True -> i1 | False -> i2 end.
7.3.5. Alternative translation scheme: Stackify¶
An alternative translation scheme from MLCFG to regular WhyML
can be triggered by putting the attribute [@cfg:stackify] on
a function. This method attempts to recover a more
structured program body, reconstructing loops when possible.
7.3.6. Subregion analysis¶
Additional invariants on the generated WhyML code can be inferred by
putting the attribute [@cfg:subregion_analysis] on a
function. These invariants are derived by a static analysis of the
subregions that are never modified in a loop.
7.4. S-expressions¶
This format is intended to be used only as an intermediate language
format. Why3 accepts input files in the format of a big S-expression,
mirroring the internal structure of parse trees, as they are defined
in the module Ptree of the OCaml API. The S-expression in question
is supposed to be generated by the API itself, that is the function
Ptree.sexp_of_mlw_file. It is in fact exactly the format that is
also printed using why3 pp --output with --output=sexp.
Notice that this input format is available only when Why3 is compiled
with S-expressions support provided by the ppx_sexp_conv OCaml
library.
Concerning the use of Why3 as an intermediate language, via the
Ptree API, the use of S-expressions instead of the regular WhyML
syntax is recommended since the printing and re-parsing of parse trees
is automatically generated, hence the result is guaranteed to
faithfully mirror the internal Ptree representation.
7.5. Coma¶
The Coma documentation is described in https://coma-ivl.codeberg.page/.