Is F* ready for its first F* vs ‘pick your favourite language or proof

assistant’ flame war? Can it take on heavyweights such as Haskell and Coq?

Its list of supported features is full of imaginative PL jargon such as

polymorphism, dependent types, user-defined monadic effects, refinement

types, weakest precondition calculus, predicative hierarchy of universes

with universe polymorphism, rich type-level computation, proofs by

reification and so forth, but does F* solve real world problems (TM) or does

one need a PhD in type theory to use it?

Is F* ready for prime time?

Of course the answer to this question will always depend on *who* you are

and what you want to do? I could write another post about *what* I am doing

and *how* I got to work on F* in the first place, but let it suffice to say

that my PhD is in cryptography, not type theory and involved mathematical

models and proofs mostly done on paper. See project everest

on how to encode such models in F*. This means that my programming skills

usually are a bit Rusty, no pun intended.

As to the *what*. About a week ago I came across Joachim Breitner’s inspiring

post on *finding bugs in Haskell code by proving it*. Pitched at the right

level, it is an excellent post for both the applied Haskell programmer and

someone like me curious about Haskell and Coq in theory—without a lot of

exposure to them in practice. What is nice about these two languages (and F*

too) is that general knowledge of mathematical notation and functional

programming goes some way towards understanding the gist of the code. Its

concise and engaging programming style motivated me to dust off my F* skills.

So instead of parsing all of the Haskell and Coq code and installing all the

tools to play with it, I decided to explore the example by re-verifying it in

F*.

This ended up being a lengthy post. Here is the synopsis to enable cherry picking.

The two threads of the post are sandwitched between general F* background

on datatypes and input/output. The first threat treats the actual interval

intersection code and the invariant Joachim proves about it. The second

thread proves functional correctness using a set semantic for intervals.

module IntervalIntersect #set-options "--use_two_phase_tc true"

Activates the new two phase commit feature.

# The importance of libraries

Most of the code is about lists of integers. Luckily, F* has a decent list

library. Instead of mathematical integers, I use OCaml’s native Int63 for

the best performance. As explained by Joachim, verification loves

termination. Se we use a list library of pure and total operations for both

implementation and specifications. In particular list operations are

provably terminating.

open FStar.List.Tot open FStar.Math.Lib open FStar.Int64 // change to re-verify with different machine integer type offset = t

As in its Haskell version, I represent intervals as a datatype constructor.

Note that from is inclusive and to is exclusive.

type interval = | I: from:offset -> to:offset -> interval

F* automatically generates accessor functions for the named arguments of the

dependently typed constructor. Given i:interval, I?.from i returns the

start of the interval. Datatypes with a single constructor, also allow for

the usual ‘.’ notation, e.g., i.from, which makes them a convenient

alternative to records.

# Part 1: Proving the intervals invariant

We define the goodness of interval lists in the same way as Joachim did in

Coq. A good interval list is ordered and consists of non-empty disjoint

intervals.

let rec goodLIs (is:list interval) (lb:offset) = match is with | [] -> true | I f t :: is -> lb <=^ f && f <^ t && goodLIs is t

Contrary to Joachim’s informal claims, already in his Coq development

intervals are however not necessarily non-adjacent. This is a good example

for a divergence between formal spec and human intuitive expectations. Just

because we verified something, doesn’t necessarily mean that we get what we

want. Something has to give, either we remove the informal non-adjacency

requirement, or strengthen the goodLIs predicate to require lb < f.

let good is = match is with | [] -> true | I f t :: _ -> goodLIs is f

Coq is then used to prove that the intersect of two good interval

lists is itself a good interval list. Upon investigation, this invariant

should in fact hold at all times, so I make good use of F*’s refinement

types, to add the requirement to the intervals.

let intervals = is:list interval{good is}

This is a first example for F*’s closely intertwined programming and

verification style. Often the Z3 SMT solver can prove refinements automatically by

discharging them as SMT-LIB queries.

The following function is needed in the termination argument for the

intersect function. I will get back to it in a bit.

let needs_reorder (is1 is2:intervals) : nat = match is1, is2 with | I f1 t1 :: _, I f2 t2 :: _ -> if t1 <^ t2 then 1 else 0 | _, _ -> 0

Joachim is right, the code of intersect is the kind of functional code

that is pleasant to write: A small local recursive function, a few guards to

do case analysis, done. That is when you have the power of Haskell at your

hand. Unfortunately, when clauses are not yet

supported (issue64) by F* for

verification. But they will be one day, so for now I am using nested if

expressions. Oh horror.

Another difference from the Haskell code is that I lifted local functions to

the top level where they can be used in lemmas and instead declare them as

private. Also, as I learned the hard way, while in principle supported,

local functions are still somewhat buggy (issue1361).

let max a b = int_to_t (max (v a) (v b)) //#set-options "--z3rlimit 40 --max_ifuel 7" private let rec go (is1 is2:intervals) : Pure intervals (requires True) (ensures (fun ris -> ( is1=[] \/ is2=[] ==> ris=[]) /\ ( Cons? is1 /\ Cons? is1 /\ Cons? ris ==> (hd ris).from >=^ max (hd is1).from (hd is2).from ) )) (decreases %[List.length is1 + List.length is2; needs_reorder is1 is2]) =

The termination argument uses the lexicographic ordering of two values: the

joint length of the lists and a bit indicating whether a reordering is

necessary. To prove that the function terminates, at every

recursive call either the first value decrease or the second value decreases

while the first stays equal.

match is1, is2 with | _, [] -> [] | [], _ -> [] | i1::is1, i2::is2 -> begin let f' = max i1.from i2.from in // reorder for symmetry if i1.to <^ i2.to then go (i2::is2) (i1::is1) // disjoint else if i1.from >=^ i2.to then go (i1::is1) is2 // subset else if i1.to = i2.to then I f' i1.to :: go is1 is2 // overlapping else I f' (i2.to) :: go (I (i2.to) (i1.to) :: is1) is2 end let intersect (is1 is2:intervals) = go is1 is2

We have now implemented intersect, but at the same time we have also

defined and proven the good invariant on intervals. In the Coq proof

script this is achieved by the intersect_good lemma and about 70 lines of

proof scripts. Not only do we not have to write this lemma, but F*’s

automation really shines in this example. All it needs is the much simpler

post-condition that the result intervals are either empty, or start from a

value that is the maximum of the two input interval lists. It is easy enough

to convince oneself that this invariant holds, but why is it sufficient for

Z3 to complete the proof? As Arthur C. Clark famously said: Any sufficiently

advanced technology is indistinguishable from magic.

I am cryptographer and we like to give paper proofs of results that seem

almost magical. Naturally, the proof is by induction. So after a recursive

call to go, the induction hypothesis is available, and we know that the

returned intervals are good. The SMT solver can also prove locally that

the interval prepended to this result is non-empty, what remains to be shown

is that together they are ordered. That is where knowing that the from of

the first result intervals is the max saves the day.

# Part 2: Proving functional correctness

I could stop and declare victory here, 1:0 for FStar. Arguable there are a

few blemishes, like the F* bug with local functions that I discovered, or

the low priority given to a weakes precondition calculus supporting when

clauses, but overall I was positively surprised by the speed at which I

could translate the Haskell and Coq code to F* and by how little I had to

do in terms of writing proofs.

Mindful of Clarke’s second law that the only way of discovering the limits

of the possible is to venture a little way past them into the impossible, I

wont stop when its easy. The second property Joachim proved about his

intersect function was that it corresponds to its corresponding function.

This sounds a bit tautological. What is meant here is that it corresponds to

another way of representing intervals and another way of implementing

intersection in that representation. Ideally this alternative way is *in
some sense* more pure and mathematical.

# The mathematical representation of intervals

In mathematics, intervals are of course traditionally represented as sets, and

interval intersection simply becomes set intersection. Luckily, the

FStar.Set module already provides us with set operations proven to be pure

and total.

However, unlike Coq.Sets.Ensembles the representation of set a as a function

a -> bool is abstract in F*. Consequently, we cannot define range using a

lambda expression. This is what is lost, what do we then gain by hiding the

representation of sets behind an abstraction? We can change the way sets are

represented in FStar.Set under the hood, without affecting users of the

library. Indeed F* libraries often have two implementations, one acting as a

model to prove that the pre- and post-conditions of the API do not allow to

prove False, and an actual implementation for running the actual program. In

fact different extraction targets, such as OCaml and F# would typically provide

different implementations, e.g. the OCaml implementation of many libraries

modules is based on Batteries. [1]

[1] | For verification to be meaningful, it is important that these alternative implementations are correct. In fact, while developing this post I observed that the BatSet-based OCaml implementation of FStar.Set assumed that set elements were compareable, which can be a source of unsoundness. Another F* bug that I could fix. |

The declarative definition of a set using a predicate that all set members

need to satisfy is known in mathematics as an intensional definition.

Certain sets, for example the set of even numbers can only be defined

intensionally. My intuition is that intensional definitions are more concise

and better suited for proof, conversely, extensional definitions

explicitly guarantee that sets are finite.

To get the best of both worlds, I give both an intensional and an

extensional description of integer ranges, but make the intensional

definition ghost. This means that it never has to be extracted to OCaml code.

let rangeGT (f t:offset): GTot (Set.set offset) = Set.intension (fun z -> f <=^ z && z <^ t)

Because of the immaturity of F* and the need for abstraction to

simultaneously support verification and extraction, I had to fix and extend

the FStar.Set library itself, rather than just add additional lemmas in a

separate file as in the Coq development. I use the ghostly intensional

definition to specify the logical properties of the extensional definition.

let rec range (f:offset) (t:offset): Tot (r:Set.set offset{r==rangeGT f t}) (decreases %[v t- v f]) = if f>=^t then ( Set.lemma_equal_elim (rangeGT f t) Set.empty; Set.empty ) else ( Set.lemma_equal_elim (rangeGT f t) (Set.union (Set.singleton f) (rangeGT (f+^ int_to_t 1) t )); Set.union (Set.singleton f) ( range (f+^ int_to_t 1) t ) )

The proof that the extensionally defined set is the same as the

intensionally defined set uses induction, and the set equality [f,t] = {f}

u [f+1,t].

Using range and following the Coq code, the semI and sem functions

define the semantic of bounds based interval representations in terms of

integer sets.

let semI (i : interval) : Set.set offset = range i.from i.to let rec sem (is : intervals) : Set.set offset = match is with | [] -> Set.empty | (i :: is) -> Set.union (semI i) (sem is)

This simple lemma should move to FStar.Set, but this allows me to

explain some of F*’s automation power. The body of the lemma is straightforward

and the proof is by definition. But the SMT pattern ‘reminds’ the SMT solver to

prove disjointness whenever the intersection of two sets is empty. Below I

commented out two invocations of the lemma that are triggered by the pattern.

let lemma_disjoint_intro (#a:eqtype) (s1 s2:Set.set a) : Lemma (requires Set.equal (Set.intersect s1 s2) Set.empty) (ensures Set.disjoint s1 s2) [SMTPat (Set.disjoint s1 s2)] = ()

I use the lemma to prove that an interval is disjoint from an interval list

whenever their concatenation is good. The local lemma is inspired

by Joachim’s Intersection_range_semLIs_empty lemma. It expresses the same

idea and follows by induction on an increasing lower bound.

let lemma_semI_sem_disjoint (i:interval) (is:intervals) : Lemma (requires good (i::is)) (ensures (Set.disjoint (semI i) (sem is))) = let rec lemma_semI_sem_lb_disjoint (i:interval) (is:intervals) (lb:offset) : Lemma (requires goodLIs is lb /\ i.to <=^ lb) (ensures Set.disjoint (semI i) (sem is)) = if (Cons? is) then lemma_semI_sem_lb_disjoint i (tl is) (hd is).to else (); Set.lemma_equal_elim (Set.intersect (semI i) (sem is)) Set.empty in lemma_semI_sem_lb_disjoint i is i.to // ; lemma_disjoint_intro (semI h) (sem t) //needed if SMTPat is removed

## The heart of the proof

We now get into the heart of the correctness proof for intersect. Recall

that Joachim’s algorithm (after switching) distinguishes three cases about

the head elements of two interval lists:

- the heads are disjoint
- the second head is a subset of the first
- the two heads are overlapping

We prove a lemma for each case.

The first lemma states that if the heads are disjoint, then the head with

the smaller to can be dropped from the intersect computation.

let lemma_disjoint_prefix (is1:intervals{Cons? is1}) (is2:intervals{Cons? is2}) : Lemma (requires (hd is1).from >=^ (hd is2).to ) (ensures (Set.intersect (sem is1) (sem is2) == Set.intersect (sem (is1)) (sem (tl is2)))) = let h2 = hd is2 in lemma_semI_sem_disjoint h2 is1; Set.lemma_equal_elim (Set.intersect (sem is1) (sem is2)) (Set.intersect (sem (is1)) (sem (tl is2)))

The second lemma states that if the second head is a subset of the first,

then the result of intersect is the union of that subset and the

intersect computation with both heads dropped.

let lemma_subset_prefix (is1:intervals{Cons? is1}) (is2:intervals{Cons? is2}) : Lemma (requires (hd is1).to = (hd is2).to /\ (hd is1).from <^ (hd is2).to) (ensures ( let h1::t1, h2::t2 = is1,is2 in let f' = max h1.from h2.from in Set.intersect (sem is1) (sem is2) == Set.union (semI (I f' h2.to)) (Set.intersect ( sem t1) ( sem t2)) )) = let h1::t1,h2::t2 = is1, is2 in let f' = max h1.from h2.from in lemma_semI_sem_disjoint h2 t1; lemma_semI_sem_disjoint h1 t2; Set.lemma_equal_elim (Set.intersect (sem is1) (sem is2)) (Set.union (semI (I f' h2.to)) (Set.intersect ( sem t1) ( sem t2)) )

This is the hardest of the three cases and we ramp up the SMT solvers

timeout to prove it.

#set-options "--z3rlimit 15"

No, unfortunately this is not a panacea and should only be used when

verification is unstable. In other situations, it is a proven recipe for

spending too much time idlying just to get the infamous “could not prove

post-condition” error. Time that one should instead spend on thinking about the

nature of the proof.

To shortcut the thinking a bit, I looked at the Coq proof script. While most

of it was pretty obfuscated without a detailed knowledge of Coq tactics

and the ability to see the proof state, it revealed some crucial hints.

One was the equivalent of the lemma_semI_sem_lb_disjoint lemma and its

proof. The other was the importance of set operation distributivity for the

proof.

let rec lemma_overlapping_prefix (is1:intervals{Cons? is1}) (is2:intervals{Cons? is2}) : Lemma (requires (hd is1).to >^ (hd is2).to /\ (hd is1).from <^ (hd is2).to) (ensures ( let h1::t1 = is1 in let h2::t2 = is2 in let f' = max h1.from h2.from in Set.intersect (sem is1) (sem is2) == Set.union (semI (I f' h2.to)) (Set.intersect (Set.union (range (h2.to) (h1.to)) (sem t1)) (sem t2)) )) = let h1::t1 = is1 in let h2::t2 = is2 in let f' = max h1.from h2.from in

The assert below expresses the outcome of the repeated application of set

distributive law to state the following equality:

- (h1 u t1) n (h2 u t2) = (h1 n (h2 u t2)) u (t1 n (h2 u t2))
- = (h1 n h2) u (h1 n t2) u (t1 n h2) u (t1 n t2)

let is1_n_is2 = Set.intersect (sem is1) (sem is2) in assert (Set.equal is1_n_is2 (Set.union (Set.union (Set.intersect (semI h1) (semI h2)) (Set.intersect (semI h1) (sem t2))) (Set.union (Set.intersect (semI h2) (sem t1)) (Set.intersect (sem t1) (sem t2))))); The rest of the proof simplifies the first three of the four intersects: 1. h1 n h2 = [f', h2.to] .. code:: assert (Set.equal (Set.intersect (semI h1) (semI h2)) (semI (I f' h2.to))); 2. h1 n t2 = [h2.to h1.to] n t2 .. code:: lemma_semI_sem_disjoint (I h1.from h2.to) t2; assert (Set.equal (range h1.from h1.to) (Set.union (range h1.from h2.to) (range h2.to h1.to))); assert (Set.equal (Set.intersect (range h1.from h1.to) (sem t2)) (Set.intersect (range h2.to h1.to) (sem t2))); 3. h2 n t1 = empty .. code:: lemma_semI_sem_disjoint h2 t1;

The proof now applies the distributive law in the other direction to combine

the 3rd and 4th intersect:

Set.lemma_equal_elim is1_n_is2 (Set.union (semI (I f' h2.to)) (Set.intersect (Set.union (range (h2.to) (h1.to)) (sem t1)) (sem t2)) )

The final proof is by case analysis, in each step relying on the induction

hypothesis and one of the lemmas. Again the asserts document document the

proof.

let rec lemma_intersection_spec (is1:intervals) (is2:intervals) : Lemma (ensures (Set.equal ( sem (intersect is1 is2) ) (Set.intersect (sem is1) (sem is2)))) (decreases %[List.length is1 + List.length is2; needs_reorder is1 is2]) = match is1, is2 with | _, [] -> () | [], _ -> () | i1::is1, i2::is2 -> begin let f' = max (i1.from) (i2.from) in // reorder for symmetry if i1.to <^ i2.to then lemma_intersection_spec (i2::is2) (i1::is1) // disjoint: i2.from < i2.to <= i1.from < i1.to else if i1.from >=^ i2.to then ( // lemma_disjoint_intro (semI i1) (semI i2); //needed if SMTPat is removed assert (Set.disjoint (semI i1) (semI i2)); lemma_intersection_spec (i1::is1) is2; lemma_disjoint_prefix (i1::is1) (i2::is2) ) // subset: min (i1.from) (i2.from) <= f' <= i1.to=i2.to, else if i1.to = i2.to then ( assert (Set.equal (Set.intersect (semI i1) (semI i2)) (semI (I f' i1.to))); lemma_intersection_spec is1 is2; lemma_subset_prefix (i1::is1) (i2::is2) ) // overlapping: min (i1.from) (i2.from) <= f' <= i2.to < i1.to else ( lemma_intersection_spec (I (i2.to) (i1.to) :: is1) is2; lemma_overlapping_prefix (i1::is1) (i2::is2) ) end

The following pragma resets the SMT solver to its original resource limit (roughly 5 seconds):

#reset-options

## Taking stock: intrinsic vs. extrinsic, intensional vs extensional

Clearly the second part of verifying intersect was much harder. While in the

first we used F* essentially as a programming language with stronger types and

great automation, in the second we relied on hand-written lemmas and

proofs.

The two parts are also examples of what in F* jargon are called intrinsic and

extrinsic verification styles. In the intrinsic verification style, the property

is expressed in the function definition itself, e.g., by refining operand and

return types, while in the extrinsic verification style, the property is

expressed as a lemma and proved separately.

The code/definition to proof ratio is larger in part 2, but is still

substantially smaller than in the Coq development. In addition to the proof

scripts, Joachim’s development relied on about 160 lines of lemma code about Coq

ensembles, compared to the additional 10 additional lines in FStar.Set to

support intensional set definitions, and the lemma_disjoint_intro lemma.

Arguably this is not what Jachim optimised for. True to the title of the blog

his priority was to prove the code and find bugs, rather than find the most

elegant proof.

I in turn dedicated some time to cleaning up the code and to makeing it

robust to the unpredictability of the SMT solver. I claim that well written

F* code is easier to maintain and adapt than Coq proof scripts. The

hs-to-coq authors understood the importance of proofs evolving with

programs, not the lease because programs often change. The concise and

closely integrated F* proof makes maintaining programs easier, not the

least because no translation is needed. Moreover, it acts as verified

documentation that, like good wine, keeps its value as code is handed down

through generations of programmers.

To validate my claim, I changed the representation of intervals from

mathematical integers to machine integers and this was indeed painfree.

# Extraction and testing

*Testing* too is crucial to program longevity and it is not something F*

has traditionally been good at. Like Haskell and Coq, F* has a bit of an

*if it typechecks it runs*, or even more extreme a *why run?* ethos. Just as

with a Coq proof, actually running an F* program can sometimes feel like an

unnecessary chore.

Lately however, as we ramp up the inter-operability testing of miTLS and

shipped production code developed using F* testing has become more important

A simple print and test function confirm that the code can be executed.

open FStar.All open FStar.IO open FStar.Printf

The single most successful debugging tool—and maybe the most iconic *C*

feature after ++—is printf. See this post

for a celebration of its arrival in F*.

let ppInterval (I f t) = sprintf "0x%d-0x%d" (v f) (v t)

I give two implementations for printing intervals. The first defines the function recursively, while the second uses a higher-order fold function.

let rec ppIntervals' (is:intervals): ML unit = match is with | [] -> stdout <| "." | i::is -> stdout <| ppInterval i; stdout <| " "; ppIntervals' is let toI f t = I (int_to_t f) (int_to_t t) let ppIntervals is = FStar.List.Tot.fold_left (sprintf "%s %s") "" (FStar.List.map ppInterval is) let main = stdout <| ppIntervals (intersect [toI 3 10; toI 10 15] [toI 1 4; toI 10 14])

# And the winner is!

A strange attachment bonds programmers and the language they program in. It

is as if the language becomes an extension of their arms, fingers, eyes, and

minds. Others have noted the addictive nature of programming and theorem

proving. The dopamine rush when a program compiles, runs, or verifies. The

language preference is thus a matter of taste. We are no golems, and won’t

cut off our arm lightly to replace it with a stronger version. Rather we are

like junkies who will go back to what we know to get our next fix.

The title of the post is thus more tongue in cheek. Nevertheless, there are

a few dimensions that one could look at if one wanted to do an objective

comparison, e.g., developer efficiency, the code to proof ratio, useability

and maturity of the system, and the size of the trusted computing base.

## Developer efficiency & code to proof ratio

I measure the success of a project more by the pleasure I get out of it than

by the time I put into it. So it won’t come as a surprise that I am not a

big fan of filling in time sheets. I am interested in surveillance and

anti-surveillance, and some day I might be tempted to use Joachim’s arbtt tool

suggestion for a self study on the effects of (self-)surveillance on

productivity. Until then I will keep it qualitative.

I worked on the blog on-off, for about a week, partly on a holiday to get

some much needed sun that I found beneficial for both coding and writing. I

was positively surprised, if not amazed, by the ease at which F* could be

used to verify side-effect free programs. I have used F* to verify stateful

programs in the past, but that felt substantially harder, likely because it

is simply a harder problem.

I am willing to concede that an experienced Coq expert will get the job done

faster than an F* programmer with only some experience, and, for now the

former will be easier to find.

I would argue that F* wins hands down when it comes to automation and proof

burdon. Coq enthusiasts could argue that automation tactics can bring down

the burdon, and that explicit proof scripts have their value. However, it is

my impression that F* gets a lot of leverage from being designed from the

ground up for automation—and F* tactics are on their way.

## Useability, maturity, and trusted computing base

There are many areas in which bugs in programs have become so expensive,

that programmers are willing to walk an extra mile to avoid bug infested

code. Modern programming languages and program verification both reduce

programming mistakes, and this can justify the effort of learning a new

tool.

There are similarities and synergies between the two. Modern languages,

such as strictly typed functional programming languages, ease (mathematical)

reasoning about programs and program verification. There are also differences:

The expressiveness and abstraction offered by a programming language like

Haskell make code more readable. Supporting these in a verification tool may get

in the way of automation and can result in a large trusted computing base. The

established wisdom here is to have a small core language, and to translate more

complex human readable code to the core.

Ultimately, however, programming is engineering and

trade-offs are unavoidable. F* exposes the full power of

the language to programmers/verifiers for as long as possible, and

translates into lower-level languages only for execution and proof automation.

verification engineers rarely look at the gigabytes of SMT queries produced

by F*, just like programmers rarely have to look at assembly code.

Of course there is an exception to every rule. Jonathan Protzenko developed a

powerful extraction mechanism called KreMLin from a subset of F* to C. C is

the lingua-franca for, among others, efficient cryptographic code. An important

property of that translation is that it preserves comments and readability. This

is necessary as cryptographic code has to be audited, and the experts with the

necessary domain knowledge are unlikely to switch to F* any time soon.

However, never say never. Clarke’s third and last rule goes as follows:

When a distinguished but elderly scientist states that something is possible,

they are almost certainly right. When they state that something is impossible,

they are very probably wrong.