Workshop on The Verification Grand Challenge
February 21--23, 2005
SRI International, Menlo Park, CA

Greg Morrisett's Meeting Notes

The Verification Grand Challenge, Tony Hoare

Scientific Idealism.  Need a jump start: a grand challenge.  It's not a
problem that the challenge has been around for a long time; Fermat's last
theorem stood for 300 years.  We have some evidence that the challenge
can be met.

* A mature scientific discipline: sets its own agenda (for accumulation
  of knowledge and skills), pursues its own ideals (of purity, accuracy,
  generality, certainty), poses its own fundamental questions (to
  satisfy curiosity, both practical and theoretical.)

  At one point, realized he was writing proposals quite successfully,
  but wasn't sure if given all the funding necessary, what he would do
  to advance computer science.

* In engineering, we ask: What does the product do? (spec tells us).
  How does it work (its internal interface specs tell us).  How can we
  make it better (cheaper, more reliable, ..., and delivered sooner.)

  An awful lot of CS students want to be "taxi drivers", not
  engineers.

* In science, we ask more: why does the product work? (underlying theory
  explains.)  How does the theory generalize? (so results apply in 
  different domains.)  How do we know the answers are right? (by
  calculation & by convincing experiment, reproducible and subject
  to scrutiny.)

* Verified programs:
  * have precise external specs at appropriate level of soundness, safety,
    security, serviceability, functionality, ...
  * have complete internal specs of interfaces between components
  * with correctness checked by proof, which should be fully mechanized
  * based on a sound theory of programming, which should be general and
    complete.
* Note: these are ideals; we are no more likely to fully achieve them
  than when we go to measure something or try to make a material pure.
  The scientist is never happy with the level of accuracy/purity achieved
  so far.

* A program verifier:
  * proposed in '69 as a fully automatic check of validity of an explanation
    why the program works.
  * needed as a basic experimental tool for the science of programming
  * its construction is still a grand challenge for research in CS
  * Doesn't necessarily believe that it will be used by all programmers,
    but rather, scientists.  

* The Human Genome project (1990-2004): had a clear set of deliverables,
  a planned route to application, in areas of great potential benefit,
  pursued scientific ideals, free from commercial pressures, building on
  current state of the art, looked 15 years ahead, changed the mode of 
  conduct of science.  One of the great things about this project is that
  it pursued a scientific ideal -- what's the blueprint of humans?  That
  pure scientific curiosity is what made it attractive.  Countering
  commercial pressure: all results were to be immediately published.
  Important thing: it achieved
  a goal that would've never been reached if left to individual scientists.
  Most scientists want to pursue own goals, on the spur of the moment.
  Do not want to subordinate freedom to team projects.  That's right for
  most scientists.  It should only be a minority working on a grand challenge.

* The verified software project:  modelled on human genome project, and
  shares many of its properties.  Does not need glamour or massive funds
  of genome.  

* Deliverables: 
  * comprehensive theory of programming: concurrency, objects, inheritance,...
  * a coherent toolset based on the theory:
    * development aids, test case generators, harnesses, assertion inference
      engines, program analyzers, ...
  * collection of mechanically verified (public domain) programs
    * safety-critical and embedded codes, open source libraries, middleware
      and desktop applications.
    * fully verified at high levels of soundness, safety, security,
      serviceability, functional correctness.
  * Go further than an engineer would, so they can evaluate the tradeoffs.
  * The collection of programs shows that the theory is correct, and the
    tools are up to the task.  
  
* Route to application:
  * verified programs will replace existing versions in daily use.  
    subsequent evolution will maintain correctness.
  * verification technology will be integrated into commercial toolsets
    for general use by software engineers.
  * the costs associated with program error will be significantly reduced.

* Based on the software developer and user surveys, the US national
  annual costs of an inadequate infrastructure for software testing are
  estimated to range from $22.2 to $59.5 billion.
  * half costs borne by users
  * remaining costs borne by developers

  "The economic impacts of inadequate infrastructure for software testing",
  US Dept. Commerce Planning Report 02-03, May 2002.

* scientific ideals: academic research pursues ideals: generality of theory,
  certainty of knowledge, purity of materials, accuracy of measurement,
  and now correctness of programs.  Far beyond the current needs of the
  marketplace.

  example: SAT checking.  keeping the tool separate, but easily integrated
  is what wins.

* commercial pressures: program analysis development tools will follow
  market demand, discover more faults in existing programs, appealing to 
  current educational level of programmers, preferably with pictorial
  representations.

* change in conduct of science:  commitment to a large and long term
  project, involving collaboration as well as competition, on an
  international scale.  New links between established research
  schools, conference series, journals, ...  New criteria for refereeing,
  research grant evaluation, personal promotion,...  The best scientists
  put forth experiments, tools, etc. done by other scientists.  There's 
  too much of a tendency to promote someone because they've done it 
  themselves.  That's why we need a grand challenge.  

  Existing approaches tend to keep things apart.  

  Isn't it going to happen anyway?

* Determinants of success.  Support of the scientific community, skill and
  enthusiasm of participants, strategy for accumulation of results by
  co-operation and competition.  Standards for interoperation of tools.
  Agreement on challenge codes, understanding from funding bodies.

Questions:

Bertrand Meyer:  clarify analogy with science, e.g., mathematics vs.
psychology, etc.  

Hoare: like all analogies, what I've been saying has to be interpreted.
The experimental data of the physical sciences has a sort of externally
imposed inevitability that the scientist can't get around.  That's an
advantage, because the scientist can't change it (e.g., simplify the
human genome.)  Programs, in contrast, are artificial.  The way in which
they are made owes something to psychology, but I don't think it's
important.  Programs that are used today are jolly like hard
experimental data.  If it's
in a repository, then you shouldn't be changing it.  The important thing
is that the experimental evidence should be convincing.  "You can't
say anymore it can't be done!  Here, I've done it."  

John Reynolds:  in the 60's, I wrote an algorithm for producing the types
of 1st order programs.  I got back a scathing letter:  Why are you
writing a program to extract what the programmer already knew?  Why
aren't you writing a language that lets you say these things?  Your
criticism is very applicable to today.

Hoare:  Yes, I was more idealistic when I was young.    
Many people think programs and languages are so awful they should be
thrown away.  I have a lot of sympathy for this view.  The technology
of the development & research that goes into existing programs and
languages will be convincing.  I don't rule out that programs in
the repository are rewritten and then proved.  That's very much part
of the project.  That would give very convincing evidence of this
point of view.  The thing that worries me about the more idealistic
approach is that it's very difficult to do science on a case study
basis.  Science seeks generality.  To do science on an ongoing 
engineering project, it's very difficult to carry conviction.  I have
engaged in projects in industry that used formal methods in the
design phase, and were very successful.  But in both cases, the
company decided not to use the methods on the next project.  The
case study approach just doesn't carry conviction.  

Greg Nelson:  your slide listed deliverables.  the three were
a methodology, a tool set, suite of verified programs.  I would
suggest that bullet 2 should be a verifier.  Dangerous to slip
into toolset.  Will cause us to lose focus.  My quibble.

Hoare:  Quibble to you, essential point to me.  Not as fixed in
my mind as the bullets might make you think.  It is the consensus
of the community that will decide how far the toolset will be
integrated into a single tool.  I'd like to keep the options open
that everybody can contribute.  A single tool (it is our real
dream) might give the wrong ... I mean, "I'm going to build THE
program verifier" gives the wrong impression.

Nelson: yes, I recognize many of the advantages of the tools you
mention, not because I don't want to have them, it's just that in
setting out the clear minimal deliverables...   I wouldn't want
to exclude these, just want to have the minimal deliverables.

David Evans:  one of the things you listed on your requirements was
agreement for challenge codes.  Fairly easy to decide what they are,
but hard to divide them up.  They had to agree that this lab works on
chromosome 7, that one on 11, etc.  That was difficult.  Getting a
division where someone takes the less interesting bits is hard.

Hoare:  yes, I think so.  good point.

Ed Clarke:  question & quibble:  The genome guys could go to congress
and say if you give us the money, we'll cure cancer.  It seems that
the errors that occur in programs are intermittent in nature.  The
answer is restart the computer and the problem will go away.  The
quibble about SAT solving:  breakthroughs came from EE community,
not thm. proving.  Got their ideas from the constraint logic and AI
communities.  

Hoare:  lovely example of how theory & tools & application, by coming
together, have produced something that was undreamed of at the start.

---------------------------------------------------------------------------
Ed Clarke: Model Checking Concurrent Software

Model checking has been used to verify complex hardware controllers.
Seems reasonable to use it on software.  Automatic for finite state
systems, used for some infinite systems.  Temporal logic-based approach
developed in early 80's.  Specs written in propositional temporal logic.
Verification by exhaustive search.  Advantages: no proofs!  Fast
compared to other rigorous methods such as thm. proving.  Main
limitation is space, not speed.  Diagnostic counterexamples.
State explosion is limiting.  Considerable progress made over the
years.  
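
To make "verification by exhaustive search" plus "diagnostic
counterexamples" concrete, here is a minimal explicit-state checker for
invariant properties (my sketch, not any particular tool; the toy system
is invented):

  from collections import deque

  def check_invariant(init, successors, holds):
      """BFS over reachable states; return None if `holds` is true of
      every reachable state, else the path to the first violation."""
      parent, frontier = {init: None}, deque([init])
      while frontier:
          s = frontier.popleft()
          if not holds(s):
              trace = []                    # rebuild the path to s
              while s is not None:
                  trace.append(s)
                  s = parent[s]
              return list(reversed(trace))  # diagnostic counterexample
          for t in successors(s):
              if t not in parent:
                  parent[t] = s
                  frontier.append(t)
      return None                           # invariant verified

  # toy example: a mod-8 counter that is supposed to stay below 6
  print(check_invariant(0, lambda s: [(s + 1) % 8], lambda s: s < 6))

The limitation named above (space, not speed) is visible here: `parent`
ends up holding every reachable state.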

Combating the state explosion problem: BDDs can be used to represent
state transition systems more efficiently (SMV).  The partial order
reduction can be used to reduce the number of states that must be
enumerated (SPIN).  many other techniques: abstraction,
comp. reasoning, symmetry, etc.  Can handle between 100-1000 state
variables.  Possible to check 10^120 reachable states.  (Note 10^78
atoms in universe.)  By using right abstraction, essentially
unlimited.
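
A hash-consed reduced ordered BDD kernel, boiled down to its two core
operations (a sketch in the spirit of the SMV-style symbolic
representation above, not SMV's actual code):

  class BDD:
      """Nodes are ints; 0 and 1 are the terminals.  `node` maps an id
      to (var, lo, hi); terminals get var = nvars so they sort last."""
      def __init__(self, nvars):
          self.nvars = nvars
          self.unique = {}                            # (var,lo,hi) -> id
          self.node = {0: (nvars, 0, 0), 1: (nvars, 1, 1)}

      def mk(self, var, lo, hi):
          if lo == hi:                 # redundant test: collapse it
              return lo
          key = (var, lo, hi)
          if key not in self.unique:   # share structurally equal nodes
              self.unique[key] = len(self.node)
              self.node[self.unique[key]] = key
          return self.unique[key]

      def var(self, i):                # the BDD for variable x_i
          return self.mk(i, 0, 1)

      def apply(self, op, u, v, memo=None):
          memo = {} if memo is None else memo
          if u <= 1 and v <= 1:        # both terminal: apply op directly
              return op(u, v)
          if (u, v) not in memo:
              (i, ul, uh), (j, vl, vh) = self.node[u], self.node[v]
              m = min(i, j)            # branch on the top variable
              lo = self.apply(op, ul if i == m else u, vl if j == m else v, memo)
              hi = self.apply(op, uh if i == m else u, vh if j == m else v, memo)
              memo[(u, v)] = self.mk(m, lo, hi)
          return memo[(u, v)]

  b = BDD(100)
  s = b.apply(lambda x, y: x & y, b.var(0), b.var(2))  # states with x0 & x2

The sharing in `unique` is what lets a handful of nodes stand for
astronomically many states.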

1992.  IEEE Futurebus+ cache coherence protocol, using SMV.  Detected
a number of previously undetected errors.

What makes software different?  Large/unbounded base types,
user-defined types, pointers/aliasing, recursion, method lookup,
unbounded # of threads, templates, generics, interrupts, exceptions,
secondary storage, self-modifying code, size: MS Word 1.4MLOC.

1. combine static analysis w. model checking.  Use analysis to
extract a model K from a boolean abstraction of the program.  Then check that
desired formula f is true in K.  SLAM, Bandera, Java PathFinder, MAGIC, ...
Problem with static analysis is false alarms.

2. simulate program along all paths in computation tree.  
Verisoft: source code + backtracking.  Prefix.

3. use finite-state machine to look for patterns in control-flow graph.

4. design with finite-state software models.  Finite state software models
can act as "missing link" between transition graphs and complex software
(e.g., statecharts, esterel.)

5. use bounded model checking and SAT (Kroening).  Problem: how to
compute reachable states?  Fixpoint computation is too expensive.
Restrict search to states that are reachable from initial state within
fixed number n of transitions.  Implemented by unwinding program and
using SAT solver.
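
A sketch of approach 5 with the unwinding made explicit; the "SAT
solver" here is naive enumeration standing in for a real solver fed the
unrolled formula as CNF (toy transition system invented for
illustration):

  from itertools import product

  def bmc(init, trans, bad, state_bits, n):
      """Is some bad state reachable within n steps?  I.e., is there a
      path s_0..s_k, k <= n, with init(s_0), trans(s_i, s_{i+1}) for
      all i, and bad(s_k)?  A real BMC tool asks a SAT solver this."""
      states = list(product([0, 1], repeat=state_bits))
      for k in range(n + 1):                     # unwind k transitions
          for path in product(states, repeat=k + 1):
              if (init(path[0])
                      and all(trans(a, b) for a, b in zip(path, path[1:]))
                      and bad(path[-1])):
                  return path                    # counterexample trace
      return None                                # safe up to depth n

  # 2-bit counter 00 -> 01 -> 10 -> 00: is 10 reachable within 2 steps?
  step = {(0,0): (0,1), (0,1): (1,0), (1,0): (0,0), (1,1): (1,1)}
  print(bmc(lambda s: s == (0, 0), lambda a, b: b == step[a],
            lambda s: s == (1, 0), 2, 2))

The point of the restriction is visible: there is no fixpoint
computation, just a question whose size grows with the bound n.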

* Counterexample guided abstraction refinement:  Kurshan, Yuan Lu, Ball et al.

* Predicate abstraction (Graf & Saidi, Ball et al, Chaki et al, Kroening)

CEGAR (counter-example guided abstraction refinement):
  use counter-example in abstracted model to refine the abstraction.

Predicate abstraction.  Technique produces an over-approximation of
  actual set of initial states and original transition relation.  
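
The CEGAR loop itself is short; all the hard work hides in the four
phase functions, passed in here as parameters (the skeleton and names
are mine, not SLAM's):

  def cegar(program, spec, predicates,
            abstract, model_check, is_feasible, new_predicates):
      """Skeleton of abstract / check / validate / refine.  Each phase
      is supplied by the caller; this only shows the control flow."""
      while True:
          model = abstract(program, predicates)
          cex = model_check(model, spec)
          if cex is None:
              return "verified"            # over-approximation suffices
          if is_feasible(program, cex):
              return ("bug", cex)          # counterexample is real
          # spurious counterexample: the abstraction was too coarse
          # along this path, so mine it for new refinement predicates
          predicates = predicates | new_predicates(program, cex)

Because predicate abstraction over-approximates, "verified" is sound,
and the loop only reports "bug" on a trace validated against the
original program.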

Q: is CEGAR (counter-example guided abstraction refinement) related
to the techniques some have used for constructing invariants?  
---------------------------------------------------------------------------
Panel:  Allen Emerson, Luca de Alfaro, Leonardo de Moura, Somesh Jha,
Ken McMillan?

Model checking: M |= f:  yes: correct, no: debugging info

Originally: monolithic after the fact verification

Favors separation of concerns:  develop program, verify.

Actually made it easier for industry to adopt.  In practice, they're
going to develop the program anyway.  Its degree of reliability or
correctness is a secondary concern to shipping.

Dijkstra: "Model checking is an acceptable crutch."

Trends: increased capacity, efficiency, due to Moore's law,
growing exponentially.  Enhanced expressiveness (f): LT vs BT,
discrete vs. continuous, costed, probabilistic.

Enlarged capacity: symbolic representation (BDDs), abstraction (homomorphisms,
symmetry, partial order), compositional reasoning.

Optimistic impact of Moore's law: |M| grows exponentially with respect
to time.  If a symbolic representation r encodes a model M of size
exp(|r|), then we're getting double exponential growth in the size of
what you can handle.

What if M |= f were solvable in unit time for an arbitrarily large model M
and an arbitrarily complicated formula f?

Then: rejoice, reliability of many systems enormously enhanced.

Still: as M grows, f gets incomprehensibly large.  That is, the specs
will grow.

Conclusion: monolithic model checking can only scale so far.

Solution: compositional reasoning.

Model checking domains: hardware, concurrency, protocols, embedded systems,
software, ...   What makes domain special?  Why important to verify? Easy?
hard?  What techniques applicable?

Q: if you consider BDD's and symbolic model checking, what's the
state of the art compared to 1990?  

Ken: if you look at the basic techniques, there hasn't been a lot
of progress (epsilon improvements.)  On the other hand, it's been
combined with other things, e.g., SAT.

Ed:  Used CEGAR with SAT on hardware, able to do 10,000 bits where
originally they could only do 1,000.  

Allen: so to just assess what you could do with BDD's back then,
what's routine?

Ken: what you can do with basic bdd techniques has not increased
that much.  Moore's law hasn't helped that much.  

Q: what can we do automatically that we couldn't do 15 years ago?

A: we've had an order of magnitude improvement because we're able
to narrow down and look at an abstraction that's focused.

Many things that you can do with BDD's but not SAT solvers yet.
The SAT solver has good heuristics for focusing on relevant facts.
If all the facts are relevant, then SAT doesn't help.

Most of the properties we check are shallow.  Most of these techniques
just weed out lots of the program.  I can break all of them with
15 line programs.

Ed:  but they can find deep errors in simple programs.  

Most of the progress not in finding deep things, but shallow things.

Eric Hehner:  ed started talking about advantages of model checking over
thm. proving, but another way to look at it is to see model checking
as being A thm proving technique that can be used in some bigger context.
Big question how to best integrate model checking into the general
thm. proving architecture.

Allen: monolithic BDDs vs. actual BDDs in various applications.  

Ken: the decomposed transition tricks, variable re-ordering, and a
long litany of other tricks that are important but then trail off to
diminishing returns.  All developed before 1993.

What about combining symbolic model checking and thm proving?
What's the capacity of SMV?  That tool by current standards is
very primitive.  Newer advances in predicate abstraction and
localization reduction would help a lot.  One big leftover
problem: when the model checker fails to verify or refute, it needs
to bring you back some way to refine the model.

Discussion of 32-bits and how to model, etc.

Daniel Jackson:  a lot of the success of model checking is
around temporal logic.  But in software, reachability is the
key.  

Pnueli: liveness stuff.

Alex Aiken:  all the software tools are good at proving shallow
theorems on big programs.  The refinement step seems like overkill.
It works extremely well when the refinements are simple, but
doesn't work at all when things are more complicated (my opinion).

Ken:  control properties of device drivers.  
Can we expand this niche to properties about data?  Higher-level
software?  Threaded programs?  What are the issues?

In principle predicate abstraction is complete in a suitable
sense.  If the property is a quantifier-free inductive invariant.
But practical techniques are incomplete.  Similarly, for
predicate image computation, and indexed predicate abstraction.

Much discussion about why the search for completeness is important.
Aiken points out that small (unnatural) examples that demonstrate
an incompleteness aren't that good, but Ken counters that there
are small, very natural programs that demonstrate problems.  
Hoare points out that having example-driven repository could
be good or bad (depending upon whether we overfit the techniques
to the examples.)

Challenge:  produce useful information for user on failure to
prove or refute.  Abstract counter examples are a nightmare.
Can a partial proof be used to generate conjectures under which
the property would be true?  Are there ways the model checker
could interact with the user to exploit the user's knowledge?
Is there any way a failed proof can be used to help identify useful
auxiliary structures?

Leonardo:  How can model checking contribute to the grand challenge?
* verification in the small: components
  * invariant generation (abstraction+ symbolic model checking)
  * automatic generation of test cases (bounded model checking)
  * property checking (abstraction + model checking)
  * improved testing (explicit state model checking)
* verification in the large: integration of components.
  * integration is one of the main sources of bugs in real software
  * the interface of each component can be seen as an automaton
  * model checking is used to verify whether the components are going to
    interact correctly or not.
* A tool bus:
  * the tool bus is an API for adding new tools, libraries and managing 
    evidence.
  * high level APIs for integrating model checkers with other tools.
Q: is it possible to integrate model checkers with model checkers (today)?
E.g., can SMV and SPIN interoperate?

Rupak Majumdar: UCLA
  * program verification failed.  a lot of smart people tried.  what has
    changed?
  * computers have gotten faster.  we can run these algorithms now.
    * ex: vcgen as dags
  * our ambitions have gotten lower.  initially, wanted to check whether
    a routine sorts.  Now we ask, are the indexes in bounds?
  * more money is involved in reliable operations.  
  * is low ambition bad?  no.  type systems and program analysis have
    focused on very simple properties that are nonetheless very useful.
  * why not use type-checking?  properties we are interested in now are
    temporal: locks may be held, or not held.
  * let's not give up on soundness just yet.  
  * scenario: you get 5000 flow-sensitive warnings, you find a few
    bugs in the first 400 warnings.  In practice, path sensitivity
    is crucial.
  * model checking has been a way to get sound, path-sensitive analyses.
  * big problem is that it doesn't scale.  no one will write models of
    the programs, not even annotations, can't assume a nice modeling
    language with well-defined semantics.
  * hence focus on automated abstractions.
The future:
  * important progress in past few years.  simple abstractions,
    automatically and efficiently.  SLAM and BLAST.  I see them
    as proof of concept tools.  But most problems are open.
Open issues:
  * scalability/modularity & heterogeneity:  how to model the external
    environment of my system (e.g., C standard libraries).  Assume it
    can return anything.  Ask user to give pre/post-condition.  
  * data structures, unbounded processes
  * concurrency
  * liveness
  * APIs? -- too early to define an API up front and get our model checkers
    to agree up front.  
  haven't we solved these already?  Not satisfactorily.  Not push-button
  in software context.
---------------------------------------------------------------------------
After lunch.

Software-based model checking.

Holzmann.  Focusing directly on software artifacts.  Developer produced.

People started to do this about 7-8 years ago.  Where we've gotten:
  * mechanical model extraction, subsets of C & Java
  * automation of abstraction refinement
  * automation of counter-example analysis
  * [but all with a fair amount of manual guidance]
Two main approaches:
  * top-down (minimal detail first, with predicate discovery)
    - define boolean abstraction/skeleton and try to prove property
    - if fail, go back and add in detail
  * bottom up (full detail with irrelevant detail suppressed)
    - usually you can't prove it, so you start taking away detail.
Large application of this in 1998/99 for a switch at bell labs.
Started with real code.  Used slicing techniques to remove detail.
Not a single bug found, but well, Lucent recalled it :-)

My prediction in 5 years:
  * many of these techniques will become practical on fairly large problems
  * but only on fairly narrow domains
---------------------------------------------------------------------------
Matt Dwyer:  model checking *concurrent* java programs
* Tools like JavaPathfinder and Bandera 2/bogor
* check for properties like invariants, LTL, ...
* java programs with at most ~10 threads, ~10k sloc, related to the property.
* it's easy to find a program that cannot be checked (look on sourceforge)
* claims of going significantly further cut corners
  * perform incomplete state space search
* Major obstacles:
  threads, heap allocated data, recursive thread behavior, ...
* software developers & specs:
  * can't write them due to lack of usable methods and training
  * won't write them because they don't see value
  * need an effective reward system to address both issues
* what is software:  restrict ourselves to typical programming languages
  (Java, C, C#, ...)  Consistent with view in the trenches.  But reality
  is that modern apps are built using some enormous set of libraries,
  middleware, frameworks, etc.  Dead before you get started.  Developers
  think in terms of an extended virtual machine.  Use some (weak) modeling
  notations to think about their system.  Reality is that software is
  *way* more than just code.  Need to think about modeling notations.

Getting domain-specific:
* notations that express domain primitives
* generate code from those models
  * developers love this...
* check properties of models
  * exploit naturally occurring abstractions in their heads
  * can't let them build it in code and then extract it after the fact
* carry properties forward to implementation
  * need to customize analysis to the domain

Cadena Experience: boeing & lockheed martin, large scale, MLOC,
built on thick middleware stacks.  Developers use informal models
for integration.  We introduced layered formal models for defining:
* component interfaces
* systems as assemblies of components
* component input/output dependencies
* stateful component behavior
* synch
Analyses exploit these layered models.  Incremental investment & benefit.
Abstractions:
* middleware model, event delivery mechanism, threading model, priority
* component state defined by mode variables
  * e.g., stale/fresh vs. time-stamp in implementation
* data and computations are not included:  placeholders left in model
* synch. policy specs vs. mechanism.
Outcomes:
* in the beginning: when event e occurs in mode v then eventually component
  C will receive a B event before the end of the frame.
* 2 years, 15 sites, 12 tool releases, we hear them discussing sequencing
  properties and how they should be written as a regexp.
Reaching out to Software Engineering:
  * current state is that they're using (bad) modeling languages
    that don't support analysis/verification.
  * we need to ensure that software modeling languages mature
    * have well-defined and accepted semantics
    * permit the modeling of emerging software platforms
    * to have broad tool support

Q: what's the diff with just programming at a higher-level of abstraction?
---------------------------------------------------------------------------
Ganesh Gopalakrishnan: some challenges in parallel & distributed
hardware design and programming.  Many cores on chips.  Memory
consistency an issue.  Verification of cache coherency taken
seriously.  Verification grand challenge: MPI library?  Posix threads?
Some challenges: model checking cache coherency.  Model checking
distributed memory programs used for sci. simulations.  Runtime
checking under limited observability.  Modeling MPI library in
promela.  Model checking simple MPI programs.  Unsolved: typical HPC
programming cycle: understand what is being simulated (eg. the
physics, finance, bio).  Develop model of relevant "features" of
interest.  Generate a numerical model.  Solve the numerical model as
serial code.  Later, the numerical model is what's used for
parallelization.  Check for consistency (eg conserve energy/mass).
Post-silicon validation.  Hard since can't make many observations.
Constraint solving in the context of coupled reactive processes.
---------------------------------------------------------------------------
Willem Visser (NASA Ames): Program Model Checking.

Java Pathfinder.  
* best features:
  * can be used by people that don't know what a model checker is
  * supports symbolic execution
  * test case generation
* worst features:
  * we are the most proficient users
  * cannot deal with arbitrary java -- env. generation is the biggest issue.
* currently:
  * novel combinations of explicit-state model checking, symbolic execution,
    and predicate abstraction to analyze ever larger feasible behavior
    spaces.
    * eg combining explicit state model checking and symbolic execution to 
      infer symbolic paths from concrete ones.
* internal
  * tools are better at bug-finding than correctness
  * very good at finding concurrency bugs, focus on the easy bugs
  * tools are good, but not necessarily usable outside of core group
  * many tools based on some form of abstraction & refinement, thus
    limited by available decision procedures.

* solved? -- partial-order reductions, dealing with dynamic data allocation.
* nearly solved? dealing with loops & complex data during symbolic analysis
  * combining dynamic & static analyses
* unsolved: environment generation & modular reasoning

challenges: going from bug-finding to correctness
  * checking (more) complex functional properties
* comparing tools/techniques
  * benchmarks
  * fortunately, we work on the same inputs, which makes comparison easier
  * comparing performance 
  * JPF has benefited a lot from this

jpf has been open sourced out of nasa.  some java programs we're allowed to hand out.
challenge: how to analyze real-time java.  

what we would like: more decision procedures.
benefits to others: testing, static analysis.
---------------------------------------------------------------------------
Sriram Rajamani: software model checking, state of the art

* we can check control-dominated properties of components written in
common programming languages today.

* confluence of techniques from model checking, thm proving, analysis.

model checking
* strengths: automatic, inductive invariants
* weaknesses: scale, models

thm proving
* handles unbounded domains naturally
* good solutions for equality with uninterpreted functions, linear inequalities

analysis:
* works on code, pointer aware, integrated into compilers, precision/
  efficiency tradeoffs well studied.
* but abstraction is hardwired, not targeted at properties like temporal
  properties.

What's happened: we've combined these to get rid of weaknesses.
Start with source, use analysis and thm prover to get a model
(eg. boolean program) which is fed into a checker.

Problem is the refinement procedure and these things work well for
control-dominated properties.  Having a refinement technique that
guarantees progress with pointers/procedures/etc. should be a goal.

challenges:  more scalable solutions.
* environment modeling.  SDV worked because the effort in modeling the OS
  environment gets amortized due to running the tools on thousands of
  drivers.  Single biggest obstacle to wide-spread use of software 
  model checking.
* integration with design
  * lots of momentum around model-driven development in the UML community
  * huge potential for higher-level models co-existing with code.
---------------------------------------------------------------------------
Questions:

* Why not change to "reliable software"?  Why not verified by construction?

Hoare: how little software is written.  Most is developed by changing
existing software.  Motivation is not just that you can write it,
but also maintain it as it changes.  

---------------------------------------------------------------------------
Decision procedures

Randy Bryant:  used in abstraction and verification.  Taking some
subset of, say, 1st order logic and asking if this formula is valid
or not.  Originally built inside thm provers, but starting in the
mid '90s they've been yanked out and stuck into the middle of a lot of
verification tools.

1. expressive power

Theories:
* uninterpreted fns
* difference constraints
* linear constraints
* ints or integers or reals or ?
* arrays
* bit vectors
* lists
Domains:
* reals, integers, ...
Added features:
* quantifier elimination
* proof generation
* counter example generation

SAT-based decision procedures.  Eager vs. lazy encodings
Eager: input -> sat preserving encoding -> bool formula
Lazy: ...

Lazy: can be used to combine theories (delays stuff to conjunctions,
e.g., Nelson & Oppen).  Current instances don't scale very well.
Goes around the loop a lot of times.  

Eager: must encode all info, which may be impractical, but lets the SAT
solver do all the hard work.  Modern solvers are good at extracting
relevant portions out of very large formulae.
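
A toy rendering of the lazy loop (stand-in solvers, invented example):
the boolean skeleton proposes models, the theory disposes, and each
rejection is where a real solver would learn a blocking clause, i.e.,
one trip "around the loop":

  from itertools import product

  def lazy_smt(atoms, skeleton, theory_consistent):
      """atoms: names of theory atoms; skeleton(m) -> bool is the
      propositional structure; theory_consistent(m) -> bool stands in
      for a decision procedure, e.g., a Nelson-Oppen combination."""
      for bits in product([False, True], repeat=len(atoms)):
          m = dict(zip(atoms, bits))
          if skeleton(m) and theory_consistent(m):
              return m          # a boolean model the theory accepts
          # a real lazy solver would add a blocking clause here and let
          # the SAT engine continue, rather than enumerate like this
      return None               # unsatisfiable modulo the theory

  # skeleton (x<y or y<x); the theory forbids both holding at once
  print(lazy_smt(["x<y", "y<x"],
                 lambda m: m["x<y"] or m["y<x"],
                 lambda m: not (m["x<y"] and m["y<x"])))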

Issues:
SAT Engine
  * higher performance
  * features to support decision procedures
Expressive power
  * what combinations of theories?
Performance:
  * can it handle very large formulas?
  * with complex boolean structure?
Architecture:
  * lazy vs eager approach?
  * framework for combining theories?
  * is sat solver tightly integrated?
  * is code reliable and maintainable?
---------------------------------------------------------------------------
Lintao Zhang (Microsoft):

Performance improved by 3 orders of mag. in last decade
(algorithm improvements.)

1 million variables, 10 million literals initially: 3 hours

* explanatory sat solver:  
  * when unsat: extract core that is unsat.
  * sat: minimal model extraction
* interpolation using sat solver
* checkable proof of unsatisfiability
* incremental sat solving
* online progress report

can we improve?  yes.  better heuristics daily, but incremental.
* DPLL algorithm has been around for 40 years.  improvements:
  * learning & non-chronological backtracking (1996)
  * careful implementation & tuning (2001)
  * better decision/branching heuristics
  * better learning & garbage collection techniques
  * exploring the structure of the boolean formula
  * what's next?
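
For reference, the 40-year-old core that all of the above refines, i.e.,
unit propagation plus splitting, in miniature (clauses as lists of
nonzero ints, negative = negated, DIMACS style; my sketch):

  def dpll(clauses, assignment=()):
      """Return a satisfying tuple of literals, or None if UNSAT."""
      while True:                       # unit propagation to fixpoint
          simplified, unit = [], None
          for c in clauses:
              if any(l in assignment for l in c):
                  continue                         # clause satisfied
              rest = [l for l in c if -l not in assignment]
              if not rest:
                  return None                      # conflict
              if len(rest) == 1:
                  unit = rest[0]                   # forced literal
              simplified.append(rest)
          clauses = simplified
          if unit is None:
              break
          assignment += (unit,)
      if not clauses:
          return assignment                        # all satisfied
      lit = clauses[0][0]                          # naive branching
      return (dpll(clauses, assignment + (lit,))
              or dpll(clauses, assignment + (-lit,)))

  print(dpll([[1, 2], [-1, 2], [-2, 3], [-3]]))    # UNSAT -> None

Everything listed above (learning, branching heuristics, careful data
structures) is about making these two steps fast and well-directed, not
about changing the core procedure.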

need another big change to get 3 orders mag. improvement.
* traditional: throw it to solver & pray
* but need to use app-specific knowledge to get most improvements
* branching suggestions, domain specific implications
* continuous interaction because of abstraction and refinement

quantified boolean formulas:
* blowing up formula not a good idea.

J.Moore:  not always best to have fastest procedure, but rather
one that can communicate well (supporting integration).  

Q: the same formula over 32-bit variables vs 64-bit variables?

---------------------------------------------------------------------------
Nils Klarlund (Lucent):  software checking: the performance gap

Deep blue:  raw power.

[missed lots of good stuff due to battery loss]
---------------------------------------------------------------------------
Deduction:  N. Shankar

The verification challenge ranges from discharging assertions to proving
correctness.  Unlike previous panels, we are focused on proof methods for
expressive theories.  Proofs require step-by-step reasoning and problem
decomposition rather than brute-force search/propagation.

Current automation can check proofs at a rigorous informal level of
discourse, and solve the occasional open problem.  

Prediction: over the next 15 years we will be able to automate the
bulk of the verification task through the use of static analysis,
decision procedures, model checking, proof strategies, and libraries.

Pnueli: thinks interactive thm proving biggest waste of time.

Varieties of deduction:  many dimensions
  * logic: quantifier-free 1st order, quantified, h.o. logic, type theories,
    non-standard logics, modal, temporal logics, linear, ...
  * kind of automation: auto vs. human oriented
  * degree of automation: interactive, tactic-oriented vs autonomous,
    built in decision procedures, support for libraries, ...
  * interfaces: for adding low-level automation, and for embedding
    thm provers within other applications.  Most thm provers were built with
    the assumption a human would be driving them.

Issues facing users:
  * why can't I express...?   Formalizing math is tricky.
  * why isn't the proof going through?   is the thm incorrect or the prover
    on the wrong track, or both?  lack of feedback & counterexamples?
  * this is obvious to me, why not the prover?  need lots of special-
    purpose automation.
  * why do I need to provide all the background definitions/thms?  library
    development & maintenance is a tough challenge.
  * I made small changes and my proof doesn't work anymore.  proof strategies
    and automation that are robust.

panelists: J Moore, Deepak Kapur, Jose Meseguer, Carsten Schurmann, John
Harrison, Konrad Slind.
---------------------------------------------------------------------------
J Moore: ACL2

Hypothesis: the high cost of formal methods (when real) is a 
historical anomaly due to the fact that virtually every project
formally recapitulates the past.

The use of mechanized formal methods will eventually decrease
time to market & increase reliability.

CLI Stack
C string library (verified gcc -o 68020 binaries correct)
done in Nqthm and led to ACL2.

Goals:
* ANSI standard language and logic
* execution efficiency
* scale

Some interesting software verified:
* Motorola CAP DSP:  no hazards -> microarch = isa
  * 100 pages of models
  * established correctness of isa as a simulator
  * isa executes 3 times faster than motorola's own microarch simulator
    (in C).
* FDIV on AMD K5 (Moore, Kaufmann, Lynch)
* elementary fp RTL for AMD athlon, opteron and other fpus.  amd will never
  build another unverified unit again.
* fdiv and fsqrt on ibm power 4.  ACL2 fp models are run on hundreds of
  test vectors...
* soundness of the ivy proof checker.
* correctness of BDD implementation.
* union switch & signal post-compiler checker
* rockwell instruction set equivalence thms.
* aamp7 separation kernel in microcode
* ajile systems jem1
* sun CLDC jvm model (700 pages) (e.g., class loader correct)
* properties of various java classes & methods via javac
* correctness of sun jvm verifier
* bryant's y86 verilog microarchitecture correspondence
* dijkstra's shortest path
* unicode reader

Proof techniques:
All involve lemmas about inductively defined objects and concepts.
* Induction, simplification
  * congruence-based rewriting, conditional rewriting,
  * fixed-point based typing..., bdd, sat, reflection, monads, ...

proofs based on operational semantics, this deep embedding lets
us go up and down the system heirarchy.  

Why are we succeeding?  Our model language & logic is an ANSI standard
executable programming language, supported on many platforms, good
compilers, editors, debugging, etc.

34 years supporting execution, ...

Chosen the right problems to work on.  Build useful software artifacts.
Before we verify fsqrt, run through 80 million test vectors.  

Drawbacks: acl2 must be trained.  language is not expressive.  finding
appropriate lemmas is often hard.  maintaining collections of thms is hard.

short term:  better books/collections of lemmas, fsm decision
procedures, exploit more static analysis tools, import lemmas
proved by shape analysis, etc., import lemmas proved by other
thm. provers.  

Medium: incorporate heuristics for discovering invariants,
exploit examples to guide search.  ....

Long: discover how to mechanize mathematical decomposition, abstraction,
and generalization.  Build a powerful and mechanically verified thm
prover.
---------------------------------------------------------------------------
Deepak Kapur:

Nano steps and baby challenges within the grand challenge.  Induction.
Can we characterize conjectures which can be decided automatically
using inductive methods?

Theory-based definitions of recursive functions: Is this related to
deforestation?

Automatic generation of polynomial loop invariants.  Polynomial
invariants form an ideal.  a) intersection of invariant ideals
corresponding to all paths of execution of a program, b) program
construct semantics using ideal operations.
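
A tiny worked instance (my example, not Kapur's) of the kind of
polynomial invariant in question:

  % For the loop body  x := x + 1;  y := y + 2x - 1  started from
  % x = y = 0, the polynomial y - x^2 vanishes initially and is
  % preserved by the body (x', y' denote post-state values):
  \[
    y' - x'^2 \;=\; \bigl(y + 2(x+1) - 1\bigr) - (x+1)^2 \;=\; y - x^2 \;=\; 0 ,
  \]
  % so y = x^2 holds at every iteration; the set of polynomials that
  % vanish this way is closed under the ideal operations above.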
---------------------------------------------------------------------------
Jose Meseguer:  computational logical frameworks

Technologies should be generic, not tied to a particular language.
Should be semantics-based.  
A computational logical framework with efficient executability
and a spectrum of meta-tools can serve as a semantic framework
for generic program analysis techniques.  Some requirements:

* good data representation capabilities
* concurrency & nondeterminism
* simplicity
* efficient implementability and efficient meta tools
* reflection
* initial model semantics, to support inductive reasoning

The rewriting logic/maude experience:
  * generate analysis tools for free, Java, JVM, Scheme, ML, Haskell, and bc
  * algebraic denotational semantics and SOS
---------------------------------------------------------------------------
John Harrison (intel):  The LCF approach to proof

LCF: a methodology for making a prover extensible by ordinary users,
yet reliable.

Idea due to Milner in Edinburgh LCF.  Now used in Coq, HOL, Isabelle, Nuprl.

Implement in a strongly-typed functional pl (e.g., ML).
Make thm an abstract data type with only simple primitive inference rules.
Make the implementation language available for arbitrary extensions.
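
The whole trick fits in a few lines; here is a Python caricature (ML's
type abstraction enforces at compile time what the `_SECRET` token only
simulates, and the formula syntax and rules are invented toys):

  _SECRET = object()

  class Thm:
      """Only the primitive rules below can mint a Thm, so anything a
      user extension produces is sound by construction."""
      def __init__(self, concl, token=None):
          assert token is _SECRET, "theorems come only from the rules"
          self.concl = concl

  def refl(t):                        # |- t = t
      return Thm(("=", t, t), _SECRET)

  def conj_intro(th1, th2):           # |- A,  |- B   ==>   |- A /\ B
      return Thm(("and", th1.concl, th2.concl), _SECRET)

  def conj_left(th):                  # |- A /\ B   ==>   |- A
      op, a, _ = th.concl
      assert op == "and"
      return Thm(a, _SECRET)

  # arbitrary user code may compose rules freely, but cannot forge:
  print(conj_left(conj_intro(refl("x"), refl("y"))).concl)  # ('=','x','x')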

LCF style is procedural.  A declarative style is better since it's
easier to write & understand independent of the prover, easier to modify,
less tied to details of the prover, hence more portable.

Mizar pioneered the declarative style of proof.  Recently,
several other declarative proof languages have been developed
as well as declarative shells around existing systems like
HOL and Isabelle.  
---------------------------------------------------------------------------
Konrad Slind: overview of HOL Systems

LCF (1979) -> HOL4 (2001-present)

Research themes:
* hardware verification
* operational semantics of pls
* embedding other formalisms
* theory formalization

embedding:
  * shallow: external meaning function (eg. in someone's brain)
  * deep:  internal meaning function

If you can prove the rules of Hoare logic as thms you have a deep
embedding, but if you're using VCG to generate proof obligations,
then you have a shallow embedding.
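
A small illustration of the two embeddings (Python for concreteness; the
object language and the constant-folding example are mine):

  # deep embedding: object-language syntax is data, and the meaning
  # function lives inside the system, so we can state theorems ABOUT
  # the syntax (e.g., that a transformation preserves meaning)
  def eval_exp(e, env):
      tag = e[0]
      if tag == "lit": return e[1]
      if tag == "var": return env[e[1]]
      if tag == "add": return eval_exp(e[1], env) + eval_exp(e[2], env)

  def fold(e):            # constant folding, defined on the syntax
      if e[0] == "add" and e[1][0] == "lit" and e[2][0] == "lit":
          return ("lit", e[1][1] + e[2][1])
      return e

  e = ("add", ("lit", 2), ("lit", 3))
  assert eval_exp(e, {}) == eval_exp(fold(e), {})   # a provable claim

  # shallow embedding: the "program" is just the host expression; its
  # meaning is external (the host language, or someone's brain), so
  # there is no syntax left inside the logic to prove theorems about
  shallow = 2 + 3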

The following tools are mature:
* simplification 85%
* 1st-order proof search
* datatype definitions
* recursive functions
* inductive definitions
* arithmetic decision procedures (full Presburger)

Case studies:
* thorough semantics of Java (Nipkow group, Isabelle/HOL)
* thorough semantics of UDP/IP/Sockets (Sewell & co.)
* large journal papers (70-100 pages)
Moving on from case studies:
* original effort Prosper
* generate circuit checkers from formal semantics of PSL (Gordon)
* model checkers (HOL-BDD library integration)
* check conformance of network traces with formal model
* synthesize hardware from subset of HOL

Open source (HOL-4).  Please rip us off!! :-)

Challenges:
* teach people how to prove theorems
* must continue to attack big problems
---------------------------------------------------------------------------
Tuesday 22 Feb 2005

Greg Nelson:  Lessons from ESC

Kinds of checks: null-pointers, array bounds, deadlocks, ...  The
class of errors just beyond what a type-checker catches.  What
scientific results would further verification?

Under control: automatic theorem proving, error reporting, 
               case-reduced VC generation.
Some progress: concurrency, modular verification semantics
Wide open    : storage management (without GC)
  (e.g., explicit free.)

Simplify: uses cooperating decision procedures, heuristic-driven
instantiation to handle quantifiers, ...  A lot of the performance
work for ESC was in Simplify.  Benchmarks were important for driving
things, and the ESC project had this.  Takes ESC/Java about an hour to
check its own front end (2331 methods).

Modular semantics: of all the wonderful inventions in CS, the most
successful was the subroutine.  (AGREED!)  Similarly, Mesa's explicit
interfaces, info. hiding, etc.  (AGREED!)  A procedure spec consists
of a pre-condition, post-condition, and modifies clause.  The issue is
that the modifies clause leaks out the implementation.  The trick is
introducing abstraction, and modifies clauses in terms of that?
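
A sketch of the leak (invented example, rendered as Python comments
rather than ESC's actual annotation language):

  class Counter:
      #@ modifies self._hits                  # leaks the private field
      #@ ensures  self.count() == old(self.count()) + 1
      def inc(self):
          self._hits = getattr(self, "_hits", 0) + 1

      def count(self):
          return getattr(self, "_hits", 0)

  # with an abstract variable the spec could instead read
  #@   modifies self.state
  # plus a separate declaration that _hits represents `state`.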

Hoare: the argument for continuing to study storage mgmt with explicit free's
can be made in terms of other resources.  Same is true for races: even
if synch./monitors are used for avoiding data races, the possibility of
unexpected non-determinism in the sequence of calls leads things to
a coarser level of granularity.  The point is that when the problems
are driven away at the low level, they reappear at a higher level of
abstraction.

Nelson: example with locks.  Not the right one for storage mgmt.  

Jim: the lesson I learned from ESC was that you can start using it without
having to write specifications.  The language itself says you should not
dereference null, not have data races, ...  As soon as you start working
up the stack, you need specs on the lower-level routines to start
dealing with higher-level routines.

Dawson: how do you catch bugs in ESC implementation?  is there a safety net?

Morrisett: will the thm proving continue to be good enough when we scale
to deeper properties?
---------------------------------------------------------------------------
Bertrand Meyer:  a world without cartesian product?

Kind of math useful for OO programs.  Functions, elementary set theory,
and a bit of curry :-)  

Start with total functions: A -> B.  

Relations: Pow(A) -> Pow(B) such that r({}) = {} and stable under union.
A <-> B.  

A function from A to B is a total function from X to B for some subset X of A.
A -|-> B.
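
In conventional notation (my transcription, so details are best checked
against the slides):

  \[
    A \leftrightarrow B \;=\;
      \{\, r : \mathcal{P}(A) \to \mathcal{P}(B) \mid
           r(\varnothing) = \varnothing ,\;
           r(X \cup Y) = r(X) \cup r(Y) \,\}
  \]
  \[
    A \rightharpoonup B \;=\;
      \{\, f : X \to B \mid X \subseteq A ,\; f \text{ total on } X \,\}
  \]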

Defined a generalized composition ; for curried functions.
Defined another composition "." operator.

Objects (addresses), values, and states.  (Seems to be rediscovering monads.)
Args -> Objects -> States -> Z
When Z is States, we'll call it a command, otherwise query.

... (see slides)

"Principle of general relativity"  everything is relative to the
current object (self, this, etc.)

An assertion on classes is:  Class -> Pow(States)

Don't use higher-level techniques (e.g., quantifiers)  but models.

Eiffel model library: set, sequence, function, relation, etc.
Specs in terms of this.

Start from the mathematics!  As in denotational semantics, 
programming as math but don't be afraid of the big bad state.
---------------------------------------------------------------------------
Static/Dynamic Analysis Tools:  Alex Aiken, David Evans, Dawson Engler,
Henny Sipma, Scott McPeak, Klaus Havelund, Yuan Yu

Alex Aiken:

2015:  what's in the newspaper?  Rare buffer overrun wrecks ship.  

Low-level properties becoming rare.  
Buffer overruns are gone: 1. tools + hand annotation for legacy code,
2. recompiled with special tools, 3. new code written in safe languages.

Tools much improved (bug finding tools):
* static analysis used on 100MLOC routinely
* many new properties can be handled (e.g., locking)
* educated user population
* dividing line between static/dynamic analysis much clearer
* less diversity (?)

Possible problem:  everything becomes more dynamic
  * more late binding, more late decisions, etc.
  * makes static analysis problematic.

Higher level bugs still problematic
* lack of specifications
  * still hard to write
  * even harder to imagine
  * programmers don't know the spec
* specifications come late
  * usually after the code
  * often after a disaster
Dynamic analysis will be important here
* analysis of runs in the field with users
* can notice things without having a specification
* e.g., Watson, TalkBack (feedback systems)

What is a verification grand challenge?
* something industry won't do
  * but this area has industrial interest now...
  * caution: human genome project also had industrial interest (and a
    private company won.)
* something where verification is needed
  * not just bug finding
  * cost of failure must be extremely high

Suggestions:
* Context:  $2M to verify X
  X = A control software, critical public infrastructure, medical application

* Tax on large government projects
  * take 2% of future govt funding for research on same artifact.
---------------------------------------------------------------------------
Henny Sipma:

* Software analysis: from point solutions to an integrated plug-n-play
  infrastructure.

Termination Analysis:  C programs -> CIL -> implementation in Mathematica

Enabled:
  Science: easy access to "physical reality"
  * evaluation of faithfulness of formalizations
  * evaluation of effectiveness of methods
  * extraction of programming patterns
  * classification of physical reality
  * specialization of disciplines and methods
  Engineering: optimization and fine tuning for special cases

Provided:
* formal methods became relevant
* new programming paradigms are followed closely by formalizations
* programming practice steered in new directions by the analysis methods

How?
* inspirational workshop
* implementation & integration became respectable research
* conference submissions require publication of APIs
* regulatory agencies
* security?
* wishful thinking? yes

Likely?  self-reinforcing  

Need more women
---------------------------------------------------------------------------
Klaus Havelund, Kestrel, NASA/Ames:  Dynamic Program Analysis

Monitor behavior of program and check against spec.  Need to
instrument program to get events out that you can monitor.

Main issue: what is the spec?  
Program instrumentation that is efficient.
Complexity of the spec language influences how fast monitoring is,
important if you want to do things online.
---------------------------------------------------------------------------
David Evans, UVA:  static/dynamic analysis: past, present and future

Lots of companies with bug finding tools.
Good checking of generic requirements.
Good for checking dynamic assertions inserted by programmers.
Bad at knowing what properties to check:
* automatic inference techniques
* grand challenge repository
No good techniques for combining static and dynamic analyses:  lot of
promise in next 10 years.

Predictions:

Software vendor will lose a major lawsuit because of a bug.
Someone will come up with a cool name and sell a lot of books on verification.
No more buffer overruns in major commercial software.
* 20th oakland conf., Brian Snow thought we would be talking about buffer
  overruns in 2019.
Standard compilers prevent most concurrency problems.
Programmers will still be dumb and make mistakes and resist change.
Good CS degree programs will
* incorporate verification into their first course
* include a course on identifying & checking program properties
---------------------------------------------------------------------------
Scott McPeak (Ccured):

Really interested in scalable and fully/mostly automatic.  But, no one
is using Ccured.  Why?  Main reason -- takes on the order of hours or
days to run on large programs.  Split into two pieces: modular
analysis.  Thought we could publish other ideas.  Making it scalable
in this fashion never thought we could publish.  Do we care more about
advancing careers?

Second point: most of us would be satisfied if we achieved total
soundness and total completeness.  Should also be thinking about
the way these analyses can tell the programmer other things than
bug or no bug.  In particular, how can we take the whole program
analysis and draw conclusions about the program's architecture.  For
example, Ccured likes to divide pointers into different classes.
We might be able to look for connected components...

In the context of how you save inter-module analysis information,
should consider what other good impacts we can have.

---------------------------------------------------------------------------
Yuan Yu (MSR):

Programming in 2015:  C#- and Java-like languages will be the programming
languages of choice.  Most software products will be rewritten in such
a language.  Program execution will be managed by virtual machines
(CLR/JVM).  Concurrency will be widely practiced.

Ramifications:  many shallow bugs of interest today will be endangered
species (buffer overrun, double locking).  Concurrency bugs will still
be a main source of hard bugs.  The virtual machine will become an intriguing
place to add tools.

RaceTrack: add instrumentation to production CLR to detect data races.
MLOC.  Many interesting concurrency bugs.  Operations on threads and
monitors.  Modify the VM's thread implementation.  Load and store operations:
modify the JIT to emit RaceTrack calls.  (Takes care of late binding.)
Lockset and threadset -- modify the GC to include extra memory in object
allocations.  Weaknesses: overheads.  Coverage.
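
The lockset core is tiny; here is an Eraser-style sketch (the threadset
refinement and all of the CLR instrumentation above are omitted, and the
API names are mine):

  candidate = {}      # shared location -> set of locks, refined over time

  def on_access(thread, loc, locks_held, report=print):
      """Call at every instrumented load/store with the locks the
      accessing thread currently holds."""
      held = set(locks_held)
      if loc not in candidate:
          candidate[loc] = held          # first access: assume all locks
      else:
          candidate[loc] &= held         # keep only locks held every time
      if not candidate[loc]:             # no common lock protects loc
          report(f"possible race on {loc} (thread {thread})")

  on_access("t1", "x", {"L"})            # candidate set {L}
  on_access("t2", "x", {"L"})            # still {L}: consistent locking
  on_access("t3", "x", set())            # empty intersection -> warning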
---------------------------------------------------------------------------
Dawson Engler

Goal: find as many serious bugs as possible.  
Approach 1: lightweight unsound static checkers.
  Finds 1000s of bugs in linux & such.  Commercialized.
  static tool can use but must not depend on annotation to get good results.
  low fps on few mloc systems != low fp tool.
  people are aggressively ignorant.
Approach 2: implementation-level model checking
  model checking can work well with large, C implementation (TCP, ext3, ...)
  * even though it's lots of work, it does not dominate static analysis

Open Question: how to get the bugs that matter?
Myth: all bugs matter and all will be fixed.
  Find 10 bugs, all get fixed.  Find 10,000...
Reality:  all sites have many open bugs.  Myth lives because state-of-art
so bad at bug finding.
What users want is the 5-10 bugs that really matter.
  90% don't matter.  Finding and fixing 900 out of 1000 takes too many
  resources to hunt down and fix.
How to find worst?  no one has a good answer.

Open Question: do static tools really help?  

Dangers: 1. opportunity cost -- you've wasted a lot of time when you could
have done something more productive.  2. turning deterministic canary bugs
into non-deterministic ones.

Myth: more analysis is better.  

Technical innovation dependent on social process that generates and
accepts it.  Many of the really good/influential tools come from outside.
Obvious example: Prefix.  Much of the interesting bug work is published
outside, e.g., compare OSDI04 to PLDI04.

Pop-psychology diagnosis:
* bias for complex, formalistic over simple but effective.
* sound checking of narrow properties over results.
* prudish over interesting.  Rejection of "failure oblivious computing"

What's the contribution?  Not deep technical stuff.

Soundness often a distraction.  What soundness wants to really be
is total correctness.  
---------------------------------------------------------------------------
Connie Heitmeyer

SCR Toolset -- key: produce a validated, verified requirements spec.
Spec editor, consistency checker, model checker, dependency graph browser,
  simulator, ...
hooked to theorem provers (PVS, property checker SALSA, invariant generator)
added test case generator, code generator (optimized, provably correct code)
working with Annie Liu.

Since 1999 at least 3 sites of Lockheed Martin have been using the
toolkit to develop critical avionics software.  Largely embedded
software -- very complex control, functions are simple but with
many parts, very simple data types.

  "We currently are suporting close to 1500 models...and have found
   the SCRTOOL suite to be an invaluable aid in finding requirements
   defects, as well as validating functional behavior of our software
   requirements."

Many useful principles & guidelines have been formulated (e.g.,
stepwise refinement, separation of concerns, ...)
Lots of spec & modeling languages, methods, methodologies.

Problem: no consensus on which spec/modeling language, methods,
methodologies, ..  Few examples of high quality specifications.

The overwhelming majority of safety-critical software is built in an
ad hoc, unsystematic fashion.

Most developers focus on code.  Documents/specs are universally
incomplete, inconsistent, incorrect, and poorly organized.

Very few thm provers or model checkers used.  View is that verification
tools are NOT cost effective, and little evidence to prove them wrong.
Simulator considered valuable.

Two bright spots:  increasing use of Java, growing popularity of model-
driven design.  However, most of the models are pretty low level and bad.
Lots of simulink models (stateflow.)  Requirements specs don't describe
what the software should do, they describe what the contractor should do.

Need to package the theory in a way that the practitioners can use it.
Methods, models, languages/templates, industrial-strength tools.

Long-term goal: semi-automatic transformation of a spec into a
provably correct, efficient program.  

Developing a well-founded software engineering discipline is of
comparable importance to scientific advances.

* spec languages for particular domains
* powerful robust integrated tools
* templates for specifying system level properties
* proof systems with understandable proofs
* compilers that produce verifiable code
---------------------------------------------------------------------------
Rustan Leino: programming methodology

Spec#: whole language environment (dynamic checks, verifier, etc.)

* programming language
* semantics
* decision procedures
* inference techniques
* programming and specification paradigms

The need for programming methodology:
* How can the verification technology scale?
* How do we verify a library in isolation?
* Once found, how do we fix a bug?

"Can programming theory ye fully explain why big programs work?" -Hoare

Claim:  no.

Areas in need of further progress:
* object invariants
* frame conditions
* module invariants and initialization
* concurrency
* delegates
* dynamic code (perhaps for scripting)
* discipline for using object references

Making progress:
* have checker, will verify
* study programs: discover/formalize/invent verifiable programming paradigms.
* opportunities to have larger impact: build in common paradigms into
  languages, provide tool support.

What are the largest modern programs that have been verified beyond
today's type checking?
---------------------------------------------------------------------------
Doug Smith

Industry directions:  
* OMG, Meta-Object Facility, model-driven architecture, AST models
* ...

Classically:  code generation by refinement

High assurance generative programming:
JavaCard (Kestrel)
AutoBayes, AutoFilter (NASA Ames)
Planware (Kestrel)
Ptolemy (Berkeley)
Cryptol (Galois)

potential contributions:
* solvers, constructive thm provers
* libraries of design theories and refinement generators
* rational reconstruction of benchmarks
* participation in standardization activities
---------------------------------------------------------------------------
Peter Homeier (NSA)

Working with HOL for 10 years.  Interested in deep embeddings of
programming and assertion languages.  Aimed at building verification
systems for programs, nestled within a secure theorem prover.  Deep
embeddings are very difficult.  Been trying to make them easier.

Really want to amplify and help the programmer, with as much freedom
and expressibility as possible without constricting them.  Strawman.
Programmer comes into his office.  Has a spec that he begins with.
Makes creative decisions.  Certain overall structure.  More 
understanding, refinement to design.  Inevitably, not choosing other
possibilities.  Closing off some avenues, but deepening the current
path.  Dijkstra: programs & proofs developed hand in hand.
Want the programmer to see proof obligations as well as design
obligations.  In large proof development, you often want to program
a new tool to aid in your proving.  More of a synergy between the
creation and verification goals.  
---------------------------------------------------------------------------
Pete Manolios (Ga Tech)

Challenge: a verified microprocessor
Verification is the dominant cost in complex designs and will be the barrier.
(Randy Bryant: "Actually, Allen and I wrote that.")

advantage: the spec is known;
the ISA is the most successful abstraction in CS
success: boolean equivalence checking, floating point verification.
promising: term-level reasoning with UCLID (SAT-based)
unsolved: reasoning about system-level properties of bit-level designs.
Need to reason at various levels of abstraction.

WEB Refinement-based approach:  ISA & implementation have same
visible behaviors modulo stuttering.
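
A minimal executable illustration of "same visible behaviors modulo
stuttering" (not Manolios's WEB machinery itself): a hypothetical
implementation takes two internal phases per ISA step, a refinement
map projects its state down, and stuttering steps are those that
leave the projection unchanged.

  import java.util.*;

  public class StutteringRefinement {
    // ISA-level spec: one visible increment per step.
    static int isaStep(int pc) { return pc + 1; }

    // Hypothetical implementation: two internal phases per increment.
    record Impl(int pc, boolean fetchPhase) {
      Impl step() {
        return fetchPhase ? new Impl(pc, false) : new Impl(pc + 1, true);
      }
    }

    // Refinement map: project an implementation state to an ISA state.
    static int project(Impl m) { return m.pc(); }

    public static void main(String[] args) {
      Impl m = new Impl(0, true);
      List<Integer> visible = new ArrayList<>(List.of(project(m)));
      for (int i = 0; i < 10; i++) {
        m = m.step();
        int v = project(m);
        if (v != visible.get(visible.size() - 1)) visible.add(v);  // drop stutters
      }
      // The stutter-free trace must match an ISA run from the same state.
      int spec = visible.get(0);
      for (int v : visible) {
        if (v != spec) throw new AssertionError("refinement violated");
        spec = isaStep(spec);
      }
      System.out.println("matches ISA modulo stuttering: " + visible);
    }
  }

The real method proves this once and for all, using a well-founded
rank to bound stuttering, rather than by running traces.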

improvements depend as much on methodology as on tools.
Challenge: end-to-end arguments
* prove thms about code running on the microarchitecture
* need thms about the MA we can use, e.g., liveness

education:
Goal: undergraduates who verify non-trivial systems
public relations/sociological issue:
* how many CS professionals know basic results of logic/recursion?

Mechanical verification for the masses.
* mature system that is self-teaching & gives constant feedback

grand challenge remarks:
* objective metric for success
  * well-defined collection of problems & specs
  * beyond the current methods (10K man years?)
  * require significant advances
  * what % of the problems did you solve?
* sociological aspects important
  * significance should be self-evident to colleagues & society
  * e-voting, health care, verify JVM libraries, GC, big ints, ...

---------------------------------------------------------------------------
Moore: post-facto verification or design/synthesis/etc.?

Hoare: the emphasis on existing code is because it is a better vehicle
for scientific research & evaluation.  Raising the principles and
techniques is the way of getting the most from them.  It's just that
we need the scientific evidence.  The initial set of challenge codes
can be selected from those that have specs, requirements, theories,
proofs, etc.  Fair amount of code out there that could be retrieved.
Also believe that for some of these codes we will only have the
specification (e.g., safety critical).  Some of the challenges will
be to reconstruct the code from the specs.  [JGM: don't buy this.]

Rajeev: to me the grand challenge is a tall order.  Can we not set
a shorter-term goal that will help us get there?  Proposal:  build
a verified piece of software.  To keep it small, let's make it an
OS for an embedded device.  Will give us the benchmarks, integration,
etc.  

Examples: TinyOS.  Datapoint: Helen Gill was proposing a reference
implementation of a real-time OS.  

Clarke: some of the real-time OS's are very modest in size.

Alex: if the challenge is to write our own code, there's a risk
that the world will move on.  The amount of effort to go into
this...  To the degree this is a publicity stunt, it's useful to
pick something that will be of current interest when we get there.
Maybe too much of a constraint.  

Implementation of HTTPS.  A verified version of that.  

Ed Clarke:  Many types of verifiers.  We should have some choice.

Reynolds: how much easier is it to build a system specific to 
a particular verification task vs. a more general system.

Jose: one issue is the tension between post-facto vs. new system
being developed.  Attraction of new systems is that modular methods
can be brought to bear early on.  Picking 1 or 2 carefully chosen
problems could put into practice the whole thing.  

Rupak: another reason to look at 1 program.  Can share much of
the infrastructure among groups.  Also, Linux kernel.  

Alex: sympathetic to the Linux kernel.  But maybe a challenge problem
each year.  Lowers the risk in something that might not pan out.

Daniel:  bug finding or proof.  Interesting contrast.  As a scientific
classification it's the wrong one.  Perfect assurance makes no
sense.  What degree of assurance do you get?  e.g., a probabilistic
technique?  We need evidence that the assurance stuff works.
No point in investing a huge amount of effort in the non-weakest link.

Cordell: spec to code, code to spec.  Results where someone has
studied the difficulty of mapping spec to code or code to spec?
Connie: we've been looking at that -- extracting higher level spec from code.

Bertrand: real programmers won't write assertions.  1. it's not
relevant.  2. it's not true.  3. we could overcome this.  

Nelson: post hoc verification vs design methods:  we shouldn't worry
about this because the design methods that produce code hand-in-hand
with proof require that you know what the proof looks like in order
to use the requirement...  We've learned to construct a post-condition
of a loop by taking the invariant, conjoining the negation of the guard, etc.
I think we are aiming towards this development process.  But we can't
get there without knowing how to prove a multi-module program w.r.t.
its specification.  
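
For reference, the construction Nelson is describing is just the
standard while rule read off the loop; in LaTeX:

  \frac{\{I \land B\}\; S\; \{I\}}
       {\{I\}\ \mathtt{while}\ B\ \mathtt{do}\ S\ \{I \land \lnot B\}}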
---------------------------------------------------------------------------
23 Feb 2005

Amir Pnueli  (NYU & Weizmann):  Looking Ahead  

Talmudic prophecy: since the destruction of the temple, prophecy has
been taken from the prophets and given to the fools.

A plea for reinstituting a name:  A Verifying Compiler

In favor:
* the term verification, whether formal or informal, is unambiguously
interpreted as being applied post facto.

* a verifying compiler can be used as part of a development methodology,
and can cover a host of correct-by-construction methods.

* discussions before/during the workshop seemed to rule out development
approaches.

* the view that a program should only be run if syntactically correct,
type-safe, and semantically correct is very appealing.

If you look at Hoare logic, although invented for semantics, running
it backwards gives a program development methodology.  The rules are
development rules.
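
The assignment axiom is the clearest case: read right to left, it says
that to establish Q after x := e it suffices to establish Q with e
substituted for x beforehand, which is exactly a development step.  In
LaTeX:

  \{Q[e/x]\}\; x := e\; \{Q\}
  \qquad \text{e.g.} \qquad
  wp(x := x + 1,\; x > 0) = (x + 1 > 0)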

We heard of: faster machines, brilliant new ideas and algorithms,
*lowering the expected level of verified properties*.

to which I wish to add:  *lowering the expected degree of automation*.
In fact it is enough to reduce the measure |properties| x |automation|.

It has always been *software verification*.
  hardware verification is a diversion and a distraction.
  gave us many valuable techniques (BDDs, SAT, model checking)
  but also gave the impression that we could be push-button
These are toy problems compared to the main goal of software verification.

Ed Clarke:  actually, MC developed for software.  Didn't know what a flip-
flop was.  
Amir:  recall in '77 you said deductive methods wouldn't scale. :-)

In FM'99 talk:  Deduction is Forever.  I am now ready to tone it down to:
User Guidance is Unavoidable.  

Law of conservation of user guidance:  push it out of one place, pops up
in another.

* one may replace deduction by abstraction, but then the user has to
provide the abstraction mapping.

* you may try predicate abstraction, but the user has to provide the
predicate base.

* there are ways to automatically construct a base & abstraction refinement,
but we just heard that in the interesting cases, there is no
replacement for manual refinement.

Still, there is an improvement: the amount of effort the user has to
invest keeps shrinking.

  Let the user do what humans do best, and leave the computer to handle
  the combinatorics and tedious searches.

Good example:  predicate abstraction where user provides predicate base
and model checker finds best boolean combination which forms an inductive
assertion.  Other cases: polynomial invariants, abstract interpretation, etc.
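
A toy, explicit-state illustration of that division of labor (the
system, predicates, and code below are invented for the example): the
user supplies the predicate base, and the tool computes the reachable
predicate valuations, whose disjunction is the desired boolean
combination.

  import java.util.*;
  import java.util.function.IntPredicate;

  public class PredicateAbstraction {
    public static void main(String[] args) {
      // User-supplied predicate base.
      IntPredicate[] base = { x -> x % 2 == 0, x -> x < 4 };

      // Toy concrete system: x starts at 0, transition x := (x+2) mod 8.
      Set<Integer> reached = new HashSet<>();
      Deque<Integer> work = new ArrayDeque<>(List.of(0));
      while (!work.isEmpty()) {
        int x = work.pop();
        if (reached.add(x)) work.push((x + 2) % 8);
      }

      // Abstract each reachable state to its predicate valuation; the
      // disjunction of these valuations is the inductive assertion.
      Set<List<Boolean>> cubes = new HashSet<>();
      for (int x : reached) {
        List<Boolean> v = new ArrayList<>();
        for (IntPredicate p : base) v.add(p.test(x));
        cubes.add(v);
      }
      System.out.println("inductive assertion as abstract cubes: " + cubes);
    }
  }

Here the tool discovers that "x is even" (the first predicate) is the
invariant; a real implementation queries a decision procedure instead
of enumerating concrete states.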

My view:  it will be a proof checker in which the user will guide
the proof steps.  Each step can invoke different tools and methods
such as a model checker, various program analyses, and deductive proof
steps.  Some of the user guidance can be prepared a priori (off-line)
in the form of assertions, annotations, or proof strategies.  Progress
in our verification ability will be expressed in being able to take
bigger proof steps that will be automatically performed.  Evolution
instead of revolution (though some steps could lead to quantum leaps).

So what should we do?  Tying the effort together, we need:
* a common set of benchmark programs to be verified
* agreement on interchange formats for specifications, models, and
  partial results.  For example, for temporal verification of reactive
  systems, we recommend the use of fair transition systems (see the
  sketch after this list).  If you want to deal with procedures, this
  model has to be extended.
* where possible, automatic translation between models, specs, & logics.
* better interfaces for inter-operability of tools.  Examples:
  * deep embedding of LTL and CTL* in PVS
  * in the context of computing existential abstraction, shipping a query
    to PVS and receiving back a complete, partially resolved goal tree.
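
As a strawman for such an interchange format, a fair transition system
is little more than variables, an initial condition, and guarded
transitions tagged with their fairness requirements; a hypothetical
Java rendering (all names invented):

  import java.util.*;
  import java.util.function.Predicate;
  import java.util.function.UnaryOperator;

  // Minimal fair-transition-system shell: a state is a variable map,
  // and each transition carries a justice (weak fairness) or
  // compassion (strong fairness) tag.
  public class FTS {
    enum Fairness { NONE, JUSTICE, COMPASSION }

    record Transition(String name,
                      Predicate<Map<String, Integer>> enabled,
                      UnaryOperator<Map<String, Integer>> apply,
                      Fairness fairness) {}

    final Predicate<Map<String, Integer>> initial;
    final List<Transition> transitions;

    FTS(Predicate<Map<String, Integer>> initial,
        List<Transition> transitions) {
      this.initial = initial;
      this.transitions = transitions;
    }
  }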

Why don't we try it on Linux?
* like the human genome, a single focusing project
* it is open-source code and morally deserves our support
* divorced from commercial interests.  no one will give us $2M to
  work on it (maybe MS will give us $2M to not work on it :-))
* opportunity to test Tony's hypothesis
* a major effort that will take significant time and have multiple
  benefits: the development of a *specification*.

personally: want a reactive system (not a real-time or safety-critical
one); Linux has some aspects of reactivity.  The hardest thing will be
to agree on the specification.

Clarke: hardware wasn't a diversion.  The next class of systems that we're
able to verify will be very low-level, embedded systems that don't have
complicated data structures, more control oriented.  Certainly protocols.

Q: are we really keeping up with hardware?  They're so complicated.
Clarke: maybe the interfaces.  David Dill has had a lot of success.

Nelson: Alan Kay just said hardware is frozen software.  Agree that
software verif. is critical.  Suggest we try technically & scientifically
to develop methods that apply to both.  Hardware was more successful
in large part because they come from EE community which has more
engineering discipline than the software community.  

Clarke: boolean functions and state machines are the principal components;
less use of pointers.

Jay Moore: one important thing is assumptions on which proofs are based.
Can't imagine going too much lower in abstraction hierarchy.  

Bertrand: choosing open source is a moral issue.  It's a different business
model.  If a company spit out a verifying compiler some would say that's
not acceptable.  
---------------------------------------------------------------------------
Big Questions:

What can we do to help each other?  

I. a survey paper discussing the pros/cons of various proof development
environments for a number of baseline software projects.  Examples include
formalizing a (useful) subset of a programming language's semantics 
(Java, x86, C), important libraries (e.g., AES, string processing, TCP?).
Systems include PVS, ACL2, Isabelle, NuPRL, Coq, Alloy, ...
---------------------------------------------------------------------------