
Virtualized Execution Realizing Network Infrastructures Enhancing Reliability (VERNIER)
Harnessing the hidden power of homogeneous software deployments for survivable application communities
The operation of almost every large organization, including the Department of Defense (DoD),
has become dependent on commercial off-the-shelf (COTS) software,
including applications and operating systems.
The prevalence of COTS software
failures due to security breaches, misconfiguration, and bugs has a
severe impact on the performance of computing systems.
A few software packages, such as Microsoft® Windows and Office, are
heavily deployed and used daily by personnel throughout the DoD.
Many users experience the same failures, which are caused by the same
vulnerabilities, configuration errors, and bugs, and suffer the same
costly, undesirable consequences. Such homogeneity accentuates the
risk, since a single serious vulnerability could be exploited by a
determined adversary, to catastrophic effect.
The goal of the VERNIER project is to turn the size and homogeneity of the user community into an advantage by converting scattered deployments of vulnerable COTS systems into cohesive, survivable application communities (AC) that detect, diagnose, and recover from their own failures.
Principal Investigator:
Staff Members:
Papers:
Team Members:
- Jim Thornton (PARC PI)
- Glenn Durfee (PARC)
- Peter Kwan (PARC)
- John Mitchell (Stanford PI)
- Alex Aiken (Stanford)
- Dan Boneh (Stanford)
- Adam Oliner (Stanford)
- Mendel Rosenblum (Stanford)
- Liz Stinson (Stanford)
Technical Approach:
The VERNIER project combines robust, survivable host-based
execution environments with collaborative diagnosis and response
functions communicating over a hierarchical, scalable
knowledge-sharing infrastructure. The system is composed of
clusters of nodes, each running the VERNIER-augmented version of
VMware®, along with our custom software.
A COTS application running safely under the supervision of
an extended virtual machine monitor (VMM) may enter an undesirable state as
a result of malicious network input or buggy COTS code.
When that happens, the VERNIER system raises an alarm and uses
virtualization capabilities to roll execution back
to a desirable state. Simultaneously, the
diagnosis engine combines execution trace information with
community information to formulate a response.
This may involve blocking malicious network traffic, modifying
the system configuration, repairing program state, or other
compensating actions and preventive measures.
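
The detect, roll back, and respond sequence described above can be illustrated with a minimal sketch. It is not VERNIER code: the `SupervisedApp` class, the `is_undesirable` detector, the event strings, and the response names are all hypothetical, and an in-memory dictionary stands in for the virtual machine state that a real VMM would snapshot.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class SupervisedApp:
    """Hypothetical stand-in for a COTS application running under a VMM."""
    state: dict
    snapshots: list = field(default_factory=list)

    def checkpoint(self) -> None:
        # Save a deep copy of the current state as a known-good snapshot.
        self.snapshots.append(copy.deepcopy(self.state))

    def rollback(self) -> None:
        # Restore the most recent known-good snapshot.
        if not self.snapshots:
            raise RuntimeError("no snapshot to roll back to")
        self.state = copy.deepcopy(self.snapshots[-1])


def is_undesirable(state: dict) -> bool:
    # Placeholder detector; a real monitor would inspect execution traces.
    return state.get("corrupted", False)


def choose_response(trace: list, community_reports: list) -> str:
    # Toy diagnosis: combine the local trace with reports from other hosts.
    seen_network = any(e.startswith("net:") for e in trace)
    if seen_network and "net:bad-input" in community_reports:
        return "block_traffic"
    if any(e.startswith("config:") for e in trace):
        return "modify_configuration"
    return "repair_state"


def handle_input(app, event, trace, community_reports):
    """Process one event; roll back and pick a response if it causes a fault."""
    trace.append(event)
    if event == "net:bad-input":
        app.state["corrupted"] = True  # simulate the fault
    else:
        app.state["handled"] = app.state.get("handled", 0) + 1
    if is_undesirable(app.state):
        app.rollback()
        return choose_response(trace, community_reports)
    app.checkpoint()
    return None
```

In this sketch a checkpoint is taken after every event that leaves the application healthy, so a rollback loses at most the faulty event. How often a real system snapshots is a design choice the text does not specify.
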
Situation awareness is maintained
for the entire community through dissemination of event
diagnosis and response options across the communication
infrastructure, allowing other VERNIER hosts to prevent
similar events.
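
As a rough illustration of how one host's diagnosis could protect the rest of the community, the sketch below passes an advisory up a cluster hierarchy and then down to every host. The `Host` and `Cluster` classes, the advisory fields, and the signature strings are assumptions made for this example; the text does not describe the actual message format or topology.

```python
class Host:
    """Hypothetical community member that can install preventive rules."""

    def __init__(self, name):
        self.name = name
        self.blocked_signatures = set()

    def receive(self, advisory):
        # Apply the advised preventive measure locally.
        if advisory["response"] == "block_traffic":
            self.blocked_signatures.add(advisory["signature"])

    def accepts(self, signature):
        # A host accepts traffic unless its signature has been blocked.
        return signature not in self.blocked_signatures


class Cluster:
    """Hypothetical node in the hierarchical knowledge-sharing tree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.hosts = []
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def add_host(self, host):
        self.hosts.append(host)

    def report(self, advisory):
        # Escalate to the root of the hierarchy, then disseminate downward.
        root = self
        while root.parent is not None:
            root = root.parent
        root._disseminate(advisory)

    def _disseminate(self, advisory):
        # Deliver to local hosts first, then recurse into child clusters.
        for host in self.hosts:
            host.receive(advisory)
        for child in self.children:
            child._disseminate(advisory)
```

A host in one cluster that diagnoses an attack would call `report` on its cluster, after which hosts in sibling clusters reject the same traffic without having experienced the failure themselves.
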
The operation of VERNIER is best conceptualized
using a cognitive "Observe, Orient, Decide, Act" cycle (see
figure).
At the lowest level, VERNIER uses Virtual Machine Introspection (VMI)
technology on individual nodes to observe
the state and operation of COTS applications and network activity.
These observations form the raw input for Collaborative
Diagnosis, which consists of Runtime Execution Monitoring, Configuration Analysis & Monitoring, and Network Traffic Analysis.
VERNIER orients this data through Learning & Knowledge Sharing, which consists of Active Learning, Abstraction-Based Diagnosis, and
a Secure Knowledge Sharing Infrastructure.
Once the status of the Application Community members has been
successfully diagnosed, the community will decide how to respond in
its current state. Collaborative Response
technologies include Backup and Repair/Prevention, Host-Based Traffic Control,
and System Reconfiguration.
VERNIER will act upon these decisions by using VMI to
manipulate the current state of Application Community members.
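
The four phases of the cycle can be laid out as a small pipeline. This is only a sketch under stated assumptions: nodes are plain dictionaries, the symptom labels are invented, the rule that a symptom must appear on at least two nodes is arbitrary, and the mapping from observation source to response category in `RESPONSES` is a guess at how the named technologies might pair up, not something the project description states.

```python
from collections import Counter

# Assumed pairing of observation source to response category.
RESPONSES = {
    "runtime": "backup_and_repair",
    "network": "host_based_traffic_control",
    "config": "system_reconfiguration",
}


def observe(node):
    # Observe: collect (source, symptom) pairs from one node's monitors.
    return {(source, symptom)
            for source in RESPONSES
            for symptom in node.get(source, [])}


def orient(all_observations, min_nodes=2):
    # Orient: keep symptoms reported by at least `min_nodes` members.
    counts = Counter(obs for node_obs in all_observations for obs in node_obs)
    return sorted(obs for obs, n in counts.items() if n >= min_nodes)


def decide(diagnoses):
    # Decide: map each diagnosed symptom's source to a response category.
    return sorted({RESPONSES[source] for source, _ in diagnoses})


def act(nodes, responses):
    # Act: record the chosen responses on every community member.
    for node in nodes:
        node.setdefault("actions", []).extend(responses)


def ooda_cycle(nodes):
    observations = [observe(n) for n in nodes]
    diagnoses = orient(observations)
    responses = decide(diagnoses)
    act(nodes, responses)
    return responses
```

Note that `act` applies the response to every node, including ones that reported nothing, which mirrors the idea that the whole community benefits from a diagnosis made by a few members.
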
In addition to the aforementioned diagnosis and repair capabilities,
VERNIER features a Situational Awareness Gauge (SAG) that makes
use of all parts of the system to provide not only a concise view of the
community's current status but also early indications of potential
threats to security and reliability.
Material:
Sponsorship and Acknowledgments:
This project is sponsored
by the Defense Advanced Research Projects Agency (DARPA) under the Application Communities (AC) program.
We also gratefully acknowledge the support of our technology transition partner, VMware, Inc.