
1 Introduction and Overview

Information Systems Survivability
Peter G. Neumann
University of Maryland ENPM 808s
Fall Semester 1999

ENPM 808s:
Information Systems Survivability:
1. Introduction and Overview

- - - - - - - - - - - - - - - - - - -
Introduction to INFOSURV: overview, outline of the course, expectations, guidelines for the course project, concepts, compromisibility, defenses

The URL for this document provides the lecture slides and pointers to the course materials. This collection of material will grow lecture by lecture during the course, and should be checked before each lecture for up-to-date versions. The HTML version is generated by Otfried Cheong's Hyperlatex program. I have intentionally not kept the entire set of slides in one file, because the materials for each lecture will grow as the course progresses. (Caution: The numbering of the files is off by one from the numbering of the lectures, but HTML does not care.)

Students should acquire the requisite background reading materials in advance of the course, and check this file weekly during the course.

The 14 sections of lecture materials are oriented roughly toward each of the 14 class periods. However, the course may move ahead or lag behind, depending on the actual presentation of the material, and thus students should anticipate that the lectures may not stick strictly to the section boundaries outlined here. [Added note: We came close, except for the third class period being blown away by Hurricane Floyd, and being merged into the fourth class period.]

The reading material is intentionally front-loaded, leaving time free for the final project during the last half of the course.

The Nature of the Course
- - - - - - - - - - - - - - - - - - -
In many respects, this is an unconventional course. Although it has a strong engineering perspective, it also has aspects related to computer science, economics, philosophy, ethics, and human well-being. It has an unusually pervasive system sense, with emphasis on principles and experience rather than on quick-and-dirty would-be solutions that in reality are inadequate. It has a component of computer-system history, in that it recapitulates significant earlier achievements and lessons that seem to have been largely forgotten.
The subject matter of attaining high survivability inherently does not lend itself to cookbook approaches. You will not be expected to memorize and regurgitate large amounts of information. You will be expected to think, reason, generalize, and draw your own conclusions.

There will be no final exam or homework (other than reading and thinking). Instead, there will be a single final project. Because there is generally no single set of correct answers, your work is intended to challenge your creativity and imagination rather than requiring rote responses.

University of Maryland ENPM 808s
Fall Semester 1999
Thursday Evenings, 7:00-9:40p.m. ET
- - - - - - - - - - - - - - - - - - -
Peter G. Neumann
Computer Science Laboratory
SRI International
Menlo Park, CA 94025-3493
On-line course materials can be found at
Course slides may differ from on-line files.
Course Materials
- - - - - - - - - - - - - - - - - - -
Most of the course materials are or will be on-line.
1. Schedule of lectures, background:

2. The general textbook for the course:
Peter G. Neumann, Computer-Related Risks, Addison-Wesley, ISBN 0-201-55805-X.

3. More recent further material:

4. Practical Architectures for Survivable Systems and Networks is browsable:
also available in PostScript and pdf forms:
arl-one.ps and arl-one.pdf
The report contains many relevant references, including those cited here.

5. Illustrative Risks provides a frequently updated topical abstract index to the Risks Forum material:
as well as

Reuse of the Course Materials
- - - - - - - - - - - - - - - - - - -
The on-line course materials are copyright by Peter G. Neumann, but are explicitly intended to be freely available for noncommercial and academic reuse in the spirit of open-source code, e.g., copyleft.

Corrections, suggestions, and new materials appropriate for the teaching of this course would be most welcome. Unless otherwise specified, it will be assumed that your contributions may be included herein or linked to your own site (immediately, or perhaps in any subsequent manifestations of this course) with explicit credit given to the contributors and with essentially the same open reuse rights as the original body of on-line course materials.

Acknowledgements, Disclaimers
- - - - - - - - - - - - - - - - - - -
The work on which this course is based is supported by the U.S. Army Research Lab under Contract DAKF11-97-C-0020. LTC Paul Walczak (PWalczak@arl.mil), 1-301-394-3862 is the contact.

All opinions, findings, conclusions, and recommendations are mine and do not necessarily reflect the views of ARL, or SRI International, or the University of Maryland.

Topics To Be Covered
- - - - - - - - - - - - - - - - - - -
Illustrative risks that motivate the needs for survivability

Analysis of survivability requirements and their interdependencies, with survivability explicitly dependent on security, reliability, performance, etc.

Identification of inadequacies in existing commercial systems, missing components, and approaches and architectures for overcoming those inadequacies

Explicit conceptual subsystem designs for the missing components, with recommendations for their implementation

Survivable architectural structures that facilitate subsystem integration, including securely integrated cryptography

Establishment of structural system and network architectures that can achieve the desired overall survivability, including robust mobile code and systems that reduce trust necessary in end-user systems

Recommendations for future research and development, education, training, etc.

Basic Course Outline
- - - - - - - - - - - - - - - - - - - - - -
1.   2 Sep Introduction and overview
2.   9 Sep Survivability-related risks
3.  16 Sep Risks continued, and Threats 
4.  23 Sep Survivability requirements 
5.  30 Sep Deficiencies in existing systems
6.   7 Oct Overcoming these deficiencies 1
7.  14 Oct Overcoming these deficiencies 2
8.  21 Oct Architectures for survivability 1
9.  28 Oct Architectures for survivability 2
10.  4 Nov Reliability in perspective
11. 11 Nov Security in perspective
12. 18 Nov Architectures for survivability 3
--  25 Nov Thanksgiving.  No class.
13.  2 Dec Implementing for survivability
14.  9 Dec Conclusions
- - - - - - - - - - - - - - - - - - -
Classes will generally consist of about an hour and fifteen minutes (lecture-style, with some questions permitted at UMd), a break, and then informal discussion driven to a considerable extent by student questions for the remainder of the class period (with possibly some remote questions where first-level satellite video facilities make that feasible).
- - - - - - - - - - - - - - - - - - -
In lieu of a final exam, a class project will be due at the last class, on 9 December. A few suggestions for possible topics are at
http://www.csl.sri.com/neumann/umd.html. Proposals for projects are due by e-mail no later than 22 October, with responses (acceptance, or iteration where changes are required) generally available by e-mail about a week later. Earlier submissions will allow for more iteration if needed and more time for your project. Projects should be scoped not to exceed the normal load of homework, studies, and final exam in a comparable course. Group projects may be proposed, although the proposal should clearly indicate who will do which portions of the total effort.
Illustrative Project Topics
- - - - - - - - - - - - - - - - - - -
A report analyzing a relevant topic, such as robust algorithms, authentication, preventing service denial, roles of public-key cryptography, how to robustify open-source components

A paper design of a survivable subsystem or component and how it might integrate into a robust entirety.

A robustification effort based on existing open-source software, either as a paper or as a software development.

An analysis of how to eliminate or mitigate a common software flaw (e.g., buffer overflow): via programming languages, precompiler, style or system constraints, misuse+anomaly detection, etc.

A "challenge" from Computer-Related Risks

- - - - - - - - - - - - - - - - - - -
In the present context, survivability is the ability of a computer-communication system-based application to satisfy, on an ongoing basis, certain specified critical requirements (for example, security, reliability, real-time responsiveness, and correctness) in the face of adverse conditions. In some cases, survivability may require reconfigurability, interoperability, etc.

Anticipated adversities might typically include hardware faults, software flaws, attacks on systems and networks perpetrated by malicious users, accidental misuse, environmental hazards, unfortunate animal behaviors, acts of God, etc.

Survivability Challenges
- - - - - - - - - - - - - - - - - - -
System and network survivability ultimately depends on the reliability, fault tolerance, security, performance, and operational robustness of many constituent systems and networks.

Operational survivability depends on hardware, software, power, communications, environmental factors such as electromagnetic interference, extreme weather, earthquakes, etc.

Operational survivability also depends on the knowledge, training, and integrity of many people (procurement officers, system developers, administrators, users, and even innocent bystanders) and the dynamic ability of system administrators to perform emergency maintenance

- - - - - - - - - - - - - - - - - - -
Critical national infrastructures: telecommunications, power and energy, water, transportation, banking and finance, emergency services, continuity of government, etc.
See http://www.pccip.gov.

Information infrastructures: the Internet and private nets

Computer-communication systems: operating systems, database management systems, networking software, ...

Difficulties To Be Overcome
- - - - - - - - - - - - - - - - - - -
There are generally no easy answers. Cookbook approaches are insufficient.

Pervasive understanding of the fundamentals is essential, not just a scattering of so-called expertise.

Survivability is an overarching emergent property. That means it is a property of a system or network in the large, not just a property of its components that can be merely stuck together. Consequently, survivability of a whole system cannot be analyzed locally. It depends on many factors (discussed below).

Composition of subsystems is often unpredictable, with unanticipated side-effects.

Survivability is not easily retrofitted onto nonsurvivable applications.

Survivability is often inadequate in existing systems, which are terribly deficient (for example, in security and reliability).

Basic Needs
- - - - - - - - - - - - - - - - - - -
Definitive survivability requirements and subrequirements

Generic system/network architectures

Robust protocols and open-system components

Realistic operational prototypes with nontrivial survivability characteristics

Fully fledged robust commercially supported systems, nonproprietary where possible

Systematic use of good cryptography for authentication, integrity, and confidentiality

Applicable research and development, well integrated into robust systems

- - - - - - - - - - - - - - - - - - -
Noncompromisibility from outside, from within, from below, in the presence of all survivability-relevant threats

Trustworthiness, dependability, assurance

Generalized composition, encompassing all useful modes of subsystem composition

Generalized dependence, compensating for and overcoming underlying untrustworthiness

Generalized survivability, based on many generalized-dependence mechanisms

- - - - - - - - - - - - - - - - - - -
Compromise from outside: from above, or laterally at the same layer (e.g., access with no authorization, exploiting a logic flaw). Letter bombs; spoofing; penetrations

Compromise from within: using privileges of the given layer. Programming errors; internal Trojan horses; authorized misuse

Compromise from below: from a lower layer (e.g., hardware-based attacks on software, OS-based attacks on applications). Sniffers; Ken Thompson's C Compiler Trojan horse; hardware flaws or alterations

Illustrative Compromises
Layer of     Compromise            Compromise
abstraction  from outside;         from within;
             Needs exogirding      Needs endogirding
Outside                            Acts of God,
environment                        earthquakes,
                                   lightning, etc.
User         Masqueraders          Accidental mistakes
                                   Intentional misuse
Application  Penetrations of       Programming errors
             application           in application code
Middleware   Penetration of        Trojan horsing of
             Web and DBMS          Web and DBMS
             servers               servers
Networking   Penetration of        Trojan horsing of
             routers, firewalls    network software
             Denials of service
Operating    Penetrations of       Flawed OS software
system       OS by                 Trojan-horsed OS
             unauthorized          Tampering by
             users                 privileged processes
Hardware     Externally generated  Bad hardware design
             electromagnetic or    and implementation
             other interference    Hardware Trojan horses
             External power-       Unrecoverable faults
             utility glitches      Internal interference
Inside       Malicious or          Internal power supplies,
environment  accidental acts       tripped breakers,
                                   UPS/battery failures

Illustrative Compromises, continued
Layer of     Compromise            Compromise
abstraction  from within;          from below;
             Needs endogirding     Needs undergirding
Outside      Acts of God,          Chernobyl-like
environment  earthquakes,          disasters caused
             lightning, etc.       by users or operators
User         Accidental mistakes   Application system
             Intentional misuse    outage or service denial
Application  Programming errors    Application (e.g., DBMS)
             in application code   undermined within
                                   operating systems (OSs)
Middleware   Trojan horsing of     Subversion of
             Web and DBMS          middleware from OS
             servers               or network operations
Networking   Trojan horsing of     Capture of crypto
             network software      keys within the OS
                                   Exploitation of lower
                                   protocol layers
Operating    Flawed OS software    OS undermined from
system       Trojan-horsed OS      within hardware:
             Tampering by          faults exceeding fault
             privileged            tolerance; hardware
             processes             flaws or sabotage
Hardware     Bad hardware design   Internal power
             and implementation    irregularities
             Hardware Trojan horses
             Unrecoverable faults
             Internal interference
Inside       Internal power supplies,
environment  tripped breakers,
             UPS/battery failures

Trust and Trustworthiness
- - - - - - - - - - - - - - - - - - -
Trust and trustworthiness are different.

Some (sub)systems are more trustworthy with respect to certain threats (to security, reliability, etc.) because their design and implementation reduce the likelihood of compromise in those respects.

Dependability is a perception of trustworthiness associated with a specific requirement.

Assurance is a measure of the faith that can be placed in trustworthiness and dependability.

Some (sub)systems are trusted to behave properly even when they are untrustworthy. This is a very serious problem. Misplaced trust can lead to compromises.

Generalized Composition
- - - - - - - - - - - - - - - - - - -
Composition must accommodate serial connections, feedback, hierarchical layering, networking, collateral dependence, mutual suspicion, clients/servers, guards, ... without undermining the given requirements and component interoperability

Many commercial systems are not easily composed with other systems.

Much theoretical research on composition is not applicable to realistic systems. TCSEC criteria are deficient (Orange Book, Red Book).

Trustworthiness depends critically on the noncompromisibility of compositions. Compositions may be untrustworthy despite local trustworthiness of the components.

Generalized Dependence
- - - - - - - - - - - - - - - - - - -
Strict dependence (the hierarchical "uses" relation of Parnas 1974) demands strict correctness of underlying mechanisms to attain correctness of a given mechanism.

Generalized dependence enables acceptable operation despite faults, failures, errors, and misbehavior or misuse of underlying mechanisms. It compensates for, bypasses, or otherwise overcomes errant behavior. Thus, it enables the enhancement of trustworthiness.

Generalized dependence approaches the problem of trying to "make a silk purse out of a sow's ear." It can succeed quite effectively in certain respects in various cases (as considered next).

Identified Types of Generalized Dependence for Enhancing Trustworthiness (see Section 1.2.5 of Practical Architectures for Survivable Systems and Networks)
- - - - - - - - - - - - - - - - - - -
Reliability: error-correcting coding, Moore-Shannon and von Neumann theories, Byzantine algorithms, self-resynchronization, and fault-tolerance techniques generally

Security: Domains with mutual suspicion, trustworthy firewalls, guards, and wrappers, crypto and especially multikey or Byzantine crypto, real-time analysis and response, but not necessarily kernels and TCBs, which assume bottom-up trustworthiness

Survivability: Run-time checks on all requirements, underlying security and reliability as well as survivability, real-time analysis extended to survivability

Generalized Dependence:
Error-Correcting Codes

- - - - - - - - - - - - - - - - - - -
Claude Shannon long ago showed how to signal through a noisy channel with arbitrarily high reliability by using sufficient redundancy.

The field of error-correcting codes has evolved considerably since then, with codes designed to detect and/or to correct certain patterns of errors: random, asymmetric (e.g., 1-to-0 only, or 0-to-1 only), bursty, or otherwise correlated, in block, variable-length, and sequential communications, as long as the required redundancy does not cause the available channel capacity to be exceeded. The best-known and simplest class of error-correcting codes is the Hamming code, which can correct any random single-bit error in a code word of length n, in which k bits are information bits and n-k bits are redundant check bits, where n = 2^(n-k) - 1; in principle, the n-k redundant bits provide 2^(n-k) - 1 error syndromes pinpointing where the (single) error occurred, and an all-zero syndrome denotes the case of no error. (More later.)
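The Hamming scheme can be sketched concretely for the smallest interesting case, n = 7 and k = 4. This is a toy illustration, not production code; the bit layout, with check bits at positions 1, 2, and 4, is the standard textbook arrangement:

```python
# A minimal Hamming(7,4) sketch: n = 7, k = 4, so the 3 check bits
# yield 2^3 - 1 = 7 nonzero syndromes, one per possible single-bit
# error position; syndrome 0 means "no error".

def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7;
    check bits sit at positions 1, 2, and 4)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(word):
    """Correct any single-bit error and return the 4 data bits."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]   # recheck positions 1,3,5,7
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]   # recheck positions 2,3,6,7
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]   # recheck positions 4,5,6,7
    syndrome = s1 + 2 * s2 + 4 * s3  # binary position of the error
    if syndrome:                     # nonzero => flip the errant bit
        w[syndrome - 1] ^= 1
    return [w[2], w[4], w[5], w[6]]
```

Note the generalized-dependence flavor: the decoder recovers correct data from an untrustworthy channel, but only within the stated fault model (a single flipped bit per codeword); two flipped bits will be silently miscorrected, a limitation revisited below.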

Generalized Dependence: Moore and
Shannon 1956: "Crummy Relays"

- - - - - - - - - - - - - - - - - - -
Given a box of relays of unknown goodness, test them for reliability. Throw out those with failure probability near one-half. Accept those with lesser failure probability, using positive logic. Reverse the logic on those with greater failure probability, so that they can then be considered to have failure probability less than one-half. As an oversimplified example, suppose each relay has failure probability p less than one-half when it is required to provide a closed path. Then, because only one good path is sufficient to close the circuit, a parallel circuit of n logically equivalent relays with identical failure probabilities has an aggregate failure probability p^n, which can be made arbitrarily small for large enough n. Series-parallel combinations can then be used to attain arbitrarily low failure probabilities.
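Assuming independent failures, the arithmetic behind this argument is easy to sketch (the function names are illustrative, and only the fail-to-close mode is modeled here):

```python
# A sketch of the Moore-Shannon reliability arithmetic, assuming
# independent relays that each fail to close with probability p < 1/2.

def parallel_fail(p, n):
    """A parallel bank fails to close only if all n relays fail,
    so its aggregate failure probability is p^n."""
    return p ** n

def series_fail(p, n):
    """A series chain fails to close if any of its n relays fails."""
    return 1 - (1 - p) ** n

# Even quite crummy relays (p = 0.4) can be combined in parallel
# until the aggregate failure probability is as small as desired:
n = 1
while parallel_fail(0.4, n) > 1e-9:
    n += 1
```

The same reasoning dualizes: series chains protect against fail-to-open (stuck-closed) relays, which is why series-parallel combinations can drive both failure modes arbitrarily low at once.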
Generalized Dependence:
Byzantine Agreement

- - - - - - - - - - - - - - - - - - -
Whereas error-correcting codes make some assumptions about the transmission or storage medium and about the worst-case failures, and Moore-Shannon circuits make assumptions about local failure probabilities, Byzantine agreement makes no local assumptions about any components. Each component that fails may fail in any arbitrarily nasty way.

For example, a Byzantine clock (Lamport et al. 1982) with 3k+1 subclocks can tolerate arbitrary failure modes in any k clocks, including malicious modes. (This cannot be done with fewer clocks unless constraining assumptions are made.)
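One simple flavor of this idea, fault-tolerant averaging (in the style of clock-synchronization algorithms such as Welch-Lynch, not Lamport's full interactive-consistency protocol), can be sketched as follows; it assumes each correct node sees the same multiset of 3k+1 readings:

```python
# A sketch of fault-tolerant averaging over 3k+1 clock readings,
# of which at most k may be arbitrarily (even maliciously) wrong.

def fault_tolerant_average(readings, k):
    """Discard the k lowest and k highest of the readings and average
    the rest. Because at most k readings are faulty, every surviving
    value is bracketed by correct clock values, so the estimate always
    lies within the range spanned by the correct clocks."""
    assert len(readings) >= 3 * k + 1, "need at least 3k+1 readings"
    trimmed = sorted(readings)[k : len(readings) - k]
    return sum(trimmed) / len(trimmed)
```

The trimming step is what removes the need for local assumptions: a faulty clock can report any value whatsoever, but it cannot drag the estimate outside the interval defined by the correct clocks.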

Generalized Dependence for Security:
Guards, Firewalls, and Wrappers

- - - - - - - - - - - - - - - - - - -
A security guard might try to prevent the outflow of sensitive information and prevent the inflow of Trojan horses that could inflict damage on the guarded system.

A firewall might try to prevent unauthorized users from accessing an internal system from outside, and prevent certain types of access as well. Unfortunately, some types of traffic (e.g., http) can compromise internal security.

A wrapper might try to mediate access to a flawed component and to block harmful use either on the way in or the way out. In principle, these components attempt to overcome the shortcomings of the systems that they supposedly protect, in the spirit of generalized dependence.

Today's guards, firewalls, and wrappers tend to be incomplete in their requirements, flawed, and often bypassable, penetrable, and compromisible from within and below. "Trusted paths" to PCs are a challenge.

Even if they were more trustworthy, there would still be risks of compromise.

Generalized Dependence for Security: Protection Domains
- - - - - - - - - - - - - - - - - - -
In principle, it should be possible to confine the results of execution within a domain, strictly confining any would-be side-effects. This would be particularly valuable with mobile code.

In practice, computing enclosures tend to leak profusely, because of weak hardware and poor software security.

Domains are an old idea, going back to Multics rings in 1965, Mike Schroeder's 1972 thesis, and capability architectures. Unfortunately, they are seldom used to realize their potential.

Security Instances of Generalized
Dependence: Cryptography

- - - - - - - - - - - - - - - - - - -
Cryptography provides the ability to use a completely unprotected storage or communications medium for sensitive information, to establish integrity of otherwise potentially untrustworthy code and data, and to authenticate the identity of otherwise potentially unknown entities.

Cryptography is almost always vulnerable to compromise from within and below (and denial of service attacks). Compromises are usually easier to perpetrate than brute-force exhaustive cracking of keys. However, 56-bit DES keys can now be broken (e.g., Deep Crack) in a few days with a single computer system, and much more rapidly with multi-system approaches (e.g., Distributed Crack).
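The integrity and authentication roles described above can be sketched with a keyed message-authentication code. This toy example (the key and helper names are illustrative, and the key must of course be kept off the untrusted medium) uses HMAC-SHA256 from the Python standard library:

```python
import hmac
import hashlib

# A sketch of protecting data on a completely untrusted storage or
# communications medium: anyone can read or alter the blob, but
# alterations are detected on retrieval. The key is a placeholder.

KEY = b"a-secret-key-kept-off-the-untrusted-medium"

def seal(message):
    """Append an HMAC-SHA256 tag (32 bytes) before storage."""
    tag = hmac.new(KEY, message, hashlib.sha256).digest()
    return message + tag

def unseal(blob):
    """Verify the tag; reject any blob tampered with in storage."""
    message, tag = blob[:-32], blob[-32:]
    expected = hmac.new(KEY, message, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):   # constant-time compare
        raise ValueError("integrity check failed")
    return message
```

Note how this illustrates both the power and the limits just discussed: the untrusted medium is rendered usable for integrity purposes, yet the whole scheme is still compromisible from below if the key is captured within the operating system.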

Commonality among Generalized Dependence
- - - - - - - - - - - - - - - - - - -
In all these examples, there are of course pitfalls, including risks of subversion and limitations of the theory when applied to practice. For example, single-error-correcting codes can miscorrect multiple errors. Byzantine systems fail if there are more than k out of 3k+1 misbehaving systems. Firewalls can be penetrated if they are configured to permit certain traffic through. Wrappers and almost everything else can be compromised from within or from below.
Exploiting Generalized Dependence
- - - - - - - - - - - - - - - - - - -
Survivability depends (in the sense of generalized dependence) on security, reliability, fault-tolerance, performance, ...

Attaining survivability requires a combination of many techniques for ensuring necessary lower-layer properties, including relevant generalized-dependence mechanisms.

In addition, real-time monitoring with detection of survivability-relevant anomalies, automated diagnosis, the ability to correlate across multiple platforms, and stable reconfiguration are survivability-relevant instances of generalized dependence.

Mandatory Policies (introduction)
- - - - - - - - - - - - - - - - - - -
In principle, mandatory policies have the potential advantage that, once they are established, they cannot be compromised from above. They are very valuable as architectural concepts, although there are many practical problems (as we shall see).

MLS: Multilevel security (Bell and LaPadula)

MLI: Multilevel integrity (Biba)

MLA: Multilevel availability (Neumann-Proctor-Lunt; conceptual)

MLX: Multilevel survivability (new concept, primarily conceptual)

These are considered as architectural concepts later on.
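For a first intuition, the two Bell-LaPadula MLS rules can be sketched over a toy linear ordering of levels (real MLS lattices also include compartments/categories, and this sketch ignores the practical problems alluded to above):

```python
# A toy sketch of the Bell-LaPadula multilevel-security rules:
# "no read up" (simple security property) and "no write down"
# (*-property), over a purely linear set of levels.

LEVELS = {"unclassified": 0, "confidential": 1, "secret": 2, "topsecret": 3}

def may_read(subject_level, object_level):
    """Simple security property: a subject may read only objects
    at or below its own level (no read up)."""
    return LEVELS[subject_level] >= LEVELS[object_level]

def may_write(subject_level, object_level):
    """*-property: a subject may write only objects at or above
    its own level (no write down), preventing leakage downward."""
    return LEVELS[subject_level] <= LEVELS[object_level]
```

The mandatory character shows up in the fact that these checks depend only on the levels, not on user discretion: once the labels are set, no program running above the policy can relax them.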

Reading for the Next Class Period
- - - - - - - - - - - - - - - - - - -
Read Chapter 1 of Computer-Related Risks. Pick some of the topics that interest you most in Chapters 2 and 3, and read at least the chapter sections that seem most relevant to survivability.

Consider the Illustrative Risks document on-line at
http://www.csl.sri.com/neumann/illustrative.html (or .ps or .pdf) and think about how many of those risks cases are survivability related. (Note: The descriptor symbol V is used to denote cases that are particularly surViVability related, although many other cases are also survivability relevant.)
