1 Introduction and Overview

Information System
Survivability
Peter G. Neumann
University of Maryland ENPM 808s
Fall Semester 1999

ENPM 808s:
Information Systems Survivability:
1. Introduction and Overview
- - - - - - - - - - - - - - - - - - -
Introduction to INFOSURV: overview, outline of the course, expectations, guidelines for the course project, concepts, compromisibility, defenses

The URL for this document provides the lecture slides and pointers to the course materials. This collection of material will grow lecture by lecture during the course, and should be checked before each lecture for up-to-date versions. The html version is generated by Otfried Cheong's Hyperlatex program. I have intentionally not kept the entire set of slides in one file because the materials for each lecture will grow as the course progresses. (Caution: The numbering of the files is off-by-one from the numbering of the lectures, but html does not care.)

Students should acquire the requisite background reading materials in advance of the course, and check this file weekly during the course.

The 14 sections of lecture materials are oriented roughly toward each of the 14 class periods. However, the course may move ahead or lag behind, depending on the actual presentation of the material, and thus students should anticipate that the lectures may not stick strictly to the section boundaries outlined here. [Added note: We came close, except for the third class period being blown away by Hurricane Floyd, and being merged into the fourth class period.]

The reading material is intentionally front-loaded, leaving time free for the final project during the last half of the course.

The Nature of the Course
- - - - - - - - - - - - - - - - - - -
In many respects, this is an unconventional course. Although it has a strong engineering perspective, it also has aspects related to computer science, economics, philosophy, ethics, and human well-being. It has an unusually pervasive system sense, with emphasis on principles and experience rather than on quick-and-dirty would-be solutions that in reality are inadequate. It has a component of computer-system history, in that it recapitulates significant earlier achievements and lessons that seem to have been largely forgotten.

The subject matter of attaining high survivability inherently does not lend itself to cookbook approaches. You will not be expected to memorize and regurgitate large amounts of information. You will be expected to think, reason, generalize, and draw your own conclusions.
There will be no final exam or homework (other than reading and thinking). Instead, there will be a single final project. In that there is generally no single set of correct answers, your work is intended to challenge your creativity and imagination rather than requiring rote responses.

University of Maryland ENPM 808s
Fall Semester 1999
Thursday Evenings, 7:00-9:40 p.m. ET
- - - - - - - - - - - - - - - - - - -
Peter G. Neumann
Computer Science Laboratory
SRI International
Menlo Park, CA 94025-3493
Neumann@CSL.sri.com
1-650-859-2375
http://www.csl.sri.com/neumann/
On-line course materials can be found at
http://www.csl.sri.com/neumann/umd.html
Course slides may differ from on-line files.

Course Materials
- - - - - - - - - - - - - - - - - - -
Most of the course materials are or will be on-line.
1. Schedule of lectures, background:
http://www.csl.sri.com/neumann/umd808s.html
2. The general textbook for the course:
Peter G. Neumann, Computer-Related Risks, Addison-Wesley, ISBN 0-201-55805-X.
3. More recent further material:
http://www.csl.sri.com/neumann/risks-new.html
4. Practical Architectures for Survivable Systems and Networks is browsable:
http://www.csl.sri.com/neumann/arl-one.html
also available in PostScript and pdf forms:
arl-one.ps and arl-one.pdf
The report contains many relevant references, including those cited here.
5. Illustrative Risks provides a frequently updated topical abstract index to the Risks Forum material:
http://www.csl.sri.com/neumann/illustrative.html
as well as
http://www.csl.sri.com/neumann/illustrative.ps
http://www.csl.sri.com/neumann/illustrative.pdf

Reuse of the Course Materials
- - - - - - - - - - - - - - - - - - -
The on-line course materials are copyright by Peter G. Neumann, but are explicitly intended to be freely available for noncommercial and academic reuse in the spirit of open-source code, e.g., copyleft.
Corrections, suggestions, and new materials appropriate for the teaching of this course would be most welcome. Unless otherwise specified, it will be assumed that your contributions may be included herein or linked to your own site (immediately, or perhaps in any subsequent manifestations of this course) with explicit credit given to the contributors and with essentially the same open reuse rights as the original body of on-line course materials.

Acknowledgements, Disclaimers
- - - - - - - - - - - - - - - - - - -
The work on which this course is based is supported by the U.S. Army Research Lab under Contract DAKF11-97-C-0020. LTC Paul Walczak (PWalczak@arl.mil), 1-301-394-3862 is the contact.
All opinions, findings, conclusions, and recommendations are mine and do not necessarily reflect the views of ARL, or SRI International, or the University of Maryland.

Topics To Be Covered
- - - - - - - - - - - - - - - - - - -
Illustrative risks that motivate the needs for survivability
Analysis of survivability requirements and their interdependencies, with survivability explicitly dependent on security, reliability, performance, etc.
Identification of inadequacies in existing commercial systems, missing components, and approaches and architectures for overcoming those inadequacies
Explicit conceptual subsystem designs for the missing components, with recommendations for their implementation
Survivable architectural structures that facilitate subsystem integration, including securely integrated cryptography
Establishment of structural system and network architectures that can achieve the desired overall survivability, including robust mobile code and systems that reduce trust necessary in end-user systems
Recommendations for future research and development, education, training, etc.

Basic Course Outline
- - - - - - - - - - - - - - - - - - - - - -
1.   2 Sep Introduction and overview
2.   9 Sep Survivability-related risks
3.  16 Sep Risks continued, and Threats 
4.  23 Sep Survivability requirements 
5.  30 Sep Deficiencies in existing systems
6.   7 Oct Overcoming these deficiencies 1
7.  14 Oct Overcoming these deficiencies 2
8.  21 Oct Architectures for survivability 1
9.  28 Oct Architectures for survivability 2
10.  4 Nov Reliability in perspective
11. 11 Nov Security in perspective
12. 18 Nov Architectures for survivability 3
--  25 Nov Thanksgiving.  No class.
13.  2 Dec Implementing for survivability
14.  9 Dec Conclusions

Classes
- - - - - - - - - - - - - - - - - - -
Classes will generally consist of about an hour and fifteen minutes (lecture-style, with some questions permitted at UMd), a break, and then informal discussion driven to a considerable extent by student questions for the remainder of the class period (with possibly some remote questions where first-level satellited video facilities make that feasible).

Projects
- - - - - - - - - - - - - - - - - - -
In lieu of a final exam, a class project will be due at the last class, on 9 December. A few suggestions for possible topics are at
http://www.csl.sri.com/neumann/umd.html. Proposals for projects are due by e-mail no later than 22 October, with responses (acceptance or a request for further iteration) generally available by e-mail about a week later. Earlier submissions will allow for more iteration if needed and more time for your project. Projects should be scoped not to exceed the normal load of homework, studies, and final exam in a comparable course. Group projects may be proposed, although they should clearly reflect who is going to do which portions of the total effort.

Illustrative Project Topics
- - - - - - - - - - - - - - - - - - -
A report analyzing a relevant topic, such as robust algorithms, authentication, preventing service denial, roles of public-key cryptography, how to robustify open-source components
A paper design of a survivable subsystem or component and how it might integrate into a robust entirety.
A robustification effort based on existing open-source software, either as a paper or as a software development.
An analysis of how to eliminate or mitigate a common software flaw (e.g., buffer overflow): via programming languages, precompiler, style or system constraints, misuse+anomaly detection, etc.
A "challenge" from Computer-Related Risks

Survivability
- - - - - - - - - - - - - - - - - - -
In the present context, survivability is the ability of a computer-communication system-based application to satisfy on an ongoing basis certain specified critical requirements (for example, security, reliability, real-time responsiveness, and correctness), in the face of adverse conditions. In some cases, survivability may require reconfigurability, interoperability, etc.
Anticipated adversities might typically include hardware faults, software flaws, attacks on systems and networks perpetrated by malicious users, accidental misuse, environmental hazards, unfortunate animal behaviors, acts of God, etc.

Survivability Challenges
- - - - - - - - - - - - - - - - - - -
System and network survivability ultimately depends on the reliability, fault tolerance, security, performance, and operational robustness of many constituent systems and networks.
Operational survivability depends on hardware, software, power, communications, environmental factors such as electromagnetic interference, extreme weather, earthquakes, etc.
Operational survivability also depends on the knowledge, training, and integrity of many people (procurement officers, system developers, administrators, users, and even innocent bystanders) and the dynamic ability of system administrators to perform emergency maintenance

Infrastructures
- - - - - - - - - - - - - - - - - - -
Critical national infrastructures: telecommunications, power and energy, water, transportation, banking and finance, emergency services, continuity of government, etc.
See http://www.pccip.gov.
Information infrastructures: the Internet and private nets
Computer-communication systems: operating systems, database management systems, networking software, ...

Difficulties To Be Overcome
- - - - - - - - - - - - - - - - - - -
There are generally no easy answers. Cookbook approaches are insufficient.
Pervasive understanding of the fundamentals is essential, not just a scattering of so-called expertise.
Survivability is an overarching emergent property. That means it is a property of a system or network in the large, not just a property of its components that can be merely stuck together. Consequently, survivability of a whole system cannot be analyzed locally. It depends on many factors (discussed below).
Composition of subsystems is often unpredictable, with unanticipated side-effects.
Survivability is not easily retrofitted onto nonsurvivable applications.
System survivability is often inadequate in existing systems, which are terribly deficient (for example, in security and reliability)

Basic Needs
- - - - - - - - - - - - - - - - - - -
Definitive survivability requirements and subrequirements
Generic system/network architectures
Robust protocols and open-system components
Realistic operational prototypes with nontrivial survivability characteristics
Fully fledged robust commercially supported systems, nonproprietary where possible
Systematic use of good cryptography for authentication, integrity, and confidentiality
Applicable research and development, well integrated into robust systems

Concepts
- - - - - - - - - - - - - - - - - - -
Noncompromisibility from outside, from within, from below, in the presence of all survivability-relevant threats
Trustworthiness, dependability, assurance
Generalized composition, encompassing all useful modes of subsystem composition
Generalized dependence, compensating for and overcoming underlying untrustworthiness
Generalized survivability, based on many generalized-dependence mechanisms

Compromises
- - - - - - - - - - - - - - - - - - -
Compromise from outside: from above, or laterally at the same layer (e.g., access with no authorization, exploiting a logic flaw). Letter bombs; spoofing; penetrations
Compromise from within: using privileges of the given layer. Programming errors; internal Trojan horses; authorized misuse
Compromise from below: from a lower layer (e.g., hardware-based attacks on software, OS-based attacks on applications). Sniffers; Ken Thompson's C Compiler Trojan horse; hardware flaws or alterations

Illustrative Compromises

Layer of    Compromise           Compromise
abstraction from outside;        from within;
            Needs exogirding     Needs endogirding
-------------------------------------------------------
Outside                          Acts of God,
environment                      earthquakes,
                                 lightning, etc.
-------------------------------------------------------
User        Masqueraders         Accidental mistakes
                                 Intentional misuse
-------------------------------------------------------
Application Penetrations of      Programming errors
            application          in application code
            integrity
-------------------------------------------------------
Middleware  Penetration of       Trojan horsing of
            Web and DBMS         Web and DBMS
            servers              servers
-------------------------------------------------------
Networking  Penetration of       Trojan horsing of
            routers, firewalls   network software
            Denials of service
-------------------------------------------------------
Operating   Penetrations of      Flawed OS software
system      OS by                Trojan-horsed OS
            unauthorized         Tampering by
            users                privileged
                                 processes
-------------------------------------------------------
Hardware    Externally generated Bad hardware design
            electromagnetic or   and implementation
            other interference   Hardware Trojan horses
            External power-      Unrecoverable faults
            utility glitches     Internal interference
-------------------------------------------------------
Inside      Malicious or         Internal power supplies,
environment accidental acts      tripped breakers,
                                 UPS/battery failures
-------------------------------------------------------

Illustrative Compromises, continued

Layer of    Compromise           Compromise
abstraction from within;         from below;
            Needs endogirding    Needs undergirding 
-------------------------------------------------------
Outside     Acts of God,         Chernobyl-like
environment earthquakes,         disasters caused
            lightning, etc.      by users or operators 
-------------------------------------------------------
User        Accidental mistakes  Application system 
            Intentional misuse   outage or service denial  
-------------------------------------------------------
Application Programming errors   Application (e.g., DBMS)
            in application code  undermined within 
                                 operating systems (OSs)
--------------------------------------------------------
Middleware  Trojan horsing of    Subversion of 
            Web and DBMS         middleware from OS 
            servers              or network operations
--------------------------------------------------------
Networking  Trojan horsing of    Capture of crypto 
            network software     keys within the OS 
                                 Exploitation of lower 
                                 protocol layers    
--------------------------------------------------------
Operating   Flawed OS software   OS undermined from 
system      Trojan-horsed OS     within hardware: 
            Tampering by         faults exceeding fault
            privileged           tolerance; hardware 
            processes            flaws or sabotage
--------------------------------------------------------
Hardware    Bad hardware design  Internal power 
            and implementation   irregularities
            Hardware Trojan horses          
            Unrecoverable faults            
            Internal interference           
--------------------------------------------------------
Inside      Internal power supplies,    
environment tripped breakers,       
            UPS/battery failures  
--------------------------------------------------------

Trust and Trustworthiness
- - - - - - - - - - - - - - - - - - -
Trust and Trustworthiness are different.
Some (sub)systems are more trustworthy with respect to certain threats (to security, reliability, etc.) because their design and implementation reduce the likelihood of being compromised in those respects.
Dependability is a perception of trustworthiness associated with a specific requirement
Assurance is a measure of faith that can be placed in trustworthiness and dependability.
Some (sub)systems are trusted to behave properly even when they are untrustworthy. This is a very serious problem. Misplaced trust can lead to compromises.

Generalized Composition
- - - - - - - - - - - - - - - - - - -
Composition must accommodate serial connections, feedback, hierarchical layering, networking, collateral dependence, mutual suspicion, clients/servers, guards, ... without undermining the given requirements and component interoperability
Many commercial systems are not easily composed with other systems.
Much theoretical research on composition is not applicable to realistic systems. TCSEC criteria are deficient (Orange Book, Red Book).
Trustworthiness depends critically on the noncompromisibility of compositions. Compositions may be untrustworthy despite local trustworthiness of the components.

Generalized Dependence
- - - - - - - - - - - - - - - - - - -
Strict dependence (the hierarchical "uses" relation of Parnas 1974) demands strict correctness of underlying mechanisms to attain correctness of a given mechanism.
Generalized dependence enables acceptable operation despite faults, failures, errors, and misbehavior or misuse of underlying mechanisms. It compensates for, bypasses, or otherwise overcomes errant behavior. Thus, it enables the enhancement of trustworthiness.
Generalized dependence approaches the problem of trying to "make a silk purse out of a sow's ear." It can succeed quite effectively in certain respects in various cases (as considered next).

Identified Types of Generalized Dependence for Enhancing Trustworthiness (See Section 1.2.5 of
http://www.csl.sri.com/neumann/arl-one.html.)
- - - - - - - - - - - - - - - - - - -
Reliability: error-correcting coding, Moore-Shannon and von Neumann theories, Byzantine algorithms, self-resynchronization, and fault-tolerance techniques generally
Security: Domains with mutual suspicion, trustworthy firewalls, guards, and wrappers, crypto and especially multikey or Byzantine crypto, real-time analysis and response, but not necessarily kernels and TCBs, which assume bottom-up trustworthiness
Survivability: Run-time checks on all requirements, underlying security and reliability as well as survivability, real-time analysis extended to survivability

Generalized Dependence:
Error-Correcting Codes
- - - - - - - - - - - - - - - - - - -
Claude Shannon long ago showed how to signal through a noisy channel with arbitrarily high reliability by using sufficient redundancy.
The field of error-correcting codes has evolved considerably since then, with codes designed to detect and/or to correct certain patterns of errors: random, asymmetric (e.g., 1-to-0 only, or 0-to-1 only), bursty, or otherwise correlated, in block, variable-length, and sequential communications, as long as the required redundancy does not cause the available channel capacity to be exceeded.
The best known and simplest class of error-correcting codes is the Hamming code, which can correct any random single-bit error in a code word of length n, in which k bits are information bits and n-k bits are redundant check bits, where n = 2^(n-k) - 1; in principle, the n-k redundant bits provide 2^(n-k) - 1 error syndromes pinpointing where the (single) error occurred, and an all-zero syndrome denotes the case of no error. (More later.)
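As a concrete illustration of syndrome decoding, here is a minimal sketch of the smallest interesting Hamming code, with n = 7 and k = 4 (so n-k = 3 check bits and 2^3 - 1 = 7 nonzero syndromes). The bit layout (check bits at positions 1, 2, and 4) and all function names are assumptions chosen for illustration; they are not taken from the course materials.

```python
def hamming74_encode(data_bits):
    """Encode 4 information bits into a 7-bit code word (positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # check bit covering positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # check bit covering positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # check bit covering positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_syndrome(word):
    """XOR together the (1-based) positions of all 1 bits.

    The result is 0 for a valid code word; after a single bit flip it
    equals the position of the flipped bit.
    """
    syndrome = 0
    for position, bit in enumerate(word, start=1):
        if bit:
            syndrome ^= position
    return syndrome


def hamming74_correct(word):
    """Return (corrected word, syndrome), assuming at most one bit is wrong."""
    syndrome = hamming74_syndrome(word)
    fixed = list(word)
    if syndrome:
        fixed[syndrome - 1] ^= 1  # flip the bit the syndrome points at
    return fixed, syndrome


def hamming74_decode(word):
    """Extract the 4 information bits from a (corrected) code word."""
    return [word[2], word[4], word[5], word[6]]


if __name__ == "__main__":
    sent = hamming74_encode([1, 0, 1, 1])
    received = list(sent)
    received[4] ^= 1  # one error, at position 5
    fixed, syndrome = hamming74_correct(received)
    print(syndrome, hamming74_decode(fixed))
```

Note that the corrector always trusts the syndrome. If two bits are flipped, the syndrome is nonzero and points at a third position, so the decoder "corrects" the wrong bit.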

Generalized Dependence: Moore and
Shannon 1956: "Crummy Relays"
- - - - - - - - - - - - - - - - - - -
Given a box of relays of unknown goodness, test them for reliability. Throw out those with failure probability near one-half. Accept those with lesser failure probability, using positive logic. Reverse the logic on those with greater failure probability, so that they can then be considered to have failure probability less than one-half. As an oversimplified example, suppose each relay has failure probability p less than one-half when it is required to provide a closed path. Then, because only one good path is sufficient to close the circuit, a parallel circuit of n logically equivalent relays that fail independently with identical failure probabilities has an aggregate failure probability pⁿ, which can be made arbitrarily small for large enough n. Then, use serial-parallel combinations to attain arbitrarily low failure probabilities.
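The arithmetic of the oversimplified parallel case can be sketched as follows. The sketch assumes that the relays fail independently and considers only failure to close; the function names and the sample numbers are illustrative assumptions, not part of the Moore-Shannon construction itself.

```python
def parallel_failure(p, n):
    """Probability that n independent parallel relays ALL fail to close."""
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return p ** n


def relays_needed(p, target):
    """Smallest n such that p**n <= target (requires 0 < p < 1, target > 0)."""
    if not 0.0 < p < 1.0 or target <= 0.0:
        raise ValueError("need 0 < p < 1 and target > 0")
    n = 1
    while p ** n > target:
        n += 1
    return n


if __name__ == "__main__":
    # Relays that fail 40% of the time, aiming for one failure in a million.
    print(relays_needed(0.4, 1e-6))
```

Even quite poor relays (p = 0.4) reach a one-in-a-million aggregate failure probability with 16 in parallel. The price is that a parallel bank is more likely to conduct when it should be open, which is why the full construction mixes serial and parallel stages.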

Generalized Dependence:
Byzantine Agreement
- - - - - - - - - - - - - - - - - - -
Whereas error-correcting codes make some assumptions about the transmission or storage medium and about the worst-case failures, and Moore-Shannon circuits make assumptions about local failure probabilities, Byzantine agreement makes no local assumptions about any components. Each component that fails may fail in any arbitrarily nasty way.
For example, a Byzantine clock (Lamport et al. 1982) with 3k+1 subclocks can tolerate arbitrary failure modes in any k clocks, including malicious modes. (This cannot be done with fewer clocks unless constraining assumptions are made.)
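To make the 3k+1 bound more tangible for k = 1, here is a small simulation sketch of a two-round "oral messages" exchange in the style of the Byzantine generals literature: a commander sends a value, each lieutenant relays what it received, and each loyal lieutenant takes a majority. This is a simplification for illustration only, not the clock-synchronization algorithm itself. The function names, the ATTACK/RETREAT values, the default on ties, and the two traitor behaviors are all assumptions.

```python
from collections import Counter

DEFAULT = "RETREAT"  # assumed fallback when no strict majority exists


def majority(values):
    """Return the strict-majority value, or DEFAULT if there is none."""
    value, votes = Counter(values).most_common(1)[0]
    return value if 2 * votes > len(values) else DEFAULT


def om1(n, commander_value, traitor, lie):
    """Simulate one commander round plus one relay round among n nodes.

    Node 0 is the commander; nodes 1..n-1 are lieutenants.
    traitor is the index of the single faulty node (or None).
    lie(sender, receiver, value) is what the traitor sends instead of value.
    Returns {loyal lieutenant: decision}.
    """
    lieutenants = range(1, n)

    def send(sender, receiver, value):
        return lie(sender, receiver, value) if sender == traitor else value

    # Round 1: the commander sends its value to every lieutenant.
    received = {i: send(0, i, commander_value) for i in lieutenants}

    # Round 2: every lieutenant relays what it received to the others.
    decisions = {}
    for i in lieutenants:
        if i == traitor:
            continue
        votes = [received[i]]
        votes += [send(j, i, received[j]) for j in lieutenants if j != i]
        decisions[i] = majority(votes)
    return decisions


def flip(sender, receiver, value):
    """Traitor behavior: always relay the opposite value."""
    return "RETREAT" if value == "ATTACK" else "ATTACK"


def split(sender, receiver, value):
    """Traitor behavior: tell odd-numbered receivers one thing, even another."""
    return "ATTACK" if receiver % 2 else "RETREAT"


if __name__ == "__main__":
    print(om1(4, "ATTACK", traitor=3, lie=flip))   # lying lieutenant
    print(om1(4, "ATTACK", traitor=0, lie=split))  # two-faced commander
    print(om1(3, "ATTACK", traitor=2, lie=flip))   # too few nodes
```

With four nodes and one traitor, the loyal lieutenants agree in both scenarios and follow a loyal commander's order. With only three nodes, the lone loyal lieutenant sees one vote each way and cannot tell which of the other two nodes is lying, so in this sketch it falls back to the default even though the loyal commander ordered otherwise.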

Generalized Dependence for Security:
Guards, Firewalls, and Wrappers
- - - - - - - - - - - - - - - - - - -
A security guard might try to prevent the outflow of sensitive information and prevent the inflow of Trojan horses that could inflict damage on the guarded system.
A firewall might try to prevent unauthorized users from accessing an internal system from outside, and prevent certain types of access as well. Unfortunately, some types of traffic (e.g., http) can compromise internal security.
A wrapper might try to mediate access to a flawed component and to block harmful use either on the way in or the way out. In principle, these components attempt to overcome the shortcomings of the systems that they supposedly protect, in the spirit of generalized dependence.
Today's guards, firewalls, and wrappers tend to be incomplete in their requirements, flawed, and often bypassable, penetrable, and compromisible from within and below. "Trusted paths" to PCs are a challenge.
Even if they were more trustworthy, there would still be risks of compromise.
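The wrapper idea can be sketched in a few lines: check requests on the way in, contain failures, and filter results on the way out. Everything here is hypothetical, including the flawed component `legacy_lookup`, the allow-listed fields, and the input pattern. As noted above, such a wrapper does nothing against compromise from within or below (for example, code that calls the component directly).

```python
import re

# Assumed policy: short alphanumeric names in, two harmless fields out.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]{1,32}")
ALLOWED_FIELDS = ("name", "office")


class WrapperError(Exception):
    """Raised when the wrapper blocks a request or a component failure."""


def legacy_lookup(name):
    """Hypothetical flawed component: no input checks, over-shares output."""
    return {"name": name, "office": "B-101", "password_hash": "not-for-export"}


def wrapped_lookup(name, component=legacy_lookup):
    """Mediate every call to the component, inbound and outbound."""
    # Inbound: reject anything outside the narrow set of expected inputs.
    if not isinstance(name, str) or not SAFE_NAME.fullmatch(name):
        raise WrapperError("request rejected by wrapper")

    # Contain failures so that component internals do not leak to callers.
    try:
        record = component(name)
    except Exception as exc:
        raise WrapperError("component failed") from exc

    # Outbound: pass along only allow-listed fields.
    return {key: record[key] for key in ALLOWED_FIELDS if key in record}


if __name__ == "__main__":
    print(wrapped_lookup("alice"))
```

The outbound filter is an allow list rather than a block list, so a field added to the component later stays hidden until someone decides it is safe to release.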

Generalized Dependence for Security: Protection Domains
- - - - - - - - - - - - - - - - - - -
In principle, it should be possible to confine the results of execution within a domain, strictly confining any would-be side-effects. This would be particularly valuable with mobile code.
In practice, computing enclosures tend to leak profusely, because of weak hardware and poor software security.
Domains are an old idea, going back to Multics rings in 1965, Mike Schroeder's 1972 thesis, and capability architectures. Unfortunately, they are seldom used to realize their potential.

Security Instances of Generalized
Dependence: Cryptography
- - - - - - - - - - - - - - - - - - -
Cryptography provides the ability to use a completely unprotected storage or communications medium for sensitive information, to establish integrity of otherwise potentially untrustworthy code and data, and to authenticate the identity of otherwise potentially unknown entities.
Cryptography is almost always vulnerable to compromise from within and below (and denial of service attacks). Compromises are usually easier to perpetrate than brute-force exhaustive cracking of keys. However, 56-bit DES keys can now be broken (e.g., Deep Crack) in a few days with a single computer system, and much more rapidly with multi-system approaches (e.g., Distributed Crack).
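As one illustration of establishing the integrity of data kept on an unprotected medium, the sketch below attaches a keyed message authentication code to a message and checks it on retrieval. The choice of HMAC with SHA-256 from Python's standard library is an assumption made for this example and is not named in the lecture; the function names are likewise hypothetical.

```python
import hashlib
import hmac
import secrets


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA-256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def is_authentic(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check a tag in constant time to avoid leaking where it differs."""
    return hmac.compare_digest(make_tag(key, message), tag)


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # must itself be stored somewhere protected
    stored = b"route table v17"
    tag = make_tag(key, stored)

    print(is_authentic(key, stored, tag))               # untouched copy
    print(is_authentic(key, b"route table v18", tag))   # altered copy
```

The sketch also shows where the weak point lies: anyone who can read `key` from memory or disk (compromise from within or below) can forge tags without attacking the cryptography at all.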

Commonality among Generalized Dependence
- - - - - - - - - - - - - - - - - - -
In all these examples, there are of course pitfalls, including risks of subversion and limitations of the theory when applied to practice. For example, single-error-correcting codes can miscorrect multiple errors. Byzantine systems fail if there are more than k out of 3k+1 misbehaving systems. Firewalls can be penetrated if they are configured to permit certain traffic through. Wrappers and almost everything else can be compromised from within or from below.

Exploiting Generalized Dependence
- - - - - - - - - - - - - - - - - - -
Survivability depends (in the sense of generalized dependence) on security, reliability, fault-tolerance, performance, ...
Attaining survivability requires a combination of many techniques for ensuring necessary lower-layer properties, including relevant generalized-dependence mechanisms.
In addition, real-time monitoring with detection of survivability-relevant anomalies, automated diagnosis, the ability to correlate across multiple platforms, and stable reconfiguration are survivability-relevant instances of generalized dependence.

Mandatory Policies (introduction)
- - - - - - - - - - - - - - - - - - -
In principle, mandatory policies have the potential advantage that, once they are established, they cannot be compromised from above. They are very valuable as architectural concepts, although there are many practical problems (as we shall see).
MLS: Multilevel security (Bell and LaPadula)
MLI: Multilevel integrity (Biba)
MLA: Multilevel availability (Neumann-Proctor-Lunt; conceptual)
MLX: Multilevel survivability (new concept, primarily conceptual)
These are considered as architectural concepts later on.
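The access rules usually associated with the first two policies can be sketched as simple comparisons on a linear ordering of levels. The sketch assumes the common textbook reading: for MLS, no reading up and no writing down; for MLI, the dual. It ignores compartments, trusted subjects, and every practical complication; the level names and function names are illustrative assumptions.

```python
from enum import IntEnum


class Level(IntEnum):
    """Assumed linear ordering of levels (used for both secrecy and integrity)."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2


# MLS (Bell and LaPadula style): information must not flow downward.
def mls_may_read(subject: Level, obj: Level) -> bool:
    return subject >= obj  # no read up


def mls_may_write(subject: Level, obj: Level) -> bool:
    return subject <= obj  # no write down


# MLI (Biba style): low-integrity information must not flow upward.
def mli_may_read(subject: Level, obj: Level) -> bool:
    return subject <= obj  # no read down


def mli_may_write(subject: Level, obj: Level) -> bool:
    return subject >= obj  # no write up


if __name__ == "__main__":
    print(mls_may_read(Level.MEDIUM, Level.HIGH))   # reading up is refused
    print(mli_may_write(Level.LOW, Level.HIGH))     # writing up is refused
```

Because the checks depend only on labels fixed by the system, not on the discretion of the subject, a higher-layer program cannot grant itself an exception, which is the sense in which such policies cannot be compromised from above.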

Reading for the Next Class Period
- - - - - - - - - - - - - - - - - - -
Read Chapter 1 of Computer-Related Risks. Pick some of the topics that interest you most in Chapters 2 and 3, and read at least the chapter sections that seem most relevant to survivability.
Consider the Illustrative Risks document on-line at
http://www.csl.sri.com/neumann/illustrative.html (or .ps or .pdf) and think about how many of those risk cases are survivability related. (Note: The descriptor symbol V is used to denote cases that are particularly surViVability related, although many other cases are also survivability relevant.)