2 Survivability-Related Risks

ENPM 808s
Information Systems Survivability:
2. Survivability-Related Risks

- - - - - - - - - - - - - - - - - - -
Survivability-related risks, causes, effects, analysis of case histories, common modalities, lessons to be learned (Y2K, development fiascoes, etc.)

Source materials: Computer-Related Risks and the Illustrative Risks summary
http://www.csl.sri.com/neumann/illustrative.html (or .ps or .pdf)

Considerable background for this class period is found in Chapters 2 and 3 of Computer-Related Risks and in the archives of the ACM Forum on Risks to the Public in the Use of Computers and Related Technologies. Many of those cases are summarized in one-liners in the on-line Illustrative Risks document,
http://www.csl.sri.com/neumann/illustrative.html
(or .ps or .pdf).

Practical Realities
- - - - - - - - - - - - - - - - - - -
Enormous risks exist in all of the critical national infrastructures: telecommunications, electric power, pipelines, transportation, etc. See the President's Commission on Critical Infrastructure Protection (http://www.pccip.gov). These infrastructures all depend critically on each other, and on information infrastructures.

The U.S. Government and the Dept. of Defense are increasingly dependent on off-the-shelf systems. Those systems are not adequately reliable, available, or secure, not adequately interoperable, reusable, maintainable or evolvable, and consequently not adequately survivable.

Survivable systems and networks require extraordinary human care. Their requirements vastly exceed what is found in practice.

Serious security and reliability vulnerabilities abound in existing systems. Potential threats are dramatically outstripping capabilities. The risks are innumerable. Systems and networks are also very difficult to maintain without introducing new vulnerabilities.

System development and procurement processes are abysmal. Consider some recent fiascoes: the ADATS tank-based anti-helicopter missile system, the Air Traffic Control modernization, the IRS Tax Systems modernization, the FBI fingerprint system and NCIC-2000 development, the impending Y2K crisis, etc.

System operational experience is abysmal. Consider the USS Yorktown, left dead in the water for almost three hours by a divide-by-zero in a Windows NT application; AEGIS software problems; etc.

Significant advances in research and development relating to survivability are very slow in finding their way into the commercial marketplace, and into adoption by the critical-infrastructure providers.

Some Large-Scale Network Outages
- - - - - - - - - - - - - - - - - - -
1980 ARPAnet collapse. Hardware omission (no memory parity), flaky garbage collection, two hardware errors, propagated to entire net: four-hour outage.

Internet service outages. 1997 black hole due to routing table error: 50,000 addresses pointing to one site. 1986 Northeast disconnect.

Network Solutions Internet domain table error blocks .com and .net for four hours.

1990 AT&T long-distance calling blockage due to SS7 software flaw: crashes propagated incessantly for 11 hours.

1998 AT&T packet-data frame-relay network collapsed US-wide: faulty software upgrade gave 26-hour black-hole effect.

More Large-Scale Case Histories
- - - - - - - - - - - - - - - - - - -
Summer 1996 Western power outages: 2 July 1996 over 10 Western states (tree, operator, propagation). 10 August 1996 affected 8 million customers in 8 states, Canada, Baja.

May 1998 Galaxy IV satellite failure: primary and backup systems both failed, disabling 40M pager systems, etc. Reconfiguration was problematic: moving satellites or antennas, reprogramming, ...

1997 Kansas City power outage triggered national air-traffic snarl: operator shut down the wrong half of the system during preventive maintenance. Many other air-traffic propagation effects.

Military Operational Risk Cases
- - - - - - - - - - - - - - - - - - -
Vincennes' AEGIS shootdown of Iranian Airbus

Stark operation against Iraqi Exocets

Recurring Black Hawk problems: EMI, hydraulic failures, friendly fire over Iraq

Patriot missile system clock drift

Sgt York gun software problems

Ships out of commission: USS Hue City and USS Vicksburg software integration problems, plus USS Yorktown noted above

Tomahawk missile test abort

Some Localized Case Histories
- - - - - - - - - - - - - - - - - - -
Soviet Phobos I and Phobos II failures (misguided ground control)

1984 TDRS outage (cosmic rays)

1986 Challenger disaster (cold O-ring)

Skylab orbit alteration (sunspots)

Atlas-Centaur program alteration (lightning)

emh3.arl.mil 4-day delay on incoming e-mail 24 Sept 1998; dockmaster.ncsc.mil loss of a week's worth of e-mail in Feb 1998 upgrade.

More Accidental Denials of Service
- - - - - - - - - - - - - - - - - - -
Telephone system and power outages due to cable cuts, rodents, hardware, software, inadvertent battery usage, Hillsdale fire, floods, air conditioning failures, ...

Collapses of air- and auto-traffic control systems

ATM, bank, credit-card system outages, business interruptions

Electromagnetic interference: Sputnik, Reagan's Air Force One, Ft. Detrick (garage-door openers!); Mt. Diablo/Navy; Quebec power; nuclear reactor; effects on flight controls?

eBay's three major losses of service: hardware, software, net connection (after creating a hot standby, which was useless in the absence of the net connection!)

See the Illustrative Risks compendium for a summary of many cases involving failures of survivability, security, reliability, safety, and human well-being.

A Few Cases of Sabotage
- - - - - - - - - - - - - - - - - - -
1987 Australian communications blackout (24 cables severed in 10 locations)

1997 San Francisco blackout (switches thrown) (RISKS-19.42)

Atlas rocket database tampering (RISKS-11.95, 12.60); malicious password changes (e.g., Washington D.C. city computer blocked), etc. Many other cases in Illustrative Risks: Trojan horses, logic bombs, time bombs, viruses, Encyclopaedia Britannica, Italian newspaper, Arts Assets, election frauds, ...

Hacked and Altered Web Sites
- - - - - - - - - - - - - - - - - - -
Justice Department (RISKS-18.35)
CIA (RISKS-18.49)
Air Force (RISKS-18.64)
NASA (RISKS-18.88)
Three Army Web sites (RISKS-19.63)
FBI, U.S. Senate, Department of the Interior (RISKS-20.43)
Other cases in Illustrative Risks
The ease with which this can be done is really absurd!

But the lessons being drawn are evidently the wrong ones (e.g., arrest more hackers). The real lesson should be that the computer-communication infrastructure is very weak.

Sources of Risk
- - - - - - - - - - - - - - - - - - -
The system development process is full of risks in conceptualization, requirements, models, design, implementation, support systems, testing and verification, evolution and maintenance, retrofitting, ... [Problems here are considered in greater detail in the class period on deficiencies.]

System operation and use are also risky: Hardware malfunctions, power failures, environmental factors, software flaws, human (mis)behavior, ...

Proprietary code is often enormous in size, inscrutable in not being subject to outside analysis, inflexible in the unavailability of source code, and likely to force future systems into noninteroperable standards.

People are always an enormous source of risks.

Causes and Effects
- - - - - - - - - - - - - - - - - - -
Reliability and security are often closely interrelated.

Survivability, like reliability and security, is a weak-link problem. Even with defensive design, there may be weak links or combinations of weak links.
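
The weak-link effect can be seen in a back-of-the-envelope availability calculation. The sketch below assumes a system that needs every one of its components to be up and that component failures are independent (a simplification). The component names and availability figures are hypothetical, chosen only for illustration.

```python
from math import prod


def series_availability(availabilities):
    """Availability of a system that needs every component to be up.

    Assumes independent component failures, which real systems
    often violate (failures tend to propagate).
    """
    return prod(availabilities)


# Hypothetical figures, for illustration only.
components = {
    "power": 0.9999,
    "network": 0.999,
    "server": 0.9995,
    "operator procedures": 0.95,  # the weak link in this example
}

overall = series_availability(components.values())
weakest = min(components, key=components.get)

print(f"overall availability: {overall:.4f}")
print(f"weakest link: {weakest} ({components[weakest]})")

# The whole can never be better than its weakest component.
assert overall <= components[weakest]
```

Under these assumptions, improving any component other than the weakest still leaves the overall figure capped by the weakest one. Combinations of several moderately weak links drag the product down further.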

Similar effects may result from reliability problems, from security attacks, or from both: 1980 ARPAnet collapse, 1986 New England separation from the ARPAnet, 1990 AT&T long-distance collapse, electromagnetic interference.

Known potential reliability problems create ideal opportunities for attacks, e.g., at midnight between 31 Dec 1999 and 1 Jan 2000. It may be very difficult to diagnose the causes.
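
The date rollover mentioned above can be illustrated with a short sketch. It assumes a hypothetical record format that stores only the last two digits of the year. The function names and the pivot value of 50 are illustrative assumptions, not taken from any particular system.

```python
def elapsed_years_two_digit(start_yy, end_yy):
    """Naive elapsed-years calculation on two-digit years (the Y2K flaw)."""
    return end_yy - start_yy


def expand_year(yy, pivot=50):
    """Windowing repair: map a two-digit year to four digits.

    Values below the (assumed) pivot are read as 20xx, the rest as 19xx.
    """
    return 2000 + yy if yy < pivot else 1900 + yy


def elapsed_years_windowed(start_yy, end_yy, pivot=50):
    """Elapsed years after expanding both two-digit years."""
    return expand_year(end_yy, pivot) - expand_year(start_yy, pivot)


# A record created in 1995 ("95") and examined in 2000 ("00").
naive = elapsed_years_two_digit(95, 0)
fixed = elapsed_years_windowed(95, 0)

print("naive:", naive)     # negative elapsed time
print("windowed:", fixed)
```

Windowing of this kind only defers the problem: dates outside the assumed 100-year window are still misread. A wrong result such as a negative elapsed time also looks the same whether it comes from the flaw itself or from deliberate tampering timed to coincide with it.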

Discussion Topics: Related Risks
- - - - - - - - - - - - - - - - - - -
Preliminary consideration of lessons to be learned from the survivability-related risks

Do other risks that we have not considered (e.g., human safety, privacy, personal well-being, financial fraud, medical health, etc.) have lessons for us that are relevant to survivability? Are they more of the same, or are there real differences?

Reading for the Next Class Period
- - - - - - - - - - - - - - - - - - -
Read Chapter 2 (survivability threats) of the arl-one report http://www.csl.sri.com/neumann/arl-one.html.

Read Chapter 4 of Computer-Related Risks. (It is very short.)

Note: The second lecture material on risks is likely to continue somewhat into the third class period.

