Copyright Peter G. Neumann, June 1999,
although freely quotable with appropriate acknowledgement.
THIS DOCUMENT includes NEW MATERIAL that supplements the fourth printing of the book, Computer-Related Risks, Addison-Wesley, 1995, ISBN 0-201-55805-X. (An ERRATA LIST for the first three printings is available at http://www.csl.sri.com/neumann.html in case you have an earlier printing.)
For the foreseeable future, there will be no second edition. Instead, subsequent printings will simply refer to more recent on-line material. I have edited several Inside Risks articles, and shown how they relate to the existing book chapters. This version includes material through May 1999. Recent material will continue to be added progressively.
For my convenience, if not for yours, the chapter and section numbers are coordinated with the printed book. The indexed page references are for this incremental draft.
I am grateful to Otfried Cheong's Hyperlatex for making this on-line browsable version so easy to produce. Cross-references, references, and index items are all clickable. The unbound cross-references to existing sections of the printed book will be correctly inserted later. This on-line supplement to the book is intended to be a living document that will continue to grow, supplanting the need for a second edition. There is no other way to keep the printed book up-to-date.
The reader is encouraged to look at my on-line one-liner RISKS index, Illustrative Risks to the Public in the Use of Computers and Related Systems, which is accessible at http://www.csl.sri.com/neumann/illustrative.html and also in .pdf and .ps form. Newer references favor the on-line Risks Forum directly, rather than the less accessible RISKS sections of the ACM Software Engineering Notes. The on-line RISKS archives are available with a classy search facility.
PREFACE TO THE SECOND EDITION
Much has happened since this book originally went to press. There have been many new instances of old problems previously documented, but relatively few new types of problems. In some cases, the technology has progressed - although in those cases the threats, vulnerabilities, risks, and expectations of system capabilities have also escalated. On the other hand, sociopolitical considerations have not contributed noticeably to any lessening of the risks. Basically, all of the conclusions of the book seem to be just as relevant now - if not more so. Thus, the second edition amplifies and clarifies rather than modifies.
Several recent successes are noted. For example, subsequent to the retrofits to overcome the initial mirror flaw in the Hubble Space Telescope (see Page 204 of the book), further hardware and instrumentation upgrades have enabled some spectacular new discoveries. (However, one of three NICMOS cameras remains out of focus.) Also, recent Space Shuttle missions seem to have had fewer problems than those recorded for the earlier flights - although minor problems have continued, and the Space Station has undergone many difficulties.
Here are just a few specific recent problems of particular note:
There are many additional cases that can be characterized as "more of the same."
A summary of essentially all interesting RISKS cases can be found at ftp://ftp.csl.sri.com/illustrative.PS. My Web site http://www.csl.sri.com/neumann.html contains further information, including my 1996 testimony for the Permanent Subcommittee on Investigations of the Senate Committee on Governmental Affairs  on security risks in the infrastructure, analysis of risks relating to the Social Security Administration's PEBES (Personal Earnings and Benefit Estimate Statement) Web site and related identity-related risks, and my 1997 testimony for the Senate Judiciary Committee on risks in key-recovery.
If you wish to catch up with recent events and you are able to browse the Internet, you are encouraged to peruse the RISKS archives - at ftp://ftp.sri.com/risks or http://catless.ncl.ac.uk/Risks/VL.IS.html (where VL is the VoLume number and IS the ISsue number). The ftp.sri.com site directory "risks" and ftp.csl.sri.com both contain the most recent PostScript copy of my comprehensive historical summary of mostly one-line descriptions of the RISKS cases in the file, illustrative.PS. Further instructions for on-line access are given in Appendix . Additional material can also be found in each regular issue of the ACM Software Engineering Notes and in the Inside Risks column in each issue of the Communications of the ACM.
Internet routing black hole. On April 23, 1997, at 11:14 a.m. EDT, Internet service providers lost contact with nearly all of the U.S. Internet backbone operators. As a result, much of the Internet was disconnected, some parts for 20 minutes, some for up to 3 hours. The problem was attributed to MAI Network Services in McLean, Virginia (www.mai.net), which provided Sprint and other backbone providers with incorrect routing tables, the result of which was that MAI was flooded with traffic. In addition, the InterNIC directory incorrectly listed Florida Internet Exchange as the owner of the routing tables. A "technical bug" was also blamed for causing one of MAI's Bay Networks routers not to detect the erroneous data. Furthermore, the routing tables Sprint received were designated as optimal, which gave them higher credibility than otherwise. Something like 50,000 routing addresses all pointed to MAI.1
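The failure mode can be illustrated in miniature (this is a toy sketch with hypothetical names, not MAI's or Sprint's actual configuration; real BGP selection compares local preference, AS-path length, and several other attributes): once leaked routes arrive marked with a higher preference than the legitimate ones, route selection follows them, and traffic floods the leaking network.

```python
# Toy model of preference-based route selection, showing how a leaked
# routing table designated as "optimal" captures traffic away from the
# correct backbone path. All names and values are illustrative.

def best_route(routes):
    """Pick the route with the highest preference value."""
    return max(routes, key=lambda r: r["preference"])

routes_for_prefix = [
    {"next_hop": "sprint-backbone", "preference": 100},  # correct route
    {"next_hop": "mai-net",         "preference": 200},  # leaked, marked "optimal"
]

chosen = best_route(routes_for_prefix)
print(chosen["next_hop"])  # the leaked route wins, so traffic pours into MAI
```

With some 50,000 address blocks all resolving this way, the leaking network is overwhelmed and the rest of the Internet loses connectivity to the legitimate destinations.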
Internet nameserver problem affects .com and .net domains. Around 11:30 p.m. EDT on July 16, 1997, Network Solutions Inc. attempted to run the autogeneration of the top-level domain zone files, which resulted in the failure of a program converting Ingres data into the DNS tables, corrupting the .com and .net domains in the top-level domain name server (DNS), maintained by NSI. Quality-assurance alarms were evidently ignored and the corrupted files were released at 2:30 a.m. EDT on July 17 -- with widespread effects. Other servers copied the corrupted files from the NSI version. Corrected files were issued four hours later, although there were various lingering problems after that.2
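The lesson here is less the conversion failure than the ignored quality-assurance alarms. A minimal sketch of an automated release gate (the 10% threshold and function names are illustrative assumptions, not NSI's actual checks) would refuse to publish a generated zone that shrank implausibly relative to its predecessor:

```python
# Sketch of a release gate for generated zone data: refuse to publish
# when the new file lost an implausible fraction of its records, as a
# truncated conversion from the source database typically would.

def safe_to_publish(old_record_count, new_record_count, max_shrink=0.10):
    """Reject a new zone file that lost more than max_shrink of its records."""
    if old_record_count == 0:
        return new_record_count > 0
    shrink = (old_record_count - new_record_count) / old_record_count
    return shrink <= max_shrink

print(safe_to_publish(1_000_000, 950_000))  # True: normal churn
print(safe_to_publish(1_000_000, 200_000))  # False: hold the release, page a human
```

The point is that the gate fails closed: a human must override it, rather than a human being needed to notice the alarm.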
Network Solutions goof bumps Nasdaq off the Internet (Will Rodger, RISKS-19.34)
The Nasdaq stock exchange was knocked off much of the Internet for several hours on 19 Aug 1997 as a result of administrative errors at the InterNIC, a centralized Internet address clearinghouse run by Network Solutions Inc. of Herndon, Va. Though the problem was initially invisible to Nasdaq, which maintains its own database of Internet addresses, the temporary suspension of access to the exchange's site blocked users of major computer networks - including those owned by IBM Corp., MCI Communications Corp., PSINet Inc. and UUnet Technologies Inc. As a result, Nasdaq was unreachable to most Internet users for at least several hours Tuesday morning. Problems with the Web site had no effect on the functioning of Nasdaq itself. The snafu was due to a clerical error at NSI, which evidently lost track of Nasdaq's $50 fee, submitted in October 1996. [PGN Abstracting, from article by Will Rodger, in Inter@ctive Week Online, 21 Aug 1997]
Will remarked that things like this seem to be occurring more often. The weekend before, more than 5,000 Web sites were blocked for over 24 hours, when Web Communication Inc and other domains were bumped from the Internet after a screw-up in routine InterNIC maintenance.
Redundant virtual circuits both fail. A report from Finland indicated that the main and reserve lines between Oulu and Kajaani went through the same physical circuit, despite an agreement with Finnet that they should be separate.3
MCI Internet gateways choked. MCI's inbound Internet gateways were saturated during July 1994, resulting in days of delay in delivering e-mail to MCI customers. A fix was considered to be months in the offing.4
Vandals cut cable, slow MCI service. MCI's telephone traffic between New York City and Washington was disrupted for almost four hours when vandals removed a 20-foot section of fiber-optic cable in Newark on August 26, 1994.5
Netcom crash. Netcom, Inc. (now part of ICG Communications Inc.) went down for more than 14 hours during the week of June 17, 1996, because of an extra "&" in the border gateway protocol code in the MAE-East router in the Washington, D.C., area. Recovery required that all of the more than 100 routers be brought down.6
Prodigiously prodigal Prodigy commercial. Alan Wexelblat reported seeing a commercial for Prodigy's on-line computer service during Game 6 of the 1994 Stanley Cup finals on ESPN. The ad cut to a live computer screen showing Prodigy. Suddenly, a big window came up on the screen, saying communication error. The ad was talking about how great the hockey game was, but that it didn't compare to the excitement available on Prodigy. Apparently, at that time Prodigy users observed that the system locked up for almost a minute, and then their screens went completely blank. ESPN quickly cut away to another commercial. The curse of the live demo!7
Prodigy misdirects or loses e-mail messages. A software glitch on March 10, 1995, caused Prodigy's e-mail system to send 473 e-mail messages to incorrect recipients and to lose 4,901 other messages. The system had to be shut down for five hours.8
Microsoft, AT&T, AOL netwoes. Microsoft shut down its nationwide network on June 23, 1996, for 10 hours as part of an intended backup power-supply upgrade, but the upgrade failed and they had to try again.
AT&T had to shut down its Internet access for up to 8 hours each week, for maintenance.
America Online was out of service for an hour on June 19, 1996, when a planned system software upgrade backfired.9 AOL's computer systems (near the Dulles Airport facility in Virginia) went down at 4 a.m. EDT on August 7, 1996. Service was reportedly restored sporadically 19 hours later, around 11 p.m. EDT. The crash was caused by new software installed during a scheduled maintenance update. Earlier in the same week an AOL representative had said that AOL computers are "virtually immune" to this kind of outage.10
On December 2, 1996, AOL's main server building flooded, knocking out the entire AOL network for hours and denying E-mail service for hours more after that.11 On February 5, 1997, AOL's network succumbed to a problem during a software upgrade, and was off the air for more than two hours.12 More extensive AOL e-mail outages came in early April 1997, when service was suspended for several days in order to do an upgrade.13
Explosion causes Internet blackout in New England (Edupage, R 19 29-30)
More than 200 New England businesses experienced a four-hour Internet blackout on 7 Aug 1997 after an explosion knocked out electrical power in the Boston area. One person was killed in the blast, which overloaded a panel switch at MIT, causing a fire and cutting off Internet access to BBN Planet customers. Access resumed around 10:00. The speed with which the incident happened made it impossible to reroute traffic, said a BBN spokesman. (TechWire, 8 Aug 1997; Edupage, 10 Aug 1997)
No network, no demo (Martin Minow)
Larry Ellison, CEO of Oracle Inc, and a strong proponent of network computers, was demo-ing his network computer at the Oracle OpenWorld conference. Unfortunately, the network crashed and the application hung "and Ellison was left hanging on stage."
Attack on fiber-optic cables causes Lufthansa delays. On February 1, 1995, unknown attackers severed 7 fiber-optic cables near the Frankfurt/Main airport. About 15,000 telephone lines were interrupted. The cables also carried data for Lufthansa's booking computers; consequently, new reservations had to be made manually. As Lufthansa's main computers (at Frankfurt airport) were cut off for some time, delays of up to 30 minutes were caused.14
Ground-cable removal blows Iowa City phone system upgrade. On November 19, 1994, Iowa City's US West telephone system shut down at about 3:30 p.m., local time, and service was gradually restored between 7:30 and 9:30 p.m, affecting about 60,000 people. Analysis showed that a new switching system had been installed in July 1994. In removing the old system, an electrical grounding cable had been inadvertently removed.15
Garbage-truck worker wipes out telephone service. A cowboy garbage-truck driver in Oregon playing the game of "swing the cables" with his fork lift accidentally severed a cable that disrupted service for a wide area of subscribers.16
Disruption from stolen cables. In Ulan-Ude, Russia, a man harvested 60 meters of cable, disabling external phone service on June 19, 1997. Previously, 2 thieves in eastern Kazakhstan were electrocuted trying to steal high-voltage copper wires. In a much older case recalled by Cliff Krieger, a computer backup system failed when it was needed because a cable had been stolen at the Korat Royal Thai Air Force Base in 1973.17
Swedish telephone outage (Danny Kohn) (R 20 29)
After a number of ISDN outages last year and some this year in the country, our nationally owned telco Telia had two big outages in the capital, Stockholm. The first occurred on 15 Mar 1999, when millions of phone lines, including the police headquarters' PBX, were unusable for 8 hours! The outage was repeated exactly a week later, between 10:25 a.m. and 11:05 a.m., when incoming calls to the police PBX and to another 250 business PBXs were blocked.
The second outage is explained as an intermittent error that disturbed the communication between PBXs and the telco equipment. In addition, the software that should have localized the problem had a bug, so the error was never displayed.
It comes to mind that telco exchanges are often purchased through international competition. A telco operator cannot see through the software, and given the complexity, neither can the producer - we might not have bugs if they could. So, if an intruder paid by some nearby country wanted to, he could plant code "detonating" as part of a war attack.
Computer error costs MCI $millions. MCI reported that they will refund approximately $40 million due to a computer error. This was the aftermath (!) of a slight billing error uncovered by investigative reporters from a local television station, WRIC in Richmond, Virginia, who in pursuing it found that it was a widespread phenomenon.18
Bell Atlantic 411 outage. On November 25, 1996, Bell Atlantic had an outage of several hours in its telephone directory-assistance service, due apparently to an errant operating-system upgrade on a database server. For unknown reasons, the backup system also failed. The result was that for several hours 60% of the 2000 telephone operators at 36 sites had to take callers' requests and telephone numbers, look up the requested information in printed directories, and call the callers back with the information. Apparently, the problem was solved by backing out the software upgrade. This was reportedly the most extensive such failure since operators began using computerized directory assistance.19
MFS Communications switch fails, with widespread effects (Steven Bellovin)
Around 7 p.m. on the evening of 8 Sep 1997, the main MFS Communications switch (MFS Switch One) failed, downing UK telecommunications links provided by MFS, Worldcom, and First Telecom. The outage also affected most of CompuServe's UK customers, whose access is typically via an MFS phone number. [PGN Stark Abstracting. Evening usage is not necessarily off-peak, because it is an excellent time to access computers in the U.S. No one has yet reported how long it took to restore service. PGN]
Satellite transmission snafu leads to diplomatic incident (Nick Brown)
On 19 Jul 1997, a "technical error" caused the contents of a channel on a satellite (operated by France Telecom) to be transmitted on another channel, for about twenty minutes. Normally this would have been merely annoying for the viewers. However, these viewers were in (among other places) Saudi Arabia, the channel they expected to be watching was the French government-run, general interest and news station, Canal France International (CFI), and the program which replaced it was a hard-core pornographic movie that should have been shown on the subscription-only, encrypted French domestic station, Canal Plus. As a result, Arabsat cancelled its contract with France Telecom, claiming that France Telecom had not "honoured its commitment to respect Arabic and Islamic values." The French Foreign Ministry and the French Ambassador in Riyadh are trying to calm what has become a diplomatic incident.
Indian satellite failure (Scott Lucero)
According to the 6 Oct 1997 Daily Brief, officials in India say the country's most advanced communications satellite was abandoned on 5 Oct 1997 due to a power failure aboard the craft. The loss of the satellite reportedly affected communications to remote parts of the nation and the satellite-dependent functioning of India's stock exchange. This appears to be an example of the familiar RISK of having a single point of failure, or, more colloquially, putting all your eggs in one basket.
Blown fuse takes out 911 system. A blown fuse took out a large portion of Iowa's 911 emergency phone system for three hours over the 1996 Thanksgiving weekend. U.S. West could not say how many 911 calls went unanswered. A spokesperson said that the problem came from the complexity of the system.20
San Francisco 911 system woes. San Francisco tried for at least three years to upgrade its 911 system, but computer outages and unanswered calls remain rampant. For example, on October 12, 1995, the dispatch system crashed for over 30 minutes in the midst of a search for an armed suspect (who escaped). The dispatch system was installed two months before as a temporary fix to the recurrent problems, and it too suffered unexplained breakdowns. Screens freeze; vital information vanishes; and roughly twice a week the system crashes. Dispatchers are not able to answer between 100 and 200 calls a day. Many nonemergency calls are also being lost. The reported extremely stressful working conditions seem similar to those experienced by air-traffic controllers. The 911 system collapsed again on November 4, 1995, for an hour; the absence of an alarm left the collapse undetected for 20 minutes.21
Software bug cripples Singapore phone lines. A bug in newly-installed computer software corrupted one of the two common channel signaling systems, affecting 26 out of 28 exchanges, and knocking out two-thirds of Singapore's telephone lines on October 12, 1994. Handphones, fax machines, pagers and credit cards were all hit by the disruption, which began at 11:31 a.m. in the City Exchange. It took Singapore Telecom's engineers about five hours to get services back to normal again. Fortunately the old backup system was still running side by side with the new system.22
Calling-Number ID ghosts calls. In early March 1995, a Detroit area woman looked at her Calling-Number Identification unit (misnamed Caller ID) and was puzzled to notice that it indicated 19 received calls that evening, even though only one person had called. Then she checked the names listed. John F. Kennedy. Thomas Paine. Harry S Truman. John Hancock. Ulysses S. Grant. Samuel Clemens. Ronald Reagan. And many others. Most of the phone numbers were non-working, but a few were. A neighbor had also been plagued with phone calls for Abraham Lincoln. Ameritech believes the Caller ID box was probably a pre-programmed demonstration model, although a telecommunications consultant suspected the work of a phone hacker.23
Does CNID blocking really give you anonymity? From the time of an upgrade on January 1 until January 26, 1997, the mechanisms that were supposed to block Calling Number ID failed in the 510 and 415 areas codes. Numerous businesses with PBXs were able to obtain calling numbers despite presumed blocking.24
Telstar 401 catastrophic failure. On January 11, 1997, AT&T's Telstar 401 satellite went dead, with a full complement of both C and Ku band transponders. Technicians were unable to reestablish contact. The satellite normally carries both broadcast network and syndicated television programming. The networks, as "platinum" customers, were quickly switched to an alternative bird. Almost everyone else was scrambling to find transponder space for their programming. The risk? Don't assume that a satellite will always be there!25
SpaceCom technician disables millions of pagers. At the SpaceCom uplink facility in Tulsa, Oklahoma, an operator accidentally sent out a command shutting down the satellite receivers used by pager systems throughout the country, affecting millions of pagers. SpaceCom supports 5 of the largest 10 paging outfits. This happened at 1 a.m. on September 26, 1995, and each receiver had to be manually reprogrammed -- which took all day until most of the service could be restored. Apparently, the operator omitted a carriage return at the end of a line, which is sort of the inverse of intending to type rm *.log but accidentally fat-fingering the carriage return just after the asterisk.26
Playboy strikes again. TCI's cable-TV provider in Springfield, Missouri, was testing its planned inclusion of the Playboy Channel (to begin in February 1997), when the Cartoon Network Channel suddenly began airing the Playboy video along with the regularly programmed Flintstones' audio. The results were perhaps more noticeable than they might have been, because bad weather had closed the local schools and children were at home.27 There seems to be something magnetically RISKS-attractive about the Playboy Channel, which appeared unscrambled in the Palo Alto area. A city-wide power outage (see Section ) on August 13, 1996, fried the Palo Alto Cable Co-op circuit board that normally scrambles the Playboy Channel, despite surge protection. When power was restored, the Playboy Channel went out unscrambled. To make matters worse, Co-op's phone system had died when the standby batteries ran down.28
A Playboy Channel program (PC is a nicely overloaded acronym, because the Personal Computer program was presumably not Politically Correct!) had previously appeared in the Jeopardy time-slot in the Chicago area for 10 minutes, due to a screwup.29
Woodpeckers delay shuttle launch. Yellow-shafted flicker woodpeckers chipped away at the insulating foam on the space shuttle Discovery's external fuel tank, causing at least 71 holes, from half-inch to four inches in diameter, and delaying the scheduled launch.30
Ariane-5 problems. Following the failure of the main cryogenic motor during an attempted Ariane-5 launch on May 5, 1995, and the death of two technicians resulting from asphyxiation due to a nitrogen leak (in Cayenne, at the French Guiana Space Centre), another test on May 30, 1995, was aborted by the computer control system several seconds after ignition of the new European rocket.31
On June 4, 1996, another Ariane-5 exploded, due to faulty software in the inertial guidance system. Software from Ariane-4 had been reused in Ariane-5 without testing. When subjected to the higher accelerations produced by the Ariane-5 booster, the software (calibrated for an Ariane-4) ordered an "abrupt turn 30 seconds after liftoff", causing the airframe to fail. Apparently, conversion from a 64-bit floating-point representation to a 16-bit signed integer representation caused an Operand Error.32
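The arithmetic is easy to reproduce in miniature. Ariane's flight code was Ada; the sketch below (with a hypothetical function name) merely models the unguarded narrowing conversion, using Python's struct module to enforce the 16-bit signed range:

```python
import struct

# A value that fits comfortably in a 64-bit float does not fit in a
# 16-bit signed integer: the unguarded conversion raises an error
# instead of producing a usable number, just as the reused Ariane-4
# alignment code did when fed Ariane-5's steeper velocity profile.

def to_int16(horizontal_bias):
    """Narrow a value to a 16-bit signed integer; overflow raises struct.error."""
    return struct.unpack(">h", struct.pack(">h", int(horizontal_bias)))[0]

print(to_int16(12_000.0))  # 12000: within [-32768, 32767], as on Ariane-4
try:
    to_int16(70_000.0)     # out of range: the analogue of the Operand Error
except struct.error:
    print("Operand Error: value exceeds 16-bit signed range")
```

On Ariane-4 the physical value could never exceed the 16-bit range, so the conversion was deliberately left unprotected; reusing the code without revalidating that assumption is what doomed the flight.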
Final report on the SOHO spacecraft problems
We reported earlier on the NASA/European Space Agency Solar and Heliospheric Observatory (SOHO) spacecraft on 24 Jun 1998 (R 19 87). Nancy Leveson gave a preliminary analysis (R 19 90), followed by a later note from Craig DeForest (R 19 94) summarizing the final report of the Investigative Board, as follows. The proximal cause of the loss was a mis-identification of a faulty gyroscope: two redundant gyroscopes, one of which had been spun down(!), gave conflicting signals about the spacecraft roll rate, and the ops team switched off the functioning gyro. The spun-down gyro became SOHO's only information about roll attitude, causing SOHO to spin itself up on the roll axis until the pre-programmed pitch and yaw control laws became unstable. This was the last in a series of glitches in the operational timeline on 24 Jun; the full story is on-line.
There were many other factors leading to the loss. The report reads like a roll call of well-known risky behaviors, including a staffing level too low for periods of intensive operations; lack of fully trained personnel due to staffing turnover; an overly ambitious operational schedule; individual procedure changes made without adequate systems level review; lack of validation and testing of the planned sequence of operations; failure to carefully consider discrepancies in available data; and emphasis on science return at the expense of spacecraft safety.
[Contact with SOHO was subsequently re-established, and - following thawing of the frozen hydrazine rocket fuel on board - full attitude control seems to have been restored, allowing recommissioning and testing of the spacecraft and instruments.]
Titan IV explodes with Vortex satellite; total cost over $1B
The Lockheed-Martin Titan IV that began self-destructing at 20,000 feet only 40 seconds after liftoff from Cape Canaveral carried a top-secret satellite (code-named Vortex) for the U.S. National Reconnaissance Office. It was destroyed on ground command two seconds later. The Air Force gave no information on the cause. This was the final launch for this Titan IV model; future launches are already scheduled to use an improved model. [Source: Reuters item, 13 Aug 1998; PGN Abstracting]
Only two failures out of 25 launches is reportedly thought to be a reasonably good record, although this loss is expensive - $300M for the Titan, and between $800M and $1B for the satellite. Associated Press noted that a previous Titan IV failure occurred from Vandenberg AFB in August 1993. (There was also a Titan IV motor that blew up on the test stand on 1 April 1991 (R 12 09), as a result of a problem that seemingly could have been caught in simulation.) Further commentary in (R 19 93).
More satellite woes: Ikonos 1 lost, Titan 4B puts Milstar in worthless orbit; Delta III does same for Orion (PGN)
In 1994, the U.S. Government authorized Space Imaging to launch a private imaging satellite, for beneficial public uses. Ikonos 1 was finally launched on 27 Apr 1999, but contact was mysteriously lost 8 minutes later (R 20 36). No further details have emerged.
A $433M Titan 4B rocket launched on 30 Apr 1999 apparently triggered separation of the payload four hours early, and placed an $800M Milstar satellite in a low elliptical orbit rather than a geostationary one (R 20 36). The blame was placed on Lockheed Martin engineers loading faulty software (R 20 39). This was the third Titan failure in a row - following the Titan 4A with a Vortex satellite last August 1998 in a mission with comparable costs (R 19 91), and a missile warning satellite on 9 Apr 1999 stuck in a useless orbit.
Then, on 4 May 1999, a Boeing Delta III rocket launch dumped Loral's Orion satellite, intended to be geostationary, into an elliptical orbit peaking at 862 miles. A previous launch attempt two weeks earlier had reached a countdown of zero, but a software flaw prevented ignition (R 19 38). The first Delta III launch ended after 71 seconds when the rocket exploded because of a software flaw that caused the hydraulic fluid to be expended prematurely.
Russian rocket blows 12 Globalstar satellites
Globalstar (42% owned by Loral Space and Communications) used a Yuzhnoye (Ukraine) rocket for the 10 Sep 1998 launch from Baikonur (Kazakhstan) of 12 Globalstar satellites intended to be part of a world-wide wireless phone network. Two separate computer faults 4.5 minutes after launch reportedly resulted in the complete loss of the rocket and the satellites. [Source: Dan Fost, San Francisco Chronicle, 11 Sept 1998, A1 (R 19 95)]
Missing bounds check? Off-by-one error? Hardware? All your eggs in one basket? Not really. Globalstar is shooting for 52 low-orbit satellites. Cheaper by the dozen? This one cost $270M for the satellites ($190M expected to be covered by insurance!), and about $100M for the rocket.
Peter Ladkin (R 19 97) discussed reports that the malfunction resulted in the failure of the Zenit booster. The Energomash second-stage booster was shut down prematurely. Apparently, two of the three primary flight-control computers shut down.
Cruise Missile software bugs. During the Iraqi war, bomb damage assessment of the initial cruise-missile strike indicated that three of the 10 targets attacked by 13 Air Force CALCMs (Conventional Air-Launched Cruise Missiles) emerged with `no detectable damage.' The Boeing CALCMs (earlier intended as nuclear weapons) had been adapted for launch from B-52H bombers over the Persian Gulf, but without the software changes necessary for the new uses.33
Accidental missile launch: color-code mixup. (R 18 40) The Canadian Navy mistakenly launched an unarmed missile at a town near Victoria, B.C. on August 28, 1996, hitting a residential garage and narrowly missing a food store and day-care center. Sailors were testing weapons systems aboard the HMCS Regina at 11 a.m. when the missile was fired at the town of View Royal on Vancouver Island. Apparently, an unarmed live missile had been substituted for the intended dummy, because of a mixup relating to the color-coding of the missiles. While the test called for a green "inert test set," which contains no propellant and therefore could not launch, a blue "inert practice round" was mistakenly used. The military has since suspended all such testing on both coasts and ordered an inquiry. Although nobody was injured, residents of the bedroom community of 6,000 people say things could have been much worse. Thirty-two children were a half-block away at the Tiny Tots Day Care Centre when the incident occurred.34
Navy software problems (Michael Stutz via Jim Horning)
If you think Windows 98 is an upgrade nightmare, consider the task of adding a new combat system to a Navy cruiser. Last week the US Navy acknowledged that two prized battle cruisers (the USS Hue City and the USS Vicksburg) will be out of commission until further notice as engineers try to integrate new onboard weapons-control systems. "Microsoft comes out with upgrades every three years, and they crash all the time," said one Navy source, who spoke on condition of anonymity. "The Navy comes out with upgrades every five years, but we can't afford for our systems to have any glitches, so we have to make sure that we get it just right."
The heart of the problem lies with two new systems being built into the ships. The Aegis Baseline 6 system helps defend the vessels against air attacks, and the Cooperative Engagement Capability (CEC) system gathers and shares radar data from multiple ships. Engineers are having trouble getting the new systems to work with each other and with the ships' legacy software.
[Aegis is written in Ada and C++ and other languages, with the latest upgrade reaching 8M lines of code, up from 3M. Installation is taking much longer than expected. The problems are largely in integration and interoperation, including a new display system, and are compounded by the Navy not having source code. PGN Abstracting from "Navy Software Dead in the Water" by Michael Stutz, 16 Jul 1998]
USS Yorktown dead in water after divide by zero (R 19 88)
The Navy's Smart Ship technology is being considered a success, because it has resulted in reductions in manpower, workloads, maintenance and costs for sailors aboard the Aegis missile cruiser USS Yorktown. However, in September 1997, the Yorktown suffered a systems failure during maneuvers off the coast of Cape Charles, VA., apparently as a result of the failure to prevent a divide by zero in a Windows NT application. The zero seems to have been an erroneous data item that was entered manually. Atlantic Fleet officials said the ship was dead in the water for about 2 hours and 45 minutes. A previous loss of propulsion occurred on 2 May 1997, also due to software. Other system collapses were also indicated. (One quote suggested the ship had to be towed, but another refuted that.) [Source: Gregory Slabodkin, Software glitches leave Navy Smart Ship dead in the water, Government Computer News, 13 Jul 1998, PGN Stark Abstracting] Discussion in RISKS included further comments about Windows memory management, the use of NT, smart-ship technology, and COTS in battle-critical applications (R 19 88-92); doubts about official reports (R 19 91) and confusions therein (R 19 94), as well as speculations on the hardware behavior (R 19 92-93), and still more discussion (R 19 94). This case holds many lessons for the future, in the true spirit of RISKS, including a reminder from the 19th Century British Navy (R 19 89).
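The underlying pattern - one unchecked operator entry propagating into a divide by zero that cascades across the network - is simple to sketch. The function and field names below are hypothetical; the details of the actual NT application were not published:

```python
# Sketch of the Yorktown failure mode: a manually entered zero reaches
# an unguarded division, and the resulting exception takes down far
# more than the one calculation. Names and units are illustrative.

def flow_rate_unguarded(quantity, interval):
    return quantity / interval          # dies on a bad zero entry

def flow_rate_guarded(quantity, interval, fallback=0.0):
    """Validate operator input instead of letting one zero cascade."""
    if interval == 0:
        # flag the bad entry and degrade gracefully
        return fallback
    return quantity / interval

print(flow_rate_guarded(500.0, 5))   # 100.0: normal case
print(flow_rate_guarded(500.0, 0))   # 0.0: bad entry contained, not a shipwide failure
```

The deeper lesson is that input validation and fault containment, not the division operator, were the missing pieces: a data-entry error should never be able to leave a warship dead in the water.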
Revisiting the USS Yorktown dead in the water (Mike Martin, R 20 37)
The March 1999 Scientific American included a letter from Harvey McKelvey, former director of Navy programs for CAE Electronics, the firm that apparently built the misbehaving Windows NT application on the Yorktown (R 19 88 ff.), whose failure was widely attributed to an unchecked divide by zero. [PGN-ed]
McKelvey writes that the failure "was not the result of any system software or design deficiency but rather a decision to allow the ship to manipulate the software to stimulate [sic] machinery casualties for training purposes and the `tuning' of propulsion machinery operating parameters. In the usual shipboard installation, this capability is not allowed." McKelvey adds that CAE Electronics expressed "serious concern" when this test was proposed.
So it seems that as long as there are no "machinery casualties", everything will be fine. Then again, the incident may have provided useful information to improve system robustness. (Mike Martin)
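The Yorktown failure mode described above -- an operator-entered zero reaching an unguarded division -- is easily sketched. The function and field names below are invented for illustration; the actual Yorktown application is not public:

```python
# Hypothetical sketch of the reported failure pattern: a manually
# entered data item of zero reaching an unguarded division.
# Names are invented; they do not describe the actual Navy software.

def fuel_rate(distance_nm: float, elapsed_hours: float) -> float:
    """Unguarded version: an erroneous manual entry of 0 crashes it."""
    return distance_nm / elapsed_hours

def fuel_rate_guarded(distance_nm: float, elapsed_hours: float) -> float:
    """Guarded version: rejects the bad data item instead of crashing."""
    if elapsed_hours <= 0:
        raise ValueError("elapsed_hours must be positive; reject the entry")
    return distance_nm / elapsed_hours
```

The point is not the arithmetic but where the check lives: a manually entered field must be validated at the point of entry, so that one bad keystroke cannot propagate into a systemwide collapse.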
Chinook helicopter engine software implicated? (Mike Ellims) (R 19 51)
In 1994, a Chinook helicopter crashed into hills on an island off the coast of Scotland, killing 29 people. At the time the engine control software was absolved of blame, although problems with it were known to exist. The Minister of Defense was quoted as saying of the software that 485 observations were made but none was considered safety-critical.
In recent weeks Channel 4 in Britain raised the question of whether there were actually serious problems with the software, via a leaked report from EDS-Scicon. This report listed 56 category-1 errors (the most serious), which indicate either a coding error or non-compliance with documentation. A further 193 errors were listed as category-2 errors, which relate to the quality of the code. It was further alleged on Channel 4 that the RAF test pilots who develop operational procedures and the like for new aircraft refused to fly the helicopter. The aircraft was introduced into operational service, but with restrictions on load that do not apply to the Mark-1 version. The official line is that there is no shred of evidence to suggest that anything other than pilot negligence caused the crash. However, there is some possibility that another investigation into the crash may occur.
Stansfield Turner's new book includes near-war risk (R 19 43)
Admiral Stansfield Turner's book, Caging the Nuclear Genie, describes an incident that occurred on 3 June 1980 when he was President Carter's CIA director. Colonel William Odom alerted Zbigniew Brzezinski at 2:26 a.m. that the warning system was predicting a 220-missile nuclear attack on the U.S. It was revised shortly thereafter to be an all-out attack of 2200 missiles. Just before Brzezinski was about to wake up the President, it was learned that the "attack" was an illusion - which Turner says was caused by "a computer error in the system." His book makes various suggestions that would greatly reduce the threats of accidental nuclear war. "We have had thousands of false alarms of impending missile attacks on the United States, and a few could have spun out of control." [Source: Keay Davidson, San Francisco Examiner, in the San Francisco Sunday Examiner and Chronicle, 19 Oct 1997, p. A-17.]
Missile passes American Airlines Flight 1170 over Wallops Island. At 1:45pm on August 29, 1996, American Airlines flight 1170 was flying over Wallops Island, Virginia, en route from San Juan to Boston, when the captain reported "a missile off the right wing". The location is close to the Wallops Flight Facility (Section ), with nearby Navy installations at Norfolk and Lexington Park. 35
F-16 risky incidents involving TCAS. Several incidents involved F-16s and commercial airliners with Traffic Alert and Collision Avoidance System (TCAS).
1. On February 5, 1997, two Air Force F-16s closed on a Nation's Air Boeing 727 passenger jet heading for New York's JFK Airport. A TCAS alarm caused the 727 pilot to take evasive action, flooring three passengers and crew members. This occurred in a fairly large restricted area through which the 727 had been cleared to fly. One of the F-16 pilots had earlier identified the 727 as a passenger plane, but continued to chase it "as an intruder into his airspace". The instructor pilot told his trainee pilot to stay out of the way "till this, uh, bozo gets out of the airspace." He was eventually ordered to stop the chase, but "the command may have been delayed because the fighter pilot was on the wrong frequency" (according to the Air Force report).
2. On February 7, 1997, four Air National Guard F-16s from Andrews Air Force Base passed an American Eagle commuter plane bound from Raleigh to NY. Three of the F-16s were above the commuter plane, one below. A TCAS alarm caused the American Eagle pilot to take evasive action.
3. On the same day, two Air Force F-16s entered the safety zone around an American Airlines jet over Palacios, Texas.
4. On the same day, two Air Force F-16s entered the safety zone around a Northwest Airlines jet over Clovis, New Mexico.
The Air Force insists that none of these cases was a close call (that is, with less than 500 feet separation), and that such close encounters have happened routinely in the past without causing concern -- before the advent of TCAS. So, we can chalk this up either as an indication that TCAS works (albeit too well?), or as a failure of the Air Force to understand the risks of false alarms in someone else's safety system!36
Another TCAS incident. An erroneous command in TCAS nearly resulted in a midair collision on June 4, 1995, involving a United 737 and a Viscount Air Services 737, both on approach. After both aircraft received a TCAS warning, the United airplane began to climb from 10,000 feet to 12,000 feet while the Viscount plane started to descend to 10,000 feet. The two aircraft came within 200 feet of each other before controllers instructed the Viscount flight to return to 11,000 feet. The incident occurred in the "Northwestern portion of the U.S."37
***APPEND TO A FLY-BY-WIRE ITEM:*** 38
NY Air Route Traffic Control Center computer failure. The NY ARTCC computer lost significant service capability twice on the evening of May 20, 1996 -- the first time for 23 minutes, and the second time for about an hour, one hour later. The FAA had installed new software four days earlier.39
Power outage downs Pacific Northwest air traffic.
A technician accidentally pulled the wrong circuit board, cutting off all
power to an air-traffic control center for five minutes on January 6, 1996.
150 flights were delayed for more than an hour, throughout the Pacific
Northwest. Controllers used car cell-phones to communicate with pilots via
other air-traffic control centers. Backup power failed because the damaged
unit also controlled switchover.40
More ATC problems, fall 1998:
New air-traffic control radar systems fail, losing aircraft at O'Hare
(R 20 07); Dallas-Fort Worth ARTS 6.05 TRACON gives ghost planes,
loses planes (one for 10 miles), one plane on screen at 10,000 feet
handed off and showing up at 3,900 feet! 200 controller complaints
ignored, system finally backed off to 6.04 (R 20 07); near-collision
off Long Island attributed to failure at Nashua NH control center
(R 20 11); TCAS system failures for near-collision over Albany NY
(R 20 11); two more TCAS-related incidents reported (R 20 12);
landing-takeoff near-miss on runway at LaGuardia in NY (R 20 13);
discussion on trustworthiness of TCAS by Andres Zellweger, former
FAA Advanced Automation head (R 20 13)
Dulles radar fails for half-hour 23 Nov 1998 (R 20 10);
discussion of air-traffic control safety implications (R 20 11),
and ensuing comments from a controller (R 20 12)
Computer glitches foul up flights at Chicago airports (Keith Rhodes)
The Chicago-area TRACON in Elgin was testing new software on 5 May 1999 that displays aircraft according to size. Problems with the software caused serious traffic disruption at O'Hare and Midway, and even after fixes were made, delays continued. United cancelled 25% of its afternoon flights, American 13%. [Source: Associated Press, 5 May 1999; PGN-ed; R 20 38]
Brief KC power outage triggers national air-traffic snarl (R 19 51)
Power went out at Kansas City's Olathe Air Route Traffic Control Center at 9:03a.m. CST on 18 Dec 1997, resulting in a "brief and supposedly impossible power failure". A technician routed power through half of the redundant "uninterruptible" power system, preparatory to performing annual preventive maintenance on the other half. Unfortunately, he apparently pulled the wrong circuit board, and took down the remaining half as well. The maintenance procedure also bypassed the standby generators and emergency batteries. The resulting outage took out radio communications with aircraft, radar information, and phone lines to other control centers. Power was out for only 4 minutes, communications were restored shortly thereafter, and backup radar was working by 9:20a.m. However, at least 300 planes were in the Olathe-controlled airspace at the time, and the effects piled up nationwide. Hundreds of flights were cancelled, diverted, or delayed. There were delays of up to 2 hours, and delays continued into the evening. [Sources: 1. Matthew L. Wald, The New York Times, 19 Dec 1997; 2. Kansas City Star, 19 Dec 1997.]
The Times article noted that this is the latest in an "improbable series of problems". The NY Terminal Radar Approach Control (TRACON) was shut down almost completely on 15 Oct 1997, because of dust from ceiling tiles, and a similar situation occurred at the Jacksonville center. The TRACONs at Dulles and O'Hare were closed when fumes invaded the ventilation systems. A response from Bill Murray (R 19 52) suggested these events are not improbable at all.
Review of air-traffic control outages (Peter B. Ladkin)
Outages (complete failures, as distinct from degradation of service due to partial failures) of air-traffic control computer systems, particularly those at the U.S. Air Route Traffic Control Centers (ARTCCs), have been a subject of continuing interest in RISKS (a keyword search on the archive showed well over a hundred references, many of which refer to partial failures or outages).
The U.S. National Transportation Safety Board (NTSB) prepared a report in January 1996 (NTSB/SIR-96-01) on ATC system outages, dealing with incidents between September 12 1994 and September 12 1995 and assessing the FAA's modernisation program. There is a significant `legacy' problem with some of the systems, and the scope of the FAA's Advanced Automation System (AAS), which has been in development since 1981, was significantly revised downwards when the contract for the `be all to end all' system was cancelled by the FAA in mid-1994 because of continual schedule slippage and cost overruns. The NTSB report discusses the architecture of the display systems in the ARTCCs, the nature of the outages (4 power failures, 7 computer problems), and the FAA's upgrade plans (which crudely amount to replacement of legacy systems in an evolutionary manner, rather than a redesign). In the 11 incidents, only one operational error (loss of separation) was reported, although all involved degradation of service (i.e. delays) ranging from the trivial (1) to the extreme (485). The report also notes that many controllers do not appear to be aware of the full range of functions still available to them during partial degradation. The board concludes that the system remains `very safe', even though the failures have a significant economic impact, but is concerned about the safety implications of the increasing number of failures of the older equipment.
The AAS is considered a `high-risk program' by the U.S. General Accounting Office (GAO), which has produced a series of reports, the latest from 1997 being on the WWW. A `high-risk program' is one at `high risk for waste, fraud, abuse and mismanagement' (!).
The NTSB report is now available on the WWW in the compendium `Computer-Related Incidents with Commercial Aircraft', which also contains links to the GAO reports: http://www.rvs.uni-bielefeld.de, click on `Computer-Related Incidents' then `U.S. Air Traffic Control Center Outages and the Advanced Automation System'.
[Other recent additions to the compendium include the Rapport Preliminaire of the French DGA on the A330 test flight accident in Toulouse (Mellor, RISKS-16.19; Jackson, Ladkin, 16.22, Ladkin, 16.23; Hollnagel, 16.31; Ladkin, 16.39); the final report on the Lauda Air B767 accident (Leyland, 11.78; Grodberg, Kopetz, Morris, Philipson, 11.82; Neumann, 11.84; Mellor, 11.95; Leveson, 12.16; Leveson, 12.69); the 1985 China Air B747 accident (Trei, 3.79); and the 1983 Eastern Airlines L1011 Common Mode Failure incident (not itself computer-related, but I believe relevant for understanding common mode failures resulting from imperfect maintenance, as in the 1996 Aeroperu B757 accident, Ladkin, 18.51; Neumann, 18.57; Ladkin, 18.59).]
I am very grateful to Hiroshi Sogame of the Safety Promotion Committee of All-Nippon Airways for his public service in preparing various of these and other reports for the compendium.
Boeing 777 alarms triggered by fruit/frog cargo. False alarms on the Boeing 777 have been triggered by unusual humidity and temperature conditions in cargo holds. For example, a London-bound Emirates aircraft was diverted to Cyprus, due to heavy-breathing mangos, and a Cathay aircraft was evacuated and the fire-suppression system activated -- due to a combination of fruit and frogs. Apparently, tropical fruit (and especially durian fruit) generates enough humidity to be detected as smoke -- thereby triggering the alarms.41
Aviation risks using Windows NT avionics systems (Peter B. Ladkin) (R 19 46)
An article `Windows added to cockpit choices' in Flight International, 5-11 November 1997, p 25, explains that the US company Avidyne has certificated an avionics system based on Windows NT. The hardware supplier is Electronic Designs, which has recently received approval from the FAA (approval for what is not specified). Avidyne is apparently working on Level-C approval, which will allow use of its moving-map display for IFR navigation. One of the benefits is said to be the wide range of interfaces available to other devices.
This is for general aviation. The first Supplemental Type Certificate (required FAA documentation for installation) is for a Mooney piston single.
One major drawback could arise from the hardware. As noted here earlier, the Pentium and Pentium MMX chips may be halted by execution of a single instruction in any mode, independent of any memory protection in the operating system. This instruction (in machine language) is F0 0F C7 C8 in hexadecimal.
If Electronic Design's box is Pentium-based, the FAA could therefore shortly be asked to certificate a design for IFR flight that can be halted in mid-use. Unavoidably. By a few lines of software that are trivial to write. I would hope I am not alone in feeling very uncomfortable about the precedent this might set for acceptance procedures for COTS products in safety-related environments.
This is a static bug, so programs are already available (see RISKS-19.45 for one) which sweep through your software to determine if this instruction is somewhere therein. But I wonder if the FAA will insist that Avidyne install such programs and make it a required part of the use of the equipment that this program is run as part of the pre-flight check before flight under IFR? However, even this does not guard against programs which dynamically generate this instruction.
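A static sweep of the kind just described amounts to searching a binary image for the byte sequence F0 0F C7 C8. A minimal sketch (the function name is mine, and a real scanner would also have to worry about instruction alignment -- and, as noted, can never catch dynamically generated instances):

```python
# Minimal static scan for the Pentium-halting instruction sequence
# discussed above (F0 0F C7 C8). This only finds statically present
# occurrences; dynamically generated instances remain undetectable.

F00F_SEQUENCE = bytes([0xF0, 0x0F, 0xC7, 0xC8])

def find_f00f(image: bytes) -> list:
    """Return the byte offsets of every occurrence of the sequence."""
    offsets = []
    start = 0
    while (pos := image.find(F00F_SEQUENCE, start)) != -1:
        offsets.append(pos)
        start = pos + 1  # continue past this hit, allowing overlaps
    return offsets
```

Even a four-line scanner like this illustrates the certification question: would the FAA require such a check as part of the pre-flight procedure, and what would it be worth given the dynamic-generation loophole?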
For the history of a dynamically-generated instruction that halted the Shuttle flight-control software in 1981, recounted at length in the Communications of the ACM 27(9), September 1984, pp.874-900, see our compendium `Computer-Related Incidents with Commercial Aircraft' (http://www.rvs.uni-bielefeld.de).
Another air-traffic-controller spoofer. Someone using a hand-held, battery-operated transmitter gave out false information to aircraft landing at Manchester airport in the U.K.42
Radar blip lost Air Force One (Doneel Edelson)
The Federal Aviation Administration is investigating whether an air-traffic tracking system went out amid reports that Air Force One vanished from radar screens for 24 seconds. Broadcast reports said the airplane disappeared from radar screens on the morning of 10 Mar 1998 while President Clinton traveled to Connecticut. The long-range radar system at the center reportedly has a history of momentary blips. [USA Today, 11 Mar 1998]
Air-traffic control upgrade problems. The Northeast Air Traffic Control Center in Nashua, New Hampshire, reverted to the old voice-and-paper-slip backup system for 37 minutes on 19 Aug 1998, because of a computer failure. 350 planes were being handled at the time. The system also failed again the next day. Over 100 system failures have been reported already this year at that center. William Johannes, president of the National Air Traffic Controllers Association, said, "It's like a Chevy with 485,000 miles on it and you are trying to stretch it. The longer it goes, the more times we are going to have failures." The mainframes ("aging equipment") are supposed to be replaced beginning in 1999, with a new display system expected in 2000. [Source: David Tirrell-Wysocki, Computer crash cripples New Hampshire air traffic controllers, Associated Press, 21 Aug 1998] (Are we hoping that the Y2K impact on the ATC system will last only 37 minutes?)
The attempted upgrade to ARTS 6.05 at the Dallas-Fort Worth Air Route Traffic Control Center reportedly had ghost planes appearing on screens, and real planes missing from screens. Eventually, the FAA admitted there were problems and backed off to the previous version, ARTS 6.04 (R 20 07). But this is the same software in use experimentally in Chicago and several other heavy-traffic centers. The new software is supposed to solve the Y2K compatibility problem, as well as allow double-stacking of planes flying into Chicago's O'Hare (R 20 07). But controller complaints of malfunctions in the Airport Surveillance Radar-9 system forced Chicago to back up to the earlier version (which is non-Y2K-compliant). After the outage in the new system, a backup system was activated - but it had a 20-mile blind spot to the north. [Source: Gilbert Jimenez, Chicago Sun-Times, 14 Nov 1998.] The new software was still being used in New York, Denver, and Southern California when we went to press. Of course, the standard statement is that "everything is perfectly safe" - although the increased stress on controllers should not be ignored. FAA Administrator Jane Garvey says not to worry, and she will fly cross-country on New Year's Day 2000. Several Internet wags suggested that will present no problems at all - because her plane may be the only one in the air at the time!
By the way, the Salt Lake ATC center lost both primary and backup radar for about a minute on 4 Nov 1998, with the blackout affecting 200 planes in the air over Utah, Nevada, Idaho, Montana, and Wyoming (R 20 05).
Western states ATC glitches. Radio communications between pilots and air-traffic controllers vanished for one minute on August 11, 1995 (until the backup system could be engaged), over a 200,000 square-mile area including all of Washington state and parts of Oregon, California, Nevada, Montana, and Idaho. The problem resulted from a software glitch in a 2-month-old $1.4 billion computer system at the regional center in Auburn, Washington. "The FAA says the new system, which replaces one dating from the 1950s, is more reliable and flexible, safer, easier to repair and provides better voice quality when controllers talk to pilots."43
More air-traffic control problems. Further air-traffic-control snafus occurred in Chicago, Miami, Washington DC, Dallas-Fort Worth, Cleveland, New York, Pittsburgh, and Oakland, California, in a very short period in the summer of 1995. These cases are documented in the RISKS archives. There were three outages in the Chicago center in one week in July 1995. In one of two Oakland outages, on August 9, 1995, the ATC system lost all radar and radio contact with airborne planes, during maintenance. In Miami on August 12, 1995, lightning knocked out the main power and the backup for more than an hour. Chicago failed again on September 12, 1995, and Oakland twice more on September 13, 1995, when a microwave link failed. Pittsburgh briefly lost radio and radar contact on September 23, 1995.44 The main systems are in many cases over 30 years old, and the backups even older.
Aeroperu 757 crash. The fatal crash of Aeroperu Flight 603 was blamed on the fact that masking tape used in maintenance had not been removed from the left-side static port sensors.45
Korean Airlines KAL 801 accident in Guam
The Guam KAL 801 crash killed 225 of 254 on board. A bug was uncovered in upgraded software that had existed worldwide (R 19 29), relating to incorrect barometric altimetry in the Ground Proximity Warning System (GPWS). See a detailed analysis by Peter B. Ladkin and other discussion (R 19 37-38).
Dominican Republic 757 crash. Investigators looked into the February 6, 1996, Boeing 757 flight that ended in the ocean, killing all 189 people aboard. Early reports suggested that the disaster may have been due to a faulty airspeed indicator that misled the pilots into believing that their speed was adequate when they were flying at 7000 feet.46
Airbus autopilot failure? (Chuck Weinstock)
On 19 Apr 1999, an Air India Airbus 320 en route from Singapore to Bombay via New Delhi apparently had an autopilot failure at 27,000 feet, resulting in a dive that injured three crew members (two seriously) and an infant. The pilot was able to regain control, and manually flew the jet to Bombay. [Source: AFP, 19 Apr 1999]
Another London train crash. A London commuter train carrying about 400 passengers from Euston Station crashed into an empty train heading into Euston Station, killing one passenger and injuring about 100 near Watford Junction in Hertfordshire, 20 miles north of London, in the afternoon rush-hour on August 8, 1996. Signaling and train systems apparently worked properly.47
Stack overflow shut down new Altona switch tower on its first day. Klaus Brunnstein reported that on Sunday evening, March 12, 1995, the Bundesbahn (German Railway) attempted to replace its old railway switch tower at the heavily used Hamburg-Altona station, installing a fully computerized system from Siemens' railway technology branch. However, the central computer failed immediately. Two days later, Siemens' experts finally identified a stack-overflow condition that resulted in an endless loop, and the bug was finally fixed by Wednesday morning. Nevertheless, because the switchmen were not accustomed to the new system, there was still only restricted traffic days later. Apparently, the programmers had assumed that the stack-overflow routine would never be used!48
S-Bahn stopped by new switching software. Debora Weber-Wulff reported that in October 1996, the Berlin S-Bahn installed new light-rail switching software on the same weekend that the light rail was moved back from the regular train track to its own tracks -- which had been under repair. The tracks were cut off all weekend, with buses attempting to move passengers. The software was installed at a central switching board, so that the transportation company can save the money they would otherwise pay people to manually move the switches. The software kicked in, and all went well until rush hour -- when a stack overflow occurred, as in Hamburg! (Siemens also wrote this software -- perhaps it was the same code?) It took hours to get the system back up.49
New York City subway crash. A New York City subway train crashed into the rear end of another train on the Williamsburg Bridge on June 5, 1995. The motorman apparently ran through a red light. The safety system did apply emergency brakes, as expected. However, the safety parameters and signal spacing were set in 1918, when trains were shorter, lighter, and slower, and the emergency brake system could not stop the train in time.50
Amtrak mainline train collision in Maryland. A train wreck occurred in February 1996, when an Amtrak train leaving the Washington area was switched around a stopped freight train; it then had a head-on collision with an inbound MARC train that had failed to slow for a warning signal and was going twice its expected speed. The warning signal for the inbound train had previously been moved to a position before the station stop, from its earlier position after the station.51
Amtrak ticket system breaks down. On Friday, November 29, 1996, Amtrak's nationwide reservation and ticketing system was rendered almost useless by a breakdown in the network, during the Thanksgiving weekend -- usually the heaviest travel weekend of the year. The outage caused enormous confusion and delays, because agents typically had no printed schedules and fare tables, and had to issue tickets by hand!52
Washington D.C. Metro crash kills operator. In the monster Washington D.C. snowstorm in early February 1996, a Metro train operator was killed when his train ran into the back of a parked train at the Shady Grove station, while he was taking the train out of service. There was considerable early confusion about whether the train was running on automatic, whether the operator had requested cutover to manual control, and whether that request had been denied. Apparently the request was made and denied, on the grounds of conforming to standard practice. That standard practice has now been changed.53
MARTA train jumps track. On June 1, 1996, a commuter train operated by the Metro Atlanta Regional Transit Authority (MARTA) had a car leave the track, causing injuries to 19 people and much embarrassment for the "Official Spectator Transportation System" for the Olympic games. According to local TV news and newspaper reports, the train had stopped before a red signal, apparently on automatic control. The operator called dispatch, requesting permission to go to manual. Permission was granted, and the operator proceeded through the red signal -- setting off alarms. The train was stopped and put into reverse. As one of the middle cars passed over a crossover switch some or all of its wheels were lifted and displaced. The train stopped very suddenly, tossing the operator and 18 passengers from their seats.54
Trains fail to trigger computerized crossing gates. The Long Island Rail Road tested three level crossings after a train passed one of them and its driver had noticed that the gates did not operate. These three crossings in Sayville all use the same computer system and are the only such systems on the LIRR. The failure proved to be reproducible at two out of the three.55
Union Pacific rolling (?) stock (Daniel P. B. Smith)
Following Union Pacific's assimilation of Southern Pacific, to form the nation's largest railroad, UP has been unable to accurately track its freight cars, resulting in gridlocks and lost trains - most visibly in the southern corridor from LA to Texas, the Gulf Coast region, and the central corridor from Oakland to Chicago. There are major bottlenecks in LA, North Platte, Chicago, and Houston. Integrating the computer systems was reportedly "more difficult than anticipated."
There are many horror stories, including a load of liquid gas that had "virtually evaporated into thin air by the time it arrived;" it took 51 days to ship a load of plastic resin from Dallas to Fort Worth; a shipment from Memphis to California by way of Little Rock, then Memphis, then Little Rock, then Memphis, then Little Rock, then El Paso... Mr. Lundgren of Englin Cotton Oil Mill reported watching one of his own freight cars on UP tracks barreling past his office. "A few days later, he saw it pass again in the opposite direction." [Culled by PGN from Daniel's submitted item by Anna Wilde Mathews and Daniel Machalaba, Wall Street Journal, Monday, 13 Oct 1997, p. B1, and another detailed item by Carl Nolte and Kenneth Howe, San Francisco Chronicle, 11 Oct 1997, D1. Massive grain backlogs and storage problems were also noted (R 19 43).]
Computer crash impacts Washington DC Metro (Epstein Family) (R 19 50)
According to The Washington Post (17 Dec 1997), a computer failure caused 20-minute delays on Metro's Red Line. "The problem occurred when workers in Metro's downtown central control room tried to add an accessory to the main computer that monitors trains' positions. The computer crashed and came back on line only when the accessory was detached, Metro officials said." No indication what the "accessory" was or why it caused a crash.
Washington D.C. Metro stops payments on troubled computer (Scott Lucero) (R 19 71)
The Washington Post (29 April 1998) reported that Washington DC's Metrorail stopped payment on a system that pinpoints the position and operation of every train in the 92-mile system and controls 470 switches and 500 signals. Metro officials say that the system has crashed 50 times in the last 15 months. Screens go black, images jiggle, duplicate train numbers and slow response occur frequently according to officials. According to the Metro General Manager, "First we couldn't get the source code from [the contractor]. Then when we got it, it was in foreign language because they had a contractor work on it overseas... They've had people come and go. There has not been total continuity." A familiar RISK, not having developers close to the system. I used to think that not having the escalators work was a big deal - it appears they've got bigger problems.
Runaway train on Capitol Hill (Thomas A. Russ) (R 20 13)
The automatic brakes on the Senate subway between the Russell Office Building and the Capitol failed in December 1998, sending the train crashing into a wall and slightly injuring the operator and the two other people on board. In the best congressional spirit, a spokesman for the architect stressed that "there was no operator fault involved. It's all automatic, and it's supposed to stop by itself." [Source: Los Angeles Times, 16 Dec 1998. Especially intriguing are the spokesman's comments. There is also the nagging question of why there is an operator on a fully automated system in the first place. TAR]
Taipei subway computer crash. Taipei's only subway-line service was completely disrupted on June 3, 1996, due to the simultaneous shutdown of both the main computer and the backup system. At 9:27 a.m. that morning, the main control computer suddenly printed out 14 pages of extraneous program code. Eight minutes later, both the main control computer and the backup system went down. The control center ordered an emergency shutdown of the entire system (without incident). Maintenance engineers, with the help of a Matra engineer, were unable to reboot either system. Digital engineers arrived shortly and discovered that one of the rebooting programs was missing. They reloaded the rebooting program from backup media, and the subway system returned to normal after four hours and thirty-four minutes. (Incidentally, Matra also made the software for Ariane-5, whose crash the very next day is noted in Section .)56
BART ghost train, software crash, system delays; old cable. Bay Area Rapid Transit (BART) had another bad day on December 19, 1996. At 7am, a ghost train appeared in the computer system at the San Francisco 24th Street station, requiring manual operation through that station. Independently, three trains had to be taken out of service because of mechanical problems. All of this caused a 15-minute delay systemwide. Later, a computer crash caused delays up to 30 minutes systemwide, from 5:50 p.m. to 9:45 p.m.
BART also had a serious power cable outage in the transbay tunnel on December 12, 1996. That cable problem was traced to sloppy maintenance after the cable was damaged when it was initially installed in the early 1970s. BART management observed that prior to that outage a complete cable overhaul had been considered to be an urgent step in upgrading the aging infrastructure.57
Channel Tunnel Syndrome: unexpected ghost trains and emergency stops. On June 8, 1994, a train traveling through the Channel Tunnel from England to France was evacuated after an emergency light came on in the driver's cabin. The drivers of the 10 lorries on the train were evacuated to the English end of the Tunnel, through the access tunnel. This was the first would-be emergency on the Chunnel (which officially opened in May 1994), although it turned out to be a false alarm.58
Apparently, unanticipated high levels of sea water on the tracks in the Chunnel triggered alarms to drivers and train controllers, forcing an emergency stop and manual inspection -- typically causing a 20-minute delay. In April 1995, the Chunnel averaged about 5 emergency stops a week, out of 100 trains a day. A train passing through the Chunnel at 100 mph raises a mist of salt water behind it, which short-circuits a low-voltage connection between the rails and mimics the presence of a train on the tracks. It appears that engineers underestimated the effect of sea water, an excellent conductor of electricity, on trackside electronic equipment. John Wodehouse suggests they may also have underestimated the corrosive effects of salt water.59
Phantom trains down Miami's Metromover inner loop. The downtown 1.9-mile inner loop of Miami's Metromover was closed for more than two days because of "phantom" trains on the track, until the afternoon of April 26, 1995. Metro-Dade Transit Agency technicians attributed the problem to a faulty transmitter in a computer. "Phantom" trains have been a recurring Metromover glitch, one of a long string of computer-related problems that have plagued the system and are likely to continue. MDTA disclosed that in the spring and fall of 1994, sunshine sometimes tripped safety sensors that detect the presence of trains. Those sensors have since been realigned to shield them from the sun.60
Computer crash halts train traffic in 8 states. A computer crash caused various effects on CSX rail service, freezing passenger and freight trains in their tracks for 2 hours in 8 states in the Southeast US, during the evening rush hour on March 27, 1995. This affected 2100 Amtrak passengers and 5000 Tri-Rail commuters in south Florida, and freight trains from Louisiana to North Carolina. Service was restored "under human direction".61
***Section 2.5.1, Insert at end (p.56) before the last sentence.***
The Big One belittled? A similar roller-coaster accident occurred in England on July 7, 1994, on The Big One -- the world's highest and fastest roller-coaster, at Blackpool's Pleasure Beach. Two trains on the new £12 million ride collided 30 feet above ground. Eight passengers had to be cut free, trapped by jammed safety bars. (The bars worked correctly.) 27 people were taken to hospital with minor injuries, while others were treated for shock. One train (going much more slowly than its top speed of 85 miles an hour) collided with the rear of another, which had been slowed by the braking system. Earlier, on the roller-coaster's first day on May 28, 1994, 30 people were trapped 235 feet up after a fault in the computer system.62
Mad-bus disease (Geert Jan van Oldenborgh) (R 19 40)
Nine people were injured, one seriously, when a Dutch long-distance bus suddenly accelerated from the bus terminal behind Eindhoven Central Station, and ran into the station restaurant. The builder acknowledged that these sudden accelerations were a known problem; he suspected that they had something to do with interference on the electronic accelerator pedal from the communications equipment: the 2-way radio, the mobile telephone, and/or the little box that operates traffic lights. No technical shortcomings had been found in previous inspections, but the buses still careen out of control every now and then... The worst-affected 22 out of 178 have now been taken out of service. [Source: NRC Handelsblad, 25 and 26 Sep 1997]
Two out-of-band comments: in case you wondered, a long-distance bus is defined locally as one that goes more than 50km. The linear dimensions of our country are about 200km... Secondly, with regard to the computer-operated storm-surge barrier I reported on earlier, it transpired a week later that the software was in fact not yet ready, and would become operational this autumn. Until then a human would decide when to close off Rotterdam harbour. Fairly typical, I assume... GJ
Bright Field crash in New Orleans. According to John Hammerschmidt of the National Transportation Safety Board, preliminary investigations into the freighter Bright Field crashing into the Riverwalk in New Orleans in 1996 suggest that an oil-pump failure caused the ship's computer to automatically reduce speed. A standby pump kicked in, but under reduced power the ship's maneuverability was decreased. The impact cut a 200-foot swath into shops, a hotel condominium complex, and the pedestrian walkway. A language barrier between the Chinese-speaking captain (and crew) and the English-speaking pilot reportedly may also have contributed. The Liberian-registered 69,000-ton ship was not equipped with a U.S.-recommended voice recorder, and a second voice recorder was not functioning. Coast Guard Captain Gordon Marsh confirmed that large ships lose steering power as often as once a week. Michael Quinlan noted, "The captain also acknowledged forgetting he had a computer override button on his console that could have allowed him to bypass the computer and increase the ship's speed and maneuverability."63
The Royal Majesty. The cruise ship Royal Majesty ran aground off Nantucket in 1995. The explanation ultimately given is that the GPS antenna failed and the alarm was not loud enough to alert the crew to switch to Loran.64
Denver hi-tech baggage handling problems. The opening of the new Denver airport was seriously delayed (with losses estimated at $1 million each day), primarily because of difficulties in getting the $200 million automated baggage-handling system to work adequately. The system, involving about 100 computers, suffered mechanical problems and some software glitches.
*** ADD TO EXISTING ITEM on Seattle Evergreen Drawbridge: After a second incident involving a death, the Evergreen draw span was rebuilt in 1994. The old mechanical system has been replaced by computer controls with a series of safety features that must be manually overseen by the bridge operator.65
Massive failure of Washington D.C. traffic lights. Most of the traffic lights in downtown Washington D.C. went onto their weekend pattern (typically 15 seconds of green per light), rather than their rush-hour pattern (typically 50 seconds of green per light) during the morning rush hour on May 8, 1996. This problem was reportedly caused by a new version of software installed in the central control system. This caused mile-long traffic jams. By the afternoon rush hour, the software glitch had been "fixed". It wasn't clear whether that meant they reloaded the old software or fixed the bug.66
Computer malfunction floods Boulder garages and basements (S.J. Hutto) (R 19 34)
"Officials blamed a malfunctioning computer for five water main breaks late Saturday that cut service to about 40 homes, flooded basements and garages and turned city streets into rushing streams." A computer controlling water pressure gave inaccurate readings, prompting a city worker to open up the mains. [Source: Rocky Mountain News, 25 Aug 1997]
*** Section 2.8, replace last para, Tempo AnDante, p. 67, with
Tempo AnDante? The crawl of two robots. The two Dante robots provide a saga of what can go wrong in a hostile environment. Dante I was descending for exploration inside the Mount Erebus volcano when its fiber-optic control cable snapped only 21 feet from the top of the volcano, immobilizing the robot.67 Dante II, its successor, was much more successful in its August 1994 exploration of the volcanic crater of Mount Spurr in Alaska after the 1992 eruption, and determined that the volcano would be safe for humans. However, its descent was marred by falling rocks, mud, and snow, prior to which its dish antenna had been chewed on by a bear. It survived a power loss, a dead transmitter, and a moisture-induced short in its power-communication tether. However, its ascent was stopped when one of its eight legs failed. A helicopter hoist failed when its tether snapped -- perhaps wrapped around a very sizable boulder. It was finally rescued with human intervention, although with injuries to one graduate student and six of Dante II's legs.68

Five-million-dollar bug? A Tokyo University research team is implanting electrodes in cockroaches to see if their movements can be remotely controlled. However, the controls themselves still have bugs.69
Programmed tunnel-digging robot runs a-muck. A tunnel-digging robot "mole" uses programmed directional coordinates to chew through 70 feet of soil a day. Sewer diggers in Seattle were surprised when the mole did not reappear at its expected exit point, and Anthony Catania was suspicious when his restaurant-supply store began to shake. The misprogrammed mole left a 700-foot hole that had to be filled with concrete, at a cost of $600,000. (The 18-foot-long mole costs $475,000.)70
Electrocauterizer EMI alters pacemaker. Carl Maniscalco reported that an acquaintance had received emergency care in a hospital after accidentally pulling out her dialysis shunt. The attending physician had been informed that she had a pacemaker, but used an electrocautery device in an attempt to stop her bleeding. The electromagnetic interference from the device apparently corrupted the software in the pacemaker. When the problem was finally detected, the manufacturer was able to reprogram the pacemaker, using data transmitted to the still implanted unit as audio tones via a transducer.71
RF EMI turns into pacemaker life-saver. In contrast with the above cases of harmful EMI effects on pacemakers, here is a beneficial one. A 42-year-old man of The Hague in The Netherlands collapsed in front of a swimming pool when his pacemaker failed. A police officer in the vicinity radioed for help - upon which the pacemaker started working again, because of the radio-frequency interference. The officer was able to keep the man alive by using his transceiver until an ambulance arrived.72
More on RFI effects on medical equipment. Radio-frequency interference generated by radios and cellphones has also been known to mess up sensitive medical equipment such as heart defibrillators, diagnostic equipment, and even electric wheelchairs. There is a report of an electric wheelchair "zapped by radio waves" that sent its passenger over a cliff.
A 72-year-old man died in an ambulance when the heart defibrillator device he was on failed due to RFI from the ambulance two-way radio. The ambulance manufacturer had replaced the steel roof with a fiberglass dome, and put the antenna on top.
In a case in which diagnostic equipment indicated a man needed a pacemaker, it was later discovered (after the operation) that the diagnosis had been in error because of RFI from a television set in the same room.
A cellphone used by a mother in the front seat of a car affected the ventilator her child was using in the back seat.
In a hospital ward, various ventilator alarms were triggered whenever the handyman keyed his transceiver.73
Harvard Pilgrim HMO scheduling system creates chaos. The scheduling computers of the Harvard Pilgrim Health Maintenance Organization "broke down" on March 4, 1996, and were unavailable for several days. Nonemergency patients needing to make appointments had to wait until the computers were again available. The medical records system was also down for seven hours on March 6. Harvard Pilgrim indicated this was a "standard database problem." (Terrific standard!)74
Millstone 2 safety risks. Northeast Utilities reported that it had failed to follow proper safety procedures on two occasions in April 1994 at its Millstone 2 plant in Waterford. On April 23, an indicator showed that some of the control rods were stuck. The crew concluded that the problem must have been with the indicator and left for the day. When the new crew arrived, they discovered the rods were indeed stuck, but they failed to shut down the reactor as quickly as they should have and underclassified the seriousness of the event. The operators subsequently failed a Northeast Utilities test on reactor theory and were removed from duty for training. The utility's report blamed the problem in part on the operators' failure to understand reactor theory and a failure of plant management to "fully appreciate the implications" of the safety-related event and to provide sufficient oversight.
The second incident involved a coolant leak from the plant's reactor. In this case, the operators again underclassified the seriousness of the event. Notification of federal authorities was delayed by 16 hours.75
Xerox machine caused nuclear-power plant emergency halt. One of the Swedish nuclear reactors, Ringhals 4, was automatically shut down during a routine safety check. When the computer safety system noticed that the instructions were incomplete (because a page had been truncated when copied), it shut down the reactor.76
Western U.S. power blackouts. More than a dozen states, including California, Oregon, Washington, Utah, Nevada, Wyoming, and Arizona, reported power outages on July 2, 1996. At least 11 separate power plants "inexplicably were knocked off line". Later in the day, plants in Rock Springs, Wyoming, and along the Colorado river also went off line.77 On the following day, parts of Idaho were again blacked out.78
It took until July 20, 1996 -- 18 days later -- for the official cause of the July 2 outage to be announced: an Idaho transmission line that short-circuited when electricity jumped from a low-hanging wire to a tree that had grown too close. The tree, since removed, caused a flash-over in an area about 100 miles east of the Kinport substation in southeastern Idaho. The line carried 345 kilovolts.79 Reportedly, the initial outage was detected but not relayed to the appropriate authorities -- because the operator could not find the correct phone number.
On August 10, 1996, there were further outages that affected 8 million accounts in 8 states, parts of Canada and Baja, with major outages, including propagating air-traffic effects.
On August 13, 1996, electricity for the city of Palo Alto was shut down due to an erroneous signal sent by a neighboring power company in mistaken anticipation of a power surge.80
Misdirected phone call shuts down local power. Mike Winkelman reported that power went out for an hour and one-half for about half of his town of 38,000 when an apparently automated phone call to shut down a power station was directed to the wrong substation.81
Effects of another San Francisco power outage (R 20 11)
At 8:15 a.m. on 8 Dec 1998, a power surge resulted from an attempt to reconnect a San Mateo power substation to the grid after maintenance. Unfortunately, the temporary grounding had not been removed, providing a massive short. This knocked at least two other power plants off line, and affected about 1 million customers in the San Francisco Bay Area - many for two or three hours, some for up to 8 hours. The blackout took down the SFO Airport, the Pacific Stock Exchange, rapid transit, and ATMs, as well as homes, offices, and hospitals. There were reports of people stuck in elevators and problems with home medical equipment. SFO was back up by 9:45 a.m. with emergency generators. SRI International experienced only a power blip, but it was enough to wipe out a bunch of servers throughout the institute; our lab's computers were down for more than two hours. [See a well-informed follow-up discussion by Cathy Horiuchi (R 20 12).]
The widespread consequences of this local outage give us one more reminder (if we need any) of the importance of routine preparedness for foreseeable but not adequately foreseen events. Natural causes tend to surprise us; the possibility of Y2K-related outages should no longer be a surprise.
How a fuse caused a hospital to disconnect from the power grid (Joan Grove Brewer) (R 20 11)
In April 1998, the Valley Medical Center in Renton, Washington, attempted to cut over to its new power cogeneration plant, independent of the local utility's power grid. The staff was apparently not adequately prepared, because it had assumed the cutover would be seamless. Initially, the hospital indeed ran smoothly, but then lights began to flicker, ventilation fans cut out, alarms beeped, and computer screens blinked on and off. [Source: How a $5.9 million power plant brought a hospital to its knees, by Byron Acohido, Seattle Times staff reporter, The Seattle Times, 6 Dec 1998, PGN Abstracting]

Power outage leaves hospitals in the dark (Dave Weingart) (R 20 25)
On 10 Mar 1999, two of the three hospitals that make up Long Island Jewish Medical Center in Long Island, NY were without power for a period of 47 minutes, starting at 5:58pm. Patient care was apparently not impacted, although 2 operations were completed by battery-operated lights, and bags of ice were hauled from the cafeteria to the blood bank to keep things cold. Life-support equipment has an internal battery backup and kept functioning during the outage. Investigations are underway to determine why none of the four backup generators worked.
Kids, let this be a lesson. It's not enough to have a backup system in place; you need to make sure it will work when needed.
Rats take down Stanford and Net connections. Stanford University was without power on October 10-11, 1996, because of a-gnawing rats and a subsequent explosion. This outage also disrupted the BBN Planet Internet hub, affecting Net connectivity for many Silicon Valley companies, and the Websites of the Los Angeles Times and San Francisco Chronicle.82
Rat brings down U.C. Berkeley campus. The entire campus of the University of California at Berkeley was blacked out for almost 6 hours on August 12, 1994, when a rat bridged a power connector. Backup facilities were able to provide limited emergency power during that period.83
Squirrels of the World, Unlight. SRI International experienced its fourth recorded squirrelcide -- which brought down the entire institute on October 12, 1994, for something like eight hours, and created all sorts of internal power surges, despite the isolation supposedly provided by our cogeneration plant hookup. My monitor was fried.84 A 5th squirrelcide at SRI subsequently caused an 18.5-hour institute outage, knocking out both utility and cogeneration power (R 19 96).
Another squirrel tail in Washington State. A squirrel shorted itself between 69,000-volt and 12,000-volt lines on December 13, 1995, and brought down the "high-tech financial hub of Southeast Washington" - affecting 4000 downtown customers, and causing an explosion and fire inside Pacific Power's central substation.85
Squirrels bring down Nasdaq. Nasdaq trading was shut down by an energetic squirrel who apparently chomped on a power line near the stock market's computer center in Trumbull, Connecticut on August 1, 1994. The system failed to perform the automatic switchover to the temporary backup power supply (designed to last until the backup system in Rockville, Maryland, could be brought up), and consequently the market was down for 34 minutes. A similar problem occurred in December 1987.86
Snail causes Liechtenstein's cable TV system to fail. Soccer fans in Liechtenstein were unable to watch the final minutes of a soccer match between the French team of Auxerre and Switzerland's Zürich Grasshoppers when a snail crawled into a socket. The resulting short-circuit caused the entire cable TV network of Liechtenstein to fail in October 1996.87
"Buffer overload" crashes network bridge. Jeff Anderson-Lee reported that custodians at Berkeley during the summer of 1996 plugged in their heavy-duty floor buffers, which tended to blow the archaic circuit wiring. Instead of resetting the breaker, they kept trying other outlets. As a result, the network bridge on that circuit was knocked out, and the two halves of the network were cut off from each other. The custodian who had been trained not to do this was on sick-leave.88
$25m Australian power system runs amok. Failure of an automated system for a Queensland power station (requiring twice as many engineers as the previous system) caused more than $1.5 million damage to machinery at the Swanbank station near Ipswich, when the system failed to prevent a trip (shutdown) that cut oil flow to a turbine, resulting in a bent shaft and leaving the turbine with reduced generating capacity. An automatic alarm system failed, almost two years after it was installed. Improper testing and waiving of commissioning and acceptance testing were implicated.89
Power outage at Russian missile site. The plug was reportedly pulled at a major Russian missile site, because their electric bill had not been paid.90
As the year 2000 approaches, the risks of calendar-clock problems loom large whenever two-digit year fields are used. The number 99 is larger than 00, not smaller, and we can expect all sorts of computer calendar date-time arithmetic to fail whenever the relative order of dates is considered. For example, COBOL programs use a two-digit year field, and COBOL programmers are increasingly scarce. Consequently, some folks are in panic, whereas others have a while longer to plan ahead (MS-DOS bellies up on 2048 Jan 01, and the programming language Ada has a year field that is exhausted after 2099). Some folks believe they are really immune - such as users of Java, which runs out of dates in the year 292271023. As noted in Section , some systems have already run out or will soon (Tandem CLX; Apollo workstations exhaust their time fields on November 2, 1997; the Global Positioning System (GPS) on August 21, 1999; Ed Ravin noted the Fujitsu model SRS-1050 ISDN display phones had their clocks stop at 1994 Sep 30 11:59 PM.91), some later. Pundits are creating estimates of how much it will cost to fix all of the software that is expected to die, beginning at the transition at midnight from 1999 Dec 31 to 2000 Jan 1. A figure of $300 to $600 billion (thousand million) has often been quoted as the estimated worldwide cost. $30 billion was cited as the cost to the U.S. Government, with the prognostication that 30% of the systems would not be fixed in time. Consumers Power Co. in Michigan estimated that their upgrade (begun in 1993) would cost up to $45 million. The average Fortune 500 company was expected to spend $100 million.92
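The underlying arithmetic failure is easy to demonstrate. Here is a minimal sketch (mine, not drawn from any of the systems cited) of how a comparison on two-digit year fields inverts at the century boundary, together with the common "windowing" repair:

```python
def is_expired(expiry_yy: str, current_yy: str) -> bool:
    # Naive comparison on two-character year fields, in the style of
    # many legacy record layouts: "00" sorts before "99".
    return expiry_yy < current_yy

# A card expiring in 2000 ("00") appears long expired in 1999 ("99"):
print(is_expired("00", "99"))  # True -- wrong: 2000 follows 1999

def full_year(yy: str, pivot: int = 50) -> int:
    # "Windowing" repair: interpret years below the pivot as 20xx.
    y = int(yy)
    return 2000 + y if y < pivot else 1900 + y

print(full_year("00") > full_year("99"))  # True -- correct ordering restored
```

The pivot value 50 is an illustrative assumption; real remediation projects chose windows to suit their own data.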
Some effects were already being felt at the end of the 1990s, as systems were unable to handle expiration dates into the 2000s. Scot E. Wilcoxon noted that a Minneapolis newspaper pointed out that five-year planning programs were already at risk in 1995. John Cavanaugh recalled seeing a Computerworld article in 1975, when some programs that did projections 25 years ahead started failing.93 In the United Kingdom, the Department of Social Security in 1996 postponed the ability of divorcing couples to split their pensions until the year 2000, because of the effects on the computer databases.94
Some lawyers are drooling over the expected lawsuits. Some hucksters are selling easy solutions. There is even a report of a Year-2000 Shark who was scamming businesses by offering to fix credit-card systems that allegedly would not work on a card with a year-2000 expiration date.95
*** ADD TO LEAP-YEAR SECTION ***
Leap-day 1996 in New Zealand. A computer glitch at the New Zealand Aluminium Smelter plant at Tiwai Point in New Zealand (South Island) at midnight on New Year's Eve 1996 left a repair bill of more than NZ$1 million. Production in all the smelting potlines ground to a halt at midnight, when the computers unexpectedly all shut down. General manager David Brewer said the failure was traced to a faulty computer software program that failed to account for 1996 being a leap year: the computer was not programmed to handle the 366th day of the year. The same problem occurred two hours later at Comalco's Bell Bay smelter, in Tasmania, Australia. (New Zealand is two hours ahead of Tasmania.) Both smelters use the same program, which was written by Comalco computer staff. Before the Tiwai problem could be fixed that afternoon, five cells had over-heated and were damaged beyond repair. Mr. Brewer estimated the replacement cost at more than NZ$1 million.96
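The smelter failure is the classic day-366 bug. As a hedged sketch (obviously not Comalco's code), the Gregorian rule the software missed is only two lines long:

```python
def is_leap(year: int) -> bool:
    # Gregorian rule: every 4th year, except centuries not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def days_in_year(year: int) -> int:
    return 366 if is_leap(year) else 365

# 1996 had a 366th day (31 Dec), which the smelter software did not expect.
print(days_in_year(1996))  # 366
print(days_in_year(1900), days_in_year(2000))  # 365 366
```

Software that hard-codes 365 days, or tests only `year % 4`, passes quietly for years at a time and then fails all at once, exactly as it did at Tiwai Point.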
All the News That Fits We Print: No-Op-Ed. On July 10, 1995, Simson Garfinkel gave me a copy of The New York Times Op-Ed page from that day's National Edition. The page was mostly blank, with a nicely black-boxed obit-like message: "TO OUR READERS, Because of a computer breakdown, some copies of The Times were printed without the Op-Ed page."97
Logic flaws. There was lengthy discussion in the on-line RISKS relating to the Pentium floating-divide chip flaw that resulted from a table incorrectly copied from the Intel 486 chip design.98 A flaw in the Intel Orion 82450 chipset (an auxiliary to the Pentium Pro) was also discovered, although it affected performance and not correctness.99 Jim Haynes recalled earlier floating-point flaws in the early VAX 11/780s and the General Electric 635.100 Chris Phoenix noted a software flaw in the built-in BASIC on the TI 99/4A computer.101
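The Pentium divide flaw could be observed directly with a widely circulated pair of operands. On a correct FPU the residual below is essentially zero; the flawed chip reportedly returned 256:

```python
# Famous FDIV test case: 4195835 / 3145727.
# A correct divide, multiplied back out, should recover x almost exactly.
x, y = 4195835.0, 3145727.0
residual = x - (x / y) * y
print(residual)  # essentially 0 on a correct FPU; about 256 on a flawed Pentium
```

This is a sketch of the published check, not Intel's own diagnostic; the point is that a table-lookup error in division produces a residual far larger than ordinary rounding error.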
Microsoft mathematics bugs: Calculator and Excel. For several years a mathematics flaw existed in the Calculator applet that came bundled with Microsoft Windows. This remained uncorrected in several releases over a considerable period of time. A new flaw surfaced in Microsoft's Excel spreadsheet: type or paste 1.40737488355328 into a cell and you will be rewarded, not with the number you expect but with 0.64. If you perform arithmetic with this, it will act as if 0.64 had been entered -- so it is not simply a display error. When the number is used as part of a formula, the error is not apparent.102
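Curiously, the triggering value is exactly 2^47 scaled by 10^-14, which suggests (my observation, not a confirmed diagnosis from Microsoft) that the flaw lay somewhere in binary-to-decimal conversion near a power-of-two boundary:

```python
# The digits of the problem value 1.40737488355328 are exactly
# the digits of 2**47, shifted by a factor of 10**-14.
print(2 ** 47)  # 140737488355328
```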
*** NEW SECTION, COMBINING WHAT IS CURRENTLY IN SECTION 5.7 on accidental financial losses. ***

Social Security Administration problems. The SSA botched a software upgrade in 1978 that resulted in almost 700,000 people being underpaid an estimated $850 million overall, as a result of cutting over from quarterly to annual reporting.103 Subsequently, the SSA discovered that its computer systems do not properly handle non-Anglo-Saxon surnames (for example, with spaces as in de la Rosa, or that do not appear at the end, as in Park Chong Kyu) and married women who change their names. This glitch affected the accumulated wages of $234 billion for 100,000 people, some going back to 1937.104
Glitch causes 4 billion euro overdraft (Monty Solomon) (R 20 30)
Although the January switch to the single European currency was smooth at most European banks, a prominent German discount bank and its customers this week were acutely aware that not all possible euro-caused glitches have been found. Customers of Bank 24, a discount bank owned by Deutsche Bank AG, were astonished [on 6 Apr 1999?] to find that their securities accounts appeared to be overdrawn to the tune of 4 billion euro ($4.32 billion). An oversight connected to the change to the euro was responsible for the error, affecting 55,000 customers. (Source: Mary Lisbeth D'Amico, IDG, 12 Apr 1999)
Nasdaq Computers Crash. The U.S. automated over-the-counter Nasdaq marketplace went down for 2.5 hours on the morning of July 15, 1994, when the computer system died. (It was finally restored just before N.Y. lunchtime.) The problem was traced to an upgrade to new communications software. One new feature was added each morning, beginning on Monday. Thursday's fourth new feature resulted in some glitches, but the systems folks decided to go ahead with the fifth feature on Friday morning anyway -- which overloaded the mainframes (in Connecticut). Unfortunately, the backup system (in Rockville, MD) was also being upgraded, in order to ensure real-time compatibility. The backup died as well. The backup system is "really for natural disasters, power failures, hardware problems that sort of thing," said Joseph R. Hardiman, President and CEO of Nasdaq. "When you're dealing with operating software or communication software, it really doesn't help you." Volume on the day was cut by about one third, down from a typical 300 million shares. The effects were noted elsewhere as well, including several stock indexes, spreading to the Chicago options pits, trading desks, and the media. That in turn affected the large stock-index mutual funds.105 (Squirrel-caused Nasdaq outages are noted in Section 2.10.2.)
NY Stock Exchange halted for one hour. The New York Stock Exchange opened an hour late on December 18, 1995, after a weekend spent upgrading the system software. At 9:15 a.m. on Monday, it was discovered that there were serious communications problems in the software between the central computing facility and the specialists' displays. The problem was diagnosed and fixed by 10:00 a.m., and the market reopened at 10:30 a.m. It was the first time since December 27, 1990, that the exchange had to shut down. The Chicago Mercantile Exchange, Boston Stock Exchange, and Philadelphia Stock Exchange all waited until the NYSE opened as well. (The monster snow storm on January 8, 1996 subsequently caused a late start and an early close.)106
Alberta Stock Exchange shutdowns. For the second time in six sessions and the third time in 1997, the Alberta Stock Exchange lost a day of trading on March 11, 1997, because of system problems. Fixing the software took all day and night. Previous software errors had stopped trading all day on March 4, and earlier, in May 1996 and January 1997.107
Johannesburg Stock Exchange computer failures. The Johannesburg Stock Exchange's automated trading system (JET) began fully automated trading on June 10, 1996. A failure on July 1 was attributed to "human error." On July 22, 1996, the system and the backup system both failed after only forty minutes of trading and were down for the rest of the day.108
Washington Post runs old stock prices; file-name confusion. The Washington Post printed a full page of old stock prices in their business section in late December 1994, because a space was left out of a file name.109
NASD loses records on 20,000 brokers. The National Association of Securities Dealers (NASD) is the self-regulatory organization that oversees broker-dealers and their employees in the United States. It maintains a database of brokers and any disciplinary actions taken against them. Unfortunately, 20,000 records were accidentally purged from their files, and there was no backup file.110
Computer glitch gives Schwab investors instant loss of balance. A program error caused Schwab's computer systems to omit a significant number of mutual funds when investors used Telebroker to track holdings by phone, leading some of them to believe themselves broke. The problem existed for two days, scaring scores of investors. Janus, Putnam, and Schwab's own funds were among those omitted from net asset calculations.111
Rough days on the stock markets (PGN)
With the huge fluctuations in stock prices on 27-28 Oct 1997, the NYSE and Nasdaq each handled over a billion shares for the first time ever on 28 October 1997, with the NYSE at 175% of the previous blockbuster day. The bad news is that those folks who relied on the Internet to do their panic trading were in for a rough time. There were huge numbers of e-trades already queued up before opening, causing an early traffic jam. Joseph Konen of AmeriTrade Holding blamed some of the delays on limitations of its firewall technology. Many would-be Internet buyers and sellers simply could not get access, in part because their Internet service providers were saturated. Many customers were blocked out because others were tying up lines just to monitor the market. (Illustrating the extent to which Internet trading has become a part of the markets, Schwab normally does 35 percent of its trading on-line; yesterday's trading of more than 300,000 on-line transactions more than doubled their Monday load and tripled their typical day.) Conventional trades were also affected. [Steve Bellovin, Frank Carey, and Nick Bender gave lots of details, including Nick noting the effects on Nasdaq of a sequence-number overflow from 999,999 to 000,000 (R 19 44).]
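Nick Bender's observation about the sequence-number wrap from 999,999 to 000,000 is a generic hazard of fixed-width counters. A hypothetical sketch (not Nasdaq's actual message format):

```python
def next_seq(seq: int, digits: int = 6) -> int:
    # A fixed-width decimal counter silently wraps at 10**digits.
    return (seq + 1) % 10 ** digits

print(next_seq(999_999))  # 0 -- message 1,000,000 reuses sequence number 000000
```

Any downstream consumer that assumes sequence numbers increase monotonically (for gap detection or ordering) misbehaves at the wrap, which is why record-volume days expose these limits.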
Chemical Bank's ATMs go down after snafu. Chemical Bank's ATMs were out of commission for more than five hours, beginning at 6:45 a.m. on July 20, 1994. A routine file update was botched, overloading the computer system. This came six months after Chemical systematically charged its customers twice for cash withdrawals.112
Patched software threatens $26 billion federal retirement fund. Inadequate configuration control often presents serious risks. "An audit of the $26 billion federal employees' Thrift Savings Plan found that ineffective control of software development has left the plan vulnerable to processing interruptions and may have compromised its data integrity." The audit found that between 1990 and 1993, more than 800 changes were made annually to the software; about 85 percent of 1993 updates, mandated or emergency changes, bypassed upfront quality assurance database testing; comprehensive quality assurance testing was rarely performed; six programmers (17 percent) accounted for more than 40 percent of all 1992 and 1993 TSP software changes, for which there was little documentation.113
Fidelity Brokerage computer problems. Fidelity Brokerage Services (a discount stock brokerage in London) rushed a new system into operation in April 1996 without adequate testing. As a result, they had more than 50 people working 14-hour days to sort through and manually correct three months of records ("late bookings of dividends and other problems"). British authorities forced FBS to stop taking new customers until the problems were solved.114
Interac. On November 30, 1996, the Canadian Imperial Bank of Commerce Interac service was halted when an attempted software upgrade failed, affecting about half of all would-be transactions across eastern Canada.115
German Railway payroll software glitch. The German Bundesbahn failed to meet its payroll correctly for four months running, because of software problems resulting from new pay adjustments in the privatized rail system. Some people didn't get paid at all.116
Bank goof creates millionaire. Howard Jenkins was a multimillionaire for about half a day, after an ATM gave his balance in the tens of millions. Apparently, an error had resulted from his requesting a hold on his account after he lost his checkbook. He withdrew $4 million in cashier's checks and cash, but returned later with his lawyer to return the money. The bank blamed a computer error.117
ATM problems in Canada. Toronto-Dominion Bank's automated teller machines crashed for most of the weekend in October 1996, affecting 2000 ATMs. Their debit payment system was also down.118
Chase Manhattan computer glitch affects thousands. As a result of a few missing keystrokes that would have properly defined the recipient list, about 11,000 out of 13,000 of Chase Manhattan Corp.'s secured credit-card customers received a letter intended for just 89 customers -- informing them that their accounts were in default and could not be converted to unsecured accounts. The screwup was blamed on an outside firm that administers the secured card program.119
Barclays Bank banks big-bang bump-up.
In one of the rare success stories that can be found in this book (primarily
because there are so few to report), Barclays Bank shut down its main
customer systems for a weekend, and seamlessly cut over to a new distributed
system accommodating 25 million customer accounts on Monday. This system
replaced three incompatible systems. It is rumored that Barclays spent at
least £100 million on the upgrade.120
Cases of accidental financial losses are given in Section , following intentional financial frauds in Section .
Intuit tax glitches. Flaws were reported in the PC and Mac versions of TurboTax and MacInTax 1040 for 1994. These flaws were triggered when transferring tax data to the tax package from other software, such as Quicken. Intuit Inc. estimated that the flaws would affect only about 1% of the users.121 Intuit also had a security flaw that could have enabled one user to download another taxpayer's returns, because the password for the Intuit master computer was embedded in a visible debug file.122
Tax preparation programs. PC Magazine did a comparison of twenty different tax-return packages and discovered that each one computed a different total tax due for the identical input data.123
Microsoft and Lotus spreadsheet errors. Microsoft and Lotus Development have admitted that their spreadsheet products may produce inaccurate results because of an inherent problem with the design of all computers (base conversion, rounding, etc.). Mistakes can occur in precision calculations, of the kind required by engineers and users in the scientific, banking and finance sectors. 124 Steve Bellovin recalled Fred Brooks describing a 1950s program for billing by petroleum usage, where the billing was legally constrained to conform to certain tables - which were incorrect. The solution was to compute another table defining the differences between the computed values and the legally required ones.125
Maryland Lottery Computer Glitch. A software error was blamed for two of the six winning numbers being reported incorrectly to 3,800 lottery outlets. Many people threw out their tickets thinking they lost, while others thought they had won.126
An unlosable casino game. Erling Kristiansen noted that the Dutch radio station Radio 538 set up a "Virtual Casino" on their web server, as a protest against legislation-in-the-making against Internet gambling. Playing is free of charge, and you can win real prizes, presumably paid by the sponsors whose company logos appear prominently. However, if you lose a turn of the game, you can just click Back in your Web browser and undo your loss! This reminded Hal Lockhart of computer-based gambling games that forget to check for negative bets: you make a negative bet and lose on purpose, and the game subtracts your bet from your winnings - that is, adds the absolute value of the bet!127
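Hal Lockhart's negative-bet bug is easy to reproduce. The following sketch (the function names and amounts are purely illustrative, not from any actual casino software) shows the flaw and the one-line validation that prevents it:

```python
def settle_bet(balance, bet, won):
    # Buggy settlement: nothing stops a negative stake.  A player who
    # bets -100 and loses on purpose has -(-100) = +100 credited.
    return balance + bet if won else balance - bet

def settle_bet_checked(balance, bet, won):
    # Fixed settlement: reject non-positive stakes before settling.
    if bet <= 0:
        raise ValueError("stake must be positive")
    return balance + bet if won else balance - bet

# Deliberately losing a -100 bet *increases* the balance:
# settle_bet(1000, -100, won=False) -> 1100
```

The lesson generalizes well beyond gambling: any quantity a user supplies (a bet, a withdrawal, a quantity ordered) needs a range check before it reaches the arithmetic.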
California lottery glitch. The California Lottery started issuing tickets for the following lottery 3 hours early on May 14, 1995, causing anger and confusion. An employee of Sacramento's GTECH, which runs the lottery computer, was conducting routine maintenance when he mistakenly entered a command that closed the draw pool. Lottery officials wisely decided that tickets in that 3-hour window would be eligible for both lottery drawings.128
U.K. lottery terminals downed by satellite network breakdown. National Lottery computer outlets crashed for 15 minutes throughout the United Kingdom on June 10, 1995, when part of the satellite network broke down before noon, ahead of the evening's expected record £20 million jackpot prize-draw.129
Ben & Jerry's first-ever loss. Ben & Jerry's Homemade Inc. reported its first quarterly loss (Q4 1994) since it went public -- due in part to recurring problems in their computerized handling system that delayed the opening of a modernized plant in St. Albans, Vermont.130
*** FOLLOWING EARLIER CASE, p.191. *** Enormous water bills - GIGO strikes again. James M. Politte of Warrensburg, Missouri, reported receiving a water bill for $4,704.88. The water meter had been replaced with a new one, and being new, it read "000000". The previous month's reading was "017060". The computer of course assumed that the numbers on a water meter only go up, and thus assumed that "000000" was caused by the meter rolling over after reaching "999999".131
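The billing logic behind this GIGO case is easy to sketch. In this hypothetical reconstruction (the function names and the plausibility threshold are my own assumptions, not the utility's actual code), any decrease in the reading is interpreted as a wrap past 999999; a simple plausibility check would have caught the meter replacement:

```python
METER_MODULUS = 10 ** 6  # a 6-digit meter wraps from 999999 to 000000

def usage_assuming_rollover(previous, current):
    # Naive billing: any decrease in the reading is treated as a rollover.
    return (current - previous) % METER_MODULUS

def usage_with_sanity_check(previous, current, max_plausible=10_000):
    # Flag implausibly large usage (e.g., a replaced meter) for review.
    usage = (current - previous) % METER_MODULUS
    if usage > max_plausible:
        raise ValueError("implausible usage; flag for manual review")
    return usage

# The Warrensburg bill: old reading 017060, new meter reads 000000,
# so the program charges for (0 - 17060) % 1000000 = 982940 units.
```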
The Absence of Good Software Engineering Bites Security Again
Steve Bellovin noted at several recent meetings that 8 out of the 13 CERT
Advisories issued during 1998 involved security vulnerabilities caused by
buffer overflows. That alarming ratio deserves greater attention. CERT
Advisory CA-99-03 on FTP-Buffer-Overflows continues that tradition: "Remote
buffer overflows in various FTP servers leads to potential root compromise"
(from Netect, Inc.).
Gee whiz, folks, buffer and stack overflows have been with us for years.
For example, Robert Morris's Internet Worm exploited one in 1988. "When
will they ever learn?"
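For readers who have not seen the mechanism up close, the following toy simulation (in Python, where no real memory corruption is possible; the frame layout and names are purely illustrative) shows why an unchecked copy into a fixed-size buffer is so dangerous: bytes written past the end of the buffer land in whatever is adjacent, such as a saved return address.

```python
BUF_SIZE = 16  # the fixed-size buffer
RET_SIZE = 4   # followed immediately by a "saved return address"

def make_frame(return_addr=0x1000):
    # A toy stack frame: a 16-byte buffer, then the return address.
    frame = bytearray(BUF_SIZE + RET_SIZE)
    frame[BUF_SIZE:] = return_addr.to_bytes(RET_SIZE, "little")
    return frame

def unsafe_copy(frame, data):
    # Like C's strcpy(): copies every byte with no bounds check.
    frame[0:len(data)] = data

def saved_return_address(frame):
    return int.from_bytes(frame[BUF_SIZE:BUF_SIZE + RET_SIZE], "little")

# 16 bytes of input are harmless; 20 bytes overwrite the saved
# return address, redirecting "execution" wherever the attacker likes.
```

A real exploit additionally places machine code in the buffer and points the overwritten return address at it, but the enabling defect is exactly the missing length check shown here.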
Electromagnetic interference on defense systems. Patriot defenses and Predator unmanned aerial vehicles reportedly cannot work properly in certain foreign countries (Germany, Japan, South Korea, and Bahrain are particular instances) because of frequency clashes. For example, Patriot missile system radios, radars, and data-link terminals clash with Korean cellular phones; U.S. force pagers clash with Japanese aeronautical systems; crib monitors used on U.S. bases clash with German telephone service. In Bahrain, SPS-40 and SPS-49 radars are unusable because of interference from the national telecommunications services. (See the Defense Week issue of 26 Oct 1998.) "At least 89 telecommunications systems ... were deployed within the European, Pacific and Southwest Asian theaters without the proper frequency certification and host-nation approval." [Noted by Roy Rodenstein, who reminds us of the HDTV interference with Baylor hospital equipment (R 19 62), and points out that quasi-ad-hoc spectrum use must be stemmed in the light of ever-increasing demands on the spectrum.]
*** ADD TO THE END OF SECTION 3.7, Classical Security Vulnerabilities ***
Limitations of cryptographic algorithms. Cryptographic systems are sometimes broken because of inadequate strength of algorithms or flaws therein. For example, the Netscape Commerce Server software uses 40-bit RC4 crypto to encrypt customer transaction data. Two efforts -- by Damien Doligez (a French student) and by a British team -- independently cracked the crypto over the same period of time. It took the French student 8 days using 120 workstations and two parallel supercomputers to search exhaustively for the key - about what is predicted as 64 MIPS-years of processing.132 Subsequently, an MIT undergraduate, Andrew Twyman, used a single $83,000 ICE graphics computer to exhaustively attack Netscape's 40-bit encryption. It was reported that the cost to crack Netscape's exportable crypto thereby falls from $10,000 to $584 a pop.133
In response to a series of cryptography challenges announced by RSA Data Security, Ian Goldberg cracked 40-bit RC5 in 3.5 hours, using 250 machines to exhaust 100 billion would-be keys per hour. Germano Caronni cracked the 48-bit RC5 in 312 hours, using 3,500 computers to search 1.5 trillion keys per hour. The 56-bit DES challenge was broken after 4 months, by exhaustive search.134
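The arithmetic behind these exhaustive searches is worth making explicit. This back-of-the-envelope calculator (the quoted rates are the ones reported above, rounded) shows why each added key bit doubles the work, and why 40-bit keys fell in hours:

```python
def hours_to_exhaust(key_bits, keys_per_hour):
    # Worst-case hours to try every key of the given length; the
    # expected time to *find* the key is half of this.
    return 2.0 ** key_bits / keys_per_hour

# Goldberg's 40-bit search at ~100 billion keys/hour: ~11 hours worst
# case, ~5.5 expected - consistent with his 3.5-hour success.
# Each extra key bit doubles the search, so 56 bits is 2**16 = 65,536
# times harder than 40 bits at the same rate.
```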
Flaws in cryptographic implementations and embeddings. In most cases, it is easier to subvert a cryptographic system without breaking the algorithm or resorting to exhaustive search - typically because of weaknesses in the implementation or in the underlying operating systems. For example, two Berkeley computer-science graduate students identified a security flaw in the Netscape browsing software, exploiting the predictability with which a pseudorandom-number generator created the crypto seed (a unique offset). Knowledge of this weakness enables the key to be reverse-engineered with significantly less than exhaustive effort.135 A similar problem was discovered in Kerberos Version 4.136
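The attack pattern - recover the key by guessing the seed rather than by breaking the cipher - can be illustrated with a small sketch (the toy key generator and the timestamp-as-seed choice are illustrative assumptions, not Netscape's actual code):

```python
import random

def session_key(seed):
    # Toy key generation: a 40-bit "session key" from a deterministic
    # PRNG.  If the seed is guessable, so is the key.
    return random.Random(seed).getrandbits(40)

def recover_key_seed(observed_key, candidate_seeds):
    # The attacker enumerates plausible seeds (e.g., timestamps around
    # the connection time) and replays the key generation.
    for seed in candidate_seeds:
        if session_key(seed) == observed_key:
            return seed
    return None

# A server that seeds from the clock leaves only a few thousand
# candidate seeds to try - trivially fewer than the 2**40 keys of an
# exhaustive search over the keyspace itself.
```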
Paul C. Kocher  described an attack that exploited the timing behavior of various cryptographic implementations including RSA, Diffie-Hellman, and the Digital Signature Standard (DSS), from which secret keys can be derived. This is a truly fascinating piece of work.137
Attacks on cryptographic implementations were described by others as well, particularly involving smart-cards. Boneh, DeMillo, and Lipton explored the effects of introducing random faults through electromagnetic interference, and discovered that they could determine private keys of public-key cryptosystems. Ross Anderson described similar attacks on smart-cards. Biham and Shamir found effective differential fault-induced analyses of symmetric cryptographic systems, including DES, triple DES, RC4, and IDEA.138 In addition, there are some rather efficient potential man-in-the-middle attacks on a variety of well-known authentication protocols (Sarvar Patel).
Risks in cryptographic key management. A 1996 National Research Council study report, Cryptography's Role In Securing the Information Society (a.k.a. the CRISIS report), presents a comprehensive review of U.S. cryptographic policy and an analysis of the risks associated with bad crypto and good crypto. A subsequent report authored by 11 cryptographers and computer scientists (Hal Abelson, Ross Anderson, Steve Bellovin, Josh Benaloh, Matt Blaze, Whit Diffie, John Gilmore, Peter Neumann, Ron Rivest, Jeff Schiller, and Bruce Schneier), The Risks of Key Recovery, Key Escrow and Trusted Third-Party Encryption, is also an important document.
RSA's RC5-56 challenge cracked by Bovine Cooperative (David McNett, R 19 43)
"It is a great privilege and we are excited to announce that at 13:25 GMT on 19-Oct-1997, we found the correct solution for RSA Labs' RC5-32/12/7 56-bit secret-key challenge. Confirmed by RSA Labs, the key 0x532B744CC20999 presented us with the plaintext message for which we have been searching these past 250 days.
The unknown message is: It's time to move to a longer key length
In undeniably the largest distributed-computing effort ever, the Bovine RC5 Cooperative (http://www.distributed.net/), under the leadership of distributed.net, managed to evaluate 47% of the keyspace, or 34 quadrillion keys, before finding the winning key. At the close of this contest our 4000 active teams were processing over 7 billion keys each second at an aggregate computing power equivalent to more than 26 thousand Pentium 200s or over 11 thousand PowerPC 604e/200s. Over the course of the project, we received block submissions from over 500,000 unique IP addresses. [...] Adam L. Beberg - Client design and overall visionary; Jeff Lawson - keymaster/server network design and morale booster; David McNett - stats development and general busybody"
Commerce Secretary calls U.S. encryption policy a failure (Edupage)
Distancing the Commerce Department from the position held by the Federal Bureau of Investigation, Commerce Secretary William M. Daley says that the Clinton Administration's controls on encryption technology are hurting America's ability to compete with other countries. "There are solutions out there. Solutions that would meet some of law enforcement's needs without compromising the concerns of the privacy and business communities. But I fear our search has thus far been more symbolic than sincere... The cost of our failure will be high. The ultimate result will be foreign dominance of the market. This means a loss of jobs here, and products that do not meet either our law enforcement or national security needs." (The New York Times, 16 Apr 1998; Edupage, 16 April 1998)
Ron Rivest's nonencryptive Chaffing and Winnowing (Mich Kabay)
Ronald Rivest has posted an interesting new model for maintaining
confidentiality without using encryption:
Ronald L. Rivest, Chaffing and Winnowing: Confidentiality without Encryption,
MIT Lab for Computer Science, 22 Mar 1998.
<http://theory.lcs.mit.edu/~rivest/chaffing.txt> for full details.
The method has the following key points:
- Sender and receiver desiring confidential communications agree on a basis for computing message authentication codes (MACs).
- The sender breaks the message up into packets and authenticates each packet using the agreed-upon MAC algorithm.
- The sender introduces plausible "chaff" packets, comparable to the true message, and generates random MACs for these packets.
- A receiver with the authorized method for verifying MACs can distinguish the real packets ("wheat") from the chaff by checking the MACs and discarding the chaff.
- Eavesdroppers, lacking a method for verifying MACs, cannot distinguish wheat from chaff. (The chances of the random MAC on bogus chaff appearing valid are infinitesimal. PGN)
This method of enhancing confidentiality would not seem to qualify for regulation under the Export Administration Regulations of the U.S. Department of Commerce, nor would current proposals by the FBI and other elements of the Administration for mandatory key recovery appear to be applicable.
[Ron Rivest has a later version of the document than that which Mich saw when he wrote this, and has added some further clevernesses. This is really a very nifty piece of research. Incidentally, Ed Felten notes that he found a potential inference exploitation by monitoring packet acknowledgements, and has a fix that does not seriously detract from the advantages. PGN]
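A minimal sketch may help make the scheme concrete. Here HMAC-SHA256 truncated to 64 bits serves as the MAC - my choice for illustration; Rivest's note does not mandate any particular MAC - and the function names are mine:

```python
import hashlib, hmac, os

def mac(key, seq, payload):
    # Authenticate (sequence number, payload) under the shared key.
    msg = seq.to_bytes(4, "big") + payload
    return hmac.new(key, msg, hashlib.sha256).digest()[:8]

def add_chaff(key, packets):
    # Wheat: the real packets, each with a valid MAC.  Chaff: random
    # payloads of the same size, with random (almost surely invalid) MACs.
    stream = []
    for seq, payload in enumerate(packets):
        stream.append((seq, payload, mac(key, seq, payload)))
        stream.append((seq, os.urandom(len(payload)), os.urandom(8)))
    return stream

def winnow(key, stream):
    # The receiver keeps only the packets whose MACs verify.
    return b"".join(p for seq, p, tag in stream
                    if hmac.compare_digest(tag, mac(key, seq, p)))
```

Note that no packet is ever encrypted: the confidentiality comes entirely from the eavesdropper's inability to tell which MACs are valid.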
RISKS vol 19 number 87 includes the press release "EFF DES Cracker"
Machine Brings Honesty to Crypto Debate; Electronic Frontier Foundation
Proves DES Is Not Secure announcing the Deep Crack machine, which is the
result of efforts of John Gilmore, Paul Kocher and his Cryptography Research
company, the Electronic Frontier Foundation, and Advanced Wireless
Technologies. The chip design and related technical details are described
at length in EFF's book, Cracking DES: Secrets of Encryption Research,
Wiretap Politics, and Chip Design, published by O'Reilly and Associates.
(See http://www.eff.org/descracker.) The machine was built for less
than $250,000. Even though the approach is essentially brute-force, this
is a remarkable piece of work. Deep Crack was able to solve the RSA DES-II
challenge in less than 3 days (the previous record was 39 days), as well as
another challenge posed by Matt Blaze for finding a DES key with a
particular interesting plaintext-ciphertext symmetry property (also noted in
R 19 87). This work demonstrates clearly that 56-bit keys are no longer
realistic against concerted attacks.
Risks of Key Recovery
The 1997 report, Risks of Key Recovery, by Hal Abelson, Ross Anderson, Steven M. Bellovin, Josh Benaloh, Matt Blaze, Whitfield Diffie, John Gilmore, Peter G. Neumann, Ronald L. Rivest, Jeffrey I. Schiller, and Bruce Schneier has been reissued with a new preface, including this paragraph:
"One year after the 1997 publication of the first edition of this
report, its essential finding remains unchanged and substantively
unchallenged: The deployment of key recovery systems designed to
facilitate surreptitious government access to encrypted data and
communications introduces substantial risks and costs. These risks
and costs may not be appropriate for many applications of encryption,
and they must be more fully addressed as governments consider policies
that would encourage ubiquitous key recovery."
In a further development on key escrow, NSA has declassified the 80-bit Skipjack encryption algorithm and its 1024-bit key-exchange algorithm (R 19 84).
Incidentally, see (R 19 84-86) for comments on the unsuccessful NIST-sponsored effort to establish a key-recovery standard by the 22-member U.S. Government Technical Advisory Committee to Develop a Federal Information Processing Standard for the Federal Key Management Infrastructure (TACDFIPSFKMI).
Thirteen technology companies have proposed a different approach called "private doorbells", by which link encryption would leave unencrypted information tappable at the network nodes. This of course suffers from the inherent vulnerabilities at the nodes, and is not likely to eliminate the desire and need for end-to-end encryption (R 19 85).
Ill-Litt-er-ate comment on U.S. cryptography policy (Steve Crocker)
The 1998 Electronic Privacy Information Center (EPIC) Cryptography and Privacy Conference took place on 8 Jun 1998 in Washington D.C. It was an excellent program, but unfortunately the most memorable moment was a response from Principal Associate Deputy Attorney General Robert Litt. Litt appeared on a panel about US Encryption Policy. During the Q&A, he was asked about the National Research Council's report last year on cryptography policy, Cryptography's Role In Securing the Information Society ("CRISIS").
For those unfamiliar with the report, it is a monumental and thorough work. The committee included a former deputy Secretary of State (Kenneth W. Dam), a former deputy commander in chief of the European command in Germany (W.Y. Smith), a former deputy director of NSA (Ann Caracristi), and a former Attorney General (Benjamin Civiletti). Thirteen of the 16 committee members had full security clearances and received the much-touted behind-the-scenes briefings from the intelligence community. They concluded that the "debate over the national cryptographic policy can be carried out in a reasonable manner on an unclassified basis."
Nonetheless, Litt responded that it was written before he came on board and that he therefore didn't feel obliged to read it. The audience gasped. Undersecretary of Commerce for Export Administration William Reinsch, sitting with him on the panel, looked disgusted. Jim Bidzos, president of RSA, later quipped that it was "a gaffe of EPIC proportions." The hallway talk the rest of the day reflected shock at the combination of naivete and arrogance that continues to pervade the Administration.
See also comments from Undersecretary Reinsch (R 19 81).
Flaws in Web browsers, servers, and Web-oriented programming languages. Numerous security flaws in Webware have been detected and reported by Drew Dean, Ed Felten, and Dan Wallach (Princeton), David Hopwood, Daniel Abplanalp and Stephan Goldstein, Stephen Anderson, John LoVerso, Bob Atkinson, Paul Greene, EliaShim, and others. See Section 3.1 for an analysis of Webware security.139
More Java woes (Edward Felten)
We have found another Java security flaw that allows a malicious applet to disable all security controls in Netscape Navigator 4.0x. After disabling the security controls, the applet can do whatever it likes on the victim's machine, including arbitrarily reading, modifying, or deleting files. We have implemented a demonstration applet that deletes a file.
This flaw, like several previous ones, is in the implementation of the "ClassLoader" mechanism that handles dynamic linking in Java. Despite changes in the ClassLoader implementation in JDK 1.1 and again in JDK 1.2 beta, ClassLoaders are still not safe; a malicious ClassLoader can still override the definition of built-in "system" types like java.lang.Class. Under some circumstances, this can lead to a subversion of Java's type system and thus a security breach.
The flaw is not directly exploitable unless the attacker can use some other secondary flaw to gain a foothold. Netscape 4.0x has such a secondary flaw (a security manager bug found by Mark LaDue), so we were able to demonstrate how to subvert Netscape's security controls. We are not aware of any usable secondary flaws in Microsoft's and Sun's current Java implementations, so they appear not to be vulnerable to our attack at present.
Please direct any inquiries to Edward Felten at (609) 258-5906 or email@example.com.
Dirk Balfanz, Drew Dean, Edward Felten, and Dan Wallach,
Secure Internet Programming Lab, Department of Computer Science,
"SATAN" anticracker software released. Dan Farmer and Wietse Venema developed a Security Administrator Tool for Analyzing Networks (with a rather negative-sounding acronym of SATAN), when Dan was working at Silicon Graphics. SATAN is designed to scan Unix computers on the Internet for the presence of security vulnerabilities, all of which have known fixes. SATAN was released to the world on 5 Apr 1995. Despite a few predictions of disaster, and a security flaw in version 1.0 (which was quickly fixed in version 1.1), there seem to have been no serious negative consequences. (There is some background in The New York Times, March 11-12, 1995.) Kevin Mitnick also allegedly broke into Dan Farmer's Well account and offloaded a copy of SATAN. Dan was subsequently released by SGI (see "Dismissal of Security Expert Adds Fuel to Internet Debate" by John Markoff in The New York Times, March 22, 1995), and has now been rehired by Sun. RISKS has frequently seen discussions of the pros and cons of making such tools available. If the knowledge of vulnerabilities is not promulgated, the system flaws and configuration weaknesses do not seem to get fixed, and that knowledge seems to permeate the malicious-hacker community anyway. If the knowledge is promulgated, then the likelihood of exploitations tends to increase - although it certainly provides an added incentive to clean things up in a greater hurry.140
University of California computerized retirement system flawed. A University of California computer system administers U.C.'s $20 billion retirement program. A recent report by Infortal Associates concludes that the security of the system was very weak, and that the audit trails were flimsy and easily disabled. Furthermore, no one had responsibility for reviewing the audit trails. The system can be accessed from the Internet, and Infortal was able to do so using a widely known password. However, a University spokesman observed that no adverse penetrations had actually taken place, no money has been lost, and the security flaws have been fixed. (Well, how do you know?)141
Authentication in Lotus Notes. The authentication procedure using public-key systems in Lotus Notes (as described in its "Internals on-line book") has security flaws. Lotus's response was that (1) the actual system does not work as described in the manual and (2) how it actually works is proprietary information. Li Gong commented that (1) is dangerous by itself, and that if (2) is true, why pretend to describe the procedure in the first place?142
Windows 95 security hole! Olcay Cirit noted a security hole in Windows 95 that prevents all 32-bit virus-scanning programs from accessing specially named files. DOS deletes files by overwriting the first character of the file name with extended-ASCII character 229 and removing the entry from the File Allocation Table (FAT). Although DOS lets you use file names beginning with that character, Windows 95 claims that such files do not exist.143
Update on Windows NT denial-of-service attacks (Matt Welsh)
Last night, Microsoft posted a security bulletin at http://www.microsoft.com/security/netdos.htm describing the network denial-of-service attacks on Windows NT and 95 systems, commonly referred to as the "New Tear", "Bonk", or "Boink" attacks. The fix to the problem, released in NT 4.0 Service Pack 3, and patches for Windows 95 are available.
From the Microsoft Knowledge Base information on this problem: "The modified teardrop attack works by sending pairs of deliberately constructed IP fragments which are reassembled into an invalid UDP datagram. Overlapping offsets cause the second packet to overwrite data in the middle of the UDP header contained in the first packet in such a way that the datagrams are left incomplete." Interestingly, the information on Microsoft's Web pages seems to be somewhat conflicting, and it's difficult to tell exactly which of the multiple known NT TCP/IP stack bugs are being addressed here, and which patches are needed to prevent them.
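The defensive check that such a patch must implement can be sketched abstractly. In this sketch (the function name is mine, and offsets are given in bytes for simplicity, whereas real IP fragment offsets are counted in 8-byte units), a reassembler rejects any fragment that overlaps an earlier one:

```python
def fragments_are_sane(fragments):
    # fragments: (offset, length) pairs from the IP headers of the
    # pieces of one datagram.  Reject any fragment that overlaps an
    # earlier one, as in the teardrop/NewTear family of attacks.
    end = 0
    for offset, length in sorted(fragments):
        if length <= 0 or offset < end:
            return False
        end = offset + length
    return True

# A normal pair - 36 bytes, then 4 more starting at offset 36 -
# passes; a pair whose second fragment starts *inside* the first
# (offset 24) is rejected instead of being reassembled into a
# corrupt UDP datagram.
```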
Ruminations on MS security (A. Padgett Peterson)
Before I launch this commentary, I need to make a couple of things clear: 1) I am speaking for myself only, as a private individual. 2) I think the wizards at Redmond have produced some marvelous products, but that, like certain letter agencies, their agenda is not necessarily the same as mine. At least the letter agencies seem to have fewer lawyers.
I do have some experience with the second since 1990, when I sent a letter to MS noting that a simple routine placed into IO.SYS would eliminate all known MBR and boot-sector viruses. The response was that it was not in their business interest. (The routine was simple - check the byte at 0000:004F for a value equal to or greater than C0; if it is less, "Redmond, we have a problem". I generally use something a bit more sophisticated, but that was all that was needed. Note: this works only before the operating system - any operating system - loads.)
Since then, we have been granted such features as the ability to create Word macro viruses and a server operating system that was rated NCSC C2 so long as it was not connected to a network. However the new crop of offerings are even more innovative.
Suffice it to say that for years we have been able to tell users that "you cannot get a virus just by opening E-Mail". Well, that bug is being fixed. With the default installation of the just-released mail-reader product, coupled with the 98 version of the operating system (at least the current beta that contains a necessary .DLL), all of the factors needed to accomplish the above are present.
In fact, in recent days I have been able to drop an executable file both on
c:\ and into the startup directory just by opening the mail reader
("preview", which includes script execution for some reason, is a default
feature). True, a warning screen is presented if the applet is unsigned
(have heard that signatures are already floating around the Internet), but
the same screen is presented if Word is opened as well, so I suspect it may
become as quickly ignored as other such mechanisms have been in the past (as
with most security annoyances, there is an easy way to turn it off).
I have little expectation that MS will see the error of its ways and remove the single necessary construct. It is probably required for PUSH. It is entertaining though to find in the on-line language reference the statement that the scripting language has no File I/O. I'm sure that in some obscure legal language, that must be syntactically correct or it would not be there; however, I found it remarkably simple to drop an executable file on the hard disk that executed on the next boot. Times are about to become "interesting". Caveat Y'all. [Lightly edited by PGN]
This is not a new problem: people have always passed programs around. What is new is the scale and frequency of downloading, and the fact that it happens automatically without conscious human intervention. In one (admittedly unscientific) recent experiment, a person was found to have downloaded and run hundreds of Webware programs in a week. The same person ran only four applications from his own computer.
The danger in using Webware lies in the fact that simply visiting a Web page may cause you to unknowingly download and run a program written by someone you don't know or don't trust. That program must be prevented from taking malicious actions such as modifying your files or monitoring your on-line activities, but it must be allowed to perform its benign and useful functions. Since it is not possible (even in theory) to tell the difference between malicious and benign activity in all cases, we must accept some risk in order to get the benefits of Webware.
Despite the danger, Webware is popular because it meets a real need. People want to share documents, and they want those documents to be dynamic and interactive. They want to browse -- to wander anywhere on the Net and look at whatever they find.
There are two approaches to Webware security, the all-or-nothing model and the containment model. The all-or-nothing model is typified by Microsoft's ActiveX and by Netscape plug-ins. These systems rely on the user to make an all-or-nothing decision about whether to run each downloaded program. A program is either downloaded and run without any further security protection, or refused outright.
This decision can be made by exploiting digital signatures on downloaded programs. The author of a program, and anyone else who vouches that the program is well-behaved, can digitally sign it. When the program is downloaded, the user is shown a list of signers and can then decide whether to run the program.
The containment model is typified by Java from Sun Microsystems. Java allows any program to be downloaded, but tries to run that program within a contained environment in which it cannot do any damage. (For some reason this contained environment is called "the sandbox," though real-world sandboxes are good at containing neither sand nor toddlers.)
Both approaches have had problems. The problem with the all-or-nothing model is subtle but impossible to fix: it puts too much burden on the user. Users are constantly bothered with questions, and they must choose between two equally unacceptable alternatives: discard the program sight unseen, or give the program free rein to damage the user's system. Experience shows that people who are bothered too often stop paying attention and simply say "OK" to every question -- not an attitude conducive to security. The all-or-nothing model causes trouble because it doesn't allow users to browse.
The main problem with the containment model is its complexity. In Java, for example, there is a large security perimeter to defend, and several flaws in both design and implementation have been found, leading to the possibility of serious security breaches. Though all of the known problems have been fixed at this writing, there is no guarantee that more problems won't be found. (For a general discussion of Java security issues, see McGraw and Felten.)
Another problem with the containment model is that it is often too restrictive. Java, for example, prohibits downloaded programs from accessing files. Though this prevents malicious programs from reading or tampering with the user's private data, it also makes legitimate document-editing programs impossible.
The restrictiveness problem can be addressed by making the security policy more flexible using digital signatures, as Sun, Netscape and Microsoft have done in their recent Java releases. When a person runs a program, the person's browser can verify the signatures and the person can decide whether to grant the program more privileges because of who signed it. In theory, this allows users to make finely calibrated decisions about which programs to trust for which purposes. In practice, this approach is likely to have some of the problems of the all-or-nothing model. Users will be asked too many questions, so they will get tired and stop paying attention.
Still, the containment model has some advantages. Granting only a few privileges may expose the user to less risk than letting down all security barriers. And containment at least allows the system to log a program's activities.
Webware security is difficult because of human nature. People want to browse without worrying about security, but browsing Webware is dangerous. Only a person can decide who or what is trustworthy and how to weigh the benefits of a particular decision against the risks, but human attention to security is a precious resource that we must spend carefully.
In several cases noted in Chapter , it was not just causes that were correlated, but also effects. Furthermore, in several widely propagating large-scale disasters, it was the effects of one problem that became contributing causes for the next stage in the propagation.
Effects of telecommunications, power, and air-traffic outages. Outages of local power and telephone service have had major effects on air traffic -- for example, in one case shutting down all three New York airports. However, the air-traffic problems resulting from the shutdown of a major air center can also propagate nationwide, affecting many other infrastructures as well.
In the case of the 1996 Western power outages (Section 2.10.2), the inability of a local portion of the power grid to handle its own shortfall tended to put surrounding regions into stressful operation, at which point other causal factors kicked in.
The 1996 Western power outages. If we accept the conclusion that a single tree coming in contact with a power transmission line can be blamed for triggering a massive power outage, it is clear that such an event could have been caused intentionally. Indeed, an FBI agent is said to have told an audience at the University of California at Davis that blaming the July 2, 1996, outages on a single tree was in fact a cover-up for malicious activities that were better left unrecognized. [*** Insert in Section 4.2.1, Accidents could be triggered, before Chernobyl ***]
*** ADD TO CONCLUSION OF THE CHAPTER
In the final analysis, it makes little difference a posteriori whether there was a single point of failure, or a combination of correlated causes, or a combination of uncorrelated causes, or a chain of causes and effects. A disaster is a disaster. However, it should make a significant difference in remediation, because single-point failures should be more easily recognized and hindered.
It is clear that overdependence on high-risk entities should be avoided wherever and whenever possible. On one hand, errors in a single Internet nameserver table should not be able to cause the two extensive Internet problems experienced during 1997 (Section 2.1.1). (However, note also that the MAE-East router implicated in the Netcom outage is located in a flimsy building in the middle of a parking lot, with abutting parking spaces.) On the other hand, the propagated effects of a single tree touching a power line in Idaho should not have been able to bring down power in 12 Western states. Much greater emphasis must be placed on defensive system design, preventive analysis, and proactive remediation.
San Francisco blackout blamed on sabotage (R 19 42)
126,000 customers in northern San Francisco experienced a power outage for up to 3.5 hours beginning at 6:15 a.m. on 23 October 1997, when five transformers stopped working at the power substation at Eighth and Mission. The FBI counterterrorism unit is investigating what it considers the likelihood of sabotage (for reasons not revealed, although 39 of the 42 switches were open).
Three Army Web sites hacked. On the heels of the recent Cloverdale attack on unclassified Pentagon computer systems (R 19 60), three Army World Wide Web sites were hacked on 8 Mar 1998: the Army Air Defense Artillery School, the Army 7th Signal Brigade, and the Army Executive Software Systems Directorate. Official content was replaced with messages about the previous Pentagon attacks. One of the messages said, "For those of you in the security community, the so-called Pentagon hackers are using nothing more advanced than the `statd'. Get a list of 200 sites, and sit and try the same exploit to every one of them. [You're] going to get one out of 100 sites eventually." [Source: Security Information News Service]
Russian breaks Citibank security. A Russian, 24-year-old Vladimir Levin, penetrated Citibank's computer security, and was able to make about 40 funds transfers totalling more than $10 million between June and October 1994. He and five others (two in the U.S., two in The Netherlands, and one in Israel) were caught as they were trying to move $2.8 million. However, only about $400,000 was actually transferred out of the system, and no customers lost any money. This was apparently another exploitation of reusable (fixed) passwords.145
Dartmouth prof spoofed. The night before Dartmouth Professor David Becker's scheduled Government 49 midterm exam on Latin American politics, someone masquerading as a department secretary sent e-mail announcing the cancellation of the exam ("because of a family emergency"). Half of the class did not show up.146
Man charged with e-mail stalking. A 31-year-old suburban Dearborn Heights man was arrested and charged under Michigan's anti-stalking law for alleged harassment of a 29-year-old Farmington Hills (another Detroit suburb) woman via e-mail. The article refers to this as "one of the first cases of stalking based primarily on e-mail."147
Mitnick vs. Shimomura. The Well and Netcom combined efforts to aid in the arrest of 31-year-old hacker Kevin Mitnick in Raleigh, North Carolina. Both companies discovered large caches of data being stored on their systems. At the same time, Tsutomu Shimomura discovered security breaches in his system on Christmas 1994. This led to vigorous efforts to track the hacker, and after 24-hour electronic surveillance and at least one cellular-phone trace, law enforcement officials arrested Mitnick. Mitnick's early escapades are chronicled in the book Cyberpunk by Katie Hafner and NY Times reporter John Markoff. Mitnick is accused of breaking into Markoff's computer. (Mitnick faced up to 30 years in prison for various crimes, including allegedly breaking into NORAD computers, and was indicted on 23 counts of fraud including computer misuse in at least six different jurisdictions.)148
Internet Protocol security attacks of the type used by Mitnick have been known for a long time, including an early description by Robert Tappan Morris149.
Kevin Poulson. Federal prosecutors decided to drop espionage charges against computer programmer Kevin L. Poulson because the military document in his possession was obsolete, he was lawfully entitled to access it, and he had not shared it with anyone else. In exchange for the dropped charges, Mr. Poulson pleaded guilty to unrelated crimes involving illegal access into files of the Pacific Bell Telephone Company.150
Argentine hacker. U.S. officials used an unprecedented court-ordered wiretap of a computer network to charge Julio Cesar Ardita, 21, of Buenos Aires, Argentina, with breaking into computers at Harvard, University of Massachusetts, Caltech, Northeastern, the U.S. Navy, and NASA, as well as other systems in Mexico, Korea, Taiwan, Brazil and Chile. The investigators used a program called I-Watch (for Intruder Watch) run on a government computer located at Harvard. The program searched the net for the targeted criminal among 16,000 university users. Ardita was charged with "possession of unauthorized devices" (illegal use of passwords), unlawful interception of electronic communications, and "destructive activity in connection with computers." He remains free in Argentina because the charges are not covered by the existing extradition treaty.151
"Black Baron" gets 18-month sentence for virus activities. Christopher Pile, 26, the first person in Britain to be convicted of creating computer viruses, was jailed for 18 months. His creations (Pathogen and Queeg) were apparently rather sophisticated stealth viruses that used an encryption program (Smeg) to hide their presence.152
Stolen account used to send hate e-mail at Texas A&M. Someone broke into the electronic mail account of Professor Grady Blount at Texas A&M, and sent out racist messages to about 20,000 Internet users in four states. In response, Blount received death threats and other harsh responses from nearly 500 users.153
Cyberbandits in Europe. Computer crackers stole $150,000 worth of international phone calls from five U.K. companies and caused $400,000 in losses for a Dell Computer subsidiary that had to shut down a free customer-service phone line.154
Cellular phone scam. Clinton L. Watson, 44, was arrested on October 18, 1994, along with his son and a family friend, and charged in San Jose, California, with three counts of wire fraud and grand theft, with a possible prison sentence of 30 to 45 years. Watson allegedly altered and sold more than 1000 cellular phones with illegally acquired identifiers, whose use resulted in millions of dollars of phone calls being billed to unsuspecting persons. Legitimate cellular-phone identification numbers were allegedly captured using scanners, and entered into identity-reprogrammable clone phones that were fabricated from new programmable chips -- which allowed the identity numbers to be replaced as rapidly as their misuse was detected. (The Secret Service noted that Watson was currently on probation from a 1988 conviction on 14 counts of wire and mail fraud in Missouri.)155
Cellular telephone security. Cellular One of New York and New Jersey began requiring new customers to enter a four-digit security code, transmitted over a separate frequency, before placing calls. This is an effort to curtail cellular fraud, which was estimated at $482 million in 1994 (3.7% of the industry's revenue).156
The Eagle (the President) and the Eagle Beagle (David Wagner) (R 19 39-40)
An unidentified hacker announced on 19 Sep 1997 the interception of President Clinton's pager messages (along with pager messages destined for staff, Secret Service agents, and other members of his entourage) during his April 1997 trip to Philadelphia. The lengthy transcript of pager messages was published on the Internet to demonstrate that the pager infrastructure is highly insecure.
(Apparently the President's entourage relies a lot on pagers for communications. There are messages from Hillary and Chelsea; a Secret Service scare; late-breaking basketball scores for the President; staffers exchanging romantic notes; and other amusements.)
This comes at quite an embarrassing time for the administration, given their policy on encryption. Strong encryption is the one technology that could have protected the private pager messages, but the administration has been fighting against strong encryption. Top FBI officials have been giving many classified briefings to House members, asking them to ban all strong encryption in the US.
An anonymous White House staffer was quoted as saying that it would be "an expensive and complicated proposition" to put encryption into pagers and cellphones. This quote is interesting, because it's the White House's crypto policies that have made it so complicated and expensive to add strong encryption - the cellphone and pager industries have wanted to add strong encryption for privacy and security, but the administration has forcefully dissuaded them from doing so. [Adapted from a cypherpunks item, with David's permission. See RISKS-19.39 and 40 for more, and check out my Web pages for my recent Senate and House testimonies on risks in the computer-communication infrastructures. PGN]
Prosecution for pager interceptions (Steven Bellovin) (R 19 35)
A New Jersey company has been charged with illegally intercepting and selling messages sent via a paging service. The messages - the content of which was sold to news organizations - were intended for delivery to the offices of various senior New York City officials, including the mayor's office and various top police and fire department officers. (See R 19 35,36 for the rest of the story.)
Church cordless phone piggybacked. The telephone bill for a remote Irish Roman Catholic church included £800 of calls to a telephone sex service. Apparently, someone piggybacked on the dial tone from a cordless phone and placed the calls from outside the church. However, the church had to pay the phone bill.157
Stolen ATM card nets $346,770; limits inoperative. Thieves broke into a van and stole an Oregon woman's ATM card, and discovered her PIN written on her Social Security card. They made repeated withdrawals, covering 100 miles and visiting 48 ATMs over a weekend. They were able to get $346,700 in cash, with the help of some questionable computer systems. Ordinarily there is a $200 daily limit for withdrawals. However, "because of a computer program change at the Oregon TelCo Credit Union, the limit was not in effect that weekend." When the account was down to zero, the thieves fed empty deposit envelopes into the machine and credited the account with bogus deposits of $825,000 - and then made withdrawals against this sum. However, at least five of the machines had taken photos of the people using the stolen card, three of whom were apprehended.158
Criminal hacker arrested in Winnipeg. In Canada, a 31-year-old man was arrested after an 8-month investigation, for illegally accessing an Internet system at the University of Manitoba. The Crackerjack program "was used to decrypt the access codes of legitimate users"; the accused is alleged to have stored stolen software and porn on-line.159
UK hacker ("Datastream") finally arrested. A 16-year-old British youth allegedly broke into sensitive U.S. computer systems, including an Air Force system at Rome Laboratory, over a period of seven months. He made various postings on the Internet, including information relating to the North Korean nuclear situation. He was finally detected because he remained connected overnight, and was arrested in the U.K. in July 1994.160
Computer crackers sentenced. Two computer crackers were sentenced to federal prison for their roles in defrauding long-distance carriers of more than $28 million. The two were part of a ring that stole credit-card numbers from MCI, where one was an employee. Ivey James Lay, who worked at MCI, was sentenced to three years and two months, and his accomplice, Frank Stanton, received a one-year prison term.161
Finnish executives jailed for software piracy. The two top executives of a Helsinki engineering company were given 60-day jail sentences and $72,000 fines for knowingly using illegal copies of AutoCAD computer-aided design software. The stiff punishment is a victory for the Business Software Alliance, which says that its member companies suffered $15.2 billion in global losses last year due to software piracy.162
Worldwide estimates of losses from piracy surpassed $15.2-billion for 1994. The problem is most rampant in Indonesia and Kuwait, where about 99% of all software is copied illegally.163
16-year-old boy cracks university computer security. A Lancaster, Pennsylvania, teenager was visiting with a student at Eastern Mennonite University, and was given the student's password. The teenager used that password to log on, downloaded some hacking tools from a bulletin board, and gave supervisor privileges to everyone - including access to forthcoming final exams, private e-mail, and faculty documents. (Student and financial records are kept elsewhere.) EMU spokesman Jim Bishop said, "Apparently, none of the students rifled through material they shouldn't have seen."164
E-mail tap nets criminals. The first-ever court-approved wiretap of an e-mail account resulted in the arrest of three people charged with running a sophisticated cellular-fraud ring. The alleged mastermind, a German electrical engineer, advertised his illicit wares on CompuServe, where they caught the attention of an engineer at AT&T's wireless unit. The Secret Service and the Drug Enforcement Agency got into the act and obtained the Justice Department's permission to intercept e-mail messages between the alleged perpetrator and his accomplices.165
Health card used to rip off ATM. Due to a software glitch, a Vancouver Island man was able to use his health-care card to steal $100,000 from a Bank of Nova Scotia automated teller machine. However, the ATM recorded the data on his card, and he wound up with a year in jail.166
MoD hackers claim major U.S. defense system cracked (R 19 69)
A Reuters article by Andrew Quinn, 22 Apr 1998, notes that a group calling itself Masters of Downloading (a new MoD, including members in the U.S., Britain, and Russia) claims that it has been able to obtain secret files from a computer system used to control military satellites, via the Defense Information Systems Network (DISN). The files include the DISN Equipment Manager (DEM), which controls the U.S. network of military Global Positioning System (GPS) satellites. MoD members apparently informed John Vranesevich (who runs the computer security website AntiOnline <www.antionline.com>) of their exploits.
Pentagon to take stronger computer security measures (Edupage)
Learning of numerous vulnerabilities in the security of the computers accessed by its 2.1 million users worldwide, the Department of Defense is formulating new plans to tighten security systems. In a recent military exercise called "Eligible Receiver," cyberattacks were able to access the military's command and control structure in the Pacific (and could have shut it down); the attacks also could have turned off the entire electrical power grid in the U.S. (Washington Times, 17 Apr 1998; Edupage, 19 Apr 1998)
[Eligible Receiver used mostly well-known penetration techniques. The WashTimes article quoted Pentagon spokesman Kenneth Bacon saying "Eligible Receiver was an important and revealing exercise that taught us that we must be better organized to deal with potential attacks against our computer systems and information infrastructure." This should have been no surprise to anyone except perhaps whoever in the Pentagon doesn't read RISKS and security newsgroups. See also a fascinating Department of Energy memo (R 19 70), and why don't they just use PGP? (R 19 71). PGN]
AOL "Trojan Horse" alerts. America Online issued a warning to its users about a destructive file attached to an e-mail message that had been circulating through its service and across the Internet. The message itself is benign, but trying to run an attached "Trojan Horse" file called AOLGold or "install.exe" could crash a hard drive.167
Webware presents many opportunities for Trojan horses with Java, HotJava, Microsoft Word, Microsoft e-mail binaries, PostScript files, and so on, as noted in Section 3.1.168
A bogus ATM installed on High St in London netted £120K for its proprietors.169
Computer misuse by IRS employees. Hundreds of Internal Revenue Service employees were disciplined in 1994 for improperly browsing through tax returns. Such employee behavior ranged from out-and-out fraud to simple curiosity.170
A former IRS employee, Walter C. Higgins of Salem, New Hampshire, was indicted on wire-fraud charges for illegally browsing through IRS computers to gather information on Thomas Quinn, a candidate in an election for the House of Representatives. (Quinn lost the election to Martin Meehan, D-MA from Lowell; Meehan says he knew nothing about Higgins.)
An IRS employee, consumer representative Richard W. Czubinski, of Dorchester, MA, was indicted for misusing his computer access privileges to obtain information on 30 taxpayers, including members of the campaign committee of South Boston City Council President James Kelly. He apparently also accessed the tax files of a Suffolk County assistant district attorney who had unsuccessfully prosecuted Czubinski's father. (Czubinski was also described as a sometime political candidate and a member of the Ku Klux Klan.)171
Reuters computer tech brings down trading net. A disgruntled computer technician at Reuters in Hong Kong caused the financial-information provider deep embarrassment by sabotaging the dealing-room systems of five of the company's investment bank clients, for up to 36 hours. He apparently visited the client sites and initiated deferred commands to subsequently delete specific operating system files. The attack crippled the computer systems bringing market prices and news to traders at NatWest Markets, Jardine Fleming, Standard Chartered, and two other banks. The banks were able to resort to alternative systems. The incident was reportedly the most serious breach of security disclosed in Reuters' corporate history.172
Taco Bell-issimo: salami-attack variant. Willis Robinson, 22, of Libertytown, Maryland, was sentenced to 10 years in prison (6 of which were suspended) for having reprogrammed his Taco Bell drive-up-window cash register -- causing it to ring up each $2.99 item internally as a 1-cent item, so that he could pocket $2.98 each time. He amassed $3600 before he was caught. This is the inverse of the old salami attack, where the long end of the salamis disappeared instead of lots of small slices.173
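The arithmetic of the scheme is easy to check (the transaction count below is our inference, not a figure from the original report):

```python
# Arithmetic behind the Taco Bell register scam.
item_price = 2.99
rung_up = 0.01
pocketed_per_sale = round(item_price - rung_up, 2)  # $2.98 skimmed per item
total_skimmed = 3600.00
print(pocketed_per_sale)                            # 2.98
print(round(total_skimmed / pocketed_per_sale))     # roughly 1208 sales before he was caught
```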
Japanese bank workers steal 140 million yen by PC. A Japanese bank employee and two computer operators have been arrested and charged with allegedly using a personal-computer money-transfer system to steal 140 million yen ($1.4 million). The money was sent in December 1994 from Tokai Bank Ltd to an account in another bank (using a settlement system operated by personal computers) and then withdrawn on the same day. On the next day, an additional 1,490 million yen ($14.9 million) was transferred to accounts in several other banks - although these transfers were detected before the money could disappear. Insiders were suspected among bank employees and computer-services suppliers. The scheme was apparently driven in part by debts owed to organized-crime groups.174
New Massachusetts password law invoked on hospital technician. Mark L. Farley was arrested on April 9, 1995. Working as an orthopedic technician in the Newton-Wellesley Hospital, he allegedly accessed a former employee's computer account to search through 954 confidential files of patients (mostly young females) for telephone numbers, which he then used to make obscene calls. (He had pleaded guilty in 1984 to raping an eight-year-old girl in Erving.) He is apparently the first person to be charged under a new Massachusetts statute that makes it a criminal offense to use someone else's password to gain access to a computer system. He is also accused of stealing hospital trade secrets, and making obscene or annoying telephone calls - apparently from the hospital.175
UK reports dramatic increase in computer misuse. The U.K. National Audit Office reported a 140% increase in hacking activities involving Government computers, from 1993 to 1994. Civil servants and outsiders conspired to defraud a Government department of £1,500,000. A civil servant obtained personal details of colleagues to blackmail them. A Government official obtained the private address of a married couple, possibly to assist in the kidnapping of the wife. Two staff members were prosecuted and fined £3,750 after leaking computer data. Theft was also a major problem.176
Software backdoor in emission testing. The Denver area's car-emission testing program was discovered to have two passcodes, 00010 and E35E, with which the computerized system would pass any car regardless of the emissions results.177
Worker cleared of deleting files. Vina Windes, a former secretary for the Boulder County Planning Department, was arrested after she had resigned from the department. She was charged with computer crime, a felony, and abuse of public records, for deleting files. However, the charges were subsequently dropped; she noted that the files had never been deleted, and that the situation had arisen because no one else in the office knew how to find the files after she had left.178
Intel CD-ROM hoses hard drives. Intel distributed a press kit that crippled reporters' computers, changing the configuration specs on hard drives.179
The Ford Motor Company released a promotional floppy disk that accidentally contained a nasty executable - the monkey virus.180
The final report of the President's Commission on Critical Infrastructure Protection concluded in the fall of 1997 that roughly 500 new viruses are appearing each month, and that approximately 3000 viruses are active at any time.
More recent numbers suggest that there were over 10,000 viruses by the end of 1998.
Melissa Macro Virus
With all the furor over the Melissa macro virus and its Trojan-horse e-mail propagation, deeper issues seemed lost in the shuffle. The vulnerabilities exploited by this MS Word macro virus via Microsoft Outlook and Outlook Express have been around for a long time and are likely to be around for a long time. Although some palliative fixes are available, the fundamental problems remain. (For example, filters deleting e-mail with "Subject: Important Message from ..." are only partially useful, in light of recent versions of Melissa with blank Subject lines.) The basic system infrastructure is incapable of adequately protecting itself against all kinds of misuse, and this particular exploit is just another reminder that many folks need to wake up. The situation could have been much worse, but unfortunately many folks who depend on systems that are inherently inadequate do not get the proper message when the situation is not a terrible disaster. On the other hand, even terrible disasters do not seem to be enough. Many of the constructive lessons that should have been learned from the Internet Worm over 10 years ago are still unlearned.
See my Web site (http://www.csl.sri.com/neumann/house99.html) for my testimony for the 15 Apr 1999 hearing of the House Science Committee subcommittee on technology, in which I consider Melissa as the tip of a very large iceberg - the abysmal states of computer-communication security and system development practice.
For further RISKS items on Melissa, see a report by Robert M. Slade (R 20 26); hidden risks (R 20 28); effect on a UK bank (R 20 30); risks of monocultures (R 20 26) and more virulent macro viruses (R 20 26); further analysis (R 20 30); role of the GUID in identifying David Smith as the purported culprit (R 20 26,28,30-34), with wrap-up from Richard M. Smith (R 20 33); mainframe viruses (R 20 30-32) and origin of virus vulnerabilities (R 20 29); false virus detection while searching for Melissa (R 20 40).
An incorrect pointer in a program pointed to the wrong file, resulting in the Demon Internet service provider putting out its encrypted password file as the message of the day.181
Dan Cross wanted to play a Hare Krishna CD for his girlfriend over the phone, but discovered that it triggered her answering machine's remote-access sequence.182
Conviction overturned. When John Munden, a policeman in the United Kingdom, returned from vacation and discovered half of the money in his bank account had been withdrawn, he complained to his bank - which insisted its security was impeccable, and accused him of attempted fraud. He was convicted. After a four-year ordeal, he was finally acquitted by a judge who noted that the bank had not given the defense access to its computer systems to be able to determine their impeccability.183
Ramsey case confusion. There was an initial claim by police in the murder of JonBenet Ramsey that hackers with physical access to the computer had penetrated the case file. Much to official chagrin, it was discovered that the real problem had been a dead CMOS battery.184
Mistaken accusation of Congressman having two SSNs. In accessing out-of-state computer databases, the New Hampshire Telegraph discovered that New Hampshire Congressman Bill Zeliff apparently had two Social Security Numbers, and ran with that (mis)information in a front-page story. The next day's headline suggested that maybe it was a database error, and quoted database purveyors who admitted that errors in databases were common. (In fact, the second SSN belonged to a four-year-old child.)185
DA's computer chief victim of clever sabotage. Ralph Minow ran a family-support computer system for the San Mateo County District Attorney in California. The system crashed in March 1996. His assistant Paul Schmidt wanted Minow's job, and apparently rigged the evidence to show that Minow deliberately caused the crash. Minow nearly lost his job because of the sabotage, but Schmidt made enough mistakes that he was detected and was fired in February 1997.186
April Fool's Day 1997. Although not in a class with the Chernenko and Spafford masquerades, 1997 brought a few new pieces of April Fool's e-mail to RISKS. John O'Connor reported that the French computer systems would be immune to the Year-2000 problem, because of the way the French count. In particular, the two-digit-equivalent representation of the year 1999 as quatre vingts dix neuf (that is, 4x20+10+9) will effortlessly roll over to the ensuing representation of the year 2000 as cinq vingts (that is, 5x20). Microsoft will adopt a similar strategy, with "Windows ninety-ten" being made available in the year 2002.187
Martin Minow discussed a proposal to lengthen the second by 0.00001312449483 to eliminate leap years, while Mark Fineman alternatively suggested slowing down the Earth's rotation so that a year is exactly 365 days. (Please don't take this as seriously as some of the ensuing discussants did.)188
[[[*** Merge in the entire former section 5.3.4 into the end of this section on April Fools' spoofs, formerly subsection Risks of Minimizing Entropy ***]]]
1998 April Fools' Items:
Funding for a new software paradigm, removing rarely used code producing routinely ignored diagnostics, to combat software bloat (R 19 64, with follow-up in 19 65-66)
Quantum computer cracks crypto keys quickly (R 19 64)
The Computer Anti-Defamation Law protecting computer system developers against criticism, with analogy to the Texas cattleman's suit against Oprah Winfrey and Nike's action against Doonesbury (R 19 64)
1999 APRIL FOOLERY
The Y9Z problem, Mark Thorson; 99 rolls over to 9A; 199Z (the year 2025) rolls over to 19A0; then 19ZZ can roll over to "2000" (R 20 26)
Y2K bug found in human brain (R 20 26)
Vatican announces all computer systems ready for new millennium; Roman numerals are the answer! (R 20 26)
Historical retrospective analysis of the Y10K problem, dated 1 Apr 9990 (R 20 26)
RFC2550 - Y10K and Beyond: marvelous RFC on solving the Y10K problem by Steve Glassman (R 20 27)
Linus Torvalds starts for-profit LinusSoft; open-source advocates SlashDot launch SlashDot Investor; Richard Stallman of the Free Software Foundation now Senior Vice President for Ideology (R 20 26)
Professor wants Y2K jokes banned on the Net (Edupage item, R 20 28)
Congress votes to move Daylight Savings cutover to Monday to avoid Easter confusion (R 20 28)
Tuxissa Virus creator (Anonymous Longhair) modifies Melissa to download and install Linux on infected Microsoft systems (R 20 29)
Running out of time on Y2K? Add a month to the calendar (Martin Minow)
Zurich loses citizens' files on 31 Mar 1999 after Y2K upgrade test crash (R 20 29); was this real or April Foolish? Doesn't matter. The lesson is the same: keep backups (R 20 30)
Australian Securities & Investment Commission's April Foolery: Millennium Bug Insurance (R 20 37)
The (f)e-mail of the PCs is more deadly than the bail. The case involving Adelyn Lee and Oracle's CEO Larry Ellison resulted in Ms. Lee being found guilty of perjury and falsification of evidence. She had previously won a $100,000 settlement against Oracle, using as evidence an e-mail message ("I have terminated Adelyn per your request.") supposedly sent to Ellison by her former boss, Oracle VP Craig Ramsey. The prosecutor claimed that Lee had sent the message herself from Ramsey's account. She faces up to four years in jail. Subsequently, the judge ruled that she may not use any of that settlement money to pay her bail.189 This is another case involving the credibility of digital evidence in penetrable, tamperable, and spoofable environments.
E-mail scam from Global Communications. Another e-mail scam informs you that you are a would-be victim and have "only 24 hours to settle your outstanding account", and suggests that you can call an 809 number to avoid subsequent court action. The call goes to a Caribbean telephone company (in Tortola in the British Virgin Islands) and costs you $3 to $5 (and presumably more if you are dumb enough to hang around for their strategy of putting you on hold with a sequence of creative recorded messages). The From: address "Global Communications"@demon.net is bogus. This is a cheaper variant on an 809-900 pager scam, which costs you $25 if you return the call.190
Bogus message on PGP. A message apparently from Fred Cohen was widely circulated, indicating that "PGP has been cracked." However, the message was forged, and was not from Fred. Worse yet, if you pursued the link given in the message, it pointed to the telnet or NNTP ports of a particular Internet Service Provider, which is regarded as an attempted breakin and generally treated quite harshly!191
German intruder forges White House messages. A message purportedly from Bill Clinton accused recipients of attempting to penetrate the White House computer security. The message had a forged From: line.192
Counterfeit Dartmouth graduation tickets. Corby Edward Page, 23, a Dartmouth alumnus, admitted creating bogus computer-generated tickets to the Dartmouth College graduation exercises in June 1994, and selling them for $15 each.193
Caltrans freeway offramp sign spoofed. The late Herb Caen observed that someone had altered an electronic Caltrans sign on route I-80 in Richmond, California, transforming "Off Ramp Closed" to "Boo OJ."194
London Underground display. One of the trainees of the London Underground hacked into the control system and posted an offensive message ("All signalmen are wankers") on the electronic displays at Piccadilly, Elephant and Castle, and Regent's Park stations, on August 16, 1995. The message remained for more than 12 hours before it was observed and deleted by the tube staff. Surprisingly, it reappeared 13 days later -- because it had been saved, and was then randomly selected for redisplay.195
NYPD phone system cracked. Callers to New York Police Department headquarters for 12 hours ending 6 a.m. on April 16, 1996, heard a bogus recording that included the following: "You have reached the New York City Police Department. For any real emergencies, dial 119. Anyone else - we're a little busy right now eating some donuts and having coffee." It continued "You can just hold the line. We'll get back to you. We're a little slow, if you know what I mean. Thank you." The NYPD had no immediate comment, but unnamed police sources believe hackers broke access controls and changed the message.196
Another Cable-TV Sting Operation: Ireland. Cork Communications, a cable television supplier to 30,000 homes, had been plagued for two years by a black-market operation selling burgled black-box decoders. Cork broadcast a message that could be received only on illegally bypassed unscramblers, offering a free T-shirt from YFBT Promotions. This was followed by a blitz of warranted house raids. Incidentally, three letters of the acronym YFBT stand for Your, Box, and Tampered.197
McVeigh confession? The Dallas Morning News posted a news story alleging the existence of a report stating that Timothy McVeigh had admitted to his attorney, Stephen Jones, that he (McVeigh) was responsible for the Oklahoma City bombing that killed 168 people at the Alfred P. Murrah Federal Building in April 1995. Jones at first denied the existence of the report, but later admitted that the report exists -- although he denied that it represented a confession. Jones then charged that the News stole the document in January 1995 as a result of a computer break-in that enabled access to confidential defense materials.198 However, there were subsequent statements that the alleged report was bogus, and had been planted in an attempt to trap a witness.
FBI Medicare fraud sting backfires. FBI agents set up a sting by obtaining 35 legitimate Medicare cards and selling them to a suspected fraud operation. However, the FBI lost control of the cards, which could not be cancelled -- because they used real Social Security Numbers rather than the usual practice of bogus identifiers created for the sting; cancelling those numbers would have wiped out legitimate benefits. The cards have been in use for 16 months, and have been used to buy and fence expensive medical equipment. For 10 of the cards investigated, the losses totalled $163,745 for services that the real card holders say they never used.199
FBI sting nabs man trying to sell credit-card data. Carlos Felipe Salgado Jr. ("Smak", 36, Daly City, CA) was arrested at San Francisco Airport on May 21, 1997, after he sold an encrypted diskette with personal data on more than 100,000 credit-card accounts to undercover FBI agents -- who paid him $260,000, checked out the validity of the data, and then nabbed him. He reportedly had obtained the information by hacking into various company databases on the Internet or by packet-sniffing an unidentified San Diego-based ISP. He faced up to 15 years in prison and $500,000 in fines.200
Other cases from which identity thefts could arise include thefts of a Visa International database of data on 314,000 credit-card accounts,201 Caltrain's ticket-by-mail commuter database,202 and a computer containing Levi Strauss' database of Social Security Numbers for 40,000 employees and retirees -- including bank-account information for the retirees.203
"IP spoofing" SYN flooding attacks. Public Access Networks Corporation (Panix) was inundated with a massive attack on its network, flooded with up to 150 bogus "electronic handshake" SYN requests per second. Network tables overflowed because the SYN transactions were intentionally never completed.204 A 200-message-per-second SYN-flood attack was launched against WebCom, a large World Wide Web service provider in the San Francisco Bay Area. The denial of service affected more than 3000 Web sites for 40 hours, during most of what was otherwise a very busy shopping weekend. The attack began on December 14, 1996, shortly after midnight PST.205
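The overflow mechanism can be sketched with a toy simulation (simulation only, not attack code). The backlog size and the event model here are assumptions for illustration, not the actual configuration of Panix's servers:

```python
# Illustrative sketch: a server keeps a fixed-size table of half-open
# connections.  Spoofed SYNs are never completed, so their entries
# linger; once the table fills, later SYNs -- including legitimate
# ones -- are simply dropped.

BACKLOG = 128  # assumed fixed number of half-open slots

def simulate(events):
    """events: 'S' = spoofed SYN whose handshake never completes,
    'L' = legitimate SYN whose handshake completes immediately."""
    half_open = 0
    served = refused = 0
    for e in events:
        if half_open >= BACKLOG:
            refused += 1      # table full: the SYN is dropped
        elif e == 'L':
            served += 1       # handshake completes; no slot retained
        else:
            half_open += 1    # spoofed entry lingers until timeout
    return served, refused

# One second of a Panix-scale attack (150 spoofed SYNs), followed by
# 10 legitimate clients: the first 128 SYNs fill the table, and every
# later arrival -- all the real clients included -- is refused.
print(simulate('S' * 150 + 'L' * 10))  # -> (0, 32)
```

The sketch omits the timeout that eventually reclaims slots; the real attacks succeeded because the flood rate comfortably outpaced that timeout.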
Vandalism disrupts service at Stirling University. Early on April 18, 1994, a vandal exploiting a not-unknown security hole started disrupting services and corrupting files at Stirling University in the UK. Stirling University is a SuperJanet site, with a microwave link to Edinburgh. The entire site was affected. The site was basically unreachable by Internet services for over 24 hours; telnet and ftp services were seriously degraded for 3 to 5 days. E-mail service was unavailable for 2 to 3 days. Peter Ladkin estimated that at least 6 person-weeks of expert time were required to discover and repair the damage, and that the extent of the disruption was considerable.206
Denial-of-service attack. A student at Monmouth University in New Jersey was charged with disrupting the school's electronic mail system for five hours by bombarding two administrators with 24,000 e-mail messages. The student's computer access had been terminated on November 9, 1995, because of posting advertising and business-venture solicitations to "inappropriate sections of the Internet" (presumably, Usenet groups). It took 44 hours to trace the source of the attack through a service provider in Atlanta, Georgia, and back to an account based in Red Bank, New Jersey, shared by the student. The student was charged with a federal crime because the attack used interstate communication to deny service. Carl Stern of the Justice Department is said to have remarked that this was the first time the federal computer-fraud act had been used for an act of this type.207
Websites infiltrated: DoJ, CIA, USAF, NASA. The U.S. Department of Justice Web site (http://www.usdoj.gov/) was spoofed on August 16 or 17, 1996, when crackers broke in and altered the main Web page to include swastikas, obscene pictures and criticism of the Communications Decency Act. For example, DoJ became the Department of Injustice. The site was shut down following the discovery.208
The CIA website (http://www.odci.gov/cia) was penetrated by a group of Swedish hackers, on the same day that the Justice Department reopened its home page - on September 18, 1996. On the next day, the CIA disconnected the altered home page, which included "Welcome to the Central Stupidity Agency" as well as valid links to Playboy and hacker netsites, and fictional links to "news from space" and "nude girls". Apparently, the Swedish intruders were protesting a Swedish court case against a group of youths who were caught breaking into computers in 1991. The CIA later restored its earlier web pages, which included spy-agency press releases, speeches, and other publicly available data, including CIA's World Fact Book - all of course unclassified.209
The main U.S. Air Force Web page (http://www.af.mil) was hacked on December 29, 1996, at Fort Belvoir, Virginia. The bogus page included various anti-Government slogans, plus a suggestive graphic of what the Government is doing to you.210
NASA's Website (http://www.nasa.gov) was similarly hacked on March 4, 1997.211
Other sites cracked as well. The National Collegiate Athletic Association (NCAA) Web site was altered, including various racial slurs. A 14-year-old high-school freshman was apparently implicated.212 The Lost World Web site was transformed into The Duck World: Jurassic Pond.213 The Los Angeles Police Department also had its Web site compromised. A Swedish meat packer's Web site was penetrated and replaced.214 Vulnerabilities in AT&T WorldNet were also noted.215
Plumber call-forwards his competitors' phones. Michael Lasch, a plumber in Levittown, NY, is accused of calling Bell Atlantic to order Ultra Call-Forwarding for at least five of his company's competitors, which enabled him remotely to redirect their calls to his phone. He was charged with theft by deception, criminal attempt, unlawful use of a computer, criminal trespass and impersonating an employee. The scam was detected when a customer complimented another firm on work performed over Christmas weekend - when in fact no one had been working!216
Are you flooded with Internet spams (unsolicited e-mail advertisements) from hustlers, scammers, and purveyors of smut, net sex, get-rich-quick schemes, and massive lists of e-mail addresses? (The term derives from the ubiquitous World-War-II canned-meat product dramatized by Monty Python.) Some of us -- particularly moderators of major mailing lists -- typically receive dozens of spams each day, often with multiple copies. We tend to delete replicated items without reading them, even if the subject line is somewhat intriguing. (Many spammers use deceptive Subject: lines.) Unmoderated lists are particularly vulnerable to being spammed. Some spammers offer to remove you from their lists upon request. However, when you reply, you may discover that their From: and Reply-to: addresses are bogus and their provided "sales" phone number may be valid only for a few days. Some of them are legitimate, but others may be attempting credit or identity fraud; it can be hard to tell the difference.
E-mail spamming spoofs. In a massive attack, many sites were flooded with bogus subscriptions to various newsgroups. The RISKS BITNET server additions included vice-president@WhiteHouse.gov and Georgia6@HR.House.gov (GA06.Gingrich). At least some of the hacks came through Netcom. Unfortunately, e-mail spoofing is still vastly too easy.218
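Just how easy the spoofing is can be seen from a minimal sketch: a From: line is ordinary message text that the sender composes, and classic SMTP verifies none of it. (The message below is constructed locally and never sent; the list-server address is hypothetical, while the forged sender is the bogus one from the incident above.)

```python
# Build a subscription request with a forged From: header.  Nothing in
# the protocol checks that the composer actually controls this address.
from email.message import EmailMessage

msg = EmailMessage()
msg['From'] = 'vice-president@WhiteHouse.gov'  # forged, freely chosen
msg['To'] = 'some-list-request@example.org'    # hypothetical list server
msg['Subject'] = 'subscribe'
msg.set_content('subscribe risks')

# The forged header is indistinguishable from a genuine one:
print(msg['From'])  # -> vice-president@WhiteHouse.gov
```

A list server that trusts the From: line for subscription requests will happily add the named victim, which is exactly what happened to the RISKS BITNET server.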
A spammer named Craig Nowak created a masquerading spam that used the legitimate From: address of Tracey LeQuey Parker. When Tracey received 5,000 bounces, she -- together with the Electronic Frontier Foundation and the Texas ISP Association -- sued Nowak.219
Elisabeth Arnold, an employee of a New Jersey Internet service provider, attempted to block a particular spammer by cutting off his ISP account. In retaliation, the spammer created two masquerading revenge spams using her From: address and giving the company's 1-800 telephone number. One spam included a 200K file of "animal sounds"; the other announced that the recipients were now on a list of future spamees and would receive many more such messages. As a result of this "flame bait," her e-mail address was flooded with bounces and hate mail, her http ports were "SYN attacked" and "ping stormed" (see Section ) and the 800 number was so swamped with protests that it had to be disconnected.220
E-mail spam differs somewhat from postal mail. You must pay (one way or another) for the storage of e-mail you receive (or else delete it as fast as it comes in!), whereas the sender pays for postal junk mail. The spam sender pays almost nothing to transmit, especially when hacking into an unsuspecting third-party server site (which is increasingly common). Simson Garfinkel's vineyard.net was hacked into by a spammer who managed to send about 66,000 messages.221
Defensive moves. What might you do to stanch the flow? Some folks suggest not posting to newsgroups or mailing lists -- from which spammers often cull addresses, but this throws out the baby with the bathwater. Other folks suggest using the spammer's trick of a bogus From: address, letting your recipients know how to generate your real address. But this causes grief for everybody (recipients, administrators, and even you if the mail is undeliverable), and is a bad idea.
Filtering out messages from specific domains may have some success at the IP level (e.g., via firewalls and TCP-wrappers) against centralized spammers who operate their own domains and servers. But filtering based on header lines is generally not effective, because the headers are subject to forgery and alterations. Also, many spammers route their junk through large ISPs, or illicitly through unwitting hosts. Complaining to those site administrators is of little value. Filtering out messages based on offensive keywords is also tricky, because it may reject e-mail that you really want. However, various filters are being developed in response to massive spamming.
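The fragility of header-based blocking can be shown in a few lines. This is a hedged sketch of a naive From:-domain filter; the domain names are invented for illustration:

```python
# A naive blocklist filter: it catches only those spammers who tell
# the truth in their From: header, which is precisely what they don't do.
BLOCKED_DOMAINS = {'cyberpromo.example', 'bulkmail.example'}  # hypothetical

def accept(headers):
    """Return True if the message passes the From:-domain blocklist."""
    sender = headers.get('From', '')
    domain = sender.rsplit('@', 1)[-1].lower()
    return domain not in BLOCKED_DOMAINS

print(accept({'From': 'offers@cyberpromo.example'}))  # -> False (blocked)
# One forged header later, the same spammer sails through:
print(accept({'From': 'friend@innocent.example'}))    # -> True (missed)
```

Filtering at the IP level, as noted above, fares somewhat better only because the source address of an established TCP connection is harder to forge than a header line.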
In filtering efforts, AOL created a mechanism for blocking up to 53 particular domains. CompuServe blocked certain types of multiple-address mailings. Cyber Promotions was accused of spamming, had a restraining order issued against them by Earthlink, and settled for $65,000. Someone retaliated by blasting Cyber Promotions with a 20-hour retaliatory spam.222
Technical options are of limited value in the real world, tending toward an offensive-defensive escalation of technical trickery. Although servers such as majordomo can be used to invoke manual processing of suspicious would-be subscriptions, particularly when the From: address and the given address differ, forged From: addresses can still defeat simple strategies. One possible way to reduce spamming activity would be to impose a level of authentication -- for example, whereby a sender must first acquire an authorized certificate to send you e-mail. This would be impractical and undesirable for many individuals. It would certainly hinder newsgroups that seek worldwide contributions and subscriptions, and could make it much harder to use e-mail in general. However, in certain cases, it might be justified.
Alternatively, legislation might be contemplated, for example, to require an individual's permission for the release of certain personal information to third parties, and to treat unsolicited e-mail more like unsolicited junk faxes. On the other hand, there is a serious risk of legislative overreaction with draconian laws that might kill the proverbial golden goose.
Many such problems exist because the Internet has cooperative decentralized control; but that's also its beauty. It has very limited intrinsic security (although improving), and relies heavily on its constituent systems. In the absence of meaningful authentication and authorization, clever perpetrators are not easy to identify or hold accountable. But swinging too far toward forced authentication impacts privacy and freedom-of-speech issues. What a tangled Web we weave!
Asking what you can do individually may be the wrong question; the technical burden must ultimately fall on ISPs and software developers, as they continue to pursue approaches such as blocking third-party use of SMTP mail-server ports and requiring authentication for mass mailings. As should be quite evident from the examples in this book, fully automated mechanisms are always likely to have deficiencies, and security is always a weak-link problem.
On the legislative front, the State of Maryland attempted to outlaw e-mail that is annoying or embarrassing, and Nevada contemplated outlawing unsolicited junk mail. Various legislation has been proposed in the U.S. Congress.223
Spamming will ultimately be dealt with through a combination of legislation, ISP administrative changes, further technological developments, and individual efforts. We must find ways to protect ourselves without undermining free enterprise, freedom of speech rights, and common sense, and without encumbering our own normal use -- a difficult task indeed! In the meantime, perhaps the best you can do yourself is to never, ever, respond positively to a spammer's ad!
Many of the reliability cases noted in Chapter can be characterized as unintentional denials of service - including communications outages, air-traffic slowdowns and airport closures, rail and ship delays, massive power outages, and financial system outages. In some of those cases, the problem was triggered as a result of the failure or absence of a security control. There are even a few quite credible rumors that activities of penetrators or malicious insiders were involved in some of the extensive outages noted in Chapter , despite official attribution to other causes. See Section .
Another Netscape attack. The Berkeley "Cypherpunks" discovered a denial-of-service flaw in Netscape's Internet software, due to a missing bounds check. Overly long numbers would cause the Navigator browser software to crash when browsing a file that had been Trojan-horsed.224
Social Security employees sold 11,000 SSNs. Several Social Security Administration employees sold to a credit-card fraud ring personal information (such as Social Security Numbers and mothers' maiden names) on more than 11,000 people, enabling credit cards stolen in the mail to be activated and used.226
Massive cell-phone identifier interception. Two people in Brooklyn NY (Abraham Romy and Irina Bashkavich) were charged with stealing over 80,000 cellular phone numbers, along with corresponding identifying serial numbers and personal identification numbers, using a scanner (digital data interceptor) from their 14th-floor windowsill above the Belt Parkway in Brooklyn. Police seized two handguns, six computers, 43 cellular phones, and the scanner. Cellular-phone fraud reportedly amounts to losses of $1.5 million per day.227
TILT! Counterfeit pachinko cards send $588M down the chute. Two Japanese firms lost about 55 billion yen when criminals counterfeited the stored-money cards used to play pachinko. Interestingly, the cards had been promoted by police as a means to track the flow of cash and stop money laundering. The convenience of the new cards initially boosted profits because it was so much easier to play with the cards that automatically kept track of your money. But scam artists quickly figured out how to beat the cards, used throughout Japan's 18,244 pachinko parlors.228
UK ATM Scam. A stolen automatic teller machine helped crooks establish a bogus finance store in London's Bethnal Green district, which enabled them to capture legitimate IDs and PINs, and resulted in the theft of £250,000 ($374,400).229
Embezzlement at Beijing Hotel. China jailed four managers of the Beijing Friendship Hotel for cheating guests out of $9,000 by manipulating computerized billing records. The prison terms were 7, 7, 3, and 1 years. Apparently computer fraud is on the increase in China.230
Software pirate nabbed in Los Angeles. A big-time software pirate was arrested in Los Angeles and charged with two felony counts of fraud and trademark violations. Authorities seized an estimated $1 million in illegally copied software, high-speed duplicating equipment and $15,000 in cash. Thomas Nick Alefantes, who calls himself "Captain Blood," allegedly sold his wares through advertising in trade publications and a mail order business.231
Massive NY City tax fraud. New York City workers, in exchange for bribes from property owners, falsified computer records to eliminate nearly $13 million in unpaid taxes in a scheme called the largest tax-fraud case in New York City history. The author makes the following key points: Some tax records were erased. Other records were falsely marked as paid using funds from legitimate payments by innocent victims. So far, 29 people have been charged in federal court; 200 more are expected to be charged. $13M of debts have been erased, and $7M in interest was lost. The fraud is thought to have started in 1992; the investigation started in 1994. In a section particularly intriguing for RISKS participants, the author writes, "Three employees of the city collector's offices exploited computer 'glitches' to make it appear that unpaid taxes had been paid, officials said."232
Risk management system too late to prevent Barings' collapse. Despite all sorts of computerized controls on investments and trading, Nick Leeson was able to cause the collapse of Barings, the United Kingdom's oldest investment house. Barings bellied up in February 1995, when Leeson lost more than £750 million of clients' funds on the Singapore derivatives exchange. A Cash Risk Management System was supposed to flag cash positions, but failed to do so whenever settlements were not processed according to the bank's procedures. Ironically, Barings had already installed a new risk management system (BORIS, Barings Order Routing & Information System) in London, Tokyo, and New York, beginning in January 1994, and the Far East offices were next in line. It is expected that the new system might have prevented or at least detected Leeson's activities.233
ATM Fraud in Israel - The Polish Gang. A judge in Tel Aviv has ordered the remand in custody of two additional suspects in a major ATM fraud case, who will join five businessmen from Poland. The gang are suspected of having prepared thousands of counterfeit ATM cards. The police claim they had purchased tens of thousands of blank plastic cards in Greece, onto which they recorded the magnetic-stripe data; each card carried a sticker with the PIN. An Israeli computer expert, Daniel Cohen of Ramat Gan, also in custody, obtained the codes and manufactured the cards. The Polish businessmen financed the operation, and planned to bring foreign workers from Poland to use the cards to withdraw money from ATMs. The police have photographs of suspects standing next to ATMs holding quantities of forged cards. They had used them to withdraw 1,500 Israeli Sheqels (500 US Dollars) each, to a total of IS 600,000 (US$200,000).234
Czech hackers allegedly robbed banks electronically. Hackers stole 50 million Kc ($1.9 million) during attacks upon unnamed Czech banks and, in another incident, obtained and posted to bulletin boards a file of Czech citizens' personal information, according to an interview at INVEX (Brno, 22-26 Oct 1996) with Jiri Mrnustik, CEO of the Brno-based anti-virus and encryption software developer AEC s.r.o.235
Ghost account nets $169K embezzlement (R 19 26)
While working as a civilian military pay supervisor in the Army finance and accounting office at Fort Myer from 1994 to 1997, Teasa Hutchins Jr. caused regular military paychecks to be deposited to a bank account in the name of a bogus officer, and accumulated $169,000 for himself. He has pleaded guilty and faces up to 10 years in prison and a $250,000 fine. [Source: An item in The Washington Post, summer 1997.]
24 more California DMV clerks fired in fraudulent license scheme (R 19 27)
The California DMV has fired 24 more clerks who accepted bribes to issue driver's licenses fraudulently. This brings the total to 79 in the current statewide probe, Operation Clean Sweep. The going rate was $200 to $1000 a pop for not checking the applicant's identity, typically paid by illegal aliens, felons needing new identities, and drivers with revoked licenses. [Source: San Francisco Chronicle, 1 Aug 1997, A25]
Largest computer error in US banking history: US$763.9 billion. When Jeff Ferrera and Cindy Broadwater checked their checking-account balance at the First National Bank of Chicago, the automated voice gave it as $924,844,208.32. More than 800 other folks had similar stories to tell. The sum total for all accounts was $763.9 billion, more than six times the total assets of First Chicago NBD Corp. The problem was attributed to a "computer glitch".236 Louis Koziarz noted that the "glitch" was apparently the result of a programming change intended to support the new out-of-area ATM fees being proposed by various banking groups. When the new transaction messages were introduced to the network, some systems took the strange new codes and transformed them into something they could understand: a posting of a huge credit to one's account.
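The failure mode Koziarz describes can be sketched speculatively: a posting routine that coerces an unrecognized transaction code into an action it does understand, instead of rejecting it. The codes and amounts below are invented for illustration, not the actual network message formats:

```python
# Amounts are in cents.  The lenient default path is the bug: an
# unknown code is "transformed into something the system could
# understand" -- here, misread as a deposit.
def post(balance, code, amount, strict=False):
    if code == 'DEP':
        return balance + amount      # deposit
    if code == 'WDL':
        return balance - amount      # withdrawal
    if strict:
        raise ValueError(f'unknown transaction code: {code}')
    return balance + amount          # silently treated as a credit

# A new ATM-fee message the old system has never seen:
print(post(10_000, 'FEE9', 92_484_410_832))  # -> 92484420832 (cents)
```

The strict=True path illustrates the obvious defensive alternative: fail loudly on codes that a system was never designed to process.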
Add to Replicated transactions already in bsec:
Civilian employees of the U.S. Army in Germany received double salaries in November 1994. The payroll tape is usually run on the 24th of a month, but because of the Thanksgiving holiday on the 24th, someone ran the payroll tape on the 23rd. Someone else also ran the tape on the 25th, knowing that the 24th was a holiday. This was detected when a $3.5 million shortfall was discovered.237
Due to a computer mistake of Banque de France, civil servants and other people working for the French government were paid twice for the month of November 1995. Apparently some others were paid not at all.238
*** Most of the cases already in the book are described discursively, rather than via unindented titles. BE CONSISTENT.
Computer disk crash causes misprinted ballots. The Hawaii Republican Party was up in arms on September 13, 1994, when it was discovered that a "hard disk crash" caused a number of absentee ballots on Hawaii's island of Maui to inadvertently omit two candidates running for state legislature. The crash reportedly occurred at a California company which had been contracted to print the ballots, and meant that over 140 Maui residents were asked to re-cast their ballots. This printing error was discovered when people couldn't find their friend's name on the ballot.239
Tampering blamed for rebuffed candidacy in Peru. Susana Higuchi, deposed wife of Peru's President Alberto Fujimori, had her presidential hopes quashed by Peru's electoral board because of a shortage of valid signatures. Higuchi claims to have submitted about 130,000 signatures, while the board claims only 11,851 were valid, short of the required 100,000. Higuchi claimed some 150,000 signatures were erased from her party's computers during a blackout that affected only the block in which her offices were located. She blamed her husband's cronies for high-tech fraud. (How do you spell "backup"?)240
1995 San Francisco elections. The 1995 San Francisco election process caused a lot of unhappiness. First, thousands of absentee ballot pamphlets were in error (pamphlets are printed on a per-district basis, with differing orders of candidates), which could have resulted in miscast ballots. (Replacements were subsequently mailed out.) Second, on election night, November 7, 1995, the computer systems kept malfunctioning, seriously delaying the results. It took quite a while to discover that the new energy-efficient screens were incompatible with the building. When the air conditioning was on, power surges caused the computers to crash. Of course, everything worked fine during the earlier testing (in the absence of air conditioning).241
Punch-card ballots overturn primary election result. Punch-card hanging chads were responsible for uncertainty about the results of the Democratic Primary for the 10th Congressional District of Massachusetts, on September 17, 1996. The official count two days later indicated Philip Johnston had defeated William Delahunt by 266 votes out of 49,371 ballots cast. Delahunt called for a recount, citing some 1000 punch-card ballots that were counted as blanks by the mechanical vote counter. On October 2, the results of the recount were announced, showing Johnston the winner by 175 ballots. During the recount, the questioned ballots were examined by an election official. (The legal standard requires that the intent of the voter should govern the vote count. Thus if a ballot is not punched through, but is indented, then a vote should be counted.) Delahunt took the dispute to state court. A state court judge examined 956 ballots in chambers and ruled that only about 50 were actually blank. On October 4, the judge declared Delahunt the winner by 108 votes - which was then upheld by the Massachusetts Supreme Court on October 8.242
Cat-a-login. "It's not clear whether Morris Feline Stuart is a Democat, a Fat Cat or a Republicat -- or even a fan of Ross Purr-O -- but Morris is now a registered voter in Cuyahoga County." Normalee Stuart of Shaker Heights, Ohio, says that she registered her cat for the 1994 election to prove there are few, if any, safeguards against voter fraud in the county. She got the idea when a neighbor told her of receiving a voter registration card addressed to a woman who had been dead for 12 years. Ohio law does not require people -- or cats -- to identify themselves when registering to vote. In fact, state law doesn't require identification from people when they vote, and it allows mail-in registration.243
Alberta vote-by-phone fiasco. A system designed to enable citizens of Alberta, Canada, to vote by telephone seems to have run into serious difficulties. The voting had to be halted for 40 minutes due to computer problems. Furthermore, some voters did not receive their required PINs, and thus could not vote. Some voters were told their PINs had already been used. Some claimed to have needed over an hour to complete their votes. Some were falsely informed that their votes had not been tabulated. The first and second ballots had inconsistent touchtone standards.244
Further election problems. Pennsylvania's Montgomery County used the MicroVote voting machines in the election of November 7, 1995, in which there were reports of extensive breakdowns, delays, long waiting lines, phantom vote tallies, erroneous incomplete tallies, and so on. However, the final results went unchallenged. It was evidently a Murphian field-day, with printers, copiers, elevators, even radios used by the repair crews failing.245 Similar confusion was reported in a Cape Town election at the end of May 1996, in which 2,000 votes were given to the wrong party.246
*** Insert after last sentence of Section 5.8.1 -- voting official said that "they only deal with the totals ..."
Richard Foster, jailed for driving with a suspended license, was set free from South Carolina's Richland County jail - based on a fax with an "official-looking sheriff's letterhead". The fax stated that Georgia's Augusta-Richmond County Sheriff's Office had no interest in Foster. (Actually, at that time he was wanted on assault and weapons charges.) The fax had been sent from a public fax machine at a Kroger grocery store in Augusta GA, and had the Kroger name and phone number on the fax. As a result of this spoof, the jail supervisor has been demoted from captain to sergeant. [Source: San Francisco Chronicle, 23 Jul 1997, A2] [Similar cases are recorded in Florida (RISKS-18.94) and Tucson AZ (RISKS-12.70).]
Another computer-miscontrolled jail (Scot Wilcoxon) (R 19 44)
The Minneapolis Star Tribune reported on 27 October 1997 on the likely reasons behind the escape of a prisoner from the Carver County jail on 2 Oct. When a guard pressed buttons to let another guard through a door, he also bumped the button for an external emergency exit. The external door became unlocked, and air pressure popped it open. Several prisoners chose to stay in the room, and one escaped for a day. Opening that external door was supposed to require pressing a "door open" button, two "interlock open" buttons and then the button for the specific door. Somehow that door did unlock when its door button was bumped while an internal door that requires only pressing two buttons was being opened. Authorities were later able to open the door that way several more times.
An internal investigation has not been completed, but three explanations have been offered:
1. Reprogramming of operational software controlling internal doors may have inadvertently changed functions affecting the door.
2. Lightning struck the jail this past summer, which resulted in a power failure and a computer-system crash. Some of the software may have been damaged when the system was rebooted.
3. All the functions were tested when the system was installed over two years ago, but tests were not made to see if the door could be opened by hitting other buttons.
Doors are also serviced after they've been opened 5,000 times, which makes it easier to detect if one isn't working. But this external emergency door has only been opened five times, with a key, for maintenance.
Philadelphia jail keeps 100 despite case dispositions. As many as 100 citizens were kept locked in Philadelphia's overcrowded jails for weeks or even months after judges had ordered them freed. Sentences, paroles, work releases, and drug rehabilitation orders remained unrecorded for several weeks, mostly from a single courtroom (Rocket Docket) specializing in disposing of something like 150 cases each day. Apparently, the paperwork was generally OK. However, after a lot of finger pointing, it seems that the paper orders were not being entered into the new computerized system fast enough, and that the computerized system, not the paper, was the driving force for putting orders into effect.248
Survey responder harassed. An Ohio grandmother who had completed a Metromail survey received a "sexually graphic and threatening letter" from a Texas prisoner convicted of rape who had been hired to enter her survey results. She sued Metromail and R.R. Donnelley and Sons.249
Data entry omission extends prisoner's sentence. John O'Valle was convicted in 1987 on charges of cocaine and weapons possession, and was expected to serve 20 years in prison. However, the sentencing judge reduced the sentence on technical grounds, making him eligible for release in 1992. This outcome was noted in O'Valle's written file, but not in the Department of Corrections' computer records. (Neither O'Valle nor his lawyer was notified of the change.) In January 1996, prison officials reviewed his records and discovered the discrepancy. However, there is now a new problem. In 1995, O'Valle was convicted for possession of marijuana while in prison, a felony. Had he not been in prison at the time, he would have been guilty of a misdemeanor and not subject to jail time. So, O'Valle is serving time for a felony, which might not have happened had he been released on time.250
Baltimore police computer problems. Baltimore's new "central booking" facility was supposed to process people who had been arrested. Unfortunately, there were many problems, such as people mistakenly being held for many days on minor charges, bail being posted but lost, prisoners being lost, and so on.251
New Pittsburgh jail. The new jail in Pittsburgh took 2.5 years and $147 million to build, and opened in May 1995. But there are apparently many problems with the new facility, including these: (1) Dozens of computer terminals are unusable because, while the data jacks were connected and wired, nobody bothered to install electrical outlets. (2) A computer system to track inmate information is still off-line, for two reasons. One, the software is from a Canadian company and is not formatted to the American justice system (whatever that means - AT). Two, nobody has been trained on how to use the system. (3) Guards carry an electronic personal alarm. These alarms are supposed to send out signals when there is a security problem, but are prone to false alarms. In another incident with these alarms, a female guard had to work an entire shift without an alarm because her battery went dead and there were no spares. (4) Another electrical malfunction left jail employees unable to unlock the doors to three pods, leaving one guard in each pod isolated with 56 inmates. (According to a TV report, the malfunction not only locked the guards in, but left the cells unlocked!) The malfunction lasted about two hours and knocked out the air circulation system on half of the second floor. (5) The ventilation system occasionally shuts off for no apparent reason. (6) The fire alarms go off at all hours for no apparent reason. (I guess that means there's a faulty switch somewhere, but they haven't been able to figure out how to find it.) (7) The employee elevator in the high-rise jail works only sporadically. (8) In an emergency, guards could use the pod phones to dial 911. But it wouldn't do them any good: the outside lines to each pod have been disconnected. In fact, jail officials mistakenly had the phone company block all but a few phones from being able to place or receive outside calls.252
Stolen computers from the United Nations. "U.N. officials said four computers containing most of the data on human rights violations in Croatia were stolen in New York. Officials said the theft was a `very heavy blow' to efforts to prosecute war crimes."253
UK cabinet secrets on National ID Card Found in Surplus Store. Plans for the UK national ID card system were found in a second-hand file cabinet sold at a government surplus store for about £35 -- including memos on investigation into the feasibility of smart-card technology, a detailed card design, as well as cabinet-level letters exchanged on this topic.254
Naval commanders managed to leave classified disks on a London train, after stopping off at a pub on the way back to their unit. In this case, the disks were found and returned - but who knows whether they might have been copied!255
Sex, lies, and backup disks. Democrats were given a computer disk with evidence to be provided by the Republican's star witness, Jean Lewis, who had made assertions about some Clinton Administration activities. What Ms. Lewis did not realize was that the disk still contained the text of a letter she had written that called Bill Clinton a "lying bastard" and that she had deleted - which a lawyer for the Democrats introduced as evidence that her motivation was a political vendetta.256
Cookies. The browser notion of "cookies" can present a residue problem, whereby software running on your system can squirrel away information about your browsing habits, and then retrieve that information on subsequent use, and in some cases transmit that information to a Web server.257
Laptop evidence in plan to kill 4,000. U.S. prosecutors accused Islamic militant Ramzi Ahmed Yousef (a.k.a. Abdul-Basit Balochi) and two others of plotting to bomb 12 U.S. jet planes in two days during 1995. Some of the evidence is based on a file found in Yousef's laptop computer, stating that the purpose of the bombings was "vengeance and retribution" against the United States for its financial, political and military support of Israel. (Yousef was later tried for masterminding the 1993 World Trade Center bombing that killed six and injured more than 1,000 people. He was also accused of placing a bomb on a Philippine Airlines flight from Manila to Tokyo on 11 Dec 1994, which killed one passenger and injured 10 others.)258
*** ONE-LINER SUMMARIES from RISKS (some to be included):
· Randal Schwartz convicted after finding security flaws in Intel (R 17 23,28)
· Sony satellite dishes remotely reprogrammable? (R 17 33)
· Microsoft Network e-mail binaries can contain executables (R 17 31,32,33)
· Emergency call-boxes ripped off, cell-phone serial nos. reused (R 17 35)
· German telephone card system cracked, many free calls made (R 17 36)
· British Telecom replaces payphone software after flaw exploited (R 17 36)
· Cardiff software shipped Teleforms 4.0 with self-destruct timebomb (R 17 36)
· Computer Systems Policy Project estimates $60 billion market-share loss in year 2000 resulting from current U.S. export controls on crypto products (R 17 61)
· More on Windows security bugs (R 17 62)
· Reporting and misreporting on successful Internet penetration of Navy battleship in exercise by U.S. Air Force (R 17 56-58)
· Russian Citibank cracker pleads guilty (R 17 61)
· Sony TV remote control turns Apple Performa 6300 on/off (R 17 95)
· Discussion on Technology Deterioration resulting from cutting corners in development [Lauren Weinstein] (R 17 94,96)
· Security hole in SSH 1.2.0 permits remote masquerading (R 17 66,68)
· Racist cracker trashes BerkshireNet (R 17 83)
· Tower Record credit-card info scam (R 18 02)
· Risks of credit-card numbers being sniffed (R 17 69,71,76)
· Nov 1995 report on minimal key lengths for symmetric ciphers (R 17 69)
· ITAR to allow personal-use export ("Matt Blaze exemption") (R 17 75)
· Risks of being indexed by search engines (R 18 15)
· Risks in altered live video images: L-vis Lives in Virtual TV (R 18 18-21)
· National Research Council crypto study report available from National Academy Press; see http://www2.nas.edu/cstbweb for summary (R 18 14,17)
· UK libel writ served overseas by e-mail (R 18 09)
· Discussion of New Orleans police chief murdering accuser despite wiretap (Shabbir Safdar, R 18 01)
· Intruder hacks into Cambridge University systems (R 18 09-10)
· Thieves ransack 55 government computers in Australia (R 18 14)
· St. Louis teenager Christopher Schanot arrested for computer fraud (R 18 01)
· Two convicted: 1,700 Tower Record credit-card numbers offloaded (R 18 02)
· Australian insurance company builds household database from electoral rolls (R 18 02)
· Software piracy considered enormous, Hong Kong, worldwide (R 18 12-13)
· Cyber-terrorists blackmail banks and financial institutions (article with considerable hype) (R 18 17,24)
· Information Security: Computer Attacks at Department of Defense Pose Increasing Risks, GAO/AIMD-96-84, in Senate hearings (R 18 15)
· South Korea clamps down on Canadian home page on North Korea (R 18 21)
· French police raid leading Internet service providers (R 18 21)
· Laptop with unspecified data stolen from London police car (R 18 24)
· U.S. intelligence reportedly hacked into European systems (R 18 30)
· Princeton team finds Java security bugs in Microsoft Internet Explorer 3.0beta3 and Netscape Navigator 3.0beta5 (R 18 32); More on Java security (Mueller, R 18 50)
· Flaws in and Microsoft's warning on Internet Explorer 3.0 (R 18 36,38)
· Microsoft again distributes a Word Macro Virus: WAZZU.A (R 18 53)
· Discussion on the strength of 56-bit crypto keys (R 18 26,27)
· More risks of core dumps (R 18 42,43,44)
· Risks of VeriSign digital certificates - legalese (R 18 47)
· AIDS database compromised in Pinellas County, FL (R 18 48,53)
· Florida nuclear controls "vandalized"? Switches glued (R 18 35)
· DMV security code breached at hospital in New Haven (R 18 28)
· Actress' breakdown triggered by computer virus' lost files (R 18 46)
· Rhode Island "disgruntled employee" arrested for "e-mail virus" (R 18 50)
· Stolen computer contains ophthalmology certification exam (R 18 53)
· "Key Recovery" replaces "Key Escrow" in U.S. encryption plan (R 18 50,54)
· Anticracking bill S982 passes senate; between $2 and $4 billion in losses in 1995 reported (R 18 48)
· Plot to tap British bank/credit card information by higher-tech gang revealed by coerced software expert in jail (R 18 70)
· Palisades Park NJ school employs 16-yr-old to break into locked-up computer system - need for key recovery mechanisms? (R 18 70,71)
· Nasty scam exploiting Y2K authorization expirations (R 18 68)
· Intel LANDesk Manager reaches directly into networked workstations (R 18 59)
· Microsoft Java/COM integration support does automatic upgrades (R 18 64)
· China strengthens control over "cultural rubbish" on the Internet (R 18 73)
· Danish government puts its own records on the Web, illegally (R 18 63)
· Irish rock band U2 unreleased songs pirated from demo video, distributed on the Internet (R 18 62,63)
· Making good ActiveX controls do bad things (R 18 61); more risks (R 18 62)
· Good Java security vs good network security (R 18 61)
· NT passwords bypassable by overwriting hashed password (R 18 62)
· Cryptography Policy and the Information Economy, Matt Blaze (R 19 71)
· Security flaw in NCSA httpd phf (R 18 69,70); CERN httpd (R 18 71)
· Risks of CT fingerprinting system to catch welfare recipients (R 18 69)
· Justice Dept wants to scrutinize parolee computer use (R 18 70)
· U.S. program export controls ruled unconstitutional by Northern California federal judge, Marilyn Hall Patel (R 18 69)
· Emeryville Ontario cyberstalker (Sommy) (R 19 08) turns out to be family's son (R 19 10,11)
· David Salas, a former subcontractor for the California Dept. of Information Technology, was arrested on 3 felony charges for "allegedly trying to destroy" the Sacramento computer system (R 18 75,76)
· 3 Croatian teenagers cracked Pentagon Internet systems. Classified files allegedly stolen (?). Zagreb Daily suggests damaging programs could cost up to $.5M (?) (R 18 84)
· Dutch electronic-banking direct-debit scandal: Friesian church minister discovers surprise privileges (R 18 81)
· Microsoft Network (MSN) users risk credit theft from fraudulent e-mail (R 19 08)
· Satellite monitoring of car movements proposed in Sweden (R 18 81)
· Swedish narcotics police demand telephone card database (R 19 07)
· Another risk of reusable passwords: sharing them to avoid Web fees (R 18 85)
· Dan Farmer's security survey [2 Jan 1997] catalogs attacks on government sites, banks, credit unions, etc. See http://www.infowar.com. (R 18 74)
· AOL4FREE.COM virus report started out as yet another hoax, but such a virus was created within 24 hours (R 19 11)
· More on risks in Netscape browsing histories (R 18 79)
· More on Java security (R 18 77,79,87); Another Java security flaw (R 19 11)
· Security problems in ActiveX, Internet Explorer, Authenticode (R 18 80-86,88-89); in particular, see detailed comments from Bob Atkinson (R 18 85) and subsequent responses (R 18 86-89); Paul Greene at Worcester Poly finds IE flaw (R 18 85); EliaShim notes two more IE flaws (R 18 88); Another ActiveX flaw (R 19 06,09)
· More on NT security (R 18 82,84,86-88); Another Windows NT security flaw (R 19 02)
· Chaos Computer Club demonstrates ActiveX/Quicken flaw on TV (R 18 80,81)
· More on Microsoft WORD macro security problems (R 18 70-72,75-77,79-89); Bug in Microsoft Word 6.0, 6.0a releases unintended info (S 20 1, 19)
· Myths about digital signatures discussed by Ed Felten (R 18 83,84)
· Maryland attempting to outlaw `annoying' and `embarrassing' e-mail (R 18 81)
· Nevada contemplating outlawing unsolicited junk e-mail (R 18 87)
· Vineyard.NET spammed by VC Communications, sending 66,000 messages (R 18 79)
· More risks relating to spamming and spam blockers (R 19 02,05,10,13); legal implications (R 19 10)
· Phone calls to Moldova result from porn scam (R 18 80,83,84,87)
· More spamming: Newmediagroup anti-spam measures draw retaliation (R 19 16,17,21); Anti-spam bills in U.S. Congress and Senate (R 19 18,21);
· Spam filtering (R 19 24)
· Oregon DMV lost $15K photo licensing equipment (R 19 16)
· Swedish teen-aged hacker fined for U.S. telephone phreaking etc. (R 19 13)
· Swedish meat packer Website penetrated and replaced (R 19 14)
· WorldNet security flaw (R 19 19, correction in R 19 20)
· Internet Explorer runs arbitrary code: MIME type overridden (R 19 14)
· More on Web browser risks (R 19 18)
· Netscape flaw allows reading of entire hard drive (R 19 22,23)
· ftp://agn-www.informatik.uni-hamburg.de/pub/texts/macro/ and ftp.informatik.uni-hamburg.de/pub/virus/macro/macrolst.* Macro virus lists from Klaus Brunnstein (R 19 24)
· 17 in Asian syndicate indicted for May 1995 theft of $10M in Pentium chips (R 19 21)
· 2,300 credit-card numbers stolen from ESPN Sportszone, NBA.com (R 19 24)
· Armed theft of $800K in chips thwarted (R 19 23)
· Calif. PG&E power substation attacked, linked to McVeigh verdict (R 19 21)
· Lost World Website hacked into Duck World: Jurassic Pond (R 19 20,21)
· UK's MI5 phone recruitment hotline spoofed by KGB impersonator (R 19 20)
· Database misuse by 11 prison guards in Brooklyn (leaking names of informants to prisoners, warning about searches, etc.) (R 19 20)
· Texas driver database on the Internet (R 19 22)
· Kansas sex-offender database full of incorrect entries (R 19 14); Also true in California DB: 2/3 of entries incorrect (R 19 24)
· Spreadsheet Research documents enormous operational error rates (R 19 24)
· Draconian controls on Chinese Internet usage (R 19 23)
· Cracker exploits flaw in MS Internet server software (R 19 23)
· RSA's DES challenge broken after 4 months (http://www.rsa.com) (R 19 23)
· Sun exploits loophole in crypto ban for SunScreen SKIP E+ (R 19 17)
· MD5 weakness and possible consequences (R 19 14,16,24)
For more recent one-liners, consult the Illustrative Risks file.
Privacy is a strange concept; you often do not realize you had it until after you have lost it. (PGN or someone else?)
Ten years later, it was the New York Yankees' turn for the World Series champion's ticker-tape parade. However, by then there was no longer any stock-market ticker tape. Instead, any available paper was dumped - including confidential records from the New York City Housing Authority and the Department of Social Services.259
Secret Service pagers compromised. At the Hackers on Planet Earth (H.O.P.E.) conference in 1994, there was a demonstration that intercepted pager messages. Three years later, the Secret Service was still using unencrypted pagers. On September 19, 1997, a transcript of a day's worth of pager messages for President Clinton and his staff appeared on the Internet.260
Newt Gingrich's teleconference compromised by cell phone. Newt Gingrich was overheard in a telephone conference call to other House bigwigs on December 21, 1996, plotting strategy on how to deal with his ethics problems and possible attacks from opponents. This despite his promise, made the same day to the ethics subcommittee by his lawyer, that he would not use his office or his allies to orchestrate a counter-attack to the charges. One party to the call had been on a cellular phone, and the call was intercepted and recorded by a Florida couple using a scanner.261
Bad "prank" resulting from privacy violation in Florida hospital database. The 13-year-old daughter of a hospital records clerk in Jacksonville, Florida, used her mother's computer during an office visit and printed out names and numbers of patients previously treated in the hospital's emergency room. According to police, she then telephoned seven people and falsely told them that they were infected with HIV. One person attempted suicide after the call. Upon arrest, the girl told police the calls were just a prank.262
Prodigy held liable for contents because they exert editorial control. A New York state trial court ruled on May 24, 1995, that Prodigy is responsible for the libelous statements of its users because it exercises editorial control over their posts. In the case, an anonymous Prodigy user made statements against the New York investment firm Stratton Oakmont, accusing it of criminal and fraudulent acts. Stratton Oakmont sued Prodigy and the volunteer moderator of the forum where the statements were published. The Court found that Prodigy was acting as a publisher and therefore was responsible for the content of the posts. The Court distinguished the case from the earlier Cubby v. Compuserve decision, which found that Compuserve was subject to the standards of a bookstore or library. In that case, the US District Court ruled that Compuserve had no editorial control over the text.263
Suit over computer use. The University of Wisconsin-Madison faced a sexual harassment lawsuit claiming that a former medical professor used campus computers to copy hundreds of pornographic pictures from the Internet. (Another employee is suing the professor because he propositioned her.)264
Killers sue over phone taps. Eight convicted murderers are suing Nynex and Massachusetts (on behalf of 10,000 state prisoners) to prevent the state from monitoring inmates' telephone calls.265
*** in Section 6.4.3, add to the para on Risks of automated doc aids: A Reuters automated news service for human sexuality included an item on Canadian international rugby star Karl Svoboda, because he plays the position of hooker.267
Mispeling Corecters and Spelling Miscorrectors [add to p. 192, before the para with Laserjet]
The RISKS archives are full of horror stories relating to the use of spelling correctors whose overzealous changes have caused amusing, or in some cases monstrous, errors. A related problem concerns global changes in which a context editor also makes extraneous alterations to the text; trying to be exceedingly careful in specifying global changes is unlikely to succeed all the time. The only sensible strategy when using spelling checkers and context editing is to supervise each potential change manually, and to remember that many semantic errors are difficult to catch mechanically - such as homonyms, omitted words ("not" is a particularly nasty case), botched word orders, and misplaced modifiers (see the next item). Also, Peter Ladkin noted in the on-line Risks Forum that the jargony term spellchecker is a flagrant misuse of natural English - unless, of course, you want to check for spells and curses.
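The danger of unsupervised global changes can be made concrete with a small sketch of my own (not from the book), using the Danish and/og mishap described below: a blind substring substitution of "and" with its Danish equivalent "og" mangles every word that merely contains the string, whereas a word-boundary-aware replacement narrows the damage.

```python
# A minimal sketch (my illustration, not the book's) of why blind
# global replacement is risky: substituting the English word "and"
# with Danish "og" by raw substring replacement also rewrites words
# that merely contain "and".
import re

def naive_translate(text: str) -> str:
    # Blind substring replacement: no notion of word boundaries.
    return text.replace("and", "og")

def careful_translate(text: str) -> str:
    # Replace only the standalone word "and" (\b marks a word boundary).
    return re.sub(r"\band\b", "og", text)

if __name__ == "__main__":
    sample = "standard and andre"
    print(naive_translate(sample))    # mangles the embedded strings
    print(careful_translate(sample))  # replaces only the whole word
```

Even the careful version only narrows the damage; as argued above, each proposed change still needs manual supervision, since no boundary rule catches homonyms or an omitted "not".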
[*** Mafia enforcer to Laserjet here ***] Alek O. Komarnitsky noted that FrameMaker flags the word Interleaf and recommends changing it to FrameMaker.
A spelling corrector was blamed for transforming ACLJ into ACLU in an article about the American Center for Law and Justice, causing complaints to the ACLU for apparently having dramatically changed its position.268 An article by Nicholson Baker in The New Yorker on the Annals of Scholarship mentioned that library automation had replaced all instances of "Madonna" with "Mary, Blessed Virgin, Saint" - including references to a Madonna also known as Ms. Ciccone.269
"Atomic bombers criticize Enola homosexual exhibit" was a newspaper headline that evidently eschewed the word "Gay".270
Martin Virtel wrote a wonderful article "Fehler, Fehler, Feler"271 observing the 10th anniversary of the on-line Risks Forum. In response, he received a note from a German teacher, Thiomir Glowatzky, of Bamberg, who had used his computer in writing something about three German authors, Kafka, Musil and Schnitzler. The spelling corrector wanted to replace the authors' names with more digestible words from its dictionary: Kaffee, Müsli, and Schnitzel. It must have been strictly from Hungery.272
A Microsoft Word spellchecker was observed rejecting various common words, recommending interment instead of Internet, internee's instead of Internet's, emboli instead of e-mail. Perhaps not surprisingly, it had never heard of Netscape. Amusingly, the Dallas Morning News ran an article in which Microsoft had been transformed into Microvolts and Intel into Until.273 In the Danish users' guide for Windows for Workgroups 3.11, standard became stogard and andre (other) became ogre in a botched effort to transform and into its Danish equivalent og.274
A grammar checker in Microsoft Word for Windows was given the sentence, "I graduated from the University of Notre Dame." Its response was this: "Sexist expression. Avoid using Dame except as a British title." The New Yorker's customary retort was quite worthy: "They don't call them P.C.s for nothing."275
An editorial in The Washington Post transformed boss into the expanded version of DOS in the following text: "... Senator Alfonse D'Amato, the state's leading Republican and Pataki's Disk Operating System (DOS), said he would not seek revenge on Giuliani."276
The following note appeared in the Radford University sports pages: "Corrections: Because of an overzealous computer spellchecker, a number of names in a story on Radford University sports in the Welcome Students section appeared incorrectly and were not caught by a sports-ignorant editor. Phil Leftwich is the former Highlander now in the pros. Chris Connolly plays ball in Wilmington, Del., not Laminating, Del., and there's no such place as Educator, Ga. - Eric Parker is from Decatur. Chibi Johnson is not in the least bit Chubby, and Done Staley is legendary, not Don Stellae. Meanwhile, Paul Beckwith, who is no relation to Paul Backwash, departed for Cornell."277
Add to end of "Beware of imitations"
Kathryn Rambo in San Jose, California, was plagued by a Doppelgänger (taking on the identity of the victim for malevolent purposes) who acquired a $35,000 sports utility vehicle, a $3000 loan, new credit-card accounts, and a rented apartment - all in her name. (In this case, a suspect and alleged accomplice were apprehended.) In another case, Caryl Fuller's purse was stolen, and the thief opened up and maxed out three credit cards despite having a face that obviously did not match Fuller's picture.278
Mistaken Identity. Michael W. Klein, an attorney in New Jersey, was falsely identified as a reckless driver because a police clerk made a mistake on an address search. Law enforcement personnel were prepared to lock him up even though the physical descriptions did not match, even though the demographics did not match, just because the warrant had come out of the computer.279
Swedish court fines parents for son's overly long name. Many computer programs are so lame that they cannot handle special cases of people's names. In some other cases, there are real physical (and other) constraints that are hard to code around, such as in the case of some colleagues whose long names risk falling off the edge of their company badges. Sometimes people simply push things to extremes: A Swedish court fined a couple $660 for breaking the law by naming their son Brfxxxcccxxmnnpcccclllmmnprxxvvclmnckssqlbb11116 -- or Albin for short.280
Australian court bars family name. New parents of a baby boy were unable to give their child an ancestral name because of the accents on some of its characters. The reason given was that the computer system of the Registry of Births and Deaths could not accept the name, because the accented characters were not standard ASCII. The solution was not as simple as removing the accents, because doing so would substantially change the pronunciation of the name. The child remains unnamed!281
Perhaps you lead an honest life and have nothing to hide? Does an invasion of privacy seem more or less irrelevant to you? Maybe you want to publicize everything you do on your Web page, and don't care about security? The World-Wide Web can make information instantaneously accessible globally.
Unfortunately, there are also some social and technological risks to your personal well-being and integrity. One potential risk is that of computer-aided identity fraud and its extreme form, identity theft - illustrated in Section . Identity-related misuse can range from a one-time event to someone acting pervasively as a Doppelgänger. Although computer access is not essential for such activities, remote, global, and possibly anonymous access can greatly increase the risks. The infrastructure is inherently weak with respect to system and network security, Website integrity, personal authentication, and accountability. Masquerading may be easy -- particularly in the presence of fixed passwords or easily captured PINs. Although it may not seem to be a serious problem yet, identity-related misuse has been increasing in the past few years, and has the potential to escalate dramatically unless checked.
Identity and authentication of identity for Website users. Whenever misuse is a potential problem, the absence of strong user authentication throughout most of the Internet and many of its host systems makes it very difficult to ascertain a perpetrator's true identity. It is relatively easy for one user at one site to masquerade as another user at another site. Of course, even if some sort of strong authentication were to be invoked, most Websites do not enforce any differential access controls - once you are there, you typically have implicit permission to access everything that is accessible to any other Web browser. Thus, we make a careful distinction among identity, authenticity, and authorization.
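The three-way distinction can be sketched with a small hypothetical example (mine, not the book's; all names and data are invented): a site that asks each question separately, rather than treating "you got here" as permission to see everything.

```python
# A hypothetical sketch (not from the book) separating the three
# questions that most Websites of this era conflated or skipped.
USERS = {"alice": "s3cret"}                    # claimed name -> password
ACCESS = {"alice": {"/public", "/reports"}}    # name -> pages permitted

def known_identity(name: str) -> bool:
    # Identity: is this a recognized user name at all?
    return name in USERS

def authentic(name: str, password: str) -> bool:
    # Authenticity: can the claimant prove the identity is theirs?
    # (A fixed password, as noted in the text, is weak proof.)
    return USERS.get(name) == password

def authorized(name: str, page: str) -> bool:
    # Authorization: may this identity access this particular page?
    return page in ACCESS.get(name, set())
```

A site enforcing none of these checks once a browser connects exhibits exactly the gap described above: anyone who arrives can see anything any other browser can.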
Inference, aggregation, and secondary use. A serious risk arises in databases containing individuals' identities and personal information that can be used for purposes other than those for which it was intended. Also, an individual's information in different databases can be easily combined to provide detailed dossiers that may be detrimentally misused, either via further computer manipulation or by "social engineering" (the manipulation of people using partial knowledge and clever subterfuges). Collections of information may be more sensitive than the individual data items.
Identity-related misuse. Theft of one's identity is a risky form of malicious masquerading. For example, knowledge of your Social Security Number (SSN) and mother's maiden name may be sufficient for someone else to dishonestly manipulate your financial accounts and to obtain credit in your name - with or without computers. If this ever happens to you, your life may be permanently altered, and efforts to regain your credit rating, your livelihood, and indeed your mental stability may be very difficult.
System and data integrity risks. A different kind of risk to individuals and organizations arises when information is maliciously altered (or even unintentionally corrupted). In various cases, serious harm has resulted from incorrect data. Also, Website penetrations have resulted in the insertion of bogus Web pages for the CIA, NASA, the Justice Department, the Air Force, and even the National Collegiate Athletic Association. However, subtle changes that are less immediately obvious can be much more insidious - for example, implanted Trojan horses that trap users into yielding passwords and other sensitive information.
In general, many people seem oblivious to these risks; I hope that regular readers of this column are exceptions. Risks involving your identity should be particularly important to you. Identity-related misuse represents a significant threat to the fabric of our existence. Greater awareness as well as technological, social, and legal approaches are needed to minimize the risks.284
Anonymity is a sticky wicket in the on-line world, especially in digital commerce and electronic mail. Profound social and technological risks arise, both from anonymity itself and from the loss of anonymity. We face an important challenge: to improve our understanding of these risks and to anticipate the effects of future problems.
There is a spectrum of identity masking, with respect to individuals and computerized agents. True anonymity means that no one knows who you really are. ("On the Internet, no one knows that you're a dog.") Pseudo-anonymity means that your identity is not generally known, but can be obtained - perhaps only under prescribed (and carefully controlled?) circumstances. For example, your identity could be known to a pseudo-anonymizing e-mail service or an identity-masking escrow agent. It could be compromised by someone penetrating the database of associations between real and pseudo identities. Pseudonymity means that alternative identities may be used, either anonymously or pseudo-anonymously. (For example, America Online allows any customer to have up to six identities, all of which may be aliases.)
Some form of anonymity is clearly desirable for people who are seriously threatened in one way or another - whistle-blowers, victims of violence and hate crimes, and so on. However, anonymity can easily be abused - for example, by false accusers and perpetrators of hate crimes, frauds, and pranks - perhaps seeking to evade responsibility and accountability. As usual, the presence of electronic media can considerably escalate the risks - geographically, chronologically, and consequentially.
Pseudo-anonymous remailing services typify one way in which anonymity can be attempted in electronic mail. Perhaps the most popular was Johan Helsingius' anon.penet.fi remailer in Finland, which provided each sender with a unique aliased e-mail address through which mail could be sent, and at which replies could be received - without revealing the actual address or identity. Helsingius eventually closed down his free remailer, after experiencing a variety of problems - threats from law enforcement, pressures from The Church of Scientology to identify a source, and false accusations of having transmitted pornographic images. (The remailer was designed to reject graphical images.)
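The mechanism can be sketched as a simple table of associations between real addresses and stable aliases. The class and addresses below are hypothetical toys, not the actual anon.penet.fi design, but they make clear why the mapping database itself is the single point of compromise:

```python
import secrets

class PseudoAnonRemailer:
    """Toy sketch of a pseudo-anonymizing remailer: each real address
    gets one stable alias, and replies to the alias are forwarded back.
    Whoever obtains this table can unmask every user."""

    def __init__(self):
        self._alias_of = {}   # real address -> alias
        self._real_of = {}    # alias -> real address

    def alias_for(self, real_addr):
        # Assign a fresh alias on first use; reuse it thereafter so
        # correspondents see a consistent pseudonymous identity.
        if real_addr not in self._alias_of:
            alias = f"an{secrets.randbelow(10**6):06d}@remailer.example"
            self._alias_of[real_addr] = alias
            self._real_of[alias] = real_addr
        return self._alias_of[real_addr]

    def deliver_reply(self, alias):
        # Replies addressed to the alias are forwarded to the real user.
        return self._real_of.get(alias)

r = PseudoAnonRemailer()
a = r.alias_for("user@example.org")
print(r.deliver_reply(a))  # user@example.org
```

Note that anonymity here is only as strong as the protection of the two dictionaries - exactly the pressure point that law enforcement and litigants exploited in the Finnish case.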
Anonymity in electronic commerce represents another huge conflict. Although it may seem desirable to have anonymous electronic cash and anonymous financial transactions, some significant measure of accountability is absolutely essential to prevent misuse. For example, the Internal Revenue Service seeks to prevent unaccountable large transactions, and law enforcement seeks to prevent money laundering. Similarly, anonymous contributors presumably want assurances that their contributions are actually going to the proper charity. Indeed, anyone sending virtual money would like to ensure that payments are not being diverted to some untraceable recipient. However, the mere existence of accountability logs is always a potential source of risks - as illustrated by the ability to track an individual's activities through credit-card purchases, telephone charges, and other records. In general, the absence of accountability and the presence of anonymity suggest the need for mutual suspicion rather than blind trust.
Cryptology provides some interesting protocols that can enhance both pseudoanonymity and accountability, by providing authentication of users and systems, as well as confidentiality and integrity of content. However, we must be very skeptical if those protocols are embedded in an infrastructure that is not as well conceived - for example, implemented on a seriously vulnerable operating system, or on a smartcard whose keys can be compromised by one trick or another, or on network sites whose identities can be forged. There are many real misuses that can compromise identities, including opportunities for insider collusion, deceptive aliases, tampering with the controls, malicious alterations of audit trails and accountability information, and surreptitious tracking of individuals through inferences drawn from logs, databases, and unencrypted headers. From a realistic electronic-system point of view, true anonymity is both riskful and unachievable.
Technology for pseudoanonymity must not be easily subvertible and should support both good accountability and good anonymity; in addition, the laws and social conventions must meaningfully discourage misuse. This requirement transforms the problem back into security problems of operating systems and networks, seamless incorporation of sound cryptography, and trustworthiness of operational procedures and people (including those in any key-recovery or key-escrow processes). Unfortunately, that is the classic technique of "reduction to a previously unsolved problem." As usual, the risks abound.
The erstwhile anonymously authored novel "Primary Colors" raised some interesting issues relating to the risks of trying to remain anonymous, the difficulties inherent in lying consistently and undetectably, and the trustworthiness of computer evidence. There can be risks in both directions - in believing or in doubting that digital evidence is truthful. However, technology will generally not allow you to hide your own duplicity from other people.
Relevance to computer-related risks began when New York Magazine reported that Professor Donald Foster at Vassar College had performed a computer study apparently attributing the writing style of the novel to that of Joe Klein, a Newsweek columnist and CBS commentator (and formerly political columnist for New York Magazine).285 Klein on several occasions vigorously denied authorship. Subsequently, David Streitfeld of The Washington Post noted286 that Maureen Casey Owens, past president of the American Academy of Forensic Sciences, had studied the handwritten notes on amended typescript pages for the novel and concluded that the handwriting was most certainly that of Joe Klein. Random House then finally admitted that Klein was indeed the author. Joel Garreau, the Post editor in charge of the story, contributed a very illuminating item to the on-line RISKS.287 Among other things, the Post had dug into the available computer records and in short order found that Klein had paid cash for half of the price of his new house. "It's amazing what you can do when you have a person's social security number and date of birth, and equally sobering how easy it is to get that information. Only our sense of journalistic propriety prevented us from pursuing and using further information that was readily available." They also discovered that Foster had not concluded unequivocally that Klein was the (only) author, whereas New York Magazine had put the sole-authorship spin on the article.
Garreau noted that it was not the computer analysis that had smoked out Klein so much as serendipity involved when Streitfeld discovered the annotated manuscript in a hard-copy catalog of a small used-book store. "In short, we put an extraordinary amount of computer effort into this story, including a passworded spreadsheet to keep track of all our reporting. But the cyberheroics ended up at best a sideshow if not a distraction. [The case] finally was cracked and developed by old-fashioned means." As a result, Klein took a lot of flak relating to his integrity as a journalist, primarily because he lied about his authorship.
Thus, we see that anonymity can be very difficult to achieve, but also can entail serious risks irrespective of the extent to which it is achieved - whether computers are involved or not. However, the presence of computers may be altering the balance against anonymity rather than toward it.
The only consistent course is not to put yourself in such a position in the first place. Although that might seem to preclude perpetrating April Fools' spoofs, note that in the most well-known cases (for example, the Chernenko and Spafford e-mail spoofs, Section ) the prankster has unabashedly 'fessed up when confronted with evidence (Piet Beertema and Chuq von Rospach, respectively) and generally been admired for his cleverness. Similarly, Robert Morris never denied his involvement in the Internet Worm experiment that went seriously awry. So, does that mean it might have been acceptable for Klein to publish anonymously if he had admitted his authorship when first challenged? To expect that he could remain anonymous forever is totally unrealistic in our information-laden world. Future authors may want to take note.
Communications Decency Act. In a ruling likely to have a significant impact on the future of the Internet, a special three-judge federal court declared on June 12, 1996, that the Communications Decency Act is unconstitutional on its face. The landmark decision came in a legal challenge initiated by the ACLU, EPIC and 18 other plaintiffs. EPIC is both a plaintiff and co-counsel in the litigation. The ACLU/EPIC case was consolidated with a subsequent action filed by the American Library Association and a broad coalition of co-plaintiffs. Their lengthy ruling consists of separate opinions authored by the three members of the federal court panel. While the three judges differed in their approaches to the legal issues raised in the case, they were unanimous in their strong conclusions that the CDA constitutes a clear violation of the First Amendment. A complete copy of the opinion, as well as selected excerpts and related news items, can be found at http://www.epic.org/.288
All nine Supreme Court Justices expressed opinions on the unconstitutionality of the CDA, on June 26, 1997. In the majority opinion written by Justice Stevens, seven Justices ruled that the CDA violated free-speech rights in attempting to protect children from sexually explicit material on the Internet. The remaining two Justices (in an opinion written by Justice O'Connor, with Chief Justice Rehnquist concurring) wrote that they would invalidate the law only insofar as it interferes with the First Amendment rights of adults.289
Exon anti-cyberporn bill. Senator James Exon (D-Neb.) introduced legislation calling for two-year prison terms for anyone convicted of sending obscene or harassing e-mail. Commercial providers have protested, noting their service is more like a telephone company or common carrier, which is not held responsible for the conversations carried over its conduits. Exon remains unmoved: "If I were against this, if I didn't want to be bothered with it, if I felt it might complicate my ability to make money on the superhighway, that's the argument I would make." Meanwhile, the Center for Democracy and Technology is pushing for more sophisticated filters that users could customize to block specific types of messages: "You could have the Pat Robertson rating system, the Motion Picture rating system, the Playboy rating system, ... ."290
Screening software blocks access to *sex*-y New Jersey counties. An AOL filter intended to screen out the offensive word *sex* blocked access to information on the Sussex County Fair, Middlesex County College, Essex County College, and the Essex County Clerk's office.291
AOL censors British town's name! Doug Blackie related an experience he had in trying to register with America On Line. He entered his name "Blackie" and his home town "Scunthorpe", and found that AOL's (indecency-filtering) registration program would not accept that combination. After various discussions with the AOL folks in Dublin, he discovered that he could register properly if he entered the town as "Sconthorpe". As a result of this curious situation, AOL announced that the name of the town would henceforth be known as "Sconthorpe".292
The "finger" command and "Paul Hilfinger". When Paul Hilfinger was at Carnegie-Mellon University, the CMU computer-center staff was ordered by the CMU administration to change the name of the "finger" command (despite it being an ARPAnet standard). They changed "finger" to "where" and also took it upon themselves to change Paul's name to "Paul Hilwhere" (initially intending it to be temporary). Paul actually approved of the change (as a kind of gentle protest), and it remained that way for some time.293
Quoth the Maven, Livermore! Lawrence Livermore National Laboratory (one of the U.S.'s three nuclear weapons labs) discovered one of its unclassified Internet computers was being used to provide a repository for 50 gigabytes of 90,000 erotic images. (Oh, you thought a picture is worth 1000 words? More like .56 megabytes?)294 One employee of LLNL was accused of misusing lab computers by accessing 33 photos of bikini-clad women from an Internet site, but was subsequently found to be innocent. He said he thought the photos were related to engineering. (He has subsequently been promoted.)295
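The parenthetical arithmetic in the aside can be checked directly (decimal gigabytes and megabytes assumed):

```python
# 50 gigabytes spread over 90,000 images: how big is each "picture"?
total_bytes = 50 * 10**9     # 50 GB, taken as decimal gigabytes
num_images = 90_000
per_image_mb = total_bytes / num_images / 10**6
print(round(per_image_mb, 2))  # prints 0.56 - roughly .56 megabytes per image
```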
*** New section on encryption policy, refer to Whit Diffie and Susan Landau on cryptography policy , NRC study , 11-authored report , Peter Wayner on Steganography , etc.
Computer encryption codes ruled protected speech. U.S. District Judge Marilyn Hall Patel released a ruling on April 16, 1996, that mathematician Daniel Bernstein could try to prove that the U.S. export controls on encryption technology are too broad and violate his right to communicate with other scientists and computer buffs - a right protected by freedom of speech. (Bernstein's cryptographic programs are called Snuffle and Unsnuffle. The U.S. State Department decided in 1993 that Bernstein's written article and programs required export licenses [because crypto purveyors are considered international arms dealers under ITAR], but later backed down on restricting the article; Bernstein then had sued for release of the programs.) David Banisar of the Electronic Privacy Information Center (EPIC) is quoted in the news item: "It's important to recognize that computerized information has the same kind of legal protection that printed information has."296
*** ADD TO END OF SECTION 7.1.1: Many risks result because the big picture is ignored. In each of these examples, we must understand the overall operating environment to properly assess the risks, including dependencies on telecommunications and electric power. Furthermore, numerous nontechnical issues can have an enormous impact - for example, human limitations, social problems, financial realities, Mother Nature, absence of long-term planning, and historical contexts (including international implications).
*** Merge with 7.3.2. The buzzphrase "system of systems" is popular these days, ostensibly as a way of overcoming complexity. Unfortunately, building systems out of subsystems can present serious problems because the properties of the combined system may not be at all obvious, even if properties of the individual components are well understood (which they are often not). Similarly, the notion of adding redundancy and diversity is touted as a way of overcoming reliability risks. Unfortunately, the added complexity itself often introduces new risks. Furthermore, distributed control could be a way of overcoming the risks of centralized systems. Unfortunately, distributed control often introduces many further risks.
What can be done to ensure that a particular system avoids serious risks, and moreover satisfies its stated requirements? People designing, building, and using complex systems must have a strong sense of the entire system and how it relates to its total environment. Furthermore, there must be a commitment to disciplined development and careful documentation. We suggest here the use of "principles of good software engineering practice" such as abstraction, encapsulation, information hiding, strong typing, separation of mechanism in design, separation of duties in operation, allocation of least privilege, trusted computing bases, client-server architectures, rigorous aspects of the object-oriented paradigm, formal methods, and serious testing. The notion of analytical compositionality is also fundamental: the ability to combine subsystems and to derive the overall system properties predictably. However, lip-service to these principles is not enough; adherence requires considerable commitment plus enlightened management to avoid overruns and schedule delays.
Our educational processes are lacking as well; students and practitioners learn how to program in the small, but very rarely how to think in the large - especially how to cope with real systems. There is relatively little emphasis on the development of human attributes and abilities suitable for people who must develop and use systems with critical and stringent requirements such as are necessary for security, reliability, fault tolerance, high availability, robustness, and real-time responsiveness. There is even less emphasis on nontechnical issues such as the sociopoliticoeconomic risks and responsibilities, and the real needs of end-users. We must learn to think in the large without being blinded by the details.
*** LOTS OF OVERLAP MUST BE REMOVED.
When computers are used to control potentially dangerous devices, new issues and concerns are raised for software engineering. Simply focusing on building software that matches its specifications is not enough. Accidents occur even when the individual system components are highly reliable and have not "failed." That is, accidents in complex systems often arise in the interactions among the system components, each one operating according to its specified behavior but together creating a hazardous system state. In general, safety is not a component property but an emergent property as defined by system theory. Emergent properties arise when system components operate together. Such properties are meaningless when examining the components in isolation -- they are imposed by constraints on the freedom of action of the individual parts. For example, the shape of an apple, although eventually explainable in terms of the cells of the apple, has no meaning at that lower level of description.
One implication of safety being an emergent property is that reuse of software components, such as commercial off-the-shelf software, will not necessarily result in safer systems. The same reused software components that killed people when used to control the Therac-25 had no dangerous effects in the Therac-20. Safety does not even exist as a concept when looking only at a piece of software -- it is a property that arises when the software is used within a particular overall system design. Individual components of a system cannot be evaluated for safety without knowing the context within which the component will be used.
Therefore, solutions to software safety problems must start with system engineering, not with software engineering. In the standard system safety engineering approach, system hazards (states that can lead to accidents or losses) are identified and traced to constraints on individual component behavior. Hazards are then either eliminated from the overall system design or they are controlled by providing protection (such as interlocks) against hazardous behavior. This protection may be at the system or component level or both. Building the software for these systems requires changes in the entire software development process and integration with the system-level safety efforts. (See  for more information about this approach).
One of the most important changes requires imposing discipline on the engineering process and product. Computers allow more interactive, tightly coupled, and error-prone designs to be built, and thus may encourage the introduction of unnecessary and dangerous complexity. Trevor Kletz suggests that computers do not introduce new forms of error, but they increase the scope for introducing conventional errors by increasing the complexity of the processes that can be controlled. In addition, the software controlling the process may itself be unnecessarily complex and tightly coupled.
Adding even more complexity in an attempt to make the software "safer" may cause more accidents than it prevents. Proposals for safer software design need to be evaluated as to whether any added complexity is such that more errors will be introduced than eliminated and whether a simpler way exists to achieve the same goal.
Besides the software process itself, new requirements are needed for the training and education of the software engineers who work on safety-critical projects. Most accidents are not the result of unknown scientific principles but rather of a failure to apply well-known, standard engineering practices. Engineering has accumulated much knowledge about how accidents occur, and procedures to prevent them have been incorporated into engineering practice. We are now replacing electromechanical devices with computers, but those building the software often know little about basic safety engineering practices and safeguards. It would be tragic if we had to repeat the mistakes of the past simply because we refused to learn from them. The most surprising response to my new book has been complaints from software engineers that it includes analysis of accidents not caused by computers.
Finally, safety is a complex, socio-technical problem for which there is no simple solution. The technical flaws that lead to accidents often can be traced back to root causes in the organizational culture. Concentrating only on technical issues and ignoring managerial and organizational deficiencies will not result in effective safety programs. In addition, blaming accidents on human operators and not recognizing the impact of system design on human errors is another dead-end approach. Solving the safety problem will require experts in multiple fields, such as system engineering, software engineering, cognitive psychology, and organizational sociology working together as a team.
With the dramatic growth of the Internet and World Wide Web, computer-communication activities are becoming increasingly distributed -- including processing, control, data, network management, accountability, and security. Unfortunately, the risks tend to increase and become more insidious as distributivity increases. In highly distributed environments, it is even more difficult to protect against all possible threats and fault modes than in centralized systems.
The October 27, 1980 ARPAnet collapse and the January 15, 1990 AT&T long-distance collapse (Section ) are just two examples of how local events can propagate. In each case, a seemingly isolated problem in one node managed to contaminate all of the other nodes in a widespread network, with devastating consequences. The widespread power outages of the 1960s were similar in their effect, with each outage leading to further successive outages. Those power outages resulted in the imposition of extensive new measures designed to prevent their recurrence. Despite those measures, people in many western U.S. states experienced power outages in July and August of 1996; although the causes were somewhat different, the basic problem remains: large-scale outages can still result from the propagation of local events, despite controls intended to prevent such occurrences.
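The propagation dynamic common to these outages can be illustrated with a toy load-redistribution model (the function and figures below are purely illustrative, not a model of any real network or grid): when one node fails, its load is shifted onto the survivors, which may then be pushed over capacity in turn.

```python
def cascade(initial_loads, capacity):
    """Toy cascading-failure model: a node whose load exceeds capacity
    fails, and its load is shared equally among the surviving nodes.
    Returns the indices of the nodes still alive at the end."""
    alive = dict(enumerate(initial_loads))
    overloaded = [i for i, load in alive.items() if load > capacity]
    while overloaded:
        i = overloaded.pop()
        if i not in alive:
            continue
        load = alive.pop(i)              # node i fails
        if alive:
            share = load / len(alive)    # its load lands on the survivors
            for j in alive:
                alive[j] += share
        overloaded = [j for j, load in alive.items() if load > capacity]
    return sorted(alive)

# Five nodes running near capacity: one local overload takes down them all.
print(cascade([1.2, 0.9, 0.9, 0.9, 0.9], capacity=1.0))  # []
# The same failure with slack in the system is absorbed locally.
print(cascade([1.2, 0.5, 0.5, 0.5, 0.5], capacity=1.0))  # [1, 2, 3, 4]
```

The contrast between the two runs mirrors the economic tension discussed below: the redundant capacity that absorbs the failure looks like waste in normal operation.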
Distributed control has considerable potential advantages. Unfortunately, it can be very unstable. Failures of seemingly isolated components can result in global outages or improper behavior. Communications collapses can isolate some subsystems from others, even in the presence of redundant communication paths. Information consistency and version consistency are difficult to achieve, especially in highly dispersed systems. Identification, authentication, and authorization of individuals and subsystems can be difficult to control. Crises and desired recovery strategies are often more difficult to analyze.
In addition, management and oversight can be much more complex, particularly in distributed environments with commercial competition. For example, power transmission and distribution represent just one sector of our economy in which there is a strong desire to minimize operating costs; too much excess capacity can be unprofitable, and unused redundancy can be costly. It is difficult to justify in advance the provision of defensive measures, such as fault tolerance and security, whose benefits are not evident in normal operation. But after an emergency, hindsight often makes it clear that those measures would have been desirable and cost-effective. We must be careful in deciding which attributes to optimize when the risks are not well understood.
*** SEE SECTION 7.7, which already has vonNeumann/Shannon/Moore One of the important challenges facing us is to be able to develop dependable systems out of less dependable components, especially where those systems may have to rely on the behavior of people whose dependability is uncertain. The technological part of that challenge is an old problem in the research community, going back at least to the early work on reliability (von Neumann, Shannon, and Moore), fault tolerance, and error-correcting codes. Reliability and system security both depend on the absence of significant weak links. However, most distributed systems today are riddled with weak links, even in supposedly dependable systems in which fault modes or misuses may occur that exceed the coverage of the fault tolerance or the security protection. In addition, procedural weaknesses can completely undermine the intended robustness. The human part of the challenge is perhaps more perplexing, because anticipating all possible human behaviors is so difficult, and because some people are truly devious or malicious.
*** COMPARE with Section 7.3: Another important challenge is to use constructive design, formal methods, formally based testing, and other serious analytical techniques to constrain the expected global behavior of a distributed system, and to be able to derive its expected behavior as a function of the local behavior of its components. That approach has the potential to detect vulnerabilities that cannot otherwise be detected. (Some steps in that direction are discussed in our July 1996 column.) As distributed systems increase in complexity, it is becoming impossible to analyze them without structural and functional analyses.
*** INSERT INTO 7.3? The lessons of the past suggest that highly distributed systems are likely to be inherently unstable in the absence of extremely careful development and operation. For example, the presence of complex feedback loops and their resulting delays often transforms collections of locally stable components into globally unstable systems, and greatly increases the stress put on weak links. (Oscillation modes are a major concern in systems engineering and control theory.) Developments that attempt shortcuts and cost savings by oversimplifying algorithms, avoiding built-in checks and diagnostics, cutting corners on security, and avoiding detailed system analysis -- in the large and in the small -- are likely to greatly increase the risks involved.
IRS modernization. In early 1997, the IRS abandoned its Tax Systems Modernization effort, on which it had spent $4 billion, after extensive criticism from the General Accounting Office and the National Research Council, and reevaluation by the National Commission on Restructuring ("reinventing") the IRS. The effort was deemed incapable of satisfying its requirements. A system for converting paper returns to electronic form was also cancelled.298 The IRS also dropped its Cyberfile system, which would have enabled taxpayers to file their returns electronically directly over the Internet or via dial-ups, without third parties. A GAO report blamed mismanagement and shoddy contracting practices. It also identified security problems for taxpayers as well as the IRS, and noted that the central computer was located in a dusty subbasement of the Agriculture Department subject to flooding, the computer-room doors had locks installed backwards (to keep the bad guys in?), and sprinkler pipes were too low.299
FBI system cancellation. The FBI abandoned a $500-million fingerprint-on-demand computer system and its crime-information database; the State of California spent $1 billion on its nonfunctional welfare database system, along with millions more on a new DMV system.300
California deadbeat-parents database. By early 1997, California had already spent $300 million on its Statewide Automated Child Support System (SACSS). The projected costs had escalated from the 1991 estimate of $99M. The Assembly Information Technology Committee considered scrapping the system altogether.301 Finally, the California Health and Welfare Agency announced on November 20, 1997, that the Lockheed-Martin IMS contract was cancelled altogether.302
California DMV. The California DMV system upgrade spent more than $44 million on a system that was never built, and would have been obsolete before completion anyway.
The Confirm system. Kweku Ewusi-Mensah  analyzed the cancellation of the Confirm reservation system development - after five years, many lawsuits, and millions of dollars in overruns. Confirm was a joint effort among Hilton Hotels, Marriott, Budget Rent-A-Car, and American Airlines Information Services (the Intrico consortium). Ewusi-Mensah provides some important guidelines for system developers who would like to avoid similar fiascos in the future.
Upgrading an existing system can also entail comparable development risks.
Bell Atlantic 411 outage. On November 25, 1996, Bell Atlantic had an outage of several hours in its telephone directory-assistance service, due apparently to an errant operating-system upgrade on a database server. The backup system also failed. The result was that for several hours 60% of the 2000 telephone operators at 36 sites had to take callers' requests and telephone numbers, look up the requested information in printed directories, and call the callers back with the information. The problem - reportedly the most extensive such failure of computerized directory assistance - was resolved by backing out the software upgrade.303
San Francisco 911 system woes. San Francisco tried for three years to upgrade its 911 system, but computer outages and unanswered calls remain rampant. For example, on October 12, 1995, the dispatch system crashed for over 30 minutes in the midst of a search for an armed suspect (who escaped). The dispatch system had been installed two months before as a temporary fix to recurrent problems, and it too suffered unexplained breakdowns. Screens froze; vital information vanished; and roughly twice a week the system crashed. Dispatchers were not able to answer between 100 and 200 calls a day. Many nonemergency calls were also lost. The 911 system collapsed again on November 4, 1995, for an hour; the absence of an alarm left the collapse undetected for 20 minutes.304
Social Security Administration. The SSA botched a software upgrade in 1978 that resulted in almost 700,000 people being underpaid an estimated $850 million overall, as a result of cutting over from quarterly to annual reporting.305 Subsequently, the SSA discovered that its computer systems do not properly handle non-Anglo-Saxon surnames (for example, surnames containing spaces, as in de la Rosa, or surnames that do not appear at the end, as in Park Chong Kyu) and married women who change their names. This glitch affected the accumulated wages of $234 billion for 100,000 people, some going back to 1937.306
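A minimal sketch shows the kind of assumption that causes such glitches (the function below is hypothetical, not the SSA's actual code): if matching software assumes the surname is simply the last space-delimited token of a full name, both of the cited name forms are mishandled.

```python
def naive_surname(full_name: str) -> str:
    """Flawed assumption: the surname is the final space-delimited token."""
    return full_name.split()[-1]

# Multi-word surname: records filed under "de la Rosa" fail to match "Rosa".
print(naive_surname("Maria de la Rosa"))   # Rosa
# Family-name-first ordering: the family name "Park" is missed entirely.
print(naive_surname("Park Chong Kyu"))     # Kyu
```

Either mismatch can silently orphan wage records, which is consistent with the scale of the discrepancy the SSA eventually found.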
NY Stock Exchange. The New York Stock Exchange opened an hour late on December 18, 1995, after a weekend spent upgrading the system software. At 9:15 a.m. on Monday, it was discovered that there were serious communications problems in the software between the central computing facility and the specialists' displays. The problem was diagnosed and fixed by 10:00 a.m., and the market reopened at 10:30 a.m. It was the first time since December 27, 1990, that the exchange had to shut down. The Chicago Mercantile Exchange, Boston Stock Exchange, and Philadelphia Stock Exchange all waited until the NYSE opened as well. (The monster snow storm on January 8, 1996 subsequently caused a late start and an early close.)307
Interac. On November 30, 1996, the Canadian Imperial Bank of Commerce Interac service was halted by an attempted software upgrade, affecting about half of all would-be transactions across eastern Canada.308
Year-2000 upgrades. We must wait until January 2000 to adequately assess the successes and failures of some of the ongoing efforts to overcome the Y2K problem (see Section 2.11.1) - even though problems are already arising in software that deals with termination dates beyond 1999.
Barclays Bank successful upgrade. In one of the rare success stories that can be found in the RISKS archives (primarily because there are so few real successes), Barclays Bank shut down its main customer systems for a weekend to cut over to a new distributed system accommodating 25 million customer accounts. This system seamlessly replaced three incompatible systems. It is rumored that Barclays spent at least £100 million on the upgrade, and it seems to have been worth it.309
The causes of these difficulties are very diverse, and not easy to characterize. It is clear from these examples -- and from many others in this book -- that deep conceptual understanding and sensible system- and software-engineering practice are much more important than merely tossing money and people into system developments. All of the examples here suggest that we need much greater sharing of the bad and good experiences.
*** INSERT at the end of Section 7.6.1 It is appropriate to end this section with a seemingly simplistic but surprisingly subtle addendum to Einstein's statement, contributed by Jim Horning: "Nothing is as simple as we hope it will be." The risks of easy answers can be very considerable, and we can avoid them only by learning to grapple better with complexity.
Developing computer systems is intrinsically tricky. There are numerous cases in the RISKS archives in which seemingly minor misunderstandings arose in requirements, design concepts, specifications, implementation, and system operation, involving both hardware and software, sometimes with major consequences.
Various efforts have been made to formalize requirements, specifications, and programs, and to reason formally about properties of programs in the small and of dependable systems in the large. Formal methods have long had enormous potential in the development of highly dependable computer-communication systems, and they are increasingly finding practical uses today. In the security community, efforts focused on properties of multilevel-security kernels and supposedly small trusted computing bases have had some success in formalizing requirements and encouraging improved system architectures. Some recent efforts in other fields are also noteworthy -- for example, relating to the reliability, fault tolerance, and safety of critical protocols. See  for an excellent roundtable forum on the viability of formal methods today, including discussions of `light' formal methods, practical uses, and the roles of formal methods in engineering mathematics and education.
There have been many skeptics of formal methods, raising many objections: The techniques are inaccessible to programmers who are not computer scientists, and are expensive to use. The tools are not user-friendly enough, because they were developed by and for specialists. The use of formal methods (and indeed of disciplined system development) is inconsistent with commercial pressures to get software to market before the competitors do (even if the software is lousy). The supposedly small trusted computing bases on which security depends are too large (rendering the analysis impractical) and incomplete -- because they can in practice be circumvented or otherwise compromised. Most efforts have been research oriented rather than applied to real developments. There is some truth in all of those objections.
On the other hand, some significant positive results are emerging. For example, there is a fascinating body of work relating to Byzantine algorithms that can tolerate arbitrary failures of some of the constituent subsystems; early efforts  have led to some successful work in critical algorithms for synchronization and distributed system soundness (e.g., [10, ]). There is also some recent work on using formal methods for designing and reasoning about commercial hardware, microcode, and low-level software, such as AAMP and AAMP-FV [10, 34], Motorola 68020 , AMD K5 divider microcode, and recent activity at Intel (in the aftermath of the Pentium bug!). Algorithmic model checking is a powerful approach for hardware (e.g., cache coherence in multiprocessor designs), protocols, and finite-state systems generally . State-space analysis has always been a useful tool, and recent efforts are making it more formal . There is also renewed interest in applying formal methods to security.
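As an informal illustration of what algorithmic model checking does, here is a toy explicit-state checker (a sketch, not one of the real tools alluded to above): it exhaustively explores the reachable states of a deliberately flawed two-process mutual-exclusion protocol and reports any state violating the invariant that the two processes are never simultaneously in their critical sections.

```python
# Toy explicit-state model checking. The "protocol" is flawed on purpose:
# a process checks that its peer is not in the critical section, but the
# check and the entry are separate (non-atomic) steps, admitting a race.

from collections import deque

# A global state is a pair (pc0, pc1), each pc in
# {"idle", "want", "entering", "crit"}.

def successors(state):
    """Enumerate all states reachable in one step from `state`."""
    for i in (0, 1):
        pc = list(state)
        other = state[1 - i]
        if state[i] == "idle":
            pc[i] = "want"
            yield tuple(pc)
        elif state[i] == "want" and other != "crit":
            pc[i] = "entering"   # check passed; entry is a later step
            yield tuple(pc)
        elif state[i] == "entering":
            pc[i] = "crit"
            yield tuple(pc)
        elif state[i] == "crit":
            pc[i] = "idle"
            yield tuple(pc)

def check(init, invariant):
    """Breadth-first search of the reachable state space; returns a
    violating state if the invariant fails anywhere, else None."""
    seen, queue = {init}, deque([init])
    while queue:
        s = queue.popleft()
        if not invariant(s):
            return s
        for t in successors(s):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return None

violation = check(("idle", "idle"), lambda s: s != ("crit", "crit"))
print(violation)  # the race lets both processes reach "crit"
```

Real model checkers apply the same exhaustive-exploration idea, with vastly more sophisticated state representations, to hardware designs and protocols with astronomically many states.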
If you believe in formal methods, don't give up now. There is real progress, including applications to real systems and real hardware. If you don't believe, then you might look again. The tools and techniques have become more effective, formal-methods advocates have become more practical, and some commercial organizations are realizing that complexity is otherwise unmanageable. Although the formal methodists still have a way to go, the informalists have even farther to go -- particularly for critical systems.
A fundamental conflict exists. The anticracking toolkits pose an immediate threat in that they incorporate significant detail about known system vulnerabilities and generic types of weaknesses. If the knowledge of vulnerabilities is not promulgated with a sense of urgency to those people who must do something constructive to improve security, the system flaws and configuration weaknesses do not seem to get fixed. Even if that knowledge is available to developers and system administrators, vulnerabilities do not seem to get fixed very rapidly, for a variety of reasons.310 On the other hand, if the knowledge is widely available, then the likelihood of widespread exploitations tends to increase, especially when that knowledge is embedded in readily exploited toolkits.
Even when system developers do try to remove or reduce their known system vulnerabilities, the dissemination of vulnerability knowledge among would-be attackers seems to outpace the installation of corrective measures. Indeed, that knowledge seems to propagate to the bad-hacker underground even when it is not known to those who must defend against attacks.
Some sort of incentives are needed to encourage developers and system administrators to make their systems and networks less vulnerable. It seems to be a necessary evil that penetrations and serious misuses must occur frequently enough and with enough corporate and media visibility to apply pressure for intelligent action. Emergency Response Centers such as the CERT at CMU also need to be more proactive.
Various other toolkits exist, including one developed by Tsutomu Shimomura, and another by Dan Farmer and Wietse Venema -- whose SATAN (Security Administrator Tool for Analyzing Networks) was released free to all comers on April 5, 1995. SATAN searches only for vulnerabilities for which fixes exist, and thus its impact is limited to those system administrators who have been inattentive to the announced fixes. (Strangely, SATAN's rather pejorative name relegates its beneficial aspects to the underworld. Furthermore, a security flaw found in the first release of SATAN reminds us that there are always iatrogenic risks inherent in antidotes.) All three of these folks deeply understand the intrinsic nature of the vulnerabilities, and have been developing toolkits for the detection of security vulnerabilities because they genuinely want to see security improved.
Farmer's belief in the importance of open dissemination cost him his job at Silicon Graphics. A quote from John Markoff311 is appropriate here:
In the Internet community, Farmer's case is seen as symbolizing the conflict between a time-honored ideal -- the free flow of information in cyberspace -- and the harsh new reality that corporations and government agencies must protect their computer systems against intruders and vandals armed with increasingly sophisticated break-in software.
Ironically, Kevin Mitnick (Section ) was able to penetrate the systems of Shimomura312 and Farmer, and apparently made off with copies of their tools.
ADD A REFERENCE in Section 7.10 on Risks of Risk Management to Charette97 .
(... with a little intro in general, and then an illustration in the specific area of aviation... to be extended by Peter Ladkin...)
Human beings are flexible, inventive and adaptable. Even in error. Certain kinds of human error are resilient enough to overcome technology specifically designed to avoid them. First, the cautionary tale.
Consider the shootdown of Korean Air Lines Flight 007 over Sakhalin island in August 1983. The 1993 International Civil Aviation Organization report says that the identification was bungled: "... the pilot of one of the USSR interceptor aircraft ... had been directed, by his ground command and control units, to shoot down an aircraft which they assumed to be a United States RC-135." While there remains considerable uncertainty about many aspects of the accident, this is one of the most well-substantiated, in part by recent interviews with the pilot. Why was such a grievous mistake made?
Cognitive psychologist James Reason uses the term confirmation bias for the partly unconscious tendency to value evidence that confirms a hypothesis, no matter how wrong-headed, while ignoring evidence that contradicts it. This was a very low-tech incident. Identification was made visually. The interceptor pilot and controllers were expecting a military aircraft; identification was incomplete and the decision process was rushed because of urgency; the pilot perceived what he thought were evasive maneuvers; sensitivity could have been heightened by nearby secret military tests; no commercial flight should have been within hundreds of miles. Clearly, extra electronics on the USSR side would have helped to improve identification - or so one might think.
Or would they? In July 1988, the USS Vincennes, an `Aegis'-class warship, bristling with electronics to manage a complex air battle extending over hundreds of square miles in the open ocean, shot down Iran Air Flight 655 on a scheduled flight to Abu Dhabi. The Vincennes was fighting small Iranian boats, and made a high-speed maneuver which caused chaos in the fighting systems control room. IR655 was off-schedule, and flying towards the fight. It was first identified as an F-14 attack aircraft, on the basis of a momentary F-14-compatible transponder return. The crew then experienced a form of confirmation bias: the transponder return was consistently thereafter misreported, and the aircraft was misperceived as descending towards the Vincennes, whereas it was in fact climbing. The report shows that the decision to fire was made notwithstanding persistent contrary evidence from the electronics, concluding that "stress, task-fixation and unconscious distortion of data may have played a major role..." We can see similar cognitive features in this case - expectations, urgency, a sensitive military situation, (mis?)perceived maneuvering of the suspect aircraft, confirmation bias, ultimately misidentification. So much for sophisticated electronics solving the identification problem.
Now to a current theme. More than half of the 2,200 airliner fatalities during 1988-1995 were due to `controlled flight into terrain' (CFIT), in which the pilots are unaware of imminent collision. Most CFIT cases happen on approach to an airport, and usually involve human error. Airline accidents on nonprecision instrument approaches (NPAs) occur with a frequency five times greater than on precision approaches. CFIT is a big killer. So what to do?
Well, put in more electronic helpers. For example, Boeing 757 pilots are trained to use the flight-management system (FMS) to determine position and course. But this can also go wrong. The report on the CFIT crash at Cali in late 1995 included as causal factors an FMS database ambiguity and an FMS function that caused pertinent course information to be erased, which would likely have highlighted a misapprehension by the pilots. No wonder the FMS didn't help. Connoisseurs also see evidence of confirmation bias in the crew's behavior and communication with air traffic control.
But some electronic helpers seem to be almost foolproof. A solution to CFIT has been proposed in the form of electronic equipment called the Enhanced Ground Proximity Warning System (EGPWS), which warns pilots of dangerous terrain ahead of the aircraft. Its predecessor, GPWS, which looks down but not ahead, has been in use in the U.S. for 20 years, and seems to have helped. It didn't help at Cali, though, and a recent model didn't help in Guam. But the enhanced version must surely be much better. How could it fail?
One can take a cue from the shootdown incidents. What could go wrong is what won't change. An airline-pilot colleague who has written handbooks for FMSs summarizes the views of many professional pilots: "Shooting an approach is generally easier in a steam-gauge airplane than in a hi-tech airplane. Less training, less monitoring, less information to sort." More high-tech devices will not alleviate this particular situation, no matter how wonderful they seem.
There appears to be near-unanimity in the aerospace industry on the value of EGPWS. But note that it's treating the symptoms, not the cause. How can we judge how well EGPWS will work, unless we thoroughly understand CFIT? And that's a question of knowing about human error, not of fancy technology.
Responsibility, n. A detachable burden easily shifted to the shoulders of God, Fate or Fortune, Luck, or one's neighbor. Ambrose Bierce, The Devil's Dictionary
Determining what constitutes a proper standard of behavior for the computer professional is a difficult endeavor, as the articles in the Communications December 1995 special issue on ethics and computer use illustrate. The questions involved are vexing, particularly concerning the responsibilities that computer-system designers may have with regard to their creations. Johnson and Mulvey ask in their article: are designers causally responsible for the consequences of a system's use, at fault or to blame if the system is defective, or legally liable for acts of commission or omission in their role as system designers?315 Another way of putting it is, "Are designers responsible for the risks inherent in a computer system?"
Not surprisingly, the answer varies between yes and no. It is no in the sense that agreed norms of behavior for designers (known as "standards of care" in medicine) do not yet exist, as the computing field hasn't yet matured into a discipline with certifiable standards.316 Without such norms, the market rule "caveat emptor" governs assignment of responsibility, i.e., unless otherwise noted, the customer assumes all risk.
However, the answer is also yes, in the sense that responsibility will be assigned regardless of whether agreed norms exist. Computer systems daily affect, for better or for worse, millions of individuals. People expect any and all effects to be beneficial. When they are not, cries ring out to know why, and "Who is responsible?" is heard. Answering back "you are" is unacceptable. "Caveat venditor" is now both the expectation and the demand.
Johnson and Mulvey warn that system designers collectively, not just individually, had better begin defining "standards of care" before others such as lawyers or politicians do so for us. Trying to shift responsibility for system failure onto the shoulders of "technology" carries no weight. Unless we, individually and as a community, are forthcoming and forthright about such issues as the risks being accepted by customers of computing systems, we should expect to be held accountable in every negative connotation that term imaginably possesses.
Johnson and Mulvey suggest that codes of ethics can help define these standards. Regrettably, when not being busily ignored, codes of conduct are used more to enhance the image of those supposedly following the code or to create closed shops . Their impact on day-to-day behavior is slight.
Thus, the problem is two-fold: (1) how to define an acceptable "standard of care" when our field has not yet reached a point where certification a la medicine is possible; and (2) how to make such a standard relevant and operational, not only for system designers but also for customers of computer systems. A partial solution to both questions is found in comprehensive, continuous, and shared assessments of the risks, involving both system designers and customers.
First, data collected from risk assessments can serve to mark the boundaries of what currently constitutes a reasonable "standard of care." Lessons gleaned from assessments of successes and failures can help paint a realistic picture of what system designers can reasonably create, and what customers can reasonably expect, given the state of technology, resources available, market conditions, designer competence, etc. Building systems outside these limits should be seen by all as experiments, like trials of new medical procedures, necessary but prone to failure.
Second, risk assessments make standards operational through continually asking the simple question "why?" The process of assessing shared risks illuminates the doubts, the problems, and the potential consequences that a system's design, development, and future operation entail for all involved. More important, assessments communicate the degree of ignorance existing in both the designers' and the customers' minds, again acting to balance expectations against risk.
Risk assessments are not panaceas by any means (see ). However, they do offer an interim step until we have learned enough to define true standards. They also encourage each of us to shoulder our own responsibilities for the risks we may be creating.
When discussing the risks of using computers, we rarely mention the most basic problem: most programmers are not well educated for the work they do. Many have never learned the basic principles of software design and validation. Detailed knowledge of arcane system interfaces and languages is no substitute for knowing how to apply fundamental design principles.
The "year 2000 problem" (Section 2.11.1) illustrates the point. Since the late 1960s, we have known how to design programs so that it is easy to change the amount of storage used for dates. Nonetheless, thousands of programmers wrote millions of lines of code that violated well-accepted design principles. The simplest explanation: those who designed and approved that software were incompetent!
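The well-accepted principle in question is information hiding: confine the stored representation of a date to one module behind a small interface, so that widening a two-digit year changes that one module rather than millions of scattered lines. A minimal sketch (hypothetical class and method names, not from the book):

```python
# Hypothetical sketch of information hiding applied to dates: client code
# never touches the stored representation, so changing it (for example,
# from a two-digit to a four-digit year, or to a count of days since an
# epoch) is a change to this module alone.

class StoredDate:
    """A date whose internal representation is private to this module."""

    def __init__(self, year, month, day):
        # Only this module knows the format; widening the year field
        # affects no client code.
        self._y, self._m, self._d = year, month, day

    def years_until(self, year):
        """Elapsed whole years to the given (full, four-digit) year."""
        return year - self._y

# Client code uses only the interface:
d = StoredDate(1968, 7, 1)
print(d.years_until(2000))  # 32 -- correct across the century boundary
```

Code that instead stored "68" in flat records and computed `00 - 68` throughout scattered programs is precisely what made the Y2K remediation so expensive.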
We once had similar problems with bridges and steam engines. Many who presented themselves as qualified to design, and direct the construction of, those products did not have the requisite knowledge and discipline. The response in many jurisdictions was legislation establishing Engineering as a self-regulating profession. Under those laws, before anyone is allowed to practice Engineering, they must be licensed by a specified "Professional Engineering Association". These associations identify a core body of knowledge for each Engineering speciality. Accreditation committees visit universities frequently to make sure that programs designated "Engineering" teach the required material. The records of applicants for a license are examined to make sure that they have passed the necessary courses. After acquiring supervised experience, applicants must pass additional examinations on the legal and ethical obligations of Engineers. Then, they can write "P.Eng." after their name. Others who practice engineering can be prosecuted. This applies to all specialities within Engineering (Mechanical, Electrical, etc.) although formal registration is most common with Civil Engineers and not required for all jobs.
When NATO organized two famous conferences on "Software Engineering" three decades ago, most engineers ignored them. Electrical Engineers, interested in building computers, regarded programming as something to be done by others - either scientists who wanted the numerical results or mathematicians interested in numerical methods. Engineers viewed programming as a trivial task, akin to using a calculator. To this day, many refer to programming as a "skill," and deny that there are engineering principles that must be applied when building software.
The organizers of the NATO conferences saw things differently. Knowing that the engineering profession has always been very protective of its legal right to control the use of the title "Engineer", they hoped the conference title would provoke interest. They had recognized that:
Unfortunately, communication between Engineers and those who study software hasn't been effective. The majority of Engineers understand very little about the science of programming or the mathematics that one uses to analyze a program, and most Computer Scientists don't understand what it means to be an Engineer.
Today, with bridges, engines, aircraft, power plants, and the like being designed and controlled by software, the same problems that motivated the Engineering legislation are rampant in the software field.
Over the years, Engineering has split into a number of specialities, each centered on a distinct area of engineering science. Engineering Societies must now recognize a new branch of Engineering, Software Engineering, and identify its core body of knowledge. Just as Chemical Engineering is a marriage of Chemistry with classical engineering areas such as thermodynamics, mechanics, and fluid dynamics, Software Engineering should wed a subset of Computer Science with the concepts and discipline taught to other Engineers.
"Software Engineering" is often treated as a branch of Computer Science. This is akin to regarding Chemical Engineering as a branch of Chemistry. We need both Chemists and Chemical Engineers but they are very different. Chemists are scientists; Chemical Engineers are Engineers. Software Engineering and Computer Science have the same relationship.
The marriage will be successful only if the Engineering Societies, and Computer Scientists come to understand that neither can create a Software Engineering profession without the other. Engineers must accept that they don't know enough Computer Science. Computer Scientists will have to recognize that being an Engineer is different from being a Scientist, and that Software Engineers require an education that is very different from their own.
The previous two sections consider primarily our computer-communication infrastructures. This section considers, from an integrative perspective, many of the critical infrastructures that underlie our entire civilization -- electrical power generation, transmission, and distribution; water supplies; sewage; telecommunications; air, rail, automotive, and other forms of transportation; worldwide financial marketplaces; and many other functions on which our daily lives depend.
Not surprisingly, many of these infrastructures are themselves increasingly dependent on our computer-communication infrastructures, as illustrated in Chapters and . Thus, we reexamine here the risks related to our critical infrastructures overall.318
The risks that must be addressed are still very considerable and very varied. They encompass reliability, security, survivability, safety, and general human well-being (for example).
It is clear that many of these infrastructures are becoming massively interconnected, particularly electric power, telecommunications, transportation, and banking. The desire for autonomous controls is being offset by the economies of scale of large-scale operations. Worldwide electronic commerce is becoming a reality. All of these infrastructures are greatly in need of high reliability, high availability, and stringent security.
It may be surprising to many people that the computer systems involved in these critical infrastructures are often vulnerable to attack. For example, telephone switching systems and various control systems in other infrastructures are accessible by dial-up lines, the Internet, and private networks, and in some cases have been penetrated by intruders. Overall, the risks of widespread if not global problems -- accidental outages, massive denial-of-service attacks, and extensive frauds -- are becoming more likely. Above all, we must recognize that protecting our critical infrastructures is becoming an international problem, and that our existing computer-communication infrastructures are not yet up to the challenge.
Unfortunately, despite claims of the developers and entrepreneurs, and despite some very significant technological advances in cryptographic techniques that could contribute to security, privacy, and relative anonymity, the existing computer-communication infrastructure is not yet ready for prime time. The existing network communications are largely unencrypted, which means that passwords and other sensitive information are susceptible to capture. Most of the existing operating systems are flawed with respect to security. Much of the networking software is flawed. Many application software products are flawed, particularly with respect to their security. Security being a weak-link phenomenon, these vulnerabilities in the infrastructure are all potentially devastating.
There has been a flurry of Web security flaws. (1) The Netscape Commerce Server software uses 40-bit RC4 encryption to protect customer transaction data in its exportable product. A French graduate student and a British team were independently able to crack the crypto, over roughly the same period of time. It took the student 8 days using 120 workstations and two parallel supercomputers to search exhaustively for the key -- about 64 MIPS-years of processing. (A MIPS-year is the computational capacity of a million instructions per second sustained for one year, which once was a lot of computing power.) (2) Berkeley computer-science graduate students identified a security flaw in the Netscape browsing software, resulting from a pseudorandom number generator whose seeding was too predictable. (3) The Berkeley folks also discovered a missing bounds check that permitted overly large numbers to crash the Netscape Navigator browser software. These three problems have all been corrected, but they are illustrative of the risks that are encountered in going on-line worldwide, and of the difficulties of designing and administering secure systems.
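The scale of that exhaustive search can be checked against the figures quoted above, assuming the conventional definition of a MIPS-year (back-of-the-envelope arithmetic only):

```python
# Back-of-the-envelope check on the 40-bit key search described above.
keys = 2 ** 40                       # size of a 40-bit key space
mips_year = 1e6 * 365 * 24 * 3600    # one MIPS-year, in instructions
work = 64 * mips_year                # ~64 MIPS-years, per the report

print(f"{keys:.2e} keys")            # ~1.10e+12
print(f"{work:.2e} instructions")    # ~2.02e+15
print(f"{work / keys:.0f} instructions per key trial")  # ~1836
```

Roughly two thousand instructions per trial key is plausible for setting up and testing one RC4 key, which is why a 40-bit key space was already within reach of a determined student with borrowed hardware in 1995.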
In August 1995, a Russian programmer was arrested along with five other people from three other countries, after the discovery of unauthorized transfers of more than $10 million using the Citibank electronic funds transfer system. Citibank was able to recover most of the funds, and stated that no customers lost any money. However, the ease with which these misuses were perpetrated suggests that deeper problems exist.
Policy issues are also important. For example, law-enforcement and national-security interests are upset by anonymous transactions and encrypted communications. This might lead them to demand that all information (or the crypto keys) be available to them (perhaps only when authorized by court orders), so that suspicious transactions and information flows could be tracked. In such a world, privacy would become a very endangered commodity. Security would also be compromised if keys were misappropriated or built-in trapdoors exploited.
Indeed, these cases represent the tip of an enormous iceberg relating to inherent and potential risks in the emerging digital world. This iceberg has appeared to be much less frightening than it actually is, because we have still not experienced enough large-scale fiascos. Major disasters, such as massive computer-based frauds and malicious acts that bring down a megacorporation or the economy of a small nation, are still waiting in the wings. Existing flaws could lead to the implanting of Trojan horses and trap doors for subsequent exploitation. Alternatively, a newly detected flaw could become known worldwide in a matter of moments via the Internet, with possible global effects.
We urgently need radical improvements in an infrastructure that was not designed for large-scale commerce. We need, among other things, better operating systems, consistent use of good cryptography in operating systems and application software, operational practices that do not compromise the potential benefits of the technology, strict monitoring of operations, and a much greater awareness of the risks on the part of everyone involved. These are all essential to reduce the risks of digital commerce to realistically acceptable levels. However, in the interim, we can expect somewhat risky small-scale forays into electronic businesses, and we can expect to see dramatic exploitations that add new grist to the RISKS mill.
U.S. research and education are both being widely criticized, often within the same breath. The criticisms are mixtures of politics, budget tightening, and genuine problems that must be corrected.
Most university research programs in computer and information science support graduate students, including many at the master's level, immersing them in developing technologies. There are serious technological and social risks if the squeeze on this system becomes too great, because fewer new professionals will understand the complexities of modern systems or the design of tools that help manage those systems. Such risks would take a long time to manifest themselves.
Most people favor federal R&D funding, which has served us well for many years.320 Today's questions concern the appropriate roles of government and private business. How should federal, state, and local governments treat basic research, applied research, education policy, and educational practice? How can private industry achieve what is needed in research and development? What would happen to our graduate education system if federally supported research assistantships were curtailed? Such questions come up as politicians attempt to set priorities for spending freezes or reductions.
Few people dispute the value of education, especially when it enables meaningful employment and national competitiveness. Few people dispute the value of R&D, because the ongoing process of posing and answering questions is at the heart of economic growth, and because many useful results have come from past R&D. Beyond these agreements, however, lies much controversy.
Many critics openly wonder whether private enterprise has enough of a focus on long-term questions to permit it to take up the slack after reduction of federal funds. Many industry research labs are being redirected from basic to applied research. With few exceptions, industry looks to the university system to produce competent software engineers. (One of the notable exceptions was the privately funded Wang Institute for software engineering; sadly, it was one of the first things to go when Wang encountered financial difficulties.) Others worry whether the university system is responding to public pressure to train software engineers who can build reliable, fault-tolerant, secure, and dependable systems. There are few penalties for bad systems and bad administration, as readers of this column are well aware. There is increasing public pressure for certification of software engineers (especially those developing critical systems), even though no one seems to know what to certify!
Improved computer education is not enough. Reform is also needed in the public procurement process, which assumes that a system is completely determined by its specifications, and thus fails to accommodate the continual interactions between customer and performer that are characteristic of successful system developments. Instead, it holds the customer and performer "at arm's length," sometimes with threats of collusion charges.
To those who ask whether we need new research in computing, we answer that new questions come up all the time, triggering new investigations. Software engineering is a fine example: it confronts many practical problems and continually inspires research questions. One of the latest is the notion of "patterns" for good design, inspired by the pattern languages of the architect Christopher Alexander [3, 13]. Software engineers are beginning to look for the common patterns appearing in such systems as air-traffic control, fly-by-wire avionics, medical treatment, banking, and inventory. They hope to identify a small set of building-block patterns from which all systems are composed. With answers to such questions, it may be possible to overcome the setbacks we suffered in software engineering from the personal-computer revolution, which attracted a lot of new, inexperienced people into programming.
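To make the pattern idea concrete, here is a deliberately simplified sketch in Python of one such building-block pattern, the observer (publish-subscribe) structure that recurs in monitoring-style systems such as air-traffic displays or banking audit trails. The class and method names are purely illustrative assumptions, not drawn from any particular system.

```python
# A minimal sketch of the observer pattern: a single reusable
# building block that recurs across monitoring-oriented systems.
# All names here are hypothetical, for illustration only.

class Sensor:
    """A subject that notifies registered observers of new readings."""
    def __init__(self):
        self._observers = []

    def attach(self, observer):
        self._observers.append(observer)

    def publish(self, reading):
        # Every registered observer sees every reading.
        for observer in self._observers:
            observer.update(reading)

class AlarmPanel:
    """An observer that records only out-of-range readings."""
    def __init__(self, threshold):
        self.threshold = threshold
        self.alarms = []

    def update(self, reading):
        if reading > self.threshold:
            self.alarms.append(reading)

# Wiring the two together: the sensor need not know what its
# observers do, which is exactly what makes the pattern reusable.
altimeter = Sensor()
panel = AlarmPanel(threshold=10000)
altimeter.attach(panel)
for value in (9500, 10200, 9900, 11000):
    altimeter.publish(value)
print(panel.alarms)  # prints [10200, 11000]
```

The point of cataloging such patterns is that the same subject/observer decoupling reappears whether the "readings" are radar returns, funds transfers, or inventory updates.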
The current budget debate raises issues about R&D and computer-system-related education that must be faced. We must acknowledge the problem areas and proactively take steps to correct them. We still have a long way to go toward the goal of routinely ensuring the safety and dependability of systems -- a central theme in the Risks Forum. Let's confront the mysteries of system design that challenge us.
Past experiences with computer-related disasters are not particularly encouraging, despite the presence of many techniques for risk avoidance and risk management. There is a widespread lack of understanding and foresight. So, here are a few desiderata for avoiding risks in the future -- or at least for dramatically reducing them.
Is there any realistic hope that these desiderata can be fulfilled? ("Hope" implies desire, with some not necessarily reasonable expectation of success.) My native optimism wishes that enough long-time readers of this column are by now sufficiently aware of the risks, and willing and able collectively to transform our reality-based pessimism into a justifiably reality-based optimism. However, the social, managerial, and technological changes necessary to accomplish such a transformation are evidently radical. In their absence, my native pessimism (based on the ongoing RISKS experiences) suggests that even the relatively modest desiderata outlined above remain elusive. Although I would certainly like to believe that we can step up to the plate and learn how to adapt to some of the curve balls that Murphy keeps throwing at us, we had better wear batting helmets and not hold our breath while waiting for a hit.
*** INSERT ON p. 308 after the Table 9.1 PARA on Peter Mellor: The cases in the table involving deaths are predominantly in medical systems and in commercial aviation. However, cases involving serious risks to life have also occurred in defense systems, space applications, military aviation, electrical power systems, and communication systems. Financial systems have been particularly prone to losses; there is much empirical evidence that many additional cases of financial loss remain unreported, and fraud is increasingly becoming a considerable problem area. Security and privacy problems are by far the most prevalent of all types we have recorded, although the effects to date have been much less serious than they might have been. Indeed, many of the discoveries of serious security flaws have been made by folks who are friendly rather than malicious, and have resulted in local improvements in security (as in the recent flurry of Web browser and server flaws). Furthermore, many of the attempted frauds (at least among those that are known) were detected before severe losses could occur (as was true in the Citibank case in which a Russian managed to subvert the funds-transfer security). Unfortunately, the infrastructure is still rather shaky for high-stakes electronic commerce and other advanced uses of the technology.
We tend to underestimate the effort, time, and cost required for complex development efforts --- especially if the resulting systems have critical requirements for security, reliability, fault-tolerance, high performance, or real-time behavior. Many techniques exist by which the risks can be dramatically reduced --- for example, structured design, good software engineering practice, formal methods for critical functionality in software and hardware, extensive testing, and risk analysis. Also desirable are the participation of experienced, intelligent, wise, and patient people, adequate resources, and anticipation of the likelihood that systems will be risky and therefore must be flexible enough to be readily (and sensibly) modified.
What is perhaps most striking is that there are so many different types of risks that must be anticipated. No matter how clever and how careful you are, there are always problems that will escape you. Consequently, the collected experience that can be drawn from past disasters must be assimilated by everyone involved in developing and using computer-communication systems.
· Philip E. Agre, Computation and Human Experience, Cambridge University Press, 1997.
· Ulrich Beck, Risk Society: Towards a New Modernity, Sage Publications, 1992.
· David Burnham, The Rise of the Computer State, Random House, New York, 1982.
· Steven M. Casey, Set Phasers on Stun, and Other True Tales of Design, Technology, and Human Error, Aegean, 1993.
· Peter and Dorothy Denning, Internet Besieged, Addison-Wesley, 1997.
· Mark Dery, Escape Velocity: Cyberculture at the End of the Century, Grove Press, 1996.
· Dietrich Dörner, The Logic of Failure: Why things go wrong and what we can do to make them right, Metropolitan Books, Henry Holt, 1996.
· John H. Fielder and Douglas Birsch, The DC-10 Case: A Case Study in Applied Ethics, Technology, and Society, State University of New York Press, 1992.
· Deborah G. Johnson and Helen Nissenbaum, Computer Ethics and Social Values, Prentice Hall, Englewood Cliffs, New Jersey, 1995.
· Rob Kling, Computerization and Controversy: Value Conflicts and Social Choices (2nd Edition), Academic Press, San Diego, February 1996.
· Jerry Mander, In the Absence of the Sacred: The Failure of Technology & the Survival of the Indian Nations, Sierra Club Books, 1991.
· Steven E. Miller, Civilizing Cyberspace: Policy, Power and the Information Superhighway, Addison-Wesley Publishing Co. and ACM Press, 1996.
· Charles Perrow, Normal Accidents, Basic Books, New York, 1984.
· Ivars Peterson, Fatal Defect: Chasing Killer Computer Bugs, Times Books (Random House), New York, 1995.
· Joan Stigliani, The Computer User's Survival Guide, O'Reilly & Associates, Inc., 1995.
· Steve Talbott, The Future Does Not Compute, 1994.
· Lauren Wiener, Digital Woes: Why We Should Not Depend on Software, Addison-Wesley, 1993.
In addition, several books that are primarily oriented on how to avoid risks are worth considering, first concerning safety, and then security.
· E. Lloyd and W. Tye, Systematic Safety: Safety Assessment of Aircraft Systems, Civil Aviation Authority, London, England, 1982, reprinted 1992.
· Nancy G. Leveson, Safeware: System Safety and Computers, Addison-Wesley, Reading, Massachusetts, 1995.
· Scott D. Sagan, The Limits of Safety: Organizations, Accidents, and Nuclear Weapons, Princeton University Press, 1993.
· Ralph Nader and W.J. Smith, Collision Course: The Truth About Airline Safety, TAB Books, McGraw-Hill, Blue Ridge Summit, Pennsylvania, 1994.
· C.V. Oster, J.S. Strong, and C.K. Zorn, Why Airplanes Crash: Aviation Safety in a Changing World, Oxford University Press, New York, 1992.
· Brent Chapman and Elizabeth Zwicky, Building Internet Firewalls, O'Reilly & Associates, Inc., 1995.
· William R. Cheswick and Steven M. Bellovin, Firewalls and Internet Security: Repelling the Wily Hacker, Addison-Wesley, Reading, Massachusetts, 1994.
· David Clark et al., Computers at Risk: Safe Computing in the Information Age, National Research Council report of the System Security Study Committee, National Academy Press, December 1990.
· Ken Dam et al., Cryptography's Role in Securing the Information Society, Final report on U.S. cryptography policy, National Research Council report of the Committee to Study National Cryptography Policy, National Academy Press, 1996.
· David Icove, Karl Seger, William VonStorch, Computer Crime: a Crimefighter's Handbook, O'Reilly & Associates, Inc., 1995.
· Bruce Schneier, Applied Cryptography: Protocols, Algorithms, and Source Code in C (2nd Edition), John Wiley and Sons, New York, 1996.
Anyone seriously interested in avoiding risks in whatever applications are considered should also make a serious study of books on system engineering, software engineering, and the art of programming. Perhaps we will consider those items separately in another column, although the following are particularly relevant:
· Frederick P. Brooks, Jr., The Mythical Man-Month (2nd edition), Addison-Wesley, 1995.
· Nathaniel S. Borenstein, Programming As If People Mattered: Friendly Programs, Software Engineering, and Other Noble Delusions, Princeton University Press, 1991.
· John Gall, Systemantics: How Systems Work and Especially How They Fail.
· Robert Glass, Software Creativity, Prentice-Hall, Englewood Cliffs, New Jersey, 1995.
· Henry Petroski, To Engineer is Human: The Role of Failure in Successful Design, St. Martin's Press, 1985.
· Henry Petroski, Design Paradigms: Case Histories of Error and Judgment in Engineering, Cambridge University Press, 1994.