Reconfiguration and Transient Recovery in State-Machine Architectures
by Dr. John Rushby.
From Fault Tolerant Computing Symposium 26. IEEE Computer Society, Sendai, Japan. June 1996. Pages 6-15.
Abstract
We consider an architecture for ultra-dependable operation based on synchronized state machine replication,
extended to provide transient recovery and reconfiguration in the presence of arbitrary faults.
The architecture allows processors suspected of being faulty to be placed on "probation." Processors in this status
cannot disrupt other processors, but those that are nonfaulty or recovering from transient faults are able to remain
synchronized with the other processors and with each other, can participate in interactively consistent exchange of
data (i.e., Byzantine agreement), and can restore damaged state data by loading majority-voted copies from other
processors. The processors that are not on probation are able to coordinate membership of their group and to take
processors on and off probation. These properties are achieved even if all the processors on probation and some of
the others exhibit Byzantine faults, provided a majority of all processors are nonfaulty.
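As a rough illustration of restoring state from majority-voted copies, the following Python sketch picks the value held by a strict majority of the copies a recovering processor has received. The function name and the sample values are hypothetical, and the sketch assumes the copies have already been exchanged consistently; it does not model the interactive consistency step itself and is not the paper's algorithm.

```python
from collections import Counter


def majority_vote(copies):
    """Return the value held by a strict majority of copies, or None if no such value exists."""
    if not copies:
        return None
    value, count = Counter(copies).most_common(1)[0]
    # Require a strict majority so that a tie never selects a value.
    return value if count > len(copies) / 2 else None


# Hypothetical example: five copies of one state word, two of them corrupted.
copies = [42, 42, 7, 42, 99]
restored = majority_vote(copies)
```

Returning None when no strict majority exists leaves the decision about what to do next (for example, retrying the exchange) to the caller.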
Key elements of the architecture are modified treatments for the problems of interactive consistency, clock
synchronization, and group membership. Classical algorithms for these problems that tolerate t Byzantine faults
among n processors are extended to tolerate t+p faults among n+p processors, partitioned into n "core members" and
p "probationers," provided no more than t faults occur among the core members.
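The fault condition stated above can be written as a small check. This is a minimal sketch under the abstract's wording only: the function and parameter names are hypothetical, and t is taken as given by whichever classical algorithm is being extended.

```python
def within_fault_budget(n, p, t, core_faults, probation_faults):
    """Check the stated condition: any number of probationers may be faulty,
    but at most t of the n core members may be.

    n: number of core members; p: number of probationers;
    t: faults tolerated by the underlying classical algorithm.
    """
    if not (0 <= core_faults <= n and 0 <= probation_faults <= p):
        raise ValueError("fault counts must fit within the group sizes")
    # Faults among probationers do not count against the budget of t.
    return core_faults <= t


# Hypothetical configuration: 4 core members, 2 probationers, t = 1.
ok = within_fault_budget(n=4, p=2, t=1, core_faults=1, probation_faults=2)
too_many = within_fault_budget(n=4, p=2, t=1, core_faults=2, probation_faults=0)
```

In the first call all probationers and one core member are faulty, for t+p = 3 faults in total, and the condition still holds; in the second, two core faults exceed t even though the total is smaller.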
BibTEX Entry
@inproceedings{Rushby96:FTCS,
AUTHOR = {John Rushby},
TITLE = {Reconfiguration and Transient Recovery in State-Machine Architectures},
BOOKTITLE = {Fault Tolerant Computing Symposium 26},
YEAR = {1996},
PAGES = {6--15},
ADDRESS = {Sendai, Japan},
MONTH = {jun},
ORGANIZATION = {{IEEE} Computer Society},
URL = {http://www.csl.sri.com/papers/ftcs96/}
}