The Rewrite-Rule Machine (RRM) is a Multiple Instruction, Multiple Data /
Single Instruction, Multiple Data (MIMD/SIMD) massively parallel computer
being designed, simulated, and prototyped at SRI International.
The RRM project is unusual in that its initial design exploration was
driven primarily by software issues. The results of this high-level design
effort were then combined with a bottom-up quantitative approach, yielding
an architecture that tries to balance complexity, performance, and cost
optimally while still following the key guidelines of the initial
theoretical work.
Two main characteristics of the overall design are the use of the concurrent
rewriting model of computation, and the use of active memory.
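To make "concurrent rewriting" concrete, the following is a toy sketch, not the RRM's term representation or rule format. It assumes a hypothetical rule set for Peano addition (add(0, y) -> y and add(s(x), y) -> s(add(x, y))) and a parallel-outermost strategy in which every outermost redex of a term is rewritten in the same step. The names rewrite_at_root, concurrent_step, and normalize are illustrative only.

```python
from typing import Optional, Tuple, Union

# Hypothetical term encoding: the constant "0", or a tuple (operator, *arguments).
Term = Union[str, tuple]


def rewrite_at_root(t: Term) -> Optional[Term]:
    """Try the two (hypothetical) Peano-addition rules at the root; None if none applies."""
    if isinstance(t, tuple) and t[0] == "add":
        _, x, y = t
        if x == "0":                                # add(0, y) -> y
            return y
        if isinstance(x, tuple) and x[0] == "s":    # add(s(x'), y) -> s(add(x', y))
            return ("s", ("add", x[1], y))
    return None


def concurrent_step(t: Term) -> Tuple[Term, int]:
    """Rewrite all outermost redexes of t at once; return (new term, redexes rewritten)."""
    result = rewrite_at_root(t)
    if result is not None:
        return result, 1                 # do not descend into a redex rewritten this step
    if isinstance(t, tuple):
        new_args, count = [], 0
        for arg in t[1:]:                # disjoint subterms can be rewritten in parallel
            new_arg, n = concurrent_step(arg)
            new_args.append(new_arg)
            count += n
        return (t[0], *new_args), count
    return t, 0


def normalize(t: Term) -> Tuple[Term, int]:
    """Apply concurrent steps until no rule applies; return (normal form, step count)."""
    steps = 0
    while True:
        t_next, n = concurrent_step(t)
        if n == 0:
            return t, steps
        t, steps = t_next, steps + 1


# add(add(1, 0), add(0, 1)): the two inner sums are independent redexes.
term = ("add", ("add", ("s", "0"), "0"), ("add", "0", ("s", "0")))
first, redexes = concurrent_step(term)
normal_form, steps = normalize(term)
```

In the first step both inner sums are rewritten simultaneously (two redexes in one step), which is the kind of fine-grained parallelism the concurrent rewriting model exposes; the term reaches s(s(0)) after four concurrent steps.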
Our parallel programming paradigm diverges from the standard von
Neumann model of computation, in which every execution step requires some
interaction between the CPU and data memory. One way to describe the RRM
architecture is to imagine a parallel system whose computational units
reside in its first-level caches. One can then think of the SIMD processors
as a self-modifiable, programmable active store, and of the data memory as
conventional passive memory. This organization blurs the distinction
between computational agent and memory, and thus limits the negative
effects of random memory access.
As shown in the figure above, the RRM has a 7-tiered hierarchical
architecture. The most basic unit, called a cell, is a 16-bit processing
element with 16 registers. Four cells that share local communication buses
make up a tile, and 144 tiles operating in SIMD mode make up an ensemble,
which is expected to fit on a single die.
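The SIMD organization of an ensemble can be sketched as follows. This is a minimal model, assuming only what the text states (144 tiles of 4 cells, each cell a 16-bit element with 16 registers); the register-to-register add instruction and the names Cell, make_ensemble, and broadcast_add are hypothetical, since the text does not describe the cell instruction set. Every cell executes the same instruction on its own registers, with results truncated to 16 bits.

```python
from dataclasses import dataclass, field
from typing import List

CELLS_PER_TILE = 4
TILES_PER_ENSEMBLE = 144
REGISTERS_PER_CELL = 16
WORD_MASK = 0xFFFF  # cells are 16-bit, so arithmetic wraps modulo 2**16


@dataclass
class Cell:
    # Each cell owns a private file of 16 registers.
    regs: List[int] = field(default_factory=lambda: [0] * REGISTERS_PER_CELL)


def make_ensemble() -> List[Cell]:
    """All cells of one ensemble, flattened: 144 tiles x 4 cells."""
    return [Cell() for _ in range(CELLS_PER_TILE * TILES_PER_ENSEMBLE)]


def broadcast_add(ensemble: List[Cell], rd: int, rs1: int, rs2: int) -> None:
    """SIMD step: every cell executes the same (hypothetical) instruction
    rd <- rs1 + rs2 on its own registers, truncated to 16 bits."""
    for cell in ensemble:
        cell.regs[rd] = (cell.regs[rs1] + cell.regs[rs2]) & WORD_MASK


# Usage: load different data into each cell, then broadcast one instruction.
ensemble = make_ensemble()
for index, cell in enumerate(ensemble):
    cell.regs[0] = index        # per-cell data differs...
    cell.regs[1] = 0xFFF0
broadcast_add(ensemble, rd=2, rs1=0, rs2=1)  # ...but the instruction is shared
```

The sequential loop is a faithful model of lockstep execution here only because each cell reads and writes its own registers; communication over the tile-local buses is not modeled.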
A node is a collection of hardware devices that together form a
self-contained computational building block. In the RRM, the node is a
tightly coupled design tuned to supply the ensemble's SIMD processors
with enough resources to sustain computation efficiently.
A node contains an ensemble, data and instruction memory, and I/O
and network interfaces, and is expected to be realized as a multichip
module.
A cluster consists of 64 or more nodes connected by a high-speed network,
and fits on a single board.
The Rewrite Rule Machine as a whole is a collection of clusters connected
by a network and sharing a common host, which runs a standard operating
system and handles user interaction.
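The counts implied by this hierarchy follow directly from the figures above. The sketch below multiplies them out, assuming one ensemble per node (as the node description implies) and using 64 nodes, the stated minimum, as the default cluster size; the function names are hypothetical.

```python
# Structural figures from the text; MIN_NODES_PER_CLUSTER is the stated lower bound.
CELLS_PER_TILE = 4
TILES_PER_ENSEMBLE = 144
ENSEMBLES_PER_NODE = 1        # a node contains one ensemble
MIN_NODES_PER_CLUSTER = 64
REGISTERS_PER_CELL = 16
BITS_PER_REGISTER = 16


def cells_per_node() -> int:
    """Cells in one node: cells/tile * tiles/ensemble * ensembles/node."""
    return CELLS_PER_TILE * TILES_PER_ENSEMBLE * ENSEMBLES_PER_NODE


def cells_per_cluster(nodes: int = MIN_NODES_PER_CLUSTER) -> int:
    """Cells in one cluster of the given size (the text says 64 or more nodes)."""
    if nodes < MIN_NODES_PER_CLUSTER:
        raise ValueError("the text describes clusters of 64 or more nodes")
    return nodes * cells_per_node()


def register_bytes_per_node() -> int:
    """Total register storage in one ensemble, in bytes."""
    return cells_per_node() * REGISTERS_PER_CELL * BITS_PER_REGISTER // 8


node_cells = cells_per_node()                    # 4 * 144 = 576
cluster_cells = cells_per_cluster()              # 64 * 576 = 36864
node_register_bytes = register_bytes_per_node()  # 576 * 16 * 2 = 18432
```

A minimum-size cluster therefore holds 36,864 cells, and each ensemble carries 18,432 bytes of register storage, which is the "active store" referred to earlier.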
We view an RRM system with a single cluster as an attractive symbolic
accelerator for applications such as event-driven simulation, image
processing, neural networks, artificial intelligence, and symbolic
computation in general.
Such a single-board system has a raw peak performance of 3.6 teraops
and, as explained in the paper "The Rewrite Rule Machine Node Architecture
and its Performance", is flexible enough to achieve very good performance
on a heterogeneous range of applications.
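Dividing the stated peak by the hardware counts gives the average rate each unit must sustain. This back-of-the-envelope sketch assumes the 3.6 teraops figure refers to a cluster of exactly 64 nodes (the stated minimum) with one ensemble of 576 cells per node; the derived per-node and per-cell rates are arithmetic, not figures reported in the source.

```python
PEAK_OPS_PER_SECOND = 3_600_000_000_000  # 3.6 teraops, the stated single-cluster peak
NODES = 64                               # assumption: exactly the stated minimum cluster size
CELLS_PER_NODE = 4 * 144                 # 4 cells per tile, 144 tiles, one ensemble per node

cells = NODES * CELLS_PER_NODE
ops_per_node = PEAK_OPS_PER_SECOND / NODES   # average peak rate per node
ops_per_cell = PEAK_OPS_PER_SECOND / cells   # average peak rate per cell

print(f"{cells} cells")
print(f"{ops_per_node / 1e9:.2f} GOPS per node")
print(f"{ops_per_cell / 1e6:.1f} MOPS per cell")
```

This prints 36864 cells, 56.25 GOPS per node, and 97.7 MOPS per cell. If each cell completed one operation per cycle, that would correspond to a clock on the order of 100 MHz; the text itself does not state a clock rate.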