A description of the Rewrite Rule Machine

The Rewrite Rule Machine (RRM) is a massively parallel Multiple Instruction, Multiple Data / Single Instruction, Multiple Data (MIMD/SIMD) computer being designed, simulated, and prototyped at SRI International. The RRM project is unique in that it emerged from an initial design search focused primarily on software issues. The outcome of this high-level design effort has been coupled with a bottom-up quantitative approach, resulting in an architecture which, while trying to balance complexity, performance, and cost in an optimal way, still follows the important guidelines of the initial theoretical work. Two main characteristics of the overall design are the use of the concurrent rewriting model of computation and the use of active memory. Our parallel programming paradigm diverges from the standard von Neumann model of computation, in which every execution step requires some interaction between the CPU and data memory. One way of describing the RRM architecture is to imagine a parallel system whose computational units are in its first-level caches. One can think of the SIMD processors as a self-modifiable, programmable active store, and of the data memory as conventional passive memory. This organization blurs the distinction between the computational agent and memory, and thus limits the negative effects of random memory access.
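As a rough illustration of the concurrent rewriting model mentioned above, the following Python sketch rewrites several non-overlapping subterms of an expression in a single step. It is a toy software model only, not the RRM's hardware mechanism: the term encoding, the Peano-addition rule set, and the names rewrite_root, concurrent_step, normalize, and num are all assumptions introduced for this example.

```python
# Terms are nested tuples: ("add", left, right), ("s", arg), or the constant "0".
# Hypothetical rule set (Peano addition), chosen only for illustration:
#   add(0, y)    -> y
#   add(s(x), y) -> s(add(x, y))

def rewrite_root(term):
    """Return the rewritten term if a rule matches at the root, else None."""
    if isinstance(term, tuple) and term[0] == "add":
        _, left, right = term
        if left == "0":
            return right
        if isinstance(left, tuple) and left[0] == "s":
            return ("s", ("add", left[1], right))
    return None

def concurrent_step(term):
    """Rewrite a set of non-overlapping redexes in one step.

    A subterm that is rewritten at its root is not descended into during
    the same step, so the rewrites never overlap.
    Returns (new_term, number_of_redexes_rewritten).
    """
    rewritten = rewrite_root(term)
    if rewritten is not None:
        return rewritten, 1
    if isinstance(term, tuple):
        results = [concurrent_step(child) for child in term[1:]]
        new_term = (term[0],) + tuple(t for t, _ in results)
        return new_term, sum(n for _, n in results)
    return term, 0

def normalize(term):
    """Apply concurrent steps until no rule matches anywhere in the term.

    Returns (normal_form, concurrent_steps, total_rewrites).
    """
    steps = 0
    total = 0
    while True:
        term, n = concurrent_step(term)
        if n == 0:
            return term, steps, total
        steps += 1
        total += n

def num(n):
    """Build the Peano numeral s(s(...s(0)...)) for a non-negative integer n."""
    term = "0"
    for _ in range(n):
        term = ("s", term)
    return term

# (1 + 1) + (2 + 1): the two inner additions are independent redexes.
expr = ("add", ("add", num(1), num(1)), ("add", num(2), num(1)))
result, steps, rewrites = normalize(expr)
print(result == num(5), steps, rewrites)
```

Because independent redexes are rewritten together, the number of concurrent steps is smaller than the total number of rule applications; a sequential rewriter would need one step per application.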

As shown in the figure above, the RRM is a 7-tiered hierarchical architecture. The most basic unit is a cell, a 16-bit processing element with 16 registers. Four cells, which share local communication buses, make up a tile, and 144 tiles operating in SIMD mode make up an ensemble, which is expected to fit on a single die. A node is a collection of hardware devices that constitutes a self-contained computational building block. In our case the node is a tightly coupled design tuned to supply the ensemble's SIMD processors with enough resources to sustain computation efficiently. A node contains an ensemble, data and instruction memory, and I/O and network interfaces, and is expected to be realized as a multichip module. A cluster consists of 64 or more nodes connected by a high-speed network and fitting on a single board. The Rewrite Rule Machine as a whole is a collection of clusters connected by a network and sharing a common host, which runs a standard operating system and handles user interaction. We view an RRM system with a single cluster as an attractive symbolic accelerator for applications such as event-driven simulation, image processing, neural networks, artificial intelligence, and symbolic computation in general. Such a single-board system has a raw peak performance of 3.6 teraops and, as explained in the paper "The Rewrite Rule Machine Node Architecture and its Performance", is flexible enough to achieve very good performance on a heterogeneous variety of applications.
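The cell counts implied by this hierarchy can be worked out directly. The short Python sketch below does the arithmetic under two stated assumptions that are not from the text: that a cluster has exactly 64 nodes (the stated minimum), and that the 3.6 teraops peak figure is spread evenly over all cells. The per-cell rate it derives is therefore only a back-of-the-envelope estimate.

```python
# Figures taken from the description above.
CELLS_PER_TILE = 4
TILES_PER_ENSEMBLE = 144
ENSEMBLES_PER_NODE = 1
PEAK_OPS_PER_SECOND = 3.6e12   # raw peak of a single-cluster system

# Assumption: the minimum cluster size of 64 nodes.
NODES_PER_CLUSTER = 64

cells_per_ensemble = CELLS_PER_TILE * TILES_PER_ENSEMBLE
cells_per_cluster = cells_per_ensemble * ENSEMBLES_PER_NODE * NODES_PER_CLUSTER

# Assumption: peak throughput is divided evenly among all cells.
ops_per_cell_per_second = PEAK_OPS_PER_SECOND / cells_per_cluster

print(cells_per_ensemble)                    # cells on one die
print(cells_per_cluster)                     # cells on one board
print(round(ops_per_cell_per_second / 1e6))  # millions of ops per second per cell
```

Under these assumptions an ensemble holds 576 cells, a 64-node cluster holds 36,864 cells, and the peak figure corresponds to a little under 100 million operations per second per cell.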