These ideas are derived from
"Efficient Instruction Level Simulation of Computers" by "R. M. Fujimoto and W. B. Campbell" in "Trans. of the Society for Computer Simulation, April 1988"

(Unfortunately, I do not think there is an electronic copy of this paper and, in my experience, it is hard to find. If you want me to fax you a copy, send me email.)

I make a sharp distinction between behavioral simulation and performance simulation. Behavioral simulation does not rely on a global view of virtual time to maintain coherent distributed event causality relations: the simulator correctly executes a given parallel application, which observes a synchronization model of choice, without any notion of virtual time. In parallel with the behavioral execution, at run time, we also perform a performance simulation that estimates a global virtual time in which the behavioral execution could have been carried out. These ideas have been applied to the simulation of the RRM.
In the technical report "Execution-driven Distributed Simulation of Parallel Architectures" and in the paper "A Technique for the Distributed Simulation of Parallel Computers", we describe the simulation technique in detail and give some simulation performance results.
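
To make the distinction concrete, here is a minimal sketch in Python. It is not the technique from the report or the paper above; it is a generic illustration that swaps in plain timestamp propagation (each message carries its estimated arrival time, and a receive advances the local clock to at least that time). The names `Node`, `compute`, `send`, `receive`, and `estimated_virtual_time`, as well as the cycle counts and latency, are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One simulated processor (hypothetical): behavioral state plus a local clock."""
    name: str
    clock: int = 0                      # estimated local virtual time, in cycles
    inbox: list = field(default_factory=list)

    def compute(self, cycles):
        # Behavioral side: the real work would run natively here.
        # Performance side: only its estimated cost is added to the local clock.
        self.clock += cycles

    def send(self, dest, payload, latency):
        # The message is delivered behaviorally right away; it also carries
        # the estimated virtual time at which it would have arrived.
        dest.inbox.append((payload, self.clock + latency))

    def receive(self):
        # Behavioral side: take the next message in FIFO order.
        # Performance side: the receive cannot finish before the message arrives.
        payload, arrival = self.inbox.pop(0)
        self.clock = max(self.clock, arrival)
        return payload


def estimated_virtual_time(nodes):
    # Global estimate: the latest local clock among all nodes.
    return max(node.clock for node in nodes)


def run_example():
    a, b = Node("a"), Node("b")
    a.compute(100)                  # a does 100 cycles of work
    a.send(b, "x", latency=10)      # message would arrive at time 110
    b.compute(30)                   # b is only at time 30
    msg = b.receive()               # b waits (in virtual time) until 110
    b.compute(5)                    # b finishes at 115
    return msg, a.clock, b.clock, estimated_virtual_time([a, b])
```

In this sketch the application's result (`msg`) never depends on the clocks, which is the behavioral side; the clocks are computed alongside it and only yield an estimate of when the execution could have finished.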