Abstracts for publications related to SimICS

Performance Debugging and Tuning using an Instruction-Set Simulator

SICS Technical Report T93:05

Peter S. Magnusson and Johan Montelius

{psm,jm}@sics.se

June 1008

Instruction-set simulators allow programmers a detailed level of insight into, and control over, the execution of a program, including parallel programs and operating systems. In principle, instruction-set simulation can model any target computer and gather any statistic. Furthermore, such simulators are usually portable, independent of compiler tools, and deterministic, allowing bugs to be recreated and measurements repeated. Though often viewed as too slow for use as a general programming tool, their performance has improved considerably in the last several years.

We describe SimICS, an instruction-set simulator of SPARC-based multiprocessors developed at SICS, in its rôle as a general programming tool. We discuss some of the benefits of using a tool such as SimICS to support various tasks in software engineering, including debugging, testing, analysis, and performance tuning. We present in some detail two test cases in which we used SimICS to support analysis and performance tuning of two applications, Penny and EQNTOTT. This work resulted in improved parallelism in, and understanding of, Penny, as well as a performance improvement for EQNTOTT of over an order of magnitude. We also present some early work on analyzing SPARC/Linux, demonstrating the ability of tools like SimICS to analyze operating systems.

Keywords: instruction set simulation, profiling, software engineering, performance debugging, SimICS


Efficient Memory Simulation in SimICS

Published in the 28th Annual Simulation Symposium

Peter S. Magnusson and Bengt Werner

{psm,werner}@sics.se

April 1995

We describe novel techniques used for efficient simulation of memory in SimICS, an instruction level simulator developed at SICS. The design has focused on efficiently supporting the simulation of multiprocessors, analyzing complex memory hierarchies and running large binaries with a mixture of system-level and user-level code.

A software caching mechanism (the Simulator Translation Cache, STC) improves the performance of interpreted memory operations by reducing the number of calls to complex memory simulation code. Major data structures are allocated lazily to reduce the size of the simulator process. A well-defined internal interface to generic memory simulation simplifies user extensions. Building on a flexible interpreter based on threaded code, the design allows runtime selection of statistics gathering, memory profiling, and cache simulation with low overhead.
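The caching idea described above can be illustrated with a minimal sketch. It is not the STC as implemented in SimICS: the class names, the 4 KiB page size, and the direct-mapped layout are all assumptions made for illustration. It only shows how a small cache in front of an expensive memory model keeps most accesses off the slow path, and how pages can be allocated on first touch.

```python
PAGE_BITS = 12                      # assumed 4 KiB pages
PAGE_SIZE = 1 << PAGE_BITS
PAGE_MASK = PAGE_SIZE - 1


class SlowMemory:
    """Hypothetical stand-in for the full memory simulation."""

    def __init__(self):
        self.pages = {}             # page number -> bytearray, created lazily
        self.slow_calls = 0         # how often the expensive path ran

    def lookup_page(self, vaddr):
        # A real simulator would model the MMU, caches, and so on here.
        self.slow_calls += 1
        page_no = vaddr >> PAGE_BITS
        if page_no not in self.pages:
            self.pages[page_no] = bytearray(PAGE_SIZE)   # lazy allocation
        return self.pages[page_no]


class TranslationCache:
    """Small direct-mapped cache from page number to host page (assumed layout)."""

    def __init__(self, memory, entries=64):
        self.memory = memory
        self.entries = entries
        self.tags = [None] * entries
        self.data = [None] * entries

    def _page(self, vaddr):
        page_no = vaddr >> PAGE_BITS
        slot = page_no % self.entries
        if self.tags[slot] != page_no:          # miss: take the slow path once
            self.data[slot] = self.memory.lookup_page(vaddr)
            self.tags[slot] = page_no
        return self.data[slot]

    def load_byte(self, vaddr):
        return self._page(vaddr)[vaddr & PAGE_MASK]

    def store_byte(self, vaddr, value):
        self._page(vaddr)[vaddr & PAGE_MASK] = value & 0xFF
```

In this sketch, repeated accesses to the same page call `lookup_page` only once, and memory for a page exists only after the page has been touched.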

The result is a memory simulation scheme that supports a range of features for use in computer architecture research, program profiling, and debugging.

Keywords: interpreter, simulator, multiprocessor, SimICS, memory simulation, memory hierarchy, cache simulation


System Level Interpretation of the SPARC V8 Instruction Set Architecture

SICS Research Report R94:23

David Samuelsson

davids@sics.se

August, 1994

An implementation of a system level interpreter of the SPARC V8 instruction set architecture is described. The goal is for the simulator, SimICS, to be sufficiently accurate to run an operating system on top of it. The simulation is performed by direct threaded interpretation of an intermediate code.

Condition codes are simulated quickly, and all combinations of condition codes are handled. They are evaluated lazily, so unnecessary computations are avoided. Access to registers in a register window is as efficient as in a flat register file. To optimize instructions, specialized variants that can be executed faster are identified.
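Lazy evaluation of condition codes can be sketched as follows. This is an illustrative assumption about the general technique, not the scheme used in the report: the class `LazyFlags` and its methods are hypothetical names. Each flag-setting operation only records its operands and result; the N, Z, V, and C bits are derived when something actually reads them.

```python
MASK32 = 0xFFFFFFFF


class LazyFlags:
    """Records the last flag-setting operation; derives N, Z, V, C on demand."""

    def __init__(self):
        self.op, self.a, self.b, self.result = "add", 0, 0, 0
        self.evaluations = 0        # counts how often flags were really computed

    def record(self, op, a, b):
        # Cheap path taken by every flag-setting add/subtract.
        a &= MASK32
        b &= MASK32
        result = (a + b) & MASK32 if op == "add" else (a - b) & MASK32
        self.op, self.a, self.b, self.result = op, a, b, result
        return result

    def flags(self):
        # Expensive path, taken only when a consumer reads the flags.
        self.evaluations += 1
        a, b, r = self.a, self.b, self.result
        n = r >> 31
        z = int(r == 0)
        if self.op == "add":
            c = int(a + b > MASK32)
            v = ((a ^ r) & (b ^ r)) >> 31    # operands agree in sign, result differs
        else:
            c = int(a < b)                   # borrow out of an unsigned subtract
            v = ((a ^ b) & (a ^ r)) >> 31    # operands differ in sign, result flips
        return {"N": n, "Z": z, "V": v, "C": c}
```

If several flag-setting instructions execute before a conditional branch, only the last one ever has its flags computed in this sketch; the earlier ones cost no more than their arithmetic.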

SimICS is tested using a comprehensive test suite. The suite exercises the instruction set using interesting combinations of input parameters and operands and compares the result to a reference implementation. A validation of the results is performed with SPEC benchmarks. The result is a stable and correct system level interpreter of SPARC Architecture Version 8 that runs 15 times slower than the real hardware.

Keywords: SPARC. Interpretation. Simulation. Emulation. Condition codes. Register windows. SimICS.


A Compact Intermediate Format for SimICS

SICS Research Report R94:17

Peter S. Magnusson and David Samuelsson

{psm,davids}@sics.se

September 1994

Instruction set architecture (ISA) simulators are an increasingly popular class of tools for both research and commercial purposes. Common applications include trace generation, program development, and compatibility support. A major concern with ISA simulators is performance and memory overhead. A common technique for achieving good performance is to use threaded code, which involves translating the target object code to an intermediate format which is subsequently interpreted. We describe such an internal format, which we call the 64-bit format, that is compact and meets a range of requirements in terms of flexibility and simplicity. We show how a simulator using this format can be implemented efficiently by taking advantage of extensions to the C language supported by the GNU C compilers. We have used the format to write the core interpreter in SimICS, a system level multiprocessor simulator that supports the Motorola 88110 and the SPARC V8 instruction sets.
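The translate-then-interpret idea behind threaded code can be sketched in a few lines. The sketch below is not the 64-bit format from the report and does not use the GNU C extensions it mentions; the toy instruction set, the handler names, and the tuple layout are assumptions for illustration. Python function references stand in for the pre-resolved dispatch targets that a C implementation would store.

```python
def op_li(regs, rd, imm, _unused):
    regs[rd] = imm                       # load immediate

def op_add(regs, rd, rs1, rs2):
    regs[rd] = regs[rs1] + regs[rs2]

def op_sub(regs, rd, rs1, rs2):
    regs[rd] = regs[rs1] - regs[rs2]

# Hypothetical toy instruction set: mnemonic -> service routine.
HANDLERS = {"li": op_li, "add": op_add, "sub": op_sub}


def translate(program):
    """Decode once: replace each mnemonic with a direct handler reference."""
    return [(HANDLERS[name], a, b, c) for name, a, b, c in program]


def run(intermediate, nregs=8):
    """Interpret the pre-decoded form; no opcode decoding happens here."""
    regs = [0] * nregs
    for handler, a, b, c in intermediate:
        handler(regs, a, b, c)
    return regs


program = [("li", 1, 40, 0), ("li", 2, 2, 0), ("add", 3, 1, 2), ("sub", 4, 3, 2)]
registers = run(translate(program))      # registers[3] holds 40 + 2
```

The cost of decoding is paid once in `translate`, so code that runs many times is dispatched through the stored handler reference alone. The sketch handles straight-line code only; branches would need a program counter instead of a simple loop.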

Keywords: Intermediate Representation. Interpreter. Simulator. Instruction Set. Architecture. SPARC. m88110. C. GCC.


Some Efficient Techniques for Simulating Memory

SICS Research Report R94:16

Peter S. Magnusson and Bengt Werner

{psm,werner}@sics.se

August 1994

We describe novel techniques used for efficient simulation of memory in SimICS, an instruction level simulator developed at SICS. The design has focused on efficiently supporting the simulation of multiprocessors, analyzing complex memory hierarchies and running large binaries with a mixture of system-level and user-level code.

A software caching mechanism (the Simulator Translation Cache, STC) improves the performance of interpreted memory operations by reducing the number of calls to complex memory simulation code. A lazy memory allocation scheme reduces the size of the simulator process. A well-defined internal interface to generic memory simulation simplifies user extensions. Building on a flexible interpreter based on threaded code, the design allows runtime selection of statistics gathering, memory profiling, and cache simulation with low overhead.

The result is a memory simulation that supports a range of features for use in computer architecture research, program profiling, and debugging.

Keywords: Interpreter. Simulator. Multiprocessor. SimICS. Memory Simulation. Memory Hierarchy. Cache Simulation.


Partial Translation

SICS Technical Report T93:05

Peter S. Magnusson

psm@sics.se

October 1993

Traditional simulation of a target architecture by interpreting object code can be improved by translating the object code to an intermediate format. This approach is called interpretive translation. Despite a substantial performance improvement over traditional interpretation, a large part of the overhead is unnecessary. An alternative approach is block translation, where one or more simulated instructions are translated to directly executable code. This approach has several drawbacks.

We discuss the problems with block translation, analyse the overhead of interpretive translation, and describe a hybrid approach, partial translation, that combines the benefits of both approaches. Partial translation implements an intermediate format that supports the addition of run-time generated code whenever appropriate. The performance limit (slowdown) of interpretive translation is around 15, and real implementations have achieved 20-30. Partial translation will perform considerably better. Finally, we present results from an aggressive implementation of interpretive translation, and results from a proof-of-concept implementation of partial translation.
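The hybrid described above can be illustrated with a toy sketch, under heavy assumptions: the instruction set, the entry layout, and the helper names (`generate_block`, `patch`) are all hypothetical, and Python source generated with `exec` stands in for run-time generated machine code. Each intermediate entry carries a handler and the number of target instructions it covers, so a generated block can be patched over the first entry of a sequence while the remaining entries stay interpretable.

```python
def op_li(regs, rd, imm, _unused):
    regs[rd] = imm

def op_add(regs, rd, rs1, rs2):
    regs[rd] = regs[rs1] + regs[rs2]

HANDLERS = {"li": op_li, "add": op_add}
# Source templates used by the toy code generator (assumed, for illustration).
TEMPLATES = {"li": "regs[{0}] = {1}", "add": "regs[{0}] = regs[{1}] + regs[{2}]"}


def translate(program):
    # Entry layout: (handler, operand, operand, operand, instructions covered).
    return [(HANDLERS[name], a, b, c, 1) for name, a, b, c in program]


def generate_block(program, start, end):
    """Generate one host function covering program[start:end]."""
    assert end > start, "a block must cover at least one instruction"
    lines = ["    " + TEMPLATES[name].format(a, b, c)
             for name, a, b, c in program[start:end]]
    namespace = {}
    exec("def block(regs, *_operands):\n" + "\n".join(lines), namespace)
    return namespace["block"]


def patch(code, program, start, end):
    # Only the first entry changes; entries start+1..end-1 remain valid.
    code[start] = (generate_block(program, start, end), 0, 0, 0, end - start)


def run(code, nregs=8):
    regs = [0] * nregs
    pc = 0
    while pc < len(code):
        handler, a, b, c, length = code[pc]
        handler(regs, a, b, c)
        pc += length                 # a generated block skips what it covered
    return regs
```

Running the same program before and after `patch` gives the same register contents in this sketch; the patched version simply dispatches once for the whole block instead of once per instruction.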

Keywords: Partial translation. Simulator. Interpreter.


A Design for Efficient Simulation of a Multiprocessor

International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS)

Peter S. Magnusson

psm@sics.se

January 1993

Instruction-level simulators, also called register-transfer level simulators, are a crucial component in developing and analyzing computer architectures and system software.

Simulating a multiprocessor presents some special problems, notably code expansion and efficient time slicing of processors. Also, modern processors have aggravated the memory bottleneck, and the internal formats used by a simulator must be compact.

This paper presents a design for a unit-delay simulator for a shared-memory multiprocessor that goes a long way toward meeting these requirements.

The simulator interprets at system level, i.e., it faithfully reproduces the interfaces of the principal devices.

Previous work in the area is discussed.


Efficient Simulation of Parallel Hardware

MSc Thesis in Computer Science, Royal Institute of Technology, Sweden

Peter S. Magnusson

psm@sics.se

March 1992

Instruction-level simulators, also called register level simulators, are a crucial component in developing and analyzing computer architectures and system software. This thesis describes the essential components of a simulator for a Tadpole multi-processor, a Motorola 88000 RISC-based computer. The simulator is sufficiently accurate to boot the monitor program, and runs approximately 30 times slower than the real machine. Possible extensions to reduce this slowdown, and some of the issues that will arise when the simulator is extended to simulate a shared-memory multiprocessor, are discussed.

The thesis of the author is, in part, to demonstrate for a particular parallel architecture that several of the uses of a simulator can be partially or fully satisfied in a single program. In other words, functionality need not compromise efficiency to the extent previously supposed.

A discussion and critique of previous work in the area is presented. Efficient ways of simulating MC88100 instructions and of representing them internally are dealt with in some detail.

Keywords: Interpreters. Simulation. Emulation. Instruction set design. Virtual machines. Register transfer level simulation. Processors. Computer Architecture. MC88000. Multiprocessors.