Accurate Prediction of Program Execution Times
with a Flexible Memory-Hierarchy Simulator

ZIB-Logo

Microbenchmarks determine architectural data

Latencies of an Alpha processor with 600 MHz

Latencies of an Alpha processor with 600 MHz

To measure latency timings of different memory levels, contiguous memory areas of different sizes can be used. Accesses to these memory areas with different strides give information about the latencies, capacities and block sizes of the different memory levels.

The determined data are used to configure a memory hierarchy simulator.

Simulation of memory hierarchies

The cache simulator calculates the number of accesses to the different memory levels which is needed to estimate the actual execution time.


Memory hierarchy
The results of the simulation are used for the LDA-Model.

Latency-of-Data-Access Model (LDA-Model)

The Latency-of-Data-Access Model determines program execution time by adding the latencies of the different memory levels.

Execution times are calculated with the LDA-Model.

Program Execution Times

The simulation of an unblocked matrix-matrix multiply with the memory hierarchy of an RS6000 shows a good correspondence of the calculated results with the measured execution. The lower chart illustrates that the accesses to matrix B are dominating the whole execution time for bigger matrices.

Comparision between measured and simulated execution times

Florian Schintke

More Details on the LDA-Simulator

General

Cost models are used to determine the execution time of programs, to compare the efficiency of algorithms, and to analyse the behaviour of memory hierarchies. The Latency-of-Data-Access (LDA) model that takes into account multiple hierarchical memory levels with different latencies, is a newly proposed, innovative cost model. In this diploma-thesis, a simulator for memory hierarchies is presented that allows the calculation of execution times using the LDA-model. The simulator is used to prove the claim that the execution time of a program can be accurately estimated with the LDA-model. With the simulator, it is for the first time possible to determine the execution time of programs with this model in a practical way. The simulator can be configured for systems with various cache architectures and supports shared memory (SMP) multiprocessor systems with different cache coherence protocols. The simulator can be configured with the results from microbenchmarks which measure the architectural properties of a memory hierarchy.

The results confirm the claim not only for single processor systems, but also for SMP systems, where concurrently interacting processors influence each others cache access sequence. Additionally, a new field of usage of the LDA-model was developed to determine execution times of program parts. Single access costs can be assigned to one of several parallel running models. As an example the costs of accesses to different memory areas can be split and determined separately. This profiling technique allows to optimise data structures and memory access patterns of sequential and parallel SMP programs by precise production of information.

This simulator is written in C++ and is published as open source under the GNU General Public License.

Papers

Florian Schintke, Jens Simon, Alexander Reinefeld:
A Cache Simulator for Shared Memory Systems
In Computational Science - ICCS 2001, volume 2074 of Lecture Notes in Computer Science, pages 569-578. Springer 2001.
Abstract and BibTeX entry, © Springer-Verlag, Complete paper, Talk.

Florian Schintke (Diploma-Thesis):
Ermittlung von Programmlaufzeiten anhand von Speicherzugriffen, Microbenchmarks und Simulation von Speicherhierarchien, Technical Report ZR-00-33, Konrad-Zuse-Zentrum für Informationstechnik Berlin (ZIB), 2000.
URL: http://www.zib.de/PaperWeb/abstracts/ZR-00-33/

Jens Simon, Jens-Michael Wierum:
The Latency-of-Data-Access Model for Analyzing Parallel Computation, IPL - Information Processing Letters, Juni 1998.

Downloads

ldasim-1.0.tar.gz (version used for the simulations in the diploma thesis)
ldasim-1.2.tar.gz (prepared for gcc 3.0 and Intel C++ Compiler 5.0.1)

Related Work

You can find other simulators on this page.

Contact

Florian Schintke