Accurate Prediction of Program Execution Times
|
|
Microbenchmarks determine architectural dataLatencies of an Alpha processor with 600 MHz
To measure latency timings of different memory levels, contiguous memory areas of different sizes can be used. Accesses to these memory areas with different strides give information about the latencies, capacities and block sizes of the different memory levels.
Simulation of memory hierarchiesThe cache simulator calculates the number of accesses to the different memory levels which is needed to estimate the actual execution time.
|
Latency-of-Data-Access Model (LDA-Model)The Latency-of-Data-Access Model determines program execution time by adding the latencies of the different memory levels.
Program Execution TimesThe simulation of an unblocked matrix-matrix multiply with the memory hierarchy of an RS6000 shows a good correspondence of the calculated results with the measured execution. The lower chart illustrates that the accesses to matrix B are dominating the whole execution time for bigger matrices.
|
More Details on the LDA-Simulator |
GeneralCost models are used to determine the execution time of programs, to compare the efficiency of algorithms, and to analyse the behaviour of memory hierarchies. The Latency-of-Data-Access (LDA) model that takes into account multiple hierarchical memory levels with different latencies, is a newly proposed, innovative cost model. In this diploma-thesis, a simulator for memory hierarchies is presented that allows the calculation of execution times using the LDA-model. The simulator is used to prove the claim that the execution time of a program can be accurately estimated with the LDA-model. With the simulator, it is for the first time possible to determine the execution time of programs with this model in a practical way. The simulator can be configured for systems with various cache architectures and supports shared memory (SMP) multiprocessor systems with different cache coherence protocols. The simulator can be configured with the results from microbenchmarks which measure the architectural properties of a memory hierarchy. The results confirm the claim not only for single processor systems, but also for SMP systems, where concurrently interacting processors influence each others cache access sequence. Additionally, a new field of usage of the LDA-model was developed to determine execution times of program parts. Single access costs can be assigned to one of several parallel running models. As an example the costs of accesses to different memory areas can be split and determined separately. This profiling technique allows to optimise data structures and memory access patterns of sequential and parallel SMP programs by precise production of information. This simulator is written in C++ and is published as open source under the GNU General Public License. |
Papers
Downloadsldasim-1.0.tar.gz (version used for the simulations in the diploma thesis)ldasim-1.2.tar.gz (prepared for gcc 3.0 and Intel C++ Compiler 5.0.1) Related WorkYou can find other simulators on this page.ContactFlorian Schintke |