Serviceeinschränkungen vom 12.-22.02.2026 - weitere Infos auf der UB-Homepage

Treffer: Online optimizations driven by hardware performance monitoring

Title:
Online optimizations driven by hardware performance monitoring
Contributors:
The Pennsylvania State University CiteSeerX Archives
Publication Year:
2007
Collection:
CiteSeerX
Document Type:
Fachzeitschrift text
File Description:
application/pdf
Language:
English
Rights:
Metadata may be used without restrictions as long as the oai identifier remains attached to it.
Accession Number:
edsbas.2D6B93F8
Database:
BASE

Weitere Informationen

Hardware performance monitors provide detailed direct feedback about application behavior and are an additional source of information that a compiler may use for optimization. A JIT compiler is in a good position to make use of such information because it is running on the same platform as the user applications. As hardware platforms become more and more complex, it becomes more and more difficult to model their behavior. Profile information that captures general program properties (like execution frequency of methods or basic blocks) may be useful, but does not capture sufficient information about the execution platform. Machine-level performance data obtained from a hardware performance monitor can not only direct the compiler to those parts of the program that deserve its attention but also determine if an optimization step actually improved the performance of the application. This paper presents an infrastructure based on a dynamic compiler+runtime environment for Java that incorporates machine-level information as an additional kind of feedback for the compiler and runtime environment. The low-overhead monitoring system provides fine-grained performance data that can be tracked back to individual Java bytecode instructions. As an example, the paper presents results for object co-allocation in a generational garbage collector that optimizes spatial locality of objects on-line using measurements about cache misses. In the best case, the execution time is reduced by 14 % and L1 cache misses by 28%.