Result: Optimizing software cache performance of packet processing applications

Title:

Optimizing software cache performance of packet processing applications

Authors:

QIN WANG, JUNPU CHEN, WEIHUA ZHANG, MIN YANG, BINYU ZANG

Source:

Proceedings of the 2007 ACM SIGPLAN-SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES 2007), San Diego, California, June 13-15, 2007. ACM SIGPLAN Notices. 42(7):227-235

Publisher Information:

Broadway, NY: ACM, 2007.

Publication Year:

2007

Physical Description:

print, 36 ref

Original Material:

INIST-CNRS

Subject Terms:

Computer science, Informatique, Sciences exactes et technologie, Exact sciences and technology, Sciences appliquees, Applied sciences, Informatique; automatique theorique; systemes, Computer science; control theory; systems, Logiciel, Software, Langages de programmation, Programming languages, Systèmes informatiques et systèmes répartis. Interface utilisateur, Computer systems and distributed systems. User interface, Génie logiciel, Software engineering, Accès mémoire, Storage access, Acceso memoria, Antémémoire, Cache memory, Antememoria, Code binaire, Binary code, Código binario, Gestion logiciel, Software management, Gestión logicial, Globalement asynchrone localement synchrone, Globally asynchronous locally synchronous, Globalmente asincrono localmente sincrono, Goulot étranglement, Bottleneck, Gollete estrangulamiento, Haute performance, High performance, Alto rendimiento, Langage programmation, Programming language, Lenguaje programación, Localisation, Localization, Localización, Multitâche, Multithread, Multitarea, Méthode adaptative, Adaptive method, Método adaptativo, Méthode chemin critique, Critical path method, Método camino crítico, Optimisation, Optimization, Optimización, Redondance, Redundancy, Redundancia, Retard, Delay, Retraso, Algorithms, Network Processor, Performance, Optimization, local memory

Document Type:

Conference Paper

File Description:

text

Language:

English

Author Affiliations:

Parallel Processing Institute, Fudan University, Shanghai, China

ISSN:

1523-2867

Access URL:

http://pascal-francis.inist.fr/vibad/index.php?action=search&terms=19154371

Rights:

Copyright 2007 INIST-CNRS
CC BY 4.0
Unless otherwise stated above, the content of this bibliographic record may be used under a CC BY 4.0 licence by Inist-CNRS

Notes:

Computer science; control theory; systems

Accession Number:

edscal.19154371

Database:

PASCAL Archive

Further Information

Network processors (NPs) are widely used in many types of networking equipment because of their high performance and flexibility. Because of chip area, cost, and power constraints, most NPs use a software cache instead of a hardware cache. Programmers must therefore take full responsibility for software cache management, which is neither intuitive nor easy for most of them. Without effective use of the software cache, long memory access latency becomes a critical limiting factor for overall application performance. Prior techniques such as hardware multithreading, wide-word accesses, and packet access combination for caching have been applied to help programmers overcome this bottleneck. However, most of them do not fully exploit the characteristics of packet processing applications and often perform only intraprocedural optimizations. As a result, for some applications the binary code generated by those techniques performs worse than hand-tuned assembly. In this paper, we propose an algorithm comprising two techniques, Critical Path Based Analysis (CPBA) and Global Adaptive Localization (GAL), to optimize the software cache performance of packet processing applications. Packet processing applications usually have several hot paths, and CPBA tries to insert localization instructions according to their execution frequencies. For further optimization, GAL eliminates some redundant localization instructions through interprocedural analysis and optimization. We apply our algorithm to several representative applications. Experimental results show an average speedup by a factor of 1.974.
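The two ideas named in the abstract can be illustrated with a toy model. The sketch below is not the authors' algorithm, which this record does not describe in detail. It only assumes that (a) each hot path has a known execution frequency and a list of packet fields it reads, (b) local memory holds a fixed number of fields, and (c) a field localized by a caller is still local in its callees. All function names, field names, and frequencies are hypothetical.

```python
from collections import Counter


def choose_localized(paths, slots):
    """Pick which fields to copy into local memory (frequency-driven, CPBA-like idea).

    paths: iterable of (frequency, [field, ...]); a field may repeat on a path.
    slots: assumed number of fields that fit in local memory.
    """
    score = Counter()
    for frequency, accesses in paths:
        for field in accesses:
            # Each access on a path is weighted by how often that path runs.
            score[field] += frequency
    # Highest weighted access count first; ties broken by name for determinism.
    ranked = sorted(score, key=lambda f: (-score[f], f))
    return ranked[:slots]


def drop_redundant(call_chain):
    """Remove localizations already done earlier in a call chain (GAL-like idea).

    call_chain: list of (function_name, [fields it would localize]), in call order.
    """
    already_local = set()
    pruned = []
    for func, fields in call_chain:
        needed = [f for f in fields if f not in already_local]
        pruned.append((func, needed))
        already_local.update(needed)
    return pruned


# Hypothetical hot paths: (execution frequency, fields accessed).
paths = [
    (0.7, ["ip.ttl", "ip.dst", "ip.ttl"]),
    (0.2, ["ip.dst", "tcp.port"]),
    (0.1, ["arp.op"]),
]
chosen = choose_localized(paths, slots=2)

# Hypothetical call chain in which both functions would localize "ip.dst".
chain = [("rx", ["ip.dst"]), ("route", ["ip.dst", "ip.ttl"])]
pruned = drop_redundant(chain)
```

In this toy model, fields read on the most frequently executed paths are localized first, and a callee skips any field its caller has already localized. A real compiler pass would also need to account for copy cost, writes, and control flow, none of which are modeled here.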
