Treffer: Minimal Unroll Factor for Code Generation of Software Pipelining

Title:
Minimal Unroll Factor for Code Generation of Software Pipelining
Contributors:
Architectures, Languages and Compilers to Harness the End of Moore Years (ALCHEMY), Laboratoire de Recherche en Informatique (LRI), Université Paris-Sud - Paris 11 (UP11)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Centre Inria de Saclay, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Models and methods of analysis and optimization for systems with real-time and embedding constraints (AOSTE), Centre Inria d'Université Côte d'Azur, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Inria Paris-Rocquencourt, Institut National de Recherche en Informatique et en Automatique (Inria)-COMmunications, Réseaux, systèmes Embarqués et Distribués (Laboratoire I3S - COMRED), Laboratoire d'Informatique, Signaux, et Systèmes de Sophia Antipolis (I3S), Université Nice Sophia Antipolis (1965 - 2019) (UNS)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UniCA)-Université Nice Sophia Antipolis (1965 - 2019) (UNS)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UniCA)-Laboratoire d'Informatique, Signaux, et Systèmes de Sophia Antipolis (I3S), Université Nice Sophia Antipolis (1965 - 2019) (UNS)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UniCA)-Université Nice Sophia Antipolis (1965 - 2019) (UNS)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UniCA), Trinity College Dublin, Parallélisme de Kahn Synchrone (Parkas), Département d'informatique - ENS-PSL (DI-ENS), École normale supérieure - Paris (ENS-PSL), Université Paris Sciences et Lettres (PSL)-Université Paris Sciences et Lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS-PSL), Université Paris Sciences et Lettres (PSL)-Université Paris Sciences et Lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria Paris-Rocquencourt, Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), DIGITEO (contrat numéro 2007-10D).
Source:
International Journal of Parallel Programming, 2012, ⟨10.1007/s10766-012-0203-z⟩
Publisher Information:
CCSD; Springer Verlag, 2012.
Publication Year:
2012
Collection:
collection:UNICE
collection:ENS-PARIS
collection:EC-PARIS
collection:CNRS
collection:INRIA
collection:UNIV-PSUD
collection:INRIA-SOPHIA
collection:INRIA-ROCQ
collection:INRIA-SACLAY
collection:I3S
collection:INRIASO
collection:INRIA_TEST
collection:AOSTE
collection:TESTALAIN1
collection:UMR8623
collection:INRIA2
collection:PSL
collection:UNIV-PARIS-SACLAY
collection:UNIV-PSUD-SACLAY
collection:INRIA-PSL
collection:UNIV-COTEDAZUR
collection:ENS-PSL
collection:DIENS
collection:TEST-NICE
Original Identifier:
HAL: hal-00764521
Document Type:
Zeitschrift article<br />Journal articles
Language:
English
ISSN:
0885-7458
1573-7640
Relation:
info:eu-repo/semantics/altIdentifier/doi/10.1007/s10766-012-0203-z
DOI:
10.1007/s10766-012-0203-z
Rights:
info:eu-repo/semantics/OpenAccess
Accession Number:
edshal.hal.00764521v1
Database:
HAL

Weitere Informationen

We address the problem of generating compact code from software pipelined loops. Although software pipelining is a powerful technique to extract fine- grain parallelism, it generates lifetime intervals spanning multiple loop iterations. These intervals require periodic register allocation (also called variable expansion), which in turn yields a code generation challenge. We are looking for the minimal unrolling factor enabling the periodic register allocation of software pipelined kernels. This challenge is generally addressed through one of: (1) hardware support in the form of rotating register files, which solve the unrolling problem but are expensive in hard- ware; (2) register renaming by inserting register moves, which increase the number of operations in the loop, and may damage the schedule of the software pipeline and reduce throughput; (3) post-pass loop unrolling that does not compromise throughput but often leads to impractical code growth. The latter approach relies on the proof that MAXLIVE registers (maximal number of values simultaneously alive) are sufficient for periodic register allocation (Eisenbeis et al. in PACT '95: Proceedings of the IFIP WG10.3 working conference on Parallel Architectures and Compilation Techniques, pages 264-267, Manchester, UK, 1995; Hendren et al. in CC '92: Proceedings of the 4th International Conference on Compiler Construction, pages 176-191, London, UK, 1992). However, the best existing heuristic for controlling this code growth--modulo variable expansion (Lam in SIGPLAN Not 23(7):318-328, 1988)--may not apply the correct amount of loop unrolling to guarantee that MAXLIVE registers are enough, which may result in register spills Eisenbeis et al. in PACT '95: Proceedings of the IFIP WG10.3 working conference on Parallel Architectures and Compilation Techniques, pages 264-267, Manchester, UK, 1995. This paper presents our research results on the open problem of minimal loop unrolling, allowing a software-only code generation M. Bachir * S.-A.-A. Touati (B) * F. Brault * D. Gregg * A. Cohen University of Nice Sophia-Antipolis, Nice, France e-mail: Sid.Touati@inria.fr 123  that does not trade the optimality of the initiation interval ( I I ) for the compactness of the generated code. Our novel idea is to use the remaining free registers after periodic register allocation to relax the constraints on register reuse. The problem of minimal loop unrolling arises either before or after software pipelining, either with a single or with multiple register types (classes). We provide a formal problem definition for each scenario, and we propose and study a dedicated algorithm for each problem. Our solu- tions are implemented within an industrial-strength compiler for a VLIW embedded processor from STMicroelectronics, and validated on multiple benchmarks suites.