Title:
Near-Optimal Placement of MPI processes on Hierarchical NUMA Architectures
Authors:
Contributors:
Efficient runtime systems for parallel architectures (RUNTIME), Centre Inria de l'Université de Bordeaux, Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS); Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS); Pasqua D'Ambra, Mario Rosario Guarracino and Domenico Talia (eds.); Plafrim
Source:
Euro-Par 2010, pp. 199-210
Publisher Information:
CCSD; Springer, 2010.
Publication Year:
2010
Collection:
collection:CNRS
collection:INRIA
collection:ENSEIRB
collection:INRIA-BORDEAUX
collection:LABRI
collection:UNIV-BORDEAUX
collection:INRIA_TEST
collection:TESTALAIN1
collection:INRIA2
collection:PLAFRIM
collection:UNIVERSITE-BORDEAUX
Document Type:
Conference
conferenceObject; Conference papers
Language:
English
Relation:
info:eu-repo/semantics/altIdentifier/doi/10.1007/978-3-642-15291-7_20
DOI:
10.1007/978-3-642-15291-7_20
Rights:
info:eu-repo/semantics/OpenAccess
Accession Number:
edshal.inria.00544346v1
Database:
HAL
Further Information
MPI process placement can play a decisive role in application performance. This is especially true of today's architectures (heterogeneous, multicore, with several levels of cache, etc.). In this paper, we describe a novel algorithm, TreeMatch, that maps processes to resources so as to reduce the communication cost of the whole application. We have implemented this algorithm and discuss its performance both in simulation and on the NAS benchmarks.
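The abstract does not describe how TreeMatch itself works. As a hedged illustration of the kind of objective a placement algorithm targets, the sketch below assumes a simple cost model: the communication volume between two processes times the number of tree levels separating their cores, summed over all process pairs. It then finds the cheapest one-process-per-core mapping by exhaustive search. This is not TreeMatch; the toy topology `ARITIES` and the functions `core_path`, `tree_distance`, `placement_cost` and `best_placement` are hypothetical names introduced only for this example.

```python
from itertools import permutations

# Hypothetical toy topology: a balanced tree given by its arity per level,
# here 2 NUMA nodes x 2 cores per node = 4 cores (leaves).
ARITIES = [2, 2]


def core_path(core, arities):
    """Root-to-leaf list of child indices identifying a core (leaf) id."""
    path = []
    for arity in reversed(arities):
        path.append(core % arity)
        core //= arity
    return path[::-1]


def tree_distance(c1, c2, arities):
    """Levels to climb from the leaves to the two cores' lowest common ancestor."""
    common = 0
    for x, y in zip(core_path(c1, arities), core_path(c2, arities)):
        if x != y:
            break
        common += 1
    return len(arities) - common


def placement_cost(comm, mapping, arities):
    """Assumed cost model: sum over process pairs of volume x tree distance.

    comm[i][j] is the communication volume between processes i and j;
    mapping[i] is the core assigned to process i.
    """
    n = len(comm)
    return sum(
        comm[i][j] * tree_distance(mapping[i], mapping[j], arities)
        for i in range(n)
        for j in range(i + 1, n)
    )


def best_placement(comm, arities):
    """Exhaustive baseline (not TreeMatch): tries every mapping of
    len(comm) processes onto as many cores, one process per core."""
    return min(
        permutations(range(len(comm))),
        key=lambda m: placement_cost(comm, m, arities),
    )


if __name__ == "__main__":
    # Processes 0<->2 and 1<->3 exchange most of the data.
    comm = [
        [0, 1, 10, 1],
        [1, 0, 1, 10],
        [10, 1, 0, 1],
        [1, 10, 1, 0],
    ]
    print(placement_cost(comm, (0, 1, 2, 3), ARITIES))  # prints 46
    best = best_placement(comm, ARITIES)
    print(best, placement_cost(comm, best, ARITIES))  # prints (0, 2, 1, 3) 28
```

In this toy case, keeping each heavily communicating pair inside one NUMA node lowers the cost from 46 to 28. Exhaustive search grows factorially with the number of processes, so it only serves as a reference point for very small inputs.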