Result: Multi-scale analysis of large distributed computing systems

Title:

Multi-scale analysis of large distributed computing systems

Authors:

Mello Schnorr, Lucas; Legrand, Arnaud; Vincent, Jean-Marc

Contributors:

Middleware efficiently scalable (MESCAL), Centre Inria de l'Université Grenoble Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)
Laboratoire d'Informatique de Grenoble (LIG), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)
ANR-08-SEGI-0022, USS-SimGrid, Simulation extrêmement extensible avec SimGrid (2008)

Source:

Proceedings of the third international workshop on Large-scale system and application performance, pp. 27-34

Publisher Information:

CCSD; ACM, 2011.

Publication Year:

2011

Collection:

collection:UGA
collection:CNRS
collection:INRIA
collection:UNIV-GRENOBLE1
collection:UNIV-PMF_GRENOBLE
collection:INPG
collection:INRIA-RHA
collection:LIG
collection:INRIA_TEST
collection:TESTALAIN1
collection:INRIA2
collection:INRIA-RENGRE
collection:ANR
collection:LIG_SIDCH
collection:TEST-UGA

Subject Terms:

BOINC, SimGrid, Triva, Volunteer computing, Resource usage anomalies, Performance visualization analysis, Large-scale distributed systems, Grid computing, Cloud computing, ACM: I.: Computing Methodologies, I.6: SIMULATION AND MODELING, I.6.7: Simulation Support Systems, ACM: C.: Computer Systems Organization, C.4: PERFORMANCE OF SYSTEMS, [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]

Subject Geographic:

San Jose, CA, United States

Original Identifier:

HAL:

Document Type:

Conference conferenceObject
Conference papers

Language:

English

Relation:

info:eu-repo/semantics/altIdentifier/doi/10.1145/1996029.1996037

DOI:

10.1145/1996029.1996037

Access URL:

https://inria.hal.science/inria-00627754
https://inria.hal.science/inria-00627754v1/document
https://inria.hal.science/inria-00627754v1/file/2011-lsap-schnorr.pdf

Rights:

info:eu-repo/semantics/OpenAccess

Accession Number:

edshal.inria.00627754v1

Database:

HAL

Further Information

Large-scale distributed systems are composed of many thousands of computing units. Current examples of such systems are grid, volunteer, and cloud computing platforms. They are generally analyzed with monitoring tools that gather resource information such as processor or network utilization and provide high-level statistics and basic resource usage traces. Such approaches are recognized as rather scalable but are often insufficient to detect or fully understand unexpected behavior. In this paper, we investigate the use in distributed systems of more detailed tracing techniques, commonly used in parallel computing. Finely analyzing the behavior of such systems, which comprise thousands of resources, over several months may seem infeasible. Yet we show that the resulting trace can be analyzed using tools that make it easy to zoom in and out on selected areas of space and time. We use the BOINC volunteer computing system as the basis of this study. Since detailed activity traces of BOINC clients are not yet available, we rely instead on traces obtained from a BOINC simulator that was developed with the SimGrid toolkit and takes as input real availability trace files from the SETI@home BOINC project. We show that analyzing such detailed resource utilization traces provides several non-trivial insights about the whole system and enables the discovery of unexpected behavior.
