Performance and Reliability
           
Event Type Start Time End Time Rm # Chair  

 

Paper 3:30PM 4:00PM 36-37 Bernd Mohr (Forschungszentrum Juelich)
 
Title:

Performance Comparison of MPI Implementations over InfiniBand, Myrinet and Quadrics
  Speakers/Presenter:
Jiuxing Liu (The Ohio State University), Balasubramanian Chandrasekaran (The Ohio State University), Jiesheng Wu (The Ohio State University), Weihang Jiang (The Ohio State University), Sushmitha Kini (The Ohio State University), Weikuan Yu (The Ohio State University), Darius Buntinas (The Ohio State University), Pete Wyckoff (Ohio Supercomputer Center), D. K. Panda (The Ohio State University)

 

Paper 4:00PM 4:30PM 36-37 Bernd Mohr (Forschungszentrum Juelich)
 
Title:

MPICH-V2: a Fault Tolerant MPI for Volatile Nodes based on Pessimistic Sender Based Message Logging
  Speakers/Presenter:
Aurelien Bouteiller (CNRS-LRI), Franck Cappello (INRIA-LRI), Thomas Herault (CNRS-LRI), Geraud Krawezik (CNRS-LRI), Pierre Lemarinier (CNRS-LRI), Frederic Magniette (CNRS-LRI)

 

Paper 4:30PM 5:00PM 36-37 Bernd Mohr (Forschungszentrum Juelich)
 
Title:

Hierarchical Dynamics, Interarrival Times, and Performance
  Speakers/Presenter:
Stephen D Kleban (Sandia National Laboratories), Scott H Clearwater (Sandia National Laboratories)
             

 

     
  Session: Performance and Reliability
  Title: Performance Comparison of MPI Implementations over InfiniBand, Myrinet and Quadrics
  Chair: Bernd Mohr (Forschungszentrum Juelich)
  Time: Tuesday, November 18, 3:30PM - 4:00PM
  Rm #: 36-37
  Speaker(s)/Author(s):  
  Jiuxing Liu (The Ohio State University), Balasubramanian Chandrasekaran (The Ohio State University), Jiesheng Wu (The Ohio State University), Weihang Jiang (The Ohio State University), Sushmitha Kini (The Ohio State University), Weikuan Yu (The Ohio State University), Darius Buntinas (The Ohio State University), Pete Wyckoff (Ohio Supercomputer Center), D. K. Panda (The Ohio State University)
   
  Description:
  In this paper, we present a comprehensive performance comparison of MPI implementations over InfiniBand, Myrinet and Quadrics. Our performance evaluation consists of two major parts. The first is a set of MPI-level micro-benchmarks that characterize different aspects of the MPI implementations. The second consists of application-level benchmarks: the NAS Parallel Benchmarks and the sweep3D benchmark. We not only present the overall performance results, but also relate the applications' communication characteristics to the information acquired from the micro-benchmarks. Our results show that each of the three MPI implementations has its own advantages and disadvantages. On our 8-node cluster, InfiniBand with the PCI-X bus can offer significant performance improvements over Myrinet and Quadrics for a number of applications. Even with only the PCI bus, InfiniBand can still perform better if the applications are bandwidth-bound.
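The abstract does not name its individual micro-benchmarks; a ping-pong latency test is a common example of the kind of MPI-level micro-benchmark it describes. The minimal Python sketch below shows only the measurement pattern. It assumes a local socket pair and a helper thread as stand-ins for two MPI ranks exchanging messages with MPI_Send/MPI_Recv, so its numbers say nothing about InfiniBand, Myrinet or Quadrics. The function names are hypothetical.

```python
import socket
import threading
import time


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket (one complete 'message')."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def ping_pong_latency(msg_size: int, iterations: int = 1000, warmup: int = 10) -> float:
    """Return the average one-way latency in microseconds (half the round-trip time).

    Hypothetical stand-in for an MPI ping-pong: endpoint 'a' plays rank 0,
    a helper thread on endpoint 'b' plays rank 1.
    """
    a, b = socket.socketpair()

    def echo() -> None:
        # "Rank 1": receive each message in full and send it straight back.
        for _ in range(warmup + iterations):
            b.sendall(_recv_exact(b, msg_size))

    peer = threading.Thread(target=echo, daemon=True)
    peer.start()
    payload = b"x" * msg_size
    try:
        # Untimed warm-up round trips.
        for _ in range(warmup):
            a.sendall(payload)
            _recv_exact(a, msg_size)
        start = time.perf_counter()
        for _ in range(iterations):
            a.sendall(payload)
            _recv_exact(a, msg_size)
        elapsed = time.perf_counter() - start
    finally:
        a.close()      # unblocks the echo thread if we exited early
        peer.join()
        b.close()
    # Each iteration is one round trip; one-way latency is half of it.
    return elapsed / iterations / 2 * 1e6
```

Calling `ping_pong_latency(size)` for sizes such as 8, 1024 and 65536 bytes traces how latency grows with message size; timing half the round trip after a few untimed warm-up exchanges is the usual way such tests report one-way latency.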
   

 

     
  Session: Performance and Reliability
  Title: MPICH-V2: a Fault Tolerant MPI for Volatile Nodes based on Pessimistic Sender Based Message Logging
  Chair: Bernd Mohr (Forschungszentrum Juelich)
  Time: Tuesday, November 18, 4:00PM - 4:30PM
  Rm #: 36-37
  Speaker(s)/Author(s):  
  Aurelien Bouteiller (CNRS-LRI), Franck Cappello (INRIA-LRI), Thomas Herault (CNRS-LRI), Geraud Krawezik (CNRS-LRI), Pierre Lemarinier (CNRS-LRI), Frederic Magniette (CNRS-LRI)
   
  Description:
  Executing MPI applications on clusters and Grid deployments that suffer from node and network failures motivates the use of fault-tolerant MPI implementations.

We present MPICH-V2 (the second protocol of the MPICH-V project), an automatic fault-tolerant MPI implementation using an innovative protocol that removes the most limiting factor of the pessimistic message logging approach: reliable logging of in-transit messages. MPICH-V2 relies on uncoordinated checkpointing, sender-based message logging and remote reliable logging of message logical clocks.
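As a hedged illustration of how sender-based pessimistic logging can work, the toy model below keeps each message payload in the sender's own memory and sends only a small determinant (the receiver's logical clock, the sender, and the sender's sequence number) to a reliable logger before the message is delivered. After a crash, the receiver re-obtains its messages in the logged order by asking the senders for their saved payloads. All names (`EventLogger`, `Process`, `replay`) are hypothetical; checkpointing, networking and sender failures are omitted, and the replay assumes the receiver restarts from a checkpoint taken before any message arrived. It is not the MPICH-V2 implementation.

```python
from dataclasses import dataclass, field


@dataclass
class EventLogger:
    """Stand-in for the remote reliable logger: stores determinants only, never payloads."""
    determinants: dict = field(default_factory=dict)  # receiver -> [(recv_clock, sender, send_seq)]

    def log(self, receiver: str, recv_clock: int, sender: str, send_seq: int) -> None:
        self.determinants.setdefault(receiver, []).append((recv_clock, sender, send_seq))


@dataclass
class Process:
    name: str
    logger: EventLogger
    send_seq: int = 0
    recv_clock: int = 0
    sent_log: dict = field(default_factory=dict)   # (dest, send_seq) -> payload, in volatile sender memory
    delivered: list = field(default_factory=list)  # payloads in delivery order

    def send(self, dest: "Process", payload: str) -> tuple:
        self.send_seq += 1
        # Sender-based logging: the sender keeps a copy of every payload it sends.
        self.sent_log[(dest.name, self.send_seq)] = payload
        return (self.name, self.send_seq, payload)

    def deliver(self, msg: tuple) -> None:
        sender, seq, payload = msg
        self.recv_clock += 1
        # Pessimistic: the determinant is logged reliably before the delivery takes effect.
        self.logger.log(self.name, self.recv_clock, sender, seq)
        self.delivered.append(payload)


def replay(crashed: str, logger: EventLogger, peers: dict) -> list:
    """Rebuild the crashed process's delivery sequence from determinants plus sender logs."""
    return [peers[sender].sent_log[(crashed, seq)]
            for _clock, sender, seq in sorted(logger.determinants.get(crashed, []))]
```

In this sketch only the fixed-size determinants must reach reliable storage; the potentially large payloads stay on the senders, which mirrors the motivation stated above for avoiding reliable logging of in-transit messages.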

This paper presents the architecture of MPICH-V2, its theoretical foundation and the performance of the implementation. We compare MPICH-V2 to MPICH-V1 and MPICH-P4, evaluating (a) point-to-point performance, (b) performance on the NAS benchmarks, and (c) application performance when many faults occur during execution. Experimental results demonstrate that MPICH-V2 provides performance close to MPICH-P4 for applications using large messages, while dramatically reducing the number of reliable nodes required compared to MPICH-V1.
   

 

     
  Session: Performance and Reliability
  Title: Hierarchical Dynamics, Interarrival Times, and Performance
  Chair: Bernd Mohr (Forschungszentrum Juelich)
  Time: Tuesday, November 18, 4:30PM - 5:00PM
  Rm #: 36-37
  Speaker(s)/Author(s):  
  Stephen D Kleban (Sandia National Laboratories), Scott H Clearwater (Sandia National Laboratories)
   
  Description:
  We report on a model of the distribution of job-submission interarrival times on supercomputers. Interarrival times are modeled as the outcome of a complicated set of decisions involving users, the queuing algorithm, and other policies. This cascading hierarchy of decision-making processes leads to a particular kind of heavy-tailed distribution. Specifically, hierarchically constrained systems suggest that fatter tails arise when more levels come into play in the overall decision-making process. The key contribution of this paper is showing that heavier tails, resulting from more complex decision-making processes (that is, more hierarchical levels), lead to worse overall performance, even when the average interarrival time is the same. Finally, we offer some suggestions for how to overcome these issues and discuss the tradeoffs involved.
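The effect claimed above, worse performance at the same mean interarrival time, can be illustrated with a small single-server FIFO queue simulation. This sketch does not reproduce the paper's hierarchical model: it swaps in a two-phase hyperexponential distribution as a simple interarrival law whose tail is heavier than the exponential's at the same mean (though not heavy-tailed in the strict, power-law sense), and it assumes unit-mean exponential service times and 70% utilization. All function names and parameters are hypothetical.

```python
import math
import random


def hyperexp_sampler(mean: float, scv: float, rng: random.Random):
    """Two-phase hyperexponential with balanced means.

    scv is the squared coefficient of variation (must be > 1); the exponential has scv = 1.
    """
    p = 0.5 * (1.0 + math.sqrt((scv - 1.0) / (scv + 1.0)))
    rate1, rate2 = 2.0 * p / mean, 2.0 * (1.0 - p) / mean  # overall mean stays equal to `mean`

    def sample() -> float:
        return rng.expovariate(rate1) if rng.random() < p else rng.expovariate(rate2)

    return sample


def mean_wait(interarrival, service, n_jobs: int) -> float:
    """Average time jobs spend waiting in a single-server FIFO queue (Lindley's recursion)."""
    wait, total = 0.0, 0.0
    for _ in range(n_jobs):
        total += wait
        # W_{k+1} = max(0, W_k + S_k - A_{k+1})
        wait = max(0.0, wait + service() - interarrival())
    return total / n_jobs


def compare_waits(seed: int = 42, utilization: float = 0.7,
                  scv: float = 4.0, n_jobs: int = 200_000) -> tuple:
    """Mean wait with exponential vs hyperexponential interarrivals of the same mean."""
    rng = random.Random(seed)
    mean_gap = 1.0 / utilization  # service times have mean 1.0

    def service() -> float:
        return rng.expovariate(1.0)

    def exp_gap() -> float:
        return rng.expovariate(1.0 / mean_gap)

    w_exp = mean_wait(exp_gap, service, n_jobs)
    w_hyp = mean_wait(hyperexp_sampler(mean_gap, scv, rng), service, n_jobs)
    return w_exp, w_hyp
```

With exponential interarrivals this is an M/M/1 queue, whose mean wait at 70% utilization is 0.7 / 0.3, about 2.33 time units, which gives a sanity check for the simulation; the hyperexponential run has the same mean interarrival time but yields a noticeably longer average wait.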