Performance Analysis and Modeling
           
Event Type | Start Time | End Time | Rm # | Chair

Paper | 1:30PM | 2:00PM | 38-39 | Adolfy Hoisie (Los Alamos National Lab)
  Title: The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q
  Speakers/Presenter: Fabrizio Petrini (Los Alamos National Laboratory), Darren J. Kerbyson (Los Alamos National Laboratory), Scott Pakin (Los Alamos National Laboratory)

Paper | 2:00PM | 2:30PM | 38-39 | Adolfy Hoisie (Los Alamos National Lab)
  Title: Early Evaluation of the Cray X1
  Speakers/Presenter: Thomas H. Dunigan, Jr. (Oak Ridge National Laboratory), Mark R. Fahey (Oak Ridge National Laboratory), James B. White III (Oak Ridge National Laboratory), Patrick H. Worley (Oak Ridge National Laboratory)

Paper | 2:30PM | 3:00PM | 38-39 | Adolfy Hoisie (Los Alamos National Lab)
  Title: Evaluation of Cache-based Superscalar and Cacheless Vector Architectures for Scientific Computations
  Speakers/Presenter: Leonid Oliker (Lawrence Berkeley National Laboratory), Andrew Canning (Lawrence Berkeley National Laboratory), Jonathan Carter (Lawrence Berkeley National Laboratory), John Shalf (Lawrence Berkeley National Laboratory), David Skinner (Lawrence Berkeley National Laboratory), Stephane Ethier (Princeton University), Rupak Biswas (NASA Ames Research Center), Jahed Djomehri (Computer Sciences Corporation), Rob Van der Wijngaart (Computer Sciences Corporation)
             

 

     
  Session: Performance Analysis and Modeling
  Title: The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q
  Chair: Adolfy Hoisie (Los Alamos National Lab)
  Time: Wednesday, November 19, 1:30PM - 2:00PM
  Rm #: 38-39
  Speaker(s)/Author(s):  
  Fabrizio Petrini (Los Alamos National Laboratory), Darren J. Kerbyson (Los Alamos National Laboratory), Scott Pakin (Los Alamos National Laboratory)
   
  Description:
  In this paper we describe how we improved the effective performance of ASCI Q, the world's second-fastest supercomputer, to meet our expectations. Using an arsenal of performance-analysis techniques including analytical models, custom microbenchmarks, full applications, and simulators, we succeeded in observing a serious -- but previously undetectable -- performance problem. We identified the source of the problem, eliminated the problem, and "closed the loop" by demonstrating improved application performance. We present our methodology and provide insight into performance analysis that is immediately applicable to other large-scale cluster-based supercomputers.

  This paper has been nominated for the SC2003 Best Paper award.
   

 

     
  Session: Performance Analysis and Modeling
  Title: Early Evaluation of the Cray X1
  Chair: Adolfy Hoisie (Los Alamos National Lab)
  Time: Wednesday, November 19, 2:00PM - 2:30PM
  Rm #: 38-39
  Speaker(s)/Author(s):  
  Thomas H. Dunigan, Jr. (Oak Ridge National Laboratory), Mark R. Fahey (Oak Ridge National Laboratory), James B. White III (Oak Ridge National Laboratory), Patrick H. Worley (Oak Ridge National Laboratory)
   
  Description:
  Oak Ridge National Laboratory installed a 32-processor Cray X1 in March 2003 and will have a 256-processor system installed by October 2003. In this paper, we describe our initial evaluation of the X1 architecture, focusing on microbenchmarks, kernels, and application codes that highlight the performance characteristics of the X1 architecture and indicate how to use the system most efficiently.
   

 

     
  Session: Performance Analysis and Modeling
  Title: Evaluation of Cache-based Superscalar and Cacheless Vector Architectures for Scientific Computations
  Chair: Adolfy Hoisie (Los Alamos National Lab)
  Time: Wednesday, November 19, 2:30PM - 3:00PM
  Rm #: 38-39
  Speaker(s)/Author(s):  
  Leonid Oliker (Lawrence Berkeley National Laboratory), Andrew Canning (Lawrence Berkeley National Laboratory), Jonathan Carter (Lawrence Berkeley National Laboratory), John Shalf (Lawrence Berkeley National Laboratory), David Skinner (Lawrence Berkeley National Laboratory), Stephane Ethier (Princeton University), Rupak Biswas (NASA Ames Research Center), Jahed Djomehri (Computer Sciences Corporation), Rob Van der Wijngaart (Computer Sciences Corporation)
   
  Description:
  The growing gap between sustained and peak performance for scientific applications is a well-known problem in high-end computing. The recent development of parallel vector systems offers the potential to bridge this gap for many computational science codes and deliver a substantial increase in computing capabilities. This paper examines the intranode performance of the NEC SX-6 vector processor and the cache-based IBM Power3/4 superscalar architectures across a number of scientific computing areas. First, we present the performance of a microbenchmark suite that examines low-level machine characteristics. Next, we study the behavior of the NAS Parallel Benchmarks. Finally, we evaluate the performance of several scientific computing codes. Results demonstrate that the SX-6 achieves high performance on a large fraction of our applications and often significantly outperforms the cache-based architectures. However, certain applications are not easily amenable to vectorization and would require extensive algorithm and implementation reengineering to utilize the SX-6 effectively.