Performance Measurement and Analysis
           
Event Type Start Time End Time Rm # Chair  

 

Paper 3:30PM 4:00PM 38-39 Jeffrey Vetter (LLNL)
 
Title:

Efficient, Unified, and Scalable Performance Monitoring for Multiprocessor Operating Systems
  Speakers/Presenter:
Robert W. Wisniewski (IBM T.J. Watson Research), Bryan Rosenburg (IBM T.J. Watson Research)

 

Paper 4:00PM 4:30PM 38-39 Jeffrey Vetter (LLNL)
 
Title:

Memory Profiling using Hardware Counters
  Speakers/Presenter:
Marty Itzkowitz (Sun Microsystems), Brian J.N. Wylie (Sun Microsystems), Christopher Aoki (Sun Microsystems), Nicolai Kosche (Sun Microsystems)

 

Paper 4:30PM 5:00PM 38-39 Jeffrey Vetter (LLNL)
 
Title:

Identifying and Exploiting Spatial Regularity in Data Memory References
  Speakers/Presenter:
Tushar Mohan (Lawrence Berkeley National Lab), Bronis R. de Supinski (LLNL), Sally A. McKee (CSL Cornell), Frank Mueller (NCSU), Andy Yoo (LLNL), Martin Schulz (CSL, Cornell)
             

 

     
  Session: Performance Measurement and Analysis
  Title: Efficient, Unified, and Scalable Performance Monitoring for Multiprocessor Operating Systems
  Chair: Jeffrey Vetter (LLNL)
  Time: Thursday, November 20, 3:30PM - 4:00PM
  Rm #: 38-39
  Speaker(s)/Author(s):  
  Robert W. Wisniewski (IBM T.J. Watson Research), Bryan Rosenburg (IBM T.J. Watson Research)
   
  Description:
  Programming, understanding, and tuning the performance of large multiprocessor operating systems is challenging. Crucial to achieving good performance is understanding the system's behavior. We have developed an efficient, unified, and scalable tracing infrastructure that allows for correctness debugging, performance debugging, and performance monitoring of an operating system. The infrastructure allows variable-length events to be logged without locking and provides random access to the event stream. The infrastructure allows cheap and parallel logging of events by applications, libraries, servers, and the kernel. The infrastructure was designed for K42, a new open-source research kernel designed to scale near perfectly on large cache-coherent 64-bit multiprocessor systems. The techniques are generally applicable, and have been integrated into LTT (Linux Trace Toolkit). We describe the implementation of the infrastructure, how we used the facility, e.g., analyzing lock contention, to understand and achieve K42's scalable performance, and the lessons we learned. The infrastructure has been invaluable.
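  As a rough illustration of how variable-length events and random access to the event stream can coexist, the sketch below uses one common layout: every event starts with a header word that records its length, and no event is allowed to cross a fixed-size block boundary, so a reader can begin decoding at the start of any block. This is not taken from the paper. The names `TraceBuffer`, `BLOCK_WORDS`, and `FILLER`, and the header encoding, are assumptions made for the example. The sketch is single-threaded; a lock-free logger would reserve space by advancing the index with an atomic compare-and-swap, which is only marked by a comment here.

```python
BLOCK_WORDS = 8   # hypothetical block size in words; events never straddle a block
FILLER = 0        # hypothetical event id used to pad the tail of a block


class TraceBuffer:
    """Toy trace buffer holding variable-length events in fixed-size blocks."""

    def __init__(self, n_blocks):
        self.words = [0] * (n_blocks * BLOCK_WORDS)
        self.index = 0  # next free word

    @staticmethod
    def _header(event_id, length):
        # Header word: event id in the high bits, total length (in words) in the low 8 bits.
        return (event_id << 8) | length

    def log(self, event_id, payload):
        """Append one event and return the word offset where it was stored."""
        length = 1 + len(payload)  # header word plus payload words
        assert event_id > 0 and length <= BLOCK_WORDS
        room = BLOCK_WORDS - self.index % BLOCK_WORDS
        if length > room:
            # Not enough room left in this block: pad it with a filler event
            # so that the next event starts exactly on a block boundary.
            self.words[self.index] = self._header(FILLER, room)
            self.index += room
        start = self.index
        if start + length > len(self.words):
            raise OverflowError("trace buffer full")
        # Reservation step. A lock-free logger would do this with an atomic
        # compare-and-swap on the index instead of a plain assignment.
        self.index = start + length
        self.words[start] = self._header(event_id, length)
        self.words[start + 1:start + length] = payload
        return start

    def read_block(self, block):
        """Decode the events of one block without scanning earlier blocks."""
        pos = block * BLOCK_WORDS
        end = pos + BLOCK_WORDS
        events = []
        while pos < end:
            header = self.words[pos]
            event_id, length = header >> 8, header & 0xFF
            if length == 0:  # unwritten space: nothing more in this block
                break
            if event_id != FILLER:
                events.append((event_id, self.words[pos + 1:pos + length]))
            pos += length
        return events


buf = TraceBuffer(n_blocks=3)
buf.log(1, [10, 20])
buf.log(2, [30, 31, 32, 33])
buf.log(3, [40])
buf.log(4, [1, 2, 3, 4, 5, 6])   # does not fit in block 1, so block 1 is padded
print(buf.read_block(1))          # decode block 1 directly
```

  Because each block begins with an event header, a tool can seek to any block and decode from there, at the cost of some padding at block tails.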
   

 

     
  Session: Performance Measurement and Analysis
  Title: Memory Profiling using Hardware Counters
  Chair: Jeffrey Vetter (LLNL)
  Time: Thursday, November 20, 4:00PM - 4:30PM
  Rm #: 38-39
  Speaker(s)/Author(s):  
  Marty Itzkowitz (Sun Microsystems), Brian J.N. Wylie (Sun Microsystems), Christopher Aoki (Sun Microsystems), Nicolai Kosche (Sun Microsystems)
   
  Description:
  Although memory performance is often a limiting factor in application performance, most tools only show performance data relating to the instructions in the program, not to its data. In this paper, we describe a technique for directly measuring the memory profile of an application. We describe the tools and their user model, and then discuss a particular code, the MCF benchmark from SPEC CPU 2000. We show performance data for the data structures and elements, and discuss the use of the data to improve program performance. Finally, we discuss extensions to the work to provide feedback to the compiler for prefetching and to generate additional reports from the data.
   

 

     
  Session: Performance Measurement and Analysis
  Title: Identifying and Exploiting Spatial Regularity in Data Memory References
  Chair: Jeffrey Vetter (LLNL)
  Time: Thursday, November 20, 4:30PM - 5:00PM
  Rm #: 38-39
  Speaker(s)/Author(s):  
  Tushar Mohan (Lawrence Berkeley National Lab), Bronis R. de Supinski (LLNL), Sally A. McKee (CSL Cornell), Frank Mueller (NCSU), Andy Yoo (LLNL), Martin Schulz (CSL, Cornell)
   
  Description:
  The growing processor/memory performance gap causes the performance of many codes to be limited by memory accesses. Strided memory accesses forming streams can be targeted by optimizations such as prefetching, relocation, remapping, and vector loads. Undetected, they can be a significant source of memory stalls in loops. The concept of locality fails to capture the existence of streams in a program's memory accesses.

First, we define spatial regularity as a means to discuss the presence and effects of streams. Second, we develop measures to quantify spatial regularity, and we design and implement an on-line, parallel algorithm to detect streams in running applications. Third, we use examples from real codes and common benchmarks to illustrate how derived stream statistics can be used to guide the application of profile-driven optimizations. Overall, we demonstrate the benefits of our novel regularity metric as an instrument to detect potential for optimizations affecting memory performance.
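To make the notion of a stream concrete, here is a minimal sketch that scans a recorded address trace for runs of constant, non-zero stride. It is an offline, sequential simplification and not the on-line, parallel algorithm or the regularity metric of the paper. The function names `detect_streams` and `stream_coverage`, the `min_length` threshold, and the coverage measure are all assumptions made for illustration.

```python
def detect_streams(addresses, min_length=3):
    """Return (start_address, stride, length) for each constant-stride run.

    A run is a maximal sequence of consecutive accesses whose address
    differences are all equal and non-zero. Runs shorter than min_length
    accesses are ignored. An access at a stride change is treated as the
    end of one run and the possible start of the next.
    """
    streams = []
    if len(addresses) < 2:
        return streams
    start = 0
    stride = addresses[1] - addresses[0]
    for i in range(2, len(addresses) + 1):
        # None marks the end of the trace and forces the last run to close.
        next_stride = addresses[i] - addresses[i - 1] if i < len(addresses) else None
        if next_stride != stride:
            length = i - start
            if stride != 0 and length >= min_length:
                streams.append((addresses[start], stride, length))
            start = i - 1
            stride = next_stride
    return streams


def stream_coverage(addresses, min_length=3):
    """Fraction of address-to-address transitions that lie inside a detected stream."""
    if len(addresses) < 2:
        return 0.0
    covered = sum(length - 1 for _, _, length in detect_streams(addresses, min_length))
    return covered / (len(addresses) - 1)


# Two strided runs (stride 8, then stride 4) separated by an irregular access.
trace = [100, 108, 116, 124, 500, 40, 44, 48, 52]
print(detect_streams(trace))    # [(100, 8, 4), (40, 4, 4)]
print(stream_coverage(trace))   # 0.75
```

A high coverage value suggests that a loop's accesses are dominated by streams and may be a candidate for prefetching or similar optimizations; a real detector would also have to handle interleaved streams, which this sketch does not.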

This paper has been nominated for the Best Paper of SC2003 award.