Networking
           
Event Type   Start Time   End Time   Rm #    Chair

Paper        10:30AM      11:00AM    36-37   Dhabaleswar Panda (The Ohio State University)
  Title: A Configurable Network Protocol for Cluster Based Communications using Modular Hardware Primitives on an Intelligent NIC
  Speakers/Presenters: Ranjesh G. Jaganathan (Clemson University), Keith D. Underwood (Sandia National Laboratories), Ron R. Sass (Clemson University)

Paper        11:00AM      11:30AM    36-37   Dhabaleswar Panda (The Ohio State University)
  Title: Optimizing 10-Gigabit Ethernet in Networks of Workstations, Clusters, and Grids: A Case Study
  Speakers/Presenters: Wu-chun Feng (Los Alamos National Laboratory), Justin (Gus) Hurwitz (Los Alamos National Laboratory), Harvey B. Newman (California Institute of Technology), Sylvain Ravot (California Institute of Technology), Roger Les Cottrell (Stanford Linear Accelerator Center), Olivier Martin (CERN), Fabrizio Coccetti (Stanford Linear Accelerator Center), Cheng Jin (California Institute of Technology), David Wei (California Institute of Technology), Steven Low (California Institute of Technology)

Paper        11:30AM      12:00PM    36-37   Dhabaleswar Panda (The Ohio State University)
  Title: Scalable Hardware-Based Multicast Trees
  Speakers/Presenters: Salvador Coll (Technical University of Valencia), Jose Duato (Technical University of Valencia), Fabrizio Petrini (Los Alamos National Laboratory), Francisco J. Mora (Technical University of Valencia)
             

 

     
  Session: Networking
  Title: A Configurable Network Protocol for Cluster Based Communications using Modular Hardware Primitives on an Intelligent NIC
  Chair: Dhabaleswar Panda (The Ohio State University)
  Time: Wednesday, November 19, 10:30AM - 11:00AM
  Rm #: 36-37
  Speaker(s)/Author(s):  
  Ranjesh G. Jaganathan (Clemson University), Keith D. Underwood (Sandia National Laboratories), Ron R. Sass (Clemson University)
   
  Description:
  The high overhead of generic protocols like TCP/IP provides strong motivation for the development of a better protocol architecture for cluster-based parallel computers. Reconfigurable computing has a unique opportunity to contribute hardware-level protocol acceleration while retaining the flexibility to adapt to changing needs. This paper focuses on work to create a set of parameterizable components that can be put together as needed to obtain a customized protocol for each application. To study the feasibility of such an architecture, hardware components were built that can be stitched together as needed to provide the required functionality. Feasibility is demonstrated using four different protocol configurations: (1) unreliable packet transfer; (2) reliable, unordered message transfer without duplicate elimination; (3) reliable, unordered message transfer with duplicate elimination; and (4) reliable, ordered message transfer with duplicate elimination. The different configurations illustrate trade-offs between chip space and functionality while reducing processor overhead.
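  The four configurations listed in this description can be read as combinations of three optional features: reliability, duplicate elimination, and ordering. The following is a minimal software sketch of that composition on the receiving side only. It is not the authors' hardware design; the `Receiver` class, the `CONFIGS` table, and the `run` helper are hypothetical names introduced for illustration, and acknowledgements stand in for a full retransmission scheme.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Hypothetical receiver assembled from optional protocol features."""
    reliable: bool = False   # acknowledge every packet so a sender could retransmit
    dedup: bool = False      # drop sequence numbers that were already seen
    ordered: bool = False    # hold packets back until all earlier ones have arrived
    seen: set = field(default_factory=set)
    pending: dict = field(default_factory=dict)
    next_seq: int = 0
    delivered: list = field(default_factory=list)
    acks: list = field(default_factory=list)

    def receive(self, seq, payload):
        if self.reliable:
            self.acks.append(seq)          # duplicates are acknowledged too
        if self.dedup:
            if seq in self.seen:
                return                     # duplicate: acknowledged but not delivered
            self.seen.add(seq)
        if not self.ordered:
            self.delivered.append(payload)
            return
        # Ordered delivery: buffer the packet, then release the in-sequence prefix.
        self.pending[seq] = payload
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


# The four configurations named in the description, as feature combinations.
CONFIGS = {
    1: dict(),                                          # unreliable packet transfer
    2: dict(reliable=True),                             # reliable, unordered, no dedup
    3: dict(reliable=True, dedup=True),                 # reliable, unordered, dedup
    4: dict(reliable=True, dedup=True, ordered=True),   # reliable, ordered, dedup
}


def run(config, arrivals):
    """Feed (seq, payload) pairs to a receiver and return what it delivers."""
    rx = Receiver(**CONFIGS[config])
    for seq, payload in arrivals:
        rx.receive(seq, payload)
    return rx.delivered
```

  For example, with arrivals `[(1, "b"), (0, "a"), (1, "b"), (2, "c")]`, configuration 3 delivers `["b", "a", "c"]` and configuration 4 delivers `["a", "b", "c"]`. Each added feature costs extra state (`seen`, `pending`), which loosely mirrors the chip-space trade-off the description mentions.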
   

 

     
  Session: Networking
  Title: Optimizing 10-Gigabit Ethernet in Networks of Workstations, Clusters, and Grids: A Case Study
  Chair: Dhabaleswar Panda (The Ohio State University)
  Time: Wednesday, November 19, 11:00AM - 11:30AM
  Rm #: 36-37
  Speaker(s)/Author(s):  
  Wu-chun Feng (Los Alamos National Laboratory), Justin (Gus) Hurwitz (Los Alamos National Laboratory), Harvey B. Newman (California Institute of Technology), Sylvain Ravot (California Institute of Technology), Roger Les Cottrell (Stanford Linear Accelerator Center), Olivier Martin (CERN), Fabrizio Coccetti (Stanford Linear Accelerator Center), Cheng Jin (California Institute of Technology), David Wei (California Institute of Technology), Steven Low (California Institute of Technology)
   
  Description:
  This paper presents a case study of the 10-Gigabit Ethernet (10GigE) adapter from Intel. Specifically, with appropriate optimizations to the configurations of the 10GigE adapter and TCP, we demonstrate that the 10GigE adapter can perform well in local-area, storage-area, system-area, and wide-area networks.

  In local-area, storage-area, and system-area networks, we achieved over 4-Gb/s end-to-end throughput and 20-µs end-to-end latency between applications running on less capable, lower-end PCs. In the wide-area network, we broke the recently set Internet2 Land Speed Record by a factor of 2.5 by sustaining an end-to-end TCP/IP throughput of 2.38 Gb/s between Sunnyvale, California, and Geneva, Switzerland (a distance of 10,037 kilometers). These results indicate that 10GigE may be a cost-effective solution across a multitude of network environments.
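  The wide-area figures above can be put in perspective with a little arithmetic. The sketch below computes the bandwidth-distance product, which is the quantity the Internet2 Land Speed Record is scored on, and a rough lower bound on the amount of data TCP must keep in flight to sustain that rate. The throughput and distance come from the description; the signal speed in fiber (about two thirds of the speed of light) and the straight-path round trip are assumptions, so the delay figures are lower bounds rather than measured values.

```python
THROUGHPUT_BPS = 2.38e9              # from the description: 2.38 Gb/s
DISTANCE_M = 10_037 * 1_000          # from the description: 10,037 km
FIBER_SPEED_M_PER_S = 2.0e8          # assumption: roughly 2/3 of c in optical fiber


def bandwidth_distance_product(throughput_bps, distance_m):
    """Throughput times distance, in bit-meters per second."""
    return throughput_bps * distance_m


def min_round_trip_time(distance_m, speed_m_per_s):
    """Lower bound on the round-trip time from propagation delay alone."""
    return 2 * distance_m / speed_m_per_s


def bandwidth_delay_product_bytes(throughput_bps, rtt_s):
    """Bytes that must be in flight to keep the path full for one round trip."""
    return throughput_bps * rtt_s / 8


if __name__ == "__main__":
    bdp_distance = bandwidth_distance_product(THROUGHPUT_BPS, DISTANCE_M)
    rtt = min_round_trip_time(DISTANCE_M, FIBER_SPEED_M_PER_S)
    window = bandwidth_delay_product_bytes(THROUGHPUT_BPS, rtt)
    print(f"bandwidth-distance product: {bdp_distance / 1e12:,.0f} terabit-meters/s")
    print(f"minimum round-trip time:    {rtt * 1e3:.1f} ms")
    print(f"data in flight (lower bound): {window / 1e6:.1f} MB")
```

  Under these assumptions the product is roughly 23,888 terabit-meters per second, the round trip takes at least about 100 ms, and close to 30 MB must be in flight at once, far more than default TCP buffer sizes of that era. This helps explain why the description stresses tuning both the adapter and TCP.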
   

 

     
  Session: Networking
  Title: Scalable Hardware-Based Multicast Trees
  Chair: Dhabaleswar Panda (The Ohio State University)
  Time: Wednesday, November 19, 11:30AM - 12:00PM
  Rm #: 36-37
  Speaker(s)/Author(s):  
  Salvador Coll (Technical University of Valencia), Jose Duato (Technical University of Valencia), Fabrizio Petrini (Los Alamos National Laboratory), Francisco J. Mora (Technical University of Valencia)
   
  Description:
  This paper presents an algorithm for implementing optimal hardware-based multicast trees on networks that provide hardware support for collective communication. Although the underlying methodology is general enough to be applied to other present and future technologies, the Quadrics network was chosen as a state-of-the-art interconnect on which to apply hardware-based multicast trees. The proposed mechanism is intended to improve the performance of collective communication patterns in those cases where the hardware support cannot be used directly, for instance, because some nodes are faulty. This scheme provides a significant reduction in multicast latencies compared to the system primitives, which use multicast trees based on unicast communication. A backtracking algorithm that finds the optimal solution to the problem is presented. In addition, a greedy algorithm is presented and shown to provide near-optimal solutions. Finally, our experimental results demonstrate the good performance and scalability of the proposed multicast tree in comparison to traditional unicast-based multicast trees.
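  To make the idea of building a multicast tree around faulty nodes concrete, here is a small greedy sketch under a deliberately simplified model. It is not the algorithm from the paper, whose details the description does not give. The model assumes that nodes are numbered 0 to n-1, that one hardware multicast can reach only a run of consecutive healthy node IDs, and that in each step every node already acting as a sender issues at most one multicast. The function names `healthy_ranges` and `greedy_schedule` are hypothetical.

```python
def healthy_ranges(num_nodes, faulty, source):
    """Maximal runs of consecutive node IDs that are neither faulty nor the source."""
    excluded = set(faulty) | {source}
    ranges, start = [], None
    for node in range(num_nodes + 1):        # the extra iteration closes the last run
        usable = node < num_nodes and node not in excluded
        if usable and start is None:
            start = node
        elif not usable and start is not None:
            ranges.append((start, node - 1))
            start = None
    return ranges


def greedy_schedule(num_nodes, faulty, source):
    """Greedy tree: in each step, every sender serves one of the largest uncovered ranges.

    Returns a list of steps; each step is a list of (sender, (first, last)) pairs,
    meaning `sender` multicasts to nodes first..last in that step.
    """
    remaining = sorted(healthy_ranges(num_nodes, faulty, source),
                       key=lambda r: r[1] - r[0], reverse=True)
    senders = [source]
    steps = []
    while remaining:
        batch = remaining[:len(senders)]
        remaining = remaining[len(batch):]
        steps.append(list(zip(senders, batch)))
        # Model assumption: the first node of each newly covered range becomes a sender.
        senders += [first for first, _ in batch]
    return steps
```

  With 16 nodes, source 0, and faulty nodes 3, 9, and 10, the healthy ranges are 1-2, 4-8, and 11-15, and the sketch covers them in two steps: node 0 reaches 4-8 first, then node 0 reaches 11-15 while node 4 reaches 1-2. Because the number of senders can double in each step, the step count grows logarithmically with the number of ranges in this model.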