Session: S01: Production Linux Clusters 2003 - Architecture and System Software for Serious Computing
  Title: S1: Production Linux Clusters 2003 - Architecture and System Software for Serious Computing
  Chair: Remy Evard (Argonne National Laboratory)
  Time: Sunday, November 16, 8:30AM - 5:00PM
  Rm #: 19-20
  Speaker(s)/Author(s):  
  Remy Evard (Argonne National Laboratory), Susan Coghlan (Argonne National Laboratory), Peter Beckman (Argonne National Laboratory), William Saphir (none)
   
  Description:
  Content-Level: 40% Introductory 50% Intermediate 10% Advanced

Abstract: Linux clusters have become the dominant computing platform for small and mid-range computing, and have substantial penetration into the upper echelon of the TOP500 list. Clusters are available from dozens of vendors, and there are even more ways to run them. However, due in large part to the huge range of hardware and software options for building them, clusters still require a great deal of expertise to plan, deploy, and support. Building a complete, robust, and easily managed production cluster remains a significant challenge.

This tutorial will explain how to design your next cluster, plan for it, buy it, install it, run it, manage it, evaluate performance, and keep users happy on it. We will consider current hardware, describe proven management techniques, and discuss several modern cluster software systems while attempting to remain distribution and package neutral. Our goal is not to talk about how to cobble cheap PCs into a fast computer or to advocate a specific package, but to focus on making your next production supercomputer a Linux cluster.

This is a full-day tutorial. The handouts include practical, current information that can be directly applied to cluster selection and management.