HPC for the Windows IT Professional

Introduction

This three-day instructor-led course provides students with the background, knowledge, and skills to set up and administer high-performance clusters running Microsoft Windows HPC Server 2008 R2.

Audience

This course is intended for IT Professionals with experience on the Windows platform. No background in the field of high-performance computing is required.

Module 1: Introduction to High-Performance Computing and HPC Server 2008 R2

This module introduces the field of high-performance computing, and the Microsoft product Windows HPC Server 2008 R2.

Lessons
  • The business case for HPC
  • Brief product history
  • Overview of HPC Server 2008 R2 – main components, job submission, job scheduler
  • HPC resources: cores vs. sockets vs. nodes
  • Product differentiators
  • The goal of your developers: linear speedup
Lab 01: Introduction to HPC and Windows HPC Server 2008 R2
  • Submitting jobs
  • Monitoring job execution
  • Measuring performance
After completing this module, students will be able to:
  • Understand the motivation for HPC and HPC Server 2008 R2.
  • Identify the main components of an HPC Server 2008 R2 cluster.
  • Submit and monitor a job.
  • Measure performance.
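
As a rough illustration of the "linear speedup" goal and the "Measuring performance" lab topic, the sketch below computes speedup and parallel efficiency from wall-clock timings. The timings are hypothetical values chosen for illustration, not measurements from the course labs, and the function names are our own.

```python
def speedup(t_serial, t_parallel):
    """Speedup = time on one core / time on N cores."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, cores):
    """Parallel efficiency = speedup / cores; 1.0 means linear speedup."""
    return speedup(t_serial, t_parallel) / cores


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds, keyed by core count.
    t_one_core = 120.0
    timings = {2: 60.0, 4: 32.0, 8: 20.0}

    for cores, seconds in sorted(timings.items()):
        s = speedup(t_one_core, seconds)
        e = efficiency(t_one_core, seconds, cores)
        print(f"{cores} cores: speedup {s:.2f}x, efficiency {e:.0%}")
```

An efficiency that stays close to 100% as cores are added indicates near-linear speedup; a falling efficiency suggests overhead such as communication or contention is growing.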

Module 2: The HPCS Job Scheduler in depth

This module discusses the heart of Windows HPC Server 2008 R2 — the Job Scheduler.

Lessons
  • Throughput vs. performance
  • Jobs vs. Tasks
  • Job and task states
  • Pre and post tasks
  • The definition of a failed job or task, and what “auto-restarting upon failure” really means
  • Default scheduling policies
  • Job-level vs. task-level policies
  • The impact of job priorities, job preemption, and dynamic growing / shrinking
  • Email notifications
Lab 02: The Job Scheduler
  • Investigating job parameters
  • Investigating task parameters
  • Observing the job scheduler in action
  • Receiving email notifications
After completing this module, students will be able to:
  • Understand the Job Scheduler and the various policies impacting its decision making
  • Configure jobs for a variety of scenarios
  • Configure tasks for a variety of scenarios
  • Understand the impact of job and task failures
  • Track jobs and tasks through the system
  • Configure email notification when jobs complete

Module 3: Interfacing with HPC Server

This module demonstrates the various ways you can interface with Windows HPC Server 2008 R2. These include the Cluster Manager for administrators, the Job Manager for users, DOS, and PowerShell.

Lessons
  • Cluster Manager
  • Job Manager
  • Job Description Files
  • clusrun
  • Console window
  • PowerShell
  • Scripting
Lab 03: Interfacing with HPC Server 2008 R2
  • Customizing the Cluster Manager
  • Using the Job Manager
  • Clusrun is your administrative friend
  • Using the command line
  • Using PowerShell
  • Scripting
After completing this module, students will be able to:
  • Work comfortably in both the Cluster Manager and the Job Manager
  • Perform simple administrative tasks using clusrun
  • Access the cluster from the DOS command line
  • Access the cluster from the PowerShell command line
  • Write simple DOS and PowerShell scripts

Module 4: Basic cluster setup: from hardware to software

This module gives an overview of the standard setup requirements and procedures for an HPC Server 2008 R2 cluster. We focus first on a manual setup; the next module covers automatic cluster setup and other setup options.

Lessons
  • Hardware, physical and virtual
  • Software: Windows Server 2008 R2 editions, HPC Pack, HPC redistributables
  • Active Directory integration
  • SQL Server integration, both local and remote
  • Common groups, local directories, and network shares
  • Network topologies, DNS, and DHCP
  • Runtimes, software, and tools commonly needed by developers
  • Supporting remote debugging and tracing
  • Running some of the built-in diagnostics
Lab 04: Installing Windows HPC Server 2008 R2
  • Basic install of Windows HPC Server 2008 R2 on a small, virtual cluster
  • Testing setup via built-in diagnostics
After completing this module, students will be able to:
  • Understand the hardware and software requirements for cluster setup
  • Manually set up a small, virtual cluster running HPC Server 2008 R2
  • Set up a cluster using local and remote SQL Server databases
  • Configure a basic network topology and set of network shares
  • Set up basic group and user accounts for HPC access
  • Install commonly needed runtimes and other support code
  • Run built-in diagnostics to test cluster setup

Module 5: Other Setup Options

This module dives deeper into the setup of HPCS-based clusters, along with use of the more advanced setup features available in Windows HPC Server 2008 R2. These include automatic deployment via Windows Deployment Services (WDS), new product support for diskless booting, workstation scavenging of on-premise resources, and Windows Azure for off-premise resources.

Lessons
  • Automating cluster setup using Windows Deployment Services (WDS)
  • Installing and configuring high-speed networking hardware and drivers, e.g. InfiniBand
  • Configuring head node and broker node failover with Windows Server Failover Clustering
  • Setup and configuration for cycle scavenging of on-premise workstations
  • Setup and configuration for using off-premise compute resources via Windows Azure
  • Additional monitoring via System Center Operations Manager
  • Enabling support for Open Grid Forum’s basic web profile
Lab 05: Advanced Setup of Windows HPC Server 2008 R2
  • Automating setup with WDS
  • Configuring a Win7 workstation for cycle scavenging
After completing this module, students will be able to:
  • Set up clusters quickly using Windows Deployment Services
  • Install and test InfiniBand networking hardware and drivers
  • Understand how to configure the head node and broker nodes for fault tolerance
  • Understand how to configure HPCS and local workstations for cycle scavenging
  • Understand how to configure HPCS and Windows Azure for use of off-premise compute resources
  • Enable Open Grid Forum support for platform-neutral cluster interop

Module 6: Configuring HPC Server 2008 R2

This module discusses the most commonly used configuration options in HPCS-based clusters. These include node groups, job templates, filters, job preemption, resource allocation, email notification, and the heat map.

Lessons
  • Node groups
  • Job templates
  • Job preemption
  • Dynamic resource allocation
  • Submission and activation filters
  • Email notifications
  • Customizing the heat map
  • Job history, job restarting
  • Applying configurations to HPC users and groups
Lab 06: Configuring Windows HPC Server 2008 R2
  • Creating node groups
  • Creating job templates
  • Setting access permissions on a job template
  • Customizing the heat map
  • Configuring job preemption and resource allocation
  • Installing job submission and activation filters
After completing this module, students will be able to:
  • Divide compute resources into node groups
  • Create and assign job templates to control what users can, and cannot, do
  • Divide users into groups, and assign these groups to job templates
  • Customize the heat map
  • Configure email notifications
  • Install job submission and activation filters

Module 7: Understanding HPC Developers and Their Applications

This module discusses the goals of HPC developers, and how you can support them in their efforts. The module helps you understand how your developers use clusters, and the types of applications they will be developing. The more you know about the various application types — parametric sweep, multi-threading, MPI, SOA, etc. — the more effective you will be as an HPC administrator.

Lessons
  • What applications can be run on the cluster
  • The software technologies developers will typically use
  • Sequential apps
  • Parametric sweep
  • SOA applications
  • HPC-based Excel apps using HPC Services for Excel 2010
  • Multi-threaded apps
  • GPU apps
  • MPI apps
  • UNIX apps
Lab 07: Understanding and Helping Your Developers
  • Running a multi-threaded application on one node
  • Running a parametric sweep application across the cluster
  • Running an MPI application across the cluster
  • Running a SOA application on part of the cluster
After completing this module, students will be able to:
  • Understand how developers use an HPC Server 2008 R2 cluster
  • Identify the key software technologies used to develop HPC Applications
  • Run various types of HPC applications on the cluster

Module 8: Cluster Maintenance, Performance Tuning, and Troubleshooting

This module presents the tools and techniques available for maintaining, tuning, and troubleshooting your HPC Server 2008 R2 cluster.

Lessons
  • Cluster maintenance: head node, broker nodes, compute nodes, and SQL Server
  • Installing service packs and other software updates
  • Performance tuning and maintenance: MPI ping-pong, SOA ping-pong, Lizard, and uSane
  • Initial troubleshooting via built-in diagnostics
  • Common job and task failures — from the command-line to licensing
  • App-specific failures with MPI and SOA
  • Remote desktop as a troubleshooting technique
  • Checking the Windows event log and WCF trace logs
  • Built-in charts and reports
Lab 08: Maintenance, Tuning, and Troubleshooting
  • Creating a SQL Server maintenance plan
  • Installing a service pack across the cluster
  • Running MPI ping-pong to check network performance
  • Running the built-in diagnostics
  • Investigating common job and task failures
  • Using remote desktop to identify failures
  • Exploring the Windows event log
After completing this module, students will be able to:
  • Maintain a Windows HPC Server 2008 R2 cluster
  • Tune the performance of a Windows HPC Server 2008 R2 cluster
  • Troubleshoot common problems on a cluster
  • Run the built-in diagnostics as well as create their own
  • Check the Windows event log
  • Run the built-in charts and reports for additional information

Module 9: Administrative Programming

This module presents basic administrative programming skills to simplify cluster maintenance and administrative chores.

Lessons
  • An overview of the HPC Server Job Scheduling and Data Reporting APIs
  • Defining your own, custom diagnostic tests
  • Defining your own, custom data report
  • Defining simple Job Submission and Activation filters
  • SharePoint integration
Lab 09: Administrative Programming
  • Writing a simple script to run MPI ping-pong
  • Writing your own diagnostic test
  • Writing your own data report
  • Writing your own job submission and activation filters
After completing this module, students will be able to:
  • Perform simple administrative chores programmatically.
  • Design and write their own diagnostic test.
  • Design and write their own data report.
  • Design and write their own job filters.
  • Design and write their own scripts for job submission and cluster monitoring.
  • Integrate HPC Server 2008 R2 with SharePoint.