HPC Server for the Cluster Developer

Introduction

This five-day instructor-led course provides students with the knowledge and skills to develop high-performance applications for the Microsoft Windows HPC Server 2008 R2 platform. The course focuses on the design, implementation, debugging and tuning of applications using a variety of HPC technologies, including parametric sweep, multi-threading, SOA, Excel, GPUs, and MPI. Students also learn the ins and outs of Windows HPC Server 2008 R2, and how to interact with the product both interactively and programmatically.

Audience

This course is intended for experienced C, C++ or C# programmers familiar with the Windows platform and Visual Studio 2010.

Module 01: Introduction to High-Performance Computing and Windows HPC Server 2008 R2

This module introduces the field of High-Performance Computing, and the Microsoft product Windows HPC Server 2008 R2.

Lessons
  • The business case for HPC
  • Examples of HPC applications
  • What is HPC Server 2008 R2?
  • Working with HPC Server 2008 R2 — batch vs. SOA
  • The main components of an HPCS 2008 R2 cluster
  • Evaluating parallel performance — speedup
  • Linear speedup and Amdahl’s law (see the formulas at the end of this module)
Lab 01: Introduction to HPC and HPCS 2008 R2
  • Submitting jobs
  • Measuring speedup
  • The importance of the memory hierarchy
After completing this module, students will be able to:
  • Identify the main components of an HPC Server 2008 R2 cluster.
  • Submit jobs and tasks to HPC Server 2008 R2.
  • Measure speedup and evaluate parallel performance.
  • Appreciate the difficulty of cluster-wide HPC.
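
For reference, the speedup measures covered in this module can be summarized with two standard formulas, where T_1 is the running time on one processor, T_N the running time on N processors, and p the fraction of the work that can be parallelized (linear speedup, S(N) = N, is the ideal case):

    S(N) = \frac{T_1}{T_N}                    % measured speedup
    S(N) \le \frac{1}{(1 - p) + p/N}          % Amdahl's law (upper bound on speedup)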

Module 02: Developing Software for HPC Server 2008 R2 – Designs, Technologies, and Challenges

This module presents a variety of technologies, designs, and challenges in developing HPC applications.

Lessons
  • The HPC technology landscape — multi-threading, GPUs, SOA, MPI, parametric sweep, …
  • Problem decomposition, communication, and consolidation
  • Common forms of parallelism and problem decompositions
  • Common challenges in HPC: I/O, load balancing, communication
  • HPC = parallelism + communication
Lab 02: Using Parametric Sweep to Solve a Real-world HPC Problem
  • Embarrassingly parallel problems
  • Parametric sweep as a simple but effective design technique
  • Parametric sweep in HPCS 2008 R2 (see the sketch at the end of this module)
After completing this module, students will be able to:
  • Evaluate the impact of different HPC technologies for software development.
  • Identify common forms of parallelism and problem decompositions.
  • Identify common challenge areas of HPC apps.
  • Apply parametric sweep in the design of cluster-wide HPC apps.
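
For illustration only (not part of the course labs), here is a minimal sketch of a parametric sweep worker in C++. It assumes the sweep index arrives as the first command-line argument, as it does when the task command line uses the sweep placeholder; the work and the output file naming are purely hypothetical.

    // sweepworker.cpp -- hypothetical parametric sweep worker (sketch).
    // Each task instance receives one sweep index and processes its own slice.
    #include <cstdlib>
    #include <fstream>
    #include <iostream>
    #include <sstream>

    int main(int argc, char* argv[])
    {
        if (argc < 2)
        {
            std::cerr << "usage: sweepworker <sweep-index>\n";
            return 1;
        }
        int index = std::atoi(argv[1]);   // sweep index substituted by the scheduler

        // Hypothetical per-index work: each instance writes its own result file,
        // so instances never contend for the same output.
        std::ostringstream name;
        name << "result_" << index << ".txt";
        std::ofstream out(name.str().c_str());
        out << "result for sweep index " << index << ": " << index * index << "\n";
        return 0;
    }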

Module 03: Understanding HPC Server 2008 R2: the Job Scheduler

This module introduces the heart of HPC Server 2008 R2, the Job Scheduler.

Lessons
  • Jobs vs. Tasks
  • Pre and post tasks
  • Important job and task characteristics
  • Job scheduler state transitions
  • How jobs are scheduled onto resources
  • Policies that impact job scheduling — priority, preemption, job size, backfill, and more
  • Job fault tolerance
Lab 03: The HPCS 2008 R2 Job Scheduler
  • The job scheduler’s default behavior
  • Pre and post tasks
  • The impact of job size
  • Experimenting with job priorities, preemption policies, and dynamic growth
  • Job fault tolerance
After completing this module, students will be able to:
  • Design applications that interact properly with the job scheduler.
  • Submit jobs appropriately for various types of HPC applications.
  • Take advantage of pre and post tasks.
  • Design fault tolerant HPC applications.

Module 04: Multi-core and Multi-threading for HPC

This module presents multi-core and multi-threading as a means of HPC.

Lessons
  • An overview of multi-core processors and multi-threaded apps
  • VC++ techniques: Windows API, OpenMP, PPL (see the sketch at the end of this module)
  • C# techniques: Thread class, TPL
  • Multi-threading in Visual Studio 2010
  • Scheduling multi-threaded apps on HPCS 2008 R2
Lab 04: Multi-threading for Performance
  • Multi-threading in VC++ with OpenMP and the PPL
  • Multi-threading in C# with the Thread class and the TPL
  • Running multi-threaded apps under HPCS 2008 R2
After completing this module, students will be able to:
  • Write multi-threaded apps in VC++.
  • Write multi-threaded apps in C#.
  • Develop multi-threaded apps in Visual Studio 2010.
  • Properly schedule and run multi-threaded apps under HPC Server 2008 R2.
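
The following sketch, not taken from the course labs, shows the flavor of two of the VC++ techniques named above: the same loop parallelized once with OpenMP (requires compiling with /openmp) and once with the PPL's parallel_for from Visual Studio 2010's ppl.h.

    // threads.cpp -- minimal multi-threading sketch (OpenMP and PPL).
    #include <ppl.h>        // Parallel Patterns Library (Visual Studio 2010)
    #include <cmath>
    #include <vector>

    int main()
    {
        const int n = 1000000;
        std::vector<double> a(n), b(n);

        // OpenMP: the loop iterations are split across the available cores.
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            a[i] = std::sqrt(static_cast<double>(i));

        // PPL: the lambda is invoked in parallel for each index in [0, n).
        Concurrency::parallel_for(0, n, [&](int i)
        {
            b[i] = std::sqrt(static_cast<double>(i));
        });
        return 0;
    }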

Module 05: Programming HPC Server 2008 R2

This module demonstrates a variety of techniques for programmatically interacting with HPC Server 2008 R2.

Lessons
  • Scripting — the DOS command shell and PowerShell
  • Programming via the HPCS 2008 R2 Scheduling API
  • HPCS 2008 R2 job and task environment variables (see the sketch at the end of this module)
  • Showing job progress
  • Growing jobs dynamically
  • Implementing job fault tolerance
Lab 05: Programmatic Access via the HPCS 2008 R2 Scheduling API
  • Submitting jobs programmatically
  • Deploying data and applications
  • Showing job progress
  • Harvesting and displaying results
After completing this module, students will be able to:
  • Script against HPC Server 2008 R2.
  • Program against HPC Server 2008 R2.
  • Build client-side HPC front-ends for push-button execution on HPC Server 2008 R2 clusters.
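
As a small taste of the environment-variable lesson above, the sketch below reads two of the per-task variables HPC Server sets for a running task; the variable names (CCP_JOBID, CCP_TASKID) are assumed here and should be checked against the product documentation for your cluster.

    // taskinfo.cpp -- sketch: identify the job and task this process belongs to.
    #include <cstdlib>
    #include <iostream>

    int main()
    {
        // Assumed per-task environment variables set by HPC Server 2008 R2.
        const char* jobId  = std::getenv("CCP_JOBID");
        const char* taskId = std::getenv("CCP_TASKID");

        std::cout << "job "   << (jobId  ? jobId  : "<not running under HPC>")
                  << ", task " << (taskId ? taskId : "<not running under HPC>")
                  << std::endl;
        return 0;
    }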

Module 06: Cluster-wide HPC with MPI

This module motivates the need for MPI (Message-Passing Interface), explains the underlying architecture, and presents the basics of message-passing with MPI_Send and MPI_Recv.

Lessons
  • Why MPI?
  • The architecture of MPI apps – SPMD
  • MPI and HPCS 2008 R2: mpiexec, shared memory optimizations, and network protocols
  • The semantics of MPI_Send and MPI_Recv (see the example at the end of this module)
  • Submitting MPI-based workloads
Lab 06: Introduction to MPI
  • Running MPI apps under HPCS 2008 R2
  • Configuring Visual Studio 2010 for MPI development
  • MPI_Send
  • MPI_Recv
After completing this module, students will be able to:
  • Understand the use-cases for MPI.
  • Design and implement basic MPI applications.
  • Run MPI apps on an HPCS 2008 R2 cluster.
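
To make the MPI_Send / MPI_Recv semantics concrete, here is a minimal sketch (independent of the course labs) in which rank 0 sends a single integer to rank 1; run it under mpiexec with at least two ranks.

    // pingpong.cpp -- minimal MPI_Send / MPI_Recv sketch.
    #include <mpi.h>
    #include <iostream>

    int main(int argc, char* argv[])
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0)
        {
            int value = 42;
            // Blocking send: returns once the send buffer may safely be reused.
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        }
        else if (rank == 1)
        {
            int value = 0;
            MPI_Status status;
            // Blocking receive: returns once a matching message has arrived.
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            std::cout << "rank 1 received " << value << std::endl;
        }

        MPI_Finalize();
        return 0;
    }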

Module 07: Data Parallelism with MPI’s Collective Operations

This module dives deeper into MPI and presents MPI’s powerful support for data parallelism.

Lessons
  • Data parallelism – one of the most common use-cases for MPI
  • MPI’s collective operations: Broadcast, Scatter, Gather, Reduce, and more (see the example at the end of this module)
  • Common pitfalls to avoid
Lab 07: MPI’s Collective Operations
  • Broadcast
  • Scatter
  • Gather
  • Reduce
  • Writing your own reduction operator
After completing this module, students will be able to:
  • Apply MPI in solving data parallel problems across the cluster.
  • Use MPI’s collective operations to develop more efficient, readable, and reliable MPI applications.
  • Avoid the most common pitfalls in MPI.
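
The sketch below, again independent of the course labs, strings the four collectives named above into one tiny program: the root broadcasts a parameter, scatters the data, each rank sums its slice, and Gather/Reduce bring the partial results back. It assumes the data divides evenly among the ranks.

    // collectives.cpp -- sketch of MPI's collective operations.
    #include <mpi.h>
    #include <iostream>
    #include <vector>

    int main(int argc, char* argv[])
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int perRank = 4;                 // assumes an evenly divisible workload
        std::vector<double> all;               // full data set, root only
        if (rank == 0)
            for (int i = 0; i < perRank * size; ++i)
                all.push_back(i);

        // Broadcast a parameter (here just for illustration) from the root to everyone.
        int chunk = perRank;
        MPI_Bcast(&chunk, 1, MPI_INT, 0, MPI_COMM_WORLD);

        // Scatter: each rank receives its own slice of the root's data.
        std::vector<double> mine(chunk);
        MPI_Scatter(rank == 0 ? &all[0] : NULL, chunk, MPI_DOUBLE,
                    &mine[0], chunk, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        double localSum = 0.0;
        for (int i = 0; i < chunk; ++i)
            localSum += mine[i];               // each rank works on its slice

        // Gather: collect every rank's partial sum into an array on the root.
        std::vector<double> partials(rank == 0 ? size : 0);
        MPI_Gather(&localSum, 1, MPI_DOUBLE,
                   rank == 0 ? &partials[0] : NULL, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        // Reduce: combine the partial sums into a single total on the root.
        double totalSum = 0.0;
        MPI_Reduce(&localSum, &totalSum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            std::cout << "total = " << totalSum << std::endl;

        MPI_Finalize();
        return 0;
    }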

Module 08: MPI Application Design

This module discusses many of the design aspects of MPI applications – safety, non-blocking operations, RMA, and the use of managed code.

Lessons
  • MPI safety and the avoidance of deadlock
  • Non-blocking operations (MPI_Isend, MPI_Irecv) for hiding communication latencies (see the example at the end of this module)
  • Remote memory access (RMA): sharing memory without explicit sends and receives
  • MPI.NET – developing MPI apps in managed code
Lab 08: MPI Application Design
  • MPI safety == avoiding deadlock
  • Non-blocking operations for hiding communication latencies
  • Remote memory access: sharing memory without explicit sends and receives
  • MPI.NET – developing MPI apps in managed code
After completing this module, students will be able to:
  • Avoid and detect deadlock in MPI applications.
  • Use MPI’s non-blocking operations (MPI_Isend, MPI_Irecv, etc.) to hide communication latencies.
  • Apply remote memory access (RMA) to share memory without explicit sends and receives.
  • Develop MPI applications in managed code with MPI.NET.
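
Here is a minimal sketch of the non-blocking pattern discussed in this module: each rank exchanges a value with its neighbors by posting MPI_Irecv and MPI_Isend up front and waiting only afterwards, which avoids the classic mutual-send deadlock and leaves room to overlap computation with communication.

    // nonblocking.cpp -- sketch: neighbor exchange with MPI_Isend / MPI_Irecv.
    #include <mpi.h>
    #include <iostream>

    int main(int argc, char* argv[])
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int right = (rank + 1) % size;          // neighbor to send to
        int left  = (rank + size - 1) % size;   // neighbor to receive from

        int sendVal = rank, recvVal = -1;
        MPI_Request requests[2];

        // Post both operations up front: neither call blocks, so no rank waits on another.
        MPI_Irecv(&recvVal, 1, MPI_INT, left,  0, MPI_COMM_WORLD, &requests[0]);
        MPI_Isend(&sendVal, 1, MPI_INT, right, 0, MPI_COMM_WORLD, &requests[1]);

        // ... useful computation could overlap with the communication here ...

        MPI_Waitall(2, requests, MPI_STATUSES_IGNORE);
        std::cout << "rank " << rank << " received " << recvVal
                  << " from rank " << left << std::endl;

        MPI_Finalize();
        return 0;
    }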

Module 09: Working with MPI

This module presents tools and techniques for debugging, tracing, and performance tuning MPI applications.

Lessons
  • Local MPI debugging with Visual Studio 2010’s cluster debugger
  • Remote MPI debugging on an HPCS 2008 R2 cluster
  • MPI tracing with mpiexec, Jumpshot, and Vampir
  • Techniques for MPI performance tuning
Lab 09: MPI Debugging, Tracing, and Performance Tuning
  • Local and remote debugging with Visual Studio 2010’s cluster debugger
  • MPI tracing with mpiexec
  • Viewing a trace with Jumpshot and Vampir
  • Applying some performance tuning techniques
After completing this module, students will be able to:
  • Debug MPI applications on their local workstation.
  • Debug MPI applications remotely on an HPCS 2008 R2 cluster.
  • Trace MPI applications using mpiexec.
  • View MPI traces using Jumpshot and Vampir.
  • Apply some basic MPI performance tuning techniques.

Module 10: An Introduction to SOA for HPC

This module introduces SOA in the realm of HPC and HPC Server 2008 R2.

Lessons
  • Motivation – why SOA in HPC?
  • Client and server-side components
  • The architecture of SOA-based apps in HPCS 2008 R2
  • The SOA execution model in HPCS 2008 R2
Lab 10: An Intro to SOA for HPC
  • Running and observing an existing SOA app
  • Modifying and redeploying the service
  • Modifying and running the client
After completing this module, students will be able to:
  • Understand the use-cases for SOA.
  • Understand the pros and cons of SOA for HPC.
  • Understand the architecture and execution model of SOA-based HPC apps running under HPC Server 2008 R2.

Module 11: Creating High-Performance SOA Apps

This module explains in detail the 5 steps for creating SOA apps on HPC Server 2008 R2.

Lessons
  • The SOA execution model in more detail
  • Configuring an HPCS 2008 R2 cluster for SOA
  • Encapsulating your computation as a WCF service
  • Configuring and deploying your WCF service to the cluster
  • Connecting to the service from the client
  • Calling the service from the client
  • Various SOA design strategies: connected, durable, private vs. shared sessions, and more
Lab 11: Creating SOA-based HPC Apps
  • Turning a computation into a WCF service
  • Configuring a WCF service for execution with HPCS 2008 R2
  • Deploying a WCF service on an HPCS 2008 R2 cluster
  • Building a client to connect and call an HPC-based WCF service
After completing this module, students will be able to:
  • Design client and server-side SOA components for HPC Server 2008 R2.
  • Build, configure and deploy an HPC-based WCF service.
  • Build a client-side app to connect and call an HPC-based WCF service.
  • Understand the pros and cons of available SOA design strategies.

Module 12: Working with SOA

This module presents tools and techniques for debugging, tracing, and performance tuning SOA applications.

Lessons
  • SOA debugging with Visual Studio 2010’s cluster debugger
  • Techniques for tracing SOA apps: Windows event log, WCF tracing, and more
  • Techniques for SOA performance tuning
  • Designs and configurations for increasing SOA performance
Lab 12: SOA Debugging, Tracing, and Performance Tuning
  • SOA debugging with Visual Studio 2010’s cluster debugger
  • SOA tracing via Windows event logs and WCF tracing
  • Applying some performance tuning techniques
After completing this module, students will be able to:
  • Design SOA applications for better performance.
  • Debug SOA applications on an HPCS 2008 R2 cluster.
  • Trace SOA applications.
  • Apply some basic SOA performance tuning techniques.

Module 13: HPC Services for Excel 2010

This module explains how to use HPC Server 2008 R2 and Excel 2010 for the parallel execution of spreadsheets across a cluster.

Lessons
  • Motivation – why cluster-based execution of spreadsheets?
  • Execution models: the three techniques
  • Workbook parallelization using an Excel-based VBA macro framework
  • Workbook parallelization using cluster UDFs
  • Workbook parallelization using the HPCS 2008 R2 API
  • The pros and cons of each technique, and when to apply each
Lab 13: HPC Services for Excel 2010
  • Parallelizing an Excel workbook using the VBA macro framework
  • Parallelizing an Excel workbook using a cluster UDF
After completing this module, students will be able to:
  • Understand the pros and cons of executing Excel workloads in parallel.
  • Decide which of the three techniques to use for Excel workbook parallelization.
  • Design and implement Excel-based apps around each of the techniques.

Module 14: Designing HPC Apps When Resources Fall Outside the Cluster

This module explains how to design HPC applications that take advantage of compute resources outside the cluster, from off-cluster Windows 7 workstations to off-premise, Azure-based machines in the cloud.

Lessons
  • Motivation – why off-cluster and off-premise?
  • Off-cluster but on-premise: Windows 7 workstations
  • Off-premise: Azure-based machines and clusters in the cloud
  • HPCS 2008 R2 execution model for off-cluster and off-premise resources
  • Designing HPC apps for off-cluster and off-premise resources
After completing this module, students will be able to:
  • Understand the pros and cons of off-cluster and off-premise resources.
  • Understand when it makes sense to take advantage of such resources, and when it does not.
  • Design and build HPC apps that take advantage of off-cluster and off-premise resources.