This lesson is in the early stages of development (Alpha version)

Running HemeLB on HPC systems

HemeLB is a 3D blood flow simulation code based on the lattice Boltzmann method. It is an open-source code built using C++ and MPI, and it has demonstrated excellent scaling performance on some of the largest and fastest supercomputers on the planet. One particular challenge of simulating blood vessels is their characteristically sparse geometry: within the bounding box of a given domain, perhaps 1% (and often much less) actually consists of fluid that you are interested in studying. For example, a vessel of 1 mm radius winding 100 mm through a 50 mm bounding cube fills only about 0.25% of that cube. During its development, HemeLB has been specifically optimised to study such domains efficiently. The full-feature version of HemeLB can be found here. For this lesson, however, we recommend using the HemePure example, a version of HemeLB with further optimisations for scalable simulation on CPU-based machines.

This workshop is specifically aimed at running HemeLB on an HPC system. You may be running HemeLB on a desktop, a laptop, or already on an HPC system; in any of these cases, ineffective use of HemeLB can lead to jobs running for longer than necessary. Being able to configure HemeLB effectively on an HPC system can speed up simulations significantly. This workshop will look to address these issues.
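To give a flavour of what running HemeLB on an HPC system looks like in practice, below is a minimal sketch of a job script for a machine using the SLURM scheduler. The module names, node and core counts, executable name and input file are all placeholders that will differ on your system; check your site's documentation and the HemeLB documentation for the exact options.

```bash
#!/bin/bash
#SBATCH --job-name=hemelb-run
#SBATCH --nodes=2               # number of compute nodes (placeholder)
#SBATCH --ntasks-per-node=128   # MPI ranks per node; match your node's core count
#SBATCH --time=01:00:00         # walltime limit for the job

# Load the toolchain HemeLB was built with (module names are site-specific)
module load gcc openmpi

# Launch one MPI rank per requested task; the executable name and the
# -in/-out options follow the pattern in the HemeLB documentation, but
# verify them against your own build
srun ./hemepure -in input.xml -out results
```

The key point, which later episodes explore, is that choices such as the number of nodes and ranks per node have a large effect on time to solution and on how many CPU hours the job consumes.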

Some questions that you may ask yourself are listed under each episode in the schedule below. If you have asked any of these questions, then you might be a good candidate for taking this course.

An HPC system is a complex computing platform that usually has several hardware components. Terms that might be familiar are CPU, RAM and GPU since you can find these in your own laptop or server. There are other commonly used terms such as “shared memory”, “distributed computing”, “accelerator”, “interconnect” and “high performance storage” that may be a little less familiar. In this course we will try to cover the subset of these that are relevant to your use case with HemeLB.
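If some of these terms are unfamiliar, one quick way to make them concrete is to inspect the hardware of the machine you are logged into. The commands below are standard Linux utilities (the last one only works on nodes with NVIDIA GPUs and drivers installed); on many HPC systems the login nodes differ from the compute nodes, so run them inside an interactive job to see what your simulations will actually use.

```bash
# CPU: sockets, cores per socket, threads per core
lscpu

# RAM: total and available memory, in human-readable units
free -h

# GPU (NVIDIA systems only): model, memory and utilisation of any accelerators
nvidia-smi
```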

On an HPC system with a variety of hardware components, software performance will vary depending on which components the code uses and how well it is optimised for them. There are usually no such complications on a standard desktop or laptop; running on an HPC system is very different.

Note

  • This is the draft HPC Carpentry release. Comments and feedback are welcome.

Prerequisites

  • Basic experience with working on an HPC system is required. If you are new to these types of systems, we recommend going through the Introduction to High-Performance Computing lesson from HPC Carpentry.
  • You should have some familiarity with the concepts behind MPI and OpenMP, as both are relevant to the benchmarking and scalability studies in this lesson.
  • You should be familiar with HemeLB: how to install it and how to run a basic simulation. For running on HPC systems and submitting a bash script, you can refer to the HemeLB documentation.

Schedule

Setup: Download files required for the lesson

00:00 1. Why bother with performance?
  • What is software performance?
  • Why is software performance important?
  • How can performance be measured?
  • What is meant by FLOPs, walltime and CPU hours?
  • How can performance be enhanced?
  • How can I use compute resources effectively?

00:15 2. Connecting performance to hardware
  • How can I use the hardware I have access to?
  • What is the difference between OpenMP and MPI?
  • How can I use GPUs and/or multiple nodes?

00:35 3. Benchmarking and Scaling
  • What is benchmarking?
  • How do I do a benchmark?
  • What is scaling?
  • How do I perform a scaling analysis?

01:20 4. Bottlenecks in HemeLB
  • How can I identify the main bottlenecks in HemeLB?
  • How do I come up with a strategy for minimising the impact of bottlenecks on my simulation?
  • What is load balancing?

02:45 5. Accelerating HemeLB
  • What are the various options to accelerate HemeLB?
  • What accelerator tools are compatible with which hardware?

04:45 6. GPUs with HemeLB
  • Why do we need GPUs?
  • What is CUDA programming?
  • How is GPU code profiled?
  • How do I use the HemeLB library on a GPU?
  • What is the performance like?
  • What is a 1-to-1, 2-to-1 relationship in GPU programming?

05:15 Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.