This lesson is in the early stages of development (Alpha version)

Running HemeLB on HPC systems

Key Points

Why bother with performance?
  • Software performance is the effective use of computational resources to reduce runtime

  • Understanding performance is the best way of utilising your HPC resources efficiently

  • Performance can be measured in floating-point operations per second (FLOPS), walltime, and CPU hours

  • There are many ways of enhancing performance, and there is no single ‘correct’ way. The performance of any software will vary depending on the tasks you want it to undertake.
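
As a rough illustration of the CPU-hours metric mentioned above, the following minimal sketch converts a job's walltime and core count into core-hours. The function name `cpu_hours` and the job figures are hypothetical and do not come from HemeLB or any scheduler; accounting rules differ between HPC centres, so treat this as an assumption-laden estimate.

```python
def cpu_hours(walltime_seconds: float, cores: int) -> float:
    """Estimate core-hours: walltime (converted to hours) times cores reserved.

    Assumes every reserved core is charged for the full walltime, which is
    a common accounting model but not a universal one.
    """
    if walltime_seconds < 0 or cores < 1:
        raise ValueError("walltime must be non-negative and cores must be >= 1")
    return walltime_seconds / 3600.0 * cores


# Hypothetical job: 30 minutes of walltime on 128 cores.
print(cpu_hours(1800, 128))  # 64.0
```

Halving the walltime by doubling the core count leaves the CPU-hour cost unchanged, which is why walltime and CPU hours are worth tracking together.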

Connecting performance to hardware
  • OpenMP works within a single node, whereas MPI can work across multiple nodes

Benchmarking and Scaling
  • Benchmarking is a way of assessing the performance of a program or set of programs

  • Strong scaling indicates how quickly a problem of fixed size can be solved with different numbers of cores on a given machine
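
A strong-scaling benchmark is usually summarised as speedup and parallel efficiency relative to the smallest run. The sketch below assumes you have recorded walltimes for the same fixed-size problem at several core counts; the function name `strong_scaling` and the timings are invented for illustration and are not HemeLB measurements.

```python
def strong_scaling(timings: dict[int, float]) -> list[tuple[int, float, float]]:
    """Return (cores, speedup, efficiency) rows for a fixed-size problem.

    `timings` maps core count to measured walltime in seconds. The run with
    the fewest cores is used as the baseline.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        # Ideal speedup equals the ratio of core counts; efficiency is the
        # fraction of that ideal actually achieved.
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, speedup, efficiency))
    return rows


# Hypothetical walltimes (seconds) for one fixed-size simulation.
measured = {1: 100.0, 2: 50.0, 4: 40.0}
for cores, speedup, efficiency in strong_scaling(measured):
    print(f"{cores:>3} cores  speedup {speedup:.2f}  efficiency {efficiency:.3f}")
```

In this made-up data set the efficiency drops from 1.0 at two cores to 0.625 at four, the kind of fall-off that suggests adding more cores is no longer paying for itself.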

Bottlenecks in HemeLB
  • The best way to identify bottlenecks is to run different benchmarks on a smaller system and compare the results with those from a representative system

  • Effective load balancing means distributing an equal amount of work to each process.

  • Evaluate both simulation time and overall wall time to determine if improved load balance leads to more efficient performance.

  • The choice of simulation parameters affects the accuracy and stability of a simulation as well as the time needed to complete it. This is especially true for solvers that use explicit techniques, such as HemeLB.

  • For all applications, writing data to file can be a time-consuming activity. Carefully consider what data you need for post-processing to minimise the time spent writing output.
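
One simple way to put a number on the load-balance point above is the ratio of the slowest process's work time to the mean work time across processes. This is a generic metric and not a quantity HemeLB is known to report; the function name `load_imbalance` and the per-rank timings are assumptions made for this sketch.

```python
def load_imbalance(per_rank_seconds: list[float]) -> float:
    """Return max/mean of per-process work times.

    A value of 1.0 means perfectly even work; larger values mean the
    slowest process is holding the others back.
    """
    if not per_rank_seconds:
        raise ValueError("need at least one per-rank timing")
    mean = sum(per_rank_seconds) / len(per_rank_seconds)
    return max(per_rank_seconds) / mean


# Hypothetical compute times (seconds) gathered from four MPI ranks.
print(load_imbalance([10.0, 10.0, 10.0, 10.0]))  # 1.0
print(load_imbalance([8.0, 8.0, 8.0, 16.0]))     # 1.6
```

A lower ratio only helps if overall wall time also falls, so compare it against the total runtime, including any time spent computing the better decomposition.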

Accelerating HemeLB
  • Efficiently vectorised instructions can speed up simulations compared to the default options; however, they may not be available on all CPUs

  • Simulation monitoring in HemeLB can take a significant amount of time. If a set of simulation parameters is known to be stable and acceptably accurate, disabling this feature will reduce simulation time. We recommend keeping it enabled to catch unstable simulations when investigating new geometries or simulation parameters.

GPUs with HemeLB
  • Before starting a GPU run, you need to know the capabilities of your host and device, and whether you can use a CUDA-aware MPI runtime
