This week Nick Johnson presented work from the Adept project at the EASC 2015 conference in Edinburgh, Scotland. The abstract for the presentation is here:
Energy measurement at the Exascale
Nick Johnson, James Perry and Michele Weiland
When exascale is discussed, the figures of 1 exaflop and 20 MW are usually cited as the targets to aim for. Runtime performance measurement is well established: the tools available on HPC systems can provide a wealth of accurate information, from cache misses to peak flop rate, for scientific applications. By contrast, we find that energy monitoring is often given little consideration and, where it is present, it is typically at a much lower resolution than runtime measurement.
In this paper we examine the energy and power consumption of selected codes which may scale to the Exascale. These are SEISMO, a simple seismic wave propagation modelling code; and Ludwig, a parallel Lattice-Boltzmann code for fluid simulation. The algorithms and data access patterns used in these codes, as well as the strategies used to parallelise them, are representative of a large range of HPC codes, so any results gained from studying them should have a wider relevance.
Using custom-designed measurement hardware, we instrument a COTS x86 desktop PC with a specification similar to that used in modern HPC nodes. By monitoring the power rails of the CPU, DRAM, disk and network interface (via the PCIe bus), we can build a high-resolution power profile of a scientific code, one much more detailed than we would expect from profiling directly on an HPC node. Extrapolation and inference then allow us to predict the component energy and power consumption of the same code running at an exascale node-count. We use a custom measurement system, based around a Xilinx Zynq board with appropriate sensors for each power rail, to perform out-of-band measurements. Sampling at 20 MHz with a resolution of 16 bits per measurement (both voltage and current), we can achieve a high-quality set of measurements for any rail of interest with no overhead on the system-under-test. Correlation of runtime and power measurements is achieved by post-processing the data to gain temporal alignment.
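The post-processing step described above amounts to two operations: integrating per-rail voltage/current samples into energy, and finding the sample offset that temporally aligns the power trace with the runtime trace. A minimal sketch of both, using NumPy and cross-correlation (the function names and the use of a correlation peak for alignment are illustrative assumptions, not the project's actual pipeline):

```python
import numpy as np

def rail_energy(voltage, current, sample_rate_hz):
    """Energy in joules on one rail, from synchronous voltage (V)
    and current (A) samples taken at sample_rate_hz."""
    power = voltage * current                      # instantaneous power, watts
    return np.trapz(power, dx=1.0 / sample_rate_hz)

def align_lag(reference, trace):
    """Sample offset by which `trace` lags `reference`, found as the
    peak of the full (mean-removed) cross-correlation."""
    corr = np.correlate(trace - trace.mean(),
                        reference - reference.mean(), mode="full")
    return int(corr.argmax()) - (len(reference) - 1)

# Example: a constant 1 V, 2 A load sampled at 1 kHz for ~1 s.
energy_j = rail_energy(np.ones(1000), 2.0 * np.ones(1000), 1000)
```

Once the lag is known, the power samples can be shifted so that features in the power profile line up with timestamped events from the runtime profiler.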
The novelty of our system is that, by using our energy-scaling ratio, we can compare the performance of the application in terms of energy to solution and time to solution as the core count scales. The goal then becomes one of minimising both energy to solution and time to solution given the constraints of the target system. We present results for the SEISMO and Ludwig applications gathered from our system using both our own hardware and, for the CPU and DRAM, hardware counters such as RAPL. This allows us to validate our results against RAPL, which is known to be accurate for Intel CPUs, and to estimate the overhead of using an in-band method like RAPL, which can then be added to an energy model for the code.
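On Linux, RAPL readings of the kind used for this validation are exposed through the powercap sysfs interface as monotonically increasing microjoule counters (e.g. `/sys/class/powercap/intel-rapl:0/energy_uj`), which wrap around at a platform-specific maximum. A small sketch of the wraparound-safe energy-delta calculation such in-band measurements require (the function name is ours; the sysfs paths in the comments are the standard powercap ones):

```python
def rapl_energy_delta(e_start_uj, e_end_uj, max_range_uj):
    """Joules consumed between two readings of a RAPL energy_uj
    counter, accounting for a single counter wraparound.

    e_start_uj, e_end_uj: values read from e.g.
        /sys/class/powercap/intel-rapl:0/energy_uj
    max_range_uj: the wraparound point, read from
        /sys/class/powercap/intel-rapl:0/max_energy_range_uj
    """
    delta_uj = e_end_uj - e_start_uj
    if delta_uj < 0:                 # counter wrapped between the reads
        delta_uj += max_range_uj
    return delta_uj / 1e6            # microjoules -> joules
```

Because such reads are performed by the system under test itself, they perturb the measurement slightly; comparing them against the out-of-band hardware is what lets that in-band overhead be estimated.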
You can view the slides from the presentation here.