The ARCHER Service is now closed and has been superseded by ARCHER2.

ARCHER KNL Performance Reports

This page contains a summary of the findings from the KNL Performance Reports, together with links to the individual reports.

  • Summary of results and advice
  • Individual KNL Performance Reports

The KNL Performance Reports were written by ARCHER users and the ARCHER CSE Service at EPCC. They compare the performance of standard ARCHER compute nodes with ARCHER KNL nodes for a variety of applications and benchmarks.

Summary of results and advice

Luis Cebamanos, ARCHER CSE Team, EPCC

The recent addition of the ARCHER Knights Landing (KNL) testing and development platform opens new opportunities for optimising applications for one of the most advanced manycore devices available today. Here we present and summarise the performance evaluation of a group of applications run on the ARCHER-Xeon and ARCHER-KNL systems. This group of applications comprises CFD codes (ICOMPACT3D, COSA, SENGA2, OpenSBLI and HLBM), molecular dynamics (MD) codes (LAMMPS, NAMD and CP2K), forward dynamics modelling codes (GaitSym), statistics software (R), and plasma modelling codes (GS2).

Objectives

The main objectives of this study are:

  • Analyse the performance of different applications run on both the ARCHER-Xeon and ARCHER-KNL systems.
  • Compare the results obtained for different user test cases.
  • Provide configuration and optimisation advice based on the performance results obtained here.

Hyperthreading

Although it is not a general rule, using multiple hyperthreads on the ARCHER-KNL system often gives a performance benefit. We have seen this effect with SENGA2, OpenSBLI, LAMMPS and NAMD. In other applications, such as COSA and CP2K, hyperthreading has not given any boost in performance. Although hyperthreads are not always the best option, their use does not seem to degrade performance. When hyperthreading does improve an application's performance, the best choice is normally 2 or 4 hyperthreads per core. The choice between 2 and 4 mostly depends on the user test case employed, although we have also seen it vary with the number of nodes, as in NAMD and LAMMPS.
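
To put the hyperthread settings in concrete terms, the short Python sketch below works through the process-placement arithmetic for a single ARCHER KNL node. It is only an illustration, not taken from the reports: the 64-core node size and the 4-hyperthread limit are the published KNL node parameters, and the helper function name is hypothetical.

    # Process-placement arithmetic for one ARCHER KNL node (illustrative sketch).
    CORES_PER_KNL_NODE = 64       # Xeon Phi 7210: 64 physical cores per node
    MAX_HYPERTHREADS = 4          # up to 4 hardware threads per core

    def ranks_per_node(hyperthreads_per_core, threads_per_rank=1):
        """How many MPI ranks fit on one node for a given hyperthread
        and OpenMP-thread setting (hypothetical helper)."""
        if hyperthreads_per_core not in (1, 2, 4):
            raise ValueError("KNL supports 1, 2 or 4 hyperthreads per core")
        hardware_threads = CORES_PER_KNL_NODE * hyperthreads_per_core
        return hardware_threads // threads_per_rank

    # Pure-MPI runs with 1, 2 and 4 hyperthreads per core:
    for ht in (1, 2, 4):
        print(f"{ht} hyperthread(s)/core -> {ranks_per_node(ht)} MPI ranks per node")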

Hybridisation

Hybrid codes are more likely to perform well on the ARCHER-KNL system. With the exception of CP2K, the codes that implement a hybrid model (MPI + threads) have been shown to perform better on the ARCHER-KNL system than on the Xeon system, particularly the CFD applications. We have seen this effect with LAMMPS, OpenSBLI and NAMD. The size of the benefit most likely depends on the user case, being greater for problems where fewer MPI processes allow more memory per process. The right number of threads is also very application dependent, but traditional hybrid MPI+OpenMP codes seem to reach peak performance with 2 or 4 threads per process.
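
The memory argument can be made concrete with a simple sizing sketch. The numbers below assume 64 cores and 96 GB of DDR memory per ARCHER KNL node (the published node configuration); the helper name is hypothetical and the sketch is an illustration rather than anything taken from the reports.

    # DDR memory available per MPI rank as work moves from ranks to threads.
    CORES_PER_NODE = 64
    DDR_PER_NODE_GB = 96.0

    def memory_per_rank_gb(threads_per_rank):
        """Ranks per node and DDR per rank when the node is fully populated
        with MPI ranks each driving `threads_per_rank` OpenMP threads."""
        ranks = CORES_PER_NODE // threads_per_rank
        return ranks, DDR_PER_NODE_GB / ranks

    for threads in (1, 2, 4, 8):
        ranks, mem = memory_per_rank_gb(threads)
        print(f"{ranks:3d} ranks x {threads} threads -> {mem:.1f} GB DDR per rank")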

Energy consumption

The ARCHER-KNL system has been shown to consume considerably less energy than the ARCHER-Xeon system. With applications such as OpenSBLI and LAMMPS, roughly half the energy was consumed on the KNL system, largely as a result of a faster simulation. However, it would also be interesting to compare these figures with more recent hardware than the ARCHER Xeon Ivy Bridge processors.

Cache

A cache effect has been seen in several of the applications benchmarked here. This is the case for COSA and HLBM, where superlinear scaling has been achieved. In general, applications that use the cache on the KNL system effectively, or whose test-case data fits into the MCDRAM, see a significant performance benefit.
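
Whether a test case fits into MCDRAM is easy to estimate up front. The sketch below is a back-of-the-envelope check against the 16 GB of MCDRAM per node; the grid size and field count are hypothetical and only illustrate the arithmetic.

    # Does the working set of a structured-grid test case fit in MCDRAM?
    MCDRAM_PER_NODE_GB = 16.0
    BYTES_PER_DOUBLE = 8

    def working_set_gb(grid_points, fields_per_point):
        """Approximate working set: grid points x double-precision fields."""
        return grid_points * fields_per_point * BYTES_PER_DOUBLE / 1e9

    def fits_in_mcdram(grid_points, fields_per_point, nodes):
        per_node = working_set_gb(grid_points, fields_per_point) / nodes
        return per_node, per_node <= MCDRAM_PER_NODE_GB

    # Hypothetical 1024^3 grid with 20 double-precision fields per point:
    for nodes in (4, 16):
        per_node, fits = fits_in_mcdram(1024**3, 20, nodes)
        print(f"{nodes:2d} nodes: {per_node:5.1f} GB/node -> fits in MCDRAM: {fits}")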

Drop in performance

Although a considerable number of the tested applications showed a boost in performance on a single ARCHER-KNL node compared to a single Xeon node, a few have also shown the performance gains falling away as the number of nodes was increased. We have seen this in applications such as NAMD and SENGA2. A possible reason is that certain test cases do not provide enough computation per node to make use of the additional compute resource as the node count increases.
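
The strong-scaling arithmetic behind this explanation is straightforward, as the sketch below shows for a hypothetical fixed-size test case (the problem size is invented purely for illustration).

    # In a strong-scaling run the total work is fixed, so the work per core
    # shrinks as nodes are added until overheads dominate.
    CORES_PER_NODE = 64

    def points_per_core(total_grid_points, nodes):
        return total_grid_points / (nodes * CORES_PER_NODE)

    TOTAL_POINTS = 256**3   # hypothetical fixed problem size
    for nodes in (1, 4, 16, 64):
        print(f"{nodes:3d} nodes -> {points_per_core(TOTAL_POINTS, nodes):9.0f} "
              "grid points per core")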

Performance comparison

As previously indicated, most applications have shown some level of performance benefit running on the ARCHER-KNL system compared to the Xeon system. Having said that, this benefit is only present if the comparison is made node to node, i.e. reporting the performance figures for a given number of nodes. Core-to-core comparisons, on the other hand, always show the ARCHER-Xeon system as faster, by almost a factor of 2. The most likely reason is the difference in clock rate: 1.3 GHz for the KNL cores (or 1.1 GHz for floating-point operations) against 2.7 GHz for the Xeon cores.
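
A simple ratio calculation shows how the two ways of comparing can point in opposite directions. The clock rates are those quoted above; the core counts (64 per KNL node, 24 per ARCHER Xeon node) are the published node configurations, and treating throughput as simply cores times clock is a deliberate over-simplification.

    # Core-to-core versus node-to-node comparison (crude cores x GHz model).
    KNL_CORES, KNL_CLOCK_GHZ = 64, 1.3
    XEON_CORES, XEON_CLOCK_GHZ = 24, 2.7

    core_ratio = XEON_CLOCK_GHZ / KNL_CLOCK_GHZ
    node_ratio = (KNL_CORES * KNL_CLOCK_GHZ) / (XEON_CORES * XEON_CLOCK_GHZ)

    print(f"Per core: Xeon has {core_ratio:.1f}x the clock rate of KNL")
    print(f"Per node: KNL has {node_ratio:.1f}x the aggregate core-GHz of Xeon")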

Peak performance

Performance as a percentage of peak has been measured for the LAMMPS application, and it was found to be considerably higher on the ARCHER-Xeon system than on the ARCHER-KNL system. The most likely reason is that vectorisation was not fully exploited by the user cases employed. This highlights the importance of vectorisation on the KNL system in order to make the most of the computational performance on offer.
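
The sketch below indicates why un-vectorised code loses a much larger fraction of peak on KNL than on the Xeons. It assumes two AVX-512 FMA units per KNL core and 256-bit AVX without FMA on the Ivy Bridge Xeons, together with the clock rates quoted above; these are standard figures for the two microarchitectures rather than numbers taken from the reports.

    # Rough double-precision peak with and without vectorisation.
    def peak_gflops(cores, clock_ghz, flops_per_cycle):
        return cores * clock_ghz * flops_per_cycle

    # DP floating-point operations per cycle per core:
    KNL_VECTOR, KNL_SCALAR = 32, 2    # 2 VPUs x 8 DP lanes x FMA vs scalar FMA
    XEON_VECTOR, XEON_SCALAR = 8, 2   # 256-bit AVX add+mul vs scalar add+mul

    print(f"KNL node : {peak_gflops(64, 1.1, KNL_VECTOR):.0f} GF/s peak, "
          f"{peak_gflops(64, 1.1, KNL_SCALAR):.0f} GF/s without vectorisation")
    print(f"Xeon node: {peak_gflops(24, 2.7, XEON_VECTOR):.0f} GF/s peak, "
          f"{peak_gflops(24, 2.7, XEON_SCALAR):.0f} GF/s without vectorisation")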

KNL MCDRAM configuration mode

The majority of benchmarks used the quad_100 memory configuration, in which all 16 GB of MCDRAM is used as cache. The HLBM and COSA applications demonstrated that quad_0 nodes were around twice as slow as quad_100 nodes if the MCDRAM was not explicitly employed (using a method such as numactl) and the main (DDR) memory was used on its own.
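
The practical difference between the two modes is in how the executable is launched, as the illustrative sketch below suggests. It is not taken from the ARCHER documentation: in quad_0 (flat) mode the MCDRAM typically appears as a separate NUMA node that must be requested explicitly, for example with numactl, and the NUMA node number and binary name here are assumptions.

    # Building a launch command for the two MCDRAM configurations (illustrative).
    def launch_command(executable, mcdram_mode, mcdram_numa_node=1):
        if mcdram_mode == "quad_100":
            # Cache mode: MCDRAM acts as a transparent cache in front of DDR.
            return [executable]
        if mcdram_mode == "quad_0":
            # Flat mode: prefer the MCDRAM NUMA node, fall back to DDR if it fills.
            return ["numactl", f"--preferred={mcdram_numa_node}", executable]
        raise ValueError(f"unknown MCDRAM mode: {mcdram_mode}")

    print(launch_command("./my_app", "quad_100"))
    print(launch_command("./my_app", "quad_0"))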

Individual KNL Performance Reports

  • CASTEP (PDF), Gordon Gibb, ARCHER CSE Team, EPCC
  • COSA (PDF), Adrian Jackson, EPCC
  • CP2K (PDF), Fiona Reid, EPCC
  • CRYSTAL (PDF), Barry Searle, STFC
  • GaitSym (PDF), Bill Sellers, University of Manchester
  • GPAW (PDF), Martti Louhivuori, CSC, Finland
  • GS2 (PDF), David Dickinson, University of York
  • HBM (PDF), grid-based Navier-Stokes, George Barakos, Mark Woodgate, University of Glasgow
  • HLBM (PDF), lattice-Boltzmann, George Barakos, Mark Woodgate, University of Glasgow
  • Incompact3d (PDF), Sylvain Laizet, Imperial College, London
  • LAMMPS (PDF), Luis Cebamanos, ARCHER CSE Team, EPCC
  • LAMMPS (PDF), Kevin Stratford, ARCHER CSE Team, EPCC
  • NAMD (PDF), Andy Turner, ARCHER CSE Team, EPCC
  • OpenSBLI (PDF), Luis Cebamanos, ARCHER CSE Team, EPCC
  • OpenSBLI (PDF), Christian Jacobs, University of Southampton
  • Quantum Espresso (PDF), Paolo Di Cono, King's College London
  • R and SPRINT (PDF), Adrian Jackson, EPCC
  • SENGA2 (PDF), Neelofer Banglawala, ARCHER CSE Team, EPCC
  • UCNS3D (PDF), Panagiotis Tsoutsanis, Antonios Antoniadis, Cranfield University
