CLAIX (Cluster Aix-la-Chapelle)


The RWTH Compute Cluster, as part of the JARA-HPC partition, currently consists of two machine generations organized in four sections.

  • In total approximately 150 million core-h per call are available.
  • The minimum volume per proposal is 2.4 million core-h.
    Smaller proposals can be filed using a simplified procedure through the IT Center.
  • We currently do not expect any overbooking after the recent installation of new machine capacity.
  • We expect some 20-30 applications.


The following table summarizes the characteristics of the four compute sections.

Compute section: CLAIX-2016-MPI
Node characteristics: 2 Intel E5-2650 v4 “Broadwell” processors (2.2 GHz, 12 cores each); 24 cores per node; 128 GB main memory per node (~5 GB main memory per core)
Available resources per call: 50 million core-h

Compute section: CLAIX-2016-SMP
Node characteristics: 8 Intel E7-8860 v4 “Broadwell” processors (2.2 GHz, 18 cores each); 144 cores per node; 1 TB main memory per node (~7 GB main memory per core)
Available resources per call: 4 million core-h

Compute section: CLAIX-2018-MPI
Node characteristics: 2 Intel Xeon Platinum 8160 “Skylake” processors (2.1 GHz, 24 cores each); 48 cores per node; 192 GB main memory per node (~4 GB main memory per core)
Available resources per call: 95 million core-h

Compute section: CLAIX-2018-GPU
Node characteristics: 2 Intel Xeon Platinum 8160 “Skylake” processors (2.1 GHz, 24 cores each); 48 cores per node; 192 GB main memory per node (~4 GB main memory per core); plus 2 NVIDIA Tesla V100 GPUs per node coupled with NVLink, 16 GB HBM2 memory per GPU
Available resources per call: 4 million (host) core-h

We aim to optimize overall system throughput and performance while also fulfilling your project’s specific requirements. We therefore depend on your input about your programs’ characteristic behavior to decide which project is allocated to which section.
We invite you to apply for a PREP project here in order to get access to machines of all sections for experimentation and for preparing your JARA-HPC proposals.


Compute requirements
All JARA CLAIX projects will be assigned to either CLAIX-2016-MPI or CLAIX-2018-MPI, and jobs will be launched on one of these compute sections by default.
Of course, new machines typically perform better than old machines. We expect the per-core performance of the CLAIX-2018-MPI nodes to be 30%-50% higher than that of the CLAIX-2016-MPI nodes. To compensate for this difference in speed, the IT Center will increase the resources approved by the JARA-HPC commission by 50% if your project is assigned to CLAIX-2016-MPI.
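As a purely illustrative calculation (the 10 million core-h budget below is an invented example, not a quota), the uplift works like this:

```c
#include <stdio.h>

int main(void) {
    /* Invented example: assume the JARA-HPC commission approves 10 million core-h. */
    double approved_core_h = 10.0e6;

    /* If the project is placed on CLAIX-2016-MPI, the IT Center adds 50%
       to compensate for the lower per-core performance of the older nodes. */
    double granted_on_claix_2016 = approved_core_h * 1.5;

    printf("Approved budget:           %.1f million core-h\n", approved_core_h / 1e6);
    printf("Granted on CLAIX-2016-MPI: %.1f million core-h\n", granted_on_claix_2016 / 1e6);
    return 0;
}
```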

Besides the higher core count, the main differences between the processors of these two generations are organizational changes in the cache hierarchy (the Skylake processors implement a 2-D mesh to connect the L2 caches, and their L3 cache is non-inclusive: data residing in L2 is not replicated in L3) and improved vectorization capabilities (AVX-512).
Whether your application can profit from these new features largely depends on whether it benefits from vectorization, i.e. the use of SIMD instructions. (We point to recent workshops that have focused on vectorization.)
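As a minimal sketch of the kind of code that profits: a dependency-free, unit-stride loop such as the one below can be auto-vectorized, and with AVX-512 each vector instruction processes eight double-precision values at once. The compiler flags mentioned in the comment are common choices; please check the documentation of the compiler module you actually use.

```c
#include <stdio.h>

#define N 1024

/* A daxpy-style loop: no loop-carried dependencies and unit-stride memory
   access, a pattern compilers can map onto SIMD instructions.  With AVX-512
   enabled (e.g. icc -xCORE-AVX512 or gcc -O3 -march=skylake-avx512) one
   vector instruction handles 8 doubles at a time. */
void daxpy(int n, double a, const double *x, double *y) {
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

int main(void) {
    static double x[N], y[N];
    for (int i = 0; i < N; ++i) { x[i] = 1.0; y[i] = 2.0; }
    daxpy(N, 3.0, x, y);
    printf("y[0] = %.1f\n", y[0]);   /* expected: 5.0 */
    return 0;
}
```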

GPU requirements
A few nodes of the CLAIX-2018 installation are equipped with two NVIDIA Tesla V100 GPUs each. Apart from these GPUs, the CLAIX-2018-GPU nodes have the same properties as the nodes of the CLAIX-2018-MPI section.
Please be aware that accounting on a GPU node is based on the cores of its host processors. Consequently, occupying two GPUs for one hour is accounted like using 48 host cores for one hour. Take this into account when specifying, in your project application, the resources you plan to consume on these GPU nodes.
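A small, purely illustrative example of this accounting rule (the job duration and node count are invented values):

```c
#include <stdio.h>

int main(void) {
    /* A CLAIX-2018-GPU node has 2 GPUs and 48 host cores; a job that
       occupies both GPUs of a node is charged as if it used all 48 cores. */
    const int host_cores_per_node = 48;

    int    gpu_nodes = 1;     /* invented example: one full GPU node        */
    double hours     = 10.0;  /* invented example: 10 h of wall-clock time  */

    double charged_core_h = gpu_nodes * host_cores_per_node * hours;
    printf("Charged: %.0f host core-h for %.0f h on %d GPU node(s)\n",
           charged_core_h, hours, gpu_nodes);
    return 0;
}
```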

Communication requirements
The nodes of all sections are connected with the same generation of Intel Omni-Path fabric, so MPI performance will generally not differ much between sections. However, because of their higher core count, the newer nodes can host twice as many MPI processes per node and may therefore experience higher communication pressure.
The new cluster sections’ connection to the outside world for large data transfers is expected to be much better.

Storage requirements
All CLAIX machines share access to the same HOME and WORK storage servers.
For higher I/O demands in terms of bandwidth and volume, however, there are separate HPCWORK storage servers (Lustre) for the CLAIX-2016-* and CLAIX-2018-* sections.
The latter sections’ Lustre server is expected to provide much more bandwidth and has a higher capacity. So if your project’s applications have a high demand for (parallel) I/O, you should express your preference for the CLAIX-2018-* machines.
Furthermore, the nodes of the CLAIX-2018-* sections offer 480 GB of fast local SSD storage. For parallel applications running on multiple nodes, the SSDs of all involved nodes can be linked together to provide a fast, ad-hoc parallel temporary file system (BeeGFS On Demand, BeeOND).
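As a sketch only: once such an on-demand file system has been created for a job, applications write to its mount point like to any other directory. The environment variable name BEEOND_DIR used below is a placeholder assumption, not the actual name on CLAIX; please consult the CLAIX documentation for the real variable and mount point.

```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* Placeholder assumption: the batch system exports the mount point of the
       job's on-demand BeeGFS file system in an environment variable.  The
       name "BEEOND_DIR" is hypothetical -- check the CLAIX documentation. */
    const char *beeond = getenv("BEEOND_DIR");
    if (beeond == NULL) {
        fprintf(stderr, "BEEOND_DIR not set, falling back to /tmp\n");
        beeond = "/tmp";
    }

    char path[4096];
    snprintf(path, sizeof(path), "%s/scratch.dat", beeond);

    /* Temporary, job-local data goes to the fast SSD-backed file system ... */
    FILE *f = fopen(path, "w");
    if (f == NULL) { perror("fopen"); return 1; }
    fputs("intermediate results\n", f);
    fclose(f);

    /* ... but results to keep must be copied to WORK/HPCWORK before the job
       ends, because the BeeOND file system only lives as long as the job. */
    return 0;
}
```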

Large memory requirements
All CLAIX projects will also be able to launch jobs on the CLAIX-2016-SMP section, if there is a requirement for a large amount of main memory per node (between 128/192 GB and 1 TB) or a requirement for a large number of threads per (MPI) process (up to 144) which cannot be fulfilled on the MPI sections.
Nodes of the CLAIX-2016-SMP section also have access to the new Lustre filesystem at a reduced bandwidth.
We ask you to specify, in your project application, the fraction of your project’s resources that you plan to consume on these large-memory nodes.
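A small sketch of the sizing consideration behind this choice; the per-node limits are taken from the table above, while the 600 GB requirement is an invented example:

```c
#include <stdio.h>

int main(void) {
    /* Per-node memory limits taken from the table above. */
    const double mpi_node_mem_gb = 192.0;   /* CLAIX-2018-MPI        */
    const double smp_node_mem_gb = 1024.0;  /* CLAIX-2016-SMP (1 TB) */

    /* Invented example: one (MPI) process per node needing 600 GB. */
    double required_mem_per_node_gb = 600.0;

    if (required_mem_per_node_gb <= mpi_node_mem_gb)
        printf("Fits on the MPI sections.\n");
    else if (required_mem_per_node_gb <= smp_node_mem_gb)
        printf("Request the CLAIX-2016-SMP large-memory nodes.\n");
    else
        printf("Exceeds even the SMP nodes; the job layout must change.\n");
    return 0;
}
```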

You can find more information about how to use CLAIX here.

For further information, please contact the ServiceDesk of IT Center RWTH Aachen University.