CLAIX (Cluster Aix-la-Chapelle)
picture: Conor Crowe
The RWTH Compute Cluster as part of the JARA partition currently consists of two machine generations organized in two sections (CLAIX-2016 and CLAIX-2018) with two subsections each.
- In total approximately 150 million core-h (about 36000 EFLOP) per call are available.
- The minimum volume per proposal is 2.4 million core-h.
Smaller proposals can be filed using a simplified procedure through the IT Center here
- We currently do not expect any overbooking after the recent installation of new machine capacity.
- We expect some 20-30 applications.
The following table summarizes the characteristics of the four compute subsections.
|Subsection|Processors per node|Available resources per call|
|---|---|---|
|CLAIX-2016-MPI|2 Intel E5-2650 v4 processors “Broadwell”|50 million core-h or about 6500 EFLOP|
|CLAIX-2016-SMP|8 Intel E7-8860 v4 processors “Broadwell”|4 million core-h or about 500 EFLOP|
|CLAIX-2018-MPI|2 Intel Xeon Platinum 8160 processors “SkyLake”, 48 cores per node|95 million core-h or about 23500 EFLOP|
|CLAIX-2018-GPU|2 Intel Xeon Platinum 8160 processors “SkyLake”|4 million (host) core-h or about 5500 EFLOP|
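The core-h and EFLOP figures quoted per call imply a rough conversion rate for each subsection, which can help when translating a core-h budget into EFLOP. The following is a minimal sketch, assuming the four table rows correspond, in order, to CLAIX-2016-MPI, CLAIX-2016-SMP, CLAIX-2018-MPI and CLAIX-2018-GPU; the function names are illustrative and not part of any IT Center tool.

```python
# (million core-h, EFLOP) per call, as quoted in the table.
# Assumption: the row-to-subsection mapping is inferred from the surrounding text.
RESOURCES = {
    "CLAIX-2016-MPI": (50, 6500),
    "CLAIX-2016-SMP": (4, 500),
    "CLAIX-2018-MPI": (95, 23500),
    "CLAIX-2018-GPU": (4, 5500),  # host core-h; the EFLOP figure includes the GPUs
}


def eflop_per_million_core_h(subsection: str) -> float:
    """EFLOP delivered per million core-h, implied by the quoted figures."""
    core_h_millions, eflop = RESOURCES[subsection]
    return eflop / core_h_millions


def implied_gflops_per_core(subsection: str) -> float:
    """Sustained GFLOP/s per core implied by the quoted figures."""
    core_h_millions, eflop = RESOURCES[subsection]
    flop = eflop * 1e18
    core_seconds = core_h_millions * 1e6 * 3600
    return flop / core_seconds / 1e9


def core_h_to_eflop(core_h: float, subsection: str) -> float:
    """Convert a core-h budget into EFLOP using the implied rate."""
    return core_h / 1e6 * eflop_per_million_core_h(subsection)


if __name__ == "__main__":
    for name in RESOURCES:
        print(f"{name}: {eflop_per_million_core_h(name):.1f} EFLOP per million core-h, "
              f"{implied_gflops_per_core(name):.1f} GFLOP/s per core")
```

The four EFLOP figures add up to the 36000 EFLOP quoted for the whole call, and the core-h figures add up to 153 million, consistent with "approximately 150 million".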
It is our intention to optimize the overall system throughput and performance as well as fulfill your project’s specific requirements. Therefore, we depend on your input concerning your programs’ characteristic behavior in order to decide which project will be allocated where.
We invite you to apply for a PREP project here in order to get access to machines of all sections for experimentation and preparation of your JARA proposals.
All JARA CLAIX projects will be assigned to either CLAIX-2016-MPI or CLAIX-2018-MPI, and jobs will be launched to one of these compute subsections by default. In order to direct jobs to any other subsection, you need to adjust the Slurm parameters of your batch job accordingly.
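As an illustration of adjusting the Slurm parameters, the sketch below renders the header of a batch script that selects a specific partition. It assumes that subsections are selected through Slurm partitions; the partition name used here is a hypothetical placeholder, and the real names should be taken from the CLAIX documentation or from `sinfo`. Only the standard `sbatch` options `--partition`, `--ntasks` and `--time` are used.

```python
def render_batch_header(partition: str, ntasks: int, time_limit: str) -> str:
    """Return the #SBATCH header lines of a batch script as one string.

    `partition` is a placeholder here; the actual partition names for the
    CLAIX subsections are not given in this document.
    """
    lines = [
        "#!/usr/bin/env bash",
        f"#SBATCH --partition={partition}",  # directs the job to another subsection
        f"#SBATCH --ntasks={ntasks}",        # number of (MPI) tasks
        f"#SBATCH --time={time_limit}",      # wall-clock limit, e.g. "01:00:00"
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "example-partition" is hypothetical and must be replaced by a real name.
    print(render_batch_header("example-partition", ntasks=48, time_limit="01:00:00"))
```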
New machines typically perform better than old ones. We expect the per-core performance of the CLAIX-2018-MPI machines to be 30%-50% higher than that of the CLAIX-2016-MPI machines. To compensate for this difference in speed, the IT Center will increase the resources approved by the JARA commission by 50% when your project is assigned to CLAIX-2016-MPI.
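The compensation rule can be written down as a small calculation. This is a minimal sketch of the arithmetic only; the function name is illustrative, and the 1.5 factor is the 50% increase stated above.

```python
def granted_core_hours(approved_core_hours: float, subsection: str) -> float:
    """Core-h actually granted after the IT Center's compensation.

    Projects assigned to CLAIX-2016-MPI receive 50% more core-h than the
    JARA commission approved; other assignments are left unchanged.
    """
    factor = 1.5 if subsection == "CLAIX-2016-MPI" else 1.0
    return approved_core_hours * factor


if __name__ == "__main__":
    # The minimum proposal volume of 2.4 million core-h on each MPI subsection.
    for name in ("CLAIX-2016-MPI", "CLAIX-2018-MPI"):
        print(name, granted_core_hours(2.4e6, name))
```

For example, a proposal approved at the minimum volume of 2.4 million core-h would be granted 3.6 million core-h on CLAIX-2016-MPI.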
Besides the higher core count, the processors of these two generations differ in major organizational changes to the cache hierarchy (the SkyLake processors implement a 2-D mesh to connect L2 caches, and their L3 cache is non-inclusive: data residing in L2 is not replicated in L3) and in improved vectorization capabilities (AVX-512).
Whether your application can profit from these new features largely depends on whether it benefits from vectorization, that is, the use of SIMD instructions. (We point to workshops that have recently focused on vectorization.)
A few nodes of the CLAIX-2018 installation are equipped with two NVIDIA Tesla V100 GPUs each. Apart from these GPUs, the CLAIX-2018-GPU nodes have the same properties as the nodes of the CLAIX-2018-MPI subsection.
Please be aware that accounting of a GPU node is related to the cores of its host processors. Consequently, occupying two GPUs for one hour is accounted like using 48 host cores for one hour. You have to consider this when specifying, in your project application, the resources you plan to consume on these GPU nodes.
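To estimate the core-h to request for GPU work, the accounting rule can be sketched as follows. The figure of 48 host core-h per hour for two GPUs comes from the text above; charging a single GPU at half that rate is an assumption made here for illustration and should be confirmed with the IT Center.

```python
HOST_CORES_PER_GPU_NODE = 48  # two GPUs per node are accounted as 48 host cores
GPUS_PER_NODE = 2


def gpu_core_hours(gpus: int, hours: float) -> float:
    """Host core-h charged for occupying `gpus` GPUs for `hours` hours.

    Assumption: accounting is pro rata, i.e. 24 host core-h per GPU-hour.
    Only the two-GPU case (48 core-h per hour) is stated in the source.
    """
    cores_per_gpu = HOST_CORES_PER_GPU_NODE / GPUS_PER_NODE
    return gpus * cores_per_gpu * hours


if __name__ == "__main__":
    # One full GPU node (two GPUs) for 100 hours.
    print(gpu_core_hours(2, 100))
```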
The nodes of each subsection are connected with the same generation of Intel Omni-Path IB fabric, so the MPI performance will generally not differ much. However, because of the higher core count, the newer processors can host twice the number of MPI processes and may experience higher communication pressure.
The connection to the outside world of the new cluster subsections for large data transfers is expected to be much better.
All CLAIX machines share access to the same HOME and WORK storage servers.
For higher I/O demands concerning bandwidth and volume, there are separate HPCWORK storage servers (Lustre) for the CLAIX-2016-* and CLAIX-2018-* subsections.
Each JARA CLAIX project has access to only one of these Lustre servers.
The Lustre server of the later subsections is expected to provide much more bandwidth and has a higher capacity. So if your project’s applications have a high demand for (parallel) I/O, you should express your preference for CLAIX-2018-* machines.
Furthermore, the nodes of the CLAIX-2018-* subsections offer 480 GB of fast storage on SSDs. For parallel applications running on multiple nodes, the SSDs of all involved nodes can be linked together to provide a fast parallel ad-hoc temporary filesystem (BeeGFS On Demand / BeeOND).
Large memory requirements
The few nodes of the CLAIX-2016-SMP subsection are available for projects that require a large amount of main memory per node (between 128/192 GB and 1 TB) or a large number of threads per (MPI) process (up to 144) that cannot be provided on the MPI subsections.
Nodes of the CLAIX-2016-SMP subsection also have access to the new Lustre filesystem at a reduced bandwidth.
We ask you to specify, in your project application, the fraction of your project’s resources that you plan to consume on these large memory nodes.
You can find more information about how to use CLAIX here.
For further information, please contact the ServiceDesk of IT Center RWTH Aachen University.