The Partition's computers
The JARA Partition consists of contingents on the high-performance computers and supercomputers installed at RWTH Aachen University (CLAIX) and Forschungszentrum Jülich (JURECA). The Partition was established in 2012 and has been gradually expanded since then. All listed core-hours are evenly split between two computing time periods per year.
CLAIX (Cluster Aix-la-Chapelle)
(picture: Conor Crowe)
The part of the RWTH Compute Cluster that belongs to the JARA-HPC partition currently consists of two machine generations organized in three sections:
|Compute section|Processors|Capabilities|Available resources|
|---|---|---|---|
|CLAIX-2016-MPI|2 Intel E5-2650v4 processors “Broadwell”|24 cores per node, 128 GB main memory per node (~5 GB main memory per core)|~50 million core-hours|
|CLAIX-2016-SMP|8 Intel E7-8860v4 processors “Broadwell”|144 cores per node, 1 TB main memory per node (~7 GB main memory per core)|~5 million core-hours per call|
|CLAIX-2018-MPI|2 Intel Xeon Platinum 8160 processors “Skylake”|48 cores per node, 192 GB main memory per node (~4 GB main memory per core)|~95 million core-hours per call|
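The "main memory per core" figures in the table are simply node memory divided by cores per node. The short sketch below reproduces them. It assumes that the three table rows correspond, in order, to CLAIX-2016-MPI, CLAIX-2016-SMP and CLAIX-2018-MPI (the section names used later in this text) and treats 1 TB as 1024 GB; the dictionary and function names are illustrative only.

```python
# Node configurations from the table (memory in GB).
# Assumption: rows map to these section names; 1 TB is taken as 1024 GB.
SECTIONS = {
    "CLAIX-2016-MPI": {"cores": 24, "mem_gb": 128},
    "CLAIX-2016-SMP": {"cores": 144, "mem_gb": 1024},
    "CLAIX-2018-MPI": {"cores": 48, "mem_gb": 192},
}


def memory_per_core(section: str) -> float:
    """Return the main memory per core (GB) of one node in the given section."""
    node = SECTIONS[section]
    return node["mem_gb"] / node["cores"]


for name in SECTIONS:
    print(f"{name}: {memory_per_core(name):.2f} GB per core")
```

This yields about 5.3, 7.1 and 4.0 GB per core, matching the rounded values in the table.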
CLAIX machines that also provide GPU capabilities (CLAIX-2016-GPU or CLAIX-2018-GPU) are currently not part of the JARA-HPC partition, but resources on them can be applied for in the NOVA project category. This may change in the future, depending on user demand.
We aim to optimize overall system throughput and performance while also meeting your project's specific requirements. We therefore rely on your input about your programs' characteristic behavior to decide on which section each project will be placed.
We invite you to apply for a PREP project in order to get access to the machines of all sections for experimentation and for preparing your JARA-HPC proposals.
Please find more information about the mentioned project categories here.
All JARA CLAIX projects will be assigned to either CLAIX-2016-MPI or CLAIX-2018-MPI, and their jobs will be launched on that compute section by default.
Newer machines typically perform better than older ones. We expect the per-core performance of the CLAIX-2018-MPI section to be 30% to 50% higher than that of the CLAIX-2016-MPI section. To compensate for this difference in speed, the IT Center increases the resources approved by the JARA-HPC commission by 50% when your project is assigned to CLAIX-2016-MPI.
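The compensation rule above can be expressed as a simple multiplier. The sketch below is illustrative: the function names and the 2,000,000 core-hour approval are hypothetical, and the speedups of 1.3 and 1.5 are just the bounds of the expected 30% to 50% range. It also converts the compensated budget back into "new-section equivalent" core-hours, which shows that the 50% bonus covers even the upper end of the expected speed gap.

```python
# Multiplier applied by the IT Center to approved core-hours (from the text).
COMPENSATION = {"CLAIX-2016-MPI": 1.5, "CLAIX-2018-MPI": 1.0}


def allocated_core_hours(approved: float, section: str) -> float:
    """Core-hours granted on a section after the compensation is applied."""
    return approved * COMPENSATION[section]


def equivalent_new_core_hours(old_core_hours: float, speedup: float) -> float:
    """Express CLAIX-2016-MPI core-hours as the CLAIX-2018-MPI core-hours doing the same work."""
    return old_core_hours / speedup


approved = 2_000_000  # hypothetical approval by the JARA-HPC commission
old = allocated_core_hours(approved, "CLAIX-2016-MPI")
print(old)  # 3000000.0
for speedup in (1.3, 1.5):  # bounds of the expected 30%-50% per-core speedup
    print(speedup, round(equivalent_new_core_hours(old, speedup)))
```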
Apart from the higher core count, the main differences between the processors of the two generations are a reorganized cache hierarchy (the Skylake processors implement a 2-D mesh to connect the L2 caches, and their L3 cache is non-inclusive, i.e. data residing in L2 is not replicated in L3) and improved vectorization capabilities: AVX-512.
Whether your application can profit from these new features depends largely on whether it benefits from vectorization, i.e. the use of SIMD instructions. (We refer you to recent workshops that have focused on vectorization.)
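A rough way to see why vectorization matters here is to count how many values a single SIMD instruction processes. The sketch assumes that the widest SIMD extension on the Broadwell processors is AVX2 with 256-bit registers, while Skylake adds AVX-512 with 512-bit registers; it ignores clock-frequency effects and the number of vector units per core, so it illustrates register width only, not actual application speedup.

```python
# Vector register widths in bits.
# Assumption: Broadwell's widest SIMD extension is AVX2 (256 bit);
# Skylake adds AVX-512 (512 bit).
VECTOR_BITS = {"Broadwell (AVX2)": 256, "Skylake (AVX-512)": 512}
ELEMENT_BITS = {"float64": 64, "float32": 32}


def simd_lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements processed by one SIMD instruction."""
    return vector_bits // element_bits


for isa, width in VECTOR_BITS.items():
    for dtype, bits in ELEMENT_BITS.items():
        print(f"{isa}: {simd_lanes(width, bits)} x {dtype} per instruction")
```

Code that the compiler cannot vectorize gains nothing from the doubled register width, which is why the benefit depends so strongly on the application.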
The nodes of each section are connected by the same generation of Intel Omni-Path fabric, so MPI performance will generally not differ much between the sections. However, because of their higher core count, the newer nodes can host twice as many MPI processes and may therefore experience higher communication pressure.
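The effect on communication pressure can be illustrated by placing the same MPI job on both sections. This sketch assumes one MPI process per core, and the job size of 96 ranks is hypothetical. The same job needs half as many CLAIX-2018-MPI nodes, but twice as many processes then share each node's network connection.

```python
import math

# Cores per node of the two MPI sections (from the table).
CORES_PER_NODE = {"CLAIX-2016-MPI": 24, "CLAIX-2018-MPI": 48}


def nodes_needed(ranks: int, section: str) -> int:
    """Nodes required when placing one MPI process per core (assumption)."""
    return math.ceil(ranks / CORES_PER_NODE[section])


ranks = 96  # hypothetical job size
for section, cores in CORES_PER_NODE.items():
    n = nodes_needed(ranks, section)
    # Fully packed nodes: up to `cores` processes share one node's network connection.
    print(f"{section}: {n} nodes, up to {cores} ranks per node")
```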
The new cluster sections are expected to have a much better connection to the outside world for large data transfers.
All CLAIX machines share access to the same HOME and WORK storage servers.
For higher I/O demands in terms of bandwidth and volume, however, there are separate HPCWORK storage servers (Lustre) for the CLAIX-2016-* and CLAIX-2018-* sections.
The Lustre server of the CLAIX-2018-* sections is expected to provide much more bandwidth and has a higher capacity. So if your project's applications have a high demand for (parallel) I/O, you should state your preference for CLAIX-2018-* machines.
Furthermore, the nodes of the CLAIX-2018-* sections offer 480 GB of fast storage on SSDs. For parallel applications running on multiple nodes, the SSDs of all involved nodes can be linked together to provide a fast, parallel, ad-hoc temporary filesystem (BeeGFS on demand / BeeOND).
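Since BeeOND combines the local SSDs of all nodes in a job, its size grows linearly with the node count. The sketch below computes this raw upper bound; the function name is hypothetical, and the usable capacity will be somewhat lower because of filesystem overhead and the space already occupied on each SSD.

```python
SSD_GB_PER_NODE = 480  # local SSD capacity of a CLAIX-2018-* node (from the text)


def beeond_capacity_gb(num_nodes: int) -> int:
    """Raw upper bound on the size of a BeeOND filesystem spanning num_nodes nodes."""
    if num_nodes < 1:
        raise ValueError("a job needs at least one node")
    return num_nodes * SSD_GB_PER_NODE


print(beeond_capacity_gb(8))  # 3840
```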
Large memory requirements
All CLAIX projects will also be able to launch jobs on the small CLAIX-2016-SMP section if they need a large amount of main memory per node (more than the 128 GB or 192 GB of the MPI sections, up to 1 TB) or a large number of threads per (MPI) process (up to 144) that cannot be provided on the MPI sections.
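The criteria above can be summarized as a small decision helper. This is a hypothetical sketch, not an official tool: `needs_smp` and the `LIMITS` table are illustrative names, the memory limits are the nominal per-node values from the table (1 TB taken as 1024 GB), and it assumes that the maximum number of threads per process on a node equals its number of cores.

```python
# Nominal per-node limits (memory in GB); threads limit assumed = cores per node.
LIMITS = {
    "CLAIX-2016-MPI": {"mem_gb": 128, "threads": 24},
    "CLAIX-2018-MPI": {"mem_gb": 192, "threads": 48},
    "CLAIX-2016-SMP": {"mem_gb": 1024, "threads": 144},
}


def needs_smp(mpi_section: str, mem_gb_per_node: float, threads_per_process: int) -> bool:
    """Return True if a job exceeds its MPI section but fits on CLAIX-2016-SMP."""
    mpi = LIMITS[mpi_section]
    smp = LIMITS["CLAIX-2016-SMP"]
    exceeds_mpi = mem_gb_per_node > mpi["mem_gb"] or threads_per_process > mpi["threads"]
    fits_smp = mem_gb_per_node <= smp["mem_gb"] and threads_per_process <= smp["threads"]
    if exceeds_mpi and not fits_smp:
        raise ValueError("requirements exceed even the CLAIX-2016-SMP nodes")
    return exceeds_mpi


print(needs_smp("CLAIX-2018-MPI", 500, 16))  # True: 500 GB per node > 192 GB
print(needs_smp("CLAIX-2016-MPI", 100, 24))  # False: fits on the MPI section
```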
Nodes of the CLAIX-2016-SMP section also have access to the new Lustre filesystem at a reduced bandwidth.
In your project application, please specify the fraction of your project's resources that you plan to consume on these large-memory nodes.
You can find more information about how to use CLAIX here.
For further information please contact the ServiceDesk of IT Center RWTH Aachen University.
JURECA
(picture: JURECA, FZ Jülich)
The modular supercomputer JURECA consists of two complementary modules, a cluster module for memory-intensive, low to medium-scalable applications and a booster module for highly-scalable applications.
The JURECA Cluster Module
The module comprises about 1800 compute nodes. Each node contains two Intel Haswell processors with 12 cores each and has at least 128 GB of main memory. This module has a peak performance of about 1.8 PFlop/s. In addition, 75 compute nodes are equipped with two NVIDIA K80 GPUs each, with a total peak performance of 0.44 PFlop/s.
Resources on the JURECA Cluster Module are primarily available to researchers of Forschungszentrum Jülich. Researchers at RWTH Aachen University can only apply for this module if they benefit from the modular architecture of the JURECA system and use the JURECA Cluster Module in combination with the JURECA Booster Module. A detailed and convincing work plan must justify the need to apply for this module.
The JURECA Booster Module
The module comprises about 1600 compute nodes. Each node is equipped with one Intel Xeon Phi 7250-F (Knights Landing) CPU with 68 cores and has at least 96 GB of main memory. The module has an Intel Omni-Path Architecture high-speed network with a non-blocking fat-tree topology. The login infrastructure is shared with the Cluster Module. The Booster Module has a peak performance of about 5 PFlop/s, of which 800 TFlop/s are available to users of the JARA partition.
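To put the Booster figures into perspective, the sketch below derives the approximate total core count and the share of the peak performance reserved for the JARA partition. The inputs are the rounded values quoted above ("about 1600" nodes, "about 5 PFlop/s"), so the results are approximations as well.

```python
BOOSTER_NODES = 1600         # "about 1600" compute nodes
CORES_PER_KNL = 68           # cores of one Xeon Phi 7250-F
BOOSTER_PEAK_TFLOPS = 5000   # "about 5 PFlop/s"
JARA_SHARE_TFLOPS = 800      # peak performance available to the JARA partition

total_cores = BOOSTER_NODES * CORES_PER_KNL
jara_fraction = JARA_SHARE_TFLOPS / BOOSTER_PEAK_TFLOPS

print(f"~{total_cores} cores in total")           # ~108800 cores in total
print(f"JARA share: {jara_fraction:.0%} of peak")  # JARA share: 16% of peak
```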
In order to estimate the resources for a regular computing time project it is possible to gain test access to the JURECA system. For this purpose, please contact firstname.lastname@example.org.
You can find more information about JURECA here.
Important note: JSC has introduced a new, user-centered model for using the supercomputing systems located at JSC (here: JURECA). Each user has only one account, through which all of their assigned projects can be accessed. In addition, data projects have been introduced alongside the familiar computing time projects. Computing time resources are still requested through computing time projects, and these projects continue to have access to a scratch file system (without backup) and a project file system (with backup). Access to the tape-based archive, however, is only possible via data projects. Data projects also provide access to various additional storage layers, but they do not come with a computing time budget. For a fact sheet on data projects and to apply for data projects, please see https://application.fz-juelich.de/Antragsserver/dataprojects/WEB/application/login.php?appkind=dataprojects. For further information, see the JSC websites, contact the user support (email@example.com) or contact the coordination office for the allocation of computing time (firstname.lastname@example.org).