Towards an integrated data science of complex natural systems
Involved JARA CSD members: Martin Grohe, Moritz Helias, Abigail Morrison, Holger Rauhut, Michael Schaub
Abstract: Quantitative natural sciences have a long, successful history of obtaining insights into nature by applying a reductionist, model-driven approach to explain empirical observations from a small set of principles. Identifying these principles is the very objective of science, and our ability to comprehend and ultimately control our world critically hinges on this deep understanding of nature.
The rapid progress in data science witnessed over recent years goes in the reverse direction: instead of starting with a mechanistic, generative model and using experimental data and simulations to challenge and improve the model, we directly learn a high-dimensional (statistical) description of the data. This data-driven paradigm has enabled us to make high-precision predictions about nature, even in situations where a mechanistic understanding of the underlying processes still eludes us. However, the data-driven approach lacks the defining feature of natural science: the fostering of our mechanistic understanding of natural processes.
Can both of these perspectives be aligned? In this project we will work on the core challenges and critical steps required to consolidate model- and data-driven research methodologies into a novel form of hybrid modeling: a mechanism-informed data science for complex natural systems.
Deep Image Data Analysis for Precision Medical Imaging
Involved JARA CSD member: Volkmar Schulz
Abstract: Modern medical imaging devices generate an ever-increasing volume of data at ever finer information granularity. In order to cope with this high data volume, the acquired raw data are currently heavily filtered and compressed using classical, historically developed signal processing techniques.
Research in this field shows that the raw data contain a large amount of unused information with an unexplored high diagnostic value. The proposed project aims to unlock the full potential of this buried treasure of information for the medical imaging sector. To achieve this goal, the project will develop the next-generation high-performance simulation and processing platform that will enable us to explore new ultrafast simulation, deep learning, and statistical and numerical methods. The platform will be developed and demonstrated for a first application in advanced positron emission tomography (PET). Clinical research devices for neuro, whole-body and breast imaging are currently being developed at RWTH Aachen together with Forschungszentrum Juelich. These unique devices allow unrestricted access to uncompressed, untruncated raw sensor data and thus offer an unrivaled data source for research towards novel data processing and simulation methods.
Involved JARA CSD members: Harrie-Jan Hendricks-Franssen, Harry Vereecken, Florian Wellmann
Abstract: Geoscientific studies have a major global economic and societal impact: They are fundamental to understanding the physical, chemical, biogeochemical and biological processes of the Earth in order to develop forecasts and derive strategies for action. This is also evident in the fact that modeling of Earth and Environment is identified as one of the “Grand Challenge Problems” in the House of Simulation and Data Science of JARA-CSD. Although geoscientific investigations are extremely diverse, they share cross-sectional methodologies, such as uncertainty quantification and data assimilation. Due to the computationally demanding nature of these cross-sectional methodologies, High-Performance Computing (HPC) is of significant importance.
Particularly challenging are the systematic acquisition and processing of large quantities of geoscientific data, the extraction of relevant parameters from these data, the integration of novel machine learning approaches, and the validation of statistical models. However, the present-day education of geoscientists lacks detailed methodological training in this field, while at the same time novel computational paradigms emerge (e.g. an increased use of GPU systems), and machine learning and data processing methods evolve at an unprecedented speed.
Our aim is to close the gap between geoscientific domain knowledge and cutting-edge computational methods. The main outcome of this project is consequently an application for a DFG Research Training Group (RTG) with a focus on HPC in Geosciences. In order to strengthen this application, we will reach out to our community and related fields to identify future challenges in Computational Geosciences with a special focus on HPC and prepare a joint publication on the topic. These measures will not only allow us to prepare the educational basis to train the next generation of experts in the field, but also significantly increase the visibility of related research in JARA CSD.