Lecture Descriptions
A Not So Simple Matter of Software by Jack Dongarra
For nearly forty years, Moore’s Law produced exponential growth in hardware performance, and during that same time, most software failed to keep pace with these hardware advances. We will look at some of the algorithmic and software changes that have tried to keep up with the advances in the hardware. In this talk, we examine how high performance computing has changed over the last 40 years and look toward the future in terms of trends. These changes have had, and will continue to have, a significant impact on our numerical scientific software. A new generation of software libraries and algorithms is needed to effectively use the high performance computing environments in use today.
Agent-Based Simulation for Earthquake Disaster Mitigation by Maddegedara Lalith
Considering the large number of lives involved, the widespread damage, and the impact on the economy, high-resolution models are essential for making comprehensive decisions in disaster mitigation. In this regard, significant progress has been made using HPC to simulate natural hazards (e.g., typhoons and earthquakes) and the damage they cause to the built environment. However, the use of HPC to comprehensively analyze the impact of these disasters on people and the economy is still in its infancy. To fill this gap, we have developed HPC-enhanced simulators to analyze large-scale evacuations and the impact of disasters on national economies. Our simulator for analyzing tsunami-triggered mass evacuations of coastal urban areas includes a high-resolution (e.g., 1 m x 1 m) 2D model of the urban environment and agents capable of recognizing features in the environment and interacting with the environment and with other agents in a complex manner. In addition to reasonably high strong scalability, it can accommodate several million agents spread over several hundred square kilometers. Our HPC-enhanced economic simulator is capable of including every economic entity in a country (every household, every firm, government agencies, etc.) and mimicking real-world economic interactions. We calibrated the model to the Japanese economy using data available at government data portals, and validated it by reproducing past observations of the national economy, individual industrial sectors, and even individual firms. High computational efficiency and scalability allow us to simulate a single period of the model of the Japanese economy, consisting of 130 million agents, within 2 minutes using 128 CPU cores. By integrating it with physics-based disaster simulators, we aim to simulate the impact of natural disasters on the national economy. In this lecture, we explain the details of these two agent-based models.
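To make the agent-based idea concrete, the toy sketch below (a minimal illustration only, not the simulators described above) moves agents on a small occupancy grid toward a single exit, one cell per step; the grid size, exit location, agent count, and movement rule are all illustrative assumptions.

    import random

    # Toy agent-based evacuation sketch: agents on an occupancy grid step
    # greedily toward a single exit, one cell per time step.  Grid size,
    # exit position, and movement rule are illustrative only.
    WIDTH, HEIGHT = 20, 20
    EXIT = (0, 0)

    def step(agents, occupied):
        """Advance every agent by at most one cell toward the exit."""
        moved = []
        for (x, y) in agents:
            neighbours = [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
            free = [c for c in neighbours
                    if 0 <= c[0] < WIDTH and 0 <= c[1] < HEIGHT and c not in occupied]
            if free:
                # Pick the free neighbouring cell closest to the exit.
                nxt = min(free, key=lambda c: abs(c[0] - EXIT[0]) + abs(c[1] - EXIT[1]))
                occupied.discard((x, y))
                occupied.add(nxt)
                moved.append(nxt)
            else:
                moved.append((x, y))  # blocked: wait in place
        remaining = [a for a in moved if a != EXIT]  # agents at the exit leave
        return remaining, set(remaining)

    agents = [(random.randrange(WIDTH), random.randrange(HEIGHT)) for _ in range(50)]
    occupied = set(agents)
    t = 0
    while agents and t < 10_000:
        agents, occupied = step(agents, occupied)
        t += 1
    print(f"{50 - len(agents)} of 50 agents evacuated after {t} steps")

The agents in the actual simulator perceive a far richer environment and follow behavioral models well beyond this greedy rule; the sketch only shows the basic perceive-decide-move loop that the term "agent-based" refers to.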
AI Factories in the European HPC Ecosystem by Kimmo Koski
The talk describes the European status and development plans in the HPC ecosystem, with a special focus on a concept called AI Factories. Its target is to expand AI capabilities in Europe, both in terms of computing resources and competencies. Part of the AI Factory concept funds the next generation of AI-oriented systems; another part funds the work around them. Examples include the development of competencies, help with code design, and the training of people, to name a few areas.
The AI Factories are driven by the EuroHPC initiative, a European collaboration body whose target is to boost the HPC ecosystem. The joint projects include not only AI but a wider scope of work, from quantum to HPC and from research infrastructures to industry collaboration, forming an ecosystem where different kinds of computing resources, data, communications, applications, and related competencies play a role together.
In this talk, the current status of European HPC and AI Factories is described and plans for the future are explained. Global collaboration opportunities, for example between Europe/EuroHPC and Asia, are discussed, as is the concept of the HPC ecosystem, with one of the European flagship computers, the LUMI supercomputer located in Finland, used as an example.
Diagnostic studies of High-Resolution Global Climate Model by Yoshiyuki Kajikawa
The development of HPC has brought higher resolution and greater elaboration to climate models. We are now entering the era of “kilometre-scale” modeling of the climate system. With these developments, it is natural that climate model analysis should also advance. In this lecture, diagnostic studies of high-resolution climate model simulations, and the benefits that high resolution brings, will be introduced through pioneering recent studies. In particular, we will focus on the representation of convection in regional and global climate models, as well as its aggregation process. In the latter half of the lecture, we will also introduce how the reproducibility of climate fields and elements improves across various spatio-temporal scales when cumulus convection is resolved. We would like to share and discuss the direction of climate science renewed by high-resolution climate model simulations.
Earth Observation Data Analysis Using Machine Learning by Naoto Yokoya
This lecture focuses on applying machine learning techniques to Earth observation data analysis. We start by exploring different applications of Earth observation data, such as environmental monitoring, disaster management, and urban planning. After an introduction to machine learning, we delve into the basics of neural networks and how they can be applied to remote sensing data. Participants will take part in a hands-on session where they will build a neural network model for building damage classification. They will then explore automated mapping techniques using remote sensing imagery, with a particular focus on semantic segmentation for land cover mapping. Another hands-on session will guide participants through practical implementation steps for land cover mapping using machine learning algorithms.
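As a rough indication of what such a hands-on model can look like, the sketch below defines a small PyTorch patch classifier for damage labels; the input size, channel count, and number of classes are illustrative assumptions, not the course material.

    import torch
    import torch.nn as nn

    # Minimal sketch of a patch classifier for building-damage labels
    # (e.g., "no damage" vs "damaged").  Shapes and class count are illustrative.
    class DamageClassifier(nn.Module):
        def __init__(self, in_channels=3, n_classes=2):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.classifier = nn.Linear(32, n_classes)

        def forward(self, x):
            h = self.features(x)
            return self.classifier(h.flatten(1))

    model = DamageClassifier()
    patch = torch.randn(8, 3, 64, 64)                      # a batch of 8 RGB image patches
    logits = model(patch)                                  # shape: (8, 2)
    loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 2, (8,)))
    loss.backward()                                        # one backward pass for training
    print(logits.shape, float(loss))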
From Sequence to Function: Applications of Protein Language Models in Protein Design by Camila Pontes
Proteins play a crucial role in various cellular processes, making their study vital for understanding biology and the mechanisms underpinning disease. Over recent years, language models have been naturally adapted to understand the language of proteins. This is possible because the protein language has many similarities with natural languages: just as a phrase is composed of words according to a specific grammar, a protein is composed of different amino acids following biological rules. This session will give an overview of how different protein language models (pLMs) can be leveraged to obtain functional insights about protein families or to generate new-to-nature protein sequences. In the practical session, we will investigate the outputs (embeddings and logits) of a BERT-based protein language model called ESM2 and learn how to interpret them. Then, we will work on an example of how these outputs can be used to redesign a protein.
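As a flavour of what inspecting those outputs can look like, the sketch below queries a small ESM2 checkpoint through the open-source fair-esm package; the checkpoint, layer index, and example sequence are assumptions, and the practical session may use a different setup.

    import torch
    import esm

    # Load a small ESM2 checkpoint (illustrative choice) and its alphabet.
    model, alphabet = esm.pretrained.esm2_t6_8M_UR50D()
    batch_converter = alphabet.get_batch_converter()
    model.eval()

    # A toy sequence; in practice this would be a protein of interest.
    data = [("example_protein", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")]
    labels, strs, tokens = batch_converter(data)

    with torch.no_grad():
        out = model(tokens, repr_layers=[6])

    embeddings = out["representations"][6]   # per-residue embedding vectors
    logits = out["logits"]                   # per-position scores over amino acids
    probs = torch.softmax(logits, dim=-1)    # probabilities over the alphabet
    print(embeddings.shape, probs.shape)

Per-position probabilities of this kind are one simple signal that can feed into a redesign workflow, for example by flagging positions where the model strongly prefers a residue different from the wild type.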
From viruses to food and new drugs, next generation AI applications and high performance computing by Sebastian Maurer-Stroh
This talk will take us on a journey through success stories where computational methods contributed to real-world impact, from genomic surveillance during COVID-19 and highly pathogenic avian influenza in cow milk to the safety evaluation of novel food designs. I will also explore how next-generation AI methods are changing important fields such as drug discovery, and will give examples of how AI can accelerate and de-risk the drug discovery process, highlighting the need for seamless scale-up from edge to high performance cluster computing.
Hands-on for Scientific Benchmarking by Jens Domke
In this hands-on, we will go through one example of how to test and reach the peak performance of an important HPC proxy that exercises a major architectural feature of Fugaku. The attendees will analyze, modify, execute, and post-process an OpenMP-based benchmark, and learn how to approach the systematic benchmarking process. The attendees will be introduced to typical issues which they might encounter with other applications and how to tackle those challenges. Ideally, the outcome of this session will be to enable the attendees to replicate the art of benchmarking on the HPC systems at their home institutes.
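For orientation only, the sketch below is a simplified Python analogue of the measurement methodology: time a bandwidth-bound kernel repeatedly, convert the timings into an achieved metric, and report variability. The actual hands-on uses an OpenMP-based benchmark on Fugaku, not this code.

    import time
    import numpy as np

    # Simplified, NumPy-based analogue of a STREAM-triad measurement.
    N = 20_000_000                       # array length (illustrative)
    a = np.empty(N)
    b = np.random.rand(N)
    c = np.random.rand(N)
    scalar = 3.0

    timings = []
    for _ in range(10):                  # repeat to expose run-to-run variability
        t0 = time.perf_counter()
        np.multiply(c, scalar, out=a)    # a = scalar * c
        np.add(a, b, out=a)              # a = scalar * c + b   (triad)
        timings.append(time.perf_counter() - t0)

    # In-place two-step version: reads c, writes a, reads a and b, writes a.
    bytes_moved = 5 * N * 8
    best = min(timings)
    print(f"best of 10: {best:.4f} s  ->  {bytes_moved / best / 1e9:.1f} GB/s")
    print(f"median: {np.median(timings):.4f} s (report variability, not only the best run)")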
Integration of 3D Earthquake Simulation & Real-Time Data Assimilation by Kengo Nakajima
We propose an innovative method of computational science for the sustainable promotion of scientific discovery by supercomputers in the exascale era by integrating simulation, data, and learning (S+D+L), and we have developed a software platform, “h3-Open-BDEC”, for the integration of (S+D+L) and evaluate the effects of this integration on heterogeneous supercomputer systems. Our target system is the Wisteria/BDEC-01 system (33+ PF) at the University of Tokyo, which started operation in May 2021 and consists of computing nodes for CSE with A64FX processors and nodes for data analytics/AI with NVIDIA A100 GPUs. h3-Open-BDEC is designed to extract the maximum performance of the supercomputers with minimum energy consumption. Japan is a country with many natural disasters. In particular, the damage caused by earthquakes is enormous. Over the past 30 years, large earthquakes have occurred across Japan. Although it is extremely difficult to predict the occurrence of an earthquake, research is currently being actively conducted to minimize damage after an earthquake occurs. We have applied h3-Open-BDEC to Seism3D/OpenSWPC-DAF (Data-Assimilation-Based Forecast), which was developed by ERI/U.Tokyo for the integration of simulation and data assimilation. In this talk, we will describe the demonstration of real-time data assimilation with the developed code on the Wisteria/BDEC-01 with h3-Open-BDEC, using measured data delivered through JDXnet (Japan Data eXchange network) from 2,000+ high-sensitivity/broadband seismic observation stations in Japan.
Introduction to climate modeling and its objectives by Hirofumi Tomita
In this lecture, I will first present my perspective on the concept of climate research and the various viewpoints of this research. Following that, I will provide an overview of the phenomena studied in climate research from the perspective of spatiotemporal scales and discuss the modeling policy required to examine them. In the latter part of the lecture, I will focus on atmospheric models within climate models, explaining how to solve them using computers. I hope this will help clarify how climate scientists address the complexities of climate dynamics.
Introduction to HPC Applications and Systems by Bernd Mohr
In this introductory lecture, students will learn what "high performance computing" (HPC) means and what differentiates it from more mainstream areas of computing. They will also be introduced to the major application areas that use HPC for research and industry, and to how AI and HPC interact with each other. Finally, the lecture will present the major HPC system architectures needed to run these applications (distributed and shared memory, hybrid, and heterogeneous systems).
Introduction to HPC Programming by Bernd Mohr
In this second introductory lecture, students will be provided with an overview of the programming languages, frameworks, and paradigms used to program HPC applications and systems. They will learn how MPI can be used to program distributed memory systems (clusters), how OpenMP can be used for shared memory systems, and finally, how to program graphics processing units (GPUs) with OpenMP, OpenACC, or lower-level methods like CUDA or ROCm/HIP.
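To give a first impression of the message-passing model, the sketch below is a minimal point-to-point example using mpi4py, the Python MPI bindings; the lecture's own examples may use C, C++, or Fortran, but the concepts carry over directly.

    from mpi4py import MPI

    # Minimal distributed-memory example: rank 0 sends a message to rank 1.
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    if rank == 0:
        data = {"payload": list(range(10))}
        comm.send(data, dest=1, tag=11)      # blocking point-to-point send
        print(f"rank 0 of {size} sent {data}")
    elif rank == 1:
        data = comm.recv(source=0, tag=11)   # matching blocking receive
        print(f"rank 1 received {data}")

Run with at least two processes, for example mpirun -n 2 python send_recv.py (the file name here is arbitrary).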
Introduction to the Use of Fugaku by Jorji Nonaka
This session will provide an overview of the hardware and software resources of the supercomputer Fugaku available to users, as well as a hands-on introduction to their use via the traditional CLI (Command Line Interface) and the Web-based GUI (Graphical User Interface). The overview also includes pointers to information resources such as the user guides, operational status, and user support.
Molecular dynamics simulations of biomolecular systems by Chigusa Kobayashi
Advancements in various experimental techniques have revealed the structural changes and dynamics of biomacromolecules, such as proteins, within living organisms. The relationship between biomolecular structures and their biological functions is gradually becoming clearer. However, obtaining detailed structural information about reaction pathways and intermediates remains a significant challenge. One method to address this issue is molecular dynamics (MD) simulations, a computational approach that employs Newton's equations of motion to calculate the trajectories of atoms from the interaction forces between them.
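To illustrate the core idea in the paragraph above, the sketch below integrates Newton's equations of motion for a single particle in a harmonic potential using the velocity-Verlet scheme; real MD codes such as GENESIS use full biomolecular force fields and sophisticated parallel algorithms, so this shows only the bare time loop.

    # Toy MD time loop: velocity-Verlet integration of Newton's equations of
    # motion for one particle in a harmonic potential (illustrative only).
    mass, k = 1.0, 1.0           # particle mass and spring constant (arbitrary units)
    dt, n_steps = 0.01, 1000
    x, v = 1.0, 0.0              # initial position and velocity

    def force(x):
        return -k * x            # F = -dU/dx for U(x) = 0.5 * k * x**2

    f = force(x)
    for _ in range(n_steps):
        v += 0.5 * dt * f / mass     # half-kick with the current force
        x += dt * v                  # drift
        f = force(x)
        v += 0.5 * dt * f / mass     # half-kick with the updated force

    energy = 0.5 * mass * v**2 + 0.5 * k * x**2
    print(f"x = {x:.3f}, v = {v:.3f}, total energy = {energy:.6f} (should stay near 0.5)")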
Understanding the dynamics of biomacromolecules requires substantial computational resources, and the field has made significant progress thanks to the enhanced capabilities of modern computing systems, including supercomputers. We have been developing novel, efficient, and accurate computational methods through the creation of the MD software Generalized Ensemble Simulation Systems (GENESIS). GENESIS demonstrates high parallel efficiency on massively parallel computing systems, including "Fugaku," making it one of the most effective tools for leveraging the full potential of "Fugaku."
Released as free software under the GNU Lesser General Public License (LGPL), GENESIS is widely utilized by researchers worldwide. This tutorial will offer a comprehensive introduction to the basic usage of GENESIS through a combination of hands-on practice and lectures.
Parallel Programming on GPUs and/or HPC Python Programming by Bernd Mohr
In these lectures, students will learn how to program graphics processing units (GPUs) and will perform programming exercises on one or more GPU systems (e.g., NVIDIA or AMD) made available in the cloud. We also plan a lecture on how to use Python effectively on HPC systems. More details will be provided soon. For these lectures, students should have some basic experience with programming computers in C or C++.
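As a small taste of GPU programming from Python, the sketch below uses CuPy, a NumPy-like array library whose operations run on the GPU; it assumes a CUDA-capable device and the cupy package, and the actual exercises may use different tools (e.g., Numba, OpenACC, or CUDA C).

    import cupy as cp

    # Element-wise arithmetic and a reduction executed on the GPU via CuPy.
    n = 10_000_000
    x = cp.random.rand(n)              # arrays are allocated in GPU memory
    y = cp.random.rand(n)

    z = 2.5 * x + y                    # element-wise kernel launched on the GPU
    total = float(z.sum())             # reduction on the GPU, result copied to host

    cp.cuda.Stream.null.synchronize()  # ensure all queued GPU work has finished
    print(f"sum = {total:.2f}")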
Parallel Programming with MPI and OpenMP Hands-on by Jens Domke and Bernd Mohr
In this lecture, students will be provided with all the necessary details to perform parallel programming exercises with MPI and OpenMP on the Fugaku supercomputer of RIKEN, Japan, one of the fastest computers in the world. Ideally, students should have some basic experience with programming computers in C, C++, Fortran, or Python. Experience with the Linux operating system is also helpful, but not required.
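To show what a first hands-on exercise might look like from Python (the exercises themselves can equally be done in C, C++, or Fortran), the sketch below uses mpi4py to combine per-rank partial sums with a collective reduction; the array sizes and launch command are illustrative.

    from mpi4py import MPI
    import numpy as np

    # Each MPI rank computes a partial sum; Allreduce combines them so that
    # every rank ends up with the global result.
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    local = np.arange(rank * 1000, (rank + 1) * 1000, dtype=np.float64)
    local_sum = np.array([local.sum()])
    global_sum = np.zeros(1)

    comm.Allreduce(local_sum, global_sum, op=MPI.SUM)

    if rank == 0:
        print(f"{size} ranks, global sum = {global_sum[0]}")

Launched, for example, with mpiexec -n 4 python reduce_example.py (the file name is arbitrary); on Fugaku, jobs go through the batch system described in the user guides.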
Scientific Benchmarking by Jens Domke
In this lecture, the attendees will learn the dos and don'ts of the scientific benchmarking process. The lecture will introduce why benchmarking is important in the current complex landscape of hardware and software, and will highlight concepts such as metrics, methodologies, and data visualization and interpretation. The lecture will also showcase pitfalls and mitigation strategies based on the experiences collected by the HPC/AI community over the years.
Solving 3D Puzzles of Biomolecular Interactions by Integrative Modeling by Alexandre Bonvin
The prediction of the quaternary structure of biomolecular macromolecules is of paramount importance for a fundamental understanding of cellular processes and for drug design. One way of increasing the accuracy of the modelling methods used to predict the structure of biomolecular complexes is to include as much experimental or predictive information as possible in the process. For this purpose we have developed the versatile integrative modelling software HADDOCK (https://www.bonvinlab.org/software), available as a web service from https://wenmr.science.uu.nl. HADDOCK can integrate a large variety of information derived from biochemical, biophysical, or bioinformatics methods to enhance sampling, scoring, or both. The lecture (online before the school) will highlight some recent developments around HADDOCK and its new modular HADDOCK3 version and illustrate its capabilities with various examples, including, among others, recent work on the modelling of antibody-antigen interactions from sequence only, a notoriously difficult class of complexes for AI-based methods to predict. The practical session will demonstrate the use of the new modular HADDOCK3 version for predicting the structure of an antibody-antigen complex, using knowledge of the hypervariable loops on the antibody (i.e., the most basic knowledge) and epitope information identified from NMR experiments for the antigen to guide the docking.
Time to take HPC insights straight to the clinic by Chandra Verma
Computational models of biomolecules at the atomic level have reached a significant level of sophistication. Predictions are regularly validated experimentally, result in novel IP, and create new industry. What about their role in interfacing directly at the clinical level? Can they influence clinical decisions? We will explore this with some examples and will discover that steps in this direction can confidently be taken, but that this requires a robust pipeline of high performance computing based predictions and validations.
The Digital Revolution of Earth System Modelling by Peter Dueben
This talk will outline three revolutions that have happened in Earth system modelling in the past decades. The quiet revolution has leveraged better observations and more compute power to allow constant improvements in prediction quality over the last decades; the digital revolution has enabled us to perform km-scale simulations on modern supercomputers that further increase the quality of our models; and the machine learning revolution has now shown that machine-learned weather models are often competitive with physics-based weather models for many forecast scores while being easier, smaller, and cheaper. This talk will summarize the past developments, explain current challenges and opportunities, and outline what the future of Earth system modelling will look like, in particular regarding machine-learned foundation models in a physical domain such as Earth system modelling.
The Use of LLMs and Other AI Models in Science by Mohamed Wahib
The rapid advancement of large language models (LLMs) and other AI models has opened new frontiers in scientific research. In this talk, we will explore how these cutting-edge technologies are transforming various scientific disciplines. Across a wide range of science domains, LLMs and AI are enabling researchers to tackle complex problems with unprecedented speed and precision. We will first give an introduction to LLMs, then discuss practical applications in fields such as chemistry, biology, and physics, and examine the potential challenges and ethical considerations in integrating these AI models into scientific workflows.
Towards an AI-based solution of the protein structure prediction problem by Yang Zhang
The past decade has witnessed revolutionary changes in computer-based protein structure prediction, mainly driven by artificial intelligence (AI) and deep neural-network learning techniques. In this lecture, we discuss structure prediction results in recent community-wide CASP experiments, showing that new deep-learning approaches, built on coevolution data from multiple sequence alignments, can result in consistent and successful folding of large proteins with complicated topologies. Of note, AlphaFold2, trained through end-to-end transformer networks, could fold nearly all protein domains in CASP14, with 2/3 of them having accuracy comparable to low-resolution experimental solutions. In the most recent CASP experiments, new advances over AlphaFold2 have been made by integrating end-to-end learning and protein language models with fragment-assembly simulations. These achievements essentially break through the 50-year-old barrier of homology-based modelling and mark a solution, at least at the fold level, to the single-domain protein structure prediction problem. Nevertheless, the construction of atomic-resolution models for multi-domain and higher-order protein complexes remains challenging. Given the power of AI and the rapid advancement of the field, it is expected that these problems will be solved in the foreseeable future by coupling deep learning techniques and metagenome sequencing databases with the aid of advanced structure assembly simulation algorithms.