Lecture Descriptions
Harnessing HPC and AI for Interactive Analysis and Visualization of Large Scientific Datasets by Michela Taufer
Computing is ubiquitous, present in the cloud, in clusters at our institutions, and even in our laptops. A significant challenge remains, however: managing the vast amounts of data often generated remotely, at the edge, by experimental facilities or by supercomputers in large national laboratories. When such data is stored across various public and private remote locations, moving it from remote facilities to our desktops is impractical. Scientists dealing with this data often prefer to review it remotely before transferring only specific portions for closer AI-based analysis and visualization. Each step of this process is challenging: streaming the data, identifying and deploying tools for data analysis and visualization, interacting dynamically with the data, and exploring multiple datasets simultaneously.
This lecture addresses this significant challenge in HPC by presenting solutions that let scientists interactively deploy efficient AI-based analysis and visualization tools for large scientific datasets. The lecture demonstrates how a data service initiative such as the National Science Data Fabric (NSDF) enables accessible, flexible, and customizable workflows for multi-faceted analysis and visualization of diverse datasets. It walks through the workflow steps of generating large datasets with modular applications, storing the data remotely, and using AI to analyze it locally to draw scientific conclusions. NSDF services allow users to stream data from public storage platforms like DataVerse or private storage platforms like Seal Storage and to access an easy-to-use NSDF dashboard for immediate interaction with the data.
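As a purely illustrative sketch of the streaming idea (NSDF provides its own services and dashboard; the endpoint, data type, and layout below are hypothetical), one can fetch only a byte range of a large remote array rather than downloading the whole file:

```python
# Illustrative only: a generic HTTP Range request pulls a small window of
# a large remote array for local analysis. URL, dtype, and row width are
# hypothetical placeholders, not NSDF's actual API.
import numpy as np
import requests

URL = "https://example-storage.org/large_dataset.raw"  # hypothetical endpoint
DTYPE = np.float32                                     # assumed element type
WIDTH = 4096                                           # assumed row width
ROW_BYTES = WIDTH * DTYPE().nbytes

def fetch_rows(first_row, n_rows):
    """Stream only the requested rows instead of the full dataset."""
    start = first_row * ROW_BYTES
    end = start + n_rows * ROW_BYTES - 1
    resp = requests.get(URL, headers={"Range": f"bytes={start}-{end}"})
    resp.raise_for_status()
    return np.frombuffer(resp.content, dtype=DTYPE).reshape(n_rows, WIDTH)

# Pull a small window for local AI-based analysis or visualization.
subset = fetch_rows(first_row=1000, n_rows=64)
print(subset.mean(), subset.std())
```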
The lecture shows how to navigate every step of a modular workflow, efficiently handle different data formats for streaming, and use AI methods and visualization tools for scientific inference on selected data subsets. It applies this knowledge to experimental datasets in earth science, materials science, and other use cases. The lecture equips participants with the skills to use data services for comprehensive scientific data analysis, guiding them through creating flexible workflows, managing data across various storage solutions, and deploying data visualization and analysis tools. Attendees will learn to manage substantial datasets and incorporate them into their applications, facilitating better access to data and advancing scientific exploration.
Workflows for HPC & AI by Rosa Badia and Jorge Ejarque
With Exaflop systems already here, High-Performance Computing (HPC) involves ever larger and more complex supercomputers. At the same time, the user community is aware of the underlying performance and eager to exploit it with more complex application workflows.
What is more, current application trends aim to use data analytics and artificial intelligence combined with HPC modeling and simulation.
However, the programming models and tools differ across these fields, and there is a need for methodologies that enable the development of workflows combining HPC software, data analytics, and artificial intelligence. PyCOMPSs is a parallel task-based programming model for Python: sequential Python programs decorated with simple annotations are executed in parallel on HPC clusters and other distributed infrastructures. PyCOMPSs has been extended to support tasks that invoke HPC applications and combine them with artificial intelligence and data analytics frameworks. These extensions were performed in the eFlows4HPC project, which aimed to develop a workflow software stack that fulfils the previously mentioned need.
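A minimal sketch of the PyCOMPSs task model (the functions here are illustrative stand-ins, not eFlows4HPC code):

```python
# Sequential-looking Python; the @task annotation lets the COMPSs runtime
# run calls as parallel tasks across the cluster.
from pycompss.api.task import task
from pycompss.api.api import compss_wait_on

@task(returns=1)
def simulate(param):
    # Stand-in for an HPC simulation step.
    return param ** 2

partials = [simulate(p) for p in range(10)]  # tasks run concurrently
results = compss_wait_on(partials)           # synchronize: futures -> values
print(sum(results) / len(results))
```

The runtime builds a task dependency graph from these annotations and schedules the tasks across the available nodes.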
The lecture will be composed of two parts. The first part will present how these hybrid HPC-AI workflows are developed, illustrated with examples from the eFlows4HPC and CAELESTIS projects. Special emphasis will be placed on the PyCOMPSs programming model and the dislib machine learning library. The lecture will also explain how workflows are deployed with containers and executed with the container engines supported on HPC systems.
The lecture will include a hands-on session on programming and executing PyCOMPSs workflows from container images. In the hands-on session, students will work with an example that combines HPC simulations with machine learning methods implemented with the dislib library, and will execute it on the MareNostrum 5 system.
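For flavour, a hedged sketch of the kind of dislib call involved (array sizes and parameters are arbitrary; the hands-on example will differ):

```python
# dislib follows a scikit-learn-style API on top of PyCOMPSs tasks.
import dislib as ds
from dislib.cluster import KMeans

x = ds.random_array((10000, 8), block_size=(1000, 8))  # block-distributed array
labels = KMeans(n_clusters=4).fit_predict(x)           # runs as PyCOMPSs tasks
print(labels.collect()[:10])                           # gather a few labels locally
```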
European Processor Initiative: Cornerstone of EU digital agenda and EU digital sovereignty by Mario Kovac
The importance of high-performance computing (HPC) has been recognized as key for most segments of industry and society. However, collecting and efficiently processing vast amounts of data requires exascale computing systems (capable of at least 10^18 floating-point operations per second), and that comes at a price. The design of HPC systems requires significant changes for the exascale era: today's most energy-efficient high-performance computing systems feature novel general-purpose processor architectures and integrate accelerator processors to achieve the best possible efficiency. The EU recognized the global race for new exascale microprocessor architectures as a unique opportunity to create a new European microprocessor industry and to address EU sovereignty challenges. The European Processor Initiative (EPI), presented here, is the strategic EU project whose goal is to develop key components for the European Union to equip itself with a world-class supercomputing infrastructure: European general-purpose and accelerator processor technologies with drastically better performance-to-power ratios that tackle important segments of the broader and emerging HPC and Big Data markets.
A deep dive into sustainable RISC-V HPC systems by Andrea Bartolini
The lecture will cover the foundations of sustainable computing, ranging from sand to large-scale systems and carbon emissions. It will then provide a deep dive into the building blocks of high-performance computing systems, targeting next-generation RISC-V HPC systems with practical considerations and labs on performance measurements on the Monte Cimone RISC-V cluster.
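As a portable taste of the lab's measurement methodology (not course material; the kernel and metrics are our own example), one can time a kernel, derive FLOP/s, and divide by a measured power draw to obtain FLOP/s per watt:

```python
# Time a dense matmul and compute achieved GFLOP/s; pairing this with a
# power reading (e.g. from a board sensor) gives GFLOP/s per watt.
import time
import numpy as np

n = 1024
a = np.random.rand(n, n)
b = np.random.rand(n, n)

t0 = time.perf_counter()
c = a @ b
t1 = time.perf_counter()

flops = 2 * n**3                      # multiply-adds in an n x n GEMM
gflops = flops / (t1 - t0) / 1e9
print(f"{gflops:.2f} GFLOP/s")        # divide by measured watts for efficiency
```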
RISC-V compliant vector computing design from embedded edge processors to HPC machines by Mauro Olivieri and Francesco Minervini
The computer architecture scenario exhibits a convergence of targets between classical HPC and high-performance embedded computing at the edge. The lecture will briefly analyse the mathematical relations between computing speed and power efficiency, demonstrating the effectiveness of computation acceleration in achieving high energy efficiency. As a mainstream approach to specialized computation acceleration, vector acceleration support will be discussed in two opposing scenarios: a custom RISC-V vector acceleration extension in an embedded soft processor, and a standard RISC-V HPC vector processor (the Vitruvius vector unit developed in the European Processor Initiative project).
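A rough formalization of the relation in question (our notation, stated only as an illustration): if a task takes time t at power P, an accelerator with speedup S and power-overhead factor k changes the energy per task as

```latex
E_{\mathrm{base}} = P\,t, \qquad
E_{\mathrm{acc}} = (kP)\,\frac{t}{S} = \frac{k}{S}\, E_{\mathrm{base}}
```

Energy per task therefore improves exactly when S > k, i.e. when the speedup exceeds the relative power increase, which is why specialized acceleration buys energy efficiency as well as speed.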
Chiplet-scalable RISC-V Acceleration for Low-Power Generative AI Inference at the Edge: The Occamy Project by Luca Benini and Andrea Bartolini
Generative AI is becoming the key enabler for a wide range of applications that deploy intelligent agents in vehicles, appliances, and even wearable devices. Shifting GenAI inference from the cloud to the edge is the end goal, and enabling low-latency, energy-efficient operation within a reasonable power budget is the key challenge. In this talk I will discuss the architectural guiding principles for addressing this challenge, leveraging the RISC-V open architecture and chiplet-based system scalability. As a concrete embodiment of architecture, design, and implementation, I will report on Occamy, a real-silicon demonstration of a flexible and efficient generative-AI RISC-V-based inference accelerator.
Distributed Data Analytics for AI in Supercomputing Systems by Josep Lluís Berral
Distributing data processing is a requisite for modern analytics and machine learning applications: High-Performance Data Analytics leverages data- and model-parallelism frameworks, which in turn can exploit High-Performance Computing infrastructures. This course introduces distributed analytics and streaming through frameworks such as Apache Hadoop and Spark, along with the virtualization and containerization platforms that allow such frameworks to scale in supercomputing environments.
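A generic PySpark sketch of the data-parallel pattern the course builds on (the input path is hypothetical):

```python
# Canonical distributed word count: work is split across partitions,
# mapped in parallel, and combined with a shuffle-and-reduce step.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount").getOrCreate()
lines = spark.read.text("hdfs:///data/corpus.txt").rdd  # hypothetical path
counts = (lines.flatMap(lambda row: row.value.split())
               .map(lambda w: (w, 1))
               .reduceByKey(lambda a, b: a + b))
for word, n in counts.take(10):
    print(word, n)
spark.stop()
```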
Large Language Models Applied to Life Sciences by Miguel Vázquez
In this lecture we will discuss two broad themes in the application of LLMs to molecular biology. We will start by illustrating how LLMs, along with other text-processing components, can be used to mine the scientific literature and extract knowledge that helps understand and model cellular systems. Then we will turn our attention away from sequences of text and show how 'attention', the powerful paradigm that serves as a foundation for LLMs, has a natural application to biological sequences such as DNA and protein amino-acid chains.
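To make that mechanism concrete, here is minimal scaled dot-product attention in NumPy over a toy amino-acid sequence (our own example; the embeddings are random placeholders):

```python
# Each residue attends to every other residue, exactly as tokens attend
# to tokens in a text LLM.
import numpy as np

rng = np.random.default_rng(0)
seq = "MKTAYIAKQR"                      # toy amino-acid sequence
d = 16                                  # embedding dimension
x = rng.normal(size=(len(seq), d))      # one vector per residue

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv

scores = Q @ K.T / np.sqrt(d)           # pairwise residue-residue affinities
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
out = weights @ V                       # contextualized residue representations
print(weights.shape, out.shape)         # (10, 10) (10, 16)
```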
Large Language Models Applied to Life Sciences II by Camila Pontes
Proteins play a crucial role in various cellular processes, making their study vital for understanding biology and disease mechanisms. In the same way that a phrase is composed of words according to a specific grammar, a protein is composed of different amino acids following biological rules, which is why language models can be readily adapted to proteins. This practical session will give an overview of how different language models are applied to the study of proteins. First, we will investigate the outputs (logits and embeddings) of a BERT-based protein language model called ESM and learn to interpret and use them. Then, we will use a generative model called ProGen2 to generate artificial protein sequences and compare them to natural sequences.
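A hedged sketch of inspecting those outputs with the Hugging Face transformers port of ESM (the session's exact checkpoint and tooling may differ):

```python
# Extract per-residue logits and embeddings from a small ESM-2 model.
import torch
from transformers import AutoTokenizer, EsmForMaskedLM

name = "facebook/esm2_t6_8M_UR50D"          # small public ESM-2 checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = EsmForMaskedLM.from_pretrained(name)

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # example protein sequence
inputs = tok(seq, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

logits = out.logits                  # per-residue scores over the vocabulary
embeddings = out.hidden_states[-1]   # last-layer per-residue embeddings
print(logits.shape, embeddings.shape)
```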
Advanced Techniques with LLMs: Building AI agents, Collaborative Problem-solving by Arvind Ramanathan
The lecture will cover how large language models can be used for life science applications, including synthetic biology, drug discovery, and biomaterials design. We will provide an overview of how foundation models can be trained on biological sequences, structures, and natural-text descriptions and used in "understanding the code of life." Next, we will use these models to demonstrate downstream applications, including protein/gene design via natural-language prompting, designing antimicrobial peptides, and optimizing drug discovery workflows by incorporating experimental observations. We will also present an overview of how AI models can be instantiated as scientific assistants (or agents) that collaboratively build and learn new skills in the context of designing novel experiments in the laboratory. Hands-on exercises will guide the discussion.
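Schematically, such an agent is a loop in which the model proposes an action, a tool executes it, and the observation is fed back. The sketch below is entirely illustrative: `query_llm` replays a scripted trace as a stand-in for a real model call, and the tools are hypothetical:

```python
# Entirely illustrative agent loop; no real model or lab tools involved.
SCRIPT = iter([
    "search_literature: antimicrobial peptides",
    "run_assay: peptide-42",
    "DONE: peptide-42 looks promising",
])

def query_llm(prompt: str) -> str:
    # Replace with a real LLM call; here we replay the scripted trace.
    return next(SCRIPT)

TOOLS = {
    "run_assay": lambda arg: f"assay result for {arg}",      # hypothetical lab tool
    "search_literature": lambda arg: f"papers about {arg}",  # hypothetical retriever
}

def agent(goal: str, max_steps: int = 5) -> str:
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        reply = query_llm("\n".join(history))    # model proposes an action
        if reply.startswith("DONE"):
            return reply
        tool, _, arg = reply.partition(":")
        observation = TOOLS.get(tool.strip(), lambda a: "unknown tool")(arg.strip())
        history.append(f"Action: {reply}\nObservation: {observation}")
    return "step budget exhausted"

print(agent("design a promising antimicrobial peptide"))
```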
Introduction to Quantum Computing and Qilimanjaro by Marta Estarellas
Conventional computers are reaching their limits. Complex problems in simulating the world around us in chemistry, physics, and materials, as well as the optimization problems prevalent in industry, have no efficient solutions with current technology. Moreover, the growing data-processing requirements of the digital revolution, all the more so with the inclusion of Artificial Intelligence, make a new, more powerful, and more sustainable computing paradigm necessary. In this talk we will explore the fundamentals of quantum computing, its advantages, applications, and challenges, as well as Qilimanjaro's efforts towards this new and promising technology and its goal of creating the first Quantum Data Center in Europe, in the heart of Barcelona, to solve problems that cannot be solved today.
Introduction to Quantum Computing with BSC Quantum Systems by Mireia Tena, Adria Blanco, Victor Sanchez and Javier Sabariego
This workshop aims to provide an in-depth introduction to the practical usage of BSC's quantum computers.
Participants will gain hands-on experience in quantum programming and learn how to execute quantum algorithms on BSC's state-of-the-art quantum hardware.
The session will cover the following sections:
- Overview and Introduction: introduction to BSC quantum computing resources
- Hands-On Exercises: setting up the development environment, followed by practical exercises (a generic warm-up sketch follows this list)
- Q&A and Troubleshooting: open forum for participants to ask questions
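As a generic warm-up in the spirit of the exercises (BSC's own access libraries and backends may differ), here is a Bell state in Qiskit:

```python
# Prepare and inspect a maximally entangled two-qubit Bell state.
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

qc = QuantumCircuit(2)
qc.h(0)          # put qubit 0 in superposition
qc.cx(0, 1)      # entangle qubit 1 with qubit 0

state = Statevector.from_instruction(qc)
print(state.probabilities_dict())   # ~{'00': 0.5, '11': 0.5}
```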
Pushing RISC-V into HPC and AI by Jesús Labarta
The course will present the philosophy and results of the activity within the European Processor Initiative (EPI) to design a RISC-V vector processor for HPC, and how it could evolve towards supporting AI workloads. Prof. Labarta will focus on the vision of how long-vector architectures address fundamental issues in HPC, such as expressing concurrency and dealing with latency. He will also discuss how the open-standard RISC-V vector ISA provides a foundation on which that vision can be deployed, and proposals by BSC researchers in cooperation with NEC on an Integrated Matrix Extension of the ISA to seamlessly address HPC and AI workloads. Prof. Labarta will describe the architecture of the RISC-V processor designed for the EPI project and its software environment, and will present performance and analysis results obtained on an FPGA emulator implementing the same RTL as the taped-out test chip, as well as on other emulation environments. The lecture will show the importance of very detailed performance analysis capabilities for better understanding the behaviour of this architecture and other state-of-the-art systems in HPC and AI.
Presentation of the contest “Investigating the Use Cases and Applicability of Arm's Scalable Matrix Extension Through Simulation” by Finn Wilkinson
In recent years, many matrix-centric instruction set extensions have been introduced with the aim of accelerating dense matrix-matrix multiplication (GEMM) workloads on the CPU. Examples include Intel's Advanced Matrix Extensions (AMX), IBM's Matrix Math Accelerator (MMA), and Arm's Scalable Matrix Extension (SME). Each is implemented differently, with the number of tile-like registers and the types of assembly instructions available differing greatly between vendors.
Arm's SME is built upon its Scalable Vector Extension (SVE), providing the programmer with a single square register that accumulates the results of outer-product operations. Depending on the data type used, this register (ZA) can be targeted as a whole or as square sub-tiles. Additionally, vectors of data can be moved into and out of ZA, either to memory or to SVE vector registers. This gives the programmer the necessary building blocks to construct a matrix-multiplication kernel as a sum of outer products. Subsequent additions in SME2 introduce instructions such as dot-product, which can use single or multiple SVE vectors for each source operand, extending the kinds of kernels SME can serve to matrix-vector multiplication and more.
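A NumPy illustration of that sum-of-outer-products formulation (our example, independent of any SME code):

```python
# GEMM decomposed into outer products: C = sum over k of A[:, k] x B[k, :].
# SME's ZA tile accumulates exactly this kind of partial result in hardware.
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 4))
B = rng.normal(size=(4, 8))

C = np.zeros((8, 8))
for k in range(A.shape[1]):
    C += np.outer(A[:, k], B[k, :])   # one outer-product accumulation per step

assert np.allclose(C, A @ B)          # identical to an ordinary GEMM
```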
Other than Apple's M4 processor, no hardware implementations of SME exist, so it is difficult to gauge the performance advantages SME may provide, or what such a core microarchitecture would look like. We must therefore rely on simulation tools to perform these design-space-exploration studies. The Simulation Engine (SimEng) is a cycle-level, open-source microarchitectural simulation tool from the University of Bristol that provides such a platform. Supporting the execution of regular (static) Linux ELF binaries targeting Armv9.2-a, it allows hypothetical SVE and SME core designs to be modelled and evaluated quickly and easily.
For this contest, students will use SimEng to investigate ways in which SME (as of Armv9-A A64 Version 2021-06) can be used to improve the performance of an existing benchmark, mini-app, or key section of code of their choice. They should begin with a model of Fujitsu's A64FX (available in the SimEng repository) and add to it an implementation of SME. Given A64FX's unique design, students should investigate whether the existing microarchitecture can suitably support SME (i.e., whether a high percentage of peak theoretical performance can be attained) and whether the design would benefit from any modifications. Although SME is primarily designed to accelerate matrix-multiplication kernels using outer-product instructions, students are encouraged to think outside the box and discover new ways to utilise the instruction set extension. The compute performance of SME should be the primary focus, although SimEng's support for SST does allow exploration of how the memory hierarchy affects SME's performance. Standalone, SimEng models a perfect (100% hit rate) L1 cache with configurable access latency, core-to-L1 bandwidth, and permitted number of LD/STR requests per cycle.
Understanding LLMs & Life after Instruction Tuning: RLHF, LFPF, DPO, PPO by Javier Aula-Blasco
The first part of this lecture will offer a 101-style exploration of the architecture, "life-cycle", and actual capabilities of Large Language Models (LLMs). The focus will be on how these models process and generate human-like text, the methods available for evaluating their performance, and the limitations inherent in deploying and using these models.
The second part of this lecture will delve into advanced techniques for refining language models post-instruction tuning. The discussion will encompass Reinforcement Learning from Human Feedback (RLHF), Learning from Preference Feedback (LFPF), Direct Preference Optimization (DPO), and Proximal Policy Optimization (PPO). Emphasis will be placed on how these methods enhance model performance, adaptability, and alignment with human preferences, driving more effective and nuanced AI applications.
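For reference, the DPO objective discussed here is usually written as follows (our transcription of the published formula; y_w and y_l are the preferred and dispreferred responses, pi_ref the frozen reference policy, and beta a scaling parameter):

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}} \left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)
  \right]
```

Unlike RLHF with PPO, this optimizes the preference data directly, with no separate reward model or reinforcement-learning loop.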
Machine Learning in Materials Research by Priya Darshan Vashishta and Rajiv K. Kalia
Lecture description to come.