General purpose CPU history and the thrust towards a new era: Machine Learning Accelerator by Uri Weiser
For the last 40 years, process technology and general-purpose computer architecture have together driven the magnificent growth in computing performance: process technology was the main locomotive, while computer architecture contributed about one third of the performance gain. We have now reached a major turning point: Moore’s law and Dennard scaling are reaching their end, while performance requirements continue to soar for many exciting new applications. The combination of the new “killer applications” (Machine Learning) and the trend towards heterogeneous computing provides a new innovative thrust in computer architecture.
In this session I will present the transformative change in computing needed to support the new “killer applications”. This change, driven by Machine Learning, calls for new architectures. I will begin by reviewing the concept of Machine Learning and continue by explaining the main computing structures. I will also elaborate on the demand for new computing capacity, which calls for efficient architectures that limit device power consumption. I will end by highlighting some of our specific research on techniques to improve Machine Learning hardware efficiency, and its implications.
Cache and Memory Compression Techniques by Per Stenström
Cache and memory capacity have a significant impact on performance, energy consumption and cost in today's computers, ranging from smartphones and laptops/desktops to server systems in data centers. One promising approach to improving the utilization of a given amount of cache or main memory is to compress the data it contains. However, a compressed cache or memory design involves several challenges: how to access compressed data quickly, which compression algorithm to choose, and how to locate, compress and recompress data. This course offers an overview of state-of-the-art techniques for cache and memory compression and goes into detail on some of the recent and ongoing advances in this area.
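As a rough illustration of the kind of trade-off involved, the sketch below encodes a block of words as one base value plus small signed deltas, a simplified form of base-delta compression. It is only an assumption-laden example, not the method taught in the course: the function names, the 4-byte base and the 1-byte delta width are all hypothetical choices made for illustration.

```python
def compress_block(words, delta_bytes=1):
    """Try to encode a block as one base value plus small signed deltas.

    Returns (base, deltas) if every delta fits in `delta_bytes` bytes,
    otherwise None, meaning the block would be stored uncompressed.
    """
    base = words[0]
    limit = 1 << (8 * delta_bytes - 1)  # signed range is [-limit, limit)
    deltas = [w - base for w in words]
    if all(-limit <= d < limit for d in deltas):
        return base, deltas
    return None


def decompress_block(base, deltas):
    """Rebuild the original words from the base and the deltas."""
    return [base + d for d in deltas]


def compressed_size(num_words, delta_bytes=1, base_bytes=4):
    """Size in bytes of a compressed block (hypothetical layout: base + deltas)."""
    return base_bytes + num_words * delta_bytes


# Values that are close together compress well; scattered values do not.
block = [1000, 1003, 998, 1010]
encoded = compress_block(block)
if encoded is not None:
    base, deltas = encoded
    restored = decompress_block(base, deltas)
```

With these assumed widths, a block of sixteen 4-byte words shrinks from 64 bytes to 20 when it is compressible. The block that does not fit illustrates the remaining challenges named above: a real design must still decide how to locate variable-size blocks and when to recompress them after a write.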
European Processor Initiative: cornerstone of EU digital agenda and EU digital sovereignty by Mario Kovač
The importance of high-performance computing (HPC) has been recognized as key for most segments of industry and society. However, the need to collect and efficiently process vast amounts of data requires exascale computing systems (capable of at least 10^18 floating point operations per second), and that comes at a price. The approach to HPC system design requires significant changes for the exascale era. The most energy-efficient state-of-the-art HPC systems feature novel architectures for general-purpose processors and integrate accelerator processors to achieve the best possible efficiency. The EU recognized the global race for brand-new exascale microprocessor architectures as a unique opportunity to create a brand-new EU microprocessor industry and address EU sovereignty challenges. The European Processor Initiative (EPI), which we present here, is a strategic EU project whose goal is to develop key components for the European Union to equip itself with a world-class supercomputing infrastructure: European general-purpose processor and accelerator technologies with drastically better performance-to-power ratios, which also tackle important segments of broader and/or emerging HPC and Big Data markets.
Highly parallel techniques & performance optimisation by Jesús Labarta
The course will present a vision of what I consider best practices, and the programming models supporting them, for efficiently exploiting the large scale and computing power of the HPC systems we are facing. Assuming a basic knowledge of MPI and OpenMP, the dominant programming models on HPC platforms, we will describe fundamental aspects that should underpin the mindset with which they are used at large scale. Important components of such practices include enabling asynchrony and overlap in general, decoupling the specification of computations from the resources where they have to run in a malleable way through task-based models, understanding the interaction and synchronization coupling in hybrid/hierarchical programming models, and abstracting/homogenizing heterogeneity. Before deciding which approach to follow to optimize an application, there is a critical need to understand in detail the actual behavior of a program on a given platform and the fundamental characteristics that limit its performance. The course will therefore start by presenting the BSC tools and methodology for gaining such deep insight, both qualitative and quantitative, including estimates of the potential gain to be expected from the refactoring work.
Simplifying the life-cycle management of HPC, data analytics and AI workflows by Rosa Badia and Jorge Ejarque
With Exaflop systems already here, High-Performance Computing (HPC) involves ever larger and more complex supercomputers. At the same time, the user community is aware of the underlying performance and eager to exploit it with more complex application workflows. What is more, current application trends aim to combine data analytics and artificial intelligence with HPC modeling and simulation. However, the programming models and tools are different in these fields, and there is a need for methodologies that enable the development of workflows that combine HPC software, data analytics, and artificial intelligence. PyCOMPSs is a parallel task-based programming model for Python. Based on simple annotations, sequential Python programs can be executed in parallel on HPC clusters and other distributed infrastructures. PyCOMPSs has been extended to support tasks that invoke HPC applications and can be combined with artificial intelligence and data analytics frameworks. These extensions have been developed in the eFlows4HPC project, which aims at providing a workflow software stack that fulfills the previously mentioned need. The project is also developing the HPC Workflows as a Service (HPCWaaS) methodology, which aims at providing tools to simplify the development, deployment, execution, and reuse of workflows. In particular, the project has been working on the Container Image Creation service, a component that automates the creation of container images tailored to a specific HPC platform. The lecture will be composed of two parts. The first part will consist of a presentation about the eFlows4HPC project, with special emphasis on the PyCOMPSs programming model and the Container Image Creation service. The second part will include a hands-on session in which the students will be able to practice programming with PyCOMPSs and creating HPC-ready container images with the Container Image Creation service.
Distributed Data Analytics for AI in Supercomputing Systems by Josep Lluís Berral
Distribution of data processing is a requisite for modern analytics and machine learning applications, where High-Performance Data Analytics leverages data/model parallelism frameworks, which, in turn, can leverage High-Performance Computing infrastructures. This course introduces distributed analytics and streams through frameworks like Apache Hadoop and Spark, along with virtualization and containerization platforms that allow us to scale such frameworks in supercomputing environments.
High-performance RISC-V computing and AI acceleration - an open platform perspective by Luca Benini and Andrea Bartolini
The RISC-V ISA is disrupting the computing continuum, from tiny microcontrollers to supercomputers. The lecture will discuss recent advances in high performance general-purpose processors and accelerators based on RISC-V with a special emphasis on the innovations enabled by the extensibility of the ISA. Several examples will be provided from the experience gathered in the context of the EPI (European Processor Initiative) in designing and prototyping open-source high-performance acceleration engines for Machine Learning and Artificial intelligence workloads (training and inference).
Vector computing on an embedded open-source processor platform by Mauro Olivieri
Due to the demand for high computing power on the extreme edge of the IoT hierarchy, there is a convergence of challenges between classic HPC and high-performance embedded computing. The lecture will be composed as follows: a 1-hour introductory lecture analysing the inter-relations between the circuit level and the microarchitecture level with a view to power efficiency, focusing on vector computing acceleration; a 1-hour training lecture on the architecture, tool chain and runtime libraries of the open-source, RISC-V compliant Klessydra processor family with parameterized vector acceleration; a 2-hour training exercise session on the programming and RTL simulation of a Klessydra processor for executing widely known linear algebra kernel benchmarks. Attendees will experiment with several vector acceleration hardware configurations and compare their performance. Acknowledgement: the training exercise session setup will be taken care of by Dr Marcello Barbirotta and Dr Abdallah Cheikh, both from Sapienza University.
A gentle introduction to quantum computation by Alba Cervera
In this lecture, I will address the basic concepts of quantum computation, its potential applications and the state of the art of this technology. We will review the motivation behind the use of quantum mechanical phenomena to process information and how we can implement these properties with real quantum devices. Finally, we will explore the potential of integrating quantum machines in HPC infrastructures.
A conversation with Leslie Lamport by Leslie Lamport
Turing laureate Leslie Lamport will hold a conversation with our young scientist Alba Cervera on his past scientific life and his current research activities. The idea is to inspire the new generation of computer scientists for their future careers and to learn from one of the most brilliant computer scientists of our time. The contrast with a young scientist involved in novel computing science trends such as quantum computing will provide a stimulating environment for the students to engage and ask interesting questions. We assume that the students will have read about Leslie’s work and accomplishments on the web and in the information that Leslie will circulate in advance.
Barcelona Supercomputing Centre Director's final keynote by Mateo Valero