Lecture Descriptions

Practical Introduction to programming Deep Learning on a supercomputer by Jordi Torres

The next generation of Deep Learning applications imposes new and demanding requirements on computing infrastructures. What do the computer systems that support Deep Learning look like? How are these systems programmed? This course will start with an introduction to Deep Learning. Next, the student will learn the basic concepts of distributed and parallel training of Deep Learning models. Finally, the student will practice on an image classification problem using the MareNostrum supercomputer.

Distributed Data Analytics in Supercomputing Systems by Josep Lluis Berral

Distributed data processing is a requirement for modern analytics and machine learning applications. High-Performance Data Analytics builds on data/model parallelism frameworks, which in turn can take advantage of High Performance Computing infrastructures. This course introduces distributed analytics and streams through frameworks like Apache Hadoop and Spark, along with the virtualization and containerization platforms that allow us to scale such frameworks in supercomputing environments.

Many shades of Machine Learning Acceleration – an Open RISC-V Platform Perspective by Luca Benini

The next wave of “Extreme Edge AI” pushes signal processing and machine learning aggressively towards sensors and actuators, with sub-mW (TinyML) power budgets, while at the same time raising the bar in terms of accuracy and flexibility. To succeed in this balancing act, we need principled ways to walk the line between general-purpose and highly specialized architectures. In the lecture I will detail how to walk this line, drawing on key ideas in the literature, best practices in industry, and the experience of 40+ chip tape-outs of the open PULP (parallel ultra-low power) platform, which is based on RISC-V processors coupled with domain-specific acceleration engines for Machine Learning applications from the Cloud to the Extreme Edge.

Performance analysis and hybrid programming in HPC by Jesús Labarta

The course will present a vision of the evolution of processor architecture, its impact on how HPC systems are used and programmed, and the fundamental approaches I believe HPC should take to maximize efficiency and productivity. In this context, the course focuses on two topics: performance analysis and hybrid programming. We will present the BSC tools for performance analysis and show how they can be used to understand in depth how our current HPC systems and applications behave. This will include the Paraver trace browser, the Dimemas simulator and a methodology for top-down analysis of application performance. In terms of programming model, we will focus on asynchronous and malleable programming with MPI + OpenMP tasks. Simple assignments will be provided to put some of the described methodologies into practice. Experience gained in the course of the European Processor Initiative will also be described.

Energy Efficient Approach in ML Architecture by Uri Weiser

For the last 40 years, Process Technology and Computer Architecture have been orchestrating the magnificent growth in computing performance; Process Technology was the main locomotive, while Computer Architecture contributed only about one third of the performance outcome. It seems that we have reached a major turning point: Moore’s law is reaching its end and Dennard scaling has already ended, while performance requirements continue to soar for many new exciting applications. The combination of new “killer applications” (Machine Learning) and the trend towards Heterogeneous computing provides a new thrust in computer architecture. In this session I will present the transformative change in computing needed to support the new “killer applications”. This change in computing, driven by Machine Learning, calls for new architectures. First I’ll review the concept of Machine Learning and explain its computing structure. The huge new demand for computing capacity calls for efficient architectures that mitigate power consumption. I will highlight some of our specific research on techniques to improve the efficiency of Machine Learning hardware, and its implications.

Vector acceleration in HPC and Edge Devices by Mauro Olivieri

The computer architecture scenario is exhibiting a convergence of targets between classical HPC and high-performance embedded computing on the Edge. The lecture will analyse how the requirements of computing speed, power efficiency and memory traffic demand translate into quantitative concepts for processor design in such a converging scenario, taking into account the inter-relations between the circuit level and the micro-architecture level. The focus of the lecture will be vector acceleration support. Two examples, an HPC vector processor and an IoT-oriented embedded soft-processor, will be illustrated and compared. The hands-on session will include architecture design exercises and instruction-level exercises, showing architecture/circuit parameter exploration.