Lecture Descriptions

Machine Learning Accelerators: From Cloud to Edge by Luca Benini & Francesco Conti

Machine learning (ML), and more specifically deep learning (DL), in both training and inference, has rapidly become the key workload for a wide range of computing systems: high-performance supercomputers, cloud data centers, small clusters and servers, embedded computers and even mobile or IoT devices. As a consequence, industry and academia have been working with unprecedented focus to squeeze out energy efficiency and performance by tuning and specializing systems and architectures for machine learning workloads. This monumental effort has produced, in less than five years, an enormous proliferation of ML (DL) accelerator architectures, with a number of exciting new ideas and designs. The goal of this lecture is to provide practical knowledge of the main architectural patterns used in the design of ML accelerators and to analyze their hardware and software embodiments. The lecture will also offer a deep dive into ultra-low power accelerators for edge devices.

Cache and Memory Compression Techniques by Per Stenström

Cache and memory capacity have a significant impact on performance, energy consumption and cost in today's computers, ranging from smartphones and laptops/desktops to server systems in data centers. One promising approach to improve the utilization of a given amount of cache or main memory is to compress the data contained in it. However, a compressed cache or memory design involves several challenges: how to access compressed data quickly, which compression algorithm to choose, and how to locate, compress and recompress data. This course offers an overview of state-of-the-art techniques for cache and memory compression and goes into detail on some of the recent and ongoing advances in this area.

Energy Efficient Approach in ML Architecture by Uri Weiser

For the last 40 years, Process Technology and Computer Architecture have been orchestrating the magnificent growth in computing performance; Process Technology was the main locomotive, while Computer Architecture contributed only about one third of the performance outcome. It seems that we have reached a major turning point: Moore’s law is reaching its end and Dennard scaling has already ended, while performance requirements continue to soar for many new exciting applications. The combination of “new” killer applications (ML) and the trend towards Heterogeneous computing provides a new thrust in computer architecture. In this session we will present this change of environment and the development of an analytical model (MultiAmdahl) that provides a basis for optimally using limited resources (e.g. memory, area) to achieve a target goal (e.g. maximum performance, minimum energy). We will apply the MultiAmdahl model to a specific Neural Network implementation.

The Convergence Between Supercomputers and IoT Nodes by Mauro Olivieri

Design concepts and their application: In the near future, the computer architecture panorama is going to exhibit a convergence of targets between HPC and AI embedded processing for IoT. The lecture will analyse how the requirements of computing speed, power efficiency and memory demand translate into processor design in such a converging scenario, taking into account the inter-relations between the circuit level and the micro-architecture level. Two examples, an HPC vector processor and an AI-oriented IoT embedded processor, will be illustrated and compared. The hands-on session will include RTL design exercises showing architecture/circuit parameter exploration.

Hybrid Task Based Programming in the HPC Domain by Jesús Labarta

Prof. Labarta will start his seminar by addressing how architectural evolutions and multicores have impacted the way we program our machines, and by presenting his vision of how we should proceed in order to ensure productivity and performance. In this context, programming models, their runtime implementation and the architectural support play a key role in the success of our efforts towards exascale. The seminar will then continue by presenting task-based programming (with emphasis on OpenMP and its OmpSs forerunner, developed at BSC), its hybridization with the MPI message passing interface (TAMPI, Task-aware MPI) and support for Dynamic Load Balancing through the DLB library. During his lecture, Prof. Labarta will discuss some examples showing how future runtime-aware architectures can provide the support that the parallel runtime system requires in order to make the appropriate decisions.

Scheduling Policies and Performance Models for Disaggregated Architectures by David Carrera

Disaggregated Computing refers to a computer organization model in which resources are not exposed as pre-configured computers; instead, pools of resources are packaged together (memory-dense nodes, storage-dense nodes, accelerator-dense nodes), interconnected with very fast network fabrics, and supervised by a global system manager that can dynamically interconnect those resources to assemble general-purpose nodes on demand. In this lecture, we will explore the technologies used to enable resource pooling and disaggregation, the changes needed in scheduling policies, and the applications that can benefit from this paradigm.

Who can participate?

The School is open to participants from all countries and all nationalities. Advanced undergraduate and graduate (MSc and PhD) students, postdocs, young faculty, and other academic and industrial researchers interested in an intensive course on HPC Computer Architectures for AI and dedicated applications are welcome to apply.

Apply to the 2019 Barcelona ACM Europe Summer School

Admission to the Summer School will be based on the candidate’s profile and academic and professional experience, together with their motivation letter and two recommendation letters. To be considered, complete the online application form here. The top 10 applicants will receive free registration. The deadline for applying is 1 May 2019.



Industry Sponsors