Lecture Descriptions

Developing complex workflows that integrate HPC, Artificial Intelligence and Data analytics by Rosa M. Badía

The evolution of High-Performance Computing (HPC) systems towards ever more complex machines opens the opportunity of hosting larger and more heterogeneous applications. In this sense, the demand for developing applications that are not purely HPC, but that combine aspects of Artificial Intelligence and/or Data analytics, is increasingly common. However, there is a lack of environments that support the development of these complex workflows. The lecture will present PyCOMPSs, a parallel task-based programming model for Python. Based on simple annotations, sequential Python programs can be executed in parallel on HPC clusters and other distributed infrastructures. PyCOMPSs has been extended to support tasks that invoke HPC applications and can be combined with Artificial Intelligence and Data analytics frameworks. Some of these extensions are made in the framework of the eFlows4HPC project, which in addition is developing the HPC Workflows as a Service (HPCWaaS) methodology to ease the development, deployment, execution and reuse of workflows. The lesson will present the current status of the PyCOMPSs programming model and how it is being extended in the eFlows4HPC project towards the project's needs. Also, the HPCWaaS methodology will be introduced.
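As a rough illustration of the annotation-based model described above, the sketch below mimics task-based parallelism using only Python's standard library; it is not PyCOMPSs code (a real program would use the PyCOMPSs @task decorator and its synchronization API instead), and the function name increment is a made-up example.

```python
# Illustrative analogue of task-based parallelism in the PyCOMPSs style,
# using only the standard library. In PyCOMPSs, annotating a function as
# a task lets the runtime run independent calls in parallel; here a
# thread pool plays the role of that runtime.
from concurrent.futures import ThreadPoolExecutor

def increment(x):
    # A "task": an ordinary sequential Python function.
    return x + 1

def main():
    with ThreadPoolExecutor() as pool:
        # Each call is submitted as an independent task; the executor
        # schedules them concurrently, as the runtime would on a cluster.
        futures = [pool.submit(increment, i) for i in range(4)]
        # Collecting results is the synchronization point.
        return [f.result() for f in futures]

if __name__ == "__main__":
    print(main())
```

The key point the sketch captures is that the program text stays sequential in style: parallelism comes from the runtime detecting independent task invocations, not from explicit thread management in the application code.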

Specializing Processors for ML by Luca Benini

Specializing Processors for ML - Topics covered: ML workload analysis and key requirements. Limitations of classical instruction set architectures (ISAs) for ML. ISA extensions for ML. Micro-architecture of ML-specialized cores. PPA (power, performance, area) optimization. Energy efficiency analysis.

From Single to Multi-Core Heterogeneous SoCs for ML by Luca Benini

From Single to Multi-Core Heterogeneous SoCs for ML - Topics covered: Architecture of low-power multi-core ML SoCs. Improving efficiency and PPA. Integration of cores and hardwired ML accelerators. Memory hierarchy challenges and solutions.

Scaling up ML Hardware Architectures by Luca Benini

Scaling up ML Hardware Architectures - Topics covered: Heterogeneous accelerators. Scalable accelerators. Near-memory and in-memory architectures. Scaling beyond a single die: systems-in-package and chiplets.

Distributed Data Analytics in Supercomputing Systems: Theory by Josep Lluis Berral

Distribution of data processing is a requisite for modern analytics and machine learning applications, where High-Performance Data Analytics leverages data/model parallelism frameworks, which in turn can leverage High-Performance Computing infrastructures. This course introduces distributed analytics and streams through frameworks like Apache Hadoop and Spark, along with the virtualization and containerization platforms that allow us to scale such frameworks in supercomputing environments.
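To ground the data-parallel model that frameworks like Hadoop and Spark implement, here is a minimal word count in plain Python that follows the same map/reduce structure; in actual Spark this shape corresponds to a flatMap, map, and reduceByKey pipeline distributed over a cluster, whereas this stdlib sketch runs on one machine and all names in it are illustrative.

```python
# A map/reduce word count using only the standard library, mirroring the
# structure a Spark job distributes across workers:
#   "flatMap" (split lines into words) -> "map"/"reduceByKey" (count per key)
from collections import Counter
from itertools import chain

def word_count(lines):
    # "flatMap": every line expands into a stream of words. In Spark,
    # partitions of the input would be processed on different workers.
    words = chain.from_iterable(line.split() for line in lines)
    # "map" + "reduceByKey": pair each word with a count of 1 and sum
    # the counts per key; Spark performs this with a shuffle between nodes.
    return Counter(words)

if __name__ == "__main__":
    print(word_count(["big data big compute", "big data"]))
```

The design point worth noting is that the per-record operations are stateless, which is exactly what lets a framework split the input across nodes and combine partial counts afterwards.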

RISC-V Vector by Roger Ferrer and Jesús Labarta

  • Course Intro
  • Vectorization with the RISC-V Vector Extension

The EPI project by Roger Ferrer and Jesús Labarta

  • The EPI project
  • Philosophy behind the EPAC RVV design
  • EPAC: overall design
  • Test chip and SDVs (description)

Software Development Vehicles by Roger Ferrer and Jesús Labarta

  • Vehave & MUSA. Presentation (model parameters, some results) and demo
  • SDV @ FPGA. Presentation. Demo of access and basic instrumentation and trace

Use case by Roger Ferrer and Jesús Labarta

  • Application analysis, vectorization and optimization loop
  • Closure

Neuromorphic engineering for low-power edge intelligence by Charlotte Frenkel

The field of neuromorphic engineering aims at replicating the brain’s key organizing principles in silico toward order-of-magnitude efficiency improvements compared to current processor architectures. It is now included in worldwide research roadmaps, has seen a tenfold increase in yearly research output over the last decade, and fuels interest from large industrial players as well as a flourishing landscape of new startups. As the field is not yet consolidated, a wide diversity of design approaches are still actively being explored. This lecture will start with an overview of the key trends in neuromorphic engineering. From brain-inspired computation and learning algorithms to digital architectures, this course will then show how neuromorphic hardware/algorithm co-design can best serve the development of low-power edge intelligence.
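To make the brain-inspired computation mentioned above concrete, the following sketch implements a leaky integrate-and-fire (LIF) neuron, a common basic unit in neuromorphic designs; the parameter values and function name are arbitrary choices for illustration, not taken from the lecture.

```python
# A leaky integrate-and-fire (LIF) neuron: the membrane potential leaks
# toward rest, accumulates input current, and emits a binary spike when
# it crosses a threshold, after which it resets. Parameter values are
# arbitrary and chosen only for illustration.

def lif_run(inputs, leak=0.9, threshold=1.0):
    v = 0.0          # membrane potential
    spikes = []
    for i in inputs:
        v = leak * v + i      # leaky integration of the input current
        if v >= threshold:    # fire and reset when the threshold is crossed
            spikes.append(1)
            v = 0.0
        else:
            spikes.append(0)
    return spikes

if __name__ == "__main__":
    print(lif_run([0.5, 0.5, 0.5, 0.0, 1.2]))  # -> [0, 0, 1, 0, 1]
```

Because activity is encoded as sparse binary spikes rather than dense activations, hardware implementing such neurons only consumes energy when spikes occur, which is one root of the efficiency gains the field pursues.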

European Processor Initiative: cornerstone of EU digital agenda and EU digital sovereignty by Mario Kovač

The importance of high-performance computing (HPC) has been recognized as key for most segments of industry and society. However, the need to collect and efficiently process vast amounts of data requires exascale computing systems (capable of performing at least 10^18 floating-point operations per second), and that comes at a price. The approach to HPC systems design requires significant changes for the exascale era. The most energy-efficient state-of-the-art high-performance computing systems feature novel architectures for general-purpose processors and integrate accelerator processors to achieve the best possible efficiency. The global race for brand-new exascale microprocessor architectures was recognized by the EU as a unique opportunity to create a new European microprocessor industry and address EU sovereignty challenges. The European Processor Initiative (EPI), which we present here, is a strategic EU project with the goal of developing key components for the European Union to equip itself with a world-class supercomputing infrastructure: European general-purpose processor and accelerator technologies with drastically better performance/power ratios, tackling important segments of broader and/or emerging HPC and Big Data markets.

Highlighting Research through the ACM and ACM-W by Ruth Lennon

Many researchers find it difficult to get their work highlighted. Funding is a primary requirement for establishing connections across the globe. The ACM and the ACM-W provide a variety of mechanisms to promote the work of researchers. In this talk you will learn how membership levels can be used to promote your work. Grants and scholarships can be used to improve your connections and facilitate collaboration. The network of members provides unique access to high-level researchers with whom you can work to establish the next steps in your career path.

Customised vector acceleration in edge-computing RISC-V soft-cores by Mauro Olivieri

Due to the demand for high computing power on the extreme edge of the IoT hierarchy, there is a convergence of challenges between classical HPC and high-performance embedded computing on the edge. The lecture will analyse how the requirements of computing speed and power efficiency translate into fundamental quantitative concepts for processor design, taking into account the inter-relations between circuit level and micro-architecture level. Then, the focus of the lecture will be on vector acceleration support integrated in soft-processors to be implemented in FPGA devices, which are a first technology choice for embedded computing. The framework for illustrating the key concepts will be the open-source, RISC-V compliant Klessydra processor family. The perspective of configurable vector acceleration will be described as a viable way for edge computing optimization. The hands-on session will include simple instruction-level exercises, showing the resulting performance advantage.

The European Processor Initiative: The Memory Subsystem of EPAC by Per Stenström

The European Processor Initiative is a major European undertaking to develop computer technology that promises higher performance/power ratios. One of the project streams develops a RISC-V based accelerator. The project was launched in 2018 and has already fabricated a test chip that fulfills its objectives. This lecture will focus on the memory subsystem of the RISC-V based accelerator that Chalmers and ZeroPoint Technologies have contributed to. One aspect of it is the maintenance of cache coherence using directory-based solutions. We will review the challenges involved in verifying the directory protocol and in maintaining a high throughput to meet the performance goals of accelerators. A second aspect covered is how to sustain a high memory bandwidth in future servers. To this end, ZeroPoint Technologies offers a solution in which substantially higher memory bandwidth can be provided by compressing the data in main memory. We will review the challenges involved in compressing and decompressing data at very high speed, in compacting the compressed data to free up bandwidth, and in managing the compressed memory content.

Exploring Energy-Efficiency Tradeoffs for Parallel Scientific Applications by Valerie Taylor

The demand for computational power continues to drive the deployment of ever-growing parallel systems. Production parallel systems with hundreds of thousands of components are being designed and deployed. Future parallel systems are expected to have millions of processors and hundreds of millions of cores, with significant power requirements. The complexity of these systems is increasing, with hierarchically configured manycore processors and accelerators, together with a deep and complex memory hierarchy. As a result of this complexity, applications face a significant challenge in exploiting the necessary parameters for efficient execution. While reducing execution time is still the major objective for high-performance computing, future systems and applications will have additional power requirements that represent a challenge for energy efficiency. To embrace these key challenges, we must understand the complicated tradeoffs between runtime and power. This talk will present our methods and analyses to explore these tradeoffs for parallel, scientific applications.

Robot learning in assistive contexts poses both computing and ethics challenges by Carme Torras

The combination of autonomous robots, artificial intelligence and the internet of things offers immense possibilities to improve healthcare and assistance in daily living activities. A key challenge in this context is to attain safe, friendly and effective robot interaction with both caregivers and patients. This requires user modeling, personalization, reliable and situated communication, quick reaction to changing conditions… in sum, high robot adaptability. The challenge is being addressed through a myriad of machine learning procedures for perception, planning, control and actuation, which differ from the main trend in their reliance on small data and latent spaces to cope with the strict computing demands involved. Assistive robots also pose ethical and social challenges, many practical ones stemming from robot decision-making conflicting with human freedom and dignity. Several institutions are developing regulations and standards, and many ethics education initiatives have emerged aimed at schools, universities, and the general public, in which science fiction often plays a prominent role by highlighting the pros and cons of possible future scenarios.

Practical introduction to programming Deep Learning on a BSC Supercomputer: Theory by Jordi Torres

The next generation of Deep Learning applications imposes new and demanding requirements on computing infrastructures. What do the computer systems that support Deep Learning look like? How do you program these systems? The course consists of a first theoretical part that introduces Deep Learning (for those students who do not have previous knowledge of the field). The second part provides the student with written documentation so that he/she can follow the learning process at his/her own pace (even after the session is over). This second part covers basic concepts in the parallel training of Deep Learning applications, and a hands-on handbook guides the student through an image classification problem using the MareNostrum supercomputer. The teacher will be in the classroom to answer any questions.
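As a hedged sketch of the data-parallel training idea covered in the second part (this is not the course's actual material; the functions and parameter values below are made up for illustration): each worker computes gradients on its own shard of the data, and the gradients are averaged before one shared weight update.

```python
# Minimal illustration of data-parallel training: each "worker" computes
# the gradient of a least-squares loss on its own data shard, and the
# shards' gradients are averaged (an "all-reduce" step) before a single
# shared update. A real setup would use a DL framework across GPU nodes.

def gradient(w, shard):
    # d/dw of the mean of 0.5 * (w*x - y)^2 over the (x, y) pairs in the shard
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, shards, lr=0.1):
    grads = [gradient(w, s) for s in shards]  # computed in parallel in practice
    avg = sum(grads) / len(grads)             # all-reduce: average across workers
    return w - lr * avg                       # every worker applies the same update

if __name__ == "__main__":
    # Two shards of (x, y) pairs drawn from y = 2*x; start from w = 0.
    shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
    w = 0.0
    for _ in range(200):
        w = data_parallel_step(w, shards)
    print(round(w, 3))  # converges to the true slope, 2.0
```

The averaging step is the crux: because all workers apply the same averaged gradient, the model replicas stay identical, which is what makes this scheme equivalent to training on the full mini-batch at once.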

Closing lecture by Mateo Valero

Mateo Valero, director of the BSC, will conclude the ACM school with a final talk.

Innovation path: From General Purpose CPU to Machine Learning Accelerator by Uri Weiser

For the last 40 years, process technology and general-purpose computer architecture have orchestrated the magnificent growth in computing performance; process technology was the main locomotive, while computer architecture contributed only about a third of the performance outcome. Nowadays, we have reached a major turning point: Moore’s law and Dennard scaling are reaching their end, while performance requirements continue to soar for many new exciting applications. The combination of the new “killer applications” (Machine Learning) and the trend towards heterogeneous computing provides a new innovative thrust in computer architecture. In this session I will present the transformative change in computing needed to support these new “killer applications”. This change, driven by Machine Learning, calls for new architectures. I will begin by reviewing the concept of Machine Learning and continue by explaining its computing structure. I will also elaborate on the demand for new computing capacity, which calls for efficient architectures that mitigate device power consumption. I will end by highlighting some of our specific research work on techniques to improve Machine Learning hardware efficiency, and its implications.