Practical Approaches to Data Analytics in High Performance Computing by Michela Taufer
This tutorial studies the high-impact aspects of data analytics and high performance computing from a practical perspective. In this tutorial, students will learn how to use distributed programming models such as MapReduce and how to implement clustering and classification algorithms in MapReduce to enable scalable analysis of datasets across domains (e.g., medical, biological, and social sciences) on high-end clusters and supercomputers.
Specifically, the course provides a practical introduction to data analytics, blending theory (e.g., of clustering algorithms and techniques for dealing with noisy data) and practice (e.g., using Apache Spark, Jupyter Notebooks, and GitHub). Over the course of four modules, students will become familiar with modern data science methods, gain comfort with the tools of the trade, and explore real-world datasets. Upon completing the tutorial, students will have: used Jupyter notebooks to create reproducible, explanatory data science workflows; learned a modern MapReduce implementation, Apache Spark; implemented parallel clustering methods in Spark; studied strategies for overcoming the common imperfections in real-world datasets; and applied their new skills to extract insights from high-dimensional datasets.
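As a rough illustration of what clustering in the MapReduce style can look like, the sketch below expresses one k-means iteration as a map step (assign each point to its nearest centroid) and a reduce step (sum the points per centroid). It is plain Python rather than Spark, and it is not taken from the tutorial material; the function names `assign`, `combine` and `kmeans_step` are hypothetical.

```python
from collections import defaultdict
from functools import reduce
from math import dist


def assign(point, centroids):
    """Map step: emit (index of nearest centroid, (point, 1))."""
    idx = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return idx, (point, 1)


def combine(a, b):
    """Reduce step: add two (coordinate sum, count) pairs."""
    (sum_a, n_a), (sum_b, n_b) = a, b
    return tuple(x + y for x, y in zip(sum_a, sum_b)), n_a + n_b


def kmeans_step(points, centroids):
    """One k-means iteration: map, group by key, reduce, then average."""
    grouped = defaultdict(list)
    for key, value in map(lambda p: assign(p, centroids), points):
        grouped[key].append(value)

    # Centroids that received no points keep their previous position.
    new_centroids = list(centroids)
    for key, values in grouped.items():
        total, count = reduce(combine, values)
        new_centroids[key] = tuple(x / count for x in total)
    return new_centroids
```

Because `combine` is associative and commutative, a distributed framework could apply it per partition before merging partial results, which is what makes this formulation scale.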
Machine Learning Accelerators: From Cloud to Edge by Luca Benini & Francesco Conti
Machine learning (ML), and more specifically deep learning (DL), both training and inference, has rapidly become the key workload for a wide range of computing systems: high-performance supercomputers, cloud data centers, small clusters and servers, embedded computers and even mobile or IoT devices. As a consequence, industry and academia have been working with unprecedented focus to squeeze out energy efficiency and performance by tuning and specializing systems and architectures for machine learning workloads. This monumental effort has produced, in less than five years, an enormous proliferation of ML (DL) accelerator architectures, with a number of exciting new ideas and designs. The goal of this lecture is to give practical knowledge of the main architectural patterns used in the design of ML accelerators and to analyze their hardware and software embodiments. The lecture will also offer a deep dive into ultra-low power accelerators for edge devices.
Cache and Memory Compression Techniques by Per Stenström
Cache and memory capacity have a significant impact on performance, energy consumption and cost in today's computers, ranging from smartphones and laptops/desktops to server systems in data centers. One promising approach to improve the utilization of a given amount of cache or main memory is to compress the data contained in it. However, a compressed cache or memory design involves several challenges, including how to access compressed data quickly, which compression algorithm to choose, and how to locate, compress and recompress data. This course offers an overview of state-of-the-art techniques for cache and memory compression and goes into detail on some of the recent and ongoing advances in this area.
Energy Efficient Approach in ML Architecture by Uri Weiser
For the last 40 years Process Technology and Computer Architecture have been orchestrating the magnificent growth in computing performance; Process Technology was the main locomotive, while Computer Architecture contributed only about 1/3 of the performance outcome. It seems that we have reached a major turning point; Moore’s law is reaching its end and Dennard scaling has already ended, while performance requirements continue to soar for many new exciting applications. The combination of “new” killer applications (ML) and the trend towards Heterogeneous computing provides a new thrust in computer architecture. In this session we will present the change in environment and the development of an analytical model (MultiAmdahl) that provides a basis for optimally using limited resources (e.g. memory, area) to achieve the target goal (e.g. maximum performance, minimum energy). We will apply the MultiAmdahl model to a specific Neural Network implementation.
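To make the idea of dividing a limited resource between execution units concrete, here is a toy sketch. It is not the MultiAmdahl formulation itself, whose details this abstract does not give. It assumes two workload segments with baseline times `times`, a shared area `budget`, and a speedup that grows with the square root of the area assigned to each unit; all of these names and the speedup function are illustrative assumptions.

```python
from math import sqrt


def total_time(times, areas):
    """Total run time when segment i runs on a unit of area a_i.

    Assumption: speedup of a unit is sqrt(area), so time is t / sqrt(a).
    """
    return sum(t / sqrt(a) for t, a in zip(times, areas))


def best_split(times, budget, steps=10000):
    """Grid-search the area split between two units that minimizes total time.

    Returns (total time, area of unit 1, area of unit 2).
    """
    best = None
    for k in range(1, steps):
        a1 = budget * k / steps
        a2 = budget - a1
        candidate = total_time(times, (a1, a2))
        if best is None or candidate < best[0]:
            best = (candidate, a1, a2)
    return best


if __name__ == "__main__":
    elapsed, a1, a2 = best_split(times=(8.0, 1.0), budget=100.0)
    print(f"area split: {a1:.1f} / {a2:.1f}, total time: {elapsed:.3f}")
```

Under this assumed speedup function the best split gives the longer segment more area, but less than its share of the run time would suggest, because the returns on additional area diminish.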
The Convergence Between Supercomputers and IoT Nodes by Mauro Olivieri
Design concepts and their application: In the near future, the computer architecture panorama is going to exhibit a convergence of targets between HPC and AI embedded processing for IoT. The lecture will analyse how the requirements of computing speed, power efficiency and memory demand translate into processor design in such a converging scenario, taking into account the inter-relations between circuit level and micro-architecture level. Two examples, an HPC vector processor and an AI-oriented IoT embedded processor, will be illustrated and compared. The hands-on session will include RTL design exercises showing architecture/circuit parameter exploration.
Hybrid Task Based Programming in the HPC Domain by Jesús Labarta
Prof. Labarta will start his seminar by addressing how architectural evolutions and multicores have impacted the way we program our machines, and his vision of how we should proceed with the objective of ensuring productivity and performance. In this context, programming models, their runtime implementation and the architectural support play a key role in the success of our efforts towards exascale. The seminar will then continue by presenting task-based programming (with emphasis on OpenMP and its OmpSs forerunner, developed at BSC), the hybridization with the MPI message passing interface (TAMPI, Task-aware MPI) and support for Dynamic Load Balancing through the DLB library. During his lecture, Prof. Labarta will discuss some examples showing how future runtime-aware architectures can provide the required support to the parallel runtime system so that it can make the appropriate decisions.
Scheduling Policies and Performance Models for Disaggregated Architectures by David Carrera
Disaggregated Computing refers to a computer organization model in which resources are not exposed as pre-configured computers, but instead, pools of resources are packaged together (memory-dense nodes, storage-dense nodes, accelerator-dense nodes), interconnected with very fast network fabrics, and supervised by a global system manager that can dynamically interconnect those resources to assemble general purpose nodes on demand. In this lecture, we will explore the technologies used to enable resource pooling and disaggregation, the needed changes in scheduling policies and the applications that can benefit from this paradigm.