Keynotes & Courses

Keynotes

Data Science in the Era of Heterogeneity

given by Gustavo Alonso

Computing platforms are evolving rapidly along many dimensions: processors, specialization, disaggregation, acceleration, smart memory and storage, etc. Many of these developments are driven by data science, but they also arise from the need to make cloud computing more efficient. From a practical perspective, the result today is a deluge of possible configurations and deployment options. Most of them are too new for their performance implications to be well understood, and they lack proper support in the form of tools and platforms that can manage the underlying diversity. The growing heterogeneity opens up many opportunities but also raises significant challenges. In the talk I will describe the trend towards specialization at all layers of the architecture and the possibilities it opens up, and demonstrate with real examples how to take advantage of heterogeneous computing platforms. I will also discuss opportunities for systems research in the context of data science, on both the software and the hardware side.


Re-configuring data practices for Intelligent, Reliable and Responsible decision-making systems 

given by Timos Sellis

In this talk we will focus on how data management practices need to be re-configured in order to support Intelligent, Reliable and Responsible decision-making systems. The appetite for effective use of information assets has been steadily rising in both public and private sector organisations. However, whether the information is used for social good or commercial gain, there is a growing recognition of the complex socio-technical challenges involved in balancing the diverse demands of regulatory compliance and data privacy, social expectations and ethical use, business process agility and value creation, and the scarcity of data science talent. We highlight these interconnected challenges and introduce Information Resilience as a scaffold within which the competing requirements of responsible and agile approaches to information use can be positioned. The aim is to develop and present a manifesto for Information Resilience that can serve as a reference for future research and development in the relevant areas of Responsible Data Management.


The Data Systems Grammar: Self-designing Systems for the era of AI

given by Stratos Idreos




Courses

Querying Graph Databases 

given by Angela Bonifati

Graph data modeling and querying arise in many practical application domains, such as social, biological and fraud detection networks, where the primary focus is on concepts and their relationships and on complex graph patterns involving multiple labels and lightweight recursion. In this lecture, I present a concise, unified view of the current challenges that arise over the complete life cycle of formulating and processing queries on graph databases. To that end, I present all major concepts relevant to this life cycle, formulated in terms of a common and unifying ground: the property graph data model, which is the predominant data model adopted by modern graph database systems. I also introduce property graph schemas and graph indexing techniques for label-constrained reachability queries on graph databases. Practical work will follow, with a focus on query-driven pangenomic analysis using an open-source graph database.
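A label-constrained reachability query asks whether a target vertex can be reached from a source vertex along a path whose edge labels all belong to a given set. The minimal Python sketch below answers such a query online with breadth-first search, which is the index-free baseline that graph indexing techniques generally try to avoid repeating for every query. The function name `lcr_reachable` and the representation of edges as `(source, label, target)` triples are illustrative assumptions, not the API of any graph database.

```python
from collections import deque


def lcr_reachable(edges, source, target, allowed_labels):
    """Return True if `target` is reachable from `source` using only edges
    whose label is in `allowed_labels` (online BFS, no index)."""
    # Adjacency list: node -> list of (edge label, neighbor).
    adjacency = {}
    for src, label, dst in edges:
        adjacency.setdefault(src, []).append((label, dst))

    visited = {source}
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        if node == target:
            return True
        for label, neighbor in adjacency.get(node, []):
            # Only follow edges whose label satisfies the constraint.
            if label in allowed_labels and neighbor not in visited:
                visited.add(neighbor)
                frontier.append(neighbor)
    return False
```

For example, with the hypothetical edges `("alice", "knows", "bob")` and `("bob", "worksAt", "acme")`, the vertex `"acme"` is not reachable from `"alice"` under the label set `{"knows"}`, but it is under `{"knows", "worksAt"}`.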


The power of graph neural networks 

given by Floris Geerts

Graph neural networks (GNNs) have become a prominent technique for graph learning tasks such as vertex and graph classification, link prediction and graph regression. It was recently shown that classical GNNs have limited expressive power. This led to the proposal of a plethora of new, more expressive graph learning architectures. In this course we will present a systematic investigation of the expressive power of these different architectures, using techniques from areas such as graph algorithms, logic and query languages. The goal is to introduce various ways of boosting the expressive power of GNNs and to provide techniques for estimating it. The conceptual part of the course is complemented by practical coding sessions showing how theory and practice compare.
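One common way to make "limited expressive power" concrete is the 1-dimensional Weisfeiler-Leman (1-WL) color refinement test, which is known to upper-bound the ability of classical message-passing GNNs to distinguish graphs. The abstract does not name a specific test, so 1-WL is used here only as an illustration. The minimal Python sketch below refines colors jointly on two graphs (so color ids are comparable) and reports whether their color histograms differ; the function names and the dict-of-neighbor-lists graph format are assumptions for this example.

```python
from collections import Counter


def wl_histograms(graphs, rounds):
    """Run 1-WL color refinement jointly on several graphs.
    Each graph is a dict: node -> list of neighbors."""
    colorings = [{v: 0 for v in g} for g in graphs]  # uniform initial color
    for _ in range(rounds):
        # Signature = own color plus the sorted multiset of neighbor colors.
        signatures = [
            {v: (col[v], tuple(sorted(col[u] for u in g[v]))) for v in g}
            for g, col in zip(graphs, colorings)
        ]
        # One shared palette so that equal signatures get equal colors in all graphs.
        all_sigs = sorted({s for sig in signatures for s in sig.values()})
        palette = {s: i for i, s in enumerate(all_sigs)}
        colorings = [{v: palette[sig[v]] for v in sig} for sig in signatures]
    return [Counter(col.values()) for col in colorings]


def wl_distinguishes(g1, g2):
    """True if 1-WL assigns different color histograms to g1 and g2."""
    rounds = len(g1) + len(g2)  # enough rounds for the coloring to stabilize
    h1, h2 = wl_histograms([g1, g2], rounds)
    return h1 != h2


def cycle(n, offset=0):
    """Cycle graph on nodes offset .. offset + n - 1."""
    return {offset + i: [offset + (i - 1) % n, offset + (i + 1) % n] for i in range(n)}
```

With this sketch, a 6-cycle and two disjoint triangles (`{**cycle(3), **cycle(3, offset=3)}`) are not distinguished, even though the graphs are not isomorphic: every vertex has degree 2, so refinement never splits the colors. A 3-vertex path and a triangle, by contrast, are told apart after one round.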


Spatial and multi-dimensional indexing and data analytics 

given by Nikos Mamoulis

Smart telecommunication and IoT devices have become a commodity and have made huge volumes of spatial and multi-dimensional data available, rendering their search and analysis affordable for small companies and even for personal use. In this course, we will study the most fundamental spatial and multi-dimensional access methods and the most popular search and analysis tasks that they support.
 
Outline: 
 
Part 1. Fundamental spatial access methods and search operations for spatial and low-dimensional data 
Spatial data types, relationships, and queries. Spatial data analytics. Multi-dimensional access methods for points. Spatial access methods for non-point data. Evaluation of spatial queries and data analytics tasks. 

Part 2. Access methods and similarity search for multi-dimensional data and metric spaces 
Distance and similarity. The curse of dimensionality. Similarity search in multi-dimensional metric spaces. Multi-dimensional data analytics tasks. 
 
Part 3. Scalable spatial access methods 
Scalable in-memory spatial indexing. Parallel and distributed spatial data management. Big spatial data analytics. 
 
Part 4. Learned and adaptive spatial and multi-dimensional indexing 
Learned indexes for multi-dimensional data. Adaptive indexes for spatial and multi-dimensional data. 
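As one classic example of a multi-dimensional access method for points (the outline does not list the specific structures covered), the minimal Python sketch below builds a 2-d tree and answers an axis-aligned range query, skipping any subtree that cannot intersect the query rectangle. The function names and the dict-based node layout are illustrative assumptions; re-sorting at every level keeps the code short rather than optimal.

```python
def build_kdtree(points, depth=0):
    """Build a 2-d tree from a list of (x, y) tuples; returns nested dicts or None."""
    if not points:
        return None
    axis = depth % 2  # alternate between x (0) and y (1)
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        # Left: coordinates <= median on this axis; right: coordinates >= median.
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def range_query(node, lo, hi, out=None):
    """Collect all points p with lo[i] <= p[i] <= hi[i] for both axes."""
    if out is None:
        out = []
    if node is None:
        return out
    p, axis = node["point"], node["axis"]
    if all(lo[i] <= p[i] <= hi[i] for i in range(2)):
        out.append(p)
    # Visit a subtree only if its half-plane can overlap the query rectangle.
    if lo[axis] <= p[axis]:
        range_query(node["left"], lo, hi, out)
    if hi[axis] >= p[axis]:
        range_query(node["right"], lo, hi, out)
    return out
```

On a 10 x 10 integer grid, for instance, `range_query(build_kdtree(grid), (2, 3), (5, 4))` returns the eight points with x in 2..5 and y in 3..4, the same set a brute-force scan would produce.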


Computational Methods to Counter Online Misinformation  

given by Paolo Papotti

Misinformation is an important problem, but mitigators are overwhelmed by the amount of false content produced online every day. To assist human experts in their efforts, several projects propose computational methods that aim to support the detection of malicious content online. In the first part of the lecture, we will give an overview of the different approaches, ranging from solutions involving humans and a crowd of users to fully automated approaches. In the second part, we will focus on data-driven verification for computational fact checking. We will review methods that combine solutions from the ML and NLP literature to build data-driven verification, such as those that translate text claims into SQL queries on relational databases. We will also cover how the rich semantics of knowledge graphs and pre-trained language models can be used to verify claims and produce explanations, which is a key requirement in this space. Better access to data and new algorithms are pushing computational fact checking forward, with experimental results showing that verification methods enable effective labeling of claims, both in simulations and in real-world efforts. However, while fact checkers are starting to adopt some of the resulting tools, the fight against misinformation is far from won. In the last part of this lecture, we will cover the opportunities and limitations of computational methods and their role in fighting misinformation.
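Once a text claim has been mapped to a structured form, checking it against a relational database reduces to running a query. The minimal Python sketch below assumes that mapping has already been done (the step that ML and NLP methods automate) and uses a hypothetical template-based `verify_claim` function; the `stats` table, its columns, the regions and all figures are made up for illustration and do not come from any published system. Returning the retrieved value alongside the verdict gives a very basic form of explanation.

```python
import sqlite3

# Hypothetical toy table; every value below is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (region TEXT, year INTEGER, unemployment REAL)")
conn.executemany(
    "INSERT INTO stats VALUES (?, ?, ?)",
    [("RegionA", 2020, 7.5), ("RegionA", 2021, 6.9), ("RegionB", 2021, 4.2)],
)

# Whitelist of comparison operators that may be spliced into the SQL text.
ALLOWED_OPERATORS = {">", "<", ">=", "<=", "="}


def verify_claim(conn, region, year, op, value):
    """Check a structured claim such as ("RegionA", 2021, ">", 6.0), i.e.
    'unemployment in RegionA was above 6.0 in 2021'.
    Returns (verdict, evidence_value), or None if no matching row exists."""
    if op not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {op}")
    # The operator is whitelisted; all data values are bound as parameters.
    sql = (
        f"SELECT unemployment, unemployment {op} ? "
        "FROM stats WHERE region = ? AND year = ?"
    )
    row = conn.execute(sql, (value, region, year)).fetchone()
    if row is None:
        return None
    evidence, holds = row
    return bool(holds), evidence
```

Here `verify_claim(conn, "RegionA", 2021, ">", 6.0)` yields `(True, 6.9)`, while a claim about a region and year missing from the table yields `None`, meaning the database cannot support or refute it.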
