Keynotes & Courses
Keynotes
Emotion detection for Misinformation and Conspiracy Detection
given by Sophia Ananiadou
Misinformation evokes emotional reactions associated with high-arousal emotions such as anger, anxiety, fear, and disgust. Emotions play a significant part in the detection of misinformation, fake news, and rumours. Most studies on emotion-based misinformation detection overlook regression tasks (e.g., sentiment strength and emotion intensity), which provide more fine-grained affective features than categorical labels. I will present a suite of open emotional large language models (LLMs), together with instruction-tuning datasets and an evaluation benchmark for multi-task affective analysis. The goal is to evaluate and enhance the ability of LLMs to perform comprehensive and complex affective analysis. Building on these emotional LLMs, I will then introduce another suite of LLMs that detect conspiracies based on affective information.
ML and Generative AI for Data Systems
given by Tim Kraska
Machine learning (ML) and Generative AI (GAI) are changing the way we build, operate, and use data systems. For example, ML-enhanced algorithms, such as learned scheduling algorithms and learned indexes/storage layouts, are being deployed in commercial data services; GAI code assistants help developers build features more quickly; ML-based techniques simplify operations by automatically tuning system knobs; and GAI-based assistants help debug operational issues. Most importantly, though, Generative AI is reshaping the way users interact with data systems. Today, all leading cloud providers already offer natural-language-to-SQL (NL2SQL) features in their Python notebooks or SQL editors to increase analysts' productivity. Business-line users are starting to use natural language in their visualization platforms or enterprise search, while application developers are exploring new ways to expose (structured) data in their GAI-based experiences using retrieval-augmented generation (RAG) and other techniques. Some even go so far as to say that "English will become the new SQL", despite the obvious challenge that English is often far more ambiguous than SQL.
Arguably, industry is leading many of these efforts, and they are happening at unprecedented speed: almost every week brings a new product announcement. Yet much of the work feels ad hoc, and despite all the announcements, many challenges remain before ML/GAI for systems becomes truly practical in all these areas. In this talk, I will provide an overview of some of these recent developments and outline how academic solutions often differ from those deployed in industry. Finally, I will list several opportunities for academia not only to contribute but also to build a better, more grounded foundation.
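The learned indexes mentioned in this abstract replace a search structure with a model that predicts where a key sits in a sorted array. The example below is a minimal sketch under stated assumptions, not the design presented in the talk. It uses a single linear model fitted by least squares, plus the worst-case prediction error recorded at build time. That error bounds a small binary search, so lookups of indexed keys are always exact. Published designs typically use hierarchies of models. The class name `LinearLearnedIndex` is hypothetical.

```python
from bisect import bisect_left


class LinearLearnedIndex:
    """Hypothetical toy learned index: one linear model predicts a key's
    position in a sorted array, and a recorded error bound limits the
    fallback binary search to a small window."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        if not self.keys:
            raise ValueError("keys must be non-empty")
        n = len(self.keys)
        # Fit position ~= slope * key + intercept by ordinary least squares.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Largest prediction error over the indexed keys (an integer).
        self.max_err = max(
            abs(self._predict(k) - p) for p, k in enumerate(self.keys)
        )

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the position of key in the sorted array, or None if absent."""
        n = len(self.keys)
        guess = self._predict(key)
        # Clamp the error window [lo, hi) to valid array positions.
        lo = min(max(0, guess - self.max_err), n)
        hi = min(max(lo, guess + self.max_err + 1), n)
        # Binary search only inside the window guaranteed by max_err.
        i = bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return i
        return None
```

For evenly spaced keys such as `range(0, 1000, 5)`, the model is essentially exact, so `max_err` is 0 and a lookup costs one prediction and one comparison. Skewed key distributions widen the error window, which is the trade-off that learned index designs try to manage.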
Algorithmic fairness beyond statistical parity: The case of fair matching and ranking
given by Francesco Bonchi
Matching and ranking algorithms are routinely used in many decision-making processes in spheres such as health (e.g., recipient lists for solid organ transplantation, triage during a pandemic), education (e.g., university admission), or employment (e.g., selection for a job), where they can have a direct, tangible impact on people’s lives. Although many valid solutions may exist for a given problem instance, when the elements to be matched or ranked correspond to individuals, it becomes of paramount importance that the solution is selected fairly. In this talk, we present a recent line of research on individual fairness in combinatorial problems.
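The point that "many valid solutions may exist" can be made concrete on a toy instance. The sketch below is an illustration only and not the method presented in the talk. It assumes a hypothetical setting of three patients (A, B, C) and two organs (X, Y). Every maximum matching is enumerated by brute force, and the code computes how likely each patient is to be matched if one optimal solution is drawn uniformly at random. A deterministic solver would instead always exclude the same patient. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def maximum_matchings(edges):
    """Enumerate all maximum matchings of a small bipartite graph.

    edges: list of (left, right) pairs. Brute force, exponential time:
    suitable for toy instances only."""
    for size in range(len(edges), 0, -1):
        found = []
        for subset in combinations(edges, size):
            lefts = {u for u, _ in subset}
            rights = {v for _, v in subset}
            # A set of edges is a matching if no endpoint is used twice.
            if len(lefts) == size and len(rights) == size:
                found.append(subset)
        if found:
            return found
    return []


def match_probabilities(edges):
    """Probability that each left vertex is matched when one maximum
    matching is sampled uniformly at random."""
    matchings = maximum_matchings(edges)
    counts = {u: 0 for u, _ in edges}
    for matching in matchings:
        for u, _ in matching:
            counts[u] += 1
    return {u: Fraction(c, len(matchings)) for u, c in counts.items()}
```

With compatibilities `[("A", "X"), ("B", "X"), ("B", "Y"), ("C", "Y")]`, there are three maximum matchings, and each patient is matched in two of them. Each therefore gets a 2/3 chance. In general, uniform sampling over optimal solutions need not equalize these probabilities.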
From Data to Discovery: Harnessing HPC and DOE Facilities for Scientific Progress
given by Michael Papka
U.S. Department of Energy (DOE) user facilities, such as particle accelerators and light sources, have been crucial to U.S. scientific research. These facilities serve thousands of researchers annually, and technological advances have significantly increased the amount of data they produce. Computational science, supported by institutions such as Argonne National Laboratory and its Leadership Computing Facility (ALCF), has been instrumental in driving discoveries across diverse fields using this data. As next-generation and upgraded DOE facilities generate even more data, integrating high-performance computing (HPC) resources becomes essential for sustained scientific progress. Drawing on years of experience, Argonne develops software frameworks and deploys advanced computing resources that facilitate the integration of experimental and computational methods. The ALCF continues to evolve to meet the growing demands of data-driven science, fostering collaboration between experimental and computational researchers and leveraging the unique capabilities of DOE user facilities. This talk highlights the synergy between DOE user facilities and the computing resources of DOE's Advanced Scientific Computing Research (ASCR) program as pivotal to advancing knowledge and uncovering new scientific opportunities.
Courses
Modern Query Processing Techniques for Graph-structured Relations
given by Semih Salihoğlu
This mini-course will introduce several query processing techniques for database management systems (DBMSs), such as predefined joins, worst-case optimal joins, and factorization, which find their best applications on graph workloads, i.e., workloads over records that represent many-to-many relationships. We will give an overview of: (i) the foundations of these techniques; (ii) the design choices that different DBMSs have made to integrate them; and (iii) the challenges facing existing implementation approaches, which provide promising avenues for further research. In the last part of the course, we will go through an exercise of optimizing one of these techniques to gain hands-on experience of their benefits and shortcomings.
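Worst-case optimal joins process a query one attribute at a time rather than one pair of tables at a time. For each attribute, they intersect the candidate values contributed by every relation that mentions it. The sketch below applies this attribute-at-a-time idea, in the style of Generic Join, to the triangle query Q(a, b, c) :- E(a, b), E(b, c), E(a, c) over a directed edge list. It assumes hash sets as the per-attribute index and a fixed variable order a, b, c. Real DBMSs typically use sorted or hash-based tries and pick the order with an optimizer. The function name `triangles` is hypothetical.

```python
from collections import defaultdict


def triangles(edges):
    """Enumerate (a, b, c) such that (a, b), (b, c) and (a, c) are all edges,
    binding one variable at a time (Generic Join style, order a, b, c)."""
    out_nbrs = defaultdict(set)  # index on E(x, y), keyed by x
    for x, y in edges:
        out_nbrs[x].add(y)
    sources = set(out_nbrs)  # values that appear as the first attribute of E
    results = []
    # a: must be a source in both E(a, b) and E(a, c).
    for a in sources:
        # b: neighbour of a in E(a, b) that is also a source in E(b, c).
        for b in out_nbrs[a] & sources:
            # c: must satisfy both E(b, c) and E(a, c); intersect the lists.
            # CPython iterates over the smaller set, which keeps each step
            # proportional to the smaller candidate list.
            for c in out_nbrs[b] & out_nbrs[a]:
                results.append((a, b, c))
    return results
```

A binary join plan would first materialize every two-hop path E(a, b) ⋈ E(b, c), which can be much larger than the final output on skewed graphs. Intersecting candidates per variable avoids that intermediate blow-up, and this is the benefit the course attributes to graph-heavy workloads.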
Exploring Large Language Models: Evolution, Evaluation, and Real-World Applications
given by Nassos Katsamanis, Sokratis Sofianopoulos, Giorgos Paraskevopoulos, Prokopis Prokopidis
In this course, participants will gain a comprehensive understanding of large language models (LLMs) and their practical applications. The course begins with a deep dive into the evolution of LLMs, tracing their development from early transformers to advanced models such as GPT-2 and GPT-3. Attendees will then explore the benchmarks and metrics used to evaluate LLMs and learn how to assess their performance effectively. The course then shifts to practical adaptation techniques, with a specific focus on fine-tuning LLMs for a low-resource language. In this part, we will share insights gained from our recent experience training and releasing the first open-source LLM for Greek. Next, participants will learn about Retrieval-Augmented Generation (RAG), an approach that improves LLM performance by integrating retrieval mechanisms that supply relevant documents to the model at generation time. This session includes an overview of the tools and methodologies used to implement RAG. The course culminates in a hands-on lab where attendees will apply their new knowledge to implement RAG for a specific application. This practical experience will solidify their understanding and equip them with the skills needed to leverage LLMs in real-world scenarios.
The course will include the following parts:
1. Part 1: "The Evolution of Large Language Models: From Transformers to GPT"
- Focus on the historical development and advancements in LLMs.
2. Part 2: "Evaluating LLMs: Benchmarks and Metrics"
- Detailed discussion on how LLMs are evaluated using various benchmarks and metrics.
3. Part 3: "Fine-Tuning LLMs: Adapting to the Greek Language with Low Resources"
- Insights into the process of fine-tuning LLMs for specific languages and low-resource scenarios.
4. Part 4: "Retrieval-Augmented Generation: Enhancing LLM Performance in the Real World"
- Explanation and tools for implementing retrieval-augmented generation.
5. Part 5: "Hands-On Lab: Implementing Retrieval-Augmented Generation"
- Practical lab session focused on applying RAG techniques to specific applications.
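In its simplest form, the RAG pipeline described in this course has two steps. First, the passages most similar to the question are retrieved. Second, they are placed into the prompt that is sent to the LLM. The sketch below is a minimal illustration under loud assumptions: it scores passages with bag-of-words cosine similarity instead of neural embeddings, and it stops at prompt construction. The actual LLM call depends on the model and serving stack used in the lab, so it is omitted. The function names `retrieve` and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lower-cased word tokens; \w also matches Greek letters.
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, passages, k=2):
    """Return the k passages most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        passages, key=lambda p: cosine(q, Counter(tokenize(p))), reverse=True
    )
    return ranked[:k]


def build_prompt(question, passages, k=2):
    """Augment the question with retrieved context before calling an LLM."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )
```

Swapping `cosine` over word counts for similarity between embedding vectors, and the sorted list for a vector index, keeps the same structure: retrieve first, then generate conditioned on what was retrieved.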
Data Science with Graphs
given by Matteo Lissandrini
The course will cover topics related to graph data management applied to data exploration and analysis.
It will guide participants in modelling data from different domains with network models, property graphs, and knowledge graph models, and show how these models enable different data understanding tasks.
We will then cover a few representative algorithms and techniques to analyze data modelled as graphs.
Proactive Streaming Analytics at Scale: A Journey from the State-of-the-art to a Production Platform
given by Nikos Giatrakos
Proactive streaming analytics continuously extract real-time business value from massive data that stream into data centers or clouds. This requires (a) processing the data while they are still in motion; (b) scaling the processing to multiple machines, often across various dispersed computer clusters running diverse Big Data technologies; and (c) forecasting complex business events for proactive decision-making. Combining the necessary facilities for proactive streaming analytics at scale entails: (I) deep knowledge of the relevant state of the art; (II) cherry-picking cutting-edge research outcomes based on desired features and with the prospect of building interoperable components; and (III) building components and deploying them into a holistic architecture within a real-world platform. In this tutorial, we guide the audience through the whole journey from (I) to (III), bringing cutting-edge research into a commercial analytics platform, for which we provide a hands-on/demo experience.
Can computers understand what is happening? An introduction to complex event recognition
given by Alexander Artikis, Periklis Mantenoglou
Complex Event Recognition (CER) refers to the activity of detecting patterns in streams of continuously arriving “event” data over (geographically) distributed sources. CER is a key ingredient of many contemporary Big Data applications that require the processing of such event streams in order to obtain timely insights and implement reactive and proactive measures. Examples of such applications include the recognition of human activities in video content, emerging stories and trends on the Social Web, traffic and transport incidents in smart cities, error conditions in smart energy grids, violations of maritime regulations, cardiac arrhythmias, and epidemic spread. In each application, CER helps make sense of streaming data, react accordingly, and prepare counter-measures. In this course, we will present formal methods for CER and illustrate them using the domain of maritime situational awareness.
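Many CER formalisms describe a situation as a fluent. The fluent holds from the moment an initiating event occurs until a terminating event occurs, and recognition means computing the maximal intervals during which it holds. The sketch below is a simplified illustration in that spirit, loosely inspired by Event Calculus-style interval semantics. It is not the formal machinery presented in the course. It assumes a hypothetical maritime stream of time-ordered `(time, vessel_id, speed_knots)` readings and a made-up 0.5-knot threshold for the fluent "vessel is stopped". The function name `stopped_intervals` is also hypothetical.

```python
def stopped_intervals(stream, threshold=0.5):
    """Compute maximal intervals during which each vessel is 'stopped'.

    stream: iterable of (time, vessel_id, speed_knots), ordered by time.
    The fluent is initiated when speed drops below `threshold` (an assumed
    value) and terminated when speed rises back to `threshold` or above.
    Returns {vessel_id: [(start, end), ...]}; end is None while still open."""
    open_since = {}  # vessel -> start time of its currently open interval
    intervals = {}
    for t, vessel, speed in stream:
        stopped = speed < threshold
        if stopped and vessel not in open_since:
            open_since[vessel] = t  # initiating event: interval opens
        elif not stopped and vessel in open_since:
            start = open_since.pop(vessel)  # terminating event: interval closes
            intervals.setdefault(vessel, []).append((start, t))
    # Fluents still holding at the end of the stream stay open.
    for vessel, start in open_since.items():
        intervals.setdefault(vessel, []).append((start, None))
    return intervals
```

Higher-level maritime patterns, such as a long stop inside a protected area, can then be defined by combining intervals like these with spatial conditions. That layering of simple fluents into complex ones is what CER formalisms make precise.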
Invited & Sponsor Talks
Data Management innovation at Amazon Web Services (Invited Talk)
given by Ippokratis Pandis
Amazon Web Services is the largest provider of data management services in the world, even though it only started providing such services in 2011. In this talk, we will try to explain some of the reasons behind this success. First, we will discuss the anatomy of a data management service in the cloud, and then we will show how AWS is leading the way by constantly building on all layers of these systems.
Cutting-edge research in PPC
given by George Papadakis
Public Power Corporation (PPC) is the leading energy utility in Southeastern Europe. Through its multidisciplinary R&D group, PPC is exploring advances in all aspects of energy generation and consumption, as well as in e-mobility and the energy transition. To this end, PPC currently participates in 23 European research projects, which range from smart grids and energy management to cybersecurity and the digitalization of charging stations and power plants of any type (thermal, solar, or wind). In this talk, we will briefly discuss PPC's main research topics and then delve into the main data- and AI-driven applications we are developing in the context of our research projects. Special care will be taken to highlight the impact of these applications in the energy domain.
Challenges and achievements in Kpler R&D
given by Konstantina Bereta
Kpler Research Labs is a department within Kpler dedicated to the development of cutting-edge technologies in AI, Big Data, IoT, and hardware. Our mission is to create innovative solutions that cater to the needs of users and stakeholders in the maritime, commodities, and power sectors.
In this talk, Dr. Konstantina Bereta, Director of Kpler Research Labs, will provide an overview of Kpler’s recent R&D achievements. She will also discuss the challenges and successes encountered in transforming research into market-ready products.