Human-Centric Machine Learning (HCML) Reading Group

A reading group focused on human-centric machine learning organised by the PhD students at ELLIS Alicante.

The HCML reading group brings together researchers and students who want both a broad overview of the topic and a deeper dive into it. By reading papers on different HCML topics and discussing new problem set-ups, alternative approaches, and sources of bias, we aim to build a broad understanding of how algorithmic and human decisions influence each other.

Communication:

Google Group: all group communications take place here, including important links and suggested papers.
ELLIS PhD & Postdoc Slack channel: #rg-human-centric-ml
~~Signal Channel~~: we moved the conversation from Signal to Google Groups

Meeting link: bit.ly/ellis-hcml-rg

Read and follow our code of conduct!

Meetings

iLLuMinaTE: An LLM-XAI Framework Leveraging Social Science Explanation Theories Towards Actionable Student Performance Feedback

Authors: Vinitra Swamy, Davide Romano, Bhargav Srinivasa Desikan, Oana-Maria Camburu, Tanja Käser (2025)

Article link: https://arxiv.org/abs/2409.08027

Abstract: Recent advances in eXplainable AI (XAI) for education have highlighted a critical challenge: ensuring that explanations for state-of-the-art models are understandable for non-technical users such as educators and students. In response, we introduce iLLuMinaTE, a zero-shot, chain-of-prompts LLM-XAI pipeline inspired by Miller (2019)’s cognitive model of explanation. iLLuMinaTE is designed to deliver theory-driven, actionable feedback to students in online courses. iLLuMinaTE navigates three main stages — causal connection, explanation selection, and explanation presentation — with variations drawing from eight social science theories (e.g. Abnormal Conditions, Pearl’s Model of Explanation, Necessity and Robustness Selection, Contrastive Explanation). We extensively evaluate 21,915 natural language explanations of iLLuMinaTE extracted from three LLMs (GPT-4o, Gemma2-9B, Llama3-70B), with three different underlying XAI methods (LIME, Counterfactuals, MC-LIME), across students from three diverse online courses. Our evaluation involves analyses of explanation alignment to the social science theory, understandability of the explanation, and a real-world user preference study with 114 university students containing a novel actionability simulation. We find that students prefer iLLuMinaTE explanations over traditional explainers 89.52% of the time. Our work provides a robust, ready-to-use framework for effectively communicating hybrid XAI-driven insights in education, with significant generalization potential for other human-centric fields.

Presenter: Vinitra Swamy

Date: 2025-03-25 15:00 (CET)

Online: Meeting link
Aligned LLMs Are Not Aligned Browser Agents

Authors: Priyanshu Kumar, Elaine Lau, Saranya Vijayakumar, Tu Trinh, Elaine T Chang, Vaughn Robinson, Shuyan Zhou, Matt Fredrikson, Sean M. Hendryx, Summer Yue, Zifan Wang (2025)

Article link: https://openreview.net/forum?id=NsFZZU9gvk

Abstract: Despite significant efforts spent by large language model (LLM) developers to align model outputs towards safety and helpfulness, there remains an open question whether this safety alignment, typically enforced in chats, generalizes to non-chat and agentic use cases. Unlike chatbots, agents equipped with general-purpose tools, such as web browsers and mobile devices, can directly influence the real world, making it even more crucial to ensure the safety of LLM agents. In this work, we primarily focus on red-teaming browser agents, LLMs that interact with and extract information from web browsers. To this end, we introduce Browser Agent Red teaming Toolkit (BrowserART), a comprehensive test suite consisting of 100 diverse browser-related harmful behaviors and 40 synthetic websites, designed specifically for red-teaming browser agents. Our empirical study on state-of-the-art browser agents reveals a significant alignment gap between the base LLMs and their downstream browser agents. That is, while the LLM demonstrates alignment as a chatbot, the corresponding agent does not. Moreover, attack methods designed to jailbreak aligned LLMs in chat settings transfer effectively to browser agents - with simple human rewrites, GPT-4o and GPT-4-turbo-based browser agents attempted all 100 harmful behaviors. We plan to publicly release BrowserART and call on LLM developers, policymakers, and agent developers to collaborate on enhancing agent safety.

Presenters: Erik Derner & Kristina Batistič

Date: 2025-03-04 15:00 (CET)

Online: Meeting link
On the Privacy Risks of Algorithmic Fairness

Authors: Hongyan Chang and Reza Shokri (2021)

Article link: https://ieeexplore.ieee.org/abstract/document/9581219

Abstract: Algorithmic fairness and privacy are essential pillars of trustworthy machine learning. Fair machine learning aims at minimizing discrimination against protected groups by, for example, imposing a constraint on models to equalize their behavior across different groups. This can subsequently change the influence of training data points on the fair model, in a disproportionate way. We study how this can change the information leakage of the model about its training data. We analyze the privacy risks of group fairness (e.g., equalized odds) through the lens of membership inference attacks: inferring whether a data point is used for training a model. We show that fairness comes at the cost of privacy, and this cost is not distributed equally: the information leakage of fair models increases significantly on the unprivileged subgroups, which are the ones for whom we need fair learning. We show that the more biased the training data is, the higher the privacy cost of achieving fairness for the unprivileged subgroups will be. We provide comprehensive empirical analysis for general machine learning algorithms.

Presenter: Gergely D. Nemeth

Date: 2025-01-28 15:00 (CET)

Online: Meeting link
Creating Suspenseful Stories: Iterative Planning with Large Language Models

Authors: Kaige Xie, Mark Riedl (2024)

Article link: https://arxiv.org/abs/2402.17119

Abstract: Automated story generation has been one of the long-standing challenges in NLP. Among all dimensions of stories, suspense is very common in human-written stories but relatively under-explored in AI-generated stories. While recent advances in large language models (LLMs) have greatly promoted language generation in general, state-of-the-art LLMs are still unreliable when it comes to suspenseful story generation. We propose a novel iterative-prompting-based planning method that is grounded in two theoretical foundations of story suspense from cognitive psychology and narratology. This theory-grounded method works in a fully zero-shot manner and does not rely on any supervised story corpora. To the best of our knowledge, this paper is the first attempt at suspenseful story generation with LLMs. Extensive human evaluations of the generated suspenseful stories demonstrate the effectiveness of our method.

Presenter: Maria Hartikainen

Date: 2024-10-29 15:00 (CET)

Online: Meeting link
Describing Differences in Image Sets with Natural Language

Authors: Lisa Dunlap, Yuhui Zhang, Xiaohan Wang, Ruiqi Zhong, Trevor Darrell, Jacob Steinhardt, Joseph E. Gonzalez, Serena Yeung-Levy (2024)

Article link: https://openaccess.thecvf.com/content/CVPR2024/html/Dunlap_Describing_Differences_in_Image_Sets_with_Natural_Language_CVPR_2024_paper.html

Abstract: How do two sets of images differ? Discerning set-level differences is crucial for understanding model behaviors and analyzing datasets, yet manually sifting through thousands of images is impractical. To aid in this discovery process, we explore the task of automatically describing the differences between two sets of images, which we term Set Difference Captioning. This task takes in image sets D_A and D_B, and outputs a description that is more often true on D_A than D_B. We outline a two-stage approach that first proposes candidate difference descriptions from image sets and then re-ranks the candidates by checking how well they can differentiate the two sets. We introduce VisDiff, which first captions the images and prompts a language model to propose candidate descriptions, then re-ranks these descriptions using CLIP. To evaluate VisDiff, we collect VisDiffBench, a dataset with 187 paired image sets with ground truth difference descriptions. We apply VisDiff to various domains, such as comparing datasets (e.g., ImageNet vs. ImageNetV2), comparing classification models (e.g., zero-shot CLIP vs. supervised ResNet), characterizing differences between generative models (e.g., StableDiffusionV1 and V2), and discovering what makes images memorable. Using VisDiff, we are able to find interesting and previously unknown differences in datasets and models, demonstrating its utility in revealing nuanced insights.

Presenter: Piera Riccio

Date: 2024-09-24 15:00 (CEST)

Online: Meeting link
What is Beautiful is Still Good: The Attractiveness Halo Effect in the era of Beauty Filters

Authors: Aditya Gulati, Marina Martinez-Garcia, Daniel Fernandez, Miguel Angel Lozano, Bruno Lepri, Nuria Oliver (2024)

Article link: https://doi.org/10.21203/rs.3.rs-4698268/v1

Abstract: The impact of cognitive biases on decision-making in the digital world remains under-explored despite its well-documented effects in physical contexts. This study addresses this gap by investigating the attractiveness halo effect using AI-based beauty filters. We conduct a large-scale online user study involving 2,748 participants who rated facial images from a diverse set of 462 distinct individuals in two conditions: original and attractive after applying a beauty filter. Our study reveals that the same individuals receive statistically significantly higher ratings of attractiveness and other traits, such as intelligence and trustworthiness, in the attractive condition. We also study the impact of age, gender, and ethnicity and identify a weakening of the halo effect in the beautified condition, resolving conflicting findings from the literature and suggesting that filters could mitigate this cognitive bias. Finally, our findings raise ethical concerns regarding the use of beauty filters.

Presenter: Aditya Gulati

Date: 2024-07-30 15:00 (CEST)

Online: Meeting link
Pruning for feature preserving circuits in CNNs

Authors: Chris Hamblin, Talia Konkle, George Alvarez (2022)

Article link: https://arxiv.org/abs/2206.01627

Abstract: Deep convolutional neural networks are a powerful model class for a range of computer vision problems, but it is difficult to interpret the image filtering process they implement, given their sheer size. In this work, we introduce a method for extracting 'feature-preserving circuits' from deep CNNs, leveraging methods from saliency-based neural network pruning. These circuits are modular sub-functions, embedded within the network, containing only a subset of convolutional kernels relevant to a target feature. We compare the efficacy of 3 saliency-criteria for extracting these sparse circuits. Further, we show how 'sub-feature' circuits can be extracted, that preserve a feature's responses to particular images, dividing the feature into even sparser filtering processes. We also develop a tool for visualizing 'circuit diagrams', which render the entire image filtering process implemented by circuits in a parsable format.

Presenter: Julien Colin

Date: 2024-06-25 15:00 (CEST)

Online: Meeting link
Annotations from speech and heart rate: impact on multimodal emotion recognition

Authors: Kaushal Sharma and Guillaume Chanel (2023)

Article link: https://dl.acm.org/doi/10.1145/3577190.3614165

Abstract: The focus of multimodal emotion recognition has often been on the analysis of several fusion strategies. However, little attention has been paid to the effect of emotional cues, such as physiological and audio cues, on external annotations used to generate the Ground Truths (GTs). In our study, we analyze this effect by collecting six continuous arousal annotations for three groups of emotional cues: speech only, heartbeat sound only and their combination. Our results indicate significant differences between the three groups of annotations, thus giving three distinct cue-specific GTs. The relevance of these GTs is estimated by training multimodal machine learning models to regress speech, heart rate and their multimodal fusion on arousal. Our analysis shows that a cue(s)-specific GT is better predicted by the corresponding modality(s). In addition, the fusion of several emotional cues for the definition of GTs allows to reach a similar performance for both unimodal models and multimodal fusion. In conclusion, our results indicate that heart rate is an efficient cue for the generation of a physiological GT; and that combining several emotional cues for GTs generation is as important as performing input multimodal fusion for emotion prediction.

Presenter: Kaushal Sharma

Date: 2024-03-26 15:00 (CET)

Online: Meeting link
Fairness and Inclusivity in Urban Transportation Design Using Reinforcement Learning

Article link: https://alaworkshop2023.github.io/papers/ALA2023_paper_51.pdf

Abstract: Public transportation networks are the foundation of urban living. Designing transportation networks, however, is a complex task that involves physical, social, political, and legal constraints. This complexity is further compounded when considering the trade-off between efficiency and fairness. While efficient lines can boost ridership and reduce car dependency, thereby contributing to environmental sustainability, they may also prioritize densely populated central areas while neglecting other potentially underserved communities, exacerbating existing inequalities. It is therefore crucial to develop tools that address these challenges and prioritize fairness. Recent advancements in Artificial Intelligence offer promising solutions. In this presentation, I will showcase our work in using Reinforcement Learning to design public transportation networks. I will highlight the potential unfairness it can cause and propose strategies to mitigate them. Finally, I will introduce a conceptual framework aimed at fostering an inclusive design process that uses input from local communities and adapts its behaviour accordingly.

Presenter: Dimitris Michailidis

Date: 2024-02-27 15:00 (CET)

Online: Meeting link
Unprocessing Seven Years of Algorithmic Fairness

Authors: André F. Cruz and Moritz Hardt (2023)

Article link: https://arxiv.org/abs/2306.07261

Abstract: Seven years ago, researchers proposed a postprocessing method to equalize the error rates of a model across different demographic groups. The work launched hundreds of papers purporting to improve over the postprocessing baseline. We empirically evaluate these claims through thousands of model evaluations on several tabular datasets. We find that the fairness-accuracy Pareto frontier achieved by postprocessing contains all other methods we were feasibly able to evaluate. In doing so, we address two common methodological errors that have confounded previous observations. One relates to the comparison of methods with different unconstrained base models. The other concerns methods achieving different levels of constraint relaxation. At the heart of our study is a simple idea we call unprocessing that roughly corresponds to the inverse of postprocessing. Unprocessing allows for a direct comparison of methods using different underlying models and levels of relaxation.

Presenter: André F. Cruz

Date: 2024-01-30 15:00 (CET)

Online: Meeting link
Benchmarking the Generation of Fact Checking Explanations

Authors: Daniel Russo, Serra Sinem Tekiroğlu, Marco Guerini (2023)

Article link: https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00601/117871/Benchmarking-the-Generation-of-Fact-Checking

Abstract: Fighting misinformation is a challenging, yet crucial, task. Despite the growing number of experts being involved in manual fact-checking, this activity is time-consuming and cannot keep up with the ever-increasing amount of fake news produced daily. Hence, automating this process is necessary to help curb misinformation. Thus far, researchers have mainly focused on claim veracity classification. In this paper, instead, we address the generation of justifications (textual explanation of why a claim is classified as either true or false) and benchmark it with novel datasets and advanced baselines. In particular, we focus on summarization approaches over unstructured knowledge (i.e., news articles) and we experiment with several extractive and abstractive strategies. We employed two datasets with different styles and structures, in order to assess the generalizability of our findings. Results show that in justification production summarization benefits from the claim information, and, in particular, that a claim-driven extractive step improves abstractive summarization performances. Finally, we show that although cross-dataset experiments suffer from performance degradation, a unique model trained on a combination of the two datasets is able to retain style information in an efficient manner.

Presenter: Daniel Russo

Date: 2023-11-28 15:00 (CET)

Online: Meeting link
Bias and Fairness in Large Language Models: A Survey

Authors: Isabel O. Gallegos, Ryan A. Rossi, Joe Barrow, Md Mehrab Tanjim, Sungchul Kim, Franck Dernoncourt, Tong Yu, Ruiyi Zhang, Nesreen K. Ahmed (2023)

Article link: https://arxiv.org/abs/2309.00770

Abstract: Rapid advancements of large language models (LLMs) have enabled the processing, understanding, and generation of human-like text, with increasing integration into systems that touch our social sphere. Despite this success, these models can learn, perpetuate, and amplify harmful social biases. In this paper, we present a comprehensive survey of bias evaluation and mitigation techniques for LLMs. We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing, defining distinct facets of harm and introducing several desiderata to operationalize fairness for LLMs. We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation, namely metrics and datasets, and one for mitigation. Our first taxonomy of metrics for bias evaluation disambiguates the relationship between metrics and evaluation datasets, and organizes metrics by the different levels at which they operate in a model: embeddings, probabilities, and generated text. Our second taxonomy of datasets for bias evaluation categorizes datasets by their structure as counterfactual inputs or prompts, and identifies the targeted harms and social groups; we also release a consolidation of publicly-available datasets for improved access. Our third taxonomy of techniques for bias mitigation classifies methods by their intervention during pre-processing, in-training, intra-processing, and post-processing, with granular subcategories that elucidate research trends. Finally, we identify open problems and challenges for future work. Synthesizing a wide range of recent research, we aim to provide a clear guide of the existing literature that empowers researchers and practitioners to better understand and prevent the propagation of bias in LLMs.

Presenter: Erik Derner

Date: 2023-10-24 15:00 (CEST)

Online: Meeting link
AI Art Curation: Re-imagining the city of Helsinki in occasion of its Biennial

Authors: Ludovica Schaerf, Pepe Ballesteros, Valentine Bernasconi, Iacopo Neri, Dario Negueruela del Castillo (2023)

Article link: https://arxiv.org/abs/2306.03753

Abstract: Art curatorial practice is characterized by the presentation of an art collection in a knowledgeable way. Machine processes are characterized by their capacity to manage and analyze large amounts of data. This paper envisages AI curation and audience interaction to explore the implications of contemporary machine learning models for the curatorial world. This project was developed for the occasion of the 2023 Helsinki Art Biennial, entitled New Directions May Emerge. We use the Helsinki Art Museum (HAM) collection to re-imagine the city of Helsinki through the lens of machine perception. We use visual-textual models to place indoor artworks in public spaces, assigning fictional coordinates based on similarity scores. We transform the space that each artwork inhabits in the city by generating synthetic 360 art panoramas. We guide the generation estimating depth values from 360 panoramas at each artwork location, and machine-generated prompts of the artworks. The result of this project is an AI curation that places the artworks in their imagined physical space, blurring the lines of artwork, context, and machine perception. The work is virtually presented as a web-based installation on this link, where users can navigate an alternative version of the city while exploring and interacting with its cultural heritage at scale.

Presenter: Ludovica Schaerf

Date: 2023-09-26 15:00 (CEST)

Online: Meeting link
Human-Aligned Calibration for AI-Assisted Decision Making

Authors: Nina L. Corvelo Benz, Manuel Gomez Rodriguez (2023)

Article link: https://arxiv.org/abs/2306.00074

Abstract: Whenever a binary classifier is used to provide decision support, it typically provides both a label prediction and a confidence value. Then, the decision maker is supposed to use the confidence value to calibrate how much to trust the prediction. In this context, it has been often argued that the confidence value should correspond to a well calibrated estimate of the probability that the predicted label matches the ground truth label. However, multiple lines of empirical evidence suggest that decision makers have difficulties at developing a good sense on when to trust a prediction using these confidence values. In this paper, our goal is first to understand why and then investigate how to construct more useful confidence values. We first argue that, for a broad class of utility functions, there exist data distributions for which a rational decision maker is, in general, unlikely to discover the optimal decision policy using the above confidence values -- an optimal decision maker would need to sometimes place more (less) trust on predictions with lower (higher) confidence values. However, we then show that, if the confidence values satisfy a natural alignment property with respect to the decision maker's confidence on her own predictions, there always exists an optimal decision policy under which the level of trust the decision maker would need to place on predictions is monotone on the confidence values, facilitating its discoverability. Further, we show that multicalibration with respect to the decision maker's confidence on her own predictions is a sufficient condition for alignment. Experiments on four different AI-assisted decision making tasks where a classifier provides decision support to real human experts validate our theoretical results and suggest that alignment may lead to better decisions.

Presenter: Nina L. Corvelo Benz

Date: 2023-08-22 15:00 (CEST)

Online: Meeting link
Large Pre-trained Language Models Contain Human-like Biases of What is Right and Wrong to Do

Authors: Patrick Schramowski, Cigdem Turan, Nico Andersen, Constantin A. Rothkopf and Kristian Kersting (2023)

Article link: https://arxiv.org/abs/2103.11790

Abstract: Artificial writing is permeating our lives due to recent advances in large-scale, transformer-based language models (LMs) such as BERT, its variants, GPT-2/3, and others. Using them as pre-trained models and fine-tuning them for specific tasks, researchers have extended state of the art for many NLP tasks and shown that they capture not only linguistic knowledge but also retain general knowledge implicitly present in the data. Unfortunately, LMs trained on unfiltered text corpora suffer from degenerated and biased behaviour. While this is well established, we show that recent LMs also contain human-like biases of what is right and wrong to do, some form of ethical and moral norms of the society —they bring a “moral direction” to surface. That is, we show that these norms can be captured geometrically by a direction, which can be computed, e.g., by a PCA, in the embedding space, reflecting well the agreement of phrases to social norms implicitly expressed in the training texts and providing a path for attenuating or even preventing toxic degeneration in LMs. Being able to rate the (non-)normativity of arbitrary phrases without explicitly training the LM for this task, we demonstrate the capabilities of the “moral direction” for guiding (even other) LMs towards producing normative text and showcase it on RealToxicityPrompts testbed, preventing the neural toxic degeneration in GPT-2.

Presenter: Mona Schirmer

Date: 2023-07-25 15:00 (CEST)

Online: Meeting link
On the Richness of Calibration

Authors: Benedikt Höltgen, Robert C Williamson (2023)

Article link: https://arxiv.org/abs/2302.04118

Abstract: Probabilistic predictions can be evaluated through comparisons with observed label frequencies, that is, through the lens of calibration. Recent scholarship on algorithmic fairness has started to look at a growing variety of calibration-based objectives under the name of multi-calibration but has still remained fairly restricted. In this paper, we explore and analyse forms of evaluation through calibration by making explicit the choices involved in designing calibration scores. We organise these into three grouping choices and a choice concerning the agglomeration of group errors. This provides a framework for comparing previously proposed calibration scores and helps to formulate novel ones with desirable mathematical properties. In particular, we explore the possibility of grouping datapoints based on their input features rather than on predictions and formally demonstrate advantages of such approaches. We also characterise the space of suitable agglomeration functions for group errors, generalising previously proposed calibration scores. Complementary to such population-level scores, we explore calibration scores at the individual level and analyse their relationship to choices of grouping. We draw on these insights to introduce and axiomatise fairness deviation measures for population-level scores. We demonstrate that with appropriate choices of grouping, these novel global fairness scores can provide notions of (sub-)group or individual fairness.

Presenter: Benedikt Höltgen

Date: 2023-06-27 15:00 (CEST)

Online: Meeting link
Latent Space Smoothing for Individually Fair Representations

Authors: Momchil Peychev, Anian Ruoss, Mislav Balunović, Maximilian Baader, Martin Vechev (2022)

Article link: https://arxiv.org/abs/2111.13650

Abstract: Fair representation learning transforms user data into a representation that ensures fairness and utility regardless of the downstream application. However, learning individually fair representations, i.e., guaranteeing that similar individuals are treated similarly, remains challenging in high-dimensional settings such as computer vision. In this work, we introduce LASSI, the first representation learning method for certifying individual fairness of high-dimensional data. Our key insight is to leverage recent advances in generative modeling to capture the set of similar individuals in the generative latent space. This enables us to learn individually fair representations that map similar individuals close together by using adversarial training to minimize the distance between their representations. Finally, we employ randomized smoothing to provably map similar individuals close together, in turn ensuring that local robustness verification of the downstream application results in end-to-end fairness certification. Our experimental evaluation on challenging real-world image data demonstrates that our method increases certified individual fairness by up to 90% without significantly affecting task utility.

Presenter: Jonas Klesen

Date: 2023-05-30 15:00 (CEST)

Online: Meeting link
Uncalibrated Models Can Improve Human-AI Collaboration

Authors: Kailas Vodrahalli, Tobias Gerstenberg, James Zou (2022)

Article link: https://arxiv.org/abs/2202.05983

Abstract: In many practical applications of AI, an AI model is used as a decision aid for human users. The AI provides advice that a human (sometimes) incorporates into their decision-making process. The AI advice is often presented with some measure of 'confidence' that the human can use to calibrate how much they depend on or trust the advice. In this paper, we present an initial exploration that suggests showing AI models as more confident than they actually are, even when the original AI is well-calibrated, can improve human-AI performance (measured as the accuracy and confidence of the human's final prediction after seeing the AI advice). We first train a model to predict human incorporation of AI advice using data from thousands of human-AI interactions. This enables us to explicitly estimate how to transform the AI's prediction confidence, making the AI uncalibrated, in order to improve the final human prediction. We empirically validate our results across four different tasks--dealing with images, text and tabular data--involving hundreds of human participants. We further support our findings with simulation analysis. Our findings suggest the importance of jointly optimizing the human-AI system as opposed to the standard paradigm of optimizing the AI model alone.

Presenter: Kajetan Schweighofer

Date: 2023-04-25 15:00 (CEST)

Online: Meeting link
Generating stereotypes from implicitly hateful posts with Influence Functions

Abstract: Substantial progress has been made on detecting explicit forms of hate, while implicitly hateful posts containing, e.g., microaggressions and condescension, still pose a major challenge. In light of high error rates, explanations accompanying model decisions are especially important. Since implicit abuse cannot be put down to the use of an individual slur, but arises out of the wider sentence context, highlighting individual tokens as an explanation is of limited use. In this paper, we generate full-text verbalisations of stereotypes that underlie implicitly hateful posts. We test the hypothesis that providing more context to the model - such as a small set of related samples - will lower the bar for generating the implied stereotype. For a given post, instance attribution methods, such as Influence Functions, are used to source similar examples from the training data. Then BART is trained to generate the underlying stereotype from an original input and its most similar neighbours.

Presenter: Alina Leidinger

Date: 2023-03-21 15:00 (CET)

Online: Meeting link
Translating Principles into Practices of Digital Ethics: Five Risks of Being Unethical

Authors: Luciano Floridi (2019)

Article link: https://link.springer.com/article/10.1007/s13347-019-00354-x

Abstract: It has taken a very long time, but today, the debate on the ethical impact and implications of digital technologies has reached the front pages of newspapers. This is understandable: digital technologies—from web-based services to Artificial Intelligence (AI) solutions—increasingly affect the daily lives of billions of people, so there are many hopes but also concerns about their design, development, and deployment.

Presenter: Adrián Arnaiz Rodriguez

Date: 2023-02-28 15:00 (CET)

Online: Meeting link
Extracting Training Data from Diffusion Models

Authors: Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace (2023)

Article link: https://arxiv.org/abs/2301.13188

Abstract: Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, we show that diffusion models memorize individual images from their training data and emit them at generation time. With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the-art models, ranging from photographs of individual people to trademarked company logos. We also train hundreds of diffusion models in various settings to analyze how different modeling and data decisions affect privacy. Overall, our results show that diffusion models are much less private than prior generative models such as GANs, and that mitigating these vulnerabilities may require new advances in privacy-preserving training.

Presenter: Piera Riccio

Date: 2023-02-14 15:00 (CET)

Online: Meeting link
Multi Scale Ethics—Why We Need to Consider the Ethics of AI in Healthcare at Different Scales

Authors: Melanie Smallman (2022)

Article link: https://link.springer.com/article/10.1007/s11948-022-00396-z

Abstract: Many researchers have documented how AI and data driven technologies have the potential to have profound effects on our lives—in ways that make these technologies stand out from those that went before. Around the world, we are seeing a significant growth in interest and investment in AI in healthcare. This has been coupled with rising concerns about the ethical implications of these technologies and an array of ethical guidelines for the use of AI and data in healthcare has arisen. Nevertheless, the question of if and how AI and data technologies can be ethical remains open to debate. This paper aims to contribute to this debate by considering the wide range of implications that have been attributed to these technologies and asking whether current ethical guidelines take these factors into account. In particular, the paper argues that while current ethics guidelines for AI in healthcare effectively account for the four key issues identified in the ethics literature (transparency; fairness; responsibility and privacy), they have largely neglected wider issues relating to the way in which these technologies shape institutional and social arrangements. This, I argue, has given current ethics guidelines a strong focus on evaluating the impact of these technologies on the individual, while not accounting for the powerful social shaping effects of these technologies. To address this, the paper proposes a Multiscale Ethics Framework, which aims to help technology developers and ethical evaluations to consider the wider implications of these technologies.

Presenter: Kaylin Bolt

Date: 2023-01-31 15:00 (CET)

Online: Meeting link
The Grey Hoodie Project: Big Tobacco, Big Tech, and the Threat on Academic Integrity

Authors: Mohamed Abdalla, Moustafa Abdalla (2021)

Article link: https://dl.acm.org/doi/10.1145/3461702.3462563

Abstract: As governmental bodies rely on academics' expert advice to shape policy regarding Artificial Intelligence, it is important that these academics not have conflicts of interests that may cloud or bias their judgement. Our work explores how Big Tech can actively distort the academic landscape to suit its needs. By comparing the well-studied actions of another industry (Big Tobacco) to the current actions of Big Tech we see similar strategies employed by both industries. These strategies enable either industry to sway and influence academic and public discourse. We examine the funding of academic research as a tool used by Big Tech to put forward a socially responsible public image, influence events hosted by and decisions made by funded universities, influence the research questions and plans of individual scientists, and discover receptive academics who can be leveraged. We demonstrate how Big Tech can affect academia from the institutional level down to individual researchers. Thus, we believe that it is vital, particularly for universities and other institutions of higher learning, to discuss the appropriateness and the tradeoffs of accepting funding from Big Tech, and what limitations or conditions should be put in place.

Presenter: Gergely D. Németh

Date: 2023-01-17 15:00 (CET)

Online: Meeting link
What I Cannot Predict, I Do Not Understand: A Human-Centered Evaluation Framework for Explainability Methods

Authors: Julien Colin, Thomas FEL, Remi Cadene, Thomas Serre (2022)

Article link: https://openreview.net/forum?id=59pMU2xFxG

Abstract: A multitude of explainability methods has been described to try to help users better understand how modern AI systems make decisions. However, most performance metrics developed to evaluate these methods have remained largely theoretical -- without much consideration for the human end-user. In particular, it is not yet clear (1) how useful current explainability methods are in real-world scenarios; and (2) whether current performance metrics accurately reflect the usefulness of explanation methods for the end user. To fill this gap, we conducted psychophysics experiments at scale to evaluate the usefulness of representative attribution methods in three real-world scenarios. Our results demonstrate that the degree to which individual attribution methods help human participants better understand an AI system varies widely across these scenarios. This suggests the need to move beyond quantitative improvements of current attribution methods, towards the development of complementary approaches that provide qualitatively different sources of information to human end-users.

Presenter: Julien Colin

Date: 2022-12-13 15:00 (CET)

Online: Meeting link
BIASeD: Bringing Irrationality into Automated System Design

Authors: Aditya Gulati, Miguel Angel Lozano, Bruno Lepri, Nuria Oliver (2022)

Article link: https://arxiv.org/abs/2210.01122

Abstract: Human perception, memory and decision-making are impacted by tens of cognitive biases and heuristics that influence our actions and decisions. Despite the pervasiveness of such biases, they are generally not leveraged by today's Artificial Intelligence (AI) systems that model human behavior and interact with humans. In this theoretical paper, we claim that the future of human-machine collaboration will entail the development of AI systems that model, understand and possibly replicate human cognitive biases. We propose the need for a research agenda on the interplay between human cognitive biases and Artificial Intelligence. We categorize existing cognitive biases from the perspective of AI systems, identify three broad areas of interest and outline research directions for the design of AI systems that have a better understanding of our own biases.

Presenter: Aditya Gulati

Date: 2022-11-08 15:00 (CET)

Online: Meeting link
Understanding and Creating Art with AI: Review and Outlook

Authors: Eva Cetinic, James She (2022)

Article link: https://dl.acm.org/doi/full/10.1145/3475799

Abstract: Technologies related to artificial intelligence (AI) have a strong impact on the changes of research and creative practices in visual arts. The growing number of research initiatives and creative applications that emerge in the intersection of AI and art motivates us to examine and discuss the creative and explorative potentials of AI technologies in the context of art. This article provides an integrated review of two facets of AI and art: (1) AI is used for art analysis and employed on digitized artwork collections, or (2) AI is used for creative purposes and generating novel artworks. In the context of AI-related research for art understanding, we present a comprehensive overview of artwork datasets and recent works that address a variety of tasks such as classification, object detection, similarity retrieval, multimodal representations, and computational aesthetics, among others. In relation to the role of AI in creating art, we address various practical and theoretical aspects of AI Art and consolidate related works that deal with those topics in detail. Finally, we provide a concise outlook on the future progression and potential impact of AI technologies on our understanding and creation of art.

Presenter: Nuria Oliver

Date: 2022-07-14 15:00 (CEST)

Online: Meeting link
Psychoanalyzing artificial intelligence: the case of Replika

Authors: Luca M. Possati (2022)

Article link: https://link.springer.com/content/pdf/10.1007/s00146-021-01379-7.pdf

Abstract: The central thesis of this paper is that human unconscious processes influence the behavior and design of artificial intelligence (AI). This thesis is discussed through the case study of a chatbot called Replika, which intends to provide psychological assistance and friendship but has been accused of inciting murder and suicide. Replika originated from a trauma and a work of mourning lived by its creator. The traces of these unconscious dynamics can be detected in the design of the app and the narratives about it. Therefore, a process of de-psychologization and de-humanization of the unconscious takes place through AI. This psychosocial approach helps criticize and overcome the so-called “standard model of intelligence” shared by most AI researchers. It facilitates a new interpretation of some classic problems in AI, such as control and responsibility.

Presenter: Erik Derner

Date: 2022-06-16 15:00 (CEST)

Online: Meeting link
Performative Power

Authors: Moritz Hardt, Meena Jagadeesan and Celestine Mendler-Dünner (2022)

Article link: https://arxiv.org/abs/2203.17232

Abstract: We introduce the notion of performative power, which measures the ability of a firm operating an algorithmic system, such as a digital content recommendation platform, to steer a population. We relate performative power to the economic theory of market power. Traditional economic concepts are well known to struggle with identifying anti-competitive patterns in digital platforms—a core challenge is the difficulty of defining the market, its participants, products, and prices. Performative power sidesteps the problem of market definition by focusing on a directly observable statistical measure instead. High performative power enables a platform to profit from steering participant behavior, whereas low performative power ensures that learning from historical data is close to optimal. Our first general result shows that under low performative power, a firm cannot do better than standard supervised learning on observed data. We draw an analogy with a firm being a price-taker, an economic condition that arises under perfect competition in classical market models. We then contrast this with a market where performative power is concentrated and show that the equilibrium state can differ significantly. We go on to study performative power in a concrete setting of strategic classification where participants can switch between competing firms. We show that monopolies maximize performative power and disutility for the participant, while competition and outside options decrease performative power. We end on a discussion of connections to measures of market power in economics and of the relationship with ongoing antitrust debates.

Presenter: Miriam Rateike

Date: 2022-06-02 15:00 (CEST)

Online: Meeting link
An Introduction to AI Safety

Abstract: A discussion about why powerful AI systems can be very dangerous and why it is important for everyone in the Machine Learning community to at least understand the basic problems.

Presenter: Marius Hobbhahn

Date: 2022-05-19 15:00 (CEST)

Online: Meeting link
Human Perceptions of Fairness in Algorithmic Decision Making: A Case Study of Criminal Risk Prediction

Authors: Nina Grgic-Hlaca, Elissa M. Redmiles, Krishna P. Gummadi & Adrian Weller (2018)

Article link: https://doi.org/10.1145/3178876.3186138

Abstract: As algorithms are increasingly used to make important decisions that affect human lives, ranging from social benefit assignment to predicting risk of criminal recidivism, concerns have been raised about the fairness of algorithmic decision making. Most prior works on algorithmic fairness normatively prescribe how fair decisions ought to be made. In contrast, here, we descriptively survey users for how they perceive and reason about fairness in algorithmic decision making. A key contribution of this work is the framework we propose to understand why people perceive certain features as fair or unfair to be used in algorithms. Our framework identifies eight properties of features, such as relevance, volitionality and reliability, as latent considerations that inform people's moral judgments about the fairness of feature use in decision-making algorithms. We validate our framework through a series of scenario-based surveys with 576 people. We find that, based on a person's assessment of the eight latent properties of a feature in our exemplar scenario, we can accurately (> 85%) predict if the person will judge the use of the feature as fair. Our findings have important implications. At a high-level, we show that people's unfairness concerns are multi-dimensional and argue that future studies need to address unfairness concerns beyond discrimination. At a low-level, we find considerable disagreements in people's fairness judgments. We identify root causes of the disagreements, and note possible pathways to resolve them.

Presenter: Stratis Tsirtsis

Date: 2022-05-05 15:00 (CEST)

Online: Meeting link
Lessons for artificial intelligence from the study of natural stupidity

Authors: Alexander S. Rich & Todd M. Gureckis (2019)

Article link: https://doi.org/10.1038/s42256-019-0038-z

Abstract: Artificial intelligence and machine learning systems are increasingly replacing human decision makers in commercial, healthcare, educational and government contexts. But rather than eliminate human errors and biases, these algorithms have in some cases been found to reproduce or amplify them. We argue that to better understand how and why these biases develop, and when they can be prevented, machine learning researchers should look to the decades-long literature on biases in human learning and decision-making. We examine three broad causes of bias—small and incomplete datasets, learning from the results of your decisions, and biased inference and evaluation processes. For each, findings from the psychology literature are introduced along with connections to the machine learning literature. We argue that rather than viewing machine systems as being universal improvements over human decision makers, policymakers and the public should acknowledge that these systems share many of the same limitations that frequently inhibit human judgement, for many of the same reasons.

Presenter: Aditya Gulati

Date: 2022-03-31 15:00 (CEST)

Online: Meeting link
An Introduction to Reliable and Robust AI

Abstract: An introduction to the design of reliable and robust AI systems in computer vision based on a review paper by Francesco Galati.

Presenter: Francesco Galati

Date: 2022-03-17 15:00 (CET)

Online: Meeting link
Improving human decision-making with machine learning

Authors: Hamsa Bastani, Osbert Bastani, Wichinpong Park Sinchaisri

Article link: https://arxiv.org/abs/2108.08454

Abstract: A key aspect of human intelligence is their ability to convey their knowledge to others in succinct forms. However, despite their predictive power, current machine learning models are largely blackboxes, making it difficult for humans to extract useful insights. Focusing on sequential decision-making, we design a novel machine learning algorithm that conveys its insights to humans in the form of interpretable 'tips'. Our algorithm selects the tip that best bridges the gap in performance between human users and the optimal policy. We evaluate our approach through a series of randomized controlled user studies where participants manage a virtual kitchen. Our experiments show that the tips generated by our algorithm can significantly improve human performance relative to intuitive baselines. In addition, we discuss a number of empirical insights that can help inform the design of algorithms intended for human-AI interfaces. For instance, we find evidence that participants do not simply blindly follow our tips; instead, they combine them with their own experience to discover additional strategies for improving performance.

Presenter: Putra Manggala

Date: 2022-03-03 15:00 (CET)

Online: Meeting link
Guest talk by Qualcomm AI Research: Natural Graph Networks

Authors: Pim de Haan, Taco Cohen, Max Welling (2020)

Article link: https://arxiv.org/abs/2007.08349

Abstract: On 17th February at 15.00 CET, the ELLIS Human-Centric Machine Learning reading group will host the first guest session receiving distinguished researchers from Qualcomm AI Research and ELLIS Scholars. Pim de Haan, Research Associate at Qualcomm AI Research, will present his paper Natural Graph Networks, which explores how we can use the local symmetries of graphs to build more expressive graph networks. We will be further exploring the relationship between graph structured data and human-centric problems and applications in a round table made up of Manuel Gómez Rodríguez (MPI-SWS), Carlos Castillo (UPF) and Efstratios Gavves (director of the Qualcomm-UvA Deep Vision Lab). This relationship mainly arises from the fact that a lot of human interaction data is expressed as network structured data. Additionally, many advantages of GNNs, such as capturing complex structures between data or information flow, could lead to GNNs being an outstanding tool for addressing HCML problems. This session will be held online and is open to everyone interested in it.

Presenter: Pim de Haan (Qualcomm AI Research), Round Table: Manuel Gómez Rodríguez (MPI-SWS), Carlos Castillo (UPF) and Efstratios Gavves (Qualcomm-UvA Deep Vision Lab)

Date: 2022-02-17 15:00 (CET)

Online: Meeting link
CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities

Authors: Mina Lee, Percy Liang, Qian Yang (2022)

Article link: https://arxiv.org/abs/2201.06796

Abstract: Large language models (LMs) offer unprecedented language generation capabilities and exciting opportunities for interaction design. However, their highly context-dependent capabilities are difficult to grasp and are often subjectively interpreted. In this paper, we argue that by curating and analyzing large interaction datasets, the HCI community can foster more incisive examinations of LMs' generative capabilities. Exemplifying this approach, we present CoAuthor, a dataset designed for revealing GPT-3's capabilities in assisting creative and argumentative writing. CoAuthor captures rich interactions between 63 writers and four instances of GPT-3 across 1445 writing sessions. We demonstrate that CoAuthor can address questions about GPT-3's language, ideation, and collaboration capabilities, and reveal its contribution as a writing 'collaborator' under various definitions of good collaboration. Finally, we discuss how this work may facilitate a more principled discussion around LMs' promises and pitfalls in relation to interaction design. The dataset and an interface for replaying the writing sessions are publicly available at https://coauthor.stanford.edu/.

Presenter: Gergely D. Németh

Date: 2022-02-03 15:00 (CET)

Online: Meeting link
Towards a Theory of Justice for Artificial Intelligence

Authors: Iason Gabriel (2021)

Article link: https://arxiv.org/abs/2110.14419

Abstract: This paper explores the relationship between artificial intelligence and principles of distributive justice. Drawing upon the political philosophy of John Rawls, it holds that the basic structure of society should be understood as a composite of socio-technical systems, and that the operation of these systems is increasingly shaped and influenced by AI. As a consequence, egalitarian norms of justice apply to the technology when it is deployed in these contexts. These norms entail that the relevant AI systems must meet a certain standard of public justification, support citizens' rights, and promote substantively fair outcomes -- something that requires specific attention be paid to the impact they have on the worst-off members of society.

Presenter: Nazaal Ibrahim

Date: 2022-01-20 15:00 (CET)

Online: Meeting link
Rethinking of Marxist perspectives on big data, artificial intelligence (AI) and capitalist economic development

Authors: Nigel Walton, Bhabani Shankar Nayak (2021)

Article link: https://www.sciencedirect.com/science/article/abs/pii/S0040162521000081

Abstract: AI and big data are not ideologically neutral scientific knowledge that drives economic development and social change. AI is a tool of capitalism which transforms our societies within an environment of technological singularity that helps in the expansion of the capitalist model of economic development. Such a development process ensures the precarity of labour. This article highlights the limits of traditional Marxist conceptualisation of labour, value, property and production relations. It argues for the rethinking of Marxist perspectives on AI led economic development by focusing on conceptual new interpretation of bourgeois and proletariat in the information driven data-based society. This is a conceptual paper which critically outlines different debates and challenges around AI driven big data and its implications. It particularly focuses on the theoretical challenges faced by labour theory of value and its social and economic implications from a critical perspective. It also offers alternatives by analysing future trends and developments for the sustainable use of AI. It argues for developing policies on the use of AI and big data to protect labour, advance human development and enhance social welfare by reducing risks.

Presenter: Bhargav Srinivasa Desikan

Date: 2021-12-16 15:00 (CET)

Online: Meeting link
Machine Learning for the Developing World

Authors: De-Arteaga, M., Herlands, W., Neill, D. B., & Dubrawski, A. (2018)

Article link: https://www.sciencedirect.com/science/article/abs/pii/S0040162521000081

Abstract: Researchers from across the social and computer sciences are increasingly using machine learning to study and address global development challenges. This article examines the burgeoning field of machine learning for the developing world (ML4D). First, we present a review of prominent literature. Next, we suggest best practices drawn from the literature for ensuring that ML4D projects are relevant to the advancement of development objectives. Finally, we discuss how developing world challenges can motivate the design of novel machine learning methodologies. This article provides insights into systematic differences between ML4D and more traditional machine learning applications. It also discusses how technical complications of ML4D can be treated as novel research questions, how ML4D can motivate new research directions, and where machine learning can be most useful.

Presenter: Felix Grimberg

Date: 2021-12-02 15:00 (CET)

Online: Meeting link
