Alexander Frummet

About Me

My name is Alexander Frummet, and I am a PhD student and lecturer at the Chair of Information Science at the University of Regensburg, located in the heart of Bavaria, Germany. Before starting my doctoral studies, I earned a Bachelor's degree in General and Comparative Linguistics and Information Science as well as a Master's degree in Media Informatics, both from the University of Regensburg.

For more information, have a look at my CV.

Research Interests

With a background in both linguistics and computer science, my research focuses on the intersection of Natural Language Processing and Information Retrieval, particularly in the area of conversational search. Under the supervision of Dr David Elsweiler, my PhD project explores the information needs that arise when users cook with a conversational assistant. I aim to understand how these needs can be automatically detected and which algorithmic and interactive approaches are best suited to meet them during an ongoing conversation.

Teaching

Information Retrieval-related courses

  • Advanced Seminar in Information Retrieval
  • Tutorial on Basics of Information Retrieval
  • Seminar: Conversational Search
  • Advanced Seminar: Assistance in the Kitchen

Natural Language Processing-related courses

  • Tutorial on Information Linguistics II
  • Analysing & Visualising with Python (incl. Tutorial)

Other courses

  • Case Studies II (Research Seminar)
  • Database Management Systems
  • Web Technologies
  • Tutorial on Knowledge Representation

Publications

2024

Paper: Report on the 8th Workshop on Search-Oriented Conversational Artificial Intelligence (SCAI 2024) at CHIIR 2024

Abstract:

Conversational Agents are increasingly integrated into our daily routines, assisting us with various tasks, from simple commands such as scheduling events to more complex conversational search interactions. Such conversational search systems are traditionally evaluated with word-overlap metrics such as F1 score and accuracy. The full-day workshop on Search-Oriented Conversational Artificial Intelligence (SCAI) at CHIIR 2024 explored the evaluation of conversational search systems from the user's perspective. This interactive workshop included multiple panel discussions and working groups focused on developing and discussing innovative, user-centered evaluation methods for these systems. This paper, co-authored by both organizers and participants of the workshop, presents a summary of the insights gathered from the panel discussions and working groups.

Citation:
@article{frummet2024report,
author =                   {Alexander Frummet and Andrea Papenmeier and Maik Fr{\"o}be and Johannes Kiesel and Vaibhav Adlakha and Norbert Braunschweiler and Mateusz Dubiel and Satanu Ghosh and Marcel Gohsen and Christin Kreutz and Milad Momeni and Markus Nilles and Sachin Pathiyan Cherumanal and Abbas Pirmoradi and Paul Thomas and Johanne R. Trippas and Ines Zelch and Oleg Zendel},
editor =                   {Tirthankar Ghosal and Josiane Mothe and Juli{\'a}n Urbano},
journal =                  {{SIGIR Forum}},
publisher =                {ACM},
title =                    {{Report on the 8th Workshop on Search-Oriented Conversational Artificial Intelligence (SCAI 2024) at CHIIR 2024}},
url =                      {http://sigir.org/wp-content/uploads/2024/07/p05.pdf},
volume =                   58,
number =                   1,
year =                     2024}

Paper: Decoding the Metrics Maze: Navigating the Landscape of Conversational Question Answering System Evaluation in Procedural Tasks

Abstract:

Conversational systems are widely used for various tasks, from answering general questions to domain-specific procedural tasks, such as cooking. While the effectiveness of metrics for evaluating general question answering (QA) tasks has been extensively studied, the evaluation of procedural QA remains a challenge as we do not know what answer types users prefer in such tasks. Existing studies on metrics evaluation often focus on general QA tasks and typically limit assessments to one answer type, such as short, SQuAD-like responses or longer passages. This research aims to achieve two objectives. First, it seeks to identify the desired traits of conversational QA systems in procedural tasks, particularly in the context of cooking (RQ1). Second, it assesses how commonly used conversational QA metrics align with these traits and perform across various categories of correct and incorrect answers (RQ2). Our findings reveal that users generally favour concise conversational responses, except in time-sensitive scenarios where brief, clear answers hold more value (e.g. when heating in oil). While metrics effectively identify inaccuracies in short responses, several commonly employed metrics tend to assign higher scores to incorrect conversational answers when compared to correct ones. We provide a selection of metrics that reliably detect correct and incorrect information in short and conversational answers.

Citation:
@inproceedings{frummet2024decoding,
title = {{Decoding the Metrics Maze: Navigating the Landscape of Conversational Question Answering System Evaluation in Procedural Tasks}},
author = "Frummet, Alexander  and
    Elsweiler, David",
editor = "Balloccu, Simone  and
    Belz, Anya  and
    Huidrom, Rudali  and
    Reiter, Ehud  and
    Sedoc, Joao  and
    Thomson, Craig",
booktitle = {{Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024}},
year = "2024",
publisher = "ELRA and ICCL",
url = "https://aclanthology.org/2024.humeval-1.8",
pages = "81--90"}

Paper: The Eighth Workshop on Search-Oriented Conversational Artificial Intelligence (SCAI'24)

Abstract:

With the emergence of voice assistants and large language models, conversational interaction with information has become part of everyday life. The eighth edition of the search-oriented conversational AI (SCAI) workshop brings together practitioners and researchers from various disciplines to discuss challenges and advances in conversational search systems. This year’s edition focuses on evaluations beyond relevance and accuracy and looks at conversational search from the user’s perspective. The workshop features a shared task on user-centered evaluation datasets and metrics, challenging participants to develop new and innovative ways to evaluate conversational search systems while accounting for the needs and preferences of users.

Citation:
@inproceedings{frummet2024scai,
author       = {Alexander Frummet and
                Andrea Papenmeier and
                Maik Fr{\"{o}}be and
                Johannes Kiesel},
editor       = {Paul D. Clough and
                Morgan Harvey and
                Frank Hopfgartner},
title        = {{The Eighth Workshop on Search-Oriented Conversational Artificial Intelligence
                (SCAI'24)}},
booktitle    = {{Proceedings of the 2024 {ACM} {SIGIR} Conference on Human Information
                Interaction and Retrieval, {CHIIR} 2024, Sheffield, United Kingdom,
                March 10-14, 2024}},
pages        = {433--435},
publisher    = {{ACM}},
year         = {2024},
url          = {https://doi.org/10.1145/3627508.3638310}}

Paper: QookA: A Cooking Question Answering Dataset

Abstract:

Conversational agents have become increasingly integrated into our daily lives, including assisting with cooking-related tasks. To address shortcomings in existing datasets and to supplement them, we introduce QookA, a unique dataset featuring spoken queries, associated information needs, and answers rooted in cooking recipes. QookA lays the foundation for more effective conversational agents tailored to cooking tasks. This paper outlines the dataset construction process, analyzes the data, and explores research applications, providing a valuable resource to enhance conversational agents in the cooking domain.

Citation:
@inproceedings{frummet2024qooka,
author       = {Alexander Frummet and
                David Elsweiler},
editor       = {Paul D. Clough and
                Morgan Harvey and
                Frank Hopfgartner},
title        = {{QookA: {A} Cooking Question Answering Dataset}},
booktitle    = {{Proceedings of the 2024 {ACM} {SIGIR} Conference on Human Information
                Interaction and Retrieval, {CHIIR} 2024, Sheffield, United Kingdom,
                March 10-14, 2024}},
pages        = {406--410},
publisher    = {{ACM}},
year         = {2024},
url          = {https://doi.org/10.1145/3627508.3638311}}

Paper: Cooking with Conversation: Enhancing User Engagement and Learning with a Knowledge-Enhancing Assistant

Abstract:

We present two empirical studies to investigate users' expectations and behaviours when using digital assistants, such as Alexa and Google Home, in a kitchen context. First, a survey (N = 200) queries participants on their expectations for the kinds of information that such systems should be able to provide. While consensus exists on expecting information about cooking steps and processes, younger participants who enjoy cooking express a higher likelihood of expecting details on food history or the science of cooking. In a follow-up Wizard-of-Oz study (N = 48), users were guided through the steps of a recipe either by an active wizard that alerted participants to information it could provide or by a passive wizard that only answered questions posed by the user. The active policy led to almost double the number of conversational utterances and 1.5 times more knowledge-related user questions compared to the passive policy. It also resulted in 1.7 times more knowledge communicated than the passive policy. We discuss the findings in the context of related work and reveal implications for the design and use of such assistants for cooking and other purposes such as DIY and craft tasks, as well as the lessons we learned for evaluating such systems.

Citation:
@article{frummet2024cooking,
author       = {Alexander Frummet and
                Alessandro Speggiorin and
                David Elsweiler and
                Anton Leuski and
                Jeff Dalton},
title        = {{Cooking with Conversation: Enhancing User Engagement and Learning
                with a Knowledge-Enhancing Assistant}},
journal      = {{ACM} Trans. Inf. Syst.},
volume       = {42},
number       = {5},
pages        = {122:1--122:29},
year         = {2024},
url          = {https://doi.org/10.1145/3649500}}

2023

Paper: Improving the Reliability of Health Information Credibility Assessments

Abstract:

The applicability of retrieval algorithms to real data relies heavily on the quality of the training data. Currently, the creation process of training and test collections for retrieval systems is often based on annotations produced by human assessors following a set of guidelines. Some concepts, however, are prone to subjectivity, which could restrict the utility of any algorithm developed with the resulting data in real-world applications. One such concept is credibility, which is an important factor in users' judgements on whether retrieved information helps to answer an information need. In this paper, we evaluate an existing set of assessment guidelines with respect to their ability to generate reliable credibility judgements across multiple raters. We identify reasons for disagreement and adapt the guidelines to create an actionable and traceable annotation scheme that i) leads to higher inter-annotator reliability, and ii) can inform about why a rater made a specific credibility judgement. We provide promising evidence about the robustness of the new guidelines and conclude that they could be a valuable resource for building future test collections for misinformation detection.

Citation:
@inproceedings{pichel2023improving,
author       = {Marcos Fern{\'{a}}ndez{-}Pichel and
                Selina Meyer and
                Markus Bink and
                Alexander Frummet and
                David E. Losada and
                David Elsweiler},
editor       = {Marinella Petrocchi and
                Marco Viviani},
title        = {{Improving the Reliability of Health Information Credibility Assessments}},
booktitle    = {{Proceedings of the 3rd Workshop on Reducing Online Misinformation
                through Credible Information Retrieval 2023 co-located with The 45th
                European Conference on Information Retrieval {(ECIR} 2023), Dublin,
                Ireland, April 2, 2023}},
series       = {{CEUR} Workshop Proceedings},
volume       = {3406},
pages        = {43--50},
publisher    = {CEUR-WS.org},
year         = {2023},
url          = {https://ceur-ws.org/Vol-3406/paper4\_jot.pdf}}
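
Inter-annotator reliability of the kind reported above is typically quantified with a chance-corrected agreement coefficient. Below is a minimal sketch using Cohen's kappa from scikit-learn; the two raters and their credibility labels are invented, and the paper does not necessarily use this exact coefficient.

from sklearn.metrics import cohen_kappa_score

# Hypothetical credibility judgements (0 = not credible, 1 = credible)
# from two raters over the same ten documents; the values are invented.
rater_a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
rater_b = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]

# Kappa corrects raw percent agreement for agreement expected by chance:
# 1.0 means perfect agreement, 0.0 means chance-level agreement.
print(cohen_kappa_score(rater_a, rater_b))  # ~0.58 for these values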

2022

Paper: "What Can I Cook with these Ingredients?" - Understanding Cooking-Related Information Needs in Conversational Search

Abstract:

As conversational search becomes more pervasive, it becomes increasingly important to understand the users’ underlying information needs when they converse with such systems in diverse domains. We conduct an in situ study to understand information needs arising in a home cooking context as well as how they are verbally communicated to an assistant. A human experimenter plays this role in our study. Based on the transcriptions of utterances, we derive a detailed hierarchical taxonomy of diverse information needs occurring in this context, which require different levels of assistance to be solved. The taxonomy shows that needs can be communicated through different linguistic means and require different amounts of context to be understood. In a second contribution, we perform classification experiments to determine the feasibility of predicting the type of information need a user has during a dialogue using the turn provided. For this multi-label classification problem, we achieve average F1 measures of 40% using BERT-based models. We demonstrate with examples which types of needs are difficult to predict and show why, concluding that models need to include more context information in order to improve both information need classification and assistance to make such systems usable.

Citation:
@article{frummet2022whatcani,
    author = {Alexander Frummet and
                David Elsweiler and
                Bernd Ludwig},
    title  = {"What Can {I} Cook with these Ingredients?" - Understanding Cooking-Related
                Information Needs in Conversational Search},
    journal = {ACM Trans. Inf. Syst.},
    volume = {40},
    number = {4},
    pages = {81:1--81:32},
    year = {2022},
    url = {https://doi.org/10.1145/3498330}}
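
The multi-label classification experiments described in the abstract can be illustrated with the Hugging Face transformers library. This is a hedged sketch only: the label set, checkpoint, and utterance below are placeholders rather than the paper's actual taxonomy or pipeline, and the model here is untrained, so its predictions are meaningless until fine-tuned on annotated dialogue turns.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder need categories standing in for the paper's hierarchical taxonomy.
LABELS = ["amount", "preparation", "equipment", "substitution", "safety"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # sigmoid outputs + BCE loss
)
model.eval()

# Classify a single dialogue turn; in practice the model would first be
# fine-tuned on utterances annotated with information-need labels.
inputs = tokenizer("What can I use instead of creme fraiche?",
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label decision: an independent sigmoid per label with a 0.5
# threshold, so a single turn can express several needs at once.
probs = torch.sigmoid(logits)[0]
print([label for label, p in zip(LABELS, probs) if p > 0.5])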

Paper: "Mhm..." - Conversational Strategies For Product Search Assistants

Abstract:

Online retail has become a popular alternative to in-store shopping. However, unlike in traditional stores, users of online shops need to find the right product on their own without support from expert salespersons. Conversational search could provide a means to compensate for the shortcomings of traditional product search engines. To establish design guidelines for such virtual product search assistants, we studied conversations in a user study (N = 24) where experts supported users in finding the right product for their needs. We annotated the conversations concerning their content and conversational structure and identified recurring conversational strategies. Our findings show that experts actively elicit the users’ information needs using funneling techniques. They also use dialogue-structuring elements and frequently confirm having understood what the client was saying by using discourse markers, e.g., “mhm”. With this work, we contribute insights and design implications for conversational product search assistants.

Citation:
@inproceedings{papenmeier2022mhm,
author       = {Andrea Papenmeier and
                Alexander Frummet and
                Dagmar Kern},
editor       = {David Elsweiler},
title        = {{"Mhm..." - Conversational Strategies For Product Search Assistants}},
booktitle    = {{{CHIIR} '22: {ACM} {SIGIR} Conference on Human Information Interaction
                and Retrieval, Regensburg, Germany, March 14 - 18, 2022}},
pages        = {36--46},
publisher    = {{ACM}},
year         = {2022},
url          = {https://doi.org/10.1145/3498366.3505809}}

2021

Paper: Report on the Future Conversations Workshop at CHIIR 2021

Abstract:

The Future Conversations workshop at CHIIR'21 looked to the future of search, recommendation, and information interaction to ask: where are the opportunities for conversational interactions? What do we need to do to get there? Furthermore, who stands to benefit? The workshop was hands-on and interactive. Rather than a series of technical talks, we solicited position statements on opportunities, problems, and solutions in conversational search in all modalities (written, spoken, or multimodal). This paper, co-authored by the organisers and participants of the workshop, summarises the submitted statements and the discussions we had during the two sessions of the workshop. Statements discussed during the workshop are available at https://bit.ly/FutureConversations2021Statements.

Citation:
@article{spina2021report,
author       = {Damiano Spina and
                Johanne R. Trippas and
                Paul Thomas and
                Hideo Joho and
                Katriina Bystr{\"{o}}m and
                Leigh Clark and
                Nick Craswell and
                Mary Czerwinski and
                David Elsweiler and
                Alexander Frummet and
                Souvick Ghosh and
                Johannes Kiesel and
                Irene Lopatovska and
                Daniel McDuff and
                Selina Meyer and
                Ahmed Mourad and
                Paul Owoicho and
                Sachin Pathiyan Cherumanal and
                Daniel Russell and
                Laurianne Sitbon},
title        = {{Report on the future conversations workshop at {CHIIR} 2021}},
journal      = {{{SIGIR} Forum}},
volume       = {55},
number       = {1},
pages        = {6:1--6:22},
year         = {2021},
url          = {https://doi.org/10.1145/3476415.3476421}}

2020

Paper: Comparing Wizard of Oz & Observational Studies for Conversational IR Evaluation

Abstract:

Systematic and repeatable measurement of information systems via test collections, the Cranfield model, has been the mainstay of Information Retrieval since the 1960s. However, this may not be appropriate for newer, more interactive systems, such as Conversational Search agents. Such systems rely on Machine Learning technologies, which are not yet sufficiently advanced to permit true human-like dialogues, and so research can be enabled by simulation via human agents. In this work we compare dialogues obtained from two studies with the same context, assistance in the kitchen, but with different experimental setups, allowing us to learn about and evaluate conversational IR systems. We discover that users adapt their behaviour when they think they are interacting with a system and that human-like conversations in one of the studies were unpredictable to an extent we did not expect. Our results have implications for the development of new studies in this area and, ultimately, the design of future conversational agents.

Citation:
@article{elsweiler2020comparing,
author       = {David Elsweiler and
                Alexander Frummet and
                Morgan Harvey},
title        = {{Comparing Wizard of Oz {\&} Observational Studies for Conversational
                {IR} Evaluation}},
journal      = {{Datenbank-Spektrum}},
volume       = {20},
number       = {1},
pages        = {37--41},
year         = {2020},
url          = {https://doi.org/10.1007/s13222-020-00333-z}}

2019

Paper: Detecting Domain-specific Information Needs in Conversational Search Dialogues

Citation:
@inproceedings{frummet2019detecting,
author       = {Alexander Frummet and
                David Elsweiler and
                Bernd Ludwig},
editor       = {Mehwish Alam and
                Valerio Basile and
                Felice Dell'Orletta and
                Malvina Nissim and
                Nicole Novielli},
title        = {{Detecting Domain-specific Information needs in Conversational Search
                Dialogues}},
booktitle    = {{Proceedings of the 3rd Workshop on Natural Language for Artificial
                Intelligence co-located with the 18th International Conference of
                the Italian Association for Artificial Intelligence {(AIIA} 2019),
                Rende, Italy, November 19th-22nd, 2019}},
series       = {{CEUR} Workshop Proceedings},
volume       = {2521},
publisher    = {CEUR-WS.org},
pages        = {2:1--2:15},
year         = {2019},
url          = {https://ceur-ws.org/Vol-2521/paper-02.pdf}}

Academic Service

Reviewer

  • ACM Transactions on Information Systems (TOIS)
  • Information Processing and Management (IPM)
  • Journal of the Association for Information Science and Technology (JASIST)
  • ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24)
  • ACM Conference on Conversational User Interfaces (CUI '22)

Local Organiser

  • ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR '22)

Invited Talks

2021

2020

Contact