IR-RAG @ SIGIR24

Information Retrieval's Role in RAG Systems


If you are a RAG enthusiast, this is the workshop for you.

Overview

In recent years, Retrieval Augmented Generation (RAG) systems have emerged as a pivotal component in the field of artificial intelligence, gaining significant attention across various domains. These systems, which combine the strengths of information retrieval and generative models, have shown promise in enhancing the capabilities and performance of machine learning applications. However, despite their growing prominence, RAG systems have limitations and still need exploration and improvement. This workshop examines information retrieval and its integral role within RAG frameworks. We argue that current efforts have undervalued the role of Information Retrieval (IR) in RAG and have concentrated on the generative part. IR is the cornerstone of these systems, and its effectiveness dramatically influences the overall performance and outcomes of RAG models.

We call for papers that revisit and emphasize the fundamental principles underpinning RAG systems. By the end of the workshop, we aim to have a clearer understanding of how robust information retrieval mechanisms can significantly enhance the capabilities of RAG systems. Participants will engage in discussions and presentations focusing on the latest research, challenges, and potential pathways for advancing the information retrieval component within RAG systems. The workshop will serve as a platform for experts, researchers, and practitioners. We intend to foster discussions, share insights, and encourage research that underscores the vital role of Information Retrieval in the future of generative systems.

Important Dates

Submission deadline: May 9, 2024

Acceptance Notification: May 23, 2024

IR-RAG Workshop: July 18, 2024

Camera-ready versions of accepted papers due: August 5, 2024

Deadlines refer to 23:59 (11:59pm) in the AoE (Anywhere on Earth) time zone.

Call for Papers

The primary purpose of this workshop is to shift the focus onto the often-overlooked retriever mechanism of Retrieval-Augmented Generation (RAG) systems while pondering the question:

Should research in information retrieval change now that RAG systems exist?

By gathering experts, practitioners, and enthusiasts in a dedicated forum, the workshop seeks to spotlight and deliberate on challenges and innovative ideas associated with the retrieval aspect of RAG systems, aiming to foster a solid community around this critical topic. The intent is to generate a collective effort to understand and enhance the retrieval mechanisms better, ensuring they are given as much importance as the generative components. This collaborative environment will not only help in sharing knowledge and best practices but also in inspiring new research and development, ultimately leading to more effective and reliable RAG systems. Through this workshop, we aspire to build a strong foundation and a vibrant community committed to advancing the state of the art in information retrieval for RAG systems.

We invite submissions related to (but not limited to):

Use Of The Retrieved Context By The LLM:

Recent work has demonstrated that RAG systems are sensitive to the order and the nature of the retrieved context. These can be considered preliminary results that pave the way for future research.

(Query) Representation Learning:

Improving how queries are represented can significantly enhance the retriever's ability to find relevant documents. This could involve using more advanced natural language processing techniques to understand the context and nuances of the query better.

Incorporating Contextual Information:

Including more context in the retrieval process can improve the relevance of the documents retrieved. This could mean taking into account the broader conversation, user preferences, or historical interactions.

Updating the Document Database:

Keeping the document database up-to-date ensures that the retriever has access to the latest and most relevant information. This is particularly important for topics that are rapidly evolving.

Reducing Computational Load:

Optimizing the retriever for speed and efficiency, especially when dealing with large databases, can improve its usability in real-time applications. This might involve techniques for reducing the dimensionality of data or faster search algorithms.

Bias Mitigation:

Actively working to identify and mitigate biases in the retrieval process can improve the fairness and reliability of the retrieved content.

Cross-Lingual Retrieval Capabilities:

For systems operating in multilingual environments, improving the retriever's ability to handle and retrieve documents in various languages can enhance its effectiveness.

Multimodality:

Most of the current research has focused on textual RAG, even though multimodality is highly needed in many applications.

Other:

One of the goals of this workshop is to collect new ideas and challenges, so proposals along these lines are very welcome.

Submission Instructions

All submissions will be peer reviewed (double-blind) by the program committee and judged by their relevance to the workshop, especially to the main themes identified above, and their potential to generate discussion. All submissions must be written in English and formatted according to the latest ACM SIG proceedings template available at http://www.acm.org/publications/proceedings-template.

Submissions must describe work that is not previously published, not accepted for publication elsewhere, and not currently under review elsewhere.

Please note that at least one author of each accepted paper must register for the workshop and present the paper.

We invite research contributions as well as position, demo, and opinion papers. Submissions must be either short papers (at most 4 pages) or full papers (at most 9 pages). References do not count against the page limit.

We encourage but do not require authors to release any code and/or datasets associated with their paper.

Submission website:

https://easychair.org/conferences/?conf=irragsigir24

Accepted papers

A Product-Aware Query Auto-Completion Framework for E-Commerce Search via Retrieval-Augmented Generation Method
Fangzheng Sun, Tianqi Zheng, Aakash Kolekar, Rohit Patki, Ziheng Cai, David Liu, Hossein Khazaei, Xuan Guo, Ruirui Li, Yupin Huang, Dante Everaert, Hanqing Lu, Garima Petal and Monica Cheng

Beyond Benchmarks: Evaluating Embedding Model Similarity for Retrieval Augmented Generation Systems
Laura Caspari, Kanishka Ghosh Dastidar, Saber Zerhoudi, Jelena Mitrovic and Michael Granitzer

Chain-of-Thought to Enhance Document Retrieval in Certified Medical Chatbots
Leonardo Sanna, Simone Magnolini, Patrizio Bellan, Saba Ghanbari Haez, Marina Segala, Monica Consolandi and Mauro Dragoni

Enhancing Fusion-in-Decoder for Multi-Granularity Ranking
Haeju Park, Kyungjae Lee, Sunghyun Park and Moontae Lee

Generating Query Recommendations via LLMs
Andrea Bacciu, Enrico Palumbo, Andreas Damianou, Nicola Tonellotto and Fabrizio Silvestri

Improving RAG Systems via Sentence Clustering and Reordering
Marco Alessio, Guglielmo Faggioli, Nicola Ferro, Franco Maria Nardini and Raffaele Perego

Multi-Aspect Reviewed-Item Retrieval via LLM Query Decomposition and Aspect Fusion
Anton Korikov, George Saad, Ethan Baron, Mustafa Khan, Manav Shah and Scott Sanner

PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents
Saber Zerhoudi and Michael Granitzer

The Impact of Quantization on Retrieval-Augmented Generation: An Analysis of Small LLMs
Mert Yazan, Frederik Bungaran Ishak Situmeang and Suzan Verberne

THoRR: Complex Table Retrieval and Refinement for RAG
Kihun Kim, Mintae Kim, Hokyung Lee, Seongik Park, Youngsub Han and Byoung-Ki Jeon

Towards Incorporating Personalized Context for Conversational Information Seeking
Hai-Tao Yu, Lingzhen Zheng, Kaiyu Yang, Sumio Fujita and Hideo Joho

RAGSys: Item-Cold-Start Recommender as RAG System
Emile Contal and Garrin McGoldrick

Schedule

| Start Time | End Time | Event | Notes |
|---|---|---|---|
| 9:00 AM | 9:30 AM | Opening | |
| 9:30 AM | 10:30 AM | Keynote speaker (1) | Nicola Tonellotto, University of Pisa |
| 10:30 AM | 11:00 AM | Coffee break | ☕ ☕ ☕ ☕ ☕ |
| 11:00 AM | 12:30 PM | Poster presentations | |
| 12:30 PM | 1:30 PM | Lunch break | 🍝 🍷 🍕 🥗 🍨 |
| 1:30 PM | 2:30 PM | Keynote speaker (2) | Yuhao Zhang, Samaya AI |
| 2:30 PM | 3:00 PM | Oral presentations (1) | 1) Improving RAG Systems via Sentence Clustering and Reordering 2) Multi-Aspect Reviewed-Item Retrieval via LLM Query Decomposition and Aspect Fusion |
| 3:00 PM | 3:30 PM | Refreshment break | 🥪 🍯 🧀 🍎 |
| 3:30 PM | 4:00 PM | Oral presentations (2) | 1) PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents 2) Beyond Benchmarks: Evaluating Embedding Model Similarity for Retrieval Augmented Generation Systems |
| 4:00 PM | 5:00 PM | Breakout Session & discussion among participants | |
| 5:00 PM | 5:30 PM | Round up & concluding remarks | |


Keynote Speakers

Nicola Tonellotto

Filling the Gap between Retrieval and Generation

The impressive development of deep learning in the past years has paved the way to new improvements in IR. The representation of queries and documents as dense vectors allowed us to design new solutions to many of the most important research problems in IR. The rich mathematical structure of the vector spaces where the latent representations live can be exploited to further improve the performance of IR systems. Moreover, the interplay between the latent representations of texts in LLMs and IR systems when deployed in retrieval-augmented generation (RAG) pipelines is under active investigation. This talk highlights how we can leverage representation space properties to boost IR systems, including our recent works in dense IR, and the new challenges in making IR and NLP systems work together.

Yuhao Zhang

A Tale of Two Aspects: On Robustness and Faithfulness of RAG Systems

With the wide adoption of large language models (LLMs) that offer strong contextual understanding capabilities, retrieval-augmented generation (RAG) systems have become a foundational component in many modern AI applications. While these RAG systems easily surpass human beings in scalability, this often comes with a tradeoff in quality. In this talk, I will focus on our recent work that examines RAG systems on two critical aspects for real-world applications: domain robustness and faithfulness. In the first part, I am going to introduce RobustQA, a public benchmark that evaluates the domain generalizability of RAG-QA systems. I will then highlight a follow-up effort that aims at extending RobustQA to benchmark LLMs’ long-form QA capabilities. In the second part, I am going to zoom in on the LLM component used in RAG systems and focus on its faithfulness issues. I will present a recently discovered tradeoff between the faithfulness and instruction-following capabilities of these LLMs, and present a simple remedy that obtains the best of both worlds. I will conclude the talk by highlighting other active and important research areas related to these directions.