Generative-IR 2024

Gen-IR@SIGIR24

The Second Workshop on Generative Information Retrieval

⁠


Overview

Generative information retrieval (Gen-IR) is a fast-growing interdisciplinary research area that investigates how to leverage advances in generative Artificial Intelligence (AI) to improve information retrieval systems. Gen-IR has attracted interest from the information retrieval, natural language processing, and machine learning communities, among others. Since the dawn of Gen-IR last year, there has been an explosion of Gen-IR systems that have launched and are now widely used. Interest in this area across academia and industry is expected to keep growing as new research challenges and application opportunities arise. The goal of this workshop, The Second Workshop on Generative Information Retrieval (Gen-IR @ SIGIR 2024), is to provide an interactive venue for exploring a broad range of foundational and applied Gen-IR research. The workshop will focus on tasks such as generative document retrieval, grounded answer generation, generative recommendation, and generative knowledge graphs, all through the lens of model training, model behavior, and broader issues.

The Gen-IR@SIGIR24 workshop will be held as a full-day onsite event in conjunction with SIGIR 2024.

We are aware that Generative IR means different things to different people. For this reason, we provide the following definitions to standardize key terminology.

Some Definitions


Term (Acronym): Definition

Generative Information Retrieval (Gen-IR): Find information in a corpus of documents (the web, Wikipedia, movies, medical records, ...) given a particular query, in a generative way. “Generative” can be understood in many ways, including, but not limited to, those mentioned in the topics below. We tentatively define four main fields below.

Generative Document Retrieval (GDR): Given a query, retrieve a ranked list of existing documents via an encoder-decoder architecture. This often involves a custom or learned indexing strategy.

Grounded Answer Generation (GAG): Retrieve a human-readable generated answer that matches a query; the answer can link to or refer to a document.

Generative Recommendation: Generate a set of recommendations in an auto-regressive or diffusion fashion. Research covers non-sequential and sequential recommendation on the user-item matrix, on its embedding, or on its graph representation.

Generative Knowledge Graph (GKG): Given a query, generate a knowledge graph with an LLM, or use a graph as a knowledge base for an LLM.

Pretrained (Large) Language Model (PLM): Language models that are trained in a conventional self-supervised manner with no particular downstream task (e.g., GPT, T5, BART).


⁠

Beyond these definitions, we provide the following list of papers that exemplify what generative information retrieval is about:

⁠

https://github.com/gabriben/awesome-generative-information-retrieval⁠

⁠

Call for Papers

We invite submissions related to Generative IR, including (but not limited to):

Models

Pre-training: custom architectures and pre-training strategies to learn GDR and GAG models from scratch

Fine-tuning: retrieval-focused training regimes that leverage existing pre-trained model architectures (e.g., GPT, T5, BART, etc.)

Incorporating information about ranking, entity disambiguation, and causal relationships between ranking tasks at pre-training or fine-tuning time

Training

Generalization / transfer learning: how to adapt Gen-IR models to different search tasks

Incremental learning (or continual learning): how to develop Gen-IR systems that can adapt to dynamic corpora (i.e., how to add and remove documents from a model-indexed corpus)

Training objectives: what training objectives, such as those from Learning to Rank, can be used in conjunction with, or in addition to, standard seq2seq objectives. Retrieval-enhanced LLMs are also related to the training regime. Alternatively, in a post-hoc manner, LLMs' generations can be augmented with attribution (citing sources).

Document identifiers: what are strategies for representing documents in Gen-IR models (e.g., unstructured atomic identifiers, semantically structured identifiers, article titles, URLs, etc.)

Evaluation

Empirical evaluation of GDR or GAG on a diverse range of information retrieval data sets under different scenarios (zero-shot, few-shot, fine-tuned, etc.)

Metrics design for Gen-IR systems

Human evaluation design and / or interfaces for Gen-IR systems

Interpretability and causality (e.g., attribution), uncertainty estimates

OOD and adversarial robustness

Efficiency-focused evaluations

IR perspectives on truthfulness, harmlessness, honesty, helpfulness, etc. in GAG models

Applications and other IR fields

Prompt engineering

Summarization, fact verification, recommender systems, learning to rank, ...

Applications of Gen-IR to custom corpora (e.g., medical reports, academic papers, etc.)

Applications of Gen-IR in practical / industry settings (e.g., finance, streaming platforms, etc.)

Submission Instructions

All submissions will be peer reviewed (double-blind) by the program committee and judged by their relevance to the workshop, especially to the main themes identified above, and their potential to generate discussion. All submissions must be written in English and formatted according to the latest ACM SIG proceedings template, available at http://www.acm.org/publications/proceedings-template. In the header, add:

\documentclass[sigconf,natbib=true,anonymous=true]{acmart}

We accept submissions that were previously on arXiv or that got rejected from the main SIGIR conference. The workshop proceedings will be purely digital and non-archival.

The workshop follows a double-blind reviewing process. Please note that at least one of the authors of each accepted paper must register for the workshop and present the paper either remotely or on location (strongly preferred).

We invite research contributions as well as position, demo, and opinion papers. Submissions must be either short papers (at most 4 pages) or full papers (at most 9 pages). References do not count against the page limit. We also allow an unlimited number of pages for appendices in the same PDF.
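As a rough illustration of the formatting instructions above, a submission skeleton might look like the sketch below. Only the \documentclass line comes from the instructions; the title, author block, section names, and the bibliography file name refs.bib are placeholders assumed for the example, and the ordering (references, then appendices in the same PDF) is one reasonable reading of the page-limit rules rather than a required layout.

```latex
% Only the \documentclass line is taken from the submission instructions.
% Title, author block, section names, and refs.bib are hypothetical placeholders.
\documentclass[sigconf,natbib=true,anonymous=true]{acmart}

\begin{document}

\title{Placeholder Title}

% With anonymous=true, acmart hides author details in the compiled PDF.
\author{Placeholder Author}
\affiliation{%
  \institution{Placeholder Institution}
  \country{Placeholder Country}}
\email{author@example.org}

% In acmart, the abstract must appear before \maketitle.
\begin{abstract}
Placeholder abstract.
\end{abstract}

\maketitle

\section{Introduction}
Main text: at most 4 pages (short) or 9 pages (full), excluding references.

% References do not count against the page limit.
\bibliographystyle{ACM-Reference-Format}
\bibliography{refs} % assumes a hypothetical refs.bib next to this file

% Appendices go in the same PDF and may be of any length.
\appendix
\section{Additional Details}
Placeholder appendix text.

\end{document}
```

Keeping the appendix after the references makes it easy to check the main-text page count on its own.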

We encourage but do not require authors to release any code and/or datasets associated with their paper.

Submission website: OpenReview

⁠

Important Dates

Time zone: Anywhere on Earth (AoE)

Submission deadline: April 25, 2024 ➡ May 2, 2024

Acceptance notification: May 23, 2024 ➡ May 30, 2024

SIGIR Gen-IR 2024 workshop: July 18, 2024

The Team

⁠

Gabriel Bénédict (Amazon, Spain)

Ruqing Zhang (ICT, Chinese Academy of Sciences, China)

Donald Metzler (Google Research, USA)

Andrew Yates (University of Amsterdam, The Netherlands)

Ziyan Jiang (Amazon, USA)
