The First Workshop on Generative Information Retrieval
Generative IR has experienced substantial growth across multiple research communities (e.g., information retrieval, computer vision, natural language processing, and machine learning) and has been highly visible in the popular press. Theoretical and empirical work, as well as user-facing products, has been released that retrieves documents (via generation) or directly generates answers in response to an input request.
We would like to investigate whether end-to-end generative models are just another trend or, as some claim, a paradigm change for IR. This involves new metrics, theoretical grounding, evaluation methods, task definitions, models, user interfaces, etc.
The goal of this workshop is to focus on previously explored Generative IR techniques like document retrieval and direct response generation, while also offering a venue for the discussion and exploration of how Generative IR can be applied to domains like recommendation systems, summarization, etc. The format of the workshop is interactive, including roundtable and keynote sessions.
The Gen-IR@SIGIR23 workshop will be held as a full-day onsite event in conjunction with SIGIR 2023.
We are aware that Generative IR means different things to different people. For this reason, we provide the following definitions to standardize key terminology.
Generative Information Retrieval
Find information in a corpus of documents (the web, Wikipedia, movies, medical records, ...) given a particular query, in a generative way. "Generative" can be understood in many ways, including, but not limited to, those mentioned in the topics below. We tentatively define two main subfields: Generative Document Retrieval and Grounded Answer Generation.
Generative Document Retrieval
Given a query, retrieve a ranked list of existing documents via an encoder-decoder architecture. Oftentimes this involves a custom/learned indexing strategy.
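To make the idea concrete, here is a minimal toy sketch of generative document retrieval. The corpus, the identifier scheme, and the word-overlap scoring heuristic are all invented for illustration; a real system would use a trained encoder-decoder whose decoding is constrained to valid document identifiers.

```python
# Toy sketch of Generative Document Retrieval: a stand-in "model" scores
# candidate next tokens, and decoding is constrained to a trie of valid
# document identifiers so that only existing documents can be generated.
# Everything here (corpus, docids, scoring) is illustrative only.

def build_trie(docids):
    """Store each docid (a tuple of tokens) in a nested-dict trie."""
    trie = {}
    for docid in docids:
        node = trie
        for token in docid:
            node = node.setdefault(token, {})
    return trie

def score(query, token):
    """Stand-in for a seq2seq decoder's next-token score:
    simple word overlap between the query and the token."""
    return len(set(query.lower().split()) & set(token.lower().split("-")))

def generate_docid(query, trie):
    """Greedy constrained decoding: at each step, pick the
    highest-scoring token among the trie's valid continuations."""
    docid, node = [], trie
    while node:
        token = max(node, key=lambda t: score(query, t))
        docid.append(token)
        node = node[token]
    return tuple(docid)

# A tiny corpus indexed by semantically structured identifiers.
corpus = {
    ("sports", "world-cup-final"): "Report on the world cup final...",
    ("sports", "marathon-results"): "Results of the city marathon...",
    ("science", "mars-rover"): "The rover landed on Mars...",
}
trie = build_trie(corpus)
docid = generate_docid("science news about the mars rover", trie)
```

The trie plays the role of the "custom/learned indexing strategy" mentioned above: it restricts generation to identifiers that actually exist in the corpus, while the scoring function decides which branch to follow.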
Grounded Answer Generation
Generate a human-readable answer that matches a query; the answer can link to or refer to a document.
Pretrained (Large) Language Model
Language models trained in a conventional self-supervised manner, with no particular downstream task in mind (e.g., GPT, T5, BART).
Beyond these definitions, we provide the following list of papers that exemplify what Generative Information Retrieval is all about.
We invite submissions related to Generative IR, including (but not limited to):
Pre-training: custom architectures and pre-training strategies to learn GDR and GAR models from scratch
Fine-tuning: retrieval-focused training regimes that leverage existing pre-trained model architectures (e.g., GPT, T5, BART, etc.)
Incorporating information about ranking, entity disambiguation, and causal relationships between ranking tasks at pre-training or fine-tuning time
Generalization / transfer learning: how to adapt Gen-IR models to different search tasks
Incremental learning (or continual learning): how to develop Gen-IR systems that can adapt to dynamic corpora (i.e., how to add and remove documents from a model-indexed corpus)
Training objectives: what training objectives, such as those from Learning to Rank, can be used in conjunction with, or in addition to, standard seq2seq objectives
Document identifiers: strategies for representing documents in Gen-IR models (e.g., unstructured atomic identifiers, semantically structured identifiers, article titles, URLs, etc.)
Empirical evaluation of GDR or GAR on a diverse range of information retrieval data sets under different scenarios (zero-shot, few-shot, fine-tuned, etc.)
Metrics design for Gen-IR systems
Human evaluation design and / or interfaces for Gen-IR systems
Interpretability and causality (e.g., attribution), uncertainty estimates
OOD and adversarial robustness
IR perspectives on truthfulness, harmlessness, honesty, helpfulness, etc. in GAR models
Applications and other IR fields
Summarization, fact verification, recommender systems, learning to rank, ...
Applications of Gen-IR to custom corpora (e.g., medical reports, academic papers, etc.)
Applications of Gen-IR in practical / industry settings (e.g., finance, streaming platforms, etc.)
All submissions will be peer reviewed (double-blind) by the program committee and judged by their relevance to the workshop, especially to the main themes identified above, and by their potential to generate discussion. All submissions must be written in English and formatted according to the latest ACM SIG proceedings template available at
Submissions must describe work that has not been previously published, is not accepted for publication elsewhere, and is not currently under review elsewhere. We accept submissions that have previously appeared on arXiv or that were rejected from the main SIGIR conference. The workshop proceedings will be purely digital and non-archival.
The workshop follows a double-blind reviewing process. Please note that at least one of the authors of each accepted paper must register for the workshop and present the paper either remotely or on location (strongly preferred).
We invite research contributions as well as position and opinion papers. Submissions must be either short papers (at most 4 pages) or full papers (at most 9 pages). References do not count against the page limit. We also allow an unlimited number of pages for appendices in the same PDF.
We encourage but do not require authors to release any code and/or datasets associated with their paper.