Towards Simulating & Evaluating User Interaction in
Information Retrieval Using Test Collections
Acta Universitatis Tamperensis, No. 1563
By Heikki Keskustalo
Tampere University Press
Distributed by Coronet Books
$77.50 Paper Original
The thesis aims to extend traditional test collection-based evaluation (TCE) experiments of information retrieval (IR) towards real-life usage while remaining within the bounds of TCE. Traditional TCE involves no interactive search processes and makes no explicit assumptions about users. Instead, it assumes batch-mode retrieval experiments, entailing one query per topic, well-defined and relatively verbose requests, and binary relevance judgments. In real life, by contrast, interaction is vital. Users interact with IR systems through a trial-and-error process, trying out multiple query candidates; they vary their browsing effort, and may require only a few highly relevant documents. Importantly, users as well as searching situations may differ from each other in many ways. The individual studies of the thesis focus on query-based interaction using simulations.
Two different types of interaction simulations are performed: relevance feedback (RF) and session strategy (SS) simulations. In both cases more than one query per query session is used. In RF simulations the initial query is modified by adding feedback terms gathered automatically from relevant documents observed by the simulated user. The interaction decisions include the eagerness of the user to browse the list of retrieved documents, the effort to give document-level relevance feedback, and the relevance threshold for accepting a document as feedback. These attributes are justified on the basis of the literature. In SS simulations direct query reformulations are performed based on prototypical query modifications. We also introduce the concept of negative higher-order relevance, and discuss the evaluation issues that arise when interaction and graded relevance judgments are brought to the setting.
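The RF simulation described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the toy term-count ranker, the frequency-based choice of feedback terms, and all names (`rank`, `simulate_rf`, `browse_depth`, `max_feedback_docs`, `rel_threshold`, `n_terms`) are assumptions made for illustration. Only the three user attributes (browsing eagerness, feedback effort, relevance threshold) come from the abstract.

```python
from collections import Counter


def rank(query_terms, docs):
    """Toy ranker (an assumption): score = occurrences of query terms in the text."""
    scores = {
        doc_id: sum(text.split().count(term) for term in query_terms)
        for doc_id, text in docs.items()
    }
    matching = [doc_id for doc_id in docs if scores[doc_id] > 0]
    # Highest score first; ties broken by document id for determinism.
    return sorted(matching, key=lambda doc_id: (-scores[doc_id], doc_id))


def simulate_rf(query_terms, docs, qrels, browse_depth,
                max_feedback_docs, rel_threshold, n_terms):
    """One round of simulated relevance feedback.

    browse_depth      -- how far down the result list the simulated user looks
    max_feedback_docs -- how many documents the user is willing to mark
    rel_threshold     -- minimum graded relevance accepted as feedback
    n_terms           -- number of feedback terms added to the query
    """
    seen = rank(query_terms, docs)[:browse_depth]
    feedback = [d for d in seen if qrels.get(d, 0) >= rel_threshold]
    feedback = feedback[:max_feedback_docs]
    # Count candidate terms in the feedback documents, skipping query terms.
    counts = Counter(
        term
        for doc_id in feedback
        for term in docs[doc_id].split()
        if term not in query_terms
    )
    best = sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n_terms]
    return list(query_terms) + [term for term, _ in best], feedback


# Hypothetical toy collection with graded relevance judgments (0 to 3).
docs = {
    "d1": "solar energy panels cost",
    "d2": "solar eclipse viewing",
    "d3": "energy storage batteries solar panels",
    "d4": "wind energy turbines",
}
qrels = {"d1": 3, "d3": 1, "d4": 2}

# A lenient user accepts any relevant document; a strict user only grade 3.
lenient_query, lenient_docs = simulate_rf(["solar"], docs, qrels, 3, 2, 1, 2)
strict_query, strict_docs = simulate_rf(["solar"], docs, qrels, 3, 2, 3, 2)
```

Varying `rel_threshold` in such a sketch contrasts mixed-quality feedback with feedback drawn solely from highly relevant documents; the toy data here says nothing about which is more effective.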
Our main experimental results suggest that mixed-quality RF is more effective than an attempt to use solely highly relevant feedback, and that sequences of very short queries are surprisingly effective in finding relevant documents. Because interaction is an essential property of system usage in real life, we suggest that future test collection-based IR research should no longer exclude interaction but should instead bring interaction simulations to the research forefront.