Informatics Seminar on Wednesday, May 6th, 16.30 - Shengli Wu

Faculty of Informatics - Student Administration Office

Start date: 6 May 2009

End date: 7 May 2009

The Faculty of Informatics is pleased to announce a seminar given by Shengli Wu

TITLE: A Geometric Framework for Data Fusion in Information Retrieval
SPEAKER: Shengli Wu, School of Computing and Mathematics, University of Ulster, UK
DATE: May 6th, 2009
PLACE: USI Università della Svizzera italiana, room A22, Red building (Via G. Buffi 13)
TIME: 16.30

ABSTRACT:
Data fusion in information retrieval has been investigated by many researchers, and quite a few data fusion methods have been proposed. However, questions such as why data fusion can improve effectiveness, and what the favourable conditions for data fusion algorithms are, have been answered only partially or vaguely. The reason is that the measures used for retrieval evaluation are ranking-based: average precision, recall-level precision and almost all other commonly used measures fall into this category. With any ranking-based measure, the effectiveness of the fused result for an individual query is uncertain: sometimes it is better than the average effectiveness of all component systems, sometimes it is better than the best component system, and sometimes it is worse than the worst component system.
In this talk, we use a geometric framework to formally describe data fusion. Each component result returned by an information retrieval system for a given query is represented as a point in a multi-dimensional space, and the effectiveness of every retrieval result is evaluated by its Euclidean distance to the ideal result. All the component results and the fused results can then be explained using geometrical principles. In such a framework, data fusion becomes a deterministic problem: the effectiveness of the fused result is determined by the performance of the component results and the similarities among them. It becomes clear why data fusion can improve effectiveness. Several interesting features of the centroid-based data fusion method and the linear combination method can also be deduced.
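The geometric view can be illustrated with a minimal sketch. It is not the speaker's implementation: it assumes that each result is a vector of relevance scores in [0, 1], one per document, that the ideal result scores relevant documents 1 and the others 0, and that centroid-based fusion is the component-wise mean of the component results. The function names and the sample scores are hypothetical.

```python
import math


def euclidean_distance(result, ideal):
    """Distance of a result to the ideal result (smaller is better)."""
    return math.dist(result, ideal)


def centroid(results):
    """Component-wise mean of the component results (assumed fusion rule)."""
    n = len(results)
    return [sum(scores) / n for scores in zip(*results)]


# Hypothetical data: four documents, the first two of which are relevant.
ideal = [1.0, 1.0, 0.0, 0.0]

# Hypothetical scores from three component retrieval systems.
components = [
    [0.9, 0.4, 0.3, 0.1],
    [0.5, 0.8, 0.1, 0.4],
    [0.7, 0.6, 0.5, 0.0],
]

fused = centroid(components)

component_distances = [euclidean_distance(r, ideal) for r in components]
fused_distance = euclidean_distance(fused, ideal)
mean_distance = sum(component_distances) / len(component_distances)

print("component distances:", component_distances)
print("mean component distance:", mean_distance)
print("fused (centroid) distance:", fused_distance)
```

Under these assumptions, the triangle inequality guarantees that the centroid is never farther from the ideal point than the average distance of the component results, which is one way to see why fusion measured by Euclidean distance is deterministic rather than uncertain.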
Experiments have also been carried out to demonstrate the strong correlation between the Euclidean distance and ranking-based measures. Therefore, the conclusions derived using the Euclidean distance hold to a large extent when ranking-based measures are used.
In short, as a formal model of data fusion, this framework gives us a thorough understanding of the nature of data fusion and lets us use the technique more precisely and effectively.

HOST: Prof. Fabio Crestani