Seminars at the Faculty of Informatics
Investigating performance and scalability for rank learning with regression tree ensembles
When ranking Web pages against user queries, a large number of signals can be leveraged to determine relevance. These include measures of document-query similarity, anchor-text-query similarity, document-user-profile similarity, PageRank, and so on. Rank-learning algorithms provide a coherent framework for determining the best way to combine these signals in order to maximise retrieval performance. As such, they have become a crucial component of current Information Retrieval infrastructure.
State-of-the-art rank-learning techniques discover non-linear combinations of features and are mostly based on ensembles of regression trees, using either bagged and randomised regressors (as in Random Forests) or boosted ensembles (as in gradient boosting). With an interest in both the performance and scalability of these algorithms, we investigate the importance of three different aspects: (i) the number of negative examples used to train the algorithm, (ii) the size of the subsample used to learn individual trees, and (iii) the type of objective function used to recursively partition the feature space.
Mark Carman is a senior lecturer at Monash University in Melbourne, Australia. He joined Monash in 2010 after three years as a postdoc at the University of Lugano. He received his PhD from the University of Trento in 2006 after working at the Fondazione Bruno Kessler and the Information Sciences Institute of USC. Mark works primarily in information retrieval, applying and extending statistical machine learning techniques to the modelling of users and user-generated content. He has served on the program committees of many IR/DM conferences (SIGIR, ECIR, KDD, CIKM, EMNLP, AAAI, ACML, etc.) and is an Associate Editor for TOIS.