Dimensionality reduction for search in high dimensions


Start date: 6 March 2017

End date: 7 March 2017

Speaker: Ioannis Emiris
  National and Kapodistrian University of Athens, Greece
Date: Monday, March 6, 2017
Place: USI Lugano Campus, room SI-006, Informatics building (Via G. Buffi 13)
Time: 15:30


Abstract:

Search and retrieval in vector and metric spaces are fundamental problems, both theoretical and practical. Over the past decade or so, geometric approximation algorithms and geometric data structures have sought to address the curse of dimensionality in similarity search. This talk focuses on recent and current work in Approximate Nearest Neighbor Search. Tree-based methods perform well in practice, but their query time complexity is exponential in the space dimension. The main paradigm for obtaining complexity bounds polynomial in the dimension has been Locality-Sensitive Hashing (LSH), developed mainly at MIT. We take a different tack, based on random projections of the input point set to lower-dimensional spaces. We discuss two methods with search time sublinear in the number of input points, complexity linear in the dimension and, moreover, optimal space consumption, which can be important in applications.
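To make the random-projection idea concrete, here is a minimal C++ sketch, not the speaker's implementation, of the basic building block: multiplying a point set by a Gaussian random matrix to map it from dimension d to a much smaller dimension k, in the spirit of the Johnson-Lindenstrauss lemma. The data sizes, the target dimension k = 20 and the 1/sqrt(k) scaling are illustrative choices.

```cpp
// Minimal sketch (illustrative, not the speaker's implementation) of a
// Gaussian random projection: map d-dimensional points to k dimensions so
// that pairwise distances are approximately preserved with high probability.
#include <cmath>
#include <iostream>
#include <random>
#include <vector>

using Point = std::vector<double>;

// Multiply every point by a k x d matrix with N(0,1) entries scaled by 1/sqrt(k).
std::vector<Point> random_projection(const std::vector<Point>& pts,
                                     std::size_t k, std::mt19937& rng) {
    const std::size_t d = pts.empty() ? 0 : pts[0].size();
    std::normal_distribution<double> gauss(0.0, 1.0);

    std::vector<std::vector<double>> G(k, std::vector<double>(d));
    for (auto& row : G)
        for (auto& g : row) g = gauss(rng) / std::sqrt(static_cast<double>(k));

    std::vector<Point> projected;
    projected.reserve(pts.size());
    for (const auto& p : pts) {
        Point q(k, 0.0);
        for (std::size_t i = 0; i < k; ++i)
            for (std::size_t j = 0; j < d; ++j) q[i] += G[i][j] * p[j];
        projected.push_back(std::move(q));
    }
    return projected;
}

int main() {
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> unif(0.0, 1.0);

    // Synthetic data: 1000 points in 1000 dimensions, projected down to 20.
    std::vector<Point> pts(1000, Point(1000));
    for (auto& p : pts)
        for (auto& x : p) x = unif(rng);

    const auto low = random_projection(pts, 20, rng);
    std::cout << "Projected " << pts.size() << " points from " << pts[0].size()
              << " to " << low[0].size() << " dimensions\n";
}
```

Scaling each entry by 1/sqrt(k) keeps squared distances approximately unchanged in expectation, which is what a nearest-neighbor search carried out in the projected space relies on.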

Our first method is a randomised dimensionality reduction technique that generalizes the seminal Johnson-Lindenstrauss lemma and projects points to a space of significantly lower dimension, while maintaining sufficient information on proximity relations. A set of approximate neighbors in the projection then contains the image of an approximate neighbor in the original dataset. Our second method proposes a simple data structure for any metric space admitting an LSH family of functions; such functions are used to randomly project the input points to vertices of the 0/1 cube. The query point is similarly projected to the cube, and the algorithm then examines points assigned to the same or nearby vertices. Both methods are implemented in C++ and available in the public domain. Experiments in up to 1000 dimensions and with up to 1 million points, on both synthetic and real datasets, including the SIFT and GIST image collections, show that query time is about two orders of magnitude faster than brute force; compared to LSH, queries are somewhat slower, but space usage and preprocessing are more efficient.
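As an illustration of the bucketing idea behind the second method, the following C++ sketch (again illustrative, not the authors' public code) uses random-hyperplane hashes, one example of an LSH family, here for the angular distance, to map points to vertices of the 0/1 cube; a query is hashed the same way and only points at the query's vertex or at vertices within Hamming distance 1 are examined. The cube dimension, dataset size and choice of LSH family are assumptions made for the example.

```cpp
// Minimal sketch (illustrative, not the authors' public code) of hashing
// points to vertices of the 0/1 cube with an LSH family and answering a
// query by inspecting the same vertex plus vertices at Hamming distance 1.
#include <cstdint>
#include <iostream>
#include <random>
#include <unordered_map>
#include <vector>

using Point = std::vector<double>;

struct HypercubeIndex {
    std::size_t bits;                // dimension of the 0/1 cube (<= 64 here)
    std::vector<Point> hyperplanes;  // one random normal vector per bit
    std::unordered_map<std::uint64_t, std::vector<std::size_t>> buckets;

    HypercubeIndex(std::size_t bits_, std::size_t d, std::mt19937& rng)
        : bits(bits_), hyperplanes(bits_, Point(d)) {
        std::normal_distribution<double> gauss(0.0, 1.0);
        for (auto& h : hyperplanes)
            for (auto& x : h) x = gauss(rng);
    }

    // Cube vertex of a point: bit i is the sign of <hyperplane_i, p>
    // (random-hyperplane LSH for the angular distance).
    std::uint64_t vertex(const Point& p) const {
        std::uint64_t v = 0;
        for (std::size_t i = 0; i < bits; ++i) {
            double dot = 0.0;
            for (std::size_t j = 0; j < p.size(); ++j) dot += hyperplanes[i][j] * p[j];
            if (dot >= 0.0) v |= (1ULL << i);
        }
        return v;
    }

    void build(const std::vector<Point>& pts) {
        for (std::size_t i = 0; i < pts.size(); ++i)
            buckets[vertex(pts[i])].push_back(i);
    }

    // Indices of points stored at the query's vertex or at a neighboring vertex.
    std::vector<std::size_t> candidates(const Point& q) const {
        std::vector<std::size_t> out;
        const std::uint64_t v = vertex(q);
        auto collect = [&](std::uint64_t key) {
            auto it = buckets.find(key);
            if (it != buckets.end())
                out.insert(out.end(), it->second.begin(), it->second.end());
        };
        collect(v);
        for (std::size_t i = 0; i < bits; ++i) collect(v ^ (1ULL << i));
        return out;
    }
};

int main() {
    std::mt19937 rng(7);
    std::uniform_real_distribution<double> unif(-1.0, 1.0);

    std::vector<Point> pts(10000, Point(128));  // synthetic 128-dimensional data
    for (auto& p : pts)
        for (auto& x : p) x = unif(rng);

    HypercubeIndex index(16, 128, rng);         // 16-bit cube, illustrative choice
    index.build(pts);

    const Point& q = pts[0];                    // query identical to a stored point
    const auto cand = index.candidates(q);
    std::cout << "Examining " << cand.size() << " of " << pts.size()
              << " points for this query\n";
}
```

A complete implementation would rank these candidates by exact distance and could enlarge the Hamming radius whenever too few candidates are retrieved.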


Biography:

Prof. Ioannis Emiris obtained his BScEng from Princeton University in 1989, and his MSc and PhD from UC Berkeley in 1991 and 1994, respectively, all in Computer Science. He has been a Tenured researcher at INRIA Sophia-Antipolis (France) since 1995, on leave since 2002, when he joined the Department of Informatics and Telecommunications at the National and Kapodistrian University of Athens. Since 2012 he has also been an Adjunct researcher at the ATHENA Research & Innovation Center. His research interests include Computer Algebra, Geometric Computing, Data Science, and applications to Robotics and Bioinformatics. He is founder and director of the Lab on Geometric and Algebraic Algorithms, and has been involved in 6 European research projects, including a Sklodowska-Curie Network that he currently coordinates and two FET-Open STREPs, as well as a number of bilateral and national research projects. He has advised 11 PhD students and 8 postdoctoral fellows. He is the author of a book on Computational Geometry (in Greek), co-editor of two volumes published by Springer, and author of more than 100 journal articles and peer-reviewed conference publications, two of which earned the Distinguished Paper Award at the ACM International Symposium on Symbolic & Algebraic Computation (ISSAC). He is Associate editor of the "Journal of Symbolic Computation" (Elsevier) and "Mathematics in Computer Science" (Birkhäuser), and has been guest co-editor of "Theoretical Computer Science", "Computational Geometry: Theory & Applications", and the "Journal of Symbolic Computation". He was Program Committee Chair of ISSAC 2011 and an Invited Speaker at ISSAC 2016.


Host: Prof. Evanthia Papadopoulou