
Metric learning and similarity-sensitive hashing

Multimodal data comparison

Multimodal medical images are as incomparable as apples and oranges.

Similarity is one of the most fundamental notions in many problems, especially in image sciences such as computer vision and pattern recognition. The need to quantify the similarity or dissimilarity of data is central to broad categories of problems involving comparison, search, matching, or reconstruction. For example, in content-based image or video retrieval, similarity between images or videos, or between their representations, is used to rank the matches. In object detection and recognition, one of the classical problems in computer vision, the similarity of an image region to an object model is used to decide whether the object is present. Finally, many inverse problems encountered in engineering involve a criterion of similarity between the observed data and the data one tries to estimate.

The notion of similarity is problem- and application-dependent, and in many cases it is very hard or impossible to model, since the structure of the data space is far from Euclidean. The data can also come from multiple modalities, such as images captured with infrared and visible-spectrum imaging devices or with different medical imaging modalities such as CT, MRI, and PET, or they can be described by different representations or by different versions of the same representation. Such data might be generated by unrelated physical processes and have distinct statistics, dimensionality, and structure. The need to compute similarity across modalities arises in many important problems, such as fusion of data from different sensors, medical image alignment, and comparison of different versions and representations. An attempt to directly compare objects belonging to different modalities (e.g., T1- and T2-weighted MRI images) is like comparing apples to oranges, as the modalities are often incommensurable.

In many cases, examples of similarity on a subset of the data are available. For instance, it is possible to acquire images of a known object using different imaging devices and thus obtain examples of what the same object looks like in infrared and visible light. In such cases, similarity can be learned in a supervised manner, by generalizing the similarity given on a training set of examples. In particular, if the data in the problem manifest some variability or can undergo a certain class of transformations, generating examples of transformed data allows learning invariance to such transformations. Embedding into a metric space can be used in this case to parametrize the learned similarity. In this class of similarity- or metric-learning problems, metric geometry approaches are closely related to machine learning algorithms. In particular, constructing an embedding into the Hamming space can be posed as a binary classification problem in which each bit of the embedding acts as a weak classifier, so the embedding can be built using boosting techniques widely used in machine learning.
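The idea of building a Hamming embedding bit by bit with boosting can be illustrated with a minimal sketch. It is a simplified illustration under stated assumptions, not the method of the publications listed on this page: the function names (`fit_cross_modal_hash`, `hash_codes`, `hamming`), the linear projections with a zero threshold after centering, the choice of each projection pair from the top singular vectors of a weighted cross-covariance matrix, and the synthetic two-modality data are all assumptions made for the example. Each bit is treated as a weak classifier of pair similarity, and AdaBoost-style reweighting emphasizes the pairs that the previous bits got wrong.

```python
import numpy as np

def fit_cross_modal_hash(X, Y, s, n_bits=8):
    """Learn one linear projection per bit for each modality (illustrative sketch).

    X: (n, dx) samples of modality 1, Y: (n, dy) samples of modality 2,
    s: (n,) pair labels, +1 for similar pairs and -1 for dissimilar pairs.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    w = np.full(len(s), 1.0 / len(s))            # boosting weights over pairs
    P, Q = [], []
    for _ in range(n_bits):
        # Weighted cross-covariance: similar pairs add, dissimilar pairs subtract.
        C = (Xc * (w * s)[:, None]).T @ Yc
        U, _, Vt = np.linalg.svd(C, full_matrices=False)
        p, q = U[:, 0], Vt[0]                    # top singular vector pair
        # Weak classifier: +1 if the two bits agree, -1 otherwise.
        bx = np.where(Xc @ p > 0, 1.0, -1.0)
        by = np.where(Yc @ q > 0, 1.0, -1.0)
        h = bx * by
        # AdaBoost update: weighted correlation r, classifier weight alpha.
        r = np.clip(np.sum(w * s * h), -0.999, 0.999)
        alpha = 0.5 * np.log((1 + r) / (1 - r))
        w = w * np.exp(-alpha * s * h)
        w /= w.sum()
        P.append(p)
        Q.append(q)
    return (np.array(P), mx), (np.array(Q), my)

def hash_codes(Z, model):
    """Map samples to binary codes with a learned (projections, mean) model."""
    proj, mean = model
    return ((Z - mean) @ proj.T > 0).astype(np.uint8)

def hamming(a, b):
    """Row-wise Hamming distance between two arrays of binary codes."""
    return np.count_nonzero(a != b, axis=1)

# Synthetic data (an assumption for the demo): both modalities are noisy
# linear views of a shared latent vector. Similar pairs share the latent
# vector, dissimilar pairs use independent ones.
rng = np.random.default_rng(0)
n, k, dx, dy = 2000, 8, 20, 30
A, B = rng.normal(size=(dx, k)), rng.normal(size=(dy, k))
z_pos = rng.normal(size=(n, k))
z1, z2 = rng.normal(size=(n, k)), rng.normal(size=(n, k))
X = np.vstack([z_pos @ A.T, z1 @ A.T]) + 0.1 * rng.normal(size=(2 * n, dx))
Y = np.vstack([z_pos @ B.T, z2 @ B.T]) + 0.1 * rng.normal(size=(2 * n, dy))
s = np.concatenate([np.ones(n), -np.ones(n)])

model_x, model_y = fit_cross_modal_hash(X, Y, s, n_bits=8)
codes_x, codes_y = hash_codes(X, model_x), hash_codes(Y, model_y)
d = hamming(codes_x, codes_y)
d_pos, d_neg = d[s > 0], d[s < 0]
print("mean Hamming distance, similar pairs:   ", d_pos.mean())
print("mean Hamming distance, dissimilar pairs:", d_neg.mean())
```

In this sketch the two modalities have different dimensionality and are never compared directly. Each is mapped by its own projections into a common space of 8-bit codes, where the Hamming distance serves as the learned cross-modal dissimilarity.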


  • C. Strecha, A. M. Bronstein, M. M. Bronstein, P. Fua, "LDAHash: improved matching with smaller descriptors", IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI), Vol. 34/1, pp. 66-78, January 2012.

  • M. M. Bronstein, A. M. Bronstein, F. Michel, N. Paragios, "Data fusion through cross-modality metric learning using similarity-sensitive hashing", Proc. Computer Vision and Pattern Recognition (CVPR), 2010.

  • A. M. Bronstein, M. M. Bronstein, M. Ovsjanikov, L. J. Guibas, "WaldHash: sequential similarity-preserving hashing", Techn. Report CIS-2010-03, Dept. of Computer Science, Technion, Israel, May 2010.

  • A. M. Bronstein, M. M. Bronstein, R. Kimmel, "The Video Genome", arXiv:1003.5320v1, 27 March 2010.

See also

  • Multimodal hashing (CVPR trailer video)

  • Video Genome