Estimating the intrinsic dimension of discrete-metric spaces

Staff - Faculty of Informatics

Date: 7 November 2022 / 14:30 - 15:30

USI Campus Est, room A1.05, Sector A // Online on Microsoft Teams

Iuri Macocco, International School for Advanced Studies (SISSA), Trieste, Italy

Real-world datasets characterized by discrete features are ubiquitous: from categorical surveys to clinical questionnaires, from unweighted networks to DNA sequences. Nevertheless, the most common unsupervised dimensionality reduction methods are designed for continuous spaces, and their use on discrete spaces can lead to errors and biases. In the first part of the talk, I'll introduce an algorithm to infer the intrinsic dimension (ID) of datasets embedded in discrete spaces. I'll show its accuracy on benchmark datasets and apply it to analyze a metagenomic dataset for species fingerprinting, finding a surprisingly small ID, of order 2, which suggests that evolutionary pressure acts on a low-dimensional manifold despite the high dimensionality of sequence space. In the second part, I'll apply the ID estimator to unweighted networks and show how this information can be used to validate generating models or, possibly, infer their parameters.

Iuri Macocco is a fourth-year PhD student at the International School for Advanced Studies (SISSA), Trieste, Italy. His research focuses on the analysis and characterization of datasets naturally described by discrete metrics through their intrinsic dimension. He will present work carried out under the supervision of Prof. A. Laio (SISSA) and J. Grilli (ICTP).

Host: Antonietta Mira