Efficient and scalable solution techniques for small data learning problems

Staff - Faculty of Informatics

Date: 19 June 2023 / 11:00 - 13:30

Online

You are cordially invited to attend the PhD Dissertation Defence of Edoardo Vecchi on Monday 19 June 2023 at 11:00 online.

Abstract:
Classification problems in the small data regime are hindered by the discrepancy between the size of the data statistics T and the usually larger dimension of the feature space D. In this context, common machine learning (ML) and deep learning (DL) tools tend to lack robustness, quickly overfitting the training data and ultimately achieving poor performance on the test set. To address this issue, we propose two methods extending and complementing the Scalable Probabilistic Approximation (SPA) framework: the advanced entropic Scalable Probabilistic Approximation (eSPA+) and the regularized Scalable Probabilistic Approximation (rSPA). Our algorithms address standard learning problems -- binary classification and image denoising, respectively -- which become even more challenging in the small data regime and in the presence of concept drift. In the case of eSPA+, for each of the subproblems solved by the algorithm at every iteration, we prove the existence of closed-form solutions that lead to a linear iteration-cost scaling. This allows the proposed algorithm -- when applied to a wide range of both synthetic and real-world problems -- to dramatically outperform its state-of-the-art competitors from ML and DL in terms of both classification quality and computational cost. On the other hand, the novelty of the rSPA algorithm consists in the simultaneous solution of image segmentation and denoising problems, which results in its successful application to the denoising of extremely noisy computed tomography (CT) images in the low- and ultra-low-radiation regime. The rSPA algorithm is also supplemented by a further extension based on domain decomposition -- named DD-rSPA -- which makes it possible to solve, on a commodity laptop, denoising problems that would otherwise require expensive hardware and facilities.
Finally -- in order to assess the real potential of eSPA+ in the solution of small data problems -- we present a case study based on stock forecasting. Specifically, we show that the proposed method can formulate accurate predictions by discriminating the relevant classification features from the redundant information, and that its easily interpretable output can significantly support the solution of financial decision-making problems.
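
As a rough illustration of the small data regime described in the abstract (T much smaller than D), the sketch below fits a plain minimum-norm least-squares classifier to synthetic data. It is not eSPA+ or rSPA; the sizes T = 20 and D = 200, the synthetic labels, and all variable names are assumptions chosen only to show how a standard tool can fit the training set perfectly while generalizing poorly.

```python
import numpy as np

rng = np.random.default_rng(0)

T, D = 20, 200      # assumed sizes: far fewer samples (T) than features (D)
T_test = 1000       # assumed size of a held-out test set


def make_data(n):
    """Synthetic data: only feature 0 carries the label, the rest is redundant."""
    X = rng.standard_normal((n, D))
    y = np.where(X[:, 0] > 0, 1.0, -1.0)
    return X, y


X_train, y_train = make_data(T)
X_test, y_test = make_data(T_test)

# With D > T the system is underdetermined; lstsq returns the minimum-norm
# solution, which interpolates the training labels.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)


def accuracy(X, y):
    """Fraction of samples whose predicted sign matches the label."""
    return float(np.mean(np.sign(X @ w) == y))


train_acc = accuracy(X_train, y_train)
test_acc = accuracy(X_test, y_test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The gap between training and test accuracy in this toy setting is the kind of overfitting that motivates methods able to separate relevant features from redundant ones.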

Dissertation Committee:
- Prof. Illia Horenko, Università della Svizzera italiana, Switzerland (Research Advisor)
- Prof. Igor Pivkin, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Olaf Schenk, Università della Svizzera italiana, Switzerland (Internal Member)
- Prof. Michael Bortz, Fraunhofer ITWM, Germany (External Member)
- Prof. Rupert Klein, Freie Universität Berlin, Germany (External Member)