Statistical kernel methods and their applications in machine learning

Staff - Faculty of Informatics

You are cordially invited to attend the PhD Dissertation Defense of Somayeh DANAFAR on Tuesday, January 13th 2015 at 16h30 in room SI-006 (Informatics building)

Machine learning methods typically operate on each data point independently, which limits their ability to grasp underlying structures in the data as a whole. Treating the data instead as a set of random variables drawn from a probability distribution allows the algorithm, in principle, to capture such intrinsic structure. Statistical kernel methods embed probability distributions (sets of random variables) into a Reproducing Kernel Hilbert Space (RKHS), enabling linear statistics in the RKHS even though the random variables have nonlinear statistical relationships in the original space. A positive definite kernel that maps probability distributions injectively into the RKHS is called a characteristic kernel. Each distribution is then uniquely represented by its kernel mean in the associated RKHS. This rich embedding of probability distributions takes information from higher-order statistics into account, allowing statistical inference via linear operations on the kernel means. The distance between empirical mean elements in the RKHS defines a distance between distributions, and comparing the mean element of the joint distribution with the mean element of the product of the marginal distributions yields a dependence measure.
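To illustrate the distance between distributions induced by kernel mean embeddings, the following minimal NumPy sketch computes the (biased) empirical maximum mean discrepancy between two samples using a Gaussian RBF kernel, which is characteristic. The bandwidth `sigma=1.0` and the sample sizes are arbitrary choices for the illustration, not values from the thesis.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian RBF kernel matrix k(a, b) = exp(-||a - b||^2 / (2 sigma^2)).
    # The Gaussian kernel is characteristic, so kernel means identify distributions.
    sq_dists = (np.sum(A**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :]
                - 2.0 * A @ B.T)
    return np.exp(-sq_dists / (2.0 * sigma**2))

def mmd2(X, Y, sigma=1.0):
    # Biased empirical MMD^2: squared distance between the empirical
    # kernel mean embeddings of the samples X and Y in the RKHS.
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()

rng = np.random.default_rng(0)
# Two samples from the same distribution vs. two from different ones.
same = mmd2(rng.normal(0.0, 1.0, (500, 2)), rng.normal(0.0, 1.0, (500, 2)))
diff = mmd2(rng.normal(0.0, 1.0, (500, 2)), rng.normal(3.0, 1.0, (500, 2)))
print(same, diff)  # samples from different distributions give a larger MMD
```

The dependence measure mentioned above is built analogously, by comparing the embedding of the empirical joint distribution with that of the product of the empirical marginals.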

Statistical kernel methods successfully analyze high-dimensional and structured data without requiring density estimation. They are easy to implement and guarantee fast convergence of the empirical statistics to their population counterparts. Prior knowledge about which statistical properties of the distributions are most relevant to the learning task is incorporated through the definition of an adequate kernel. These attributes make statistical kernel methods a good replacement for the information-theoretic approaches that have been used in distribution analysis for decades. Of the many learning problems that can be addressed by statistical kernel methods, I will focus on the following in this thesis:

  • defining proper characteristic kernels for structured data analysis,
  • comparing probability distributions in statistical hypothesis testing,
  • extracting independent explanatory variables of regression models in model selection, and
  • finding the most predictive subspace of a regression function with distributions as both input and output (distribution regression) while providing a concise output representation.

I will assess the efficiency of these state-of-the-art statistical kernel methods by solving learning problems in computer vision, robotics, and neuroscience.

Dissertation Committee:

  • Prof. Jürgen Schmidhuber, IDSIA/Università della Svizzera italiana, Switzerland (Research Advisor)
  • Prof. Illia Horenko, Università della Svizzera italiana, Switzerland (Internal Member)
  • Prof. Igor Pivkin, Università della Svizzera italiana, Switzerland (Internal Member)
  • Prof. John Shawe-Taylor, University College London (UCL), United Kingdom (External Member)
  • Prof. Gert Lanckriet, University of California San Diego (UCSD), United States (External Member)