To master the mathematical tools for learning and prediction tasks, estimation of dependency measures, and hypothesis testing on structured domains.

Description of the course

Many real-world applications involve objects with an explicit or implicit structure. Social networks, protein-protein interaction networks, molecules, DNA sequences and syntactic tags are instances of explicitly structured data while texts, images, videos, biomedical signals are examples with implicit structure. The focus of the course is solving learning and prediction tasks, estimation of dependency measures, and hypothesis testing under this complex/structured assumption.

While in learning and prediction problems the case of structured inputs has been investigated for about three decades, structural assumption on the output side is a significantly more challenging and less understood area of statistical learning. The first part of the course provides a transversal and comprehensive overview on the recent advances and tools for this exploding field of structured output learning, including graphical models, max margin approaches as well as deep learning. The covered methods can be categorized into two sub-classes: scoring and energy-based techniques, and structured output regression algorithms.

The second part of the course gives an alternative view on the structured problem family, dealing with topics on dependency estimation and hypothesis testing. Emerging methods in these fields can not only lead to state-of-the-art algorithms in several application areas (such as blind signal separation, feature selection, outlier-robust image registration, regression problems on probability distributions), but they also come with elegant performance guarantees, complementing the regular statistical tools restricted to unstructured Euclidean domains. We are going to construct features of probability distributions which will enable us to define easy-to-estimate independence measures and distances of random variables. As a byproduct, we will get nonparametric extensions of the classical t-test (two-sample test) and the Pearson correlation test (independence test).

Prerequisites: The course requires a basic knowledge of kernel methods, graphical models, deep learning, optimization and functional analysis.

Exam: Project. Topics: link prediction, image/document understanding, drug activity prediction, molecule prediction, functional prediction, information theoretical optimization (including two-sample and independence testing).