Our research: Theory of machine learning

Machine learning

In the last decade, machine learning has developed into a key area of technology. Its applications are everywhere, they are changing the world we are living in, and they are influencing the way scientific discoveries take place. However, machine learning algorithms are never perfect - an algorithm that always makes a correct prediction cannot exist. Rather, different algorithms are based on different assumptions, make different types of mistakes, and have different strengths and weaknesses. This is where our research comes in. Our focus is to understand machine learning algorithms from a theoretical point of view, understand their implicit mechanisms, and to give formal statistical guarantees for their performance. This enables us to reveal fundamental assumptions, biases and strenghts and weaknesses of widely used machine learning algorithms.

Explainable machine learning

The outcome of many machine learning algorithms is so complex that their behaviour is hard to understand, even for experts. The field of explainable machine learning tries to solve this problem by constructing algorithms that ``explain'' the model's behaviour. In our research group we investigate whether such explanation algorithms are reliable or not. Here are links to some example papers of our group:
  • Post-Hoc Explanations Fail to Achieve their Purpose in Adversarial Contexts. link to arxiv
  • Explaining the Explainer: A First Theoretical Analysis of LIME, 2020 pdf

When machine learning algorithms provably fail

Most machine learning researchers work on deriving new or better machine learning algorithms. In their analysis, the focus is to demonstrate that their algorithms work well in many cases. We find it intriguing to reveil under which circumstances popular machine learning algorithms provably fail. Some example publications in this direction are:
  • Too Relaxed to Be Fair. 2020. pdf
  • When do random forests fail? 2018. pdf
  • Peer grading in a course on algorithms and data structures: machine learning algorithms do not improve over simple baselines, 2016 pdf
  • Getting lost in space: Large sample analysis of the commute distance, 2010. pdf
  • A Sober Look on Clustering Stability, 2005. pdf
  • Limits of spectral clustering, 2004. pdf

Understanding implicit assumptions and hidden biases

Machine learning algorithms always come with hidden assumptions and biases. However, these are rarely stated explicitly; they are implicit consequences of the way the algorithm has been built. Part of our work is to make these biases explicit, formalize them in the language of mathematics, and to reveal the implicit inductive principles that are invoked by particular algorithms. Some examples for this line of work are the following papers:
  • NetGAN without GAN: From Random Walks to Low-Rank Approximations, 2020 pdf
  • Explaining the Explainer: A First Theoretical Analysis of LIME, 2020 pdf
  • How the result of graph clustering methods depends on the construction of the graph, 2013. pdf
  • Phase transition in the family of p-resistances, 2011 pdf

Formal mathematical guarantees

Machine learning algorithms are never perfect - yet some of them can be proven to survive some basic suitability tests, while others do not. For example, one might be able to prove that an algorithm is statistically consistent: when the algorithm gets to see an infinite amount of data, its prediction performance is going to converge to the best possible one. Part of our work consists in giving such formal guarantees. Example papers are:
  • Foundations of Comparison-Based Hierarchical Clustering, 2019 pdf
  • Uniqueness of ordinal embedding, 2014 pdf
  • Consistency of spectral clustering, 2008 pdf

Many more directions

Our research encompasses many other directions, please simply check out our publication list.