Machine Learning Theory: Crowdsourcing algorithms and their statistical analysis

Seminar for MSc students, WS 2016/17, by Ulrike von Luxburg and Debarghya Ghoshdastidar

Schedule for presentation

Date: February 16, 2017 (Thursday)     Venue: A104 (Sand)
Date: February 17, 2017 (Friday)     Venue: A104 (Sand)


Crowdsourcing is a popular mechanism by which humans are involved in the generation of machine learning data or the evaluation of machine learning results. In this seminar we study many algorithms in this domain. In particular, we are interested in the theoretical properties of such algorithms, e.g. what kind of guarantees we can give on the outcome of machine learning algorithms, how many people we have to involve in order to have enough data, etc.

The second, rather high-level intention of this seminar is to learn about, get used to and practice scientific work.


  • Each student gets assigned one main paper in the first meeting. At the end of the semester everybody has to give an oral presentation of the paper.
  • Writing reviews: before being published, scientific papers go through a peer review process. We will learn what such a review is supposed to look like, and practice writing one. By the middle of the semester, everybody has to hand in a written seminar essay about the paper. It is supposed to summarize the contents, evaluate the scientific impact of the work, and provide a scientific review. Judging the scientific impact of a paper is not so easy, in particular if you are new to the field. We will learn the tricks and tools for getting at least some idea about it.
  • Peer review: It is a standard part of the scientific process to give reviews and be reviewed. We will do the same with the essays: around the middle of the semester, every student has to review the essays of about three other students. In the same way, everybody gets feedback on his/her essay through the reviews of the others.
  • Scientific discussions: Critically discussing scientific results is an important part of science, and it is similarly important to get used to asking questions in a talk (and in a lecture as well, as a matter of fact). We are going to practice this in our block seminar. For each session, we will have a session chair who leads the discussion, the person who presents the talk, an "opponent" who plays the role of a devil's advocate (and who has read the paper as well), and many questions from the remaining participants.

Time plan

We hold the main part of the seminar as a block seminar at the end of the winter term, with a couple of intermediate meetings.
  • October 20, 8:15 - 10:00 (A104 Sand): first meeting - to discuss organization and distribute the work
  • November 17, 10:15 - 12:00 (A104 Sand): meeting - how scientific publications and peer reviews work, and guidelines for the reviews to be submitted; in addition, everyone is paired up for discussing slides, and the second papers are allotted
  • November 28: submit reviews for example paper (send PDF via email)
    Example paper: Jain, Jamieson, Nowak, Finite Sample Prediction and Recovery Bounds for Ordinal Embedding. NIPS 2016.
  • December 1, 10:15 - 12:00 (A104 Sand): meeting - to discuss the example paper and the reviews, and guidelines for the presentation
  • December 20: submit reviews for main paper (send PDF via email)
  • January 13: submit first version of the slides (send PDF via email); after this, everyone discusses slides in pairs
  • February 16 - 17 (A104 Sand): all presentations as a block seminar


It will be helpful to have some background knowledge in machine learning. You should be interested in theory, as all the papers are going to have a theoretical focus. Your MSc program can be in computer science, maths, or a related area.