Machine Learning Theory: Crowdsourcing algorithms and their statistical analysis

Seminar for MSc students, WS 2016/17, by Ulrike von Luxburg and Debarghya Ghoshdastidar

Schedule for presentation

Date: February 16, 2017 (Thursday) Venue: A104 (Sand)

09:30 - 10:00 Introduction by Ulrike
10:00 - 11:00 Paper presentation
Shah, Zhou, Peres: Approval Voting and Incentives in Crowdsourcing. ICML 2015.
Presenter: Vivian Fresen / Main Questioner: Robert Geirhos
11:00 - 12:00 Paper presentation
Zou, Chaudhuri, Kalai: Crowdsourcing Feature Discovery via Adaptively Chosen Comparisons. 2015.
Presenter: Konstantin Lübeck / Main Questioner: Alisa Volkert
12:00 - 13:00 Lunch break
13:00 - 14:00 Paper presentation
Oh, Thekumparampil, Xu: Collaboratively learning preferences from ordinal data. NIPS 2015.
Presenter: Mahdi Sadeghi / Main Questioner: David Hildner
14:00 - 15:00 Paper presentation
Heckel, Shah, Ramchandran, Wainwright: Active Ranking from Pairwise Comparisons and when Parametric Assumptions Don’t Help. arXiv, 2016.
Presenter: Magdalena Sannwald / Main Questioner: Sebastian Penhouet
15:00 - 15:30 Coffee break
15:30 - 16:30 Paper presentation
Shah, Zhou: No Oops, You Won’t Do It Again: Mechanisms for Self-correction in Crowdsourcing. ICML 2016.
Presenter: Kanghyun Yu / Main Questioner: Vivian Fresen

Date: February 17, 2017 (Friday) Venue: A104 (Sand)

10:00 - 11:00 Paper presentation
Mozafari, Sarkar, Franklin, Jordan, Madden: Scaling up crowd-sourcing to very large datasets: a case for active learning.
Presenter: David Hildner / Main Questioner: Mahdi Sadeghi
11:00 - 12:00 Paper presentation
Lahouti, Hassibi: Fundamental Limits of Budget-Fidelity Trade-off in Label Crowdsourcing. NIPS 2016.
Presenter: Alisa Volkert / Main Questioner: Konstantin Lübeck
12:00 - 13:00 Lunch break
13:00 - 14:00 Paper presentation
Steinhardt, Valiant, Charikar: Avoiding Imposters and Delinquents: Adversarial Crowdsourcing and Peer Prediction. NIPS 2016.
Presenter: Robert Geirhos / Main Questioner: Kanghyun Yu
14:00 - 15:00 Paper presentation
Christiano: Provably Manipulation-Resistant Reputation Systems. COLT 2016.
Presenter: Sebastian Penhouet / Main Questioner: Magdalena Sannwald

Contents

Crowdsourcing is a popular mechanism by which humans are involved in the generation of machine learning data or the evaluation of machine learning results. In this seminar we study a range of algorithms in this domain. In particular, we are interested in the theoretical properties of such algorithms, e.g., what kind of guarantees we can give on the outcome of machine learning algorithms, how many people we have to involve in order to have enough data, etc.

The second, rather high-level intention of this seminar is to learn about, get used to and practice scientific work.

How

Each student gets assigned one main paper in the first meeting. At the end of the semester everybody has to give an oral presentation of the paper.
Writing reviews: Before being published, scientific papers go through a peer review process. We will learn what such a review is supposed to look like and practice writing one. By the middle of the semester, everybody has to hand in a written seminar essay about their paper. It should summarize the contents, evaluate the scientific impact of the work, and provide a scientific review. Judging the scientific impact of a paper is not easy, in particular if you are new to the field, so we will learn some tricks and tools for getting at least a rough idea of it.
Peer review: It is a standard part of the scientific process to give reviews and be reviewed. We will do the same with the essays: around the middle of the semester, every student has to review the essays of about three other students. In the same way, everybody gets feedback on their essay from the reviews of the others.
Scientific discussions: Critically discussing scientific results is an important part of science, and it is just as important to get used to asking questions in a talk (and in a lecture as well, as a matter of fact). We are going to practice this in our block seminar. Each session will have a session chair who leads the discussion, a presenter who gives the talk, an "opponent" who has also read the paper and plays the role of devil's advocate, and many questions from the remaining participants.

Time plan

We hold the main part of the seminar as a block seminar at the end of the winter term, with a couple of intermediate meetings.

October 20, 8:15 - 10:00 (A104 Sand): first meeting - to discuss organization and distribute the work
November 17, 10:15 - 12:00 (A104 Sand): meeting - how scientific publications and peer reviews work, and guidelines for the reviews to be submitted; also, everyone is paired up to discuss slides, and second papers are allotted
November 28: submit reviews for example paper (send PDF via email)
Example paper: Jain, Jamieson, Nowak: Finite Sample Prediction and Recovery Bounds for Ordinal Embedding. NIPS 2016.
December 1, 10:15 - 12:00 (A104 Sand): meeting - to discuss the example paper and the reviews, and guidelines for the presentation
December 20: submit reviews for main paper (send PDF via email)
January 13: submit first version of the slides (send PDF via email); after this, everyone discusses slides in pairs
February 16 - 17 (A104 Sand): all presentations as a block seminar

Prerequisite

It will be helpful to have some background knowledge in machine learning. You should be interested in theory, since all the papers have a theoretical focus. Your MSc program can be in computer science, maths, or a related area.
