Tips for writing a Bachelor / Master thesis

This page is permanently under construction and not exhaustive. Whenever an important point occurs to me I am going to add it.

How long should it be?

A BSc thesis should be around 20-30 pages, an MSc thesis between 30 and 50 pages, in a standard page style. It can be shorter, in particular if it is a theoretical thesis. If you have many plots or many references, it can become a bit longer. But I am not going to read 100 pages of text, for sure.

Outline

The outline of your thesis can be quite flexible. But it has to ask a concrete question in the beginning, explain your approach to answer this question (think about whether your approach is really appropriate!), and give a concrete answer in the end.

By which criteria do I evaluate your thesis?

The most important aspect of a thesis is NOT that you prove your own new theorem or invent your own new algorithm. It is supposed to demonstrate that you can argue scientifically. My focus of evaluation is on the following aspects: Is the question well formulated? Did he/she attack the question in a reasonable way, obeying scientific standards? How did he/she evaluate the results? Did he/she argue in a correct way? Did he/she see the limitations of his/her own approach? Are the results convincingly described and interpreted? Just a tiny fraction of the overall mark has to do with "novelty" or "originality" in the sense of coming up with new ideas / proofs / algorithms.

German or English?

I do not care, as long as the language is correct and understandable. If you realize that you have difficulty expressing yourself in English, it might be better to write in German. For people whose English is already good, but who still want to improve, I warmly recommend the following book: Joseph Williams, Joseph Bizup: Style: Lessons in Clarity and Grace.

What plots are supposed to look like

A plot always has axis labels in a readable font and a title (describing parameter settings, for example). A plot always has a caption, and this caption needs to contain a concise summary of what the plot shows. The plot should be understandable without flipping back and forth between text and plot. If the plot shows experimental results, the caption should summarize the setup of the experiment (e.g., parameter choices), so that one would be able to reproduce the plot. The interpretation/discussion of the results in the plot typically ends up in the main text, not in the caption. If possible, plots should be black-and-white readable (if I print them on my black-and-white printer, I should be able to distinguish different curves).
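
As an illustration of these points, here is a minimal sketch in Python. It assumes you plot with matplotlib, which this page does not prescribe; the function names `bw_styles` and `plot_curves` and all arguments are hypothetical. Every curve is drawn in black and distinguished only by line style and marker, so the plot survives a black-and-white printer.

```python
LINESTYLES = ["-", "--", "-.", ":"]
MARKERS = ["o", "s", "^", "x", "D"]


def bw_styles(n_curves):
    """Return (linestyle, marker) pairs that are distinguishable without color.

    With 4 line styles and 5 markers, the first 20 pairs are all different.
    """
    return [
        (LINESTYLES[i % len(LINESTYLES)], MARKERS[i % len(MARKERS)])
        for i in range(n_curves)
    ]


def plot_curves(x, curves, xlabel, ylabel, title, out_path):
    """Save one figure with labeled axes, a title, and a legend.

    `curves` maps a curve name to a list of y-values of the same length as x.
    """
    # Assumption: matplotlib is installed. It is imported lazily so that
    # bw_styles can be reused without a plotting backend.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(5, 3.5))
    styles = bw_styles(len(curves))
    for (name, y), (linestyle, marker) in zip(curves.items(), styles):
        ax.plot(x, y, color="black", linestyle=linestyle,
                marker=marker, label=name)
    ax.set_xlabel(xlabel, fontsize=12)  # readable font size
    ax.set_ylabel(ylabel, fontsize=12)
    ax.set_title(title, fontsize=12)    # e.g. the parameter settings
    ax.legend()
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
```

The caption itself is not part of the figure file: it still has to be written in the thesis, next to the figure, and should summarize the experimental setup.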

References

Use BibTeX to generate references and use the natbib style in the author-year format:
\usepackage[round,comma]{natbib}
\bibliographystyle{plainnat}
This is important so people who know the field do not have to look up every single reference in the list of references. In the reference list, try to cite journal or conference papers, not technical reports or arXiv preprints (unless the paper only exists in this form). If you cite a book, always mention the section / page / theorem you refer to (so the reader can find it quickly in the book if he/she wants to look it up). When you download BibTeX references from Google Scholar, you will need to hand-edit many of them, because they are often not accurate! Please also check out the comments on the paper writing page.

Evaluate your algorithms thoroughly

In many theses, you will have to implement an algorithm and evaluate its performance. My experience shows that nearly all students underestimate what this means. The goal of such an evaluation is NOT to show that it works in simple cases. The goal of an evaluation is to really test in which cases an algorithm works, in which cases it does not work, whether you can break its performance, how it behaves when you change its parameters and when you choose different types of input data. Note that I do not talk about a correct implementation here. I talk about all the questions that arise once you have a correct implementation.

Example: Suppose you come up with a new algorithm to find clusters in a graph and you want to evaluate its performance and compare to the state of the art. Then what you do is the following:
  • In the beginning you might just want to play with your algorithm and data sets to gain intuition about its behavior (exploratory phase). But at some point, you will have to formulate a concrete question that you want to answer by simulations, and think about the most appropriate setup to answer that question. Both the question and the choice of the setup should be discussed in the thesis.
  • You generate lots of toy graphs for which you know the ground truth (the cluster structure). To this end, you try to choose a large variety of such graphs that cover lots of different properties a graph could have: hidden partition model (expander graph, small world), k-nearest neighbor graph (no expander, long paths), preferential attachment graph (power law behavior), and so on.
  • In all these models, you vary lots of parameters (number of sample points, number of clusters, clusters of different sizes or densities, dimensionality of your data points, ... )
  • Then you run your algorithm and you systematically play with the parameters your algorithm has.
  • You think about various criteria to evaluate the result of an algorithm on a data set (error with respect to ground truth, cut size, balancedness, running time, etc.)
  • You generate lots (!) of plots (!), not tables, that systematically show the evaluation criteria against the parameters and changes in the model. Take your time to think about what and how to visualize.
  • Up to here, everything is pretty automatic, but now the work starts: you actually look at your plots. For each scenario you think about what result you would have expected, and whether this is what you see in the plot. If yes, good. If no, you start thinking, debugging, understanding. Often, if plots do not follow your expectation, you still have bugs in your code. If you are reasonably sure that this is not the problem, then there might be something about the algorithm that you still have to understand. This implies that you are not done once you have produced 100 plots and put them in the thesis. You need to tell me why they are interesting, whether they show what you expected, what I am supposed to conclude from them. This, in my opinion, is one of the most important parts of your thesis.
  • In your thesis, you might just describe a condensed version of all your results. What is the most important message? I am not going to look at 100 plots, so pick the most relevant ones.
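
The automatic part of the steps above can be sketched as follows. This is only a minimal illustration in plain Python under several assumptions: the graph model is a simple planted partition model with equal-sized clusters, label propagation stands in for "your algorithm", and the error criterion is the fraction of node pairs on which the predicted and the true clustering disagree. All function and parameter names (`planted_partition`, `label_propagation`, `pair_disagreement`, `sweep`) are hypothetical.

```python
import random
import statistics
import time
from itertools import product


def planted_partition(n, k, p_in, p_out, rng):
    """Toy graph with known ground truth: n nodes in k equal-sized clusters.

    Edges appear with probability p_in inside a cluster, p_out across clusters.
    """
    truth = [i * k // n for i in range(n)]
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if truth[i] == truth[j] else p_out
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj, truth


def label_propagation(adj, rng, max_rounds=20):
    """Stand-in for the algorithm under test (asynchronous label propagation)."""
    labels = {v: v for v in adj}
    nodes = list(adj)
    for _ in range(max_rounds):
        rng.shuffle(nodes)
        changed = False
        for v in nodes:
            if not adj[v]:
                continue  # isolated node keeps its own label
            counts = {}
            for u in adj[v]:
                counts[labels[u]] = counts.get(labels[u], 0) + 1
            best = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == best)
            if labels[v] not in candidates:
                labels[v] = rng.choice(candidates)
                changed = True
        if not changed:
            break
    return [labels[v] for v in range(len(adj))]


def pair_disagreement(pred, truth):
    """Fraction of node pairs grouped differently by pred and truth.

    Invariant to renaming cluster labels; 0.0 means identical clusterings.
    """
    n = len(truth)
    bad = sum(
        (pred[i] == pred[j]) != (truth[i] == truth[j])
        for i in range(n) for j in range(i + 1, n)
    )
    return bad / (n * (n - 1) / 2)


def sweep(ns, p_outs, k=2, p_in=0.5, repeats=5, seed=0):
    """Vary model parameters systematically; average over repeated runs."""
    rng = random.Random(seed)  # fixed seed so the experiment is reproducible
    rows = []
    for n, p_out in product(ns, p_outs):
        errors, seconds = [], []
        for _ in range(repeats):
            adj, truth = planted_partition(n, k, p_in, p_out, rng)
            start = time.perf_counter()
            pred = label_propagation(adj, rng)
            seconds.append(time.perf_counter() - start)
            errors.append(pair_disagreement(pred, truth))
        rows.append({
            "n": n,
            "p_out": p_out,
            "mean_error": statistics.mean(errors),
            "mean_seconds": statistics.mean(seconds),
        })
    return rows
```

A call such as `sweep(ns=[50, 100, 200], p_outs=[0.0, 0.1, 0.2, 0.3])` returns one row per parameter combination, ready to be plotted against `n` or `p_out`. A real evaluation would add further graph models, the parameters of the algorithm itself, and more evaluation criteria, and the interpretation of the resulting plots remains the actual work.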

How much time do I need to grade your thesis?

If you need to receive the grade for your thesis by a certain date, then you have to inform me well in advance. Typically, I need a time window of about four weeks to read and grade your thesis (longer during vacation time).