Collecting coupons

Every time you buy a pizza from Pizza Hut, they give you a coupon selected at random from a set of {n} different coupons. How many pizzas do you need to buy on average in order to collect all possible coupons? We know from kindergarten that the answer is {\Theta(n\log n)}. Here’s an intuitive explanation why.

First, let’s look at a cool trick.

Suppose a biased coin that shows heads with probability {p} is tossed repeatedly. How many times, on average, does it need to be tossed before it shows heads for the first time? I promise this is related, and the relation will be clear soon.

Anyway, this is a standard geometric distribution and thus we can plug things into known formulas and get the value {1/p} for our answer. Intuitively, it does look like that should be the answer, since if we rolled a die with {1/p} faces repeatedly, we would expect to see face #1 after an average of {1/p} rolls. We can also derive the formula ourselves by writing down an infinite series and using standard summation tricks. But here’s a nice trick that can be used to get to the result immediately.

Let {E} be the expected number of tosses needed. If the first toss shows heads (which happens with probability {p}), we are done after a single toss. If it shows tails (probability {1-p}), we have spent one toss and are back exactly where we started, so we expect {1+E} tosses in total. Writing this “in math” we get {E = p\cdot 1 + (1-p)(1+E)}, which solves to {E = 1/p}.
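
As a quick sanity check (not part of the original argument), here is a minimal Python simulation of the biased coin; the empirical average should hover around {1/p}:

```python
import random

def tosses_until_heads(p):
    """Toss a p-biased coin until it shows heads; return the number of tosses."""
    count = 1
    while random.random() >= p:  # probability 1 - p of tails
        count += 1
    return count

p = 0.2
trials = 100_000
avg = sum(tosses_until_heads(p) for _ in range(trials)) / trials
print(f"empirical average: {avg:.3f}, predicted 1/p: {1 / p}")
```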

Anyway, back to coupon collection. Suppose we already have {x} distinct coupons. What’s the probability that if we buy a pizza, we get a coupon that we do not already have? Clearly, {\frac{n-x}{n}}. Thus the expected number of pizzas needed to get one new coupon, from the previous calculation, is {\frac{n}{n-x}}. This means the expected number of pizzas required to collect all coupons can be written as the sum {\sum_{x=0}^{n-1} \frac{n}{n-x} = n\sum_{k=1}^{n}\frac{1}{k} = nH_n}, which is {\Theta(n\log n)} by the well-known approximation for the Harmonic number.
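
Here is a small Python simulation (again, not from the original argument) comparing the empirical average against {nH_n}:

```python
import random

def pizzas_to_collect_all(n):
    """Buy pizzas until all n coupon types have been seen; return the count."""
    seen = set()
    pizzas = 0
    while len(seen) < n:
        seen.add(random.randrange(n))  # each pizza comes with a uniformly random coupon
        pizzas += 1
    return pizzas

n = 50
trials = 10_000
avg = sum(pizzas_to_collect_all(n) for _ in range(trials)) / trials
harmonic = sum(1 / k for k in range(1, n + 1))
print(f"empirical average: {avg:.1f}, n * H_n: {n * harmonic:.1f}")
```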

Which game should you play?

I give you a choice of two games to play.

In Game #1, I will keep a box in front of you that has 10 red and 10 blue balls. You will have to choose a color and then pick a ball randomly from the box. If the color you chose matches the color of the ball you pick, I will give you $100; otherwise you will get nothing.

In Game #2, I will adversarially construct a box that will have some number of red and some number of blue balls. The rest of the game is the same. If, for example, I have a hunch that you are going to choose red as your color, I will make sure that the box contains no red balls so that you lose the game.

Which of the games should you pick? Clearly, since I am writing a blog post about this, the answer cannot be Game #1. So the answer is Game #2. But why?

Note that if you pick Game #2 and then choose your color uniformly at random from the set {red, blue}, this game becomes exactly like Game #1, no matter what ratio of red and blue balls I adversarially choose to put in the box. Thus you can always achieve an expected profit of $50 with this randomized strategy. In addition, if you know something about me, you can use that information to do even better. This means you should always pick Game #2.
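
A quick simulation of this argument (the box sizes below are arbitrary; only the $100 prize and the two colors come from the games as described):

```python
import random

def game2_with_random_color(num_red, num_blue, trials=100_000):
    """Play Game #2 with a uniformly random color choice; return average winnings."""
    total = 0
    for _ in range(trials):
        color = random.choice(["red", "blue"])
        ball = "red" if random.randrange(num_red + num_blue) < num_red else "blue"
        if color == ball:
            total += 100
    return total / trials

# However the adversary splits 20 balls between the two colors,
# the average winnings stay near $50.
for reds in (0, 5, 10, 20):
    print(f"{reds} red balls -> average winnings ${game2_with_random_color(reds, 20 - reds):.2f}")
```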

So who won the debate?

A friend of mine got his driver’s license today. He was worried that he may not pass the driver’s test, but I kept saying he would. So his passing the test gives me a perfect opportunity to go all “I told you so!” on him. But mathematically speaking, am I justified in doing that?

Things are much easier in a deterministic world, or even in a world where all our wagers were deterministic. So let’s talk about that world for a while. Suppose your friend says he will definitely fail a test and you say he will definitely pass the test. Then it is very clear who won the debate once you know the outcome of the test. Of course, you win if your friend passes, he wins if he fails.

But the friend in question is mathematically more sophisticated. When I told him there was no need to worry and that he was going to pass the test, he didn’t say he was definitely going to fail. He said that there was a greater than 25% chance that he was going to fail.

Let’s assume, for simplicity, that I’d claimed his passing to be an absolute certainty. His claim put the probability of passing at a modest 75%. Now given that he did pass, who won this debate?

The answer is that it’s complicated. We can’t say that I won, because perhaps the true probability of his passing was indeed 75%, and this specific instance of the test happened to be drawn from the 75% of the instances where he does pass. Can we say that I lost? No, because perhaps the true probability was actually 100%.

The real answer is that in the middle of all these probabilities, we should not expect to have a definitive winner of the debate. Rather, all we should expect to extract from this event is a probability that I was the winner. A mathematically correct arbiter will start with an impartial prior probability about who’s the winner and use the outcome of the test to merely update this probability using Bayes’ theorem.
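
To make the update concrete, here is a minimal sketch of such an arbiter in Python, using the simplification from above (my claim: he passes with probability 1; his claim: he passes with probability 0.75) and assuming an impartial 50/50 prior:

```python
# Prior: the arbiter has no idea which of us is right.
prior_me, prior_friend = 0.5, 0.5

# Likelihood of the observed outcome (he passed) under each claim.
likelihood_me = 1.0       # I claimed passing was certain
likelihood_friend = 0.75  # he claimed a 75% chance of passing

evidence = prior_me * likelihood_me + prior_friend * likelihood_friend
posterior_me = prior_me * likelihood_me / evidence
print(f"P(I won the debate | he passed) = {posterior_me:.3f}")  # about 0.571
```

So a single pass only nudges an impartial arbiter to about 57% in my favor, which is rather far from a license to say “I told you so.”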

I’m going to meet this friend in about 20 mins. Mathematically speaking, am I justified in saying, “I told you so!”? No. But am I going to do it? Yes.

Closed-form formulas are overrated

tl;dr A closed-form formula is a means of expressing a variable in terms of functions that we have got names for. The set of functions that we have got names for is a pure accident of human history. Thus having a closed-form formula for an object of study is also merely an accident of human history and doesn’t say anything fundamental about the object.

The essence of scientific investigation

Scientists like understanding things. A good test of understanding is the ability to predict. For example, we can claim that we have understood gravity because we can predict with amazing accuracy where the moon, for example, is going to be at any given time in the future.

In the next few paragraphs, I am going to want to make extremely general claims and that will require me to talk about some very abstract concepts. So let me talk about those abstract concepts first.

Most of the things that we have tried to understand in the history of scientific investigation can be thought of as abstract number crunching devices. The moon, for example, is something that we see in the sky at a particular angle at a given time. So we can think of the moon as a device that takes time as input and turns it into a particular position in the sky. We can denote time by a number t and the position by two numbers x and y. Thus the moon converts t into x and y.

The number crunching that the moon does is not arbitrary. If one observes the moon for a while, it is easy to start seeing some patterns. Some obvious patterns are immediately visible. For example, there is a certain continuity in the way it moves, i.e., its position in the sky does not change too much in a short period of time. There are other very non-obvious patterns too. These patterns, in fact, required centuries of scientific investigation to uncover.

When we are trying to understand the moon, we are trying to understand this pattern. More precisely, we want to write down a set of rules that perform the same number crunching as the moon does, i.e., if we start with a t and apply those rules on t one by one, we get an x and a y whose values match exactly with the values that the moon’s number crunching would have given us. Now, I am not claiming that understanding the relationship between x, y and t tells us everything about the moon. Of course, it doesn’t say anything about whether there is oil on the moon’s surface. But let me just use “understanding the moon” as a metaphor in the rest of the article for understanding this specific aspect of the moon’s motion.

This is not specific to the moon, by the way. Consider some other subject of investigation. For example, the flu virus. One crude way of modelling the flu virus as a number crunching device is to say that it converts time into the expected number of people infected. That’s a very high level picture, and we can make the model more informative by adding some more parameters to the input: say, the average temperature that year, the humidity, etc. The output can also be modified. We can, for example, make the output a vector of probabilities, where probability number i tells us how likely person number i is to get infected by the flu virus. There could be many ways of understanding the flu virus, but once we have asked one specific question about it, we have essentially modelled it as a number crunching device that converts some set of numbers into another set of numbers.

The main challenge of scientific investigation is that we do not usually have access to the inner workings of the number crunching device under investigation. In this sense, it is a black box. We only get to see the numbers that go in and the numbers that come out. Just by observing a large number of these input-output pairs, we take up the task of figuring out what’s going on inside the black box. We know that we have figured it out if we can replicate it, i.e., once we have constructed a set of our own rules that have the same behavior as the black box.

Things get interesting once we try to understand what kinds of rules we are allowed to write. For example, do we really have to write those rules? Is it fine if I hire a person who knows the rules and, when given a time t, always outputs the correct x and y, the x and y that the moon itself would have churned out? Is it still fine if the person I have hired only understands the rules and can replicate the correct input-output behavior but cannot explain the rules to me? If that is fine, then how about creating a machine, instead of hiring a person, that manifests the same input-output behavior in some way? For example, maybe the machine is simply a screen with a pointer and a dial, so that when you set a specific t on the dial, the pointer moves to the correct x and y coordinates on the screen? Is that fine? Or maybe the machine is just a giant rock revolving around a bigger rock, so that when a person standing on the bigger rock looks up at time t, he can see the smaller rock exactly at the coordinates described by the corresponding numbers x and y?

I don’t know which of the scenarios above should be considered a “valid” understanding of the moon and which ones should not. But it seems clear that there can be several different ways of “writing” the set of rules. The primitive way of doing this was to write the set of rules as a closed-form formula.

What is a closed-form formula?

x = 2t + 1 is a closed-form formula. So is x = sin(t) + cos(t).

Until high school, I was under the impression that in order to understand the moon, one was required to present some such closed-form formula, i.e., express both x and y as functions of t. But that’s an unnatural constraint.

For example, what if x was a slightly weirder function? Say, x was 2t + 1 for t < 1000 and sin(t) + cos(t) for t ≥ 1000? Maybe we would still accept that, mainly because there exists a conventional way of writing such piecewise functions in math. But what if x was something even weirder? For example, say x was equal to the smallest prime factor of t? Or maybe x was something that just cannot be written in one sentence? Maybe x was just given by a sequence of instructions based on the value of t, so that if you started with a value of t and followed those instructions one by one, you would end up with the value of x?
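
To make this concrete, here is a toy Python version of such a set of instructions. It stitches together the examples above (the threshold and the particular cases are purely illustrative):

```python
import math

def x_of_t(t):
    """A perfectly good 'set of rules' for computing x from t, even though it
    has no conventional closed form: a piecewise formula for small t, the
    smallest prime factor for large t. The specific cases are made up."""
    if t < 1000:
        return 2 * t + 1
    n = int(t)
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return d  # smallest prime factor of t
    return n  # t itself is prime

print(x_of_t(10))    # 21
print(x_of_t(1001))  # 7, since 1001 = 7 * 11 * 13
```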

The punchline of this argument is that sin(t) (or even 2t+1, for that matter) is already such a set of instructions. Just because human beings, at some point, decided to give it a name doesn’t mean it is more fundamental than any other set of instructions for converting t into x. Thus in the process of understanding the moon, one should not worry about coming up with a closed-form formula.

At the same time, it is clear that some ways of writing the rules are better than others. For example, having a moon’s life-size replica revolve around the earth’s life-size replica as your set of rules is a bit inconvenient from the point of view of making predictions.

What, then, is the “correct” way of writing the rules? I want to claim that the answer to this question can be found by understanding computation and, specifically, the area of computational complexity. But I will not make this article any longer.

Euclidean Minimum Weight Matchings

The exact complexity of computing the minimum weight perfect bipartite matching in the Euclidean case is an open problem in computational geometry. This problem fits into the common theme of taking standard optimization problems on general weighted graphs and giving them a geometric flavor by forcing all the edge-weights to be Euclidean distances. Doing this often makes the problem easier to solve than the problem on general weighted graphs. Examples include minimum spanning tree (it’s open whether the Euclidean version can be done in linear time or not; the general version is known to take at least {\Omega(n\log n)} time) and the travelling salesman problem (the general version is hard to approximate, but the Euclidean case has a PTAS).

More formally, consider two sets {A} and {B} of {n} points each in the two dimensional plane. This defines a complete weighted bipartite graph where we create a node for each point in {A\cup B} and an edge {(a, b)} for all {a\in A} and {b\in B}. To each edge {(a, b)}, we assign a weight equal to the Euclidean distance between {a} and {b}. The question, then, is to compute the minimum weight perfect matching in this graph in {o(n^2)} time. Currently, the best known exact algorithm takes {\tilde{O}(n^2)} time, where {\tilde{O}} hides logarithmic factors. If we are willing to settle for approximate answers, it is possible to get to almost linear time: there exists a near linear time algorithm that finds a {(1+\epsilon)}-factor approximation for any {\epsilon > 0}. Getting a subquadratic approximation algorithm is a good sign, because approximation algorithms can often be made exact by setting {\epsilon} appropriately, if we know something about the solution space. For example, if we knew that the weight of every perfect matching is an integer in the range {[1..n^2]}, we could get an exact solution by setting {\epsilon} to be something slightly smaller than {1/n^2}. Of course, this approach has obvious caveats, including a) that we do not know anything about the set of weights achievable by perfect matchings and b) that setting {\epsilon} to be polynomially small in {n} will blow up the running time.
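
For reference, here is a cubic-time baseline in Python, not the subquadratic algorithms discussed above: it just builds the Euclidean cost matrix and feeds it to SciPy’s assignment solver (assuming NumPy and SciPy are available):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
n = 100
A = rng.random((n, 2))  # n random points in the unit square
B = rng.random((n, 2))

# cost[i, j] is the Euclidean distance between A[i] and B[j];
# linear_sum_assignment solves the assignment problem exactly, but in O(n^3) time.
cost = cdist(A, B)
rows, cols = linear_sum_assignment(cost)
print("minimum matching weight:", cost[rows, cols].sum())
```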

An interesting special case is when all the points are promised to belong to a {\Delta\times\Delta} integer grid. In this case an *additive* approximation algorithm is known that runs in {\tilde{O}(n^{3/2+\delta})} time, {\delta} being a small positive constant. Here the {\tilde{O}} hides logarithmic factors in {n} and {\Delta} and polynomial factors in {1/\epsilon}. From now on, we will also hide the {n^\delta} in the {\tilde{O}}.

Being on an integer grid has some advantages. For example, the weight of a perfect matching is then a sum of square roots of {n} integers, each in the range {[0..2\Delta^2]}. Sums of square roots of integers are, for many reasons, very interesting to the algorithms community and thus have been studied extensively. It is known, for example, that for any two sets of {n} integers each, with every integer bounded by {\Delta}, the difference between the sum of square roots of the integers in one set and the sum of square roots of the integers in the other set, if nonzero, is lower bounded by {1/f(n, \Delta)}, where {f(n, \Delta)} is polynomial in {\Delta} but doubly exponential in {n}. That doesn’t quite help us yet, because setting {\epsilon} to be doubly exponentially small in {n} is horrible for the running time.

In a recent paper by R. Sharathkumar, this problem was circumvented with a clever trick, and an {\tilde{O}(n^{3/2})} time exact algorithm was shown for the case when the points lie on a {\Delta\times\Delta} integer grid. The algorithm is really neat and works by combining a few ideas in the right way. One black box it uses is the fact that if, instead of a complete bipartite graph in the two dimensional plane, you are given a planar graph, then the minimum weight perfect matching can be found using planar separators in {\tilde{O}(n^{3/2})} time. His main idea, then, is to extract from the complete bipartite graph a subset of edges such that a) the subset is planar and b) it contains a minimum weight perfect matching of the complete bipartite graph. He shows that such a subset can be found in {\tilde{O}(n^{3/2})} time. To do this, he builds on the additive approximation algorithm and uses the fact that sums of square roots of two sets of integers cannot be arbitrarily close to each other.

Perfect matchings with a high stabbing number

Once upon a time, an idiosyncratic king set up a peculiar system for settling marriages in his kingdom. Once every year, he would invite all the couples that wished to tie the knot to a grand ceremony. Upon arrival, the couples would be taken to a large open area, with chairs fixed to the ground and spread all around, and asked to get seated. The arrangements would be so made that the chairs would be neither in surplus nor in shortage. Thus each individual would get exactly one chair and no more.

Finally, the king himself would arrive, examine the seated guests, draw one long line on the ground and stand on one of its two sides. That’s when the marriages would be decided. Couples where both partners were seated on the side of the line the king stood on would get married and couples separated by the holy line would be forbidden from seeing each other ever again.

The king wanted to slow down the recent exponential growth in population in his kingdom and so he wanted as few couples to be married as possible. Since he had complete knowledge of who wanted to get married to whom, he could, in principle, devise an evil arrangement of chairs and draw one really mean line that would separate most of the couples at the ceremony. On the other hand, the couples were allowed to collude with each other upon seeing the arrangement of chairs and decide who got to sit where. Thus perhaps they could formulate a clever strategy that would let most of them be on the same side as the king no matter what line he chose to draw?

Year after year passed by and the king, drawing upon the wisdom of the entire royal ministry, managed to hoodwink his people and successfully stalled most of the romance in his kingdom. The lack of expertise in computational geometry among the general public proved to be detrimental to them. The grand ceremony, having the flavor of a gripping puzzle, got the king addicted and very soon, by developing progressively sophisticated and elaborate strategies, he unknowingly brought his own kingdom to what could be described as extinction.

Centuries later, in the year 1989, two researchers, trying to design an efficient data structure to perform range searching queries on a point-set, proved an interesting theorem. They weren’t aware that the theorem held the key to a centuries-old conundrum that could have saved an entire kingdom from going extinct. What they proved essentially amounted to this:

“No matter what the arrangement of chairs, the couples can always collude with each other and compute an assignment of chairs to each individual, so that no matter what line the king draws and no matter what side he stands on, at least a polynomial number of them get married.”

In fact, they proved something even stronger. Their theorem does not so much depend on the fact that the shape the king draws is a line. Other geometric shapes, such as circles, rectangles, squares, triangles, can all be plugged into the theorem in place of “line” and the statement will still hold true.

As long as the shape satisfies the property that its dual shatter function is polynomial, the theorem works. The dual shatter function of a shape is the maximum number of cells one can get in a Venn diagram obtained by drawing n of those shapes. For example, for the case of halfplanes (i.e., a line and one of its sides), one can easily show by induction on the number of lines that the dual shatter function is polynomial. Notice that when incrementally adding a halfplane to a partially built Venn diagram of halfplanes, the number of new cells created is equal to the number of cells the new halfplane’s boundary passes through. Since a new line can intersect each old line at most once, it is cut by the existing lines into at most one more piece than the number of lines already present, and hence passes through at most that many cells. Thus the dual shatter function is O(n^2). Similarly, for any shape with the property that the boundaries of two instances of the shape always intersect in a constant number of points, the dual shatter function is bounded from above by a polynomial.
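
Here is a tiny sketch of that counting argument in Python (purely illustrative):

```python
def max_cells(n):
    """Maximum number of cells in an arrangement of n lines (the dual shatter
    function for halfplanes). Built up incrementally: the k-th line meets the
    previous k - 1 lines in at most k - 1 points, so it is cut into at most k
    pieces and creates at most k new cells."""
    cells = 1  # the empty arrangement has a single cell: the whole plane
    for k in range(1, n + 1):
        cells += k
    return cells  # equals 1 + n(n + 1)/2, i.e. O(n^2)

print([max_cells(n) for n in range(5)])  # [1, 2, 4, 7, 11]
```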

Actually, the theorem does not just hold in the geometric setting. It holds for general set systems. Thus if the ceremony were organized in interstellar space with chairs occupying co-ordinates in three dimensions, or in some bizarre abstract space, a polynomial number of marriages could be saved as long as the shape chosen by the king had a polynomial dual shatter function.

(Bonus points if you can correctly guess the definition of stabbing number without looking it up.)

Another theorem of Turán

A graph with {n} isolated vertices has a maximum independent set of size {n} and a complete graph has a maximum independent set of size 1. As you increase the number of edges, you should get smaller and smaller maximum independent sets.

This intuition is quantified by a theorem by Turán that says that a graph with {n} vertices and {e} edges has a maximum independent set of size at least {\frac{n^2}{2e+n}}.

In particular, graphs with a linear number of edges, for example planar graphs or graphs with maximum degree bounded by a constant, are guaranteed to have a linear-sized independent set.

Note that the theorem only says that a small number of edges guarantees a large independent set. The converse is not true, i.e., a large independent set does not imply a small number of edges. Example: the complete bipartite graph with {n/2} vertices on each side has {\frac{n^2}{4}} edges and an independent set of size {n/2}.

Also, the proof of the theorem is constructive, so you can actually find an independent set of the guaranteed size in polynomial time.
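
One standard constructive route (not spelled out above, so treat this as a sketch) is the minimum-degree greedy algorithm, which is known to produce an independent set of at least the guaranteed size:

```python
def greedy_independent_set(adj):
    """Repeatedly pick a vertex of minimum degree in the remaining graph and
    delete it together with its neighbors. adj maps each vertex to the set of
    its neighbors."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    independent = []
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))  # minimum-degree vertex
        independent.append(v)
        removed = adj[v] | {v}
        for u in removed:
            adj.pop(u, None)
        for u in adj:
            adj[u] -= removed
    return independent

# A 4-cycle: n = 4, e = 4, so the bound n^2/(2e + n) = 16/12 guarantees
# an independent set of size at least 2.
cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(greedy_independent_set(cycle4))  # e.g. [0, 2]
```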