## So who won the debate?

A friend of mine got his driver’s license today. He was worried that he may not pass the driver’s test, but I kept saying he would. So his passing the test gives me a perfect opportunity to go all “I told you so!” on him. But mathematically speaking, am I justified in doing that?

Things are much easier in a deterministic world, or even in a world where all our wagers were deterministic. So let’s talk about that world for a while. Suppose your friend says he will definitely fail a test and you say he will definitely pass the test. Then it is very clear who won the debate once you know the outcome of the test. Of course, you win if your friend passes, he wins if he fails.

But the friend in question is mathematically more sophisticated. When I told him there was no need to worry and that he was going to pass the test, he didn’t say he was definitely going to fail. He said that there was a greater than 25% chance that he was going to fail.

Let’s assume, for simplicity, that I’d claimed his passing to be an absolute certainty. His claim estimated the probability of passing to a modest 75%. Now given that he did pass, who won this debate?

The answer is that it’s complicated. We can’t say that I won, because perhaps the true probability of his passing was indeed 75%, and this specific instance of the test happened to be drawn from the 75% of the instances where he does pass. Can we say that I lost? No, because perhaps the true probability was actually 100%.

The real answer is that in the middle of all these probabilities, we should not expect to have a definitive winner of the debate. Rather, all we should expect to extract from this event is a *probability* that I was the winner. A mathematically correct arbiter will start with an impartial prior probability about who’s the winner and use the outcome of the test to merely update this probability using the Bayes theorem.

I’m going to meet this friend in about 20 mins. Mathematically speaking, am I justified in saying, “I told you so!”? No. But am I going to do it? Yes.

## Closed-form formulas are overrated

tl;dr A closed-form formula is a means of expressing a variable in terms of functions that we have got names for. The set of functions that we have got names for is a pure accident of human history. Thus having a closed-form formula for an object of study is also merely an accident of human history and doesn’t say anything fundamental about the object.

**The essence of scientific investigation**

Scientists like understanding things. A good test of understanding is the ability to predict. For example, we can claim that we have understood gravity because we can predict with amazing accuracy where the moon, for example, is going to be at any given time in the future.

In the next few paragraphs, I am going to want to make extremely general claims and that will require me to talk about some very abstract concepts. So let me talk about those abstract concepts first.

Most of the things that we have tried to understand in the history of scientific investigation can be thought of as an abstract number crunching device. The moon, for example, is something that we see in the sky at a particular angle at a given time. So we can think of the moon as a device that takes time as input and turns it into a particular position in the sky. We can denote time by a number t and the position by two numbers x and y. Thus moon converts t into x and y.

The number crunching that the moon does is not arbitrary. If one observes the moon for a while, it is easy to start seeing some patterns. Some obvious patterns are immediately visible. For example, there is a certain continuity in the way it moves, i.e., its position in the sky does not change too much in a short period of time. There are other very non-obvious patterns too. These patterns, in fact, required centuries of scientific investigation to uncover.

When we are trying to understand the moon, we are trying to understand this pattern. More precisely, we want to write down a set of rules that perform the same number crunching as the moon does, i.e., if we start with a t and apply those rules on t one by one, we get an x and a y whose values match exactly with the values that the moon’s number crunching would have given us. Now, I am not claiming that understanding the relationship between x, y and t tells us everything about the moon. Of course, it doesn’t say anything about whether there is oil on the moon’s surface. But let me just use “understanding the moon” as a metaphor in the rest of the article for understanding this specific aspect of the moon’s motion.

This is not specific to the moon, by the way. Consider some other subject of investigation. For example, the flu virus. One crude way of modelling the flu virus as a number crunching device is to say that it converts time into the expected number of people infected. That’s a very high level picture and we can make the model more informative by adding some more parameters to the input. For example, say, the average temperature that year, the humidity etc. The output can also be modified. We can, for example, make the output a vector of probabilities, where probability number i tells us how likely is it that person number i will get infected by the flu virus. There could be many ways of understanding the flu virus, but once we have asked one specific question about it, we have essentially modelled it as a number crunching device that converts some set of numbers into another set of numbers.

The main challenge of scientific investigation is that we do not usually have access to the inner workings of the number crunching device under investigation. In this sense, it is a black box. We only get to see the numbers that go in and the numbers that come out. Just by observing a large number of these input-output pairs, we take up the task of figuring out what’s going on inside the black box. We know that we have figured it out if we can replicate it, i.e., once we have constructed a set of our own rules that have the same behavior as the black box.

Things get interesting once we try to understand what kinds of rules we are allowed to write. For example, do we really have to *write* those rules? Is it fine if I hire a person who knows the rules and when given a time t, always outputs the correct x and y, the x and y that the moon itself would have churned out? Is it still fine if the person I have hired only understands the rules and can replicate the correct input-output behavior but can not explain the rules to me? If that is fine, then how about creating a *machine*, instead of hiring a person, that manifests the same input-output behavior in some way? For example, may be, the machine is simply a screen with a pointer and a dial so that when you set a specific t on the dial, the pointer moves to the correct x and y coordinates on the screen? Is that fine? Or may be, the machine is just a giant rock revolving around a bigger rock so that when a person standing on the bigger rock looks up at time t, he can see the smaller rock exactly at coordinates described by the corresponding numbers x and y?

I don’t know which of the scenarios above should be considered a “valid” understanding of the moon and which ones should not. But it seems clear that there can be several different ways of “writing” the set of rules. The primitive way of doing this was to write the set of rules as a *closed-form formula.*

**What is a closed-form formula?**

x = 2t + 1 is a closed-form formula. So is x = sin(t) + cos(t).

Until high school, I was under the impression that in order to understand the moon, one was required to present some such closed-form formula, i.e., express both x and y as functions of t. But that’s an unnatural constraint.

For example, what if x was a slightly weirder function? Say, x was 2t+1 for t < 1000 and sin(t) + cos(t) for t > 1000? May be we would still accept that, mainly because there exists a conventional way of writing such *piecewise* functions in math. But what if x was something even more weird? For example, say x was equal to the smallest prime factor of t? Or may be x was something that just cannot be written in one sentence? May be x was just given by a sequence of instructions based on the value of t, so that if you started with a value of t and followed those instructions one by one, you would end up with the value of x?

The punchline of this argument is that sin(t) (or even 2t+1, for that matter) is already such a set of instructions. Just because human beings, at some point, decided to give it a name doesn’t mean it is more fundamental than any other set of instructions for converting t into x. Thus in the process of understanding the moon, one should not worry about coming up with a closed-form formula.

At the same time, it is clear that some ways of writing the rules are better than others. For example, having a moon’s life-size replica revolve around the earth’s life-size replica as your set of rules is a bit inconvenient from the point of view of making predictions.

What, then, is the “correct” way of writing the rules? I want to claim that the answer to this question can be found by understanding computation and, specifically, the area of computational complexity. But I will not make this article any longer.

## Euclidean Minimum Weight Matchings

The exact complexity of computing the minimum weight perfect bipartite matching in the Euclidean case is an open problem in computational geometry. This problem fits into the common theme of taking standard optimization problems on general weighted graphs and giving them a geometric flavor by forcing all the edge-weights to be Euclidean distances. Doing this often makes the problem easier to solve than the problem on general weighted graphs. Examples include minimum spanning tree (it’s open whether the Euclidean version can be done in linear time or not; the general version is known to take at least time) and the travelling salesman problem (the general version is hard to approximate, but the Euclidean case has a PTAS).

More formally, consider two sets and of points each in the two dimensional plane. This defines a complete weighted bipartite graph where we create a node for each point in and an edge for all and . To each edge , we assign a weight equal to the Euclidean distance between and . The question, then, is to compute the minimum weight perfect matching in this graph in time. Currently, the best known algorithm takes time where hides logarithmic factors. If we don’t care about the accuracy, it is possible to reach almost linear time, that is, there exists a near linear time algorithm that finds a -factor approximation for any . Getting a subquadratic approximation algorithm is a good sign because often approximation algorithms can be made exact by setting the appropriately, if we know something about the solution space. For example, if we know that the set of all possible weights achievable by a perfect matching is an integer in the range , we can get an exact solution by setting to be something slightly smaller than . Of course, this approach has obvious caveats, including a) that we do not know anything about the set of possible weights achievable by the perfect matchings and b) that setting to be a polynomial in will blow up the running time.

An interesting special case is when all the points are promised to belong to a integer grid. In this case an *additive* approximation algorithm is known that runs in time, being a small positive constant. Here the hides logarithmic factors in and and polynomial factors in . From now on, we will also hide the in the .

Being on an integer grid has some advantages. For example, the weight of a perfect matching, then, is the sum of square roots of integers, each in the range . Sums of square roots of integers are, for many reasons, very interesting for the algorithms community and thus have been studied extensively. It is known, for example, that for any two sets of integers each, the difference between the sum of square roots of the integers in one set and the sum of square roots of the integers in the other set is lower bounded by where is polynomial in but doubly exponential in . That doesn’t quite help us yet, because setting to be something doubly exponential in is horrible for the running time.

In a recent paper by R. Sharathkumar, this problem was circumvented with a clever trick and a time exact algorithm was shown for the case when points lie on a integer grid. The algorithm is really neat and works by combining a few ideas in the right way. One black box it uses is the fact that if instead of a complete bipartite graph in the two dimensional plane, you are given a planar graph, then the minimum weight perfect matching can be found using planar separators in time. Thus his main idea is that given the complete bipartite graph, extract from it a subset of edges such that a) the subset is planar and b) it contains the minimum weight perfect matching of the complete bipartite graph. He shows that such a subset can be found in time. To do this, he builds up on the additive approximation algorithm and uses the fact that sums of square roots of two sets of integers cannot be arbitrarily close to each other.

## Perfect matchings with a high stabbing number

Once upon a time, an idiosyncratic king set up a peculiar system for settling marriages in his kingdom. Once every year, he would invite all the couples that wished to tie the knot to a grand ceremony. Upon arrival, the couples would be taken to a large open area with chairs that were fixed to the ground spread all around and asked to get seated. The arrangements would be so made that the chairs would neither be surplus nor in shortage. Thus each individual would get exactly one chair and no more.

Finally, the king himself would arrive, examine the seated guests, draw one long line on the ground and stand on one of its two sides. That’s when the marriages would be decided. Couples where both partners were seated on the side of the line the king stood on would get married and couples separated by the holy line would be forbidden from seeing each other ever again.

The king wanted to slow down the recent exponential growth in population in his kingdom and so he wanted as few couples to be married as possible. Since he had complete knowledge of who wanted to get married to whom, he could, in principle, devise an evil arrangement of chairs and draw one really mean line that would separate most of the couples at the ceremony. On the other hand, the couples were allowed to collude with each other upon seeing the arrangement of chairs and decide who got to sit where. Thus perhaps they could formulate a clever strategy that would let most of them be on the same side as the king no matter what line he chose to draw?

Year after year passed by and the king, drawing upon the wisdom of the entire royal ministry, managed to hoodwink his people and successfully stalled most of the romance in his kingdom. The lack of expertise in computational geometry among the general public proved to be detrimental to them. The grand ceremony, having the flavor of a gripping puzzle, got the king addicted and very soon, by developing progressively sophisticated and elaborate strategies, he unknowingly brought his own kingdom to what could be described as extinction.

Centuries later, in the year 1989, two researchers, trying to design an efficient data structure to perform range searching queries on a point-set proved an interesting theorem. They weren’t aware that the theorem held the key to a centuries old conundrum that could have saved an entire kingdom from going extinct. What they proved essentially amounted to this:

*“No matter what the arrangement of chairs, the couples can always collude with each other and compute an assignment of chairs to each individual, so that no matter what line the king draws and no matter what side he stands on, at least a polynomial number of them get married.”*

In fact, they proved something even stronger. Their theorem does not so much depend on the fact that the shape the king draws is a *line*. Other geometric shapes, such as circles, rectangles, squares, triangles, can all be plugged into the theorem in place of “line” and the statement will still hold true.

As long as the shape satisfies the property that its *dual shatter function* is polynomial, the theorem works. The dual shatter function for a shape is the maximum number of cells one can get in a Venn diagram obtained by drawing of those shapes. For example, for the case of halfplanes (i.e., a line and one of its sides), one can easily show using induction on the number of lines that the dual shatter function is polynomial. Notice that when incrementally adding a halfplane to a partially built Venn diagram of halfplanes, the number of new cells created is equal to the number of cells this new halfplane’s boundary intersects. Since a new line can intersect an old line at most once, the number of cells it intersects is at most the number of lines already present. Thus the dual shatter function is . Simiarly, for any shape that satisfies the property that boundaries of two instances of the shape always intersect in a constant number of places, the dual shatter function is bounded from above by a polynomial.

Actually, the theorem does not just hold in the geometric setting. It holds for general set systems. Thus if the ceremony were organized in interstellar space with chairs occupying co-ordinates in three dimensions, or in some bizarre abstract space, a polynomial number of marriages could be saved as long as the shape chosen by the king had a polynomial dual shatter function.

(Bonus points if you can correctly guess the definition of stabbing number without looking it up.)

## Another theorem of Turán

A graph with isolated vertices has a maximum independent set of size and a complete graph has a maximum independent set of size 1. As you increase the number of edges, you should get smaller and smaller maximum independent sets.

This intuition is quantified by a theorem by Turán that says that a graph with vertices and edges has a maximum independent set of size at least .

In particular, graphs with linear number of edges, for example, planar graphs, or graphs with max. degree bounded by a constant are guaranteed to have a linear sized independent set.

Note that the theorem only says that small number of edges guarantees a large independent set. The converse is not true, i.e., a large independent set does not imply a small number of edges. Example: complete bipartite graphs. They have edges *and* an independent set of size .

Also, the theorem is constructive. So you can actually *find* the independent set in question in polynomial time.

## Being smart about distributing electricity

It turns out that the conventional way of distributing electricity is all wrong. I am talking about electricity distribution of the kind government does from the power plant to the consumers.

One of the main issues is that all the resources, the cables, the transformers, the hubs and so on, are built in order to support the peak load. But the peak load is rarely reached. In 2009, for example, 15% of the generation capacity was used less than 88 hours per year in Massachusetts. 88 hours per year! Out of the 8760 hours that a year has. Obviously, we are doing a lot of work that’s not needed.

However, we can’t really just cut down on the resources because if we do, those 88 hours of peak load will just blow everything up and we don’t want that to happen either.

Thus people have come up with an ingenious idea: control the electricity provided to the consumers such that they do not all get a large amount at the same time, thud reducing the peak load. This is done by a central hub that studies the usage pattern of different houses in the locality and schedules electricity to them accordingly. The hub can also ask the home owners to provide additional data. For example, people are usually flexible about exactly when they want to use power-consuming electric devices. So for example, the hub could ask the home owners to send a list of devices they want to use on a given day and the flexibility they are willing to accept. Next, the hub can decide the amount of electricity to provide to each house at a given time, the aim being to make sure that not many of the houses run heavy load devices at the same time.

Many other things can be done. Anything that can potentially bring down the peak load by 1-2% will save the governments a lot of money.

## Opinions and How They Change

An individual forms opinions based on how he can himself assimilate the facts around him and also based on what opinions his friends and other people he interacts with hold. This makes things complicated and intriguing enough that this has been an active area of research since decades.

One question is, can we formulate simple enough models that match with the data we get from real life experiments? If we could, then we would get some insight into human behavior and a tool for making useful predictions.

The simplest model that has been studied is this:

An opinion is just a real number. Each person starts with an initial opinion. Next, in each time step, he looks at the opinions held by his friends and updates his own opinion to the average of his old opinion and the opinions of his friends. It doesn’t have to be a simple average. A person may have different trusts for different friends and thus he might want to take a weighted average instead. However the model doesn’t allow individuals to change the weights at any step. The weights chosen in the beginning have to be the weights always.

Using simple linear algebra tricks and borrowing known results from the Markov Chain literature, it can be shown that this kind of system converges to an equilibrium in most natural cases. An equilibrium here means a set of opinions for which the averaging step doesn’t lead to any change, i.e., for all individuals, the new opinion remains the same as the old one. In fact, it can be proved that the equilibrium that’s reached is a consensus, i.e., every individual has the same opinion.

An objection to this model that one might have is the simple representation of an opinion. Can it really be represented by just a single real number?

Anyway, DeGroot, the person who introduced this model also showed that the same thing happens if the opinions are drawn from any vector space and in each time step a person updates his opinion to some convex combination of the opinions of his friends (including himself).

That’s something.

The only issue is that in real life, people don’t reach consensus. So what’s going on?

Of course, the model seems too simple to resemble real life accurately. For one, the weights (or trust) we assign to people changes over time depending on various factors. For example, if a person seems to be changing his mind every minute, we will probably assign a lesser weight to his opinion.

Also, even though this process of repeated averaging has been shown to always converge to a consensus, we don’t really know the amount of time it takes to reach there. From what I know by quickly glancing through Bernard Chazelle’s new work on bird flocking, the time taken by a community whose size is close to the population of a country to reach a consensus is probably way more than the age of the universe.

Anyway. Friedkin and Johnsen modified this model a bit to make it more realistic. In their model, an individual has a fixed internal opinion that doesn’t change with time and during an averaging step, he takes a weighted average of the opinions of his friends (including himself) and the fixed internal opinion. Because the internal opinion can be different for different people, this system will obviously not reach a consensus always.

The system does have an equilibrium though and Friedkin and Johnsen proved that the equilibrium is almost always reached.

However, their model is different from DeGroot’s simpler model in a fundamental way. Let me explain.

Given a set of numbers , the mean is the number that minimizes the function . Thus the averaging step above can be seen as a step where a person is trying to minimize the cost incurred with respect to the cost function . Here represents the neighborhood of , i.e., the set of friends of .

With the above definition of cost, we can measure the quality of a certain opinion vector. For example, we can say that the sum of costs incurred by each person is the social cost of the whole group. And then given an opinion vector, we can decide how good it is by measuring how far it is from the opinion vector that minimizes the social cost. In particular, we can measure the quality of the opinion vector that the group converges to in equilibrium.

The fundamental difference between DeGroot’s model and Friedkin and Johnsen’s model is that in DeGroot’s model, the equilibrium reached also minimizes the total social cost but in Friedkin and Johnsen’s model, it does not necessarily.

David Bindel, Jon Kleinberg and Sigal Oren prove in their FOCS ’11 paper that the situation is not that bad. Even though the total cost at equilibrium may not minimize the total social cost, it can be worse at most by a factor of 9/8. That’s pretty cool.