## Closed-form formulas are overrated

tl;dr A closed-form formula is a means of expressing a variable in terms of functions that we have got names for. The set of functions that we have got names for is a pure accident of human history. Thus having a closed-form formula for an object of study is also merely an accident of human history and doesn’t say anything fundamental about the object.

**The essence of scientific investigation**

Scientists like understanding things. A good test of understanding is the ability to predict. For example, we can claim that we have understood gravity because we can predict with amazing accuracy where the moon is going to be at any given time in the future.

In the next few paragraphs, I am going to want to make extremely general claims and that will require me to talk about some very abstract concepts. So let me talk about those abstract concepts first.

Most of the things that we have tried to understand in the history of scientific investigation can be thought of as abstract number-crunching devices. The moon, for example, is something that we see in the sky at a particular angle at a given time. So we can think of the moon as a device that takes time as input and turns it into a particular position in the sky. We can denote time by a number t and the position by two numbers x and y. Thus the moon converts t into x and y.

The number crunching that the moon does is not arbitrary. If one observes the moon for a while, it is easy to start seeing some patterns. Some obvious patterns are immediately visible. For example, there is a certain continuity in the way it moves, i.e., its position in the sky does not change too much in a short period of time. There are other very non-obvious patterns too. These patterns, in fact, required centuries of scientific investigation to uncover.

When we are trying to understand the moon, we are trying to understand these patterns. More precisely, we want to write down a set of rules that perform the same number crunching as the moon does. That is, if we start with a t and apply those rules to it one by one, we get an x and a y whose values exactly match the values that the moon’s number crunching would have given us. Now, I am not claiming that understanding the relationship between x, y and t tells us everything about the moon. Of course, it doesn’t say anything about whether there is oil on the moon’s surface. But in the rest of the article, let me just use “understanding the moon” as shorthand for understanding this specific aspect of the moon’s motion.

This is not specific to the moon, by the way. Consider another subject of investigation: the flu virus. One crude way of modelling the flu virus as a number-crunching device is to say that it converts time into the expected number of people infected. That’s a very high-level picture, and we can make the model more informative by adding more parameters to the input, say, the average temperature that year, the humidity, etc. The output can also be modified. We can, for example, make the output a vector of probabilities, where probability number i tells us how likely it is that person number i will get infected by the flu virus. There could be many ways of understanding the flu virus, but once we have asked one specific question about it, we have essentially modelled it as a number-crunching device that converts one set of numbers into another.

The main challenge of scientific investigation is that we do not usually have access to the inner workings of the number crunching device under investigation. In this sense, it is a black box. We only get to see the numbers that go in and the numbers that come out. Just by observing a large number of these input-output pairs, we take up the task of figuring out what’s going on inside the black box. We know that we have figured it out if we can replicate it, i.e., once we have constructed a set of our own rules that have the same behavior as the black box.

Things get interesting once we try to understand what kinds of rules we are allowed to write. For example, do we really have to *write* those rules? Is it fine if I hire a person who knows the rules and, when given a time t, always outputs the correct x and y, the x and y that the moon itself would have churned out? Is it still fine if the person I have hired only understands the rules and can replicate the correct input-output behavior but cannot explain the rules to me? If that is fine, then how about creating a *machine*, instead of hiring a person, that manifests the same input-output behavior in some way? For example, maybe the machine is simply a screen with a pointer and a dial, so that when you set a specific t on the dial, the pointer moves to the correct x and y coordinates on the screen. Is that fine? Or maybe the machine is just a giant rock revolving around a bigger rock, so that when a person standing on the bigger rock looks up at time t, he can see the smaller rock exactly at the coordinates given by the corresponding numbers x and y?

I don’t know which of the scenarios above should be considered a “valid” understanding of the moon and which ones should not. But it seems clear that there can be several different ways of “writing” the set of rules. The primitive way of doing this was to write the set of rules as a *closed-form formula.*

**What is a closed-form formula?**

x = 2t + 1 is a closed-form formula. So is x = sin(t) + cos(t).

Until high school, I was under the impression that in order to understand the moon, one was required to present some such closed-form formula, i.e., express both x and y as functions of t. But that’s an unnatural constraint.

For example, what if x was a slightly weirder function? Say, x was 2t+1 for t < 1000 and sin(t) + cos(t) for t ≥ 1000? Maybe we would still accept that, mainly because there is a conventional way of writing such *piecewise* functions in math. But what if x was something even weirder? For example, say x was equal to the smallest prime factor of t? Or maybe x was something that just cannot be written in one sentence? Maybe x was just given by a sequence of instructions based on the value of t, so that if you started with a value of t and followed those instructions one by one, you would end up with the value of x?

The punchline of this argument is that sin(t) (or even 2t+1, for that matter) is already such a set of instructions. Just because human beings, at some point, decided to give it a name doesn’t mean it is more fundamental than any other set of instructions for converting t into x. Thus in the process of understanding the moon, one should not worry about coming up with a closed-form formula.
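Here is a small Python sketch of that punchline. It shows a named function (sine) and an unnamed one (the smallest prime factor) side by side. Both are just recipes that turn t into x. The function names are illustrative. The Taylor-series recipe is only one of many ways to compute sine and is not meant to describe how any particular math library does it.

```python
import math


def sin_by_instructions(t: float, terms: int = 30) -> float:
    """Compute sin(t) by following a fixed recipe: reduce t, then sum a Taylor series."""
    # Step 1: shift t by a multiple of 2*pi so that it lies in [-pi, pi].
    t = math.remainder(t, 2 * math.pi)
    # Step 2: add up t - t^3/3! + t^5/5! - ..., building each term from the previous one.
    total, term = 0.0, t
    for k in range(terms):
        total += term
        term *= -t * t / ((2 * k + 2) * (2 * k + 3))
    return total


def smallest_prime_factor(t: int) -> int:
    """A recipe without a famous name: try divisors 2, 3, 4, ... until one divides t."""
    if t < 2:
        raise ValueError("t must be an integer >= 2")
    d = 2
    while d * d <= t:
        if t % d == 0:
            return d
        d += 1
    return t  # no divisor up to sqrt(t), so t itself is prime
```

For instance, `smallest_prime_factor(91)` returns 7, and `sin_by_instructions(1.0)` agrees with `math.sin(1.0)` to within floating-point rounding. Nothing about the first recipe makes it more fundamental than the second, apart from the fact that we gave it a name.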

At the same time, it is clear that some ways of writing the rules are better than others. For example, having a moon’s life-size replica revolve around the earth’s life-size replica as your set of rules is a bit inconvenient from the point of view of making predictions.

What, then, is the “correct” way of writing the rules? I want to claim that the answer to this question can be found by understanding computation and, specifically, the area of computational complexity. But I will not make this article any longer.

## Euclidean Minimum Weight Matchings

The exact complexity of computing the minimum weight perfect bipartite matching in the Euclidean case is an open problem in computational geometry. This problem fits into the common theme of taking standard optimization problems on general weighted graphs and giving them a geometric flavor by forcing all the edge-weights to be Euclidean distances. Doing this often makes the problem easier to solve than the problem on general weighted graphs. Examples include minimum spanning tree (it’s open whether the Euclidean version can be done in linear time or not; the general version is known to take at least time) and the travelling salesman problem (the general version is hard to approximate, but the Euclidean case has a PTAS).

More formally, consider two sets $A$ and $B$ of $n$ points each in the two-dimensional plane. This defines a complete weighted bipartite graph where we create a node for each point in $A \cup B$ and an edge $(a, b)$ for all $a \in A$ and $b \in B$. To each edge $(a, b)$, we assign a weight equal to the Euclidean distance between $a$ and $b$. The question, then, is to compute the minimum weight perfect matching in this graph in subquadratic time. Currently, the best known algorithm takes time where hides logarithmic factors. If we are willing to give up some accuracy, it is possible to reach almost linear time: there exists a near linear time algorithm that finds a $(1+\varepsilon)$-factor approximation for any $\varepsilon > 0$. Getting a subquadratic approximation algorithm is a good sign, because approximation algorithms can often be made exact by setting $\varepsilon$ appropriately, if we know something about the solution space. For example, if we know that the weight of every perfect matching is an integer in the range , we can get an exact solution by setting $\varepsilon$ to be something slightly smaller than . Of course, this approach has obvious caveats, including a) that we do not know anything about the set of possible weights achievable by the perfect matchings and b) that setting $1/\varepsilon$ to be a polynomial in $n$ will blow up the running time.
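To make the problem statement concrete, here is a deliberately naive Python sketch. It computes the exact minimum weight perfect matching between two equal-sized point sets by trying all $n!$ pairings. It is not any of the algorithms discussed here and is only usable for very small $n$. The function name and the tuple-of-floats point representation are assumptions.

```python
import itertools
import math

Point = tuple[float, float]


def min_weight_perfect_matching(
    A: list[Point], B: list[Point]
) -> tuple[float, tuple[int, ...]]:
    """Exact but exponential: try every perfect matching A[i] <-> B[perm[i]]."""
    if len(A) != len(B):
        raise ValueError("A and B must contain the same number of points")
    best_weight, best_perm = math.inf, tuple(range(len(B)))
    for perm in itertools.permutations(range(len(B))):
        # The weight of a perfect matching is the sum of its Euclidean edge lengths.
        weight = sum(math.dist(A[i], B[j]) for i, j in enumerate(perm))
        if weight < best_weight:
            best_weight, best_perm = weight, perm
    return best_weight, best_perm
```

For A = [(0, 0), (2, 0)] and B = [(2, 1), (0, 1)], this returns weight 2.0 with perm (1, 0), i.e., each point of A is matched to the point directly above it. The crossing matching would cost $2\sqrt{5}$.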

An interesting special case is when all the points are promised to belong to a integer grid. In this case an *additive* approximation algorithm is known that runs in time, being a small positive constant. Here the hides logarithmic factors in and and polynomial factors in . From now on, we will also hide the in the .

Being on an integer grid has some advantages. For example, the weight of a perfect matching, then, is the sum of square roots of integers, each in the range . Sums of square roots of integers are, for many reasons, very interesting for the algorithms community and thus have been studied extensively. It is known, for example, that for any two sets of integers each, the difference between the sum of square roots of the integers in one set and the sum of square roots of the integers in the other set is lower bounded by where is polynomial in but doubly exponential in . That doesn’t quite help us yet, because setting to be something doubly exponential in is horrible for the running time.
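With integer coordinates, each edge weight is $\sqrt{(x_a - x_b)^2 + (y_a - y_b)^2}$, which has an integer under the square root. That is where the sums of square roots come from. The Python sketch below uses the standard `decimal` module to compare two such sums at a chosen number of significant digits. It is plain high-precision arithmetic, not the separation-bound argument mentioned above, and the function name is hypothetical.

```python
from decimal import Decimal, localcontext


def sqrt_sum_difference(xs: list[int], ys: list[int], digits: int = 60) -> Decimal:
    """Return sum(sqrt(x) for x in xs) - sum(sqrt(y) for y in ys) at `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits  # every sqrt and addition below is rounded to this precision
        left = sum((Decimal(x).sqrt() for x in xs), Decimal(0))
        right = sum((Decimal(y).sqrt() for y in ys), Decimal(0))
        return left - right
```

For example, [2, 8] versus [18] gives a difference that is only rounding noise, since $\sqrt{2} + \sqrt{8} = 3\sqrt{2} = \sqrt{18}$. In contrast, [10] versus [9] gives about 0.16228. When two different sums are genuinely close, the number of digits needed to tell them apart is exactly what the lower bound above controls.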

In a recent paper by R. Sharathkumar, this problem was circumvented with a clever trick, and a time exact algorithm was shown for the case when points lie on a integer grid. The algorithm is really neat and works by combining a few ideas in the right way. One black box it uses is the following fact: if, instead of a complete bipartite graph in the two-dimensional plane, you are given a planar graph, then the minimum weight perfect matching can be found using planar separators in time. Thus his main idea is to extract, from the complete bipartite graph, a subset of edges such that a) the subset is planar and b) it contains the minimum weight perfect matching of the complete bipartite graph. He shows that such a subset can be found in time. To do this, he builds on the additive approximation algorithm and uses the fact that sums of square roots of two sets of integers cannot be arbitrarily close to each other.

## Perfect matchings with a high stabbing number

Once upon a time, an idiosyncratic king set up a peculiar system for settling marriages in his kingdom. Once every year, he would invite all the couples that wished to tie the knot to a grand ceremony. Upon arrival, the couples would be taken to a large open area with chairs fixed to the ground and spread all around, and asked to take their seats. The arrangements were made so that there were neither too many nor too few chairs. Thus each individual would get exactly one chair and no more.

Finally, the king himself would arrive, examine the seated guests, draw one long line on the ground and stand on one of its two sides. That’s when the marriages would be decided. Couples where both partners were seated on the side of the line the king stood on would get married and couples separated by the holy line would be forbidden from seeing each other ever again.

The king wanted to slow down the recent exponential growth of the population in his kingdom, so he wanted as few couples as possible to get married. Since he had complete knowledge of who wanted to marry whom, he could, in principle, devise an evil arrangement of chairs and draw one really mean line that would separate most of the couples at the ceremony. On the other hand, the couples were allowed to collude with each other after seeing the arrangement of chairs and decide who got to sit where. Could they, perhaps, formulate a clever strategy that would let most of them be on the same side as the king, no matter what line he chose to draw?

Year after year passed by and the king, drawing upon the wisdom of the entire royal ministry, managed to hoodwink his people and successfully stalled most of the romance in his kingdom. The lack of expertise in computational geometry among the general public proved to be detrimental to them. The grand ceremony, having the flavor of a gripping puzzle, got the king addicted and very soon, by developing progressively sophisticated and elaborate strategies, he unknowingly brought his own kingdom to what could be described as extinction.

Centuries later, in the year 1989, two researchers trying to design an efficient data structure for range searching queries on a point set proved an interesting theorem. They weren’t aware that the theorem held the key to a centuries-old conundrum, one whose solution could have saved an entire kingdom from extinction. What they proved essentially amounted to this:

*“No matter what the arrangement of chairs, the couples can always collude with each other and compute an assignment of chairs to each individual, so that no matter what line the king draws and no matter what side he stands on, at least a polynomial number of them get married.”*
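To make the rules of the ceremony concrete, here is a small Python sketch. Given a seating of the couples, the king’s line (through two points p and q) and his side, it counts how many couples get married. The names are hypothetical. Chairs lying exactly on the line are counted as being on neither side, an assumption the story leaves open.

```python
Point = tuple[float, float]


def orientation(p: Point, q: Point, r: Point) -> float:
    """Positive if r is left of the directed line p -> q, negative if right, 0 if on it."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def married_couples(
    couples: list[tuple[Point, Point]], p: Point, q: Point, side: int
) -> int:
    """Count couples whose two chairs both lie strictly on the king's side of line pq.

    side = +1: the king stands to the left of p -> q; side = -1: to the right.
    Chairs exactly on the line count as being on neither side (an assumption).
    """
    count = 0
    for a, b in couples:
        if side * orientation(p, q, a) > 0 and side * orientation(p, q, b) > 0:
            count += 1
    return count
```

Consider the horizontal line through (-1, 0.5) and (1, 0.5), with the couples ((0, 0), (1, 0)), ((0, 1), (0, -1)) and ((0, 2), (1, 3)). Exactly one couple gets married whichever side the king picks, and the middle couple is always separated.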

In fact, they proved something even stronger. Their theorem does not so much depend on the fact that the shape the king draws is a *line*. Other geometric shapes, such as circles, rectangles, squares, triangles, can all be plugged into the theorem in place of “line” and the statement will still hold true.

As long as the shape satisfies the property that its *dual shatter function* is polynomial, the theorem works. The dual shatter function of a shape, evaluated at $m$, is the maximum number of cells one can get in a Venn diagram obtained by drawing $m$ copies of that shape. For example, for the case of halfplanes (i.e., a line and one of its sides), one can easily show by induction on the number of lines that the dual shatter function is polynomial. Notice that when a halfplane is added to a partially built Venn diagram of halfplanes, the number of new cells created is equal to the number of cells that the new halfplane’s boundary passes through. Since the new line can cross each old line at most once, the number of cells it passes through is at most one more than the number of lines already present. Thus the dual shatter function is $O(m^2)$. Similarly, for any shape whose boundaries pairwise intersect in at most a constant number of places, the dual shatter function is bounded from above by a polynomial.
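The halfplane induction can be written out as a short count. Write $\pi^*(m)$ for the dual shatter function of halfplanes evaluated at $m$ (notation assumed here). The $k$-th line crosses at most $k-1$ earlier lines, so it is cut into at most $k$ pieces, and each piece splits one existing cell into two:

$$\pi^*(m) \le 1 + \sum_{k=1}^{m} k = 1 + \frac{m(m+1)}{2} = O(m^2).$$

This count is exact when the lines are in general position.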

Actually, the theorem does not just hold in the geometric setting. It holds for general set systems. Thus if the ceremony were organized in interstellar space with chairs occupying coordinates in three dimensions, or in some bizarre abstract space, a polynomial number of marriages could be saved as long as the shape chosen by the king had a polynomial dual shatter function.

(Bonus points if you can correctly guess the definition of stabbing number without looking it up.)

## Another theorem of Turán

A graph with $n$ isolated vertices has a maximum independent set of size $n$, and a complete graph has a maximum independent set of size 1. As you increase the number of edges, you should expect smaller and smaller maximum independent sets.

This intuition is quantified by a theorem of Turán, which says that a graph with $n$ vertices and $m$ edges has a maximum independent set of size at least $\frac{n^2}{n + 2m}$.

In particular, graphs with a linear number of edges, such as planar graphs or graphs whose maximum degree is bounded by a constant, are guaranteed to have an independent set of linear size.

Note that the theorem only says that a small number of edges guarantees a large independent set. The converse is not true, i.e., a large independent set does not imply a small number of edges. Example: the complete bipartite graph $K_{n/2,n/2}$ has $n^2/4$ edges *and* an independent set of size $n/2$.

Also, the theorem is constructive, so you can actually *find* an independent set of at least this size in polynomial time.
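One standard constructive proof uses a greedy rule. Repeatedly pick a vertex of minimum degree in what is left of the graph, add it to the independent set, and delete it together with its neighbors. This rule is known to return at least $n^2/(n+2m)$ vertices. The post doesn’t say which construction it has in mind, so treat the Python sketch below as one possible way to do it. The function name and the adjacency-dict representation are assumptions.

```python
def greedy_independent_set(adj: dict[int, set[int]]) -> set[int]:
    """Repeatedly take a vertex of minimum remaining degree, then delete it and its neighbors.

    adj maps each vertex to the set of its neighbors (an undirected graph).
    """
    remaining = set(adj)
    independent: set[int] = set()
    while remaining:
        # Degree is measured in the subgraph induced by the vertices still remaining.
        v = min(remaining, key=lambda u: len(adj[u] & remaining))
        independent.add(v)
        # v's neighbors can no longer join the set, so remove them along with v.
        remaining -= adj[v] | {v}
    return independent
```

On a graph with no edges this returns every vertex, and on a complete graph it returns a single vertex, matching the two extremes described at the start of this section.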

## Being smart about distributing electricity

It turns out that the conventional way of distributing electricity is all wrong. I am talking about the kind of electricity distribution that governments do, from the power plant to the consumers.

One of the main issues is that all the resources, the cables, the transformers, the hubs and so on, are built to support the peak load. But the peak load is rarely reached. In 2009, for example, 15% of the generation capacity in Massachusetts was used for less than 88 hours per year. 88 hours per year! Out of the 8760 hours in a year, that is about 1% of the time. Obviously, we are doing a lot of work that’s not needed.

However, we can’t really just cut down on the resources because if we do, those 88 hours of peak load will just blow everything up and we don’t want that to happen either.

Thus people have come up with an ingenious idea: control the electricity provided to the consumers so that they do not all draw a large amount at the same time, thus reducing the peak load. This is done by a central hub that studies the usage patterns of different houses in the locality and schedules electricity to them accordingly. The hub can also ask the home owners to provide additional data. For example, people are usually flexible about exactly when they use power-hungry electric devices. So the hub could ask the home owners to send a list of devices they want to use on a given day and the flexibility they are willing to accept. Next, the hub can decide the amount of electricity to provide to each house at a given time, the aim being to make sure that not many of the houses run heavy-load devices at the same time.
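As a toy illustration of the hub’s job, here is a hypothetical Python sketch. It assumes every device runs for exactly one hour at a fixed load, and each home owner gives an earliest and a latest acceptable hour. It uses a simple greedy heuristic: place the heaviest devices first, each in the currently least-loaded hour of its window. This heuristic is not optimal in general and is not what any real utility is claimed to use.

```python
def schedule_devices(
    requests: list[tuple[float, int, int]], hours: int = 24
) -> tuple[list[int], float]:
    """Greedy sketch: put each device's one-hour run in the least-loaded allowed hour.

    Each request is (load_kw, earliest_hour, latest_hour), with hours in [0, hours).
    Returns the chosen hour for every request (in input order) and the peak load.
    """
    load = [0.0] * hours
    chosen = [0] * len(requests)
    # Handle the heaviest devices first, since they are the hardest to fit.
    for idx in sorted(range(len(requests)), key=lambda i: -requests[i][0]):
        kw, earliest, latest = requests[idx]
        hour = min(range(earliest, latest + 1), key=lambda h: load[h])
        load[hour] += kw
        chosen[idx] = hour
    return chosen, max(load)
```

With two 3 kW devices that may run at hour 18 or 19 and a 1 kW device fixed at hour 18, the sketch spreads them out for a peak of 4 kW instead of the 7 kW you would get if everything ran at hour 18.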

Many other things can be done. Anything that can potentially bring down the peak load by 1-2% will save the governments a lot of money.

## Opinions and How They Change

An individual forms opinions based on how he himself assimilates the facts around him and also based on the opinions held by his friends and other people he interacts with. This makes things complicated and intriguing enough that it has been an active area of research for decades.

One question is, can we formulate simple models that match the data we get from real-life experiments? If we could, then we would get some insight into human behavior and a tool for making useful predictions.

The simplest model that has been studied is this:

An opinion is just a real number. Each person starts with an initial opinion. Next, in each time step, he looks at the opinions held by his friends and updates his own opinion to the average of his old opinion and the opinions of his friends. It doesn’t have to be a simple average. A person may trust different friends to different degrees, and thus he might want to take a weighted average instead. However, the model doesn’t allow individuals to change the weights at any step: the weights chosen at the beginning stay the same forever.

Using simple linear algebra tricks and borrowing known results from the Markov chain literature, it can be shown that this kind of system converges to an equilibrium in most natural cases. One example is when the friendship graph is connected and everyone puts some positive weight on his own opinion. An equilibrium here means a set of opinions for which the averaging step doesn’t lead to any change, i.e., for all individuals, the new opinion remains the same as the old one. In fact, in such cases it can be proved that the equilibrium reached is a consensus, i.e., every individual has the same opinion.
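A minimal Python sketch of the repeated weighted averaging follows. It assumes a fixed row-stochastic trust matrix, where row i holds person i’s weights and sums to 1, including a positive weight on himself. The function name is hypothetical.

```python
def degroot(weights: list[list[float]], opinions: list[float], steps: int = 200) -> list[float]:
    """Repeatedly replace each opinion by a weighted average of the current opinions.

    weights[i][j] is how much person i trusts person j (including j == i);
    every row must sum to 1. The weights never change between steps.
    """
    x = list(opinions)
    for _ in range(steps):
        # Everyone updates simultaneously, using last step's opinions.
        x = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in weights]
    return x
```

Take three people on a line with weights [[0.5, 0.5, 0], [0.25, 0.5, 0.25], [0, 0.5, 0.5]] and initial opinions [0, 1, 4]. All three opinions converge to 1.5. That is the average weighted by $(1/4, 1/2, 1/4)$, the stationary distribution of this matrix, which the averaging step leaves unchanged.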

An objection to this model that one might have is the simple representation of an opinion. Can it really be represented by just a single real number?

Anyway, DeGroot, who introduced this model, also showed that the same thing happens if the opinions are drawn from any vector space and in each time step a person updates his opinion to some convex combination of the opinions of his friends (including himself).

That’s something.

The only issue is that in real life, people don’t reach consensus. So what’s going on?

Of course, the model seems too simple to resemble real life accurately. For one, the weights (or trust) we assign to people change over time depending on various factors. For example, if a person seems to be changing his mind every minute, we will probably assign less weight to his opinion.

Also, even though this process of repeated averaging has been shown to converge to a consensus, we don’t really know how long it takes to get there. From what I gathered by quickly glancing through Bernard Chazelle’s new work on bird flocking, the time a community the size of a country takes to reach a consensus is probably way more than the age of the universe.

Anyway. Friedkin and Johnsen modified this model a bit to make it more realistic. In their model, an individual has a fixed internal opinion that doesn’t change with time. During an averaging step, he takes a weighted average of the opinions of his friends (including himself) and the fixed internal opinion. Because the internal opinions can differ from person to person, this system will not, in general, reach a consensus.

The system does have an equilibrium though and Friedkin and Johnsen proved that the equilibrium is almost always reached.
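Here is a minimal Python sketch of one common way to write the Friedkin and Johnsen update. It assumes the internal opinion $s_i$ always carries weight 1 and friend $j$ carries weight $w_{ij}$. For simplicity it omits the weight a person places on his own current opinion, which the description above allows. The names are hypothetical.

```python
def friedkin_johnsen(
    internal: list[float],
    weights: dict[int, dict[int, float]],
    steps: int = 500,
) -> list[float]:
    """Iterate z_i <- (s_i + sum_j w_ij z_j) / (1 + sum_j w_ij), starting from z = s.

    internal[i] is the fixed internal opinion s_i; weights[i][j] is the weight
    person i puts on friend j. The internal opinion always carries weight 1.
    """
    z = list(internal)
    for _ in range(steps):
        z = [
            (internal[i] + sum(w * z[j] for j, w in weights.get(i, {}).items()))
            / (1 + sum(weights.get(i, {}).values()))
            for i in range(len(internal))
        ]
    return z
```

Two friends with internal opinions 0 and 1 and mutual weight 1 settle at 1/3 and 2/3. They reach an equilibrium, but not a consensus.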

However, their model is different from DeGroot’s simpler model in a fundamental way. Let me explain.

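Given numbers $x_1, \dots, x_k$, the mean is the number $z$ that minimizes the function $\sum_{j=1}^{k} (z - x_j)^2$. Thus the averaging step above can be seen as a step where person $i$ is trying to minimize the cost incurred with respect to the cost function $c_i(z) = (z_i - s_i)^2 + \sum_{j \in N(i)} w_{ij} (z_i - z_j)^2$. Here $s_i$ is the internal opinion of $i$, $w_{ij}$ is the weight $i$ assigns to friend $j$, and $N(i)$ represents the neighborhood of $i$, i.e., the set of friends of $i$. (In DeGroot’s model, which has no internal opinions, the first term is simply absent.)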
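Setting derivatives to zero makes this precise. The first line shows that the mean minimizes the sum of squared differences. The second shows that minimizing $c_i$ over $z_i$, with everyone else’s opinion held fixed, gives exactly the weighted averaging step:

$$\frac{d}{dz}\sum_{j=1}^{k}(z - x_j)^2 = 2\sum_{j=1}^{k}(z - x_j) = 0 \;\Longrightarrow\; z = \frac{1}{k}\sum_{j=1}^{k} x_j,$$

$$\frac{\partial}{\partial z_i}\Big[(z_i - s_i)^2 + \sum_{j \in N(i)} w_{ij}(z_i - z_j)^2\Big] = 0 \;\Longrightarrow\; z_i = \frac{s_i + \sum_{j \in N(i)} w_{ij} z_j}{1 + \sum_{j \in N(i)} w_{ij}}.$$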

With the above definition of cost, we can measure the quality of a certain opinion vector. For example, we can say that the sum of costs incurred by each person is the social cost of the whole group. And then given an opinion vector, we can decide how good it is by measuring how far it is from the opinion vector that minimizes the social cost. In particular, we can measure the quality of the opinion vector that the group converges to in equilibrium.

The fundamental difference between DeGroot’s model and Friedkin and Johnsen’s model is this: in DeGroot’s model the equilibrium reached also minimizes the total social cost, but in Friedkin and Johnsen’s model it does not necessarily do so.

David Bindel, Jon Kleinberg and Sigal Oren prove in their FOCS ’11 paper that the situation is not that bad. Even though the equilibrium may not minimize the total social cost, when friendships are symmetric its social cost is worse than the optimum by at most a factor of 9/8. That’s pretty cool.
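To get a feel for the 9/8 bound, here is a small numeric Python sketch. It assumes symmetric weights with a zero diagonal and the cost $c_i(z) = (z_i - s_i)^2 + \sum_j w_{ij}(z_i - z_j)^2$. It relies on two facts that follow from setting gradients to zero. First, the equilibrium solves $(I + L)z = s$. Second, the social optimum solves $(I + 2L)z = s$. Here $L$ is the weighted graph Laplacian. The function names are hypothetical, and the sketch assumes the internal opinions are not all equal, since otherwise the optimal cost is zero.

```python
Matrix = list[list[float]]


def solve(a: Matrix, b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def social_cost(z: list[float], s: list[float], w: Matrix) -> float:
    """Sum over people i of (z_i - s_i)^2 + sum_j w_ij (z_i - z_j)^2."""
    n = len(s)
    return sum(
        (z[i] - s[i]) ** 2 + sum(w[i][j] * (z[i] - z[j]) ** 2 for j in range(n))
        for i in range(n)
    )


def equilibrium_to_optimum_ratio(s: list[float], w: Matrix) -> float:
    """Social cost at the Friedkin-Johnsen equilibrium divided by the optimal social cost."""
    n = len(s)
    lap = [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # Equilibrium: each person's own first-order condition gives (I + L) z = s.
    z_eq = solve([[eye[i][j] + lap[i][j] for j in range(n)] for i in range(n)], s)
    # Optimum: the gradient of the total cost gives (I + 2L) z = s.
    z_opt = solve([[eye[i][j] + 2 * lap[i][j] for j in range(n)] for i in range(n)], s)
    return social_cost(z_eq, s, w) / social_cost(z_opt, s, w)
```

For two friends with internal opinions 0 and 1 and mutual weight 1, the equilibrium (1/3, 2/3) costs 4/9 while the optimum (0.4, 0.6) costs 0.4, a ratio of 10/9. That is below 9/8, as it must be. This tiny example is not the worst case.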

## Building the perfect world in four extremely difficult steps

(I had written this for Goodblogs a long time ago.)

A perfect world would be one where everyone just did whatever they wanted to and lived happily doing that.

If tomorrow everyone just decided to do whatever they wanted to, the world would turn into chaos. For example, no one would want to clean up the garbage, and as a result, we would eventually rot in our own filth. To keep the world working, it is necessary that at least some people do things that they don’t particularly like doing. Can we repair this? In general, can we design a perfect world, a world where you never have to feel guilty, a world where you can just do whatever you feel like at any given moment and that will be the best thing to do for you and for society? If yes, then what are the steps we need to take?

To understand this, we first need to understand what the best thing for society is. There are things that are good for some people and bad for others. There are also things that are good for society right now but, in the long run, will lead to the decay of mankind. Let’s, for the time being, define ‘best’ as the thing that has the best average over people and over time. That is, we take the average happiness level of the world at a given time and then average this over time. The best thing, then, would be the thing that maximizes this average.

Once we have this out of the way, we can see that the essential problem is to align what an individual feels like doing at a given moment with what’s best for society at that moment. Since our definition of ‘best’ depends on the happiness levels of people, there are two extreme approaches to solving this problem. One extreme is to reprogramme the human brain so that it feels happy or sad in a more controlled way. This extreme is somewhat trivial. All we need to do is build the perfect mood-enhancing drug and make it compulsory for everyone to take it. This will suddenly boost the total happiness level of the world. The other extreme is to leave the human brain untouched and re-engineer the world in such a way that whatever we want at this moment is made possible immediately. This is perhaps impossible. It’s easy to imagine an individual getting so angry at another person that he genuinely wants to kill him, or harm him severely in some other way. If this were then made possible for him immediately, it would cause more grief to the person being harmed than happiness to the person inflicting the harm. It’s similarly easy to imagine completely outrageous, or even physically impossible, wishes that a person might make. One might have to break some laws of physics to make those possible immediately. Since this approach seems impossible and the other extreme is kind of sad, the ideal should lie somewhere between the two extremes.

One possible midway approach is the following.

Step 1 – Build robots to do all the dirty work that no one in the world wants to do but that needs to be done. This will leave only the kinds of work that at least some people in the world like doing. Let them do that work. But there might still be problems. The person who likes doing the work X might be living in Japan, and the place where X needs to be done might be in Canada. Moreover, a person who likes doing X may not want to do X all the time. When X needs to be done in Canada, he might be in the mood to go swimming with his kids.

Step 2 – Build a global work organizer. This will be some huge global device that relies on the internet. It will monitor what each person in the world is in the mood to do right now and the things that need to be done at this moment in different parts of the world. Then, it will match the tasks to suitable people. Since the world is such a large place, we can assume two things: that what a random person wants to do at a given moment is useful for someone somewhere in the world, and that whatever is useful for a given person is something that someone somewhere in the world wants to do. If there is some task where this doesn’t happen, we already have step 1 to take care of it. Such things are already being done. There are several outsourcing services online for tasks that do not require physical presence. For example, Mechanical Turk and oDesk are websites designed exactly for this purpose. Even GoodBlogs is similar. The previous post I wrote was originally intended to be an email to my mother. But somehow, somewhere in the world, there was a group of people who agreed to give me $20 for it. However, building this global work organizer will still not build the perfect world. For that, we will need the next step.

Step 3 – Upgrade the human mind so that its emotions are under control. For example, no one should ever feel like seriously inflicting any harm on anyone else. Beyond that, one will also need to make sure not to inflict any harm on one’s future self. Even if everyone is full of kindness and love towards others, one still might want to smoke a cigarette and thus suffer from lung cancer later in life. Or one may simultaneously want to learn how to play the piano *and* not practise, or to get a girl *and* not develop the skills to woo women, and so on.

Step 4 – Make learning easy. Even if we are the masters of our own emotions, have robot servants and are never forced to do something that we don’t want to do, there might be situations where we want to learn something quickly but can’t. For example, one may be craving good food but not know how to cook. Then it would be good to have a plugin that one can install in one’s mind to acquire that skill in a few seconds.