Wednesday, June 17, 2009

Wednesday Math, Vol. 76: Minimal spanning trees

The word "graph" has two meanings in math. The most common usage is any pictoral representation of numerical stuff, like pie charts and histograms, or a line drawing that corresponds to an equation, like a line or a parabola, or a jagged line that tracks some variable over time, like a stock market plot.

When mathematicians talk about "graph theory", it means dots connected by lines as in the pictures below, though technically the dots are called vertices and the lines are called edges. Earlier Wednesday Math posts have dealt with graph theory, most notably the Road Coloring Problem, the way to check if a structure is rigid, and the Traveling Salesman Problem, one of the difficult problems in the category known as NP-complete.

One type of graph is called a weighted graph, where numbers are associated with the edges. Usually the number are positive, but not always, and they can represent distances or costs or capacities or any of a number other concepts, so the word "weight" is used instead of "length". For example, if the numbers in this graph were lengths, these shouldn't be straight lines. Notice the 5, 2 and 1 weights, marked in boxes at the top of the graph. If I wanted to travel from the middle top point to the point just to the right, going down first and then up would be a short cut, 2+1=3 instead of the "straight line" distance of 5, and from last week's post about the triangle inequality, we aren't supposed to have short cuts.

Let's assume instead that the numbers represent costs in this case. Our challenge is to find a subset of edges that connect every vertex to every other vertex for the lowest cost. This problem is called the minimal spanning tree.

Like many problems in math, there is more than one correct way to solve this problem and there may be more than one correct answer. The two most famous methods are called Prim's Algorithm and Kruskal's Algorithm, both named for guys who worked at Bell Labs back in the 1950's. Both are called "greedy algorithms", where the way to get the best answer for the final problem is to take the best option at every step of the method.

One of the minimal spanning trees for this graph is shown in red lines. Note that some cheap edges aren't used, like many of the edges with cost equal to 4, while the edge with cost equal to 7, marked in a red circle in the lower right hand has to be used, because that is the cheapest way to connect the lower right hand vertex to the rest of the graph. The only other choice is the edge with cost 8.

In this graph, there are two different sets of edges that can do the same job of connecting everything for the same cost. Since the cost is "minimal", that is the part of the solution that cannot change, but the two edges with cost 3 marked in boxes are interchangable. If we remove the edge at the top and replace it with the edge in the middle, we still have total connectivity at the lowest possible price.

Finding the number of different minimal spanning trees is Matty Boy's personal contribution to higher mathematics. I came up with a method for doing this, though it isn't the only one. As of a few years ago, my method was being taught at Stanford by Persi Diaconis, a well-known name in statistics and math, who I met at a few local math conferences.

Finding alternate minimal spanning trees can be useful in a really big graph, like the connections between computer nodes that make up the internet. If different users are making use of different minimal spanning trees at the same time, this can cut down on traffic congestion and make the internet faster for everybody while keeping costs low. Because these are greedy algorithms, the problem of finding minimal spanning trees can be solved very quickly on a fast computer, even for a very big instance of the problem, like a graph the size of the entire internet.


dguzman said...

So is your method named after you? Please say yes, so I can brag about you (more) to my geek friends.

Matty Boy said...

No, my method is not named after me. None of the mathy things I have done is named after me. Somebody else has to step in and decide a thing is Hubbard's Algortihm or the Hubbard Confidence of Victory Method or what have you. We are discouraged from naming things after ourselves.

That's how mathematicians roll.

ken said...

And where can I read about your not-yet-eponymous algorithm?

Matty Boy said...

Hey, Ken. I could send you the paper if you are interested.

ken said...