Wednesday, March 17, 2010

Wednesday Math, Vol. 113: MergeSort

Last week, the topic was sorting lists quickly, a problem one runs into regularly in the field of computer science, and I showed the first pass of the algorithm QuickSort, which uses a general method known as divide and conquer. This week, I'll show all the steps of the MergeSort method, which also works on average at the rate of nlogn operations, much faster than the brute force methods that work at ½n² operations. A standard sorting method that works at the slower speed is to search the list to find the smallest thing and put it at the front of the list, then find the second smallest and put it in position #2, etc. until completed.

Let me give an example of MergeSort from my real life. I have two sets of graded papers from the same class and both sets are alphabetized. If I want to merge the two piles into one, I don't have to start all over again, I can use the work that's already been down alphabetizing them separately. The first paper on the merged list has to be either the top paper from list #1 or the top from list #2. I take that paper and put it on a new stack face down. Now I make the comparison between the new two papers on top of the stacks and put it face down on the new stack, again and again, until all the papers are on the new stack face down. Once one of the original stacks is depleted, there are no more comparisons to be made and I can put the rest of the remaining stack face down on top of the merged stack, turn it back face up and it is completely sorted.

If we start with a list that hasn't been sorted at all, our first step is to split it into sublists that are sorted from lowest to highest. I underlined in red those lists, and the algorithm is to start a new list if the thing on the list is less than the previous entry. It looks like this.

Sublist #1: 31 94
Sublist #2: 69
Sublist #3: 62
Sublist #4: 25
Sublist #5: 9 20
Sublist #6: 13
Sublist #7: 11 37
Sublist #8: 5

We then look for consecutive sublists that have only one thing, lists known as singletons. For example, if we take Sublists #2 through #4 we have 69 62 25. If we switch the order to 25 62 69, this is a new Sublist in ascending order 25 62 69. We now have six sublists instead of eight.

Sub list #1: 31 94
Sub list #2: 25 62 69
Sub list #3: 9 20
Sub list #4: 13
Sub list #5: 11 37
Sub list #6: 5

The algorithm now is to merge the adjacent lists, #1 with #2, #3 with #4, #5 with #6, etc. So after this pass, we should have exactly half as many lists as we did before if we had an even number of lists, and one more than half if there were an odd number of lists. We do that again with the new lists, effectively cutting the number of lists in half every time, which means we should be down to one list in relatively few passes. For example, let's say we had 10,000 things on our list and on our first pass we had 3,257 ascending sublists, which means on average we have about three things on a list before a small thing comes after a larger thing. Even with this very unordered mess, it should only take 12 passes before we are down to a single list, and every time we go through a single pass we make slightly less than 10,000 comparisons.

After pass 1: 1,629 sublists, about 10,000 total comparisons.
After pass 2: 815 sublists, about 20,000 total comparisons.
After pass 3: 408 sublists, about 30,000 total comparisons.
After pass 4: 204 sublists, about 40,000 total comparisons.
After pass 5: 102 sublists, about 50,000 total comparisons.
After pass 6: 51 sublists, about 60,000 total comparisons.
After pass 7: 26 sublists, about 70,000 total comparisons.
After pass 8: 13 sublists, about 80,000 total comparisons.
After pass 9: 7 sublists, about 90,000 total comparisons.
After pass 10: 4 sublists, about 100,000 total comparisons.
After pass 11: 2 sublists, about 110,000 total comparisons.
After pass 12: 1 list, about 120,000 total comparisons.

To be fair, there are the 10,000 comparisons we do the first time through on Pass 0, when we count the number of sublists, so this algorithm in this made up instance needs to go through about 130,000 comparisons. That seems like a lot to a human, but a computer will do that work in a blink of an eye, and it's a lot more elegant than the brute force method finding the smallest thing, then the next and the next, etc. Doing it that way will take fifty million comparisons, much slower than the few hundred thousand we need when doing MergeSort.

Yay, computer science!

No comments: