Talk:Polyphase merge sort
"Ordinary merge sort" section
It's a good technique to describe an improved process by first describing the prior art and then explaining the improvements. In this case, the prior art - the "Ordinary merge sort" section - seems to describe something very old and very arcane. There is already an article about merge sort: 1) why not just link to it? 2) this description does not match the one there; 3) does anyone anywhere on earth still use tape drives? Maybe it's worth keeping, but with wording more like "Back in the old days, when tape drives were common, one could watch the reels' movements and observe that..." Friendly Person (talk) 23:16, 5 May 2011 (UTC)
- "Someone watching the tapes spin would wonder if the idle drives could be more useful." - Polyphase doesn't solve this issue; the merge process transfer rate is still limited by the transfer rate of the output drive, regardless of the number of input drives. What polyphase accomplishes is a reduction in the amount of data sorted during intermediate passes. In the three-drive case, it leaves some of the data on one of the input drives, to be merged on a later pass, in order to balance the load between the drives. Rcgldr (talk) 16:32, 13 January 2012 (UTC)
- Disagree. With N drives, merge sort does only an N/2-way merge, but polyphase merge does an (N-1)-way merge. PPM makes the drives more useful -- it reduces the number of passes. Compare, roughly, O(log_{N/2}(n)) merge passes to O(log_{N-1}(n)) (see the sketch after this comment).
- PPM is used today, but one doesn't see disk drives move...
- And this article needs a lot more work, too.
- Glrx (talk) 17:50, 13 January 2012 (UTC)
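- A quick sketch of that rough comparison (Python; the run and drive counts are arbitrary, and it treats every merge level as a full pass, which overstates polyphase cost - see the phase-vs-pass point further down):

```python
import math

def full_passes(runs, order):
    # passes needed when every pass merges groups of `order` runs into one
    return math.ceil(math.log(runs, order))

n, drives = 10_000, 6
print(full_passes(n, drives // 2))   # balanced 3-way merge: 9 passes
print(full_passes(n, drives - 1))    # 5-way, as polyphase approximates: 6 passes
```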
- My comment was about the "idle drive" statement. The "idle drive" percentage remains the same; polyphase just shifts the idle time from output drives to input drives. The real savings are the reduced number of passes and the reduced amount of data transferred in each pass. I forgot to mention the reduced number of passes. Rcgldr (talk) 01:32, 14 January 2012 (UTC)
- Polyphase isn't quite an (N-1)-way merge: except on the first merge pass, the algorithm merges runs from the last pass with runs from two passes before, so merged runs grow by a factor of about 1.618 (the Fibonacci ratio) instead of 2. The article mentions that it takes 5 passes to sort 13 runs on a 3-drive system with the runs already distributed on 2 of the drives. I'm assuming it took 1 pass to create 13 runs on one drive, then 1 pass to distribute the runs onto the two other drives, for a total of 7 passes; but measured in original run sizes, only 76 runs' worth of data is transferred, versus the 104 transferred by a standard merge sort, which would take 8 passes on 3 drives (a sketch of this accounting follows). Rewind overhead for polyphase would be less than for standard merge (unless the standard merge rewound only what were the input drives and then read backwards from EOD on what was the output drive to distribute runs; the next pass would then be a descending merge pass). Rcgldr (talk) 01:47, 14 January 2012 (UTC)
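- A short sketch of that accounting (Python; it assumes the article's 13-run example with the 5 + 8 Fibonacci split, and measures data in units of the original run size):

```python
# tapes hold lists of run sizes, in units of the original run length
tapes = [[1] * 5, [1] * 8, []]       # 13 runs, Fibonacci split 5 + 8
moved = 0
while sum(len(t) for t in tapes) > 1:
    tapes.sort(key=len)              # tapes[0] is the empty output tape
    n = min(len(tapes[1]), len(tapes[2]))
    merged = [tapes[1][i] + tapes[2][i] for i in range(n)]
    moved += sum(merged)             # every merged record is written once
    tapes[0] = merged
    tapes[1], tapes[2] = tapes[1][n:], tapes[2][n:]
print(moved)   # 50; plus 13 to create runs and 13 to distribute = 76
```

The five merge phases move 10 + 9 + 10 + 8 + 13 = 50 run-units, and the two preparatory passes account for the remaining 26.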
- I'm not sure of the savings in a 4-drive situation. I assume polyphase consumes one extra pass to distribute runs; merged run size grows by a ratio of about 1.8393 (the 3-term Fibonacci ratio) versus 2 for a 2-way merge, even though it's a 3-way merge versus a 2-way merge. Polyphase's passes involve less data per pass, though. Including the creation of runs and then splitting up runs onto 3 drives, polyphase takes 6 passes for 17 runs, while a 2-way merge takes 5 passes for 16 runs or 6 passes for 32 runs. In 7 passes polyphase sorts 31 runs, 2-way sorts 64; in 8 passes, polyphase 57, 2-way 128. If the initial pass creates fixed-size runs and the number of records in the file is known, that would eliminate polyphase's run-distribution pass, but it would still be: 6 passes, poly 31 vs. 2-way 32; 7 passes, poly 57 vs. 64; 8 passes, poly 105 vs. 128. Rcgldr (talk) 22:20, 14 January 2012 (UTC)
- In the case of a natural merge sort, sublist size is variable, and I'm not sure how this would affect the efficiency of a polyphase merge sort. Rcgldr (talk) 15:53, 14 January 2012 (UTC)
- Is polyphase a stable sort? Rcgldr (talk) 02:07, 14 January 2012 (UTC)
- Apparently not. Using the 13-run example, tape contents as run numbers after each pass:
- [ |0| |1| |2| |3| |4| ] __________ [ |5| |6| |7| |8| |9| |10| |11| |12| ]
- [ |0,5| |1,6| |2,7| |3,8| |4,9| ] __ [ |10| |11| |12| ]
- [ |0,5,10| |1,6,11| |2,7,12| ] __ [ |3,8| |4,9| ]
- At this point, run |0,5,10| is merged with run |3,8|. Run 3's records can end up interleaved with records from runs 0, 5, or 10, so the original ordering of equal records is lost (a sketch of this step follows). Rcgldr (talk) 04:30, 14 January 2012 (UTC)
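- A minimal demonstration of that step (Python; the identical key value and the take-from-first-input tie-break are assumptions for illustration, not part of any particular implementation):

```python
def merge(a, b):                 # two-way merge, ties taken from the first input
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i][0] <= b[j][0]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

key = 42                         # every record compares equal, so a stable
run_a = [(key, r) for r in (0, 5, 10)]   # sort must keep run-index order
run_b = [(key, r) for r in (3, 8)]
print([r for _, r in merge(run_a, run_b)])  # [0, 5, 10, 3, 8]
```

Run 3's records land after run 5's and run 10's even though run 3 preceded them in the input, so the sort is not stable.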
- Knuth isn't handy. I'd want a reliable source that says PPMS isn't stable. These course notes say it isn't stable, but I'd like a better source. Stability in an external sort is also simple to enforce by augmenting the key with O(n) space. NMS has some fencepost stability issues if space is not burned. Glrx (talk) 20:14, 17 January 2012 (UTC)
"Ordinary merge sort" - tape drives
- I found run counts for a 4-drive example on this web page: esort.html. Expanding on this, I did a comparison of a 4-tape-drive sort for polyphase versus standard merge. The first table shows pass counts; the second shows the total amount of data transferred (in counts of original run size), starting with data already distributed on the input tapes. Both show PPMS is better.
passes |  polyphase (runs per tape)  | # runs | 2-way merge (runs per tape) | # runs
  17   |     0  19513  16377  10609 |  46499 |     0      0  65536  65536  | 131072
  16   | 10609   8904   5768      0 |  25281 | 32768  32768      0      0  |  65536
  15   |  4841   3136      0   5768 |  13745 |     0      0  16384  16384  |  32768
  14   |  1705      0   3136   2632 |   7473 |  8192   8192      0      0  |  16384
  13   |     0   1705   1431    927 |   4063 |     0      0   4096   4096  |   8192
  12   |   927    778    504      0 |   2209 |  2048   2048      0      0  |   4096
  11   |   423    274      0    504 |   1201 |     0      0   1024   1024  |   2048
  10   |   149      0    274    230 |    653 |   512    512      0      0  |   1024
   9   |     0    149    125     81 |    355 |     0      0    256    256  |    512
   8   |    81     68     44      0 |    193 |   128    128      0      0  |    256
   7   |    37     24      0     44 |    105 |     0      0     64     64  |    128
   6   |    13      0     24     20 |     57 |    32     32      0      0  |     64
   5   |     0     13     11      7 |     31 |     0      0     16     16  |     32
   4   |     7      6      4      0 |     17 |     8      8      0      0  |     16
   3   |     3      2      0      4 |      9 |     0      0      4      4  |      8
   2   |     1      0      2      2 |      5 |     2      2      0      0  |      4
   1   |     0      1      1      1 |      3 |     0      0      1      1  |      2
   0   |     1      0      0      0 |      1 |     1      0      0      0  |      1
Initial number of runs versus total amount of data transferred (in counts of original run size); PPMS transfers less:
initial runs |  PPMS |    MS
        7473 | 67376 | 97149
        4063 | 34119 | 48756
         105 |   492 |   735
          57 |   232 |   342
          31 |   107 |   155
Rcgldr (talk) 05:31, 22 January 2012 (UTC)
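- For what it's worth, the polyphase side of the first table can be regenerated with the standard perfect-distribution recurrence (Python sketch; an illustration, not the program used above):

```python
def next_level(d):
    # one more merge level: the fullest tape empties into the others
    d = sorted(d)
    top = d[-1]
    return [top] + [top + x for x in d[:-1]]

d = [1, 0, 0]                        # 4 drives: 3 input tapes, 1 output
for level in range(18):
    print(level, sorted(d, reverse=True), sum(d))
    d = next_level(d)
```

The totals printed are 1, 3, 5, 9, 17, 31, 57, ..., 46499, matching the "# runs" column.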
- Polyphase instability - I included an example above showing the issue for the 13-run case. NMS instability - does this occur only when compares (rather than filemarks or an array of run sizes) are used to determine the ends of runs? Rcgldr (talk) 10:18, 19 January 2012 (UTC)
- "NMS instability - does this occur only when compares (rather than filemarks ...) are used to determine the ends of runs?" Apparently so, according to the course notes linked above. The notes mention using some type of end-of-run indicator, such as a max-value record, to avoid the issue. Rcgldr (talk) 09:00, 22 January 2012 (UTC)
- A PPMS phase does not equal a merge sort pass. A phase writes one tape; a merge sort pass writes two tapes. A phase does not touch all the data; a pass touches all the data. A PPMS run does not equal a merge sort run. A polyphase run is made by successively merging N-1 sources; a merge sort run successively merges N/2 sources.
- WP does not publish WP:OR; editors don't get to voice their judgment or beliefs about topics; instead, editors should report what reliable sources say. Find a reliable source that reports PPMS and MS sorting speeds. Also, if PPMS weren't better than MS, why would there have been any interest in PPMS? Don't speculate about sorting techniques, buffer size choices, I/O rates, disk drive bandwidth, channel capacity, cache performance, scatter/gather, rebuffering, memory mapping, number of files, striping, reading tapes backwards, or various other issues. That's WP:OR. Find a reliable source.
- I'm not overly concerned with stable sorts. Stability can be enforced in O(n) by augmenting the data -- whether by extending the key with a record number, using width arrays, adding infinity keys/null records/tape marks, or whatever. Many databases use unique keys, so sort stability does not matter.
- Glrx (talk) 17:22, 21 January 2012 (UTC)
- "A PPMS phase does not equal a merge sort pass." - You're correct; I should have compared the total amount of data written for PPMS versus merge sort. I added a second table that shows PPMS is better. Even if PPMS takes two full passes to first generate runs on a single tape and then read that tape and distribute runs onto 3 tapes, it's still faster. Rcgldr (talk) 23:01, 21 January 2012 (UTC)
- You seem to continually start from some belief that PPMS must be inferior to MS. Don't make assumptions like that either way. If PPMS had to make a separate initialization pass, it would have a severe handicap compared to merge sort, would probably be an inferior algorithm, and nobody would bother to discuss it. Instead, you should just recognize it as something you don't understand yet. For the technique, read Knuth. PPMS reads the input file, distributes the data onto N-1 files, and simultaneously calculates the appropriate generalized Fibonacci distribution by keeping tabs on dummy runs and occasionally bumping the Fibonacci order (a sketch of this bookkeeping follows this comment).
- Wikis (even Wikipedia) are not reliable sources; they are edited by idiots like me. Self-published websites are not reliable either. WP's original version of PPMS was based on some professor's flawed class notes.
- I would not put too much stock in vague statements about > 8 drives. Sorting performance is limited by many different constraints; beat on one constraint long enough, and some other constraint becomes the bottleneck. And what does such a statement mean? Above 8 drives, should I switch to a straight merge sort? How about using 10 drives, splitting the sort in half, and running 2 instances of PPMS with 5 drives each? For a particular system (with limited drives, limited channel capacity, limited RAM), there may be more effective hybrid sorting solutions. But that is stepping outside the theory of the basic algorithm.
- Glrx (talk) 23:15, 21 January 2012 (UTC)
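- A minimal sketch of that distribution bookkeeping (Python; heavily simplified from what Knuth describes - it computes only the target distribution and the dummy-run count, not the interleaved tape writes, and reuses the perfect-distribution step from the sketch above):

```python
def target_distribution(total_runs, input_tapes=3):
    # grow the perfect Fibonacci distribution until it can hold total_runs;
    # the shortfall is treated as dummy (empty) runs
    d = [1] + [0] * (input_tapes - 1)
    while sum(d) < total_runs:
        top = max(d)                                   # bump the Fibonacci order
        d = [top] + [top + x for x in sorted(d)[:-1]]
    return d, sum(d) - total_runs

print(target_distribution(14))   # ([4, 6, 7], 3) -> 3 dummy runs needed
```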
- "In general, polyphase merging is better than balanced n-way merging when there's a small (≤ 8) number of devices." - I created a program to grind this out. The initial condition assumes data already distributed on the drives, same as in the article. 10 devices seems to be the transition point, where which is better (PPMS or MS) depends on whether the run count is just above an MS boundary point or just below or at a PPMS boundary point. At 12 or more devices, MS is better at any run count > 1000. Rcgldr (talk) 04:43, 22 January 2012 (UTC)
- "You seem to continually start from some belief that PPMS must be inferior to MS." - That was based on feedback I received about disk sorts a few years ago. Now that I'm reading the wiki article and the class-note web pages about PPMS, I've run into a conflict between what I was told before and what I'm reading now, and I've been trying to sort out the issue. Rcgldr (talk) 09:00, 22 January 2012 (UTC)
"Ordinary merge sort" - disk drives
- For a disk sort, even on a single hard drive, with the amount of RAM available on a typical PC, you could use 1.7 GB of RAM for a 16-way merge sort, reading or writing 100 MB at a time. Each 100 MB transfer takes about 1 second on a hard drive, reducing the random-access overhead to about 2% (about 0.02 second per random access) and eliminating any significant benefit from sequential I/O beyond 100 MB. Using two drives would double the effective transfer rate with concurrent I/O; four would quadruple it. For an odd number of drives, polyphase might help, but the alternative would be to use the drives as a striped group, reading or writing concurrently N times as fast rather than trying to read and write concurrently. A sketch of the arithmetic follows. Rcgldr (talk) 01:31, 14 January 2012 (UTC)
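- The back-of-envelope numbers behind those figures (Python; the 0.02 s access time and 100 MB/s transfer rate are assumptions matching the numbers in the comment, not measurements):

```python
seek_s    = 0.02     # assumed time per random access (seek + rotation)
rate_mb_s = 100.0    # assumed sustained sequential transfer rate, MB/s
io_mb     = 100.0    # I/O size per random access, MB

transfer_s = io_mb / rate_mb_s                     # ~1 s per 100 MB
overhead = seek_s / (seek_s + transfer_s)
print(f"random-access overhead: {overhead:.1%}")   # ~2.0%

buffers = 16 + 1     # 16 input buffers + 1 output buffer
print(f"RAM for a 16-way merge: {buffers * io_mb / 1000:.1f} GB")  # 1.7 GB
```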
- This is speculation. Larger buffers are better than smaller buffers, but there are diminishing returns -- and other problems can be tickled with very large buffers. Async double buffering/ring buffering can do wonders. Disk drives also have caches. Once the I/O rate is close to its peak, there's not a whole lot to do. I think one can do better with individual drives than striping. The striping is good for a fast write on output, but it doesn't help much when the file is turned around for input (where data demand is only 1/N). But what I think doesn't matter. Statements need sources. Glrx (talk) 20:52, 17 January 2012 (UTC)
- The talk page for External_sorting might be a better place to discuss disk sorting. In the case of a disk drive on a modern computer with a huge amount of RAM, a large I/O size can reduce the overhead of random access so that a single disk drive can be treated as 16 or more sequential devices. I'm not sure what you mean by "data demand is only 1/N", since I'm suggesting a K-way merge on M drives, where K is M times some power of 2 (such as K = 16 M); with the right buffering, all M drives are concurrently reading or writing almost continuously. What are these guys using for these sort benchmarks? [sortbenchmark.org] Rcgldr (talk) 10:18, 19 January 2012 (UTC)
- "Find a reliable source that reports PPMS and MS sorting speeds." - For disk sorting, I've read that PPMS isn't used because MS is better for a large k-way (like k = 16) merge sort, which can be used even on a single drive with large I/O sizes, but I don't recall the source for that information. I can only cite these class notes, esort.html, which imply MS is better than PPMS with more than 8 drives: "In general, polyphase merging is better than balanced n-way merging when there's a small (≤ 8) number of devices." The wiki article and talk page for External_sorting mention using a 9-way merge sort (18 pseudo-devices) for a disk sort with large I/O sizes. Rcgldr (talk) 01:14, 23 January 2012 (UTC)