# CHAPTER 28: SORTING NETWORKS

Comparison networks differ from RAM's in two important respects. First, they can only perform comparisons. Thus, an algorithm such as counting sort (see Section 9.2) cannot be implemented on a comparison network. Second, unlike the RAM model, in which operations occur serially--that is, one after another--operations in a comparison network may occur at the same time, or "in parallel." As we shall see, this characteristic allows the construction of comparison networks that sort n values in sublinear time.

We begin in Section 28.1 by defining comparison networks and sorting networks. We also give a natural definition for the "running time" of a comparison network in terms of the depth of the network. Section 28.2 proves the "zero-one principle," which greatly eases the task of analyzing the correctness of sorting networks.

The efficient sorting network that we shall design is essentially a parallel version of the merge-sort algorithm from Section 1.3.1. Our construction will have three steps. Section 28.3 presents the design of a "bitonic" sorter that will be our basic building block. We modify the bitonic sorter slightly in Section 28.4 to produce a merging network that can merge two sorted sequences into one sorted sequence. Finally, in Section 28.5, we assemble these merging networks into a sorting network that can sort n values in O(lg2 n) time.

# 28.1 Comparison networks

#### Figure 28.1 (a) A comparator with inputs x and y and outputs x' and y'. (b) The same comparator, drawn as a single vertical line. Inputs x = 7, y = 3 and outputs x' = 3, y' = 7 are shown.

`x'  =  min(x, y) ,`

`y'  =  max(x, y) .`

Because the pictorial representation of a comparator in Figure 28.1 (a) is too bulky for our purposes, we shall adopt the convention of drawing comparators as single vertical lines, as shown in Figure 28.1(b). Inputs appear on the left and outputs on the right, with the smaller input value appearing on the top output and the larger input value appearing on the bottom output. We can thus think of a comparator as sorting its two inputs.

We shall assume that each comparator operates in O(1) time. In other words, we assume that the time between the appearance of the input values x and y and the production of the output values x' and y' is a constant.

Figure 28.2 shows a comparison network, which is a set of comparators interconnected by wires. We draw a comparison network on n inputs as a collection of n horizontal lines with comparators stretched vertically. Note that a line does not represent a single wire, but rather a sequence of distinct wires connecting various comparators. The top line in Figure 28.2, for example, represents three wires: input wire a1, which connects to an input of comparator A; a wire connecting the top output of comparator A to an input of comparator C; and output wire b1, which comes from the top output of comparator C. Each comparator input is connected to a wire that is either one of the network's n input wires a1, a2, . . . , an or is connected to the output of another comparator. Similarly, each comparator output is connected to a wire that is either one of the network's n output wires b1, b2, . . . , bn or is connected to the input of another comparator. The main requirement for interconnecting comparators is that the graph of interconnections must be acyclic: if we trace a path from the output of a given comparator to the input of another to output to input, etc., the path we trace must never cycle back on itself and go through the same comparator twice. Thus, as in Figure 28.2, we can draw a comparison network with network inputs on the left and network outputs on the right; data move through the network from left to right.

#### Figure 28.2 (a) A 4-input, 4-output comparison network, which is in fact a sorting network. At time 0, the input values shown appear on the four input wires. (b) At time 1, the values shown appear on the outputs of comparators A and B, which are at depth l. (c) At time 2, the values shown appear on the outputs of comparators C and D, at depth 2. Output wires b1 and b4 now have their final values, but output wires b2 and b3 do not. (d) At time 3, the values shown appear on the outputs of comparator E, at depth 3. Output wires b2 and b3 now have their final values.

Each comparator produces its output values only when both of its input values are available to it. In Figure 28.2(a), for example, suppose that the sequence 9, 5, 2, 6 appears on the input wires at time 0. At time 0, then, only comparators A and B have all their input values available. Assuming that each comparator requires one time unit to compute its output values, comparators A and B produce their outputs at time 1; the resulting values are shown in Figure 28.2(b). Note that comparators A and B produce their values at the same time, or "in parallel." Now, at time 1, comparators C and D, but not E, have all their input values available. One time unit later, at time 2, they produce their outputs, as shown in Figure 28.2(c). Comparators C and D operate in parallel as well. The top output of comparator C and the bottom output of comparator D connect to output wires b1 and b4, respectively, of the comparison network, and these network output wires therefore carry their final values at time 2. Meanwhile, at time 2, comparator E has its inputs available, and Figure 28.2(d) shows that it produces its output values at time 3. These values are carried on network output wires b2 and b3, and the output sequence 2, 5, 6, 9 is now complete.

A comparison network is like a procedure in that it specifies how comparisons are to occur, but it is unlike a procedure in that its physical size depends on the number of inputs and outputs. Therefore, we shall actually be describing "families" of comparison networks. For example, the goal of this chapter is to develop a family SORTER of efficient sorting networks. We specify a given network within a family by the family name and the number of inputs (which equals the number of outputs). For example, the n-input, n-output sorting network in the family SORTER is named SORTER[n].

## Exercises

Show the values that appear on all the wires of the network of Figure 28.2 when it is given the input sequence 9, 6, 5, 2.

Let n be an exact power of 2. Show how to construct an n-input, n-output comparison network of depth lg n in which the top output wire always carries the minimum input value and the bottom output wire always carries the maximum input value.

Prove that the number of comparators in any sorting network is at least (n lg n).

We can represent an n-input comparison network with c comparators as a list of c pairs of integers in the range from 1 to n. If two pairs contain an integer in common, the order of the corresponding comparators in the network is determined by the order of the pairs in the list. Given this representation, describe an O(n+c)-time (serial) algorithm for determining the depth of a comparison network.

Suppose that in addition to the standard kind of comparator, we introduce an "upside-down" comparator that produces its minimum output on the bottom wire and its maximum output on the top wire. Show how to convert any sorting network that uses a total of c standard and upside-down comparators to one that uses c standard ones. Prove that your conversion method is correct.

# 28.2 The zero-one principle

The proof of the zero-one principle relies on the notion of a monotonically increasing function (Section 2.2).

If a comparison network transforms the input sequence a = a1, a2, . . . , an into the output sequence b = b1, b2, . . . , bn, then for any monotonically increasing function f, the network transforms the input sequence f(a) = f(a1), f(a2), . . . , f(an) into the output sequence f(b) = f(b1), f(b2), . . . , f(bn).

Proof We shall first prove the claim that if f is a monotonically increasing function, then a single comparator with inputs f(x) and f(y) produces outputs f(min(x,y)) and f(max(x,y)). We shall then use induction to prove the lemma.

To prove the claim, consider a comparator whose input values are x and y. The upper output of the comparator is min(x,y) and the lower output is max(x,y). Suppose we now apply f(x) and f(y) to the inputs of the comparator, as is shown in Figure 28.4. The operation of the comparator yields the value min(f(x), f(y)) on the upper output and the value max(f(x), f(y)) on the lower output. Since f is monotonically increasing, x y implies f(x) f(y). Consequently, we have the identities

`min(f(x), f(y))  =  f(min(x,y)) ,`

`max(f(x), f(y))  =  f(max(x,y)) .`

Thus, the comparator produces the values f (min(x,y)) and f (max(x,y)) when f (x) and f (y) are its inputs, which completes the proof of the claim.

#### Figure 28.4 The operation of the comparator in the proof of Lemma 28.1. The function f is monotonically increasing.

We can use induction on the depth of each wire in a general comparison network to prove a stronger result than the statement of the lemma: if a wire assumes the value ai when the input sequence a is applied to the network, then it assumes the value f(ai) when the input sequence f(a) is applied. Because the output wires are included in this statement, proving it will prove the lemma.

For the basis, consider a wire at depth 0, that is, an input wire ai. The result follows trivially: when f(a) is applied to the network, the input wire carries f(ai). For the inductive step, consider a wire at depth d, where d 1. The wire is the output of a comparator at depth d, and the input wires to this comparator are at a depth strictly less than d. By the inductive hypothesis, therefore, if the input wires to the comparator carry values ai and aj when the input sequence a is applied, then they carry f(ai) and f(aj) when the input sequence f(a) is applied. By our earlier claim, the output wires of this comparator then carry f(min(ai, aj)) and f(max(ai, aj)). Since they carry min(ai, aj) and max(ai, aj) when the input sequence is a, the lemma is proved.

As an example of the application of Lemma 28.1, Figure 28.5 shows the sorting network from Figure 28.2 with the monotonically increasing function f(x) = x/2 applied to the inputs. The value on every wire is f applied to the value on the same wire in Figure 28.2.

When a comparison network is a sorting network, Lemma 28.1 allows us to prove the following remarkable result.

If a comparison network with n inputs sorts all 2n possible sequences of 0's and 1's correctly, then it sorts all sequences of arbitrary numbers correctly.

#### Figure 28.5 (a) The sorting network from Figure 28.2 with input sequence 9, 5, 2, 6. (b) The same sorting network with the monotonically increasing function f(x) = f(x/2) applied to the inputs. Each wire in this network has the value of f applied to the value on the corresponding wire in (a).

Proof Suppose for the purpose of contradiction that the network sorts all zero-one sequences, but there exists a sequence of arbitrary numbers that the network does not correctly sort. That is, there exists an input sequence a1, a2, . . . , an containing elements ai and aj such that ai < aj, but the network places aj before ai in the output sequence. We define a monotonically increasing function f as

Since the network places aj before ai in the output sequence when a1, a2, . . . , an is input, it follows from Lemma 28.1 that it places f(aj) before f(ai) in the output sequence when f(a1), f(a2), . . . , f(an) is input. But since f(aj) = 1 and f(ai) = 0, we obtain the contradiction that the network fails to sort the zero-one sequence f(a1), f(a2), . . . , f(an) correctly.

## Exercises

Prove that applying a monotonically increasing function to a sorted sequence produces a sorted sequence.

Prove that a comparison network with n inputs correctly sorts the input sequence n, n - 1, . . . , 1 if and only if it correctly sorts the n - 1 zero-one sequences 1, 0, 0, . . . , 0, 0, 1, 1, 0, . . . , 0, 0, . . . , 1, 1, 1, . . . , 1, 0.

Use the zero-one principle to prove that the comparison network shown in Figure 28.6 is a sorting network.

#### Figure 28.6 A sorting network for sorting 4 numbers.

Prove that an n-input sorting network must contain at least one comparator between the ith and (i + 1 )st lines for all i = 1, 2, . . . , n - 1.

# 28.3 A bitonic sorting network

The bitonic sorter that we shall construct is a comparison network that sorts bitonic sequences of 0's and 1's. Exercise 28.3-6 asks you to show that the bitonic sorter can sort bitonic sequences of arbitrary numbers.

## The half-cleaner

#### Figure 28.7 The comparison network HALF-CLEANER[8]. Two different sample zero-one input and output values are shown. The input is assumed to be bitonic. A half-cleaner ensures that every output element of the top half is at least as small as every output element of the bottom half. Moreover, both halves are bitonic, and at least one half is clean.

If the input to a half-cleaner is a bitonic sequence of 0's and 1's, then the output satisfies the following properties: both the top half and the bottom half are bitonic, every element in the top half is at least as small as every element of the bottom half, and at least one half is clean.

Proof The comparison network HALF-CLEANER[n] compares inputs i and i + n/2 for i = 1, 2, . . . , n/2. Without loss of generality, suppose that the input is of the form 00 . . . 011 . . . 100 . . . 0. (The situation in which the input is of the form 11 . . . 100 . . . 011 . . . 1 is symmetric.) There are three possible cases depending upon the block of consecutive 0's or 1s in which the midpoint n/2 falls, and one of these cases (the one in which the midpoint occurs in the block of 1's) is further split into two cases. The four cases are shown in Figure 28.8. In each case shown, the lemma holds.

## The bitonic sorter

#### Figure 28.9 The comparison network BITONIC-SORTER[n], shown here for n = 8. (a) The recursive construction: HALF-CLEANER[n] followed by two copies of BITONIC-SORTER[n/2] that operate in parallel. (b) The network after unrolling the recursion. Each half-cleaner is shaded. Sample zero-one values are shown on the wires.

Thus, a zero-one bitonic sequence can be sorted by BITONIC-SORTER, which has a depth of lg n. It follows by the analog of the zero-one principle given as Exercise 28.3-6 that any bitonic sequence of arbitrary numbers can be sorted by this network.

## Exercises

How many bitonic sequences of 0's and 1's are there?

Show that BITONIC-SORTER[n], where n is an exact power of 2, contains (n lg n) comparators.

Describe how an O(lg n)-depth bitonic sorter can be constructed when the number n of inputs is not an exact power of 2.

If the input to a half-cleaner is a bitonic sequence of arbitrary numbers, prove that the output satisfies the following properties: both the top half and the bottom half are bitonic, and every element in the top half is at least as small as every element in the bottom half.

Consider two sequences of 0's and 1's. Prove that if every element in one sequence is at least as small as every element in the other sequence, then one of the two sequences is clean.

# 28.4 A merging network

The merging network is based on the following intuition. Given two sorted sequences, if we reverse the order of the second sequence and then concatenate the two sequences, the resulting sequence is bitonic. For example, given the sorted zero-one sequences X = 00000111 and Y = 00001111, we reverse Y to get YR = 11110000. Concatenating X and YR yields 0000011111110000, which is bitonic. Thus, to merge the two input sequences X and Y, it suffices to perform a bitonic sort on X concatenated with YR.

We can construct MERGER[n] by modifying the first half-cleaner of BITONIC-SORTER[n]. The key is to perform the reversal of the the second half of the inputs implicitly. Given two sorted sequences a1, a2, . . . , an/2 and an/2+1, an/2+2, . . . , an to be merged, we want the effect of bitonically sorting the sequence a1, a2, . . . , an/2, an, an-1, . . . , an/2+1. Since the half-cleaner of BITONIC-SORTER[n] compares inputs i and n/2 + i, for i = 1, 2, . . . , n/2, we make the first stage of the merging network compare inputs i and n - i + 1. Figure 28.10 shows the correspondence. The only subtlety is that the order of the outputs from the bottom of the first stage of MERGER[n] are reversed compared with the order of outputs from an ordinary half-cleaner. Since the reversal of a bitonic sequence is bitonic, however, the top and bottom outputs of the first stage of the merging network satisfy the properties in Lemma 28.3, and thus the top and bottom can be bitonically sorted in parallel to produce the sorted output of the merging network.

#### Figure 28.11 A network that merges two sorted input sequences into one sorted output sequence. The network MERGER[n] can be viewed as BITONIC-SORTER[n] with the first half-cleaner altered to compare inputs i and n - i + 1 for i = 1, 2, . . . , n/2. Here, n = 8. (a) The network decomposed into the first stage followed by two parallel copies of BITONIC-SORTER[n/2]. (b) The same network with the recursion unrolled. Sample zero-one values are shown on the wires, and the stages are shaded.

The resulting merging network is shown in Figure 28.11. Only the first stage of MERGER[n] is different from BITONIC-SORTER[n]. Consequently, the depth of MERGER[n] is lg n, the same as that of BITONIC-SORTER[n].

## Exercises

How many different zero-one input sequences must be applied to the input of a comparison network to verify that it is a merging network?

Show that any network that can merge 1 item with n - 1 items to produce a sorted sequence of length n must have depth at least lg n.

Prove that any merging network, regardless of the order of inputs, requires (n lg n) comparators.

# 28.5 A sorting network

Figure 28.12(a) shows the recursive construction of SORTER[n]. The n input elements are sorted by using two copies of SORTER[n/2] recursively to sort (in parallel) two subsequences of length n/2 each. The two resulting sequences are then merged by MERGER[n]. The boundary case for the recursion is when n = 1, in which case we can use a wire to sort the 1-element sequence, since a 1-element sequence is already sorted. Figure 28.12(b) shows the result of unrolling the recursion, and Figure 28.12(c) shows the actual network obtained by replacing the MERGER boxes in Figure 28.12(b) with the actual merging networks.

#### Figure 28.12 The sorting network SORTER[n] constructed by recursively combining merging networks. (a) The recursive construction. (b) Unrolling the recursion. (c) Replacing the MERGER boxes with the actual merging networks. The depth of each comparator is indicated, and sample zero-one values are shown on the wires.

Data pass through lg n stages in the network SORTER[n]. Each of the individual inputs to the network is already a sorted 1-element sequence. The first stage of SORTER[n] consists of n/2 copies of MERGER[2] that work in parallel to merge pairs of 1-element sequences to produce sorted sequences of length 2. The second stage consists of n/4 copies of MERGER[4] that merge pairs of these 2-element sorted sequences to produce sorted sequences of length 4. In general, for k = 1, 2, . . . , lg n, stage k consists of n/2k copies of MERGER[2k] that merge pairs of the 2k-1-element sorted sequences to produce sorted sequences of length 2k. At the final stage, one sorted sequence consisting of all the input values is produced. This sorting network can be shown by induction to sort zero-one sequences, and consequently, by the zero-one principle (Theorem 28.2), it can sort arbitrary values.

We can analyze the depth of the sorting network recursively. The depth D(n) of SORTER[n] is the depth D(n/2) of SORTER[n/2] (there are two copies of SORTER[n/2], but they operate in parallel) plus the depth lg n of MERGER[n]. Consequently, the depth of SORTER[n] is given by the recurrence

whose solution is D(n) = (lg2 n). Thus, we can sort n numbers in parallel in O(lg2 n) time.

## Exercises

Suppose we modify a comparator to take two sorted lists of length k as inputs, merge them, and output the largest k to its "max" output and the smallest k to its "min" output. Show that any sorting network on n inputs with comparators modified in this fashion can sort nk numbers, assuming that each input to the network is a sorted list of length k.

Suppose that we have 2n elements a1, a2, . . . , a2n and wish to partition them into the n smallest and the n largest. Prove that we can do this in constant additional depth after separately sorting a1,a2, . . . , an and an+1, an+2, . . . , a2n.

Let S(k) be the depth of a sorting network with k inputs, and let M(k) be the depth of a merging network with 2k inputs. Suppose that we have a sequence of n numbers to be sorted and know that every number is within k positions of its correct position in the sorted order. Show that we can sort the n numbers in depth S(k) + 2M(k).

We can sort the elements of an m X m matrix by repeating the following procedure k times:

1. Sort each odd-numbered row into monotonically increasing order.

2. Sort each even-numbered row into monotonically decreasing order.

3. Sort each column into monotonically increasing order.

How many iterations k are required for this procedure to sort, and what is the pattern of the sorted output?

# Problems

a. Show that any transposition network with n inputs that sorts has (n2) comparators.

b. Prove that a transposition network with n inputs is a sorting network if and only if it sorts the sequence n, n- 1, . . . , 1. (Hint: Use an induction argument analogous to the one in the proof of Lemma 28.1.)

c. Prove that the family of odd-even sorting networks is indeed a family of sorting networks.

a. Draw a 2n-input merging network for n = 4.

#### Figure 28.14 Permutation networks. (a) The permutation network P2, which consists of a single switch that can be set in either of the two ways shown. (b) The recursive construction of P8 from 8 switches and two P4's. The switches and P4's are set to realize the permutation = 4, 7, 3, 5, 1, 6, 8, 2.

b. Use the zero-one principle to prove that any 2n-input odd-even merging network is indeed a merging network.

c. What is the depth of a 2n-input odd-even merging network? What is its size?

a. Argue that if we replace each comparator in a sorting network with the switch of Figure 28.14(a), the resulting network is a permutation network. That is, for any permutation , there is a way to set the switches in the network so that input i is connected to output (i).

Figure 28.14(b) shows the recursive construction of an 8-input, 8-output permutation network P8 that uses two copies of P4 and 8 switches. The switches have been set to realize the permutation = 4, 7, 3, 5, 1, 6, 8, 2, which requires (recursively) that the top P4 realize 4, 2, 3, 1 and the bottom P4 realize 2, 3, 1, 4.

b. Show how to realize the permutation 5, 3, 4, 6, 1, 8, 2, 7 on P8 by drawing the switch settings and the permutations performed by the two P4's.

Let n be an exact power of 2. Define Pn recursively in terms of two Pn/2's in a manner similar to the way we defined P8.

c. Describe an O(n)-time (ordinary random-access machine) algorithm that sets the n switches connected to the inputs and outputs of Pn and specifies the permutations that must be realized by each Pn/2 in order to accomplish any given n-element permutation. Prove that your algorithm is correct.

d. What are the depth and size of Pn? How long does it take on an ordinary random-access machine to compute all switch settings, including those within the Pn/2's?

e. Argue that for n > 2, any permutation network--not just Pn--must realize some permutation by two distinct combinations of switch settings.

# Chapter notes

Knuth [123] contains a discussion of sorting networks and charts their history. They apparently were first explored in 1954 by P. N. Armstrong, R. J. Nelson, and D. J. O'Connor. In the early 1960's, K. E. Batcher discovered the first network capable of merging two sequences of n numbers in O(lg n) time. He used odd-even merging (see Problem 28-2), and he also showed how this technique could be used to sort n numbers in O(lg2 n) time. Shortly afterwards, he discovered an O(lg n)-depth bitonic sorter similar to the one presented in Section 28.3. Knuth attributes the zero-one principle to W. G. Bouricius (1954), who proved it in the context of decision trees.