The *i*th * order statistic* of a set of

This chapter addresses the problem of selecting the *i*th order statistic from a set of *n* distinct numbers. We assume for convenience that the set contains distinct numbers, although virtually everything that we do extends to the situation in which a set contains repeated values. The * selection problem* can be specified formally as follows:

**Input: **A set *A* of *n* (distinct) numbers and a number *i*, with 1 *i* *n*.

**Output: **The element *x* *A* that is larger than exactly *i* -1 other elements of *A*.

How many comparisons are necessary to determine the minimum of a set of *n* elements? We can easily obtain an upper bound of *n* - 1 comparisons: examine each element of the set in turn and keep track of the smallest element seen so far. In the following procedure, we assume that the set resides in array *A*, where *length*[*A*] = *n*.

MINIMUM (A)

1minA[1]

2fori2tolength[A]

3do ifmin>A[i]

4thenminA[i]

5returnmin

Finding the maximum can, of course, be accomplished with *n* - 1 comparisons as well.

The general selection problem appears more difficult than the simple problem of finding a minimum, yet, surprisingly, the asymptotic running time for both problems is the same: (*n*). In this section, we present a divide-and-conquer algorithm for the selection problem. The algorithm RANDOMIZED-SELECT is modeled after the quicksort algorithm of Chapter 8. As in quicksort, the idea is to partition the input array recursively. But unlike quicksort, which recursively processes both sides of the partition, RANDOMIZED-SELECT only works on one side of the partition. This difference shows up in the analysis: whereas quicksort has an expected running time of (*n *lg *n*), the expected time of RANDOMIZED-SELECT is (*n*).

RANDOMIZED-SELECT(A, p, r, i)

1ifp=r

2then returnA[p]

3qRANDOMIZED-PARTITION(A, p, r)

4kq-p+ 1

5 ifik

6then returnRANDOMIZED-SELECT(A, p, q, i)

7else returnRANDOMIZED-SELECT(A, q+ 1, r, i-k)

The worst-case running time for RANDOMIZED-SELECT is (*n*^{2}), even to find the minimum, because we could be extremely unlucky and always partition around the largest remaining element. The algorithm works well in the average case, though, and because it is randomized, no particular input elicits the worst-case behavior.

The second line follows from the first since max(1, *n* - 1 ) = *n* - 1 and

since we can pick *c* large enough so that *c*(*n*/4 + 1/2) dominates the *O*(*n*) term.

Write an iterative version of RANDOMIZED-SELECT.

We now examine a selection algorithm whose running time is *O*(*n*) in the worst case. Like RANDOMIZED-SELECT, the algorithm SELECT finds the desired element by recursively partitioning the input array. The idea behind the algorithm, however, is to *guarantee* a good split when the array is partitioned. SELECT uses the deterministic partitioning algorithm PARTITION from quicksort (see Section 8.1), modified to take the element to partition around as an input parameter.

The SELECT algorithm determines the *i*th smallest of an input array of *n* elements by executing the following steps.

3. Use SELECT recursively to find the median *x* of the *n/5* medians found in step 2.

T(n)cn/5 +c(7n/10 + 6) +O(n)

cn/5 +c+ 7cn/10 + 6c+O(n)

9cn/10 + 7c+O(n)

cn,

As in a comparison sort (see Section 9.1), SELECT and RANDOMIZED-SELECT determine information about the relative order of elements only by comparing elements. Thus, the linear-time behavior is not a result of assumptions about the input, as was the case for the sorting algorithms in Chapter 9. Sorting requires (*n* 1g *n*) time in the comparison model, even on average (see Problem 9-1), and thus the method of sorting and indexing presented in the introduction to this chapter is asymptotically inefficient.

In the algorithm SELECT, the input elements are divided into groups of 5. Will the algorithm work in linear time if they are divided into groups of 7? How about groups of 3?

Analyze SELECT to show that the number of elements greater than the median-of-medians *x* and the number of elements less than *x* is at least *n*/4 if *n* 38.

Show how quicksort can be made to run in *O*(*n *1g *n*) time in the worst case.

The *k*th * quantiles *of an

10-1 Largest i numbers in sorted order

* a. *Sort the numbers and list the

* b. *Build a priority queue from the numbers and call EXTRACT-MAX

For *n* distinct elements *x*_{1}, *x*_{2}, . . . , *x _{n}* with positive weights

* b. *Show how to compute the weighted median of

The* post-office location problem*

* b. *Show that

* c. *Show that if

* d. *Show that if