Section 6.1 reviews elementary results in counting theory, including standard formulas for counting permutations and combinations. The axioms of probability and basic facts concerning probability distributions are presented in Section 6.2. Random variables are introduced in Section 6.3, along with the properties of expectation and variance. Section 6.4 investigates the geometric and binomial distributions that arise from studying Bernoulli trials. The study of the binomial distribution is continued in Section 6.5, which provides an advanced discussion of the "tails" of the distribution. Finally, Section 6.6 illustrates probabilistic analysis via three examples: the birthday paradox, tossing balls randomly into bins, and winning streaks.

The *rule of sum* says that the number of ways to choose an element from one of two disjoint sets is the sum of the cardinalities of the sets. That is, if *A* and *B* are two finite sets with no members in common, then |*A* ∪ *B*| = |*A*| + |*B*|.

The *rule of product* says that the number of ways to choose an ordered pair is the number of ways to choose the first element times the number of ways to choose the second element. That is, if *A* and *B* are two finite sets, then |*A* × *B*| = |*A*| · |*B*|.

A *string* over a finite set *S* is a sequence of elements of *S*. For example, there are 8 binary strings of length 3:

000, 001, 010, 011, 100, 101, 110, 111 .

We sometimes call a string of length *k* a *k-string*. A *substring* *s*′ of a string *s* is an ordered sequence of consecutive elements of *s*. A *k-substring* of a string is a substring of length *k*. For example, 010 is a 3-substring of 01101001 (the 3-substring that begins in position 4), but 111 is not a substring of 01101001.

A *permutation* of a finite set *S* is an ordered sequence of all the elements of *S*, with each element appearing exactly once. For example, if *S* = {*a*, *b*, *c*}, there are 6 permutations of *S*:

abc, acb, bac, bca, cab, cba .

A *k-permutation* of *S* is an ordered sequence of *k* elements of *S*, with no element appearing more than once in the sequence. (Thus, an ordinary permutation is just an *n*-permutation of an *n*-set.) The twelve 2-permutations of the set {*a*, *b*, *c*, *d*} are

ab, ac, ad, ba, bc, bd, ca, cb, cd, da, db, dc.

The number of *k*-permutations of an *n*-set is

*n*(*n* − 1)(*n* − 2) ⋯ (*n* − *k* + 1) = *n*!/(*n* − *k*)! ,

since there are *n* ways to choose the first element, *n* − 1 ways to choose the second, and so on.

A *k-combination* of an *n*-set *S* is simply a *k*-subset of *S*. For example, the six 2-combinations of the 4-set {*a*, *b*, *c*, *d*} are

ab, ac, ad, bc, bd, cd .

We use the notation C(*n*, *k*) (read "*n* choose *k*") to denote the number of *k*-combinations of an *n*-set. From equation (6.2), we have

C(*n*, *k*) = *n*!/(*k*!(*n* − *k*)!) .

This formula is symmetric in *k* and *n* − *k*:

C(*n*, *k*) = C(*n*, *n* − *k*) .

These numbers are also known as *binomial coefficients*, due to their appearance in the *binomial expansion*:

(*x* + *y*)^{n} = Σ_{k=0}^{n} C(*n*, *k*) *x*^{k} *y*^{n−k} .

A special case of the binomial expansion occurs when *x* = *y* = 1:

2^{n} = Σ_{k=0}^{n} C(*n*, *k*) .
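As a quick check, the counting formulas above can be reproduced with Python's standard library; this is only an illustrative sketch (`math.comb` computes binomial coefficients directly):

```python
import math

def k_permutations(n, k):
    # n!/(n-k)!: ordered sequences of k distinct elements of an n-set
    return math.factorial(n) // math.factorial(n - k)

def k_combinations(n, k):
    # C(n, k) = n!/(k!(n-k)!): k-subsets of an n-set
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# The twelve 2-permutations and six 2-combinations of {a, b, c, d}
print(k_permutations(4, 2))   # 12
print(k_combinations(4, 2))   # 6

# Symmetry: C(n, k) = C(n, n - k)
assert all(k_combinations(10, k) == k_combinations(10, 10 - k) for k in range(11))

# Binomial expansion with x = y = 1: the coefficients sum to 2^n
assert sum(k_combinations(10, k) for k in range(11)) == 2 ** 10
```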

We sometimes need to bound the size of a binomial coefficient. For 1 ≤ *k* ≤ *n*, we have the lower bound

C(*n*, *k*) ≥ (*n*/*k*)^{k} .

For all 0 ≤ *k* ≤ *n*, we can use induction (see Exercise 6.1-12) to prove the bound

C(*n*, *k*) ≤ *n*^{n}/(*k*^{k}(*n* − *k*)^{n−k}) ,

where for convenience we assume that 0^{0} = 1. For *k* = λ*n*, where 0 ≤ λ ≤ 1, this bound can be rewritten as

C(*n*, λ*n*) ≤ 2^{nH(λ)} ,

where

H(λ) = −λ lg λ − (1 − λ) lg(1 − λ)

is the *(binary) entropy function* and where, for convenience, we assume that 0 lg 0 = 0, so that H(0) = H(1) = 0.
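The entropy bound can be checked numerically. The sketch below adopts the convention 0 lg 0 = 0 and verifies C(*n*, λ*n*) ≤ 2^{nH(λ)} for an arbitrary choice of *n* = 100:

```python
import math

def H(lam):
    # binary entropy function; 0 lg 0 = 0 by convention
    if lam in (0.0, 1.0):
        return 0.0
    return -lam * math.log2(lam) - (1 - lam) * math.log2(1 - lam)

n = 100
for k in range(n + 1):
    lam = k / n
    # C(n, lam*n) <= 2^{n H(lam)}; small epsilon absorbs float rounding
    assert math.comb(n, k) <= 2 ** (n * H(lam)) + 1e-6
print("entropy bound holds for n =", n)
```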

An *n*-input, *m*-output **boolean function** is a function from {TRUE, FALSE}^{n} to {TRUE, FALSE}^{m}. How many *n*-input, *m*-output boolean functions are there?

Using the result of Exercise 6.1-7, make a table for *n* = 0, 1, . . . , 6 and 0 ≤ *k* ≤ *n* of the binomial coefficients C(*n*, *k*), with C(0, 0) at the top, C(1, 0) and C(1, 1) on the next line, and so forth. Such a table of binomial coefficients is called **Pascal's triangle.**

Show that for any *n* ≥ 0 and 0 ≤ *k* ≤ *n*, the maximum value of C(*n*, *k*) is achieved when *k* = ⌊*n*/2⌋ or *k* = ⌈*n*/2⌉.

Argue that for any *n* ≥ 0, *j* ≥ 0, *k* ≥ 0, and *j* + *k* ≤ *n*,

C(*n*, *j* + *k*) ≤ C(*n*, *j*) C(*n* − *j*, *k*) .

Use induction on *k* ≤ *n*/2 to prove inequality (6.10), and use equation (6.4) to extend it to all *k* ≤ *n*.

Use Stirling's approximation to prove that

C(2*n*, *n*) = (2^{2n}/√(π*n*))(1 + *O*(1/*n*)) .

Probability is an essential tool for the design and analysis of probabilistic and randomized algorithms. This section reviews basic probability theory.

We define probability in terms of a **sample space** *S*, which is a set whose elements are called *elementary events*. Each elementary event can be viewed as a possible outcome of an experiment. For the experiment of flipping two distinguishable coins, we can view the sample space as the set of all 2-strings over {H, T}:

S = {HH, HT, TH, TT} .

An *event* is a subset of the sample space *S*. For example, in the experiment of flipping two coins, the event of obtaining one head and one tail is {HT, TH}. The event *S* is called the *certain event*, and the empty event Ø is called the *null event*.

A *probability distribution* Pr{} on a sample space *S* is a mapping from events of *S* to real numbers such that the following *probability axioms* are satisfied:

1. Pr{*A*} ≥ 0 for any event *A*.

2. Pr{*S*} = 1.

3. Pr{*A* ∪ *B*} = Pr{*A*} + Pr{*B*} for any two mutually exclusive events *A* and *B*, that is, events for which *A* ∩ *B* = Ø.

Several results follow immediately from these axioms and basic set theory (see Section 5.1). The null event Ø has probability Pr{Ø} = 0. If *A* ⊆ *B*, then Pr{*A*} ≤ Pr{*B*}. Using Ā to denote the event *S* − *A* (the *complement* of *A*), we have Pr{Ā} = 1 − Pr{*A*}. For any two events *A* and *B*,

Pr{*A* ∪ *B*} = Pr{*A*} + Pr{*B*} − Pr{*A* ∩ *B*}

≤ Pr{*A*} + Pr{*B*} .

In our coin-flipping example, suppose that each of the four elementary events has probability 1/4. Then the probability of getting at least one head is

Pr{HH, HT, TH} = Pr{HH} + Pr{HT} + Pr{TH}

= 3/4 .

A probability distribution is *discrete* if it is defined over a finite or countably infinite sample space. Let *S* be such a sample space. If each elementary event *s* ∈ *S* has probability

Pr{s} = 1/|S| ,

then we have the **uniform probability distribution**** **on *S*. In such a case the experiment is often described as "picking an element of *S* at random."

As an example, consider the process of flipping a **fair coin**, one for which the probability of obtaining a head is the same as the probability of obtaining a tail, that is, 1/2. If we flip the coin *n* times, we have the uniform probability distribution defined on the sample space *S* = {H, T}^{n}, a set of size 2^{n}.

*A* = {exactly *k* heads and exactly *n* − *k* tails occur}

For any closed interval [*c*, *d*], where *a* ≤ *c* ≤ *d* ≤ *b*, the *continuous uniform probability distribution* defines the probability of the event [*c*, *d*] to be

Pr{[*c*, *d*]} = (*d* − *c*)/(*b* − *a*) .

Sometimes we have some prior partial knowledge about the outcome of an experiment. For example, suppose that a friend has flipped two fair coins and has told you that at least one of the coins showed a head. What is the probability that both coins are heads? The information given eliminates the possibility of two tails. The three remaining elementary events are equally likely, so we infer that each occurs with probability 1/3. Since only one of these elementary events shows two heads, the answer to our question is 1/3.
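The two-coin reasoning above can be verified by brute-force enumeration of the sample space; a minimal sketch:

```python
from itertools import product

# Enumerate the sample space of two fair coin flips: HH, HT, TH, TT
space = list(product("HT", repeat=2))

at_least_one_head = [s for s in space if "H" in s]
both_heads = [s for s in space if s == ("H", "H")]

# Under the uniform distribution, conditioning on "at least one head"
# restricts the sample space to the three remaining elementary events.
p = len(both_heads) / len(at_least_one_head)
print(p)   # 1/3
```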

The *conditional probability* of an event *A* given that another event *B* occurs is defined to be

Pr{*A* | *B*} = Pr{*A* ∩ *B*}/Pr{*B*} ,

whenever Pr{*B*} ≠ 0. Two events *A* and *B* are *independent* if

Pr{*A* ∩ *B*} = Pr{*A*} Pr{*B*} ,

which is equivalent, if Pr{*B*} ≠ 0, to the condition

Pr{*A* | *B*} = Pr{*A*} .

A collection *A*_{1}, *A*_{2}, . . . , *A*_{n} of events is said to be *pairwise independent* if

Pr{*A*_{i} ∩ *A*_{j}} = Pr{*A*_{i}} Pr{*A*_{j}}

for all 1 ≤ *i* < *j* ≤ *n*. We say that they are *(mutually) independent* if every *k*-subset *A*_{i1}, *A*_{i2}, . . . , *A*_{ik} of the collection, where 2 ≤ *k* ≤ *n* and 1 ≤ *i*_{1} < *i*_{2} < ··· < *i*_{k} ≤ *n*, satisfies

Pr{*A*_{i1} ∩ *A*_{i2} ∩ ··· ∩ *A*_{ik}} = Pr{*A*_{i1}} Pr{*A*_{i2}} ··· Pr{*A*_{ik}} .

For example, suppose we flip two fair coins. Let *A*_{1} be the event that the first coin is heads, let *A*_{2} be the event that the second coin is heads, and let *A*_{3} be the event that the two coins show different faces. Then we have

Pr{*A*_{1}} = 1/2 ,

Pr{*A*_{2}} = 1/2 ,

Pr{*A*_{3}} = 1/2 ,

Pr{*A*_{1} ∩ *A*_{2}} = 1/4 ,

Pr{*A*_{1} ∩ *A*_{3}} = 1/4 ,

Pr{*A*_{2} ∩ *A*_{3}} = 1/4 ,

Pr{*A*_{1} ∩ *A*_{2} ∩ *A*_{3}} = 0 .

The three events are thus pairwise independent, but they are not mutually independent, since Pr{*A*_{1} ∩ *A*_{2} ∩ *A*_{3}} = 0 ≠ 1/8 = Pr{*A*_{1}} Pr{*A*_{2}} Pr{*A*_{3}}.
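These probabilities can be confirmed by enumerating the two-coin sample space. The event definitions below (first coin heads, second coin heads, coins differ) are one standard choice consistent with the numbers listed above:

```python
from itertools import product
from fractions import Fraction

space = list(product("HT", repeat=2))

def pr(event):
    # exact probability of an event under the uniform distribution
    return Fraction(sum(1 for s in space if event(s)), len(space))

A1 = lambda s: s[0] == "H"     # first coin is heads
A2 = lambda s: s[1] == "H"     # second coin is heads
A3 = lambda s: s[0] != s[1]    # the two coins show different faces

# pairwise independent:
assert pr(lambda s: A1(s) and A2(s)) == pr(A1) * pr(A2)
assert pr(lambda s: A1(s) and A3(s)) == pr(A1) * pr(A3)
assert pr(lambda s: A2(s) and A3(s)) == pr(A2) * pr(A3)
# ... but not mutually independent:
assert pr(lambda s: A1(s) and A2(s) and A3(s)) == 0 != pr(A1) * pr(A2) * pr(A3)
print("pairwise but not mutually independent")
```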

From the definition of conditional probability (6.19), it follows that for two events *A* and *B*, each with nonzero probability,

Pr{*A* ∩ *B*} = Pr{*B*} Pr{*A* | *B*}

= Pr{*A*} Pr{*B* | *A*} .

Solving for Pr{*A* | *B*}, we obtain

Pr{*A* | *B*} = Pr{*A*} Pr{*B* | *A*}/Pr{*B*} ,

which is known as *Bayes's theorem*.

The denominator Pr{*B*} can be rewritten by observing that *B* = (*B* ∩ *A*) ∪ (*B* ∩ Ā) and that these two events are mutually exclusive, so that Pr{*B*} = Pr{*B* | *A*} Pr{*A*} + Pr{*B* | Ā} Pr{Ā}. Substituting into equation (6.21), we obtain an equivalent form of Bayes's theorem:

Pr{*A* | *B*} = Pr{*A*} Pr{*B* | *A*} / (Pr{*A*} Pr{*B* | *A*} + Pr{Ā} Pr{*B* | Ā}) .
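Bayes's theorem can be exercised on a small numerical example. The numbers below (a diagnostic-test scenario) are hypothetical, chosen only for illustration:

```python
# Hypothetical numbers: a test that is positive 99% of the time when the
# condition is present, 5% of the time when it is absent, and a condition
# with 1% prevalence.
p_A = 0.01                  # Pr{A}: has the condition
p_B_given_A = 0.99          # Pr{B | A}: positive test given the condition
p_B_given_notA = 0.05       # Pr{B | not A}: false-positive rate

# Denominator: Pr{B} = Pr{B|A}Pr{A} + Pr{B|comp(A)}Pr{comp(A)}
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes's theorem: Pr{A|B} = Pr{A}Pr{B|A}/Pr{B}
p_A_given_B = p_A * p_B_given_A / p_B
print(round(p_A_given_B, 4))   # 1/6, despite the 99% sensitivity
```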

Prove *Boole's inequality*: For any finite or countably infinite sequence of events *A*_{1}, *A*_{2}, . . . ,

Pr{*A*_{1} ∪ *A*_{2} ∪ ···} ≤ Pr{*A*_{1}} + Pr{*A*_{2}} + ··· .

Prove that for any collection of events *A*_{1}, *A*_{2}, . . . , *A _{n}*,

Pr{*A*_{1} ∩ *A*_{2} ∩ ··· ∩ *A*_{n}} = Pr{*A*_{1}} · Pr{*A*_{2} | *A*_{1}} · Pr{*A*_{3} | *A*_{1} ∩ *A*_{2}} ··· Pr{*A*_{n} | *A*_{1} ∩ *A*_{2} ∩ ··· ∩ *A*_{n−1}} .

Two events *A* and *B* are *conditionally independent*, given an event *C* with Pr{*C*} ≠ 0, if

Pr{*A* ∩ *B* | *C*} = Pr{*A* | *C*} · Pr{*B* | *C*} .

A *(discrete)* **random variable** *X* is a function from a finite or countably infinite sample space *S* to the real numbers. It associates a real number with each possible outcome of an experiment, which allows us to work with the probability distribution induced on the resulting set of numbers. Random variables can also be defined for uncountably infinite sample spaces, but they raise technical issues that are unnecessary to address for our purposes. Henceforth, we shall assume that random variables are discrete.

For a random variable *X* and a real number *x*, we define the event *X* = *x* to be {*s* ∈ *S* : *X*(*s*) = *x*}; thus, Pr{*X* = *x*} = Σ_{s∈S: X(s)=x} Pr{*s*}. The function

f(*x*) = Pr{*X* = *x*}

is the *probability density function* of the random variable *X*.

It is common for several random variables to be defined on the same sample space. If *X* and *Y* are random variables, the function

f(*x*, *y*) = Pr{*X* = *x* and *Y* = *y*}

is the *joint probability density function* of *X* and *Y*. For a fixed value *y*, Pr{*Y* = *y*} = Σ_{x} Pr{*X* = *x* and *Y* = *y*} ,

and similarly, for a fixed value *x*, Pr{*X* = *x*} = Σ_{y} Pr{*X* = *x* and *Y* = *y*} .

Using the definition (6.19) of conditional probability, we have

Pr{*X* = *x* | *Y* = *y*} = Pr{*X* = *x* and *Y* = *y*}/Pr{*Y* = *y*} .

We define two random variables *X* and *Y* to be *independent* if for all *x* and *y*, the events *X* = *x* and *Y* = *y* are independent or, equivalently, if for all *x* and *y*, we have Pr{*X* = *x* and *Y* = *y*} = Pr{*X* = *x*} Pr{*Y* = *y*}.

The simplest and most useful summary of the distribution of a random variable is the "average" of the values it takes on. The **expected value** (or, synonymously, *expectation* or *mean*) of a discrete random variable *X* is

E[*X*] = Σ_{x} *x* · Pr{*X* = *x*} ,

which is well defined if the sum is finite or converges absolutely. Consider a game in which you flip two fair coins, earning $3 for each head but losing $2 for each tail. The expected value of the random variable *X* representing your earnings is

E[*X*] = 6 · Pr{2 H's} + 1 · Pr{1 H, 1 T} − 4 · Pr{2 T's}

= 6(1/4) + 1(1/2) − 4(1/4)

= 1 .
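The coin-game expectation can be computed directly from the definition; a small sketch using exact rational arithmetic:

```python
from fractions import Fraction

# Two fair coins; winnings are $6 for two heads (prob 1/4),
# $1 for one head and one tail (prob 1/2), and -$4 for two tails (prob 1/4).
outcomes = {6: Fraction(1, 4), 1: Fraction(1, 2), -4: Fraction(1, 4)}

# E[X] = sum over x of x * Pr{X = x}
expectation = sum(x * p for x, p in outcomes.items())
print(expectation)   # 1
```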

The expectation of the sum of two random variables is the sum of their expectations, that is,

E[*X* + *Y*] = E[*X*] + E[*Y*] ,

Letting *g*(*x*) = *ax* for any constant *a*, we have

E[*aX*] = *a*E[*X*] .

Consequently, expectations are linear: for any two random variables *X *and *Y* and any constant *a*,

E[*aX* + *Y*] = *a*E[*X*] + E[*Y*] .

When two random variables *X* and *Y* are independent and each has a defined expectation,

E[*XY*] = E[*X*] E[*Y*] .

In general, when *n* random variables *X*_{1}, *X*_{2}, . . . , *X _{n}* are mutually independent,

E[*X*_{1} *X*_{2} ··· *X*_{n}] = E[*X*_{1}] E[*X*_{2}] ··· E[*X*_{n}] .

The *variance* of a random variable *X* with mean E[*X*] is

Var[*X*] = E[(*X* − E[*X*])^{2}]

= E[*X*^{2} − 2*X*E[*X*] + E^{2}[*X*]]

= E[*X*^{2}] − 2E[*X*E[*X*]] + E^{2}[*X*]

= E[*X*^{2}] − 2E^{2}[*X*] + E^{2}[*X*]

= E[*X*^{2}] − E^{2}[*X*] .

E[X^{2}] = Var[X] + E^{2}[X].
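The identity relating Var[*X*], E[*X*^{2}], and E^{2}[*X*] is easy to confirm numerically on the coin-game distribution used earlier (payoffs 6, 1, −4 with probabilities 1/4, 1/2, 1/4):

```python
from fractions import Fraction

dist = {6: Fraction(1, 4), 1: Fraction(1, 2), -4: Fraction(1, 4)}

E = sum(x * p for x, p in dist.items())                    # E[X]
E2 = sum(x * x * p for x, p in dist.items())               # E[X^2]
var_def = sum((x - E) ** 2 * p for x, p in dist.items())   # E[(X - E[X])^2]

# Var[X] = E[X^2] - E^2[X]
assert var_def == E2 - E * E
print(var_def)   # 25/2
```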

The variance of a random variable *X* and the variance of *aX* are related:

Var[*aX*] = *a*^{2}Var[*X*] .

When *X* and *Y* are independent random variables,

Var[*X* + *Y*] = Var[*X*] + Var[*Y*] .

In general, if *n* random variables *X*_{1}, *X*_{2}, . . . , *X*_{n} are pairwise independent, then

Var[*X*_{1} + *X*_{2} + ··· + *X*_{n}] = Σ_{i=1}^{n} Var[*X*_{i}] .

The *standard deviation* of a random variable *X* is the positive square root of the variance of *X*.

Let *X* be a nonnegative random variable, and suppose that E (*X*) is well defined. Prove **Markov's inequality:**

Pr{*X* ≥ *t*} ≤ E[*X*]/*t*

for all *t* > 0.

Pr{*X* ≥ *t*} ≥ Pr{*X*′ ≥ *t*} .

Prove that Var[*aX*] = *a*^{2}Var[*X*] from the definition (6.29) of variance.

A coin flip is an instance of a *Bernoulli trial*, which is defined as an experiment with only two possible outcomes: *success*, which occurs with probability *p*, and *failure*, which occurs with probability *q* = 1 − *p*.

Suppose we have a sequence of Bernoulli trials, each with a probability *p* of success and a probability *q* = 1 − *p* of failure. How many trials occur before we obtain a success? Let the random variable *X* be the number of trials needed to obtain a success. Then *X* has values in the range {1, 2, . . .}, and for *k* ≥ 1,

Pr{*X* = *k*} = *q*^{k−1}*p* ,

Assuming *p* < 1, the expectation of a geometric distribution can be calculated using identity (3.6):

E[*X*] = Σ_{k=1}^{∞} *k* *q*^{k−1}*p* = (*p*/*q*) Σ_{k=1}^{∞} *k* *q*^{k} = (*p*/*q*) · *q*/(1 − *q*)^{2} = 1/*p* .

Thus, on average, it takes 1/*p* trials before we obtain a success, an intuitive result. The variance, which can be calculated similarly, is

Var[*X*] = *q*/*p*^{2} .
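The mean 1/*p* and variance *q*/*p*^{2} of the geometric distribution can be checked numerically by truncating the infinite sums at a large *K*; the tail beyond *K* is negligible (*p* = 1/3 below is an arbitrary choice):

```python
p = 1 / 3
q = 1 - p

# Pr{X = k} = q^{k-1} p; truncate the infinite sums at K terms
K = 2000
probs = {k: q ** (k - 1) * p for k in range(1, K + 1)}

total = sum(probs.values())                          # should approach 1
mean = sum(k * pr for k, pr in probs.items())        # should approach 1/p = 3
var = sum(k * k * pr for k, pr in probs.items()) - mean ** 2   # q/p^2 = 6

print(total, mean, var)
```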

How many successes occur during *n* Bernoulli trials, where a success occurs with probability *p* and a failure with probability *q* = 1 − *p*? Define the random variable *X* to be the number of successes in *n* trials. Then *X* has values in the range {0, 1, . . . , *n*}, and for *k* = 0, . . . , *n*,

Pr{*X* = *k*} = C(*n*, *k*) *p*^{k}*q*^{n−k} ,

since there are C(*n*, *k*) ways to pick which *k* of the *n* trials are successes, and each such choice occurs with probability *p*^{k}*q*^{n−k}. This probability distribution is called the *binomial distribution* and is denoted *b*(*k*; *n*, *p*).

Summing the binomial distribution over all *k* and using the binomial expansion gives

Σ_{k=0}^{n} *b*(*k*; *n*, *p*) = (*p* + *q*)^{n} = 1 ,

as is required by axiom 2 of the probability axioms.

The same approach can be used to calculate the variance of the distribution. Using equation (6.29), we have Var[*X*_{i}] = E[*X*_{i}^{2}] − E^{2}[*X*_{i}]. Since *X*_{i} only takes on the values 0 and 1, we have E[*X*_{i}^{2}] = E[*X*_{i}] = *p*, and hence

Var[*X*_{i}] = *p* − *p*^{2} = *pq* .
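These moments of the binomial distribution can be confirmed numerically: for arbitrary parameters *n* = 50 and *p* = 0.3, the density sums to 1 (axiom 2), the mean is *np* = 15, and the variance is *npq* = 10.5.

```python
import math

def b(k, n, p):
    # binomial density: C(n, k) p^k q^{n-k}
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 50, 0.3
density = [b(k, n, p) for k in range(n + 1)]

total = sum(density)                                               # 1
mean = sum(k * pk for k, pk in enumerate(density))                 # np
var = sum(k * k * pk for k, pk in enumerate(density)) - mean ** 2  # npq

print(total, mean, var)
```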

The following lemma provides an upper bound on the binomial distribution.

Let *n* ≥ 0, let 0 < *p* < 1, let *q* = 1 − *p*, and let 0 ≤ *k* ≤ *n*. Then

*b*(*k*; *n*, *p*) ≤ (*np*/*k*)^{k}(*nq*/(*n* − *k*))^{n−k} .

*Proof* Using equation (6.10), we have

*b*(*k*; *n*, *p*) = C(*n*, *k*) *p*^{k}*q*^{n−k} ≤ (*n*^{n}/(*k*^{k}(*n* − *k*)^{n−k})) *p*^{k}*q*^{n−k} = (*np*/*k*)^{k}(*nq*/(*n* − *k*))^{n−k} .
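The lemma's bound can be checked exhaustively for small parameters; the sketch below handles the 0^{0} = 1 convention explicitly (the values *n* = 40 and *p* = 0.25 are arbitrary):

```python
import math

def b(k, n, p):
    # binomial density b(k; n, p)
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def lemma_bound(k, n, p):
    # (np/k)^k (nq/(n-k))^{n-k}, with the 0^0 = 1 convention at k = 0, n
    q = 1 - p
    left = (n * p / k) ** k if k > 0 else 1.0
    right = (n * q / (n - k)) ** (n - k) if k < n else 1.0
    return left * right

n, p = 40, 0.25
for k in range(n + 1):
    # tiny multiplicative slack absorbs float rounding at the equality cases
    assert b(k, n, p) <= lemma_bound(k, n, p) * (1 + 1e-9)
print("bound verified for all k with n =", n)
```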

Verify axiom 2 of the probability axioms for the geometric distribution.

How many times on average must we flip 6 fair coins before we obtain 3 heads and 3 tails?

Show that *b*(*k*; *n*, *p*) = *b*(*n* - *k*; *n*, *q*), where *q* = 1 - *p*.

Show that the value of the maximum of the binomial distribution *b*(*k*; *n*, *p*) is approximately 1/√(2π*npq*), where *q* = 1 − *p*.

Show that for 0 ≤ *k* ≤ *n*,

*b*(*k*; *n*, 1/2) ≤ 2^{nH(k/n)−n} ,

where *H*(*x*) is the entropy function (6.13).

Pr{*X*′ ≥ *k*} ≥ Pr{*X* ≥ *k*} .

The probability of having at least, or at most, *k* successes in *n* Bernoulli trials, each with probability *p* of success, is often of more interest than the probability of having exactly *k* successes. In this section, we investigate the *tails* of the binomial distribution: the two regions of the distribution *b*(*k*; *n*, *p*) that are far from the mean *np*.

* Proof *We make use of the inequality (6.15)

*b*(*i* − 1; *n*, *p*) < *x* · *b*(*i*; *n*, *p*)

for 0 < *i* ≤ *k*. Iterating, we obtain

*b*(*i*; *n*, *p*) < *x*^{k−i}*b*(*k*; *n*, *p*)

A bound on the right tail can be determined similarly.

*Proof* Since for any α > 0 the function *e*^{αx} is strictly increasing in *x*,

Pr{*X* − μ ≥ *r*} = Pr{*e*^{α(X−μ)} ≥ *e*^{αr}} ,

where α will be determined later. Using Markov's inequality (6.32), we obtain

Pr{*X* − μ ≥ *r*} ≤ E[*e*^{α(X−μ)}]*e*^{−αr} .

Substituting for *X* − μ, we obtain

E[*e*^{α(X_{i}−p_{i})}] = *e*^{α(1−p_{i})}*p*_{i} + *e*^{α(0−p_{i})}*q*_{i}

= *p*_{i}*e*^{αq_{i}} + *q*_{i}*e*^{−αp_{i}}

≤ *p*_{i}*e*^{α} + 1

≤ exp(*p*_{i}*e*^{α}) ,

since *e*^{αq_{i}} ≤ *e*^{α}, *q*_{i}*e*^{−αp_{i}} ≤ 1, and 1 + *x* ≤ *e*^{x}. Because the *X*_{i} are mutually independent, E[*e*^{α(X−μ)}] is the product of the quantities E[*e*^{α(X_{i}−p_{i})}], and therefore E[*e*^{α(X−μ)}] ≤ exp(Σ_{i} *p*_{i}*e*^{α}) = exp(μ*e*^{α}). Hence, from inequality (6.43), it follows that

Pr{*X* − μ ≥ *r*} ≤ exp(μ*e*^{α} − α*r*) .

Choosing α = ln(*r*/μ) (see Exercise 6.5-6), we obtain

Pr{*X* − μ ≥ *r*} ≤ (μ*e*/*r*)^{r} .
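The resulting bound, Pr{*X* − μ ≥ *r*} ≤ (μ*e*/*r*)^{r}, can be compared against exact binomial tail probabilities; the parameters *n* = 100 and *p* = 0.1 below are arbitrary:

```python
import math

n, p = 100, 0.1
mu = n * p   # expectation of the binomial distribution

def b(k):
    # binomial density b(k; n, p)
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Check the bound for a range of r > mu
for r in range(int(mu) + 5, n - int(mu)):
    tail = sum(b(k) for k in range(math.ceil(mu + r), n + 1))
    bound = (mu * math.e / r) ** r
    assert tail <= bound
print("tail bound verified for r > mu")
```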

*Proof* For a binomial distribution, equation (6.39) implies that μ = E[*X*] = *np*.

for all *a* > 0 and all *k* such that 0 < *k* < *n*.

Prove that if 0 < *k* < *np*, where 0 < *p* < 1 and *q* = 1 - *p*, then

Show that the conditions of Theorem 6.6 imply that

Similarly, show that the conditions of Corollary 6.7 imply that

Pr{*X* − μ ≥ *r*} ≤ *e*^{−r^{2}/2n} .

Show that choosing α = ln(*r*/μ) minimizes the right-hand side of inequality (6.45).

This section uses three examples to illustrate probabilistic analysis. The first determines the probability that in a room of *k* people, some pair shares the same birthday. The second example examines the random tossing of balls into bins. The third investigates "streaks" of consecutive heads in coin flipping.

A good example to illustrate probabilistic reasoning is the classical * birthday paradox*. How many people must there be in a room before there is a good chance two of them were born on the same day of the year? The answer is surprisingly few. The paradox is that it is in fact far fewer than the number of days in the year, as we shall see.

Pr{*b*_{i} = *r* and *b*_{j} = *r*} = Pr{*b*_{i} = *r*} Pr{*b*_{j} = *r*}

= 1/*n*^{2} .

Thus, the probability that they both fall on the same day is

Pr{*b*_{i} = *b*_{j}} = Σ_{r=1}^{n} Pr{*b*_{i} = *r* and *b*_{j} = *r*} = Σ_{r=1}^{n} 1/*n*^{2} = 1/*n* .

*A*_{i} = {*b*_{i+1} ≠ *b*_{j} : *j* = 1, 2, . . . , *i*} .

Since we can write *B*_{k} = *A*_{k−1} ∩ *B*_{k−1}, we obtain the recurrence

Pr{*B*_{k}} = Pr{*B*_{k−1}} Pr{*A*_{k−1} | *B*_{k−1}} ,

The inequality (2.7), 1 + *x* ≤ *e*^{x}, gives us

We can use the linearity of expectation (equation (6.26)) to provide a simpler but approximate analysis of the birthday paradox. For each pair (*i*, *j*) of the *k* people in the room, let us define the random variable *X*_{ij}, for 1 ≤ *i* < *j* ≤ *k*, to be 1 if persons *i* and *j* have the same birthday and 0 otherwise. The probability that two given people have the same birthday is 1/*n*, and thus by the definition of expectation,

E[*X*_{ij}] = 1 · (1/*n*) + 0 · (1 − 1/*n*)

= 1/*n* .
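Both analyses can be run numerically. The sketch below recovers the classic thresholds: 23 people for a better-than-even chance of a shared birthday (exact product analysis), and 28 people for the expected number of matching pairs C(*k*, 2)/*n* to reach 1 (linearity-of-expectation estimate):

```python
import math

n = 365  # days in a year

def all_distinct(k):
    # Exact probability that k people all have distinct birthdays:
    # Pr{B_k} = product over i of (1 - i/n)
    pr = 1.0
    for i in range(1, k):
        pr *= 1 - i / n
    return pr

# Smallest k for which a shared birthday is more likely than not
k = 1
while all_distinct(k) > 0.5:
    k += 1
print(k)    # 23

# Smallest k for which the expected number of matching pairs is at least 1
k2 = next(k for k in range(2, 100) if math.comb(k, 2) / n >= 1)
print(k2)   # 28
```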

Consider the process of randomly tossing identical balls into *b* bins, numbered 1, 2, . . . , *b*. The tosses are independent, and on each toss the ball is equally likely to end up in any bin. The probability that a tossed ball lands in any given bin is 1/*b*. Thus, the ball-tossing process is a sequence of Bernoulli trials with a probability 1/*b* of success, where success means that the ball falls in the given bin. A variety of interesting questions can be asked about the ball-tossing process.

*How many balls fall in a given bin?* The number of balls that fall in a given bin follows the binomial distribution *b(k; n*, 1/*b*). If *n* balls are tossed, the expected number of balls that fall in the given bin is *n*/*b*.

*How many balls must one toss, on the average, until a given bin contains a ball?* The number of tosses until the given bin receives a ball follows the geometric distribution with probability 1/*b*, and thus the expected number of tosses until success is 1/(1/*b*) = *b*.
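The geometric-distribution answer *b* can be checked by simulation; the bin count and trial count below are arbitrary:

```python
import random

random.seed(1)   # fixed seed for reproducibility

b = 20           # number of bins
trials = 10000   # independent repetitions of the experiment

# Empirical average of the number of tosses until bin 0 receives a ball;
# this follows a geometric distribution with p = 1/b, so the mean is b.
total = 0
for _ in range(trials):
    tosses = 0
    while True:
        tosses += 1
        if random.randrange(b) == 0:   # the ball landed in the given bin
            break
    total += tosses

print(total / trials)   # close to b = 20
```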

Suppose you flip a fair coin *n* times. What is the longest streak of consecutive heads that you expect to see? The answer is Θ(lg *n*), as the following analysis shows.

We first prove that the expected length of the longest streak of heads is *O*(lg *n*). Let *A*_{ik} be the event that a streak of heads of length at least *k* begins with the *i*th coin flip. Since the coin flips are mutually independent, for 1 ≤ *i* ≤ *n* − *k* + 1,

Pr{*A*_{ik}} = 1/2^{k} .

For *k* = 2⌈lg *n*⌉,

Pr{*A*_{i,2⌈lg n⌉}} = 1/2^{2⌈lg n⌉}

≤ 1/2^{2 lg n}

= 1/*n*^{2} ,

(2⌈lg *n*⌉)(1 − 1/*n*) + *n*(1/*n*) = *O*(lg *n*) .

Pr{*A*_{i,⌊(lg n)/2⌋}} = 1/2^{⌊(lg n)/2⌋}

≥ 1/√*n* .

((lg *n*)/2)(1 − 1/*n*) + 0 · (1/*n*) = Ω(lg *n*) .
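A quick simulation illustrates the Θ(lg *n*) behavior; this is only an empirical sketch with arbitrary parameters, not part of the proof:

```python
import math
import random

random.seed(42)   # fixed seed for reproducibility

def longest_head_streak(n):
    # Flip a fair coin n times and return the longest run of heads.
    best = run = 0
    for _ in range(n):
        run = run + 1 if random.random() < 0.5 else 0
        best = max(best, run)
    return best

n = 1 << 16   # 65536 flips, so lg n = 16
streaks = [longest_head_streak(n) for _ in range(20)]
avg = sum(streaks) / len(streaks)

# The average longest streak should land near lg n, between the
# ((lg n)/2) lower-bound and 2 lg n upper-bound regimes of the analysis.
print(avg, math.log2(n))
```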

Suppose that balls are tossed into *b* bins. Each toss is independent, and each ball is equally likely to end up in any bin. What is the expected number of ball tosses before at least one of the bins contains two balls?

For the analysis of the birthday paradox, is it important that the birthdays be mutually independent, or is pairwise independence sufficient? Justify your answer.

Sharpen the lower bound on streak length by showing that in *n* flips of a fair coin, the probability is less than 1/*n* that no streak longer than lg *n* − 2 lg lg *n* consecutive heads occurs.

In this problem, we investigate the effect of various assumptions on the number of ways of placing *n* balls into *b* distinct bins.

The following program determines the maximum value in an unordered array *A*[1 . . *n*].

1  *max* ← −∞
2  **for** *i* ← 1 **to** *n*
3       **do** ⊳ Compare *A*[*i*] to *max*.
4          **if** *A*[*i*] > *max*
5             **then** *max* ← *A*[*i*]

*b.* When line 5 of the program is executed, what is the relationship between the value of *max* and the value of *A*[*i*]?

* c.* For each

Professor Dixon needs to hire a new research assistant. She has arranged interviews with *n* applicants and would like to base her decision solely on their qualifications. Unfortunately, university regulations require that after each interview she immediately reject or offer the position to the applicant.

With a *t*-bit counter, we can ordinarily only count up to 2^{t} − 1. With R. Morris's *probabilistic counting*, we can count up to a much larger value at the expense of some loss of accuracy.