The model of computation provided by an ordinary computer assumes that the basic arithmetic operations--addition, subtraction, multiplication, and division--can be performed in constant time. This abstraction is reasonable, since most basic operations on a random-access machine have similar costs. When it comes to designing the circuitry that implements these operations, however, we soon discover that performance depends on the magnitudes of the numbers being operated on. For example, we all learned in grade school how to add two natural numbers, expressed as *n*-digit decimal numbers, in Θ(*n*) steps (although teachers usually do not emphasize the number of steps required).

Like the comparison networks of Chapter 28, combinational circuits operate in *parallel*: many elements can compute values simultaneously as a single step. In this section, we define combinational circuits and investigate how larger combinational circuits can be built up from elementary gates.

Arithmetic circuits in real computers are built from combinational elements that are interconnected by wires. A *combinational element* is any circuit element that has a constant number of inputs and outputs and that performs a well-defined function. Some of the elements we shall deal with in this chapter are boolean combinational elements, whose inputs and outputs all take on the values 0 and 1.

A boolean combinational element that computes a simple boolean function is called a *logic gate*. Figure 29.1 shows the four basic logic gates that will serve as combinational elements in this chapter: the NOT gate, the AND gate, the OR gate, and the XOR gate.

The operation of each gate, and of any boolean combinational element, can be described by a *truth table*, shown under each gate in Figure 29.1. A truth table gives the outputs of the combinational element for each possible setting of the inputs. For example, the truth table for the XOR gate tells us that when the inputs are *x* = 0 and *y* = 1, the output is *x* ⊕ *y* = 1.

Combinational elements in real circuits do not operate instantaneously. Once the input values entering a combinational element *settle*, or become stable, the element's output is guaranteed to become stable after a fixed delay, called the *propagation delay* of the element.

A *combinational circuit* consists of one or more combinational elements interconnected in an acyclic fashion. The interconnections are called *wires*.

As an example, Figure 29.2 shows a combinational circuit, called a *full adder*, that takes as input three bits *x*, *y*, and *z* and produces two output bits, a sum bit *s* and a carry bit *c*, according to the following truth table:

x y z   c s
-----   ---
0 0 0   0 0
0 0 1   0 1
0 1 0   0 1
0 1 1   1 0
1 0 0   0 1
1 0 1   1 0
1 1 0   1 0
1 1 1   1 1

Output *s* is the *parity* of the input bits,

s = parity(x, y, z) = x ⊕ y ⊕ z,

and output *c* is the *majority* of the input bits,

c = majority(x, y, z) = (x ∧ y) ∨ (y ∧ z) ∨ (x ∧ z).

^{1}For clarity, we omit the commas between sequence elements when they are bits.
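The full adder's defining equations are easy to check in software. The following minimal sketch (Python, illustrative only, not part of the circuit description) computes both outputs directly from the parity and majority formulas above:

```python
def full_adder(x, y, z):
    """One-bit full adder: returns (c, s) with 2*c + s = x + y + z."""
    s = x ^ y ^ z                    # s = parity(x, y, z)
    c = (x & y) | (y & z) | (x & z)  # c = majority(x, y, z)
    return c, s
```

Every row of the truth table satisfies 2*c* + *s* = *x* + *y* + *z*, which is exactly the property that lets full adders be cascaded into multi-bit adders.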

Each of the inputs *x*, *y*, and *z* to the full adder has a fan-out of 3. When the operation performed by a combinational element is commutative and associative with respect to its inputs (as are the functions AND, OR, and XOR), we call the number of inputs the *fan-in* of the element. Although the fan-in of each gate in Figure 29.2 is 2, we could redraw the full adder to replace the two cascaded XOR gates with a single XOR gate of fan-in 3.

As in the case of the comparison networks discussed in Chapter 28, we measure the propagation delay of a combinational circuit in terms of the largest number of combinational elements on any path from the inputs to the outputs. Specifically, we define the *depth* of a circuit, which corresponds to its worst-case "running time," inductively in terms of the depths of its constituent wires. The depth of an input wire is 0. If a combinational element has inputs at depths *d*_{1}, *d*_{2}, . . . , *d*_{m}, then its outputs have depth max{*d*_{1}, *d*_{2}, . . . , *d*_{m}} + 1. The depth of the circuit itself is the maximum depth of any wire.
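The inductive definition translates into a short procedure. The sketch below uses a hypothetical representation of a circuit (each gate is a pair of input-wire names and an output-wire name, listed in topological order) and computes the depth of every wire:

```python
def wire_depths(gates, input_wires):
    """Depth of each wire: inputs have depth 0; a gate's output is one
    more than the deepest of its inputs."""
    depth = {w: 0 for w in input_wires}
    for in_wires, out_wire in gates:          # topological order assumed
        depth[out_wire] = 1 + max(depth[w] for w in in_wires)
    return depth

# One possible 7-gate full adder: two XORs for s; three ANDs and two ORs for c.
full_adder_gates = [
    (("x", "y"), "t"),     # t  = x XOR y
    (("t", "z"), "s"),     # s  = t XOR z
    (("x", "y"), "a1"),    # a1 = x AND y
    (("y", "z"), "a2"),    # a2 = y AND z
    (("x", "z"), "a3"),    # a3 = x AND z
    (("a1", "a2"), "o1"),  # o1 = a1 OR a2
    (("o1", "a3"), "c"),   # c  = o1 OR a3
]
```

For this layout, wire *s* has depth 2 and wire *c* has depth 3, so the circuit's depth, and hence its worst-case settling time, is 3.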

Besides circuit depth, there is another resource that we typically wish to minimize when designing circuits. The *size* of a combinational circuit is the number of combinational elements it contains. Intuitively, circuit size corresponds to the memory space used by an algorithm. The full adder of Figure 29.2 has size 7, for example, since it uses 7 gates.

In Figure 29.2, change input *y* to a 1. Show the resulting value carried on each wire.

Show how to construct an *n*-input parity circuit with *n* - 1 XOR gates and depth lg *n*.

Show that any boolean function can be constructed entirely out of NAND gates.

We now investigate the problem of adding numbers represented in binary. We present three combinational circuits for this problem. First, we look at ripple-carry addition, which can add two *n*-bit numbers in Θ(*n*) time using a circuit with Θ(*n*) size. This time bound can be improved to *O*(lg *n*) using a carry-lookahead adder, which also has Θ(*n*) size. Finally, we present carry-save addition, which in *O*(1) time can reduce the sum of three *n*-bit numbers to the sum of an *n*-bit number and an (*n* + 1)-bit number. The circuit has Θ(*n*) size.

8 7 6 5 4 3 2 1 0   = i
1 1 0 1 1 1 0 0 0   = c
  0 1 0 1 1 1 1 0   = a
  1 1 0 1 0 1 0 1   = b
-----------------
1 0 0 1 1 0 0 1 1   = s

We start with the ordinary method of summing binary numbers. We assume that a nonnegative integer *a* is represented in binary by a sequence of *n* bits ⟨*a*_{n-1}, *a*_{n-2}, . . . , *a*_{0}⟩, where *a* = Σ_{i=0}^{n-1} *a*_{i} 2^{i}.

Given two *n*-bit numbers *a* = ⟨*a*_{n-1}, *a*_{n-2}, . . . , *a*_{0}⟩ and *b* = ⟨*b*_{n-1}, *b*_{n-2}, . . . , *b*_{0}⟩, we wish to produce their (*n* + 1)-bit sum *s* = ⟨*s*_{n}, *s*_{n-1}, . . . , *s*_{0}⟩.

An *n*-bit *ripple-carry adder* is formed by cascading *n* full adders FA_{1}, FA_{2}, . . . , FA_{n}, feeding the carry-out *c*_{i} of FA_{i} directly into the carry-in input of FA_{i+1}.
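In software, the ripple-carry scheme is a single pass from the low-order bits upward. The sketch below (an illustration, with bit lists taken least-significant-bit first) mirrors the cascade of full adders:

```python
def full_adder(x, y, c):
    # returns (carry-out, sum bit) for one full-adder stage
    return (x & y) | (y & c) | (x & c), x ^ y ^ c

def ripple_carry_add(a_bits, b_bits):
    """Add two n-bit numbers given as bit lists, least significant bit first."""
    c = 0                                  # carry-in c_0 = 0
    s = []
    for x, y in zip(a_bits, b_bits):       # FA_1, FA_2, ..., FA_n in order
        c, s_i = full_adder(x, y, c)       # carry ripples to the next stage
        s.append(s_i)
    s.append(c)                            # s_n is the final carry-out
    return s
```

Running it on the worked example above (*a* = 01011110 = 94, *b* = 11010101 = 213) yields *s* = 100110011 = 307.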

Ripple-carry addition requires Θ(*n*) time because of the rippling of carry bits through the circuit. Carry-lookahead addition avoids this Θ(*n*)-time delay by accelerating the computation of carries using a treelike circuit. A carry-lookahead adder can sum two *n*-bit numbers in *O*(lg *n*) time.

As an example, let *a*_{i-1} = *b*_{i-1}. Then the carry-out *c*_{i} does not depend on the carry-in *c*_{i-1}: if *a*_{i-1} = *b*_{i-1} = 1, the full adder FA_{i} must *generate* a carry (*c*_{i} = 1), and if *a*_{i-1} = *b*_{i-1} = 0, it must *kill* the carry (*c*_{i} = 0).

If *a*_{i-1} ≠ *b*_{i-1}, on the other hand, then FA_{i} *propagates* the carry: *c*_{i} = *c*_{i-1}.

Figure 29.5 summarizes these relationships in terms of *carry statuses*, where k is "carry kill," g is "carry generate," and p is "carry propagate."

Consider two consecutive full adders FA_{i-1} and FA_{i} viewed together as a combined unit.

a_{i-1}  b_{i-1}  c_{i}       carry status of FA_{i}
----------------------------------------------------
   0        0      0                 k
   0        1      c_{i-1}           p
   1        0      c_{i-1}           p
   1        1      1                 g

                 FA_{i}
              k     p     g
          k   k     k     g
FA_{i-1}  p   k     p     g
          g   k     g     g

The carry-out *c*_{i} of a given full adder FA_{i} can be determined from the carry statuses of all the full adders up through FA_{i}. Let *x*_{0} = k (since the adder's carry-in *c*_{0} = 0), and for *i* ≥ 1 let *x*_{i} be the carry status of FA_{i}. Combining these statuses with the ⊗ operator yields

y_{i} = y_{i-1} ⊗ x_{i}

      = x_{0} ⊗ x_{1} ⊗ · · · ⊗ x_{i}.

1. *y*_{i} = k implies *c*_{i} = 0.

2. *y*_{i} = g implies *c*_{i} = 1.

(The status *y*_{i} is never p, since *x*_{0} ∈ {k, g} and a prefix ending in p simply inherits the earlier status.)

By using a prefix circuit that operates in parallel, as opposed to a ripple-carry circuit that produces its outputs one by one, we can compute all the carry statuses *y*_{0}, *y*_{1}, . . . , *y*_{n} more quickly. Specifically, we shall design a parallel prefix circuit with *O*(lg *n*) depth and Θ(*n*) size.

For *i* ≤ *j*, define the composite status

[*i*, *j*] = *x*_{i} ⊗ *x*_{i+1} ⊗ · · · ⊗ *x*_{j}.

[*i*, *k*] = [*i*, *j* - 1] ⊗ [*j*, *k*],

for any *j* satisfying *i* < *j* ≤ *k*.
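The carry-status machinery can be checked with a short simulation. The sketch below evaluates the prefixes *y*_{i} with a serial scan; a hardware carry-lookahead adder evaluates the same prefixes with an *O*(lg *n*)-depth tree, which this software illustration does not model:

```python
def status(a_bit, b_bit):
    """Carry status of a full adder, from its two input bits."""
    if a_bit + b_bit == 0:
        return "k"        # carry kill
    if a_bit + b_bit == 2:
        return "g"        # carry generate
    return "p"            # carry propagate

def combine(x, y):
    """The carry-status operator x ⊗ y: the later status y wins unless it
    propagates, in which case the earlier status x passes through."""
    return x if y == "p" else y

def carries(a_bits, b_bits):
    """Carry bits c_1, ..., c_n from the prefixes y_i = x_0 ⊗ ... ⊗ x_i,
    where x_0 = k because the carry-in c_0 is 0 (bits LSB first)."""
    y = "k"
    out = []
    for a_bit, b_bit in zip(a_bits, b_bits):
        y = combine(y, status(a_bit, b_bit))
        out.append(1 if y == "g" else 0)  # y is never "p" once x_0 is absorbed
    return out
```

Because ⊗ is associative, the prefixes can be grouped in any order, which is what permits the treelike parallel evaluation.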

Now that we have a parallel prefix circuit, we can complete the description of the carry-lookahead adder. Figure 29.10 shows the construction. An *n*-bit *carry-lookahead adder* consists of a parallel prefix circuit that computes the carry statuses *y*_{0}, *y*_{1}, . . . , *y*_{n}, together with circuitry that produces each sum bit from the corresponding carry.

A carry-lookahead adder can add two *n*-bit numbers in *O*(lg *n*) time. Perhaps surprisingly, adding three *n*-bit numbers takes only a constant additional amount of time. The trick is to reduce the problem of adding three numbers to the problem of adding just two numbers.

Given three *n*-bit numbers *x*, *y*, and *z*, a *carry-save adder* produces an *n*-bit number *u* and an (*n* + 1)-bit number *v* such that

u + v = x + y + z.

As shown in Figure 29.11(a), it does this by computing

u_{i} = parity(x_{i}, y_{i}, z_{i}),

v_{i+1} = majority(x_{i}, y_{i}, z_{i}),

for *i* = 0, 1, . . . , *n* - 1. Bit *v*_{0} always equals 0.
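A software check of the carry-save reduction, operating on whole integers (Python's bitwise operators apply the parity and majority rules to every bit position at once, which is this sketch's shortcut, not a circuit description):

```python
def carry_save_add(x, y, z):
    """Reduce x + y + z to two numbers u and v with u + v = x + y + z."""
    u = x ^ y ^ z                            # u_i     = parity(x_i, y_i, z_i)
    v = ((x & y) | (y & z) | (x & z)) << 1   # v_{i+1} = majority(x_i, y_i, z_i)
    return u, v                              # v_0 is always 0
```

No carry ripples between bit positions, which is why the reduction takes only *O*(1) time in hardware.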

Prove that the carry-status operator given by Figure 29.5 is associative.

Show by example how to construct an *O*(lg *n*)-time parallel prefix circuit for values of *n* that are not exact powers of 2 by drawing a parallel prefix circuit for *n* = 11. Characterize the performance of parallel prefix circuits built in the shape of arbitrary binary trees.

Label each wire in the parallel prefix circuit of Figure 29.9(a) with its depth. A *critical path* in a circuit is a path from inputs to outputs with the largest number of combinational elements on it. Identify the critical path in Figure 29.9(a), and show that its length is *O*(lg *n*). Show that some node has elements that operate Θ(lg *n*) time apart. Is there a node whose elements operate simultaneously?

A *tally circuit* has *n* binary inputs and *m* = ⌈lg(*n* + 1)⌉ outputs. Interpreted as a binary number, the outputs give the number of 1's in the inputs. For example, if the input is 10011110, the output is 101, indicating that there are five 1's in the input. Describe an *O*(lg *n*)-depth tally circuit having Θ(*n*) size.

Suppose that two random *n*-bit numbers are added with a ripple-carry adder, where each bit is independently 0 or 1 with equal probability. Show that with probability at least 1 - 1/*n*, no carry propagates farther than *O*(lg *n*) consecutive stages. In other words, although the depth of the ripple-carry adder is Θ(*n*), for two random numbers, the outputs almost always settle within *O*(lg *n*) time.

The "grade-school" multiplication algorithm in Figure 29.13 can compute the 2*n*-bit product *p* = ⟨*p*_{2n-1}, *p*_{2n-2}, . . . , *p*_{0}⟩ of two *n*-bit numbers *a* = ⟨*a*_{n-1}, *a*_{n-2}, . . . , *a*_{0}⟩ and *b* = ⟨*b*_{n-1}, *b*_{n-2}, . . . , *b*_{0}⟩.

Each term *m*^{(i)} is called a *partial product*. There are *n* partial products: *m*^{(i)} = *a* · *b*_{i} · 2^{i}, which is *a* shifted left by *i* positions if *b*_{i} = 1, and is 0 if *b*_{i} = 0.

        1 1 1 0   = a
        1 1 0 1   = b
---------------
        1 1 1 0   = m^{(0)}
      0 0 0 0     = m^{(1)}
    1 1 1 0       = m^{(2)}
  1 1 1 0         = m^{(3)}
---------------
1 0 1 1 0 1 1 0   = p

An array multiplier consists conceptually of three parts. The first part forms the partial products. The second sums the partial products using carry-save adders. Finally, the third sums the two numbers resulting from the carry-save additions using either a ripple-carry or carry-lookahead adder.

        0 0 0 0    = 0
        1 1 1 0    = m^{(0)}
      0 0 0 0      = m^{(1)}
----------------
      0 1 1 1 0    = u^{(1)}
      0 0 0        = v^{(1)}
    1 1 1 0        = m^{(2)}
----------------
    1 1 0 1 1 0    = u^{(2)}
    0 1 0          = v^{(2)}
  1 1 1 0          = m^{(3)}
----------------
  1 0 1 0 1 1 0    = u^{(3)}
1 1 0              = v^{(3)}
----------------
1 0 1 1 0 1 1 0    = p
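The three conceptual parts can be sketched as a loop (illustrative software, not the hardware layout): form each partial product, fold it in with a carry-save step, and finish with one ordinary addition. As before, Python's bitwise operators apply parity and majority across all bit positions at once.

```python
def array_multiply(a, b, n):
    """Multiply two n-bit numbers via partial products and carry-save adds."""
    u, v = 0, 0
    for i in range(n):
        m = (a << i) if (b >> i) & 1 else 0                   # m^(i)
        u, v = u ^ v ^ m, ((u & v) | (v & m) | (u & m)) << 1  # carry-save step
    return u + v   # final carry-propagate (e.g., carry-lookahead) addition
```

On the worked example (*a* = 1110, *b* = 1101), `array_multiply(14, 13, 4)` produces the same u/v sequence shown above and returns 182 = *p*.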

A *Wallace tree* is a circuit that reduces the problem of summing *n* *n*-bit numbers to the problem of summing two Θ(*n*)-bit numbers.

Suppose that a carry-save adder takes inputs *x*, *y*, and *z* and produces outputs *s* and *c*, with *n*_{x},

A *cyclic shifter*, or *barrel shifter*, is a circuit that can rotate its *n*-bit input by any number of positions between 0 and *n* - 1.

This section investigates clocked circuits for performing addition and multiplication. We begin with a Θ(1)-size clocked circuit, called a bit-serial adder, that can add two *n*-bit numbers in Θ(*n*) time. We then investigate linear-array multipliers. We present a linear-array multiplier with Θ(*n*) size that can multiply two *n*-bit numbers in Θ(*n*) time.

We introduce the notion of a clocked circuit by returning to the problem of adding two *n*-bit numbers. Figure 29.17 shows how we can use a single full adder to produce the (*n* + 1)-bit sum *s* = ⟨*s*_{n}, *s*_{n-1}, . . . , *s*_{0}⟩ of two *n*-bit numbers *a* and *b*.

As Figure 29.17 shows, the solution is to use a *clocked circuit*, or *sequential circuit*, which contains one or more *registers* (memory elements) in addition to its combinational circuitry.

Each register in a clocked circuit is controlled by a periodic signal, or *clock*. Whenever the clock pulses, or *ticks*, each register loads in the value at its input and holds that value at its output until the next tick.

To determine the total time *t* taken by a globally clocked circuit, we need to know the number *p* of clock periods and the duration *d* of each clock period: *t = pd*. The clock period *d* must be long enough for all combinational circuitry to settle between ticks. Although for some inputs it may settle earlier, if the circuit is to work correctly for all inputs, *d* must be at least proportional to the depth of the combinational circuitry.

The size of the bit-serial adder (number of combinational elements plus number of registers) is Θ(1).

Observe that a ripple-carry adder is like a replicated bit-serial adder with the registers replaced by direct connections between combinational elements. That is, the ripple-carry adder corresponds to a spatial "unrolling" of the computation of the bit-serial adder. The *i*th full adder in the ripple-carry adder implements the *i*th clock period of the bit-serial adder.
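The unrolling correspondence is easy to see in software. Below, one full-adder computation is reused across *n* clock periods, with the variable `carry` standing in for the register that feeds each period's carry-out back as the next period's carry-in (a model of the behavior, not of the hardware):

```python
def bit_serial_add(a_bits, b_bits):
    """Bit-serial addition: one full adder, one bit pair per clock period
    (bits least significant first)."""
    carry = 0                                 # register contents before tick 1
    s = []
    for x, y in zip(a_bits, b_bits):          # the i-th clock period
        s.append(x ^ y ^ carry)               # sum bit s_i leaves the circuit
        carry = (x & y) | (y & carry) | (x & carry)  # latched at the next tick
    s.append(carry)                           # s_n: final register contents
    return s
```

Replacing `carry` with a wire from each stage to the next yields the ripple-carry adder: Θ(*n*) clock periods of a Θ(1)-size circuit trade for Θ(*n*) size in combinational hardware.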

The combinational multipliers of Section 29.3 need Θ(*n*^{2}) size to multiply two *n*-bit numbers. We now present two multipliers that are linear, rather than two-dimensional, arrays of circuit elements. Like the array multiplier, the faster of these two linear-array multipliers runs in Θ(*n*) time.

The linear-array multipliers implement the *Russian peasant's algorithm* (so called because Westerners visiting Russia in the nineteenth century found the algorithm widely used there), illustrated in Figure 29.18(a). Given two input numbers *a* and *b*, the algorithm maintains a running product *p*, initially 0, and repeats the following steps until *a* becomes 0, preserving the invariant

a^{(j)} b^{(j)} + p^{(j)} = ab,                 (29.6)

1. If *a* is odd, add *b* into the running product *p*.

2. Shift *a* right by one bit position: *a*^{(j+1)} = ⌊*a*^{(j)}/2⌋.

3. Shift *b* left by one bit position: *b*^{(j+1)} = 2*b*^{(j)}.

a^{(j)} b^{(j)} + u^{(j)} + v^{(j)} = ab.                 (29.7)
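As a software sketch, the basic (slow) algorithm is just the three steps in a loop. Each iteration preserves the invariant a^{(j)} b^{(j)} + p^{(j)} = ab, so when *a* reaches 0 the product *p* equals the original *ab*:

```python
def russian_peasant(a, b):
    """Russian peasant's multiplication: shift-and-add."""
    p = 0                  # p^(0) = 0
    while a > 0:
        if a & 1:          # step 1: low-order bit of a is 1,
            p += b         #         so fold b into the running product
        a >>= 1            # step 2: shift a right by one bit position
        b <<= 1            # step 3: shift b left by one bit position
    return p
```

The fast linear-array multiplier keeps the same invariant in carry-save form, a^{(j)} b^{(j)} + u^{(j)} + v^{(j)} = ab, deferring all carry propagation to the end.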

Prove the invariants (29.6) and (29.7) for the linear-array multipliers.

Prove that in the fast linear-array multiplier, *v*^{(2n-1)} = 0.

Redo Exercise 29.4-5 using only *O*(lg *n*) "maximum" elements.

We can construct a division circuit from subtraction and multiplication circuits with a technique called *Newton iteration*. We shall focus on the related problem of computing a reciprocal, since we can obtain a division circuit by making one additional multiplication.
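As an illustration of the idea (a floating-point sketch, not the circuit itself), Newton iteration for the reciprocal of *a* > 0 repeatedly applies x ← x(2 - ax), which uses only multiplication and subtraction. The error 1 - ax squares with every step, so the number of correct bits doubles per iteration:

```python
import math

def reciprocal(a, iterations=6):
    """Approximate 1/a by Newton iteration on f(x) = 1/x - a."""
    x = 2.0 ** -math.ceil(math.log2(a))   # initial guess: 1/2 < a*x <= 1
    for _ in range(iterations):
        x = x * (2 - a * x)               # each step squares the error 1 - a*x
    return x
```

A division circuit then computes *b*/*a* as *b* · (1/*a*) with one more multiplication.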

*a.* Suppose that

29-2 Boolean formulas for symmetric functions

An *n*-input function *f*(*x*_{1}, *x*_{2}, . . . , *x*_{n}) is *symmetric* if its value is unchanged by any reordering of its inputs, that is, if

f(x_{1}, x_{2}, . . . , x_{n}) = f(x_{π(1)}, x_{π(2)}, . . . , x_{π(n)})

for every permutation π of {1, 2, . . . , n}.

*a.* We start by considering a simple symmetric function. The generalized *majority function* on *n* boolean inputs is defined to be 1 if more than half of its inputs equal 1, and 0 otherwise.
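In software the generalized majority function is a one-liner (assuming, as this sketch does, the strictly-more-than-half convention stated above):

```python
def majority_n(bits):
    """Generalized majority: 1 iff more than half of the n inputs are 1."""
    return 1 if 2 * sum(bits) > len(bits) else 0
```

For *n* = 3 this agrees with the majority output *c* of the full adder.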

Describe an *O*(lg *n*)-depth combinational circuit for majority_{n}.