# CHAPTER 31: MATRIX OPERATIONS

After Section 31.1 introduces basic matrix concepts and notations, Section 31.2 presents Strassen's surprising algorithm for multiplying two n n matrices in (nlg 7) = O(n2.81) time. Section 31.3 defines quasirings, rings, and fields, clarifying the assumptions required to make Strassen's algorithm work. It also contains an asymptotically fast algorithm for multiplying boolean matrices. Section 31.4 shows how to solve a set of linear equations using LUP decompositions. Then, Section 31.5 explores the close relationship between the problem of multiplying matrices and the problem of inverting a matrix. Finally, Section 31.6 discusses the important class of symmetric positive-definite matrices and shows how they can be used to find a least-squares solution to an overdetermined set of linear equations.

# 31.1 Properties of matrices

In this section, we review some basic concepts of matrix theory and some fundamental properties of matrices, focusing on those that will be needed in later sections.

## Matrices and vectors

A matrix is a rectangular array of numbers. For example,

#### (31.1)

is a 2 3 matrix A = (aij), where for i = 1, 2 and j = 1, 2, 3, the element of the matrix in row i and column j is aij. We use uppercase letters to denote matrices and corresponding subscripted lowercase letters to denote their elements. The set of all m n matrices with real-valued entries is denoted Rmn. In general, the set of m n matrices with entries drawn from a set S is denoted Smn.

#### (31.2)

is a vector of size 3. We use lowercase letters to denote vectors, and we denote the ith element of a size-n vector x by xi, for i = 1, 2, . . . , n. We take the standard form of a vector to be as a column vector equivalent to an n 1 matrix; the corresponding row vector is obtained by taking the transpose:

`xT = ( 2 3 5 ) .`

`In = diag(1,1,...,1)`

When I appears without a subscript, its size can be derived from context. The ith column of an identity matrix is the unit vector ei.

Such a matrix is called a permutation matrix because multiplying a vector x by a permutation matrix has the effect of permuting (rearranging) the elements of x.

is a symmetric matrix.

## Operations on matrices

`cij = aij + bij`

for i = 1, 2, . . . , m and j = 1, 2, . . . , n. That is, matrix addition is performed componentwise. A zero matrix is the identity for matrix addition:

`A + 0 = A`

`= 0 + A .`

`A + (-A) = 0`

`= (-A) + A .`

Given this definition, we can define matrix subtraction as the addition of the negative of a matrix: A - B = A + (-B).

#### (31.3)

for i = 1, 2, . . . , m and k = 1, 2, . . . , p. The procedure MATRIX-MULTIPLY in Section 26.1 implements matrix multiplication in the straightforward manner based on equation (31.3), assuming that the matrices are square: m = n = p. To multiply n n matrices, MATRIX-MULTIPLY performs n3 multiplications and n2(n - 1) additions, and its running time is (n3).

Matrices have many (but not all) of the algebraic properties typical of numbers. Identity matrices are identities for matrix multiplication:

`ImA = AIn = A`

for any m n matrix A. Multiplying by a zero matrix gives a zero matrix:

`A 0 = 0 .`

Matrix multiplication is associative:

`A(BC) = (AB)C`

#### (31.4)

for compatible matrices A, B, and C. Matrix multiplication distributes over addition:

`A(B + C) = AB + AC ,`

`(B + C)D = BD + CD .`

#### (31.5)

Multiplication of n n matrices is not commutative, however, unless n = 1. For example, if and , then

and

is a number (actually a 1 1 matrix) called the inner product of x and y. The matrix xyT is an n n matrix Z called the outer product of x and y, with zij = xiyj. The (euclidean) norm ||x|| of a vector x of size n is defined by

Thus, the norm of x is its length in n-dimensional euclidean space.

## Matrix inverses, ranks, and determinants

`(BA)-1 = A-1B-1.`

#### (31.6)

The inverse operation commutes with the transpose operation:

`(A-1)T = (AT)-1 .`

`A = BC .`

A square matrix has full rank if and only if it is nonsingular.

An m n matrix has full column rank if its rank is n.

A matrix A has full column rank if and only if it does not have a null vector.

A square matrix A is singular if and only if it has a null vector.

#### (31.7)

The following theorems, whose proofs are omitted here, express fundamental properties of the determinant.

The determinant of a square matrix A has the following properties:

If any row or any column of A is zero, then det (A) = 0.

The determinant of A is multiplied by if the entries of any one row (or any one column) of A are all multiplied by .

The determinant of A is unchanged if the entries in one row (respectively, column) are added to those in another row (respectively, column).

The determinant of A equals the determinant of AT.

The determinant of A is multiplied by - 1 if any two rows (respectively, columns) are exchanged.

Also, for any square matrices A and B, we have det (AB) = det(A) det(B).

An n n matrix A is singular if and only if det(A) = 0.

## Positive-definite matrices

As we shall see, matrices that arise in applications are often positive-definite due to the following theorem.

For any matrix A with full column rank, the matrix ATA is positive-definite.

Proof We must show that xT (ATA)x > 0 for any nonzero vector x. For any vector x,

`xT(AT A)x = (Ax)T(Ax)   (by Exercise 31.1-3)`

`= ||Ax||2`

` 0 .`

#### (31.8)

Note that ||Ax||2 is just the sum of the squares of the elements of the vector Ax. Therefore, if ||Ax||2 = 0, every element of Ax is 0, which is to say Ax = 0. Since A has full column rank, Ax = 0 implies x = 0, by Theorem 31.2. Hence, AT A is positive-definite.

Other properties of positive-definite matrices will be explored in Section 31.6.

## Exercises

Prove that if B and C are inverses of A, then B = C.

Let A and B be n n matrices such that AB = I. Prove that if A' is obtained from A by adding row j into row i, then the inverse B' of A' can be obtained by subtracting column i from column j of B.

Let A be a nonsingular n n matrix with complex entries. Show that every entry of A-1 is real if and only if every entry of A is real.

Show that if A is a nonsingular symmetric matrix, then A-1 is symmetric. Show that if B is an arbitrary (compatible) matrix, then B ABT is symmetric.

Show that a matrix A has full column rank if and only if Ax = 0 implies x = 0. (Hint: Express the linear dependence of one column on the others as a matrix-vector equation.)

Prove that for any two compatible matrices A and B,

rank(AB) min(rank(A), rank(B)) ,

where equality holds if either A or B is a nonsingular square matrix. (Hint: Use the alternate definition of the rank of a matrix.)

is

(Hint: Multiply column i by -x0 and add it to column i + 1 for i = n - 1, n - 2, . . . , 1, and then use induction.)

# 31.2 Strassen's algorithm for matrix multiplication

## An overview of the algorithm

#### (31.9)

(Exercise 31.2-2 deals with the situation in which n is not an exact power of 2.) For convenience, the submatrices of A are labeled alphabetically from left to right, whereas those of B are labeled from top to bottom, in agreement with the way matrix multiplication is performed. Equation (31.9) corresponds to the four equations

`r = ae + bf ,`

#### (31.10)

`s = ag + bh ,`

#### (31.11)

`t = ce + df ,`

#### (31.12)

`u = cg + dh .`

#### (31.13)

Each of these four equations specifies two multiplications of n/2 n/2 matrices and the addition of their n/2 n/2 products. Using these equations to define a straightforward divide-and-conquer strategy, we derive the following recurrence for the time T(n) to multiply two n n matrices:

`T(n) = 8T(n/2) + (n2).`

#### (31.14)

Unfortunately, recurrence (31.14) has the solution T(n) = (n3), and thus this method is no faster than the ordinary one.

Strassen discovered a different recursive approach that requires only 7 recursive multiplications of n/2 n/2 matrices and (n2) scalar additions and subtractions, yielding the recurrence

`T(n) = 7T(n/2) + (n2)`

`= (nlg 7)`

`= O(n2.81) .`

#### (31.15)

Strassen's method has four steps:

1. Divide the input matrices A and B into n/2 n/2 submatrices, as in equation (31.9).

2. Using (n2) scalar additions and subtractions, compute 14 n/2 n/2 matrices A1, B1, A2, B2, . . . , A7, B7.

3. Recursively compute the seven matrix products Pi = AiBi for i = 1, 2, . . . , 7.

4. Compute the desired submatrices r, s, t, u of the result matrix C by adding and/or subtracting various combinations of the Pi matrices, using only (n2) scalar additions and subtractions.

Such a procedure satisfies the recurrence (31.15). All that we have to do now is fill in the missing details.

## Determining the submatrix products

It is not clear exactly how Strassen discovered the submatrix products that are the key to making his algorithm work. Here, we reconstruct one plausible discovery method.

Let us guess that each matrix product Pi can be written in the form

`Pi  =  AiBi`

`=  (i1a + i2b + i3c + i4d)  (i1e + i2f + i3g + i4h) ,`

#### (31.16)

where the coefficients ij, ij are all drawn from the set {-1, 0, 1}. That is, we guess that each product is computed by adding or subtracting some of the submatrices of A, adding or subtracting some of the submatrices of B, and then multiplying the two results together. While more general strategies are possible, this simple one turns out to work.

If we form all of our products in this manner, then we can use this method recursively without assuming commutativity of multiplication, since each product has all of the A submatrices on the left and all of the B submatrices on the right. This property is essential for the recursive application of this method, since matrix multiplication is not commutative.

For convenience, we shall use 4 4 matrices to represent linear combinations of products of submatrices, where each product combines one submatrix of A with one submatrix of B as in equation (31.16). For example, we can rewrite equation (31.10) as

The last expression uses an abbreviated notation in which "+" represents +1, " represents 0, and "-" represents -1. (From here on, we omit the row and column labels.) Using this notation, we have the following equations for the other submatrices of the result matrix C:

We begin our search for a faster matrix-multiplication algorithm by observing that the submatrix s can be computed as s = P1 + P2, where P1 and P2 are computed using one matrix multiplication each:

The matrix t can be computed in a similar manner as t = P3 + P4, where

Let us define an essential term to be one of the eight terms appearing on the right-hand side of one of the equations (31.10)--(31.13). We have now used 4 products to compute the two submatrices s and t whose essential terms are ag, bh, ce, and df . Note that P1 computes the essential term ag, P2 computes the essential term bh, P3 computes the essential term ce, and P4 computes the essential term df. Thus, it remains for us to compute the remaining two submatrices r and u, whose essential terms are the diagonal terms ae, bf, cg, and dh, without using more than 3 additional products. We now try the innovation P5 in order to compute two essential terms at once:

In addition to computing both of the essential terms ae and dh, P5 computes the inessential terms ah and de, which need to be cancelled somehow. We can use P4 and P2 to cancel them, but two other inessential terms then appear:

however, we obtain

We can obtain u in a similar manner from P5 by using P1 and P3 to move the inessential terms of P5 in a different direction:

we now obtain

The 7 submatrix products P1, P2, . . . , P7 can thus be used to compute the product C = AB, which completes the description of Strassen's method.

## Discussion

The large constant hidden in the running time of Strassen's algorithm makes it impractical unless the matrices are large (n at least 45 or so) and dense (few zero entries). For small matrices, the straightforward algorithm is preferable, and for large, sparse matrices, there are special sparsematrix algorithms that beat Strassen's in practice. Thus, Strassen's method is largely of theoretical interest.

By using advanced techniques beyond the scope of this text, one can in fact multiply n n matrices in better than (n1g 7) time. The current best upper bound is approximately O(n2.376). The best lower bound known is just the obvious (n2) bound (obvious because we have to fill in n2 elements of the product matrix). Thus, we currently do not know how hard matrix multiplication really is.

Strassen's algorithm does not require that the matrix entries be real numbers. All that matters is that the number system form an algebraic ring. If the matrix entries do not form a ring, however, sometimes other techniques can be brought to bear to allow his method to apply. These issues are discussed more fully in the next section.

## Exercises

Use Strassen's algorithm to compute the matrix product

How would you modify Strassen's algorithm to multiply n n matrices in which n is not an exact power of 2? Show that the resulting algorithm runs in time (n1g 7).

What is the largest k such that if you can multiply 3 3 matrices using k multiplications (not assuming commutativity of multiplication), then you can multiply n n matrices in time o(n1g 7)? What would the running time of this algorithm be?

How quickly can you multiply a kn n matrix by an n kn matrix, using Strassen's algorithm as a subroutine? Answer the same question with the order of the input matrices reversed.

# * 31.3 Algebraic number systems and boolean matrix multiplication

The properties of matrix addition and multiplication depend on the properties of the underlying number system. In this section, we define three different kinds of underlying number systems: quasirings, rings, and fields. We can define matrix multiplication over quasirings, and Strassen's matrix-multiplication algorithm works over rings. We then present a simple trick for reducing boolean matrix multiplication, which is defined over a quasiring that is not a ring, to multiplication over a ring. Finally, we discuss why the properties of a field cannot naturally be exploited to provide better algorithms for matrix multiplication.

## Quasirings

Likewise, is a monoid.

We can extend and to matrices as we did for + and in Section 31.1. Denoting the n n identity matrix composed of we find that matrix multiplication is well defined and the matrix system is itself a quasiring, as the following theorem states.

Proof The proof is left as Exercise 31.3-3.

## Rings

Given that the negative of any element is defined, we can define subtraction by a - b = a + (-b).

There are many examples of rings. The integers (Z, +, , 0, 1) under the usual operations of addition and multiplication form a ring. The integers modulo n for any integer n > 1--that is, (Zn, +, , 0, 1), where + is addition modulo n and is multiplication modulo n--form a ring. Another example is the set R[x] of finite-degree polynomials in x with real coefficients under the usual operations--that is, (R[x], +, , 0, 1), where + is polynomial addition and is polynomial multiplication.

The following corollary shows that Theorem 31.7 generalizes naturally to rings.

If is a ring and n 1, then is a ring.

Proof The proof is left as Exercise 31.3-3.

Using this corollary, we can prove the following theorem.

Strassen's matrix-multiplication algorithm works properly over any ring of matrix elements.

Proof Strassen's algorithm depends on the correctness of the algorithm for 2 2 matrices, which requires only that the matrix elements belong to a ring. Since the matrix elements do belong to a ring, Corollary 31.8 implies the matrices themselves form a ring. Thus, by induction, Strassen's algorithm works correctly at each level of recursion.

Strassen's algorithm for matrix multiplication, in fact, depends critically on the existence of additive inverses. Out of the seven products P1, P2, . . . ,P7, four involve differences of submatrices. Thus, Strassen's algorithm does not work in general for quasirings.

## Boolean matrix multiplication

The following theorem presents a simple trick for reducing boolean matrix multiplication to multiplication over a ring. Problem 31-1 presents another efficient approach.

If M(n) denotes the number of arithmetic operations required to multiply two n n matrices over the integers, then two n n boolean matrices can be multiplied using O(M(n)) arithmetic operations.

Proof Let the two matrices be A and B, and let C = AB in the boolean quasiring, that is,

Instead of computing over the boolean quasiring, we compute the product C' over the ring of integers with the given matrix-multiplication algorithm that uses M(n) arithmetic operations. We thus have

Each term aikbkj of this sum is 0 if and only if aik bkj = 0, and 1 if and only if aik bkj = 1. Thus, the integer sum is 0 if and only if every term is 0 or, equivalently, if and only if the boolean OR of the terms, which is cij, is 0. Therefore, the boolean matrix C can be reconstructed with (n2) arithmetic operations from the integer matrix C' by simply comparing each with 0. The number of arithmetic operations for the entire procedure is then O(M(n)) +(n2) = O(M(n)), since M(n) = (n2).

Thus, using Strassen's algorithm, we can perform boolean matrix multiplication in O(n lg 7) time.

The normal method of multiplying boolean matrices uses only boolean variables. If we use this adaptation of Strassen's algorithm, however, the final product matrix can have entries as large as n, thus requiring a computer word to store them rather than a single bit. More worrisome is that the intermediate results, which are integers, may grow even larger. One method for keeping intermediate results from growing too large is to perform all computations modulo n + 1. Exercise 31.3-5 asks you to show that working modulo n + 1 does not affect the correctness of the algorithm.

## Fields

Such an element b is often called the inverse of a and is denoted a-1.

Examples of fields include the real numbers (R, +, , 0, 1), the complex numbers (C, +, , 0, 1), and the integers modulo a prime p: (Zp, +, , 0, 1).

Because fields offer multiplicative inverses of elements, division is possible. They also offer commutativity. By generalizing from quasirings to rings, Strassen was able to improve the running time of matrix multiplication. Since the underlying elements of matrices are often from a field--the real numbers, for instance--one might hope that by using fields instead of rings in a Strassen-like recursive algorithm, the running time might be further improved.

This approach seems unlikely to be fruitful. For a recursive divide-and-conquer algorithm based on fields to work, the matrices at each step of the recursion must form a field. Unfortunately, the natural extension of Theorem 31.7 and Corollary 31.8 to fields fails badly. For n > 1, the set of n n matrices never forms a field, even if the underlying number system is a field. Multiplication of n n matrices is not commutative, and many n n matrices do not have inverses. Better algorithms for matrix multiplication are therefore more likely to be based on ring theory than on field theory.

## Exercises

Does Strassen's algorithm work over the number system (Z[x], +, , 0, 1), where Z[x] is the set of all polynomials with integer coefficients in the variable x and + and are ordinary polynomial addition and multiplication?

Explain why Strassen's algorithm doesn't work over closed semirings (see Section 26.4) or over the boolean quasiring ({0, 1}, , , 0, 1).

Prove Theorem 31.7 and Corollary 31.8.

Show that the boolean quasiring ({0, 1}, , , 0, 1) cannot be embedded in a ring. That is, show that it is impossible to add a "-1" to the quasiring so that the resulting algebraic structure is a ring.

Argue that if all computations in the algorithm of Theorem 31.10 are performed modulo n + 1, the algorithm still works correctly.

Show how to compute the transitive closure of a given directed n-vertex input graph in time O(nlg 7 lg n). Compare this result with the performance of the TRANSITIVE-CLOSURE procedure in Section 26.2.

# 31.4 Solving systems of linear equations

We start with a set of linear equations in n unknowns x1, x2, . . . , xn:

`a11x1 + a12x2 +    + a1nxn = b1,`

`a21x1 + a22x2 +    + a2nxn = b2,`

#### (31.17)

`an1x1 + an2x2 +    + annxn = bn.`

We can conveniently rewrite equations (31.17) as the matrix-vector equation

or, equivalently, letting A = (aij), x = (xj), and b = (bi), as

`Ax = b .`

#### (31.18)

If A is nonsingular, it possesses an inverse A-1, and

`x = A-1 b`

#### (31.19)

is the solution vector. We can prove that x is the unique solution to equation (31.18) as follows. If there are two solutions, x and x', then Ax = Ax' = b and

`x  =  (A-1 A)x`

`=  A-1(Ax)`

`=  A-1(Ax')`

`=  (A-1 A)x'`

`=  x'.`

Let us return to our problem of solving the system Ax = b of n equations in n unknowns. One approach is to compute A-1 and then multiply both sides by A-1, yielding A-1Ax = A-1b, or x = A-1b. This approach suffers in practice from numerical instability: round-off errors tend to accumulate unduly when floating-point number representations are used instead of ideal real numbers. There is, fortunately, another approach--LUP decomposition--that is numerically stable and has the further advantage of being about a factor of 3 faster.

## Overview of LUP decomposition

`PA = LU ,`

#### (31.20)

where

L is a unit lower-triangular matrix,

U is an upper-triangular matrix, and

P is a permutation matrix.

We call matrices L, U, and P satisfying equation (31.20) an LUP decomposition of the matrix A. We shall show that every nonsingular matrix A possesses such a decomposition.

The advantage of computing an LUP decomposition for the matrix A is that linear systems can be solved more readily when they are triangular, as is the case for both matrices L and U. Having found an LUP decomposition for A, we can solve the equation (31.18) Ax = b by solving only triangular linear systems, as follows. Multiplying both sides of Ax = b by P yields the equivalent equation P Ax = Pb, which by Exercise 31.1-2 amounts to permuting the equations (31.17). Using our decomposition (31.20), we obtain

`LUx = Pb .`

We can now solve this equation by solving two triangular linear systems. Let us define y = Ux, where x is the desired solution vector. First, we solve the lower-triangular system

`Ly = Pb`

#### (31.21)

for the unknown vector y by a method called "forward substitution." Having solved for y, we then solve the upper-triangular system

`Ux = y`

#### (31.22)

for the unknown x by a method called "back substitution." The vector x is our solution to Ax = b, since the permutation matrix P is invertible (Exercise 31.1 -2):

`Ax  =  P-1 LUx`

`=  P-1 Ly`

`=  P-1 Pb`

`=  b .`

Our next step is to show how forward and back substitution work and then attack the problem of computing the LUP decomposition itself.

## Forward and back substitution

`y1                             =  b[1],`

`l21y1 + y2                        =  b[2],`

`l31y1 + l32y2 + y3                =  b[3],`

`ln1y1 + ln2y2 + ln3y3 +    + yn  =  b[n].`

Quite apparently, we can solve for y1 directly, since the first equation tells us that y1 = b[1]. Having solved for y1, we can substitute it into the second equation, yielding

`y2 = b[2] - l21y1.`

Now, we can substitute both y1 and y2 into the third equation, obtaining

`y3 = b[3] - (l31y1 + l32y2).`

In general, we substitute y1, y2, . . . ,yi-1 "forward" into the ith equation to solve for yi:

`u11x1 + u12x2 +    +   u1,n-2xn-2 +   u1,n-1xn-1  +    u1nxn = y1,`

`u22x2 +    +   u2,n-2xn-2 +   u2,n-1xn-1  +     u2nxn = y2,`

`un-2,n-2xn-2 + un-2,n-1xn-1 + un-2,nxn = yn-2,`

`un-1,n-1xn-1 + un-1,nxn = yn-1,`

`     un,nxn = yn .`

Thus, we can solve for xn, xn-1, . . . , x1 successively as follows:

`xn = yn/unn ,`

`xn-1 = (yn-1 - un-1,nxn)/un-1,n-1 ,`

`xn-2 = (yn-2 - (un-2,n-1xn-1 + un-2,nxn))/un-2,n-2 ,`

or, in general,

Procedure LUP-SOLVE solves for y using forward substitution in lines 2-3, and then it solves for x using backward substitution in lines 4-5. Since there is an implicit loop in the summations within each of the for loops, the running time is (n2).

As an example of these methods, consider the system of linear equations defined by

where

and we wish to solve for the unknown x. The LUP decomposition is

(The reader can verify that PA = LU.) Using forward substitution, we solve Ly = Pb for y:

obtaining

by computing first y1, then y2, and finally y3. Using back substitution, we solve Ux = y for x:

by computing first x3, then x2, and finally x1.

## Computing an LU decomposition

Our algorithm to implement this strategy is recursive. We wish to construct an LU decomposition for an n n nonsingular matrix A. If n = 1, then we're done, since we can choose L = I1 and U = A. For n > 1, we break A into four parts:

where v is a size-(n - 1) column vector, wT is a size-(n - 1) row vector, and A' is an (n - 1) (n - 1) matrix. Then, using matrix algebra (verify the equations by simply multiplying through), we can factor A as

`A' - vwT/a11`

#### (31.23)

is called the Schur complement of A with respect to a11.

We now recursively find an LU decomposition of the Schur complement. Let us say that

`A' - vwT/a11 = L'U' ,`

where L' is unit lower-triangular and U' is upper-triangular. Then, using matrix algebra, we have

thereby providing our LU decomposition. (Note that because L' is unit lower-triangular, so is L, and because U'is upper-triangular, so is U.)

An important class of matrices for which LU decomposition always works correctly is the class of symmetric positive-definite matrices. Such matrices require no pivoting, and thus the recursive strategy outlined above can be employed without fear of dividing by 0. We shall prove this result, as well as several others, in Section 31.6.

The outer for loop beginning in line 2 iterates once for each recursive step. Within this loop, the pivot is determined to be ukk = akk in line 3. Within the for loop in lines 4-6 (which does not execute when k = n), the v and wT vectors are used to update L and U. The elements of the v vector are determined in line 5, where vi is stored in lik, and the elements of the wT vector are determined in line 6, where wiT is stored in uki. Finally, the elements of the Schur complement are computed in lines 7-9 and stored back in the matrix A. Because line 9 is triply nested, LU-DECOMPOSITION runs in time (n3).

#### Figure 31.1 The operation of LU-DECOMPOSITION. (a) The matrix A. (b) The element a11 = 2 in black is the pivot, the shaded column is v/a11, and the shaded row is wT. The elements of U computed thus far are above the horizontal line, and the elements of L are to the left of the vertical line. The Schur complement matrix A' - vwT/a11 occupies the lower right. (c) We now operate on the Schur complement matrix produced from part (b). The element a22 = 4 in black is the pivot, and the shaded column and row are v/a22 and wT (in the partitioning of the Schur complement), respectively. Lines divide the matrix into the elements of U computed so far (above), the elements of L computed so far (left), and the new Schur complement (lower right). (d) The next step completes the factorization. (The element 3 in the new Schur complement becomes part of U when the recursion terminates.) (e) The factorization A = LU.

Figure 31.1 illustrates the operation of LU-DECOMPOSITION. It shows a standard optimization of the procedure in which the significant elements of L and U are stored "in place" in the matrix A. That is, we can set up a correspondence between each element aij and either lij (if i > j) or uij (if i j) and update the matrix A so that it holds both L and U when the procedure terminates. The pseudocode for this optimization is obtained from the above pseudocode merely by replacing each reference to l or u by a; it is not difficult to verify that this transformation preserves correctness.

## Computing an LUP decomposition

The mathematics behind LUP decomposition is similar to that of LU decomposition. Recall that we are given an n n nonsingular matrix A and wish to find a permutation matrix P, a unit lower-triangular matrix L, and an upper-triangular matrix U such that PA = LU. Before we partition the matrix A, as we did for LU decomposition, we move a nonzero element, say ak1, from the first column to the (1,1) position of the matrix. (If the first column contains only 0's, then A is singular, because its determinant is 0, by Theorems 31.4 and 31.5.) In order to preserve the set of equations, we exchange row 1 with row k, which is equivalent to multiplying A by a permutation matrix Q on the left (Exercise 31.1-2). Thus, we can write QA as

where v = (a21, a31, . . . , an1)T, except that a11 replaces ak1; wT = (ak2, ak3, . . . , akn); and A' is an (n - 1) (n - 1) matrix. Since ak1 0, we can now perform much the same linear algebra as for LU decomposition, but now guaranteeing that we do not divide by 0:

The Schur complement A' - vwT/ak1 is nonsingular, because otherwise the second matrix in the last equation has determinant 0, and thus the determinant of matrix A is 0; but this means that A is singular, which contradicts our assumption that A is nonsingular. Consequently, we can inductively find an LUP decomposition for the Schur complement, with unit lower-triangular matrix L', upper-triangular matrix U', and permutation matrix P', such that

`P'(A' - vwT/ak1) = L'U' .`

Define

which is a permutation matrix, since it is the product of two permutation matrices (Exercise 31.1-2). We now have

yielding the LUP decomposition. Because L' is unit lower-triangular, so is L, and because U' is upper-triangular, so is U.

Notice that in this derivation, unlike the one for LU decomposition, both the column vector v/ak1 and the Schur complement A' - vwT/ak1 must be multiplied by the permutation matrix P'.

Like LU-DECOMPOSITION, our pseudocode for LUP decomposition replaces the recursion with an iteration loop. As an improvement over a direct implementation of the recursion, we dynamically maintain the permutation matrix P as an array , where [i] = j means that the ith row of P contains a 1 in column j. We also implement the code to compute L and U "in place" in the matrix A. Thus, when the procedure terminates,

`LUP-DECOMPOSITION(A)`

`1  n  rows[A]`

`2  for i  1 to n`

`3       do [i]  i`

`4  for k  1 to n - 1`

`5       do p  0`

`6          for i  k to n`

`7              do if |aik| > p`

`8                    then p  |aik|`

`9                         k'  i`

`10          if p = 0`

`11             then error "singular matrix"`

`12          exchange [k]  [k']`

`13          for i  1 to n`

`14               do exchange aki  ak'i`

`15          for i  k + 1 to n`

`16               do aik  aik/akk`

`17                  for j  k + 1 to n`

`18                       do aij  aij - aikakj`

Figure 31.2 illustrates how LUP-DECOMPOSITION factors a matrix. The array is initialized by lines 2-3 to represent the identity permutation. The outer for loop beginning in line 4 implements the recursion. Each time through the outer loop, lines 5-9 determine the element ak'k with largest absolute value of those in the current first column (column k) of the (n - k + l ) (n - k + 1) matrix whose LU decomposition must be found.

#### Figure 31.2 The operation of LUP-DECOMPOSITION. (a) The input matrix A with the identity permutation of the rows on the left. The first step of the algorithm determines that the element 5 in black in the third row is the pivot for the first column. (b) Rows 1 and 3 are swapped and the permutation is updated. The shaded column and row represent v and wT. (c) The vector v is replaced by v/5, and the the lower right of the matrix is updated with the Schur complement. Lines divide the matrix into three regions: elements of U (above), elements of L (left), and elements of the Schur complement (lower right).(d)-(f) The second step.(g)-(i) The third step finishes the algorithm. (j) The LUP decomposition PA = LU.

If all elements in the current first column are zero, lines 10-11 report that the matrix is singular. To pivot, we exchange [k'] with [k] in line 12 and exchange the kth and k'th rows of A in lines 13-14, thereby making the pivot element akk. (The entire rows are swapped because in the derivation of the method above, not only is A' - vwT/ak1 multiplied by P', but so is v/ak1.) Finally, the Schur complement is computed by lines 15-18 in much the same way as it is computed by lines 4-9 of LU-DECOMPOSITION, except that here the operation is written to work "in place."

Because of its triply nested loop structure, the running time of LUP-DECOMPOSITION is (n3), the same as that of LU-DECOMPOSITION. Thus, pivoting costs us at most a constant factor in time.

## Exercises

Solve the equation

by using forward substitution.

Find an LU decomposition of the matrix

Why does the for loop in line 4 of LUP-DECOMPOSITION run only up to n - 1, whereas the corresponding for loop in line 2 of LU-DECOMPOSITION runs all the way to n?

Solve the equation

by using an LUP decomposition.

Show that for all n 1, there exist singular n n matrices that have LU decompositions.

Show how we can efficiently solve a set of equations of the form Ax = b over the boolean quasiring ({0, 1}, , , 0, 1).

Suppose that A is an m n real matrix of rank m, where m < n. Show how to find a size-n vector x0 and an m (n - m) matrix B of rank n - m such that every vector of the form x0 + By, for y Rn-m, is a solution to the underdetermined equation Ax = b.

# 31.5 Inverting matrices

## Computing a matrix inverse from an LUP decomposition

The equation

`AX = In`

#### (31.24)

can be viewed as a set of n distinct equations of the form Ax = b. These equations define the matrix X as the inverse of A. To be precise, let Xi denote the ith column of X, and recall that the unit vector ei is the ith column of In. Equation (31.24) can then be solved for X by using the LUP decomposition for A to solve each equation

`AXi = ei`

separately for Xi. Each of the n Xi can be found in time (n2), and so the computation of X from the LUP decomposition of A takes time (n3). Since the LUP decomposition of A can be computed in time (n3), the inverse A-1 of a matrix A can be determined in time (n3).

## Matrix multiplication and matrix inversion

If we can invert an n n matrix in time I(n), where I(n) = (n2) and I(n) satisfies the regularity condition I(3n) = O(I(n)), then we can multiply two n n matrices in time O(I(n)).

Proof Let A and B be n n matrices whose matrix product C we wish to compute. We define the 3n 3n matrix D by

The inverse of D is

and thus we can compute the product AB by taking the upper right n n submatrix of D-1.

We can construct matrix D in (n2) = O(I(n)) time, and we can invert D in O(I(3n)) = O(I(n)) time, by the regularity condition on I(n). We thus have

`M(n) = O(I(n)).      `

Note that I(n) satisfies the regularity condition whenever I(n) does not have large jumps in value. For example, if I(n) = (nc lgd n) for any constants c > 0, d 0, then I(n) satisfies the regularity condition.

## Reducing matrix inversion to matrix multiplication

If we can multiply two n n real matrices in time M(n), where M(n) = (n2) and M(n) = O(M(n + k)) for 0 k n, then we can compute the inverse of any real nonsingular n n matrix in time O(M(n)).

Proof We can assume that n is an exact power of 2, since we have

for any k > 0. Thus, by choosing k such that n + k is a power of 2, we enlarge the matrix to a size that is the next power of 2 and obtain the desired answer A-1 from the answer to the enlarged problem. The regularity condition on M(n) ensures that this enlargement does not cause the running time to increase by more than a constant factor.

For the moment, let us assume that the n n matrix A is symmetric and positive-definite. We partition A into four n/2 n/2 submatrices:

#### (31.25)

Then, if we let

`S = D - CB-1 CT`

#### (31.26)

be the Schur complement of A with respect to B, we have

#### (31.27)

since AA-1 = In, as can be verified by performing the matrix multiplication. The matrices B-1 and S-1 exist if A is symmetric and positive-definite, by Lemmas 31.13, 31.14, and 31.15 in Section 31.6, because both B and S are symmetric and positive-definite. By Exercise 31.1-3, B-1CT = (CB-1)T and B-1CTS-1 = (S-1 CB-1)T. Equations (31.26) and (31.27) can therefore be used to specify a recursive algorithm involving 4 multiplications of n/2 n/2 matrices:

`C  B-1,`

`(CB-1)  CT,`

`S-1  (CB-1),`

`(CB-1)T  (S-1 CB-1).`

Since we can multiply n/2 n/2 matrices using an algorithm for n n matrices, matrix inversion of symmetric positive-definite matrices can be performed in time

`I(n)  2I(n/2) + 4M(n) + O(n2)`

`= 2I(n/2) + O(M(n))`

`= O(M(n)).`

It remains to prove that the asymptotic running time of matrix multiplication can be obtained for matrix inversion when A is invertible but not symmetric and positive-definite. The basic idea is that for any nonsingular matrix A, the matrix ATA is symmetric (by Exercise 31.1-3) and positive-definite (by Theorem 31.6). The trick, then, is to reduce the problem of inverting A to the problem of inverting ATA.

The reduction is based on the observation that when A is an n n nonsingular matrix, we have

`A-1 = (ATA)-1AT,`

since ((ATA)-1 AT)A = (ATA)-1(ATA) = In and a matrix inverse is unique. Therefore, we can compute A-1 by first multiplying AT by A to obtain ATA, then inverting the symmetric positive-definite matrix ATA using the above divide-and-conquer algorithm, and finally multiplying the result by AT. Each of these three steps takes O(M(n)) time, and thus any nonsingular matrix with real entries can be inverted in O(M(n)) time.

The proof of Theorem 31.12 suggests a means of solving the equation Ax = b without pivoting, so long as A is nonsingular. We multiply both sides of the equation by AT, yielding (ATA)x = ATb. This transformation doesn't affect the solution x, since AT is invertible, so we can factor the symmetric positive-definite matrix ATA by computing an LU decomposition. We then use forward and back substitution to solve for x with the right-hand side ATb. Although this method is theoretically correct, in practice the procedure LUP-DECOMPOSITION works much better. LUP decomposition requires fewer arithmetic operations by a constant factor, and it has somewhat better numerical properties.

## Exercises

Let M(n) be the time to multiply n n matrices, and let S(n) denote the time required to square an n n matrix. Show that multiplying and squaring matrices have essentially the same difficulty: S(n) = (M(n)).

Does the matrix-inversion algorithm based on Theorem 31.12 work when matrix elements are drawn from the field of integers modulo 2? Explain.

# 31.6 Symmetric positive-definite matrices and least-squares approximation

The first property we prove is perhaps the most basic.

Any symmetric positive-definite matrix is nonsingular.

Proof Suppose that a matrix A is singular. Then by Corollary 31.3, there exists a nonzero vector x such that Ax = 0. Hence, xTAx = 0, and A cannot be positive-definite.

If A is a symmetric positive-definite matrix, then every leading submatrix of A is symmetric and positive-definite.

Proof That each leading submatrix Ak is symmetric is obvious. To prove that Ak is positive-definite, let x be a nonzero column vector of size k, and let us partition A as follows:

Then, we have

since A is positive-definite, and hence Ak is also positive-definite.

#### (31.28)

Then, the Schur complement of A with respect to Ak is defined to be

#### (31.29)

(By Lemma 31.14, Ak is symmetric and positive-definite; therefore, exists by Lemma 31.13, and S is well defined.) Note that our earlier definition (31.23) of the Schur complement is consistent with definition (31.29), by letting k = 1.

The next lemma shows that the Schur-complement matrices of symmetric positive-definite matrices are themselves symmetric and positive-definite. This result was used in Theorem 31.12, and its corollary is needed to prove the correctness of LU decomposition for symmetric positive-definite matrices.

If A is a symmetric positive-definite matrix and Ak is a leading data k k submatrix of A, then the Schur complement of A with respect to Ak is symmetric and positive-definite.

Proof That S is symmetric follows from Exercise 31.1-7. It remains to show that S is positive-definite. Consider the partition of A given in equation (31.28).

For any nonzero vector x, we have xTAx > 0 by assumption. Let us break x into two subvectors y and z compatible with Ak and C, respectively. Because Ak-1 exists, we have

#### (31.30)

by matrix magic. (Verify by multiplying through.) This last equation amounts to "completing the square" of the quadractic form. (See Exercise 31.6-2.)

Since xTAx > 0 holds for any nonzero x, let us pick any nonzero z and then choose , which causes the first term in equation (31.30) to vanish, leaving

as the value of the expression. For any z 0, we therefore have zTSz = xTAx > 0, and thus S is positive-definite.

LU decomposition of a symmetric positive-definite matrix never causes a division by 0.

Proof Let A be a symmetric positive-definite matrix. We shall prove something stronger than the statement of the corollary: every pivot is strictly positive. The first pivot is a11. Let e1 be the first unit vector, from which we obtain . Since the first step of LU decomposition produces the Schur complement of A with respect to A1 = (a11), Lemma 31.15 implies that all pivots are positive by induction.

## Least-squares approximation

`(x1,y1),(x2,y2),...,(xm,ym),`

where the yi are known to be subject to measurement errors. We would like to determine a function F (x) such that

`yi = F(xi) + i,`

#### (31.31)

for i = 1, 2, . . . , m, where the approximation errors i are small. The form of the function F depends on the problem at hand. Here, we assume that it has the form of a linearly weighted sum,

where the number of summands n and the specific basis functions fj are chosen based on knowledge of the problem at hand. A common choice is fj(x) = xj-1, which means that

`F(x) = c1 + c2x + c3x2 +    + cnxn-1`

is a polynomial of degree n - 1 in x.

By choosing n = m, we calculate each yi exactly in equation (31.31). Such a high-degree F "fits the noise" as well as the data, however, and generally gives poor results when used to predict y for previously unseen values of x. It is usually better to choose n significantly smaller than m and hope that by choosing the coefficients cj well, we can obtain a function F that finds the significantl patterns in the data points without paying undue attention to the noise. Some theoretical principles exist for choosing n, but they are beyond the scope of this text. In any case, once n is chosen, we end up with an overdetermined set of equations that we wish to solve as well as we can. We now show how this can be done.

Let

denote the matrix of values of the basis functions at the given points; that is, aij = fj(xi). Let c = (ck) denote the desired size-n vector of coefficients. Then,

is the size-m vector of "predicted values" for y. Thus,

` = Ac - y`

is the size-m vector of approximation errors.

To minimize approximation errors, we choose to minimize the norm of the error vector , which gives us a least-squares solution, since

Since

we can minimize |||| by differentiating ||||2 with respect to each ck and then setting the result to 0:

#### (31.32)

The n equations (31.32) for k = 1, 2, . . . , n are equivalent to the single matrix equation

`(Ac -y)TA = 0`

or, equivalently (using Exercise 31.1-3), to

`AT(Ac - y) = 0 ,`

which implies

`ATAc = ATy .`

#### (31.33)

`c = ((ATA)-1AT)y`

`= A+y ,`

#### (31.34)

where the matrix A+ = ((ATA)-1AT) is called the pseudoinverse of the matrix A. The pseudoinverese is a natural generalization of the notion of a matrix inverse to the case in which A is nonsquare. (Compare equation (31.34) as the approximate solution to Ac = y with the solution A-1b as the exact solution to Ax = b.)

As an example of producing a least-squares fit, suppose that we have 5 data points

`(-1,2),(1,1),(2,1),(3,0),(5,3),`

shown as black dots in Figure 31.3. We wish to fit these points with a quadratic polynomial

`F(x) = c1 + c2x + c3x2.`

#### Figure 31.3 The least-squares fit of a quadratic polynomial to the set of data points {(-1, 2), (1, 1), (2, 1),(3, 0), (5, 3)}. The black dots are the data points, and the white dots are their estimated values predicted by the polynomial F(x) = 1.2 - 0.757x + 0.214x2, the quadratic polynomial that minimizes the sum of the squared errors. The error for each data point is shown as a shaded line.

whose pseudoinverse is

Multiplying y by A+, we obtain the coefficient vector

which corresponds to the quadratic polynomial

`F(x) = 1.200 - 0.757x + 0.214x2`

as the closest-fitting quadratic to the given data, in a least-squares sense.

As a practical matter, we solve the normal equation (31.33) by multiplying y by AT and then finding an LU decomposition of AT A. If A has full rank, the matrix AT A is guaranteed to be nonsingular, because it is symmetric and positive-definite. (See Exercise 31.1-3 and Theorem 31.6.)

## Exercises

Prove that every diagonal element of a symmetric positive-definite matrix is positive.

Let be a 2 2 symmetric positive-definite matrix. Prove that its determinant ac - b2 is positive by "completing the square" in a manner similar to that used in the proof of Lemma 31.15.

Prove that the maximum element in a symmetric positive-definite matrix lies on the diagonal.

Prove that the determinant of each leading submatrix of a symmetric positive-definite matrix is positive.

Let Ak denote the kth leading submatrix of a symmetric positive-definite matrix A. Prove that det(Ak)/det(Ak-1) is the kth pivot during LU decomposition, where by convention det(A0) = 1.

Find the function of the form

`F(x) = c1 + c2x lg x + c3ex`

that is the best least-squares fit to the data points

`(1,1),(2,1),(3,3),(4,8) .`

Show that the pseudoinverse A+ satisfies the following four equations:

`AA+ A =  A ,`

`A+ AA+  =  A+ ,`

`(AA+)T  =  AA+ ,`

`(A+A)T  =  A+A .`

# Problems

a. Show that R = ({0, 1}, , , 0, 1), where is the XOR (exclusive-or) function, is a ring.

Let A = (aij) and B = (bij) be n n boolean matrices, and let C = (cij) = AB in the quasiring Q. Generate A' = (a'ij) from A using the following randomized procedure:

If aij = 0, then let a'ij = 0.

If Aij = 1, then let a'ij = 1 with probability 1/2 and let a'ij = 0 with probability 1/2. The random choices for each entry are independent.

b. Let C' = (c'ij) = A'B in the ring R. Show that cij = 0 implies c'ij = 0. Show that cij = 1 implies c'ij = 1 with probability 1/2.

c. Show that for any > 0, the probability is at most /n2 that a given c'ij never takes on the value cij for lg(n2/) independent choices of the matrix A'. Show that the probability is at least 1 - that all c'ij take on their correct values at least once.

d. Give an O(nlg 7 lg n)-time randomized algorithm that computes the product in the boolean quasiring Q of two n n matrices with probability at least 1 - 1/nk for any constant k > 0. The only operations permitted on matrix elements are , , and .

a. Find an LU decomposition of A.

b. Solve the equation Ax = (1 1 1 1 1)T by using forward and back substitution.

c. Find the inverse of A.

d. Show that for any n n symmetric, positive-definite, tridiagonal matrix A and any n-vector b, the equation Ax = b can be solved in O(n) time by performing an LU decomposition. Argue that any method based on forming A-1 is asymptotically more expensive in the worst case.

e. Show that for any n n nonsingular, tridiagonal matrix A and any n-vector b, the equation Ax = b can be solved in O(n) time by performing an LUP decomposition.

To ensure continuity of f(x), we require that

`f(xi)   =  fi(0)  =  yi ,`

`f(xi+1) =  fi(1)  =  yi+1`

for i = 0, 1, . . . , n - 1. To ensure that f(x) is sufficiently smooth, we also insist that there be continuity of the first derivative at each knot:

for i = 0, 1, . . . , n - 1.

a. Suppose that for i = 0, 1, . . . , n, we are given not only the point-value pairs {(xi, yi)} but also the first derivatives Di = f'(xi) at each knot. Express each coefficient ai, bi, ci, and di in terms of the values yi, yi+1, Di, and Di+1. (Remember that xi = i.) How quickly can the 4n coefficients be computed from the point-value pairs and first derivatives?

The question remains of how to choose the first derivatives of f(x) at the knots. One method is to require the second derivatives to be continuous at the knots:

for i = 0, 1, . . . ,n - 1. At the first and last knots, we assume that ; these assumptions make f(x) a natural cubic spline.

b. Use the continuity constraints on the second derivative to show that for i = 1, 2, . . . , n - 1,

`Di - 1 + 4Di + Di + 1 = 3(yi + 1 - yi - 1) .`

#### (31.35)

c. Show that

`2D0 + D1 = 3(y1 - y0) ,`

#### (31.36)

`Dn - 1 + 2Dn = 3(yn - yn - 1) .`

#### (31.37)

d. Rewrite equations (31.35)--(31.37) as a matrix equation involving the vector D = D0, D1, . . . , Dn of unknowns. What attributes does the matrix in your equation have?

e. Argue that a set of n + 1 point-value pairs can be interpolated with a natural cubic spline in O(n) time (see Problem 31-2).

f. Show how to determine a natural cubic spline that interpolates a set of n + 1 points (xi, yi) satisfying x0 < x1 < < xn, even when xi is not necessarily equal to i. What matrix equation must be solved, and how quickly does your algorithm run?

# Chapter notes

There are many excellent texts available that describe numerical and scientific computation in much greater detail than we have room for here. The following are especially readable: George and Liu [81], Golub and Van Loan [89], Press, Flannery, Teukolsky, and Vetterling[161, 162], and Strang[181, 182].

The publication of Strassen's algorithm in 1969 [183] caused much excitement. Before then, it was hard to imagine that the naive algorithm could be improved upon. The asymptotic upper bound on the difficulty of matrix multiplication has since been considerably improved. The most asymptotically efficient algorithm for multiplying n n matrices to date, due to Coppersmith and Winograd [52], has a running time of O(n2.376). The graphical presentation of Strassen's algorithm is due to Paterson [155]. Fischer and Meyer [67] adapted Strassen's algorithm to boolean matrices (Theorem 31.10) .

Gaussian elimination, upon which the LU and LUP decompositions are based, was the first systematic method for solving linear systems of equations. It was also one of the earliest numerical algorithms. Although it was known earlier, its discovery is commonly attributed to C. F. Gauss (1777-1855). In his famous paper [183], Strassen also showed that an n n matrix can be inverted in O(nlg 7) time. Winograd [203] originally proved that matrix multiplication is no harder than matrix inversion, and the converse is due to Aho, Hopcroft, and Ullman [4].

Strang [182] has an excellent presentation of symmetric positive-definite matrices and on linear algebra in general. He makes the following remark on page 334: "My class often asks about unsymmetric positive definite matrices. I never use that term."