Next Chapter

Return to Table of Contents

Previous Chapter

CHAPTER 31: MATRIX OPERATIONS

Operations on matrices are at the heart of scientific computing. Efficient algorithms for working with matrices are therefore of considerable practical interest. This chapter provides a brief introduction to matrix theory and matrix operations, emphasizing the problems of multiplying matrices and solving sets of simultaneous linear equations.

After Section 31.1 introduces basic matrix concepts and notations, Section 31.2 presents Strassen's surprising algorithm for multiplying two n n matrices in (n^{lg 7}) = O(n^2.81) time. Section 31.3 defines quasirings, rings, and fields, clarifying the assumptions required to make Strassen's algorithm work. It also contains an asymptotically fast algorithm for multiplying boolean matrices. Section 31.4 shows how to solve a set of linear equations using LUP decompositions. Then, Section 31.5 explores the close relationship between the problem of multiplying matrices and the problem of inverting a matrix. Finally, Section 31.6 discusses the important class of symmetric positive-definite matrices and shows how they can be used to find a least-squares solution to an overdetermined set of linear equations.

31.1 Properties of matrices

In this section, we review some basic concepts of matrix theory and some fundamental properties of matrices, focusing on those that will be needed in later sections.

Matrices and vectors

A matrix is a rectangular array of numbers. For example,

(31.1)

is a 2 3 matrix A = (a_ij), where for i = 1, 2 and j = 1, 2, 3, the element of the matrix in row i and column j is a_ij. We use uppercase letters to denote matrices and corresponding subscripted lowercase letters to denote their elements. The set of all m n matrices with real-valued entries is denoted R^mn. In general, the set of m n matrices with entries drawn from a set S is denoted S^mn.

The transpose of a matrix A is the matrix A^T obtained by exchanging the rows and columns of A. For the matrix A of equation (31.1),

A vector is a one-dimensional array of numbers. For example,

(31.2)

is a vector of size 3. We use lowercase letters to denote vectors, and we denote the ith element of a size-n vector x by x_i, for i = 1, 2, . . . , n. We take the standard form of a vector to be as a column vector equivalent to an n 1 matrix; the corresponding row vector is obtained by taking the transpose:

x^T = ( 2 3 5 ) .

The unit vector e_i is the vector whose ith element is 1 and all of whose other elements are 0. Usually, the size of a unit vector is clear from the context.

A zero matrix is a matrix whose every entry is 0. Such a matrix is often denoted 0, since the ambiguity between the number 0 and a matrix of 0's is usually easily resolved from context. If a matrix of 0's is intended, then the size of the matrix also needs to be derived from the context.

Square n n matrices arise frequently. Several special cases of square matrices are of particular interest:

1. A diagonal matrix has a_ij = 0 whenever i j. Because all of the off-diagonal elements are zero, the matrix can be specified by listing the elements along the diagonal:

2 The n n identity matrix I_n is a diagonal matrix with 1's along the diagonal:

I_n = diag(1,1,...,1)

When I appears without a subscript, its size can be derived from context. The ith column of an identity matrix is the unit vector e_i.

3. A tridiagonal matrix T is one for which t_ij = 0 if |i - j| > 1. Nonzero entries appear only on the main diagonal, immediately above the main diagonal (t_i,i+1 for i = 1, 2, . . . , n - 1), or immediately below the main diagonal (t_i+1,i for i = 1, 2, . . . , n - 1):

4. An upper-triangular matrix U is one for which u_ij = 0 if i > j. All entries below the diagonal are zero:

An upper-triangular matrix is unit upper-triangular if it has all 1's along the diagonal.

5. A lower-triangular matrix L is one for which l_ij = 0 if i < j. All entries above the diagonal are zero:

A lower-triangular matrix is unit lower-triangular if it has all 1's along the diagonal.

6. A permutation matrix P has exactly one 1 in each row or column, and 0's elsewhere. An example of a permutation matrix is

Such a matrix is called a permutation matrix because multiplying a vector x by a permutation matrix has the effect of permuting (rearranging) the elements of x.

7. A symmetric matrix A satisfies the condition A = A^T. For example,

is a symmetric matrix.

Operations on matrices

The elements of a matrix or vector are numbers from a number system, such as the real numbers, the complex numbers, or integers modulo a prime. The number system defines how to add and multiply numbers. We can extend these definitions to encompass addition and multiplication of matrices.

We define matrix addition as follows. If A = (a_ij) and B = (b_ij) are m n matrices, then their matrix sum C = (c_ij) = A + B is the m n matrix defined by

c_ij = a_ij + b_ij

for i = 1, 2, . . . , m and j = 1, 2, . . . , n. That is, matrix addition is performed componentwise. A zero matrix is the identity for matrix addition:

A + 0 = A

= 0 + A .

If is a number and A = (a_ij) is a matrix, then A = (a_ij) is the scalar multiple of A obtained by multiplying each of its elements by . As a special case, we define the negative of a matrix A = (a_ij) to be -1 A = - A, so that the ijth entry of - A is -a_ij. Thus,

A + (-A) = 0

= (-A) + A .

Given this definition, we can define matrix subtraction as the addition of the negative of a matrix: A - B = A + (-B).

We define matrix multiplication as follows. We start with two matrices A and B that are compatible in the sense that the number of columns of A equals the number of rows of B. (In general, an expression containing a matrix product AB is always assumed to imply that matrices A and B arecompatible.) If A = (a_ij) is an m n matrix and B = (b_jk) is an n p matrix, then their matrix product C = AB is the m p matrix C = (c_ik),where

(31.3)

for i = 1, 2, . . . , m and k = 1, 2, . . . , p. The procedure MATRIX-MULTIPLY in Section 26.1 implements matrix multiplication in the straightforward manner based on equation (31.3), assuming that the matrices are square: m = n = p. To multiply n n matrices, MATRIX-MULTIPLY performs n³ multiplications and n²(n - 1) additions, and its running time is (n³).

Matrices have many (but not all) of the algebraic properties typical of numbers. Identity matrices are identities for matrix multiplication:

I_mA = AI_n = A

for any m n matrix A. Multiplying by a zero matrix gives a zero matrix:

A 0 = 0 .

Matrix multiplication is associative:

A(BC) = (AB)C

(31.4)

for compatible matrices A, B, and C. Matrix multiplication distributes over addition:

A(B + C) = AB + AC ,

(B + C)D = BD + CD .

(31.5)

Multiplication of n n matrices is not commutative, however, unless n = 1. For example, if and , then

Matrix-vector products or vector-vector products are defined as if the vector were the equivalent n 1 matrix (or a 1 n matrix, in the case of a row vector). Thus, if A is an m n matrix and x is a vector of size n, then Ax is a vector of size m. If x and y are vectors of size n, then

is a number (actually a 1 1 matrix) called the inner product of x and y. The matrix xy^T is an n n matrix Z called the outer product of x and y, with z_ij = x_iy_j. The (euclidean) norm ||x|| of a vector x of size n is defined by

Thus, the norm of x is its length in n-dimensional euclidean space.

Matrix inverses, ranks, and determinants

We define the inverse of an n n matrix A to be the n n matrix, denoted A^-1 (if it exists), such that AA^-1 = I_n = A^-1 A. For example,

Many nonzero n n matrices do not have inverses. A matrix without an inverse is is called noninvertible, or singular. An example of a nonzero singular matrix is

If a matrix has an inverse, it is called invertible, or nonsingular. Matrix inverses, when they exist, are unique. (See Exercise 31.1-4.) If A and B are nonsingular n n matrices, then

(BA)^-1 = A^-1B^-1.

(31.6)

The inverse operation commutes with the transpose operation:

(A^-1)^T = (A^T)^-1.

The vectors x₁, x₂, . . . , x_n are linearly dependent if there exist coefficients c₁,c₂, . . . , cn, not all of which are zero, such that c₁x₁+ c₂x₂ + . . . + c_nx_n = 0. For example, the vectors x₁ = ( 1 2 3 )^T, x₂ = ( 2 6 4 )^T, and x₃ = ( 4 11 9 )^T are linearly dependent, since 2x₁ + 3x₂ - 2x₃ = 0. If vectors are not linearly dependent, they are linearly independent. For example, the columns of an identity matrix are linearly independent.

The column rank of a nonzero m n matrix A is the size of the largest set of linearly independent columns of A. Similarly, the row rank of A is the size of the largest set of linearly independent rows of A. A fundamental property of any matrix A is that its row rank always equals its column rank, so that we can simply refer to the rank of A. The rank of an m n matrix is an integer between 0 and min(m, n), inclusive. (The rank of a zero matrix is 0, and the rank of an n n identity matrix is n.) An alternate, but equivalent and often more useful, definition is that the rank of a nonzero m n matrix A is the smallest number r such that there exist matrices B and C of respective sizes m r and r n such that

A = BC .

A square n n matrix has full rank if its rank is n. A fundamental property of ranks is given by the following theorem.

A square matrix has full rank if and only if it is nonsingular.

An m n matrix has full column rank if its rank is n.

A null vector for a matrix A is a nonzero vector x such that Ax = 0. The following theorem, whose proof is left as Exercise 31.1-8, and its corollary relate the notions of column rank and singularity to null vectors.

A matrix A has full column rank if and only if it does not have a null vector.

A square matrix A is singular if and only if it has a null vector.

The ijth minor of an n n matrix A, for n > 1, is the (n - 1) (n - 1) matrix A_[ij]obtained by deleting the ith row and jth column of A. The determinant of an n n matrix A can be defined recursively in terms of its minors by

(31.7)

The term (-1)^{i+ j}det(A_[ij]) is known as the cofactor of the element a_ij.

The following theorems, whose proofs are omitted here, express fundamental properties of the determinant.

The determinant of a square matrix A has the following properties:

If any row or any column of A is zero, then det (A) = 0.

The determinant of A is multiplied by if the entries of any one row (or any one column) of A are all multiplied by .

The determinant of A is unchanged if the entries in one row (respectively, column) are added to those in another row (respectively, column).

The determinant of A equals the determinant of A^T.

The determinant of A is multiplied by - 1 if any two rows (respectively, columns) are exchanged.

Also, for any square matrices A and B, we have det (AB) = det(A) det(B).

An n n matrix A is singular if and only if det(A) = 0.

Positive-definite matrices

Positive-definite matrices play an important role in many applications. An n n matrix A is positive-definite if x^T Ax > 0 for all size-n vectors x 0. For example, the identity matrix is positive-definite, since for any nonzero vector x = ( x₁ x₂ . . . x_n)^T,

As we shall see, matrices that arise in applications are often positive-definite due to the following theorem.

For any matrix A with full column rank, the matrix A^TA is positive-definite.

Proof We must show that x^T (A^TA)x > 0 for any nonzero vector x. For any vector x,

x^T(A^T A)x = (Ax)^T(Ax)   (by Exercise 31.1-3)

= ||Ax||²

0 .

(31.8)

Note that ||Ax||² is just the sum of the squares of the elements of the vector Ax. Therefore, if ||Ax||² = 0, every element of Ax is 0, which is to say Ax = 0. Since A has full column rank, Ax = 0 implies x = 0, by Theorem 31.2. Hence, A^T A is positive-definite.

Other properties of positive-definite matrices will be explored in Section 31.6.

Exercises

Prove that the product of two lower-triangular matrices is lower-triangular. Prove that the determinant of a (lower- or upper-) triangular matrix is equal to the product of its diagonal elements. Prove that the inverse of a lower-triangular matrix, if it exists, is lower-triangular.

Prove that if P is an n n permutation matrix and A is an n n matrix, then PA can be obtained from A by permuting its rows, and AP can be obtained from A by permuting its columns. Prove that the product of two permutation matrices is a permutation matrix. Prove that if P is a permutation matrix, then P is invertible, its inverse is P^T, and P^T is a permutation matrix.

Prove that (AB)^T= B^TA^Tand that A^TA is always a symmetric matrix.

Prove that if B and C are inverses of A, then B = C.

Let A and B be n n matrices such that AB = I. Prove that if A' is obtained from A by adding row j into row i, then the inverse B' of A' can be obtained by subtracting column i from column j of B.

Let A be a nonsingular n n matrix with complex entries. Show that every entry of A^-1 is real if and only if every entry of A is real.

Show that if A is a nonsingular symmetric matrix, then A^-1is symmetric. Show that if B is an arbitrary (compatible) matrix, then B AB^Tis symmetric.

Show that a matrix A has full column rank if and only if Ax = 0 implies x = 0. (Hint: Express the linear dependence of one column on the others as a matrix-vector equation.)

Prove that for any two compatible matrices A and B,

rank(AB) min(rank(A), rank(B)) ,

where equality holds if either A or B is a nonsingular square matrix. (Hint: Use the alternate definition of the rank of a matrix.)

Given numbers x₀, x₁, . . . , x_n-1, prove that the determinant of the Vandermonde matrix

(Hint: Multiply column i by -x₀ and add it to column i + 1 for i = n - 1, n - 2, . . . , 1, and then use induction.)

31.2 Strassen's algorithm for matrix multiplication

This section presents Strassen's remarkable recursive algorithm for multiplying n n matrices that runs in (n^{lg 7}) = O(n^2.81) time. For sufficiently large n, therefore, it outperforms the naive (n³) matrix-multiplication algorithm MATRIX-MULTIPLY from Section 26.1.

An overview of the algorithm

Strassen's algorithm can be viewed as an application of a familiar design technique: divide and conquer. Suppose we wish to compute the product C = AB, where each of A, B, and C are n n matrices. Assuming that n is an exact power of 2, we divide each of A, B, and C into four n/2 n/2 matrices, rewriting the equation C = AB as follows:

(31.9)

(Exercise 31.2-2 deals with the situation in which n is not an exact power of 2.) For convenience, the submatrices of A are labeled alphabetically from left to right, whereas those of B are labeled from top to bottom, in agreement with the way matrix multiplication is performed. Equation (31.9) corresponds to the four equations

r = ae + bf ,

(31.10)

s = ag + bh ,

(31.11)

t = ce + df ,

(31.12)

u = cg + dh .

(31.13)

Each of these four equations specifies two multiplications of n/2 n/2 matrices and the addition of their n/2 n/2 products. Using these equations to define a straightforward divide-and-conquer strategy, we derive the following recurrence for the time T(n) to multiply two n n matrices:

T(n) = 8T(n/2) + (n²).

(31.14)

Unfortunately, recurrence (31.14) has the solution T(n) = (n³), and thus this method is no faster than the ordinary one.

Strassen discovered a different recursive approach that requires only 7 recursive multiplications of n/2 n/2 matrices and (n²) scalar additions and subtractions, yielding the recurrence

T(n) = 7T(n/2) + (n²)

= (n^{lg 7})

= O(n^2.81) .

(31.15)

Strassen's method has four steps:

1. Divide the input matrices A and B into n/2 n/2 submatrices, as in equation (31.9).

2. Using (n²) scalar additions and subtractions, compute 14 n/2 n/2 matrices A₁, B₁, A₂, B₂, . . . , A₇, B₇.

3. Recursively compute the seven matrix products P_i = A_iB_i for i = 1, 2, . . . , 7.

4. Compute the desired submatrices r, s, t, u of the result matrix C by adding and/or subtracting various combinations of the P_i matrices, using only (n²) scalar additions and subtractions.

Such a procedure satisfies the recurrence (31.15). All that we have to do now is fill in the missing details.

Determining the submatrix products

It is not clear exactly how Strassen discovered the submatrix products that are the key to making his algorithm work. Here, we reconstruct one plausible discovery method.

Let us guess that each matrix product P_i can be written in the form

P_i=  A_iB_i

=  (_i1a + _i2b + _i3c + _i4d)  (_i1e + _i2f + _i3g + _i4h) ,

(31.16)

where the coefficients _ij, _ij are all drawn from the set {-1, 0, 1}. That is, we guess that each product is computed by adding or subtracting some of the submatrices of A, adding or subtracting some of the submatrices of B, and then multiplying the two results together. While more general strategies are possible, this simple one turns out to work.

If we form all of our products in this manner, then we can use this method recursively without assuming commutativity of multiplication, since each product has all of the A submatrices on the left and all of the B submatrices on the right. This property is essential for the recursive application of this method, since matrix multiplication is not commutative.

For convenience, we shall use 4 4 matrices to represent linear combinations of products of submatrices, where each product combines one submatrix of A with one submatrix of B as in equation (31.16). For example, we can rewrite equation (31.10) as

The last expression uses an abbreviated notation in which "+" represents +1, " represents 0, and "-" represents -1. (From here on, we omit the row and column labels.) Using this notation, we have the following equations for the other submatrices of the result matrix C:

We begin our search for a faster matrix-multiplication algorithm by observing that the submatrix s can be computed as s = P₁ + P₂, where P₁and P₂ are computed using one matrix multiplication each:

The matrix t can be computed in a similar manner as t = P₃ + P₄, where

Let us define an essential term to be one of the eight terms appearing on the right-hand side of one of the equations (31.10)--(31.13). We have now used 4 products to compute the two submatrices s and t whose essential terms are ag, bh, ce, and df . Note that P₁ computes the essential term ag, P₂ computes the essential term bh, P₃ computes the essential term ce, and P₄ computes the essential term df. Thus, it remains for us to compute the remaining two submatrices r and u, whose essential terms are the diagonal terms ae, bf, cg, and dh, without using more than 3 additional products. We now try the innovation P₅ in order to compute two essential terms at once:

In addition to computing both of the essential terms ae and dh, P₅computes the inessential terms ah and de, which need to be cancelled somehow. We can use P₄ and P₂ to cancel them, but two other inessential terms then appear:

By adding an additional product

however, we obtain

We can obtain u in a similar manner from P₅ by using P₁ and P₃ to move the inessential terms of P₅ in a different direction:

By subtracting an additional product

The 7 submatrix products P₁, P₂, . . . , P₇ can thus be used to compute the product C = AB, which completes the description of Strassen's method.

Discussion

The large constant hidden in the running time of Strassen's algorithm makes it impractical unless the matrices are large (n at least 45 or so) and dense (few zero entries). For small matrices, the straightforward algorithm is preferable, and for large, sparse matrices, there are special sparsematrix algorithms that beat Strassen's in practice. Thus, Strassen's method is largely of theoretical interest.

By using advanced techniques beyond the scope of this text, one can in fact multiply n n matrices in better than (n^{1g 7}) time. The current best upper bound is approximately O(n^2.376). The best lower bound known is just the obvious (n²) bound (obvious because we have to fill in n²elements of the product matrix). Thus, we currently do not know how hard matrix multiplication really is.

Strassen's algorithm does not require that the matrix entries be real numbers. All that matters is that the number system form an algebraic ring. If the matrix entries do not form a ring, however, sometimes other techniques can be brought to bear to allow his method to apply. These issues are discussed more fully in the next section.

Exercises

Use Strassen's algorithm to compute the matrix product

Show your work.

How would you modify Strassen's algorithm to multiply n n matrices in which n is not an exact power of 2? Show that the resulting algorithm runs in time (n^{1g 7}).

What is the largest k such that if you can multiply 3 3 matrices using k multiplications (not assuming commutativity of multiplication), then you can multiply n n matrices in time o(n^{1g 7})? What would the running time of this algorithm be?

V. Pan has discovered a way of multiplying 68 68 matrices using 132,464 multiplications, a way of multiplying 70 70 matrices using 143,640 multiplications, and a way of multiplying 72 72 matrices using 155,424 multiplications. Which method yields the best asymptotic running time when used in a divide-and-conquer matrix-multiplication algorithm? Compare it with the running time for Strassen's algorithm.

How quickly can you multiply a kn n matrix by an n kn matrix, using Strassen's algorithm as a subroutine? Answer the same question with the order of the input matrices reversed.

Show how to multiply the complex numbers a + bi and c + di using only three real multiplications. The algorithm should take a, b, c, and d as input and produce the real component ac - bd and the imaginary component ad + bc separately.

* 31.3 Algebraic number systems and boolean matrix multiplication

The properties of matrix addition and multiplication depend on the properties of the underlying number system. In this section, we define three different kinds of underlying number systems: quasirings, rings, and fields. We can define matrix multiplication over quasirings, and Strassen's matrix-multiplication algorithm works over rings. We then present a simple trick for reducing boolean matrix multiplication, which is defined over a quasiring that is not a ring, to multiplication over a ring. Finally, we discuss why the properties of a field cannot naturally be exploited to provide better algorithms for matrix multiplication.

Quasirings

Let denote a number system, where S is a set of elements, and are binary operations on S (the addition and multiplication operations, respectively), and are distinct distinguished elements of S. This system is a quasiring if it satisfies the following properties:

1. is a monoid:

S is closed under ; that is, a b S for all a, b S.

is associative; that is, a (b c) = (a b) c for all a, b, c S.

is an identity for ; that is, for all a S.

Likewise, is a monoid.

2. is an annihilator, that is, for all a S.

3. The operator is commutative; that is, a b = b a for all a, b S.

4. The operator distributes over ; that is, and for all a,b,c S.

Examples of quasirings include the boolean quasiring ({0,1}, , , 0, 1), where denotes logical OR and denotes logical AND, and the natural number system (N, +, , 0, 1), where + and denote ordinary addition and multiplication. Any closed semiring (see Section 26.4) is also a quasiring; closed semirings obey additional idempotence and infinite-sum properties.

We can extend and to matrices as we did for + and in Section 31.1. Denoting the n n identity matrix composed of we find that matrix multiplication is well defined and the matrix system is itself a quasiring, as the following theorem states.

If is a quasiring and n 1, then is a quasiring.

Proof The proof is left as Exercise 31.3-3.

Rings

Subtraction is not defined for quasirings, but it is for a ring, which is a quasiring that satisfies the following additional property:

5. Every element in S has an additive inverse; that is, for all a S, there exists an element b S such that . Such a b is also called the negative of a and is denoted (-a).

Given that the negative of any element is defined, we can define subtraction by a - b = a + (-b).

There are many examples of rings. The integers (Z, +, , 0, 1) under the usual operations of addition and multiplication form a ring. The integers modulo n for any integer n > 1--that is, (Z_n, +, , 0, 1), where + is addition modulo n and is multiplication modulo n--form a ring. Another example is the set R[x] of finite-degree polynomials in x with real coefficients under the usual operations--that is, (R[x], +, , 0, 1), where + is polynomial addition and is polynomial multiplication.

The following corollary shows that Theorem 31.7 generalizes naturally to rings.

If is a ring and n 1, then is a ring.

Proof The proof is left as Exercise 31.3-3.

Using this corollary, we can prove the following theorem.

Strassen's matrix-multiplication algorithm works properly over any ring of matrix elements.

Proof Strassen's algorithm depends on the correctness of the algorithm for 2 2 matrices, which requires only that the matrix elements belong to a ring. Since the matrix elements do belong to a ring, Corollary 31.8 implies the matrices themselves form a ring. Thus, by induction, Strassen's algorithm works correctly at each level of recursion.

Strassen's algorithm for matrix multiplication, in fact, depends critically on the existence of additive inverses. Out of the seven products P₁, P₂, . . . ,P₇, four involve differences of submatrices. Thus, Strassen's algorithm does not work in general for quasirings.

Boolean matrix multiplication

Strassen's algorithm cannot be used directly to multiply boolean matrices, since the boolean quasiring ({0,1}, , , 0, 1) is not a ring. There are instances in which a quasiring is contained in a larger system that is a ring. For example, the natural numbers (a quasiring) are a subset of the integers (a ring), and Strassen's algorithm can therefore be used to multiply matrices of natural numbers if we consider the underlying number system to be the integers. Unfortunately, the boolean quasiring cannot be extended in a similar way to a ring. (See Exercise 31.3-4.)

The following theorem presents a simple trick for reducing boolean matrix multiplication to multiplication over a ring. Problem 31-1 presents another efficient approach.

If M(n) denotes the number of arithmetic operations required to multiply two n n matrices over the integers, then two n n boolean matrices can be multiplied using O(M(n)) arithmetic operations.

Proof Let the two matrices be A and B, and let C = AB in the boolean quasiring, that is,

Instead of computing over the boolean quasiring, we compute the product C' over the ring of integers with the given matrix-multiplication algorithm that uses M(n) arithmetic operations. We thus have

Each term a_ikb_kj of this sum is 0 if and only if a_ik b_kj = 0, and 1 if and only if a_ik b_kj = 1. Thus, the integer sum is 0 if and only if every term is 0 or, equivalently, if and only if the boolean OR of the terms, which is c_ij, is 0. Therefore, the boolean matrix C can be reconstructed with (n²) arithmetic operations from the integer matrix C' by simply comparing each with 0. The number of arithmetic operations for the entire procedure is then O(M(n)) +(n²) = O(M(n)), since M(n) = (n²).

Thus, using Strassen's algorithm, we can perform boolean matrix multiplication in O(n ^{lg 7}) time.

The normal method of multiplying boolean matrices uses only boolean variables. If we use this adaptation of Strassen's algorithm, however, the final product matrix can have entries as large as n, thus requiring a computer word to store them rather than a single bit. More worrisome is that the intermediate results, which are integers, may grow even larger. One method for keeping intermediate results from growing too large is to perform all computations modulo n + 1. Exercise 31.3-5 asks you to show that working modulo n + 1 does not affect the correctness of the algorithm.

Fields

A ring is a field if it satisfies the following two additional properties:

6. The operator is commutative; that is, for all a, b S.

7. Every nonzero element in S has a multiplicative inverse; that is, for all , there exists an element b S such that .

Such an element b is often called the inverse of a and is denoted a^-1.

Examples of fields include the real numbers (R, +, , 0, 1), the complex numbers (C, +, , 0, 1), and the integers modulo a prime p: (Z_p, +, , 0, 1).

Because fields offer multiplicative inverses of elements, division is possible. They also offer commutativity. By generalizing from quasirings to rings, Strassen was able to improve the running time of matrix multiplication. Since the underlying elements of matrices are often from a field--the real numbers, for instance--one might hope that by using fields instead of rings in a Strassen-like recursive algorithm, the running time might be further improved.

This approach seems unlikely to be fruitful. For a recursive divide-and-conquer algorithm based on fields to work, the matrices at each step of the recursion must form a field. Unfortunately, the natural extension of Theorem 31.7 and Corollary 31.8 to fields fails badly. For n > 1, the set of n n matrices never forms a field, even if the underlying number system is a field. Multiplication of n n matrices is not commutative, and many n n matrices do not have inverses. Better algorithms for matrix multiplication are therefore more likely to be based on ring theory than on field theory.

Exercises

Does Strassen's algorithm work over the number system (Z[x], +, , 0, 1), where Z[x] is the set of all polynomials with integer coefficients in the variable x and + and are ordinary polynomial addition and multiplication?

Explain why Strassen's algorithm doesn't work over closed semirings (see Section 26.4) or over the boolean quasiring ({0, 1}, , , 0, 1).

Prove Theorem 31.7 and Corollary 31.8.

Show that the boolean quasiring ({0, 1}, , , 0, 1) cannot be embedded in a ring. That is, show that it is impossible to add a "-1" to the quasiring so that the resulting algebraic structure is a ring.

Argue that if all computations in the algorithm of Theorem 31.10 are performed modulo n + 1, the algorithm still works correctly.

Show how to determine efficiently if a given undirected input graph contains a triangle (a set of three mutually adjacent vertices).

Show that computing the product of two n n boolean matrices over the boolean quasiring is reducible to computing the transitive closure of a given directed 3n-vertex input graph.

Show how to compute the transitive closure of a given directed n-vertex input graph in time O(n^{lg 7} lg n). Compare this result with the performance of the TRANSITIVE-CLOSURE procedure in Section 26.2.

31.4 Solving systems of linear equations

Solving a set of simultaneous linear equations is a fundamental problem that occurs in diverse applications. A linear system can be expressed as a matrix equation in which each matrix or vector element belongs to a field, typically the real numbers R. This section discusses how to solve a system of linear equations using a method called LUP decomposition.

We start with a set of linear equations in n unknowns x₁, x₂, . . . , x_n:

a₁₁x₁ + a₁₂x₂ +    + a_1nx_n = b₁,

a₂₁x₁ + a₂₂x₂ +    + a_2nx_n = b₂,

(31.17)

a_n1x₁ + a_n2x₂ +    + a_nnx_n = b_n.

A set of values for x₁, x₂, . . . , x_n that satisfy all of the equations (31.17) simultaneously is said to be a solution to these equations. In this section, we only treat the case in which there are exactly n equations in n unknowns.

We can conveniently rewrite equations (31.17) as the matrix-vector equation

or, equivalently, letting A = (a_ij), x = (x_j), and b = (b_i), as

Ax = b .

(31.18)

If A is nonsingular, it possesses an inverse A^-1, and

x = A^-1b

(31.19)

is the solution vector. We can prove that x is the unique solution to equation (31.18) as follows. If there are two solutions, x and x', then Ax = Ax' = b and

x  =  (A^-1A)x

=  A^-1(Ax)

=  A^-1(Ax')

=  (A^-1A)x'

=  x'.

In this section, we shall be concerned predominantly with the case in which A is nonsingular or, equivalently (by Theorem 31.1), the rank of A is equal to the number n of unknowns. There are other possibilities, however, which merit a brief discussion. If the number of equations is less than the number n of unknowns--or, more generally, if the rank of A is less than n--then the system is underdetermined. An underdetermined system typically has infinitely many solutions (see Exercise 31.4-9), although it may have no solutions at all if the equations are inconsistent. If the number of equations exceeds the number n of unknowns, the system is overdetermined, and there may not exist any solutions. Finding good approximate solutions to overdetermined systems of linear equations is an important problem that is addressed in Section 31.6.

Let us return to our problem of solving the system Ax = b of n equations in n unknowns. One approach is to compute A^-1 and then multiply both sides by A^-1, yielding A^-1Ax = A^-1b, or x = A^-1b. This approach suffers in practice from numerical instability: round-off errors tend to accumulate unduly when floating-point number representations are used instead of ideal real numbers. There is, fortunately, another approach--LUP decomposition--that is numerically stable and has the further advantage of being about a factor of 3 faster.

Overview of LUP decomposition

The idea behind LUP decomposition is to find three n n matrices L, U, and P such that

PA = LU ,

(31.20)

L is a unit lower-triangular matrix,

U is an upper-triangular matrix, and

P is a permutation matrix.

We call matrices L, U, and P satisfying equation (31.20) an LUP decomposition of the matrix A. We shall show that every nonsingular matrix A possesses such a decomposition.

The advantage of computing an LUP decomposition for the matrix A is that linear systems can be solved more readily when they are triangular, as is the case for both matrices L and U. Having found an LUP decomposition for A, we can solve the equation (31.18) Ax = b by solving only triangular linear systems, as follows. Multiplying both sides of Ax = b by P yields the equivalent equation P Ax = Pb, which by Exercise 31.1-2 amounts to permuting the equations (31.17). Using our decomposition (31.20), we obtain

LUx = Pb .

We can now solve this equation by solving two triangular linear systems. Let us define y = Ux, where x is the desired solution vector. First, we solve the lower-triangular system

Ly = Pb

(31.21)

for the unknown vector y by a method called "forward substitution." Having solved for y, we then solve the upper-triangular system

Ux = y

(31.22)

for the unknown x by a method called "back substitution." The vector x is our solution to Ax = b, since the permutation matrix P is invertible (Exercise 31.1 -2):

Ax  =  P^-1 LUx

=  P^-1 Ly

=  P^-1 Pb

=  b .

Our next step is to show how forward and back substitution work and then attack the problem of computing the LUP decomposition itself.

Forward and back substitution

Forward substitution can solve the lower-triangular system (31.21) in (n²) time, given L, P, and b. For convenience, we represent the permutation P compactly by an array [1 . . n]. For i = 1, 2, . . . , n, the entry [i] indicates that P_i,[i] = 1 and P_ij = 0 for j [i]. Thus, PA has a[i],j in row i and column j, and Pb has b[i]as its ith element. Since L is unit lower-triangular, equation (31.21) can be rewritten as

y₁                             =  b_[1],

l₂₁y₁ + y₂=  b_[2],

l₃₁y₁ + l₃₂y₂ + y₃=  b_[3],

l_n1y₁ + l_n2y₂ + l_n3y₃ +    + y_n=  b_[n].

Quite apparently, we can solve for y₁ directly, since the first equation tells us that y₁ = b_[1]. Having solved for y₁, we can substitute it into the second equation, yielding

y₂ = b_[2]- l₂₁y₁.

Now, we can substitute both y₁ and y₂ into the third equation, obtaining

y₃ = b_[3] - (l₃₁y₁ + l₃₂y₂).

In general, we substitute y₁, y₂, . . . ,y_i-1 "forward" into the ith equation to solve for y_i:

Back substitution is similar to forward substitution. Given U and y, we solve the nth equation first and work backward to to the first equation. Like forward substitution, this process runs in (n²) time. Since U is upper-triangular, we can rewrite the system (31.22) as

u₁₁x₁ + u₁₂x₂ +    +   u_1,n-2x_n-2 +   u_1,n-1x_n-1  +    u_1nx_n = y₁,

u₂₂x₂ +    +   u_2,n-2x_n-2 +   u_2,n-1x_n-1  +     u_2nx_n = y₂,

un-2,n-2xn-2 + un-2,n-1xn-1 + un-2,nxn = yn-2,

un-1,n-1xn-1 + un-1,nxn = yn-1,

     u_n,nx_n = y_n .

Thus, we can solve for x_n, x_n-1, . . . , x₁ successively as follows:

x_n = y_n/u_nn ,

x_n-1 = (y_n-1 - u_n-1,nx_n)/u_n-1,n-1 ,

x_n-2 = (y_n-2 - (u_n-2,n-1x_n-1 + u_n-2,nx_n))/u_n-2,n-2 ,

or, in general,

Given P, L, U, and b, the procedure LUP-SOLVE solves for x by combining forward and back substitution. The pseudocode assumes that the dimension n appears in the attribute rows[L] and that the permutation matrix P is represented by the array .

Procedure LUP-SOLVE solves for y using forward substitution in lines 2-3, and then it solves for x using backward substitution in lines 4-5. Since there is an implicit loop in the summations within each of the for loops, the running time is (n²).

As an example of these methods, consider the system of linear equations defined by

and we wish to solve for the unknown x. The LUP decomposition is

(The reader can verify that PA = LU.) Using forward substitution, we solve Ly = Pb for y:

by computing first y₁, then y₂, and finally y₃. Using back substitution, we solve Ux = y for x:

thereby obtaining the desired answer

by computing first x₃, then x₂, and finally x₁.

Computing an LU decomposition

We have now shown that if an LUP decomposition can be computed for a nonsingular matrix A, forward and back substitution can be used to solve the system Ax = b of linear equations. It remains to show how an LUP decomposition for A can be found efficiently. We start with the case in which A is an n n nonsingular matrix and P is absent (or, equivalently, P = I_n). In this case, we must find a factorization A = LU. We call the two matrices L and U an LU decomposition of A.

The process by which we perform LU decomposition is called Gaussian elimination. We start by subtracting multiples of the first equation from the other equations so that the first variable is removed from those equations. Then, we subtract multiples of the second equation from the third and subsequent equations so that now the first and second variables are removed from them. We continue this process until the system that is left has an upper-triangular form--in fact, it is the matrix U. The matrix L is made up of the row multipliers that cause variables to be eliminated.

Our algorithm to implement this strategy is recursive. We wish to construct an LU decomposition for an n n nonsingular matrix A. If n = 1, then we're done, since we can choose L = I₁ and U = A. For n > 1, we break A into four parts:

where v is a size-(n - 1) column vector, w^T is a size-(n - 1) row vector, and A' is an (n - 1) (n - 1) matrix. Then, using matrix algebra (verify the equations by simply multiplying through), we can factor A as

The 0's in the first and second matrices of the factorization are row and column vectors, respectively, of size n - 1. The term vw^T/a₁₁, formed by taking the outer product of v and w and dividing each element of the result by a₁₁, is an (n - 1) (n - 1) matrix, which conforms in size to the matrix A' from which it is subtracted. The resulting (n - 1) (n - 1) matrix

A' - vw^T/a₁₁

(31.23)

is called the Schur complement of A with respect to a₁₁.

We now recursively find an LU decomposition of the Schur complement. Let us say that

A' - vw^T/a₁₁ = L'U' ,

where L' is unit lower-triangular and U' is upper-triangular. Then, using matrix algebra, we have

thereby providing our LU decomposition. (Note that because L' is unit lower-triangular, so is L, and because U'is upper-triangular, so is U.)

Of course, if a₁₁ = 0, this method doesn't work, because it divides by 0. It also doesn't work if the upper leftmost entry of the Schur complement A' - vw^T/a₁₁ is 0, since we divide by it in the next step of the recursion. The elements by which we divide during LU decomposition are called pivots, and they occupy the diagonal elements of the matrix U. The reason we include a permutation matrix P during LUP decomposition is that it allows us to avoid dividing by zero elements. Using permutations to avoid division by 0 (or by small numbers) is called pivoting.

An important class of matrices for which LU decomposition always works correctly is the class of symmetric positive-definite matrices. Such matrices require no pivoting, and thus the recursive strategy outlined above can be employed without fear of dividing by 0. We shall prove this result, as well as several others, in Section 31.6.

Our code for LU decomposition of a matrix A follows the recursive strategy, except that an iteration loop replaces the recursion. (This transformation is a standard optimization for a "tail-recursive" procedure--one whose last operation is a recursive call to itself.) It assumes that the dimension of A is kept in the attribute rows[A]. Since we know that the output matrix U has 0's below the diagonal, and since LU-SOLVE does not look at these entries, the code does not bother to fill them in. Likewise, because the output matrix L has 1's on its diagonal and 0's above the diagonal, these entries are not filled in either. Thus, the code computes only the "significant" entries of L and U.

The outer for loop beginning in line 2 iterates once for each recursive step. Within this loop, the pivot is determined to be u_kk = a_kk in line 3. Within the for loop in lines 4-6 (which does not execute when k = n), the v and w^Tvectors are used to update L and U. The elements of the v vector are determined in line 5, where v_i is stored in l_ik, and the elements of the w^T vector are determined in line 6, where w_i^T is stored in u_ki. Finally, the elements of the Schur complement are computed in lines 7-9 and stored back in the matrix A. Because line 9 is triply nested, LU-DECOMPOSITION runs in time (n³).

Figure 31.1 The operation of LU-DECOMPOSITION. (a) The matrix A. (b) The element a₁₁ = 2 in black is the pivot, the shaded column is v/a₁₁, and the shaded row is w^T. The elements of U computed thus far are above the horizontal line, and the elements of L are to the left of the vertical line. The Schur complement matrix A' - vw^T/a₁₁ occupies the lower right. (c) We now operate on the Schur complement matrix produced from part (b). The element a₂₂ = 4 in black is the pivot, and the shaded column and row are v/a₂₂ and w^T (in the partitioning of the Schur complement), respectively. Lines divide the matrix into the elements of U computed so far (above), the elements of L computed so far (left), and the new Schur complement (lower right). (d) The next step completes the factorization. (The element 3 in the new Schur complement becomes part of U when the recursion terminates.) (e) The factorization A = LU.

Figure 31.1 illustrates the operation of LU-DECOMPOSITION. It shows a standard optimization of the procedure in which the significant elements of L and U are stored "in place" in the matrix A. That is, we can set up a correspondence between each element a_ij and either l_ij (if i > j) or u_ij (if i j) and update the matrix A so that it holds both L and U when the procedure terminates. The pseudocode for this optimization is obtained from the above pseudocode merely by replacing each reference to l or u by a; it is not difficult to verify that this transformation preserves correctness.

Computing an LUP decomposition

Generally, in solving a system of linear equations Ax = b, we must pivot on off-diagonal elements of A to avoid dividing by 0. Not only is division by 0 undesirable, so is division by any small value, even if A is nonsingular, because numerical instabilities can result in the computation. We therefore try to pivot on a large value.

The mathematics behind LUP decomposition is similar to that of LU decomposition. Recall that we are given an n n nonsingular matrix A and wish to find a permutation matrix P, a unit lower-triangular matrix L, and an upper-triangular matrix U such that PA = LU. Before we partition the matrix A, as we did for LU decomposition, we move a nonzero element, say a_k1, from the first column to the (1,1) position of the matrix. (If the first column contains only 0's, then A is singular, because its determinant is 0, by Theorems 31.4 and 31.5.) In order to preserve the set of equations, we exchange row 1 with row k, which is equivalent to multiplying A by a permutation matrix Q on the left (Exercise 31.1-2). Thus, we can write QA as

where v = (a₂₁, a₃₁, . . . , a_n1)^T, except that a₁₁ replaces a_k1; w^T = (a_k2, a_k3, . . . , a_kn); and A' is an (n - 1) (n - 1) matrix. Since a_k1 0, we can now perform much the same linear algebra as for LU decomposition, but now guaranteeing that we do not divide by 0:

The Schur complement A' - vw^T/a_k1 is nonsingular, because otherwise the second matrix in the last equation has determinant 0, and thus the determinant of matrix A is 0; but this means that A is singular, which contradicts our assumption that A is nonsingular. Consequently, we can inductively find an LUP decomposition for the Schur complement, with unit lower-triangular matrix L', upper-triangular matrix U', and permutation matrix P', such that

P'(A' - vw^T/a_k1) = L'U' .

which is a permutation matrix, since it is the product of two permutation matrices (Exercise 31.1-2). We now have

yielding the LUP decomposition. Because L' is unit lower-triangular, so is L, and because U' is upper-triangular, so is U.

Notice that in this derivation, unlike the one for LU decomposition, both the column vector v/a_k1 and the Schur complement A' - vw^T/a_k1 must be multiplied by the permutation matrix P'.

Like LU-DECOMPOSITION, our pseudocode for LUP decomposition replaces the recursion with an iteration loop. As an improvement over a direct implementation of the recursion, we dynamically maintain the permutation matrix P as an array , where [i] = j means that the ith row of P contains a 1 in column j. We also implement the code to compute L and U "in place" in the matrix A. Thus, when the procedure terminates,

LUP-DECOMPOSITION(A)

1  n  rows[A]

2  for i  1 to n

3       do [i]  i

4  for k  1 to n - 1

5       do p  0

6          for i  k to n

7              do if |a_ik| > p

8                    then p  |a_ik|

9                         k'  i

10          if p = 0

11             then error "singular matrix"

12          exchange [k]  [k']

13          for i  1 to n

14               do exchange a_ki a_k'_i

15          for i  k + 1 to n

16               do a_ik  a_ik/a_kk

17                  for j  k + 1 to n

18                       do a_ij  a_ij - a_ika_kj

Figure 31.2 illustrates how LUP-DECOMPOSITION factors a matrix. The array is initialized by lines 2-3 to represent the identity permutation. The outer for loop beginning in line 4 implements the recursion. Each time through the outer loop, lines 5-9 determine the element a_k'k with largest absolute value of those in the current first column (column k) of the (n - k + l ) (n - k + 1) matrix whose LU decomposition must be found.

Figure 31.2 The operation of LUP-DECOMPOSITION. (a) The input matrix A with the identity permutation of the rows on the left. The first step of the algorithm determines that the element 5 in black in the third row is the pivot for the first column. (b) Rows 1 and 3 are swapped and the permutation is updated. The shaded column and row represent v and w^T. (c) The vector v is replaced by v/5, and the the lower right of the matrix is updated with the Schur complement. Lines divide the matrix into three regions: elements of U (above), elements of L (left), and elements of the Schur complement (lower right).(d)-(f) The second step.(g)-(i) The third step finishes the algorithm. (j) The LUP decomposition PA = LU.

If all elements in the current first column are zero, lines 10-11 report that the matrix is singular. To pivot, we exchange [k'] with [k] in line 12 and exchange the kth and k'th rows of A in lines 13-14, thereby making the pivot element a_kk. (The entire rows are swapped because in the derivation of the method above, not only is A' - vw^T/a_k1 multiplied by P', but so is v/a_k1.) Finally, the Schur complement is computed by lines 15-18 in much the same way as it is computed by lines 4-9 of LU-DECOMPOSITION, except that here the operation is written to work "in place."

Because of its triply nested loop structure, the running time of LUP-DECOMPOSITION is (n³), the same as that of LU-DECOMPOSITION. Thus, pivoting costs us at most a constant factor in time.

Exercises

Solve the equation

by using forward substitution.

Find an LU decomposition of the matrix

Why does the for loop in line 4 of LUP-DECOMPOSITION run only up to n - 1, whereas the corresponding for loop in line 2 of LU-DECOMPOSITION runs all the way to n?

Solve the equation

by using an LUP decomposition.

Describe the LUP decomposition of a diagonal matrix.

Describe the LUP decomposition of a permutation matrix A, and prove that it is unique.

Show that for all n 1, there exist singular n n matrices that have LU decompositions.

Show how we can efficiently solve a set of equations of the form Ax = b over the boolean quasiring ({0, 1}, , , 0, 1).

Suppose that A is an m n real matrix of rank m, where m < n. Show how to find a size-n vector x₀ and an m (n - m) matrix B of rank n - m such that every vector of the form x₀ + By, for y R^n-m, is a solution to the underdetermined equation Ax = b.

31.5 Inverting matrices

Although in practice we do not generally use matrix inverses to solve systems of linear equations, preferring instead to use more numerically stable techniques such as LUP decomposition, it is sometimes necessary to compute a matrix inverse. In this section, we show how LUP decomposition can be used to compute a matrix inverse. We also discuss the theoretically interesting question of whether the computation of a matrix inverse can be sped up using techniques such as Strassen's algorithm for matrix multiplication. Indeed, Strassen's original paper was motivated by the problem of showing that a set of a linear equations could be solved more quickly than by the usual method.

Computing a matrix inverse from an LUP decomposition

Suppose that we have an LUP decomposition of a matrix A in the form of three matrices L, U, and P such that PA = LU. Using LU-SOLVE, we can solve an equation of the form Ax = b in time (n²). Since the LUP decomposition depends on A but not b, we can solve a second set of equations of the form Ax = b' in additional time (n²). In general, once we have the LUP decomposition of A, we can solve, in time (kn²), k versions of the equation Ax = b that differ only in b.

AX = I_n

(31.24)

can be viewed as a set of n distinct equations of the form Ax = b. These equations define the matrix X as the inverse of A. To be precise, let X_i denote the ith column of X, and recall that the unit vector e_i is the ith column of I_n. Equation (31.24) can then be solved for X by using the LUP decomposition for A to solve each equation

AX_i = e_i

separately for X_i. Each of the n X_i can be found in time (n²), and so the computation of X from the LUP decomposition of A takes time (n³). Since the LUP decomposition of A can be computed in time (n³), the inverse A^-1 of a matrix A can be determined in time (n³).

Matrix multiplication and matrix inversion

We now show that the theoretical speedups obtained for matrix multiplication translate to speedups for matrix inversion. In fact, we prove something stronger: matrix inversion is equivalent to matrix multiplication, in the following sense. If M(n) denotes the time to multiply two n n matrices and I(n) denotes the time to invert a nonsingular n n matrix, then I(n) = (M(n)). We prove this result in two parts. First, we show that M(n) = O(I(n)), which is relatively easy, and then we prove that I(n) = O(M(n)).

If we can invert an n n matrix in time I(n), where I(n) = (n²) and I(n) satisfies the regularity condition I(3n) = O(I(n)), then we can multiply two n n matrices in time O(I(n)).

Proof Let A and B be n n matrices whose matrix product C we wish to compute. We define the 3n 3n matrix D by

The inverse of D is

and thus we can compute the product AB by taking the upper right n n submatrix of D^-1.

We can construct matrix D in (n²) = O(I(n)) time, and we can invert D in O(I(3n)) = O(I(n)) time, by the regularity condition on I(n). We thus have

M(n) = O(I(n)).

Note that I(n) satisfies the regularity condition whenever I(n) does not have large jumps in value. For example, if I(n) = (n^c lg^d n) for any constants c > 0, d 0, then I(n) satisfies the regularity condition.

Reducing matrix inversion to matrix multiplication

The proof that matrix inversion is no harder than matrix multiplication relies on some properties of symmetric positive-definite matrices that will be proved in Section 31.6.

If we can multiply two n n real matrices in time M(n), where M(n) = (n²) and M(n) = O(M(n + k)) for 0 k n, then we can compute the inverse of any real nonsingular n n matrix in time O(M(n)).

Proof We can assume that n is an exact power of 2, since we have

for any k > 0. Thus, by choosing k such that n + k is a power of 2, we enlarge the matrix to a size that is the next power of 2 and obtain the desired answer A^-1from the answer to the enlarged problem. The regularity condition on M(n) ensures that this enlargement does not cause the running time to increase by more than a constant factor.

For the moment, let us assume that the n n matrix A is symmetric and positive-definite. We partition A into four n/2 n/2 submatrices:

(31.25)

Then, if we let

S = D - CB^-1C^T

(31.26)

be the Schur complement of A with respect to B, we have

(31.27)

since AA^-1 = I_n, as can be verified by performing the matrix multiplication. The matrices B^-1 and S^-1 exist if A is symmetric and positive-definite, by Lemmas 31.13, 31.14, and 31.15 in Section 31.6, because both B and S are symmetric and positive-definite. By Exercise 31.1-3, B^-1C^T = (CB^-1)^T and B^-1C^TS-¹ = (S^-1CB^-1)^T. Equations (31.26) and (31.27) can therefore be used to specify a recursive algorithm involving 4 multiplications of n/2 n/2 matrices:

C  B^-1,

(CB^-1)  C^T,

S^-1 (CB^-1),

(CB^-1)^T  (S^-1 CB^-1).

Since we can multiply n/2 n/2 matrices using an algorithm for n n matrices, matrix inversion of symmetric positive-definite matrices can be performed in time

I(n)  2I(n/2) + 4M(n) + O(n²)

= 2I(n/2) + O(M(n))

= O(M(n)).

It remains to prove that the asymptotic running time of matrix multiplication can be obtained for matrix inversion when A is invertible but not symmetric and positive-definite. The basic idea is that for any nonsingular matrix A, the matrix A^TA is symmetric (by Exercise 31.1-3) and positive-definite (by Theorem 31.6). The trick, then, is to reduce the problem of inverting A to the problem of inverting A^TA.

The reduction is based on the observation that when A is an n n nonsingular matrix, we have

A^-1 = (A^TA)^-1A^T,

since ((A^TA)^-1 A^T)A = (A^TA)^-1(A^TA) = I_n and a matrix inverse is unique. Therefore, we can compute A^-1 by first multiplying A^T by A to obtain A^TA, then inverting the symmetric positive-definite matrix A^TA using the above divide-and-conquer algorithm, and finally multiplying the result by A^T. Each of these three steps takes O(M(n)) time, and thus any nonsingular matrix with real entries can be inverted in O(M(n)) time.

The proof of Theorem 31.12 suggests a means of solving the equation Ax = b without pivoting, so long as A is nonsingular. We multiply both sides of the equation by A^T, yielding (A^TA)x = A^Tb. This transformation doesn't affect the solution x, since A^T is invertible, so we can factor the symmetric positive-definite matrix A^TA by computing an LU decomposition. We then use forward and back substitution to solve for x with the right-hand side A^Tb. Although this method is theoretically correct, in practice the procedure LUP-DECOMPOSITION works much better. LUP decomposition requires fewer arithmetic operations by a constant factor, and it has somewhat better numerical properties.

Exercises

Let M(n) be the time to multiply n n matrices, and let S(n) denote the time required to square an n n matrix. Show that multiplying and squaring matrices have essentially the same difficulty: S(n) = (M(n)).

Let M(n) be the time to multiply n n matrices, and let L(n) be the time to compute the LUP decomposition of an n n matrix. Show that multiplying matrices and computing LUP decompositions of matrices have essentially the same difficulty: L(n) = (M(n)).

Let M(n) be the time to multiply n n matrices, and let D(n) denote the time required to find the determinant of an n n matrix. Show that finding the determinant is no harder than multiplying matrices: D(n) = O(M(n)).

Let M(n) be the time to multiply n n boolean matrices, and let T(n) be the time to find the transitive closure of n n boolean matrices. Show that M(n) = O(T(n)) and T(n) = O(M(n)lg n).

Does the matrix-inversion algorithm based on Theorem 31.12 work when matrix elements are drawn from the field of integers modulo 2? Explain.

Generalize the matrix-inversion algorithm of Theorem 31.12 to handle matrices of complex numbers, and prove that your generalization works correctly. (Hint: Instead of the transpose of A, use the conjugate transpose A*, which is obtained from the transpose of A by replacing every entry with its complex conjugate. Instead of symmetric matrices, consider Hermitian matrices, which are matrices A such that A = A*.)

31.6 Symmetric positive-definite matrices and least-squares approximation

Symmetric positive-definite matrices have many interesting and desirable properties. For example, they are nonsingular, and LU decomposition can be performed on them without our having to worry about dividing by 0. In this section, we shall prove several other important properties of symmetric positive-definite matrices and show an interesting application to curve fitting by a least-squares approximation.

The first property we prove is perhaps the most basic.

Any symmetric positive-definite matrix is nonsingular.

Proof Suppose that a matrix A is singular. Then by Corollary 31.3, there exists a nonzero vector x such that Ax = 0. Hence, x^TAx = 0, and A cannot be positive-definite.

The proof that we can perform LU decomposition on a symmetric positive-definite matrix A without dividing by 0 is more involved. We begin by proving properties about certain submatrices of A. Define the kth leading submatrix of A to be the matrix A_k consisting of the intersection of the first k rows and first k columns of A.

If A is a symmetric positive-definite matrix, then every leading submatrix of A is symmetric and positive-definite.

Proof That each leading submatrix A_k is symmetric is obvious. To prove that A_k is positive-definite, let x be a nonzero column vector of size k, and let us partition A as follows:

since A is positive-definite, and hence A_k is also positive-definite.

We now turn to some essential properties of the Schur complement. Let A be a symmetric positive-definite matrix, and let A_k be a leading k k submatrix of A. Partition A as

(31.28)

Then, the Schur complement of A with respect to Ak is defined to be

(31.29)

(By Lemma 31.14, Ak is symmetric and positive-definite; therefore, exists by Lemma 31.13, and S is well defined.) Note that our earlier definition (31.23) of the Schur complement is consistent with definition (31.29), by letting k = 1.

The next lemma shows that the Schur-complement matrices of symmetric positive-definite matrices are themselves symmetric and positive-definite. This result was used in Theorem 31.12, and its corollary is needed to prove the correctness of LU decomposition for symmetric positive-definite matrices.

If A is a symmetric positive-definite matrix and A_k is a leading data k k submatrix of A, then the Schur complement of A with respect to A_k is symmetric and positive-definite.

Proof That S is symmetric follows from Exercise 31.1-7. It remains to show that S is positive-definite. Consider the partition of A given in equation (31.28).

For any nonzero vector x, we have x^TAx > 0 by assumption. Let us break x into two subvectors y and z compatible with A_k and C, respectively. Because A_k^-1 exists, we have

(31.30)

by matrix magic. (Verify by multiplying through.) This last equation amounts to "completing the square" of the quadractic form. (See Exercise 31.6-2.)

Since x^TAx > 0 holds for any nonzero x, let us pick any nonzero z and then choose , which causes the first term in equation (31.30) to vanish, leaving

as the value of the expression. For any z 0, we therefore have z^TSz = x^TAx > 0, and thus S is positive-definite.

Corollary 31.16

LU decomposition of a symmetric positive-definite matrix never causes a division by 0.

Proof Let A be a symmetric positive-definite matrix. We shall prove something stronger than the statement of the corollary: every pivot is strictly positive. The first pivot is a₁₁. Let e₁ be the first unit vector, from which we obtain . Since the first step of LU decomposition produces the Schur complement of A with respect to A₁ = (a₁₁), Lemma 31.15 implies that all pivots are positive by induction.

Least-squares approximation

Fitting curves to given sets of data points is an important application of symmetric positive-definite matrices. Suppose that we are given a set of m data points

(x₁,y₁),(x₂,y₂),...,(x_m,y_m),

where the y_i are known to be subject to measurement errors. We would like to determine a function F (x) such that

y_i = F(x_i) + _i,

(31.31)

for i = 1, 2, . . . , m, where the approximation errors i are small. The form of the function F depends on the problem at hand. Here, we assume that it has the form of a linearly weighted sum,

where the number of summands n and the specific basis functions f_j are chosen based on knowledge of the problem at hand. A common choice is f_j(x) = x^j-1, which means that

F(x) = c₁ + c₂x + c₃x² +    + c_nxⁿ-1

is a polynomial of degree n - 1 in x.

By choosing n = m, we calculate each y_i exactly in equation (31.31). Such a high-degree F "fits the noise" as well as the data, however, and generally gives poor results when used to predict y for previously unseen values of x. It is usually better to choose n significantly smaller than m and hope that by choosing the coefficients c_j well, we can obtain a function F that finds the significantl patterns in the data points without paying undue attention to the noise. Some theoretical principles exist for choosing n, but they are beyond the scope of this text. In any case, once n is chosen, we end up with an overdetermined set of equations that we wish to solve as well as we can. We now show how this can be done.

denote the matrix of values of the basis functions at the given points; that is, a_ij = f_j(x_i). Let c = (c_k) denote the desired size-n vector of coefficients. Then,

is the size-m vector of "predicted values" for y. Thus,

 = Ac - y

is the size-m vector of approximation errors.

To minimize approximation errors, we choose to minimize the norm of the error vector , which gives us a least-squares solution, since

we can minimize |||| by differentiating ||||² with respect to each c_k and then setting the result to 0:

(31.32)

The n equations (31.32) for k = 1, 2, . . . , n are equivalent to the single matrix equation

(Ac -y)^TA = 0

or, equivalently (using Exercise 31.1-3), to

A^T(Ac - y) = 0 ,

A^TAc = A^Ty .

(31.33)

In statistics, this is called the normal equation. The matrix A^T A is symmetric by Exercise 31.1-3, and if A has full column rank, then A^TA is positive-definite as well. Hence, (A^TA)^-1 exists, and the solution to equation (31.33) is

c = ((A^TA)^-1A^T)y

= A⁺y ,

(31.34)

where the matrix A⁺ = ((A^TA)^-1A^T) is called the pseudoinverse of the matrix A. The pseudoinverese is a natural generalization of the notion of a matrix inverse to the case in which A is nonsquare. (Compare equation (31.34) as the approximate solution to Ac = y with the solution A^-1b as the exact solution to Ax = b.)

As an example of producing a least-squares fit, suppose that we have 5 data points

(-1,2),(1,1),(2,1),(3,0),(5,3),

shown as black dots in Figure 31.3. We wish to fit these points with a quadratic polynomial

F(x) = c₁ + c₂x + c₃x².

We start with the matrix of basis-function values

Figure 31.3 The least-squares fit of a quadratic polynomial to the set of data points {(-1, 2), (1, 1), (2, 1),(3, 0), (5, 3)}. The black dots are the data points, and the white dots are their estimated values predicted by the polynomial F(x) = 1.2 - 0.757x + 0.214x², the quadratic polynomial that minimizes the sum of the squared errors. The error for each data point is shown as a shaded line.

whose pseudoinverse is

Multiplying y by A⁺, we obtain the coefficient vector

which corresponds to the quadratic polynomial

F(x) = 1.200 - 0.757x + 0.214x²

as the closest-fitting quadratic to the given data, in a least-squares sense.

As a practical matter, we solve the normal equation (31.33) by multiplying y by A^T and then finding an LU decomposition of A^T A. If A has full rank, the matrix A^T A is guaranteed to be nonsingular, because it is symmetric and positive-definite. (See Exercise 31.1-3 and Theorem 31.6.)

Exercises

Prove that every diagonal element of a symmetric positive-definite matrix is positive.

Let be a 2 2 symmetric positive-definite matrix. Prove that its determinant ac - b² is positive by "completing the square" in a manner similar to that used in the proof of Lemma 31.15.

Prove that the maximum element in a symmetric positive-definite matrix lies on the diagonal.

Prove that the determinant of each leading submatrix of a symmetric positive-definite matrix is positive.

Let A_k denote the kth leading submatrix of a symmetric positive-definite matrix A. Prove that det(A_k)/det(A_k-1) is the kth pivot during LU decomposition, where by convention det(A₀) = 1.

Find the function of the form

F(x) = c₁ + c₂x lg x + c₃e^x

that is the best least-squares fit to the data points

(1,1),(2,1),(3,3),(4,8) .

Show that the pseudoinverse A⁺ satisfies the following four equations:

AA⁺ A =  A ,

A⁺ AA⁺=  A⁺,

(AA⁺)^T=  AA⁺,

(A⁺A)^T=  A⁺A .

Problems

31-1 Shamir's boolean matrix multiplication algorithm

In Section 31.3, we observed that Strassen's matrix-multiplication algorithm cannot be applied directly to boolean matrix multiplication because the boolean quasiring Q = ({0, 1}, , , 0, 1) is not a ring. Theorem 31.10 showed that if we used arithmetic operations on words of O(lg n) bits, we could nevertheless apply Strassen's method to multiply n n boolean matrices in O(n^{lg 7}) time. In this problem, we investigate a probabilistic method that uses only bit operations to achieve nearly as good a bound but with some small chance of error.

a. Show that R = ({0, 1}, , , 0, 1), where is the XOR (exclusive-or) function, is a ring.

Let A = (a_ij) and B = (b_ij) be n n boolean matrices, and let C = (c_ij) = AB in the quasiring Q. Generate A' = (a'_ij) from A using the following randomized procedure:

If a_ij = 0, then let a'_ij = 0.

If A_ij = 1, then let a'_ij = 1 with probability 1/2 and let a'_ij = 0 with probability 1/2. The random choices for each entry are independent.

b. Let C' = (c'_ij) = A'B in the ring R. Show that c_ij = 0 implies c'_ij = 0. Show that c_ij = 1 implies c'_ij = 1 with probability 1/2.

c. Show that for any > 0, the probability is at most /n² that a given c'_ij never takes on the value c_ij for lg(n²/) independent choices of the matrix A'. Show that the probability is at least 1 - that all c'_ij take on their correct values at least once.

d. Give an O(n^{lg 7}lg n)-time randomized algorithm that computes the product in the boolean quasiring Q of two n n matrices with probability at least 1 - 1/n^k for any constant k > 0. The only operations permitted on matrix elements are , , and .

31-2 Tridiagonal systems of linear equations

Consider the tridiagonal matrix

a. Find an LU decomposition of A.

b. Solve the equation Ax = (1 1 1 1 1)^T by using forward and back substitution.

c. Find the inverse of A.

d. Show that for any n n symmetric, positive-definite, tridiagonal matrix A and any n-vector b, the equation Ax = b can be solved in O(n) time by performing an LU decomposition. Argue that any method based on forming A^-1 is asymptotically more expensive in the worst case.

e. Show that for any n n nonsingular, tridiagonal matrix A and any n-vector b, the equation Ax = b can be solved in O(n) time by performing an LUP decomposition.

A practical method for interpolating a set of points with a curve is to use cubic splines. We are given a set {(x_i, y_i): i = 0,1, . . . , n} of n + 1 point-value pairs, where x₀ < x₁ < < x_n. We wish to fit a piecewise-cubic curve (spline) f(x) to the points. That is, the curve f(x) is made up of n cubic polynomials f_i(x) = a_i + b_ix + c_ix² + d_ix³ for i = 0,1, . . . , n - 1, where if x falls in the range x_i x x_i+ 1, then the value of the curve is given by f(x) = f_i(x - x_i). The points x_i at which the cubic polynomials are "pasted" together are called knots. For simplicity, we shall assume that x_i = i for i = 0,1, . . . , n.

To ensure continuity of f(x), we require that

f(x_i)   =  f_i(0)  =  y_i,

f(x_i+1) =  f_i(1)  =  y_i+1

for i = 0, 1, . . . , n - 1. To ensure that f(x) is sufficiently smooth, we also insist that there be continuity of the first derivative at each knot:

for i = 0, 1, . . . , n - 1.

a. Suppose that for i = 0, 1, . . . , n, we are given not only the point-value pairs {(x_i, y_i)} but also the first derivatives D_i = f'(x_i) at each knot. Express each coefficient a_i, b_i, c_i, and d_i in terms of the values y_i, y_i+1, D_i, and D_i+1. (Remember that x_i = i.) How quickly can the 4n coefficients be computed from the point-value pairs and first derivatives?

The question remains of how to choose the first derivatives of f(x) at the knots. One method is to require the second derivatives to be continuous at the knots:

for i = 0, 1, . . . ,n - 1. At the first and last knots, we assume that ; these assumptions make f(x) a natural cubic spline.

b. Use the continuity constraints on the second derivative to show that for i = 1, 2, . . . , n - 1,

D_i- 1 + 4D_i + D_i+ 1 = 3(y_i+ 1 - y_i- 1) .

(31.35)

2D₀+ D₁ = 3(y₁ - y₀) ,

(31.36)

D_n- 1 + 2D_n = 3(y_n - y_n- 1) .

(31.37)

d. Rewrite equations (31.35)--(31.37) as a matrix equation involving the vector D = D₀, D₁, . . . , D_n of unknowns. What attributes does the matrix in your equation have?

e. Argue that a set of n + 1 point-value pairs can be interpolated with a natural cubic spline in O(n) time (see Problem 31-2).

f. Show how to determine a natural cubic spline that interpolates a set of n + 1 points (x_i, y_i) satisfying x₀ < x₁ < < x_n, even when x_i is not necessarily equal to i. What matrix equation must be solved, and how quickly does your algorithm run?

Chapter notes

There are many excellent texts available that describe numerical and scientific computation in much greater detail than we have room for here. The following are especially readable: George and Liu [81], Golub and Van Loan [89], Press, Flannery, Teukolsky, and Vetterling[161, 162], and Strang[181, 182].

The publication of Strassen's algorithm in 1969 [183] caused much excitement. Before then, it was hard to imagine that the naive algorithm could be improved upon. The asymptotic upper bound on the difficulty of matrix multiplication has since been considerably improved. The most asymptotically efficient algorithm for multiplying n n matrices to date, due to Coppersmith and Winograd [52], has a running time of O(n^2.376). The graphical presentation of Strassen's algorithm is due to Paterson [155]. Fischer and Meyer [67] adapted Strassen's algorithm to boolean matrices (Theorem 31.10) .

Gaussian elimination, upon which the LU and LUP decompositions are based, was the first systematic method for solving linear systems of equations. It was also one of the earliest numerical algorithms. Although it was known earlier, its discovery is commonly attributed to C. F. Gauss (1777-1855). In his famous paper [183], Strassen also showed that an n n matrix can be inverted in O(n^{lg 7}) time. Winograd [203] originally proved that matrix multiplication is no harder than matrix inversion, and the converse is due to Aho, Hopcroft, and Ullman [4].

Strang [182] has an excellent presentation of symmetric positive-definite matrices and on linear algebra in general. He makes the following remark on page 334: "My class often asks about unsymmetric positive definite matrices. I never use that term."

Go to Chapter 32 Back to Table of Contents