Skip to main content

Section 8 Matrix Operations

Subsection Application: Algorithms for Matrix Multiplication

Matrix multiplication is widely used in applications ranging from scientific computing and pattern recognition to counting paths in graphs. As a consequence, much work is being done in developing efficient algorithms for matrix multiplication.

We will see that a matrix product can be calculated through the row-column method. Recall that the product of two \(2 \times 2\) matrices \(A = \left[ \begin{array}{cc} a_{11}\amp a_{12}\\a_{21}\amp a_{22} \end{array} \right]\) and \(B = \left[ \begin{array}{cc} b_{11}\amp b_{12}\\b_{21}\amp b_{22} \end{array} \right]\) is given by

\begin{equation*} AB = \left[ \begin{array}{cc} a_{11}b_{11}+a_{12}b_{21} \amp a_{11}b_{12}+a_{12}b_{22} \\ a_{21}b_{11}+a_{22}b_{21} \amp a_{21}b_{12}+a_{22}b_{22} \end{array} \right]\text{,} \end{equation*}

This product involves eight scalar multiplications and some scalar additions. As we will see, multiplication is more computationally expensive than addition, so we will focus on multiplication. In 1969, a German mathematician named Volker Strassen showed 13  that the product of two \(2 \times 2\) matrices can be calculated using only seven multiplications. While this is not much of an improvement, the Strassen algorithm can be applied to larger matrices, using matrix partitions (which allow for parallel computation), and its publication led to additional research on faster algorithms for matrix multiplication. More details are provided later in this section.

Subsection Introduction

A vector is a list of numbers in a specified order and a matrix is an ordered array of objects. In fact, a vector can be thought of as a matrix of size \(n \times 1\text{.}\) Vectors and matrices are so alike in this way that it would seem natural that we can define operations on matrices just as we did with vectors.

Recall that a matrix is made of rows and columns — the entries reading from left to right form the rows of the matrix and the entries reading from top to bottom form the columns. The number of rows and columns of a matrix is called the size of the matrix, so an \(m \times n\) matrix has \(m\) rows and \(n\) columns. If we label the entry in the \(i\)th row and \(j\)th column of a matrix \(A\) as \(a_{ij}\text{,}\) then we write \(A = [a_{ij}]\text{.}\)

We can generalize the operations of addition and scalar multiplication on vectors to matrices similarly. Given two matrices \(A=[a_{ij}]\) and \(B=[b_{ij}]\) of the same size, we define the sum \(A+B\) by

\begin{equation*} A+B= [ a_{ij}+b_{ij} ] \end{equation*}

when the sizes of the matrices \(A\) and \(B\) match. In other words, for matrices of the same size the matrix addition is defined by adding corresponding entries in the matrices. For example,

\begin{equation*} \left[ \begin{array}{rc} 1 \amp 2 \\ -2 \amp 3 \end{array} \right] + \left[ \begin{array}{cc} 1 \amp 3 \\ 2 \amp 4 \end{array} \right] = \left[ \begin{array}{cc} 2 \amp 5 \\ 0 \amp 7 \end{array} \right] \,\text{.} \end{equation*}

We define the scalar multiple of a matrix \(A=[a_{ij}]\) by scalar \(c\) to be the matrix \(cA\) defined by

\begin{equation*} cA= [ ca_{ij}]\text{,} \end{equation*}

This means that we multiply each entry of the matrix \(A\) by the scalar \(c\text{.}\) As an example,

\begin{equation*} 3 \left[ \begin{array}{rc} 1 \amp 2 \\ -2 \amp 3 \end{array} \right] = \left[ \begin{array}{rc} 3 \amp 6 \\ -6 \amp 9 \end{array} \right] \,\text{.} \end{equation*}

Even though we did not have a multiplication operation on vectors, we had a matrix-vector product, which is a special case of a matrix-matrix product since a vector is a matrix with one column. However, generalizing the matrix-vector product to a matrix-matrix product is not immediate as it is not immediately clear what we can do with the other columns. We will consider this question in this section.

Note that all of the matrix operations can be performed on a calculator. After entering each matrix in the calculator, just use \(+\text{,}\) \(-\) and × operations to find the result of the matrix operation. (Just for fun, try using \(\div\) with matrices to see if it will work.)

Preview Activity 8.1.

(a)

Pick three different varying sizes of pairs of \(A, B\) matrices which can be added. For each pair:

(i)

Find the matrices \(A+B\) and \(B+A\text{.}\)

(ii)

How are the two matrices \(A+B\) and \(B+A\) related? What does this tell us about matrix addition?

(b)

Let \(A = \left[ \begin{array}{rc} 1 \amp 0 \\ -2 \amp 8 \end{array} \right]\text{,}\) \(B = \left[ \begin{array}{cc} 1 \amp 1 \\ 3 \amp 4 \end{array} \right]\text{,}\) and \(C = \left[ \begin{array}{cr} 0 \amp -5 \\ 1 \amp 6 \end{array} \right]\text{.}\) Determine the entries of the matrix \(A + 2B - 7C\text{.}\)

(c)

Now we turn to multiplication of matrices. Our first goal is to find out what conditions we need on the sizes of matrices \(A\) and \(B\) if the matrix-matrix product \(AB\) is defined and what the size of the resulting product matrix is. We know the condition and the size of the result in the special case of \(B\) being a vector, i.e., a matrix with one column. So our conjectures for the general case should match what we know in the special case. In each part of this problem, use any appropriate tool (e.g., your calculator, Maple, Mathematica, Wolfram\(|\)Alpha) to determine the matrix product \(AB\text{,}\) if it exists. If you obtain a product, write it down and explain how its size is related to the sizes of \(A\) and \(B\text{.}\) If you receive an error, write down the error and guess why the error occurred and/or what it means.

(i)

\(A = \left[ \begin{array}{ccc} 1\amp 2\amp 0 \\ 0\amp 1\amp 1 \end{array} \right] \ \ \ \text{ and } \ \ \ B = \left[ \begin{array}{crc} 3\amp 5\amp 0\\0\amp -2\amp 1 \end{array} \right]\)

(ii)

\(A = \left[ \begin{array}{ccc} 1\amp 2\amp 0 \\ 0\amp 1\amp 1 \end{array} \right] \ \ \ \text{ and } \ \ \ B = \left[ \begin{array}{crc} 3\amp 0\\5\amp -2 \\ 0\amp 1 \end{array} \right]\)

(iii)

\(A = \left[ \begin{array}{cc} 1 \amp 2 \\ 3 \amp 4 \end{array} \right] \ \ \ \text{ and } \ \ \ B = \left[ \begin{array}{ccc} 1 \amp 1 \amp 1 \\1 \amp 0 \amp 1 \\ 0 \amp 2 \amp 0 \end{array} \right]\)

(iv)

\(A = \left[ \begin{array}{cc} 1 \amp 2 \\ 3 \amp 4 \\ 5 \amp 6 \\ 7 \amp 8 \end{array} \right] \ \ \ \text{ and } \ \ \ B = \left[ \begin{array}{rcc} 1 \amp 2 \amp 3 \\ -1 \amp 1 \amp 1 \end{array} \right]\)

(v)

Make a guess for the condition on the sizes of two matrices \(A, B\) for which the product \(AB\) is defined. How is the size of the product matrix related to the sizes of \(A\) and \(B\text{?}\)

(d)

The final matrix products in part c, when defined, might seem unrelated to the individual matrices at first. In this problem, we will uncover this relationship using our knowledge of the matrix-vector product. Let \(A = \left[ \begin{array}{rr} 3 \amp -1 \\ -2 \amp 3 \end{array} \right]\) and \(B = \left[ \begin{array}{ccc} 0 \amp 2 \amp 1 \\ 1 \amp 3 \amp 2 \end{array} \right]\text{.}\)

(i)

Calculate \(AB\) using any tool.

(ii)

Using the matrix-vector product, calculate \(A \vx\) where \(\vx\) is the first column (i.e., calculate \(A\begin{bmatrix}0 \\ 1 \end{bmatrix}\)), and then the second column of \(B\) (i.e., calculate \(A\begin{bmatrix}2 \\ 3 \end{bmatrix}\)), and then the third column of \(B\) (i.e., calculate \(A\begin{bmatrix}1 \\ 2 \end{bmatrix}\)). Do you notice these output vectors within \(AB\text{?}\)

(iii)

Describe as best you can a definition of \(AB\) using the matrix-vector product.

Subsection Properties of Matrix Addition and Multiplication by Scalars

Just as we were able to define an algebra of vectors with addition and multiplication by scalars, we can define an algebra of matrices. We will see that the properties of these operations on matrices are immediate generalizations of the properties of the operations on vectors. We will then see how the matrix product arises through the connection of matrices to linear transformations. Finally, we define the transpose of a matrix. The transpose of a matrix will be useful in applications such as graph theory and least-squares fitting of curves, as well as in advanced topics such as inner product spaces and the dual space of a vector space.

We learned in Preview Activity 8.1 that we can add two matrices of the same size together by adding corresponding entries and we can multiply any matrix by a scalar by multiplying each entry of the matrix by that scalar. More generally, if \(A = [a_{ij}]\) and \(B = [b_{ij}]\) are \(m \times n\) matrices and \(c\) is any scalar, then

\begin{equation*} A + B = [a_{ij}+b_{ij}] \ \ \text{ and } \ \ cA = [ca_{ij}]\text{.} \end{equation*}

As we have done each time we have introduced a new operation, we ask what properties the operation has. For example, you determined in Preview Activity 8.1 that addition of matrices is a commutative operation. More specifically, for every two \(m\times n\) matrices \(A\) and \(B\text{,}\) \(A+B=B+A\text{.}\) We can use similar arguments to verify the following properties of matrix addition and multiplication by scalars. Notice that these properties are very similar to the properties of addition and scalar multiplication of vectors we discussed earlier. This should come as no surprise since the \(n\)-dimensional vectors are \(n\times 1\) matrices. In a strange twist, we will see that matrices themselves can be considered as vectors when we discuss vector spaces in a later section.

Later on, we will see that these properties define the set of all \(m \times n\) matrices as a vector space. These properties just say that, regarding addition and multiplication by scalars, we can manipulate matrices just as we do real numbers. Note, however, we have not yet defined an operation of multiplication on matrices. That is the topic for the next section.

Subsection A Matrix Product

As we saw in Preview Activity 8.1, a matrix-matrix product can be found in a way which makes use of and also generalizes the matrix-vector product.

Definition 8.2.

The matrix product of a \(k \times m\) matrix \(A\) and an \(m \times n\) matrix \(B = [\vb_1 \ \vb_2 \ \cdots \ \vb_n]\) with columns \(\vb_1\text{,}\) \(\vb_2\text{,}\) \(\ldots\text{,}\) \(\vb_n\) is the \(k \times n\) matrix

\begin{equation*} [A\vb_1 \ A\vb_2 \ \cdots \ A\vb_n]\text{.} \end{equation*}

We now consider the motivation behind this definition by thinking about the matrix transformations corresponding to each of the matrices \(A, B\) and \(AB\text{.}\) Recall that left multiplication by an \(m \times n\) matrix \(B\) defines a transformation \(T\) from \(\R^n\) to \(\R^m\) by \(T(\vx)=B\vx\text{.}\) The domain of \(T\) is \(\R^n\) because the number of components of \(\vx\) have to match the number of entries in each of row of \(B\) in order for the matrix-vector product \(B\vx\) to be defined. Similarly, a \(k \times m\) matrix \(A\) defines a transformation \(A\) from \(\R^m\) to \(\R^k\text{.}\) Since transformations are functions, we can compose them as long as the output vectors of the inside transformation lie in the domain of the outside transformation. Therefore if \(T\) is the inside transformation and \(S\) is the outside transformation, the composition \(S\circ T\) is defined. So a natural question to ask is if we are given

  • a transformation \(T\) from \(\R^n\) to \(\R^m\) where \(T(\vx) = B \vx\) for an \(m \times n\) matrix \(B\) and

  • a transformation \(S\) from \(\R^m\) to \(\R^k\) with \(S(\vy) = A \vy\) for some \(k \times m\) matrix \(A\text{,}\)

is there a matrix that represents the transformation \(S \circ T\) defined by \((S\circ T)(\vx)=S(T(\vx))\text{?}\) We investigate this question in the next activity in the special case of a \(2\times 3\) matrix \(A\) and a \(3\times 2\) matrix \(B\text{.}\)

Activity 8.2.

In this activity, we look for the meaning of the matrix product from a transformation perspective. Let \(S\) and \(T\) be matrix transformations defined by

\begin{equation*} S(\vy) = A \vy \ \ \ \text{ and } \ \ \ T(\vx) = B \vx\text{,} \end{equation*}

where

\begin{equation*} A = \left[ \begin{array}{ccc} 1\amp 2\amp 0 \\ 0\amp 1\amp 1 \end{array} \right] \ \ \ \text{ and } \ \ \ B = \left[ \begin{array}{cr} 3\amp 0\\5\amp -2 \\ 0\amp 1 \end{array} \right]\text{.} \end{equation*}
(a)

What are the domains and codomains of \(S\) and \(T\text{?}\) Why is the composite transformation \(S \circ T\) defined? What is the domain of \(S\circ T\text{?}\) What is the codomain of \(S\circ T\text{?}\) (Recall that \(S \circ T\) is defined by \((S \circ T)(\vx) = S(T(\vx))\text{,}\) i.e., we substitute the output \(T(\vx)\) as the input into the transformation \(S\text{.}\))

(b)

Let \(\vx = \left[ \begin{array}{c} x \\ y \end{array} \right]\text{.}\) Determine the components of \(T(\vx)\text{.}\)

(c)

Find the components of \(S\circ T(\vx)=S(T(\vx))\text{.}\)

(d)

Find a matrix \(C\) so that \(S(T(\vx)) = C\vx\text{.}\)

(e)

Use the definition of composition of transformations and the definitions of the \(S\) and \(T\) transformations to explain why it is reasonable to define \(AB\) to be the matrix \(C\text{.}\) Does the matrix \(C\) agree with the

\begin{equation*} AB = \left[ \begin{array}{cr} 13 \amp -4 \\ 5 \amp -1 \end{array} \right] \end{equation*}

you found in Preview Activity 8.1 using technology?

We now consider this result in the general case of a \(k\times m\) matrix \(A\) and an \(m \times n\) matrix \(B\text{,}\) where \(A\) and \(B\) define matrix transformations \(S\) and \(T\text{,}\) respectively. In other words, \(S\) and \(T\) are matrix transformations defined by \(S(\vx) = A\vx\) and \(T(\vx) = B\vx\text{.}\) The domain of \(S\) is \(\R^m\) and the codomain is \(\R^k\text{.}\) The domain of \(T\) is \(\R^n\) and the codomain is \(\R^m\text{.}\) The composition \(S\circ T\) is defined because the output vectors of \(T\) are in \(\R^m\) and they lie in the domain of \(S\text{.}\) The domain of \(S\circ T\) is the same as the domain of \(T\) since the input vectors first go through the \(T\) transformation. The codomain of \(S\circ T\) is the same as the codomain of \(S\) since the final output vectors are produced by applying the \(S\) transformation.

Let us see how we can obtain the matrix corresponding to the transformation \(S\circ T\text{.}\) Let \(B = \left[ \vb_1 \ \vb_2 \ \cdots \ \vb_n \right]\text{,}\) where \(\vb_j\) is the \(j\)th column of \(B\text{,}\) and let \(\vx = \left[ \begin{array}{c} x_1 \\ x_2 \\ \vdots \\ x_n \end{array} \right]\text{.}\) Recall that the matrix vector product \(B\vx\) is the linear combination of the columns of \(B\) with the corresponding weights from \(\vx\text{.}\) So

\begin{equation*} T(\vx) = B\vx = x_1 \vb_1 + x_2 \vb_2 + \cdots + x_n \vb_n\text{.} \end{equation*}

Note that each of the \(\vb_j\) vectors are in \(\R^m\) since \(B\) is an \(m\times n\) matrix. Therefore, each of these vectors can be multiplied by matrix \(A\) and we can evaluate \(S(B\vx)\text{.}\) Therefore, \(S\circ T\) is defined and

\begin{equation} (S \circ T)(\vx) = S(T(\vx))= A(B\vx) = A\left( x_1 \vb_1 + x_2 \vb_2 + \cdots + x_n \vb_n\right)\text{.}\tag{8.1} \end{equation}

The properties of matrix-vector products show that

\begin{equation} A\left( x_1 \vb_1 + x_2 \vb_2 + \cdots + x_n \vb_n\right) = x_1 A\vb_1 + x_2 A\vb_2 + \cdots + x_n A\vb_n\text{.}\tag{8.2} \end{equation}

This expression is a linear combination of \(A\vb_i\)'s with \(x_i\)'s being the weights. Therefore, if we let \(C\) be the matrix with columns \(A \vb_1\text{,}\) \(A\vb_2\text{,}\) \(\ldots\text{,}\) \(A\vb_n\text{,}\) that is

\begin{equation*} C = [A \vb_1 \ A\vb_2 \ \cdots \ A\vb_n]\text{,} \end{equation*}

then

\begin{equation} x_1 A\vb_1 + x_2 A\vb_2 + \cdots + x_n A\vb_n = C \vx\tag{8.3} \end{equation}

by definition of the matrix-vector product. Combining equations (8.1), (8.2), and (8.3) shows that

\begin{equation*} (S \circ T)(\vx) = C \vx \end{equation*}

where \(C = [A \vb_1 \ A\vb_2 \ \cdots \ A\vb_n]\text{.}\)

Also note that since \(T(\vx)=B\vx\) and \(S(\vy)=A\vy\text{,}\) we find

\begin{equation} (S\circ T)(\vx)= S(T(\vx))= S(B\vx)=A(B(\vx)) \,\text{.}\tag{8.4} \end{equation}

Since the matrix representing the transformation \(S\circ T\) is the matrix

\begin{equation*} [A \vb_1 \ A\vb_2 \ \cdots \ A\vb_n] \end{equation*}

where \(\vb_1\text{,}\) \(\vb_2\text{,}\) \(\ldots\text{,}\) \(\vb_n\) are the columns of the matrix \(B\text{,}\) it is natural to define \(AB\) to be this matrix in light of equation (8.4).

Matrix multiplication has some properties that are unfamiliar to us as the next activity illustrates.

Activity 8.3.

Let \(A~=~\left[ \begin{array}{rr} 3 \amp -1 \\ -2 \amp 6 \end{array} \right]\text{,}\) \(B~=~\left[ \begin{array}{cc} 0 \amp 2 \\ 1 \amp 3 \end{array} \right]\text{,}\) \(C~=~\left[ \begin{array}{cc} 1 \amp 1 \\ 1 \amp 1 \end{array} \right]\text{,}\) \(D~=~\left[ \begin{array}{rr} 3 \amp -3 \\ -3 \amp 3 \end{array} \right]\) and \(E~=~\left[ \begin{array}{cc} 1 \amp 0 \\ 0 \amp 1 \end{array} \right]\text{.}\)

(a)

Find the indicated products (by hand or using a calculator).

\begin{equation*} AB \qquad BA \qquad DC \qquad AC \qquad BC \qquad AE \qquad EB \end{equation*}
(b)

Is matrix multiplication commutative? Explain.

(c)

Is there an identity element for matrix multiplication? In other words, is there a matrix \(I\) for which \(AI=IA=A\) for any matrix \(A\text{?}\) Explain.

(d)

If \(a\) and \(b\) are real numbers with \(ab=0\text{,}\) then we know that either \(a=0\) or \(b=0\text{.}\) Is this same property true with matrix multiplication? Explain.

(e)

If \(a\text{,}\) \(b\text{,}\) and \(c\) are real numbers with \(c \neq 0\) and \(ac = bc\text{,}\) we know that \(a=b\text{.}\) Is this same property true with matrix multiplication? Explain.

As we saw in Activity 8.3, there are matrices \(A, B\) for which \(AB\neq BA\text{.}\) On the other hand, there are matrices for which \(AB=BA\text{.}\) For example, this equality will always hold for a square matrix \(A\) and if \(B\) is the identity matrix of the same size. It also holds if \(A=B\text{.}\) If the equality \(AB=BA\) holds, we say that matrices \(A\) and \(B\) commute. So the identity matrix commutes with all square matrices of the same size and every matrix \(A\) commutes with \(A^k\) for any power \(k\text{.}\)

There is an alternative method of calculating a matrix product that we will often use that we illustrate in the next activity. This alternate version depends on the product of a row matrix with a vector. Suppose \(A = [a_1 \ a_2 \ \cdots \ a_n]\) is a \(1 \times n\) matrix and \(\vx = \left[ \begin{array}{c} x_1 \\ x_2 \\ \vdots \\ x_n \end{array} \right]\) is an \(n \times 1\) vector. Then the product \(A \vx\) is the \(1 \times 1\) vector

\begin{equation*} [a_1 \ a_2 \ \cdots \ a_n]\left[ \begin{array}{c} x_1 \\ x_2 \\ \vdots \\ x_n \end{array} \right] = [a_1x_1+a_2x_2 + \cdots + a_nx_n]\text{.} \end{equation*}

In this situation, we usually identify the \(1 \times 1\) matrix with its scalar entry and write

\begin{equation} [a_1 \ a_2 \ \cdots \ a_n] \cdot \left[ \begin{array}{c} x_1 \\ x_2 \\ \vdots \\ x_n \end{array} \right] = a_1x_1+a_2x_2 + \cdots + a_nx_n\text{.}\tag{8.5} \end{equation}

The product \(\cdot\) in (8.5) is called the scalar or dot product of \([a_1 \ a_2 \ \cdots \ a_n]\) with \(\left[ \begin{array}{c} x_1 \\ x_2 \\ \vdots \\ x_n \end{array} \right]\text{.}\)

Activity 8.4.

Let \(A = \left[ \begin{array}{crr} 1\amp -1\amp 2 \\ 3\amp 0\amp -4 \\ 2\amp -5\amp 1 \end{array} \right]\) and \(B = \left[ \begin{array}{cr} 4\amp -2 \\ 6\amp 0 \\ 1\amp 3 \end{array} \right]\text{.}\)

Let \(\va_i\) be the \(i\)th row of \(A\) and \(\vb_j\) the \(j\)th column of \(B\text{.}\) For example, \(\va_1=[ \, 1 \; -1 \; 2 \, ]\) and \(\vb_2 = \left[ \begin{array}{r} -2 \\ 0 \\ 3 \end{array} \right]\text{.}\)

Calculate the entries of the matrix \(C\text{,}\) where

\begin{equation*} C = \left[ \begin{array}{cc} \va_1 \cdot \vb_1 \amp \va_1 \cdot \vb_2 \\ \va_2 \cdot \vb_1 \amp \va_2 \cdot \vb_2 \\ \va_3 \cdot \vb_1 \amp \va_3 \cdot \vb_2 \end{array} \right]\,\text{,} \end{equation*}

where \(\va_i \cdot \vb_j\) refers to the scalar product of row \(i\) of \(A\) with column \(j\) of \(B\text{.}\) 14  Compare your result with the result of \(AB\) calculated via the product of \(A\) with the columns of \(B\text{.}\)

Activity 8.4 shows that these is an alternate way to calculate a matrix product. To see how this works in general, let \(A = [a_{ij}]\) be a \(k \times m\) matrix and \(B = [\vb_1 \ \vb_2 \ \cdots \ \vb_n]\) an \(m \times n\) matrix. We know that

\begin{equation*} AB = [A\vb_1 \ A\vb_2 \ \cdots \ A\vb_n]\text{.} \end{equation*}

Now let \(\vr_1\text{,}\) \(\vr_2\text{,}\) \(\ldots\text{,}\) \(\vr_k\) be the rows of \(A\) so that \(A = \left[ \begin{array}{c} \vr_1 \\ \vr_2 \\ \vdots \\ \vr_k \end{array} \right]\text{.}\) First we argue that if \(\vx = \left[ \begin{array}{c} x_1 \\ x_2 \\ \vdots \\ x_m \end{array} \right]\text{,}\) then

\begin{equation*} A \vx = \left[ \begin{array}{c} \vr_1 \cdot \vx \\ \vr_2 \cdot \vx \\ \vdots \\ \vr_k \cdot \vx \end{array} \right]\text{.} \end{equation*}

This is the scalar product (or dot product) definition of the matrix-vector product.

To show that this definition gives the same result as the linear combination definition of matrix-vector product, we first let \(A = [\vc_1 \ \vc_2 \ \cdots \ \vc_m]\text{,}\) where \(\vc_1\text{,}\) \(\vc_2\text{,}\) \(\ldots\text{,}\) \(\vc_m\) are the columns of \(A\text{.}\) By our linear combination definition of the matrix-vector product, we obtain

\begin{align*} A \vx \amp = x_1\vc_1 + x_2 \vc_2 + \cdots + x_m \vc_m\\ \amp = x_1 \left[ \begin{array}{c} a_{11}\\ a_{21}\\ \vdots\\ a_{k1} \end{array} \right] + x_2 \left[ \begin{array}{c} a_{12}\\ a_{22}\\ \vdots\\ a_{k2} \end{array} \right] + \cdots + x_m \left[ \begin{array}{c} a_{1m}\\ a_{2m}\\ \vdots\\ a_{km} \end{array} \right]\\ \amp = \left[ \begin{array}{c} a_{11}x_1+a_{12}x_2+ \cdots + a_{1m}x_m\\ a_{21}x_1+a_{22}x_2+ \cdots + a_{2m}x_m\\ \vdots\\ a_{k1}x_1+a_{k2}x_2+ \cdots + a_{km}x_m \end{array} \right]\\ \amp = \left[ \begin{array}{c} \vr_1 \cdot \vx\\ \vr_2 \cdot \vx\\ \vdots\\ \vr_k \cdot \vx \end{array} \right]\text{.} \end{align*}

Therefore, the above work shows that both linear combination and scalar product definitions give the same matrix-vector product.

Applying this to the matrix product \(AB\) defined in terms of the matrix-vector product, we see that

\begin{equation*} A \vb_j = \left[ \begin{array}{c} \vr_1 \cdot \vb_j \\ \vr_2 \cdot \vb_j \\ \vdots \\ \vr_k \cdot \vb_j \end{array} \right]\text{.} \end{equation*}

So the \(i,j\)th entry of the matrix product \(AB\) is found by taking the scalar product of the \(i\)th row of \(A\) with the \(j\)th column of \(B\text{.}\) In other words,

\begin{equation*} (AB)_{ij} = \vr_i \cdot \vb_j \end{equation*}

where \(\vr_i\) is the \(i\)th row of \(A\) and \(\vb_j\) is the \(j\)th column of \(B\text{.}\)

Subsection Properties of Matrix Multiplication

Activity 8.3 shows that we must be very careful not to assume that matrix multiplication behaves like multiplication of real numbers. However, matrix multiplication does satisfy some familiar properties. For example, we now have an addition and multiplication of matrices under certain conditions, so we might ask if matrix multiplication distributes over matrix addition. To answer this question we take two arbitrary \(k \times m\) matrices \(A\) and \(B\) and an arbitrary \(m \times n\) matrix \(C = [\vc_1 \ \vc_2 \ \cdots \ \vc_n]\text{.}\) Then

\begin{align*} (A+B)C \amp = [(A+B)\vc_1 \ (A+B)\vc_2 \ \cdots \ (A+B)\vc_n]\\ \amp = [A\vc_1+B\vc_1 \ A\vc_2+B\vc_2 \ \cdots \ A\vc_n+B\vc_n]\\ \amp = [A\vc_1 \ A\vc_2 \ \cdots \ A\vc_n] + [B\vc_1 \ B\vc_2 \ \cdots \ B\vc_n]\\ \amp = AC + BC\text{.} \end{align*}

Similar arguments can be used to show the following properties of matrix multiplication.

We verified the second part of this theorem and will assume that all of the properties of this theorem hold. The matrix \(I_n\) introduced in Theorem 8.3 is called the (multiplicative) identity matrix. We usually omit the word multiplicative and refer to the \(I_n\) simply as the identity matrix. This does not cause any confusion since we refer to the additive identity matrix as simply the zero matrix.

Definition 8.4.

Let \(n\) be a positive integer. The \(n \times n\) identity matrix \(I_n\) is the matrix \(I_n = [a_{ij}]\text{,}\) where \(a_{ii} = 1\) for each \(i\) and \(a_{ij} = 0\) if \(i \neq j\text{.}\)

We also write the matrix \(I_n\) as

\begin{equation*} I_n = \left[ \begin{array}{ccccccc} 1 \amp 0 \amp 0 \amp 0 \amp \cdots \amp 0 \amp 0 \\ 0 \amp 1 \amp 0 \amp 0 \amp \cdots \amp 0 \amp 0 \\ 0 \amp 0 \amp 1 \amp 0 \amp \cdots \amp 0 \amp 0 \\ \vdots \amp \amp \amp \amp \ddots \amp \amp \vdots \\ 0 \amp 0 \amp 0 \amp 0 \amp \cdots \amp 1 \amp 0 \\ 0 \amp 0\amp 0 \amp 0 \amp \cdots \amp 0 \amp 1 \end{array} \right]\text{.} \end{equation*}

The matrix \(I_n\) has the property that for any \(n \times n\) matrix \(A\text{,}\)

\begin{equation*} AI_n = I_n A = A\,\text{.} \end{equation*}

so \(I_n\) is a multiplicative identity in the set of all \(n \times n\) matrices. More generally, for an \(m\times n\) matrix \(A\text{,}\)

\begin{equation*} AI_n = I_mA = A \,\text{.} \end{equation*}

Subsection The Transpose of a Matrix

One additional operation on matrices is the transpose. The transpose of a matrix occurs in many useful formulas in linear algebra and in applications of linear algebra.

Definition 8.5.

The transpose of an \(m \times n\) matrix \(A = [a_{ij}]\) is the \(n \times m\) matrix \(A^{\tr}\) whose \(i,j\)th entry is \(a_{ji}\text{.}\)

Written out, the transpose of the \(m \times n\) matrix

\begin{equation*} A = \left[ \begin{array}{ccccc} a_{11} \amp a_{12} \amp \cdots \amp a_{1n-1} \amp a_{1n} \\ a_{21} \amp a_{22} \amp \cdots \amp a_{2n-1} \amp a_{2n} \\ \vdots \amp \amp \ddots \amp \amp \vdots \\ a_{m1} \amp a_{m2} \amp \cdots \amp a_{mn-1} \amp a_{mn} \end{array} \right] \end{equation*}

is the \(n \times m\) matrix

\begin{equation*} A^{\tr} = \left[ \begin{array}{ccccc} a_{11} \amp a_{21} \amp \cdots \amp a_{m-11} \amp a_{m1} \\ a_{12} \amp a_{22} \amp \cdots \amp a_{m-12} \amp a_{m2} \\ \vdots \amp \amp \ddots \amp \amp \vdots \\ a_{1n} \amp a_{2n} \amp \cdots \amp a_{m-1n} \amp a_{mn} \end{array} \right]\text{.} \end{equation*}

In other words, the transpose of a matrix \(A\) is the matrix \(A^{\tr}\) whose rows are the columns of \(A\text{.}\) Alternatively, the transpose of \(A\) is the matrix \(A^{\tr}\) whose columns are the rows of \(A\text{.}\) We can also view the transpose of \(A\) as the reflection of \(A\) across its main diagonal, where the diagonal of a matrix \(A = [a_{ij}]\) consists of the entries of the form \([a_{ii}]\text{.}\)

Activity 8.5.

(a)

Find the transpose of each of the indicated matrices.

\begin{equation*} \left[ \begin{array}{cccc} 1 \amp 2 \amp 3 \amp 4 \\ 5 \amp 6 \amp 7 \amp 8 \end{array} \right] \qquad \left[ \begin{array}{r} 1 \\ -1 \\ 0 \end{array} \right] \qquad \left[ \begin{array}{cr} 1 \amp 2 \\ 4 \amp -3 \\ 0 \amp -1 \end{array} \right] \end{equation*}
(b)

Find the transpose of the new matrix for each part above. What can you conjecture based on your results?

(c)

There are certain special types of matrices that are given names.

Definition 8.6.

Let \(A\) be a square matrix whose \(ij\)th entry is \(a_{ij}\text{.}\)

  1. The matrix \(A\) is a diagonal matrix if \(a_{ij} = 0\) whenever \(i \neq j\text{.}\)

  2. The matrix \(A\) is a symmetric matrix if \(A^{\tr} = A\text{.}\)

  3. The matrix \(A\) is an upper triangular if \(a_{ij} = 0\) whenever \(i > j\text{.}\)

  4. The matrix \(A\) is a lower triangular if \(a_{ij} = 0\) whenever \(i \lt j\text{.}\)

(i)

Find an example of a diagonal matrix \(A\text{.}\) What can you say about \(A^{\tr}\text{?}\)

(ii)

Find an example of a non-diagonal symmetric matrix \(B\text{.}\) If \(B^{\tr} = B\text{,}\) must \(B\) be a square matrix?

(iii)

Find an example of an upper triangular matrix \(C\text{.}\) What kind of a matrix is \(C^{\tr}\text{?}\)

We will see later that diagonal matrices are important in that their powers are easy to calculate. Symmetric matrices arise frequently in applications such as in graph theory as adjacency matrices and in quantum mechanics as observables, and have many useful properties including being diagonalizable and having real eigenvalues, as we will also see later.

Subsection Properties of the Matrix Transpose

As with every other operation, we want to understand what properties the matrix transpose has. Properties of transposes are shown in the following theorem.

The one property that might seem strange is the third one. To understand this property, suppose \(A\) is an \(m \times n\) matrix and \(B\) an \(n \times k\) matrix so that the product \(AB\) is defined. We will argue that \((AB)^{\tr} = B^{\tr}A^{\tr}\) by comparing the \(i,j\)th entry of each side.

  • First notice that the \(i,j\)th entry of \((AB)^{\tr}\) is the \(j,i\)th entry of \(AB\text{.}\) The \(j,i\)th entry of \(AB\) is found by taking the scalar product of the \(j\)th row of \(A\) with the \(i\)th column of \(B\text{.}\) Thus, the \(i,j\)th entry of \((AB)^{\tr}\) is the scalar product of the \(j\)th row of \(A\) with the \(i\)th column of \(B\text{.}\)

  • The \(i,j\)th entry of \(B^{\tr}A^{\tr}\) is the scalar product of the \(i\)th row of \(B^{\tr}\) with the \(j\)th column of \(A^{\tr}\text{.}\) But the \(i\)th row of \(B^{\tr}\) is the \(i\)th column of \(B\) and the \(j\)th column of \(A^{\tr}\) is the \(j\)th row of \(A\text{.}\) So the \(i,j\)th entry of \(B^{\tr}A^{\tr}\) is the scalar product of the \(j\)th row of \(A\) with the \(i\)th column of \(B\text{.}\)

Since the two matrices \((AB)^{\tr}\) and \(B^{\tr}A^{\tr}\) have the same size and same corresponding entries, they are the same matrix.

Subsection Examples

What follows are worked examples that use the concepts from this section.

Example 8.8.

Let

\begin{equation*} \begin{array}{ccc} A = \left[ \begin{array}{ccrc} 1\amp 2\amp 0\amp 1\\3\amp 0\amp -4\amp 5\\7\amp 6\amp -1\amp 0 \end{array} \right] \amp \amp B = \left[ \begin{array}{rcr} -2\amp 4\amp -3\\5\amp 1\amp 9\\1\amp 1\amp -2 \end{array} \right] \\ \amp \amp \\ C = \left[ \begin{array}{crc} 0\amp -1\amp 6\\3\amp -2\amp 5\\1\amp 0\amp 4 \end{array} \right] \amp \amp D = \left[ \begin{array}{cr} 10\amp -4\\5\amp 2\\8\amp -1 \end{array} \right] \\ \amp \amp \\ E = \left[ \begin{array}{cr} 1\amp 0\\4\amp -3\\5\amp -1 \end{array} \right] \amp \text{ and } \amp F = \left[ \begin{array}{rcr} -2\amp 1\amp 5\\6\amp 3\amp -8\\1\amp 0\amp -1\\ 7\amp 0\amp -5 \end{array} \right]. \end{array}\text{.} \end{equation*}

Determine the results of the following operations, if defined. If not defined, explain why.

(a)

\(AF\)

Solution.

Since \(A\) is a \(3 \times 4\) matrix and \(F\) is a \(4 \times 3\) matrix, the number of columns of \(A\) equals the number of rows of \(F\) and the matrix produce \(AF\) is defined. Recall that if \(F = [\vf_1 \ \vf_2 \ \vf_3]\text{,}\) where \(\vf_1\text{,}\) \(\vf_2\text{,}\) \(\vf_3\) are the columns of \(F\text{,}\) then \(AF = [A\vf_1 \ A\vf_2 \ A\vf_3]\text{.}\) Recall also that \(A \vf_1\) is the linear combination of the columns of \(A\) with weights from \(\vf_1\text{,}\) so

\begin{align*} A\vf_1 \amp = \left[ \begin{array}{ccrc} 1\amp 2\amp 0\amp 1\\ 3\amp 0\amp -4\amp 5\\ 7\amp 6\amp -1\amp 0 \end{array} \right] \left[ \begin{array}{r} -2\\ 6\\ 1\\ 7 \end{array} \right]\\ \amp = (-2) \left[ \begin{array}{c} 1\\ 3\\ 7 \end{array} \right] + (6) \left[ \begin{array}{c} 2\\ 0\\ 6 \end{array} \right] + (1) \left[ \begin{array}{r} 0\\ -4\\ -1 \end{array} \right] + (7) \left[ \begin{array}{c} 1\\ 5\\ 0 \end{array} \right]\\ \amp = \left[ \begin{array}{c} -2+12+0+7\\ -6+0-4+35\\ -14+36-1+0\end{array} \right]\\ \amp = \left[ \begin{array}{c} 17\\ 25\\ 21 \end{array} \right]\text{,} \end{align*}
\begin{align*} A\vf_2 \amp = \left[ \begin{array}{ccrc} 1\amp 2\amp 0\amp 1\\ 3\amp 0\amp -4\amp 5\\ 7\amp 6\amp -1\amp 0 \end{array} \right] \left[ \begin{array}{c} 1\\ 3\\ 0\\ 0 \end{array} \right]\\ \amp = (1) \left[ \begin{array}{c} 1\\ 3\\ 7 \end{array} \right] + (3) \left[ \begin{array}{c} 2\\ 0\\ 6 \end{array} \right] + (0) \left[ \begin{array}{r} 0\\ -4\\ -1 \end{array} \right] + (0) \left[ \begin{array}{c} 1\\ 5\\ 0 \end{array} \right]\\ \amp = \left[ \begin{array}{c} 1+6+0+0\\ 3+0+0+0\\ 7+18+0+0\end{array} \right]\\ \amp = \left[ \begin{array}{c} 7\\ 3\\ 25 \end{array} \right]\text{,} \end{align*}

and

\begin{align*} A\vf_3 \amp = \left[ \begin{array}{ccrc} 1\amp 2\amp 0\amp 1\\ 3\amp 0\amp -4\amp 5\\ 7\amp 6\amp -1\amp 0 \end{array} \right] \left[ \begin{array}{r} 5\\ -8\\ -1\\ -5 \end{array} \right]\\ \amp = (5) \left[ \begin{array}{c} 1\\ 3\\ 7 \end{array} \right] - (8) \left[ \begin{array}{c} 2\\ 0\\ 6 \end{array} \right] - (1) \left[ \begin{array}{r} 0\\ -4\\ -1 \end{array} \right] - (5) \left[ \begin{array}{c} 1\\ 5\\ 0 \end{array} \right]\\ \amp = \left[ \begin{array}{c} 5-16-0-5\\ 15-0+4-25\\ 35-48+1-0\end{array} \right]\\ \amp = \left[ \begin{array}{r} -16\\ -6\\ -12 \end{array} \right]\text{.} \end{align*}

So \(AF = \left[ \begin{array}{ccr} 17\amp 7\amp -16\\25\amp 3\amp -6\\21\amp 25\amp -12 \end{array} \right]\text{.}\) Alternatively, if \(A = \left[ \begin{array}{c} \va_1\\ \va_2 \\ \va_3 \\ \va_4 \end{array} \right]\text{,}\) then the matrix product \(AF\) is the matrix whose \(ij\) entry is \(\va_i \cdot \vf_j\text{.}\) Using this method we have

\begin{equation*} AF = \left[ \begin{array}{ccc} \va_1 \cdot \vf_1\amp \va_1 \cdot \vf_2 \amp \va_1 \cdot \vf_3 \\ \va_2 \cdot \vf_1\amp \va_2 \cdot \vf_2 \amp \va_2 \cdot \vf_3 \\ \va_3 \cdot \vf_1\amp \va_3 \cdot \vf_2 \amp \va_3 \cdot \vf_3 \end{array} \right]\text{.} \end{equation*}

Now

\begin{align*} \va_1 \cdot \vf_1 \amp = (1)(-2)+(2)(6)+(0)(1)+(1)(7) = 17\\ \va_1 \cdot \vf_2 \amp = (1)(1)+(2)(3)+(0)(0)+(1)(0) = 7\\ \va_1 \cdot \vf_3 \amp = (1)(5)+(2)(-8)+(0)(-1)+(1)(-5) = -16\\ \va_2 \cdot \vf_1 \amp = (3)(-2)+(0)(6)+(-4)(1)+(5)(7) = 25\\ \va_2 \cdot \vf_2 \amp = (3)(1)+(0)(3)+(-4)(0)+(5)(0) = 3\\ \va_2 \cdot \vf_3 \amp = (3)(5)+(0)(-8)+(-4)(-1)+(5)(-5) = -6\\ \va_3 \cdot \vf_1 \amp = (7)(-2)+(6)(6)+(-1)(1)+(0)(7) = 21\\ \va_3 \cdot \vf_2 \amp = (7)(1)+(6)(3)+(-1)(0)+(0)(0) = 25\\ \va_3 \cdot \vf_3 \amp =(7)(5)+(6)(-8)+(-1)(-1)+(0)(-5) = -12\text{,} \end{align*}

so \(AF = \left[ \begin{array}{ccr} 17\amp 7\amp -16\\25\amp 3\amp -6\\21\amp 25\amp -12 \end{array} \right]\text{.}\)

(b)

\(A(BC)\)

Solution.

Since \(BC\) is a \(3 \times 3\) matrix but \(A\) is \(3 \times 4\text{,}\) the number of columns of \(A\) is not equal to the number of rows of \(BC\text{.}\) We conclude that \(A(BC)\) is not defined.

(c)

\((BC)A\)

Solution.

Since \(BC\) is a \(3 \times 3\) matrix and \(A\) is \(3 \times 4\text{,}\) the number of columns of \(BC\) is equal to the number of rows of \(A\text{.}\) Thus, the quantity \((BC)A\) is defined. First we calculate \(BC\) using the dot product of the rows of \(B\) with the columns of \(C\text{.}\) Letting \(B = \left[ \begin{array}{c} \vb_1 \\ \vb_2 \\ \vb_3 \end{array} \right]\) and \(C = [\vc_1 \ \vc_2 \ \vc_3]\text{,}\) where \(\vb_1\text{,}\) \(\vb_2\text{,}\) and \(\vb_3\) are the rows of \(B\) and \(\vc_1\text{,}\) \(\vc_2\text{,}\) and \(\vc_3\) are the columns of \(C\text{,}\) we have

\begin{equation*} BC = \left[ \begin{array}{ccc} \vb_1 \cdot \vc_1 \amp \vb_1 \cdot \vc_2 \amp \vb_1 \cdot \vc_3 \\ \vb_2 \cdot \vc_1 \amp \vb_2 \cdot \vc_2 \amp \vb_2 \cdot \vc_3 \\ \vb_3 \cdot \vc_1 \amp \vb_3 \cdot \vc_2 \amp \vb_3 \cdot \vc_3 \end{array} \right]\text{.} \end{equation*}

Now

\begin{align*} \vb_1 \cdot \vc_1 \amp = (-2)(0)+(4)(3)+(-3)(1) = 9\\ \vb_1 \cdot \vc_1 \amp = (-2)(-1)+(4)(-2)+(-3)(0) = -6\\ \vb_1 \cdot \vc_1 \amp = (-2)(6)+(4)(5)+(-3)(4) = -4\\ \vb_1 \cdot \vc_1 \amp = (5)(0)+(1)(3)+(9)(1) = 12\\ \vb_1 \cdot \vc_1 \amp = (5)(-1)+(1)(-2)+(9)(0) = -7\\ \vb_1 \cdot \vc_1 \amp = (5)(6)+(1)(5)+(9)(4) = 71\\ \vb_1 \cdot \vc_1 \amp =(1)(0)+(1)(3)+(-2)(1) = 1\\ \vb_1 \cdot \vc_1 \amp = (1)(-1)+(1)(-2)+(-2)(0) = -3\\ \vb_1 \cdot \vc_1 \amp = (1)(6)+(1)(5)+(-2)(4) = 3\text{,} \end{align*}

so \(BC = \left[ \begin{array}{crr} 9\amp -6\amp -4 \\ 12\amp -7\amp 71 \\ 1\amp -3\amp 3 \end{array} \right]\text{.}\) If \(BC = \left[ \begin{array}{c} \vr_1 \\ \vr_2 \\ \vr_3 \end{array} \right]\) and \(A = [\vs_1 \ \vs_2 \ \vs_3 \ \vs_4]\text{,}\) where \(\vr_1\text{,}\) \(\vr_2\text{,}\) and \(\vr_3\) are the rows of \(BC\) and \(\vs_1\text{,}\) \(\vs_2\text{,}\) \(\vs_3\text{,}\) and \(\vs_4\) are the columns of \(A\text{,}\) then

\begin{equation*} (BC)A = \left[ \begin{array}{cccc} \vr_1 \cdot \vs_1 \amp \vr_1 \cdot \vs_2 \amp \vr_1 \cdot \vr_3 \amp \vr_1 \cdot \vs_4 \\ \vr_2 \cdot \vs_1 \amp \vr_2 \cdot \vs_2 \amp \vr_2 \cdot \vs_3 \amp \vr_2 \cdot \vs_4 \\ \vr_3 \cdot \vs_1 \amp \vr_3 \cdot \vs_2 \amp \vr_3 \cdot \vs_3 \amp \vr_3 \cdot \vs_4 \end{array} \right]\text{.} \end{equation*}

Now

\begin{align*} \vr_1 \cdot \vs_1 \amp = (9)(1)+(-6)(3)+(-4)(7) = -37\\ \vr_1 \cdot \vs_2 \amp = (9)(2)+(-6)(0)+(-4)(6) = -6\\ \vr_1 \cdot \vs_3 \amp = (9)(0)+(-6)(-4)+(-4)(-1) = 28\\ \vr_1 \cdot \vs_4 \amp = (9)(1)+(-6)(5)+(-4)(0) = -21\\ \vr_2 \cdot \vs_1 \amp = (12)(1)+(-7)(3)+(71)(7) = 488\\ \vr_2 \cdot \vs_2 \amp = (12)(2)+(-7)(0)+(71)(6) = 450\\ \vr_2 \cdot \vs_3 \amp = (12)(0)+(-7)(-4)+(71)(-1) = -43\\ \vr_2 \cdot \vs_4 \amp = (12)(1)+(-7)(5)+(71)(0) = -23\\ \vr_3 \cdot \vs_1 \amp = (1)(1)+(-3)(3)+(3)(7) = 13\\ \vr_3 \cdot \vs_2 \amp = (1)(2)+(-3)(0)+(3)(6) = 20\\ \vr_3 \cdot \vs_3 \amp = (1)(0)+(-3)(-4)+(3)(-1) = 9\\ \vr_3 \cdot \vs_4 \amp = (1)(1)+(-3)(5)+(3)(0) = -14\text{,} \end{align*}

so \((BC)A = \left[ \begin{array}{rrrr} -37\amp -6\amp 28\amp -21 \\ 488\amp 450\amp -43\amp -23 \\ 13\amp 20\amp 9\amp -14 \end{array} \right]\text{.}\)

(d)

\((B+C)D\)

Solution.

Since \(B\) and \(C\) are both \(3 \times 3\) matrices, their sum is defined and is a \(3 \times 3\) matrix. Because \(D\) is \(3 \times 2\) matrix, the number of columns of \(B+C\) is equal to the number of rows of \(D\text{.}\) Thus, the quantity \((B+C)D\) is defined and, using the row-column method of matrix multiplication as earlier,

\begin{align*} (B+C)D \amp = \left( \left[ \begin{array}{rcr} -2\amp 4\amp -3\\ 5\amp 1\amp 9\\ 1\amp 1\amp -2 \end{array} \right] + \left[ \begin{array}{crc} 0\amp -1\amp 6\\ 3\amp -2\amp 5\\ 1\amp 0\amp 4 \end{array} \right] \right) \left[ \begin{array}{cr} 10\amp -4\\ 5\amp 2\\ 8\amp -1 \end{array} \right]\\ \amp = \left[ \begin{array}{ccc} -2+0\amp 4-1\amp -3+6\\ 5+3\amp 1-2\amp 9+5\\ 1+1\amp 1+0\amp -2+4 \end{array} \right] \left[ \begin{array}{cr} 10\amp -4\\ 5\amp 2\\ 8\amp -1 \end{array} \right]\\ \amp = \left[ \begin{array}{rrc} -2\amp 3\amp 3\\ 8\amp -1\amp 14\\ 2\amp 1\amp 2 \end{array} \right] \left[ \begin{array}{cr} 10\amp -4\\ 5\amp 2\\ 8\amp -1 \end{array} \right]\\ \amp = \left[ \begin{array}{cr} 19\amp 11\\ 187\amp -48\\ 41\amp -8 \end{array} \right]\text{.} \end{align*}
(e)

\(D^{\tr}E\)

Solution.

Since \(D^{\tr}\) is a \(2 \times 3\) matrix and \(E\) is \(3 \times 2\text{,}\) the number of columns of \(D^{\tr}\) is equal to the number of rows of \(E\text{.}\) Thus, \(D^{\tr}E\) is defined and

\begin{align*} D^{\tr}E \amp = \left[ \begin{array}{cr} 10\amp -4\\ 5\amp 2\\ 8\amp -1 \end{array} \right]^{\tr} \left[ \begin{array}{cr} 1\amp 0\\ 4\amp -3\\ 5\amp -1 \end{array} \right]\\ \amp = \left[ \begin{array}{rcr} 10\amp 5\amp 8\\ -4\amp 2\amp -1 \end{array} \right] \left[ \begin{array}{cr} 1\amp 0\\ 4\amp -3\\ 5\amp -1 \end{array} \right]\\ \amp = \left[ \begin{array}{rr} 70\amp -23\\ -1\amp -5 \end{array} \right]\text{.} \end{align*}
(f)

\(\left(A^{\tr}+F\right)^{\tr}\)

Solution.

The fact that \(A\) is a \(3 \times 4\) matrix means that \(A^{\tr}\) is a \(4 \times 3\) matrix. Since \(F\) is also a \(4 \times 3\) matrix, the sum \(A^{\tr}+F\) is defined. The transpose of any matrix is also defined, so \(\left(A^{\tr}+F\right)^{\tr}\) is defined and

\begin{align*} \left(A^{\tr}+F\right)^{\tr} \amp = \left( \left[ \begin{array}{ccrc} 1\amp 2\amp 0\amp 1\\ 3\amp 0\amp -4\amp 5\\ 7\amp 6\amp -1\amp 0 \end{array} \right]^{\tr} + \left[ \begin{array}{rcr} -2\amp 1\amp 5\\ 6\amp 3\amp -8\\ 1\amp 0\amp -1\\ 7\amp 0\amp -5 \end{array} \right] \right)^{\tr}\\ \amp = \left( \left[ \begin{array}{crr} 1\amp 3\amp 7\\ 2\amp 0\amp 6\\ 0\amp -4\amp -1\\ 1\amp 5\amp 0 \end{array} \right] + \left[ \begin{array}{rcr} -2\amp 1\amp 5\\ 6\amp 3\amp -8\\ 1\amp 0\amp -1\\ 7\amp 0\amp -5 \end{array} \right] \right)^{\tr}\\ \amp = \left( \left[ \begin{array}{ccc} 1-2\amp 3+1\amp 7+5\\ 2+6\amp 0+3\amp 6-8\\ 0+1\amp -4+0\amp -1-1\\ 1+7\amp 5+0\amp 0-5 \end{array} \right] \right)^{\tr}\\ \amp = \left( \left[ \begin{array}{rrr} -1\amp 4\amp 12\\ 8\amp 3\amp -2\\ 1\amp -4\amp -2\\ 8\amp 5\amp -5 \end{array} \right] \right)^{\tr}\\ \amp = \left[ \begin{array}{rrrr} -1\amp 8\amp 1\amp 8\\ 4\amp 3\amp -4\amp 5\\ 12\amp -2\amp -2\amp -5 \end{array} \right]\text{.} \end{align*}

Example 8.9.

Let \(A = \left[ \begin{array}{cr} 2\amp -1\\7\amp -2 \end{array} \right]\) and \(B = \left[ \begin{array}{rc} 4\amp 6 \\ -3\amp 5 \end{array} \right]\text{.}\)

(a)

Determine the matrix sum \(A+B\text{.}\) Then use this sum to calculate \((A+B)^2\text{.}\)

Solution.

Adding corresponding terms shows that \(A+B = \left[ \begin{array}{cc} 6\amp 5\\4\amp 3 \end{array} \right]\text{.}\) Squaring this sum yields the result \((A+B)^2 = \left[ \begin{array}{cc} 56\amp 45 \\ 36\amp 29 \end{array} \right]\text{.}\)

(b)

Now calculate \((A+B)^2\) in a different way. Use the fact that matrix multiplication distributes over matrix addition to expand (like foiling) \((A+B)^2\) into a sum of matrix products. The calculate each summand and add to find \((A+B)^2\text{.}\) You should obtain the same result as part (a). If not, what could be wrong?

Solution.

Expanding \((A+B)^2\) (remember that matrix multiplication is not commutative) gives us

\begin{align*} (A+B)^2 \amp = (A+B)(A+B)\\ \amp = A^2 + AB + BA + B^2\\ \amp = \left[ \begin{array}{rr} -3\amp 0\\ 0\amp -3 \end{array} \right] + \left[ \begin{array}{cc} 11\amp 7\\ 34\amp 32 \end{array} \right] + \left[ \begin{array}{cr} 50\amp -16\\ 29\amp -7 \end{array} \right] + \left[ \begin{array}{rc} -2\amp 54\\ -27\amp 7 \end{array} \right]\\ \amp = \left[ \begin{array}{cc} 56\amp 45\\ 36\amp 29 \end{array} \right] \end{align*}

just as in part (a). If instead you obtained the matrix \(\left[ \begin{array}{cc} 17\amp 68 \\41\amp 68 \end{array} \right]\) you likely made the mistake of equating \((A+B)^2\) with \(A^2+2AB+B^2\text{.}\) These two matrices are not equal in general, because we cannot say that \(AB\) is equal to \(BA\text{.}\)

Subsection Summary

In this section we defined a matrix sum, scalar multiples of matrices, the matrix product, and the transpose of a matrix.

  • The sum of two \(m \times n\) matrices \(A = [a_{ij}]\) and \(B = [b_{ij}]\) is the \(m \times n\) matrix \(A+B\) whose \(i,j\)th entry is \(a_{ij} + b_{ij}\text{.}\)

  • If \(A = [a_{ij}]\) is an \(m \times n\) matrix, the scalar multiple \(kA\) of \(A\) by the scalar \(k\) is the \(m \times n\) matrix whose \(i,j\)th entry is \(ka_{ij}\text{.}\)

  • If \(A\) is a \(k \times m\) matrix and \(B = [\vb_1 \ \vb_2 \ \cdots \ \vb_n]\) is an \(m \times n\) matrix, then the matrix product \(AB\) of the matrices \(A\) and \(B\) is the \(k \times n\) matrix

    \begin{equation*} [A\vb_1 \ A\vb_2 \ \cdots \ A\vb_n]\text{.} \end{equation*}

    The matrix product is defined in this way so that the matrix of a composite \(S \circ T\) of linear transformations is the product of matrices of \(S\) and \(T\text{.}\)

  • An alternate way of calculating the product of an \(k \times m\) matrix \(A\) with rows \(\vr_1\text{,}\) \(\vr_2\text{,}\) \(\ldots\text{,}\) \(\vr_k\) and an \(m \times n\) matrix \(B\) with columns \(\vb_1\text{,}\) \(\vb_2\text{,}\) \(\ldots\text{,}\) \(\vb_n\) is that the product \(AB\) is the \(k \times n\) matrix whose \(i,j\)th entry is \(\vr_i \cdot \vb_j\text{.}\)

  • Matrix multiplication does not behave as the standard multiplication on real numbers. For example, we can have a product of two non-zero matrices equal to the zero matrix and there is no cancellation law for matrix multiplication.

  • The transpose of an \(m \times n\) matrix \(A = [a_{ij}]\) is the \(n \times m\) matrix \(A^{\tr}\) whose \(i,j\)th entry is \(a_{ji}\text{.}\)

Exercises Exercises

1.

Calculate \(AB\) for each of the following matrix pairs by hand in two ways.

(a)

\(A = \left[ \begin{array}{cc} 1\amp 0\\0\amp 1\\0\amp 0 \end{array} \right]\text{,}\) \(B = \left[ \begin{array}{cc} a\amp b \\ c\amp d \end{array} \right]\)

(b)

\(A = \left[ \begin{array}{ccr} 1\amp 0\amp -1 \end{array} \right]\text{,}\) \(B = \left[ \begin{array}{cc} 1\amp 2 \\ 2\amp 3 \\ 3\amp 4 \end{array} \right]\)

2.

For each of the following \(A\) matrices, find all \(2\times 2\) matrices \(B=\left[ \begin{array}{cc} a\amp b\\c\amp d \end{array} \right]\) which commute with the given \(A\text{.}\) (Two matrices \(A\) and \(B\) commute with each other if \(AB = BA\text{.}\))

(a)

\(A = \left[ \begin{array}{cc} 2\amp 0 \\ 0 \amp 2 \end{array} \right]\)

(b)

\(A = \left[ \begin{array}{cc} 2\amp 0 \\ 0 \amp 3 \end{array} \right]\)

(c)

\(A = \left[ \begin{array}{cc} 0\amp 1 \\ 0 \amp 0 \end{array} \right]\)

3.

Find all possible, if any, \(X\) matrices satisfying each of the following matrix equations.

(a)

\(\left[ \begin{array}{cc} 1\amp 2 \\ 0 \amp 2 \end{array} \right] X = \left[ \begin{array}{cc} 0\amp 1 \\ 0 \amp 0 \end{array} \right]\)

(b)

\(\left[ \begin{array}{rr} 1\amp -2 \\ -2 \amp 4 \end{array} \right] X = \left[ \begin{array}{cc} 0\amp 1 \\ 0 \amp 0 \end{array} \right]\)

(c)

\(\left[ \begin{array}{rr} 1\amp -2 \\ -2 \amp 4 \end{array} \right] X = \left[ \begin{array}{cc} 0\amp 1 \\ 0 \amp -2 \end{array} \right]\)

4.

For each of the following \(A\) matrices, compute \(A^2=AA, A^3=AAA, A^4\text{.}\) Use your results to conjecture a formula for \(A^m\text{.}\) Interpret your answer geometrically using the transformation interpretation.

(a)

\(A = \left[ \begin{array}{cc} 2\amp 0 \\ 0 \amp 3 \end{array} \right]\)

(b)

\(A = \left[ \begin{array}{cc} 1\amp 1 \\ 0 \amp 1 \end{array} \right]\)

(c)

\(A = \left[ \begin{array}{cr} 0\amp -1 \\ 1 \amp 0 \end{array} \right]\)

5.

If \(A\vv=2\vv\) for unknown \(A\) matrix and \(\vv\) vector, determine an expression for \(A^2 \vv\text{,}\) \(A^3\vv\text{,}\) …, \(A^m\vv\text{.}\)

6.

If \(A\vv=2\vv\) and \(A\vu=3\vu\text{,}\) find an expression for \(A^m(a\vv+b\vu)\) in terms of \(\vv\) and \(\vu\text{.}\)

7.

A matrix \(A\) is a nilpotent matrix if \(A^m=0\text{,}\) i.e., \(A^m\) is the zero matrix, for some positive integer \(m\text{.}\) Explain why the matrices

\begin{equation*} A = \left[ \begin{array}{cc} 0 \amp a \\ 0 \amp 0 \end{array} \right] \, , \, B = \left[ \begin{array}{ccc} 0 \amp a \amp b \\ 0 \amp 0\amp c\\ 0\amp 0\amp 0 \end{array} \right] \end{equation*}

are nilpotent matrices.

8.

Suppose \(A\) is an \(n\times n\) matrix for which \(A^2=0\text{.}\) Show that there is a matrix \(B\) for which \((I_n+A)B=I_n\) where \(I_n\) is the identity matrix of size \(n\text{.}\)

9.

Let \(A\text{,}\) \(B\text{,}\) and \(C\) be \(m \times n\) matrices and let \(a\) and \(b\) be scalars. Verify Theorem 8.1. That is, show that

(a)

\(A+B = B+A\)

(b)

\((A+B) + C = A + (B+C)\)

(c)

The \(m \times n\) matrix \(0\) whose entries are all 0 has the property that \(A + 0 = A\text{.}\)

(d)

The scalar multiple \((-1)A\) of the matrix \(A\) has the property that \((-1)A + A = 0\text{.}\)

(e)

\((a+b) A = aA + bA\)

(f)

\(a(A+B) = aA + aB\)

(g)

\((ab) A = a(bA)\)

(h)

\(1A=A\text{.}\)

10.

Let \(A\text{,}\) \(B\text{,}\) and \(C\) be matrices of the appropriate sizes for all sums and products to be defined and let \(a\) be a scalar. Verify the remaining parts of Theorem 8.3. That is, show that

(a)

\((AB)C = A(BC)\)

(b)

\(A(B+C) = AB + AC\)

(c)

There is a square matrix \(I_n\) with the property that \(AI_n = A\) or \(I_nA = A\) for whichever product is defined.

(d)

\(a(AB) = (aA)B = A(aB)\)

11.

Let \(A = [a_{ij}]\) and \(B = [b_{ij}]\) be matrices of the appropriate sizes, and let \(a\) be a scalar. Verify the remaining parts of Theorem 8.7. That is, show that

(a)

\(\left(A^{\tr}\right)^{\tr} = A\)

(b)

\((A+B)^{\tr} = A^{\tr} + B^{\tr}\)

(c)

\((aA)^{\tr} = aA^{\tr}\)

12.

The matrix exponential is an important tool in solving differential equations. Recall from calculus that the Taylor series expansion for \(e^x\) centered at \(x=0\) is

\begin{equation*} e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots\text{,} \end{equation*}

and that this Taylor series converges to \(e^x\) for every real number \(x\text{.}\) We extend this idea to define the matrix exponential \(e^A\) for any square matrix \(A\) with real entries as

\begin{equation*} e^A = \sum_{n=0}^{\infty} \frac{1}{n!}A^n = I_n + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots \end{equation*}

We explore this idea with an example. Let \(B = \left[ \begin{array}{cr} 2\amp 0\\0\amp -1 \end{array} \right]\text{.}\)

(a)

Calculate \(B^2\text{,}\) \(B^3\text{,}\) \(B^4\text{.}\) Explain why \(B^n = \left[ \begin{array}{cc} 2^n\amp 0\\0\amp (-1)^n \end{array} \right]\) for any positive integer \(n\text{.}\)

(b)

Show that \(I_2 + B + B^2 + B^3 + B^4\) is equal to

\begin{equation*} \left[ \begin{array}{cc} 1+2+\frac{2^2}{2} + \frac{2^3}{3!} + \frac{2^4}{4!}\amp 0\\0\amp 1+(-1)+\frac{(-1)^2}{2} + \frac{(-1)^3}{3!} + \frac{(-1)^4}{4!} \end{array} \right]\text{.} \end{equation*}
(c)

Explain why \(e^B = \left[ \begin{array}{cc} e^2\amp 0\\0\amp e^{-1} \end{array} \right]\text{.}\)

13.

Show that if \(A\) and \(B\) are \(2\times 2\) rotation matrices, then \(AB\) is also a \(2\times 2\) rotation matrix.

14.

Label each of the following statements as True or False. Provide justification for your response. Throughout, assume that matrices are of the appropriate sizes so that any matrix sums or products are defined.

(a) True/False.

For any three matrices \(A, B, C\) with \(A \neq 0\text{,}\) \(AB=AC\) implies \(B=C\text{.}\)

(b) True/False.

For any three matrices \(A, B, C\) with \(A \neq 0\text{,}\) \(AB=CA\) implies \(B=C\text{.}\)

(c) True/False.

If \(A^2\) is the zero matrix, then \(A\) itself is the zero matrix.

(d) True/False.

If \(AB=BA\) for every \(n \times n\) matrix \(B\text{,}\) then \(A\) is the identity matrix \(I_n\text{.}\)

(e) True/False.

If matrix products \(AB\) and \(BA\) are both defined, then \(A\) and \(B\) are both square matrices of the same size.

(f) True/False.

If \(\vx_1\) is a solution for \(A\vx=\vb_1\) (i.e., that \(A\vx_1 = \vb_1\)) and \(\vx_2\) is a solution for \(B\vx=\vb_2\text{,}\) then \(\vx_1+\vx_2\) is a solution for \((A+B)\vx=\vb_1+\vb_2\text{.}\)

(g) True/False.

If \(B\) is an \(m\times n\) matrix with two equal columns, then the matrix \(AB\) has two equal columns for every \(k \times m\) matrix.

(h) True/False.

If \(A^2 = I_2\text{,}\) then \(A=-I_2\) or \(A=I_2\text{.}\)

Subsection Project: Strassen's Algorithm and Partitioned Matrices

Strassen's algorithm is an algorithm for matrix multiplication that can be more efficient than the standard row-column method. To understand this method, we begin with the \(2 \times 2\) case which will highlight the essential ideas.

Project Activity 8.6.

We first work with the \(2 \times 2\) case.

(a)

Let \(A = [a_{ij}] = \left[ \begin{array}{cc} 1\amp 2\\3\amp 4 \end{array} \right]\) and \(B = [b_{ij}] = \left[ \begin{array}{cc} 5\amp 6\\7\amp 8 \end{array} \right]\text{.}\)

(i)

Calculate the matrix product \(AB\text{.}\)

(ii)

Rather than using eight multiplications to calculate \(AB\text{,}\) Strassen came up with the idea of using the following seven products:

\begin{align*} h_1 \amp = (a_{11}+a_{22})(b_{11}+b_{22})\\ h_2 \amp = (a_{21}+a_{22})b_{11}\\ h_3 \amp = a_{11}(b_{12}-b_{22})\\ h_4 \amp = a_{22}(b_{21}-b_{11})\\ h_5 \amp = (a_{11}+a_{12})b_{22}\\ h_6 \amp =(a_{21}-a_{11})(b_{11}+b_{12})\\ h_7 \amp = (a_{12}-a_{22})(b_{21}+b_{22})\text{.} \end{align*}

Calculate \(h_1\) through \(h_7\) for the given matrices \(A\) and \(B\text{.}\) Then calculate the quantities

\begin{equation*} h_1+h_4-h_5+h_7, \ h_3+h_5, \ h_2+h_4, \ \text{ and } h_1+h_3-h_2+h_6\text{.} \end{equation*}

What do you notice?

(b)

Now we repeat part (a) in general. Suppose we want to calculate the matrix product \(AB\) for arbitrary \(2 \times 2\) matrices \(A = \left[ \begin{array}{cc} a_{11}\amp a_{12}\\a_{21}\amp a_{22} \end{array} \right]\) and \(B = \left[ \begin{array}{cc} b_{11}\amp b_{12}\\b_{21}\amp b_{22} \end{array} \right]\text{.}\) Let

\begin{align*} h_1 \amp = (a_{11}+a_{22})(b_{11}+b_{22})\\ h_2 \amp = (a_{21}+a_{22})b_{11}\\ h_3 \amp = a_{11}(b_{12}-b_{22})\\ h_4 \amp = a_{22}(b_{21}-b_{11})\\ h_5 \amp = (a_{11}+a_{12})b_{22}\\ h_6 \amp =(a_{21}-a_{11})(b_{11}+b_{12})\\ h_7 \amp = (a_{12}-a_{22})(b_{21}+b_{22})\text{.} \end{align*}

Show that

\begin{equation*} AB = \left[ \begin{array}{cc} h_1+h_4-h_5+h_7 \amp h_3+h_5 \\ h_2+h_4 \amp h_1+h_3-h_2+h_6 \end{array} \right]\text{.} \end{equation*}

The next step is to understand how Strassen's algorithm can be applied to larger matrices. This involves the idea of partitioned (or block) matrices. Recall that the matrix-matrix product of the \(k \times m\) matrix \(A\) and the \(m \times n\) matrix \(B = [\vb_1 \ \vb_2 \ \cdots \ \vb_n]\) is defined as

\begin{equation*} AB = [A\vb_1 \ A\vb_2 \ \cdots \ A\vb_n]\text{.} \end{equation*}

In this process, we think of \(B\) as being partitioned into \(n\) columns. We can expand on this idea to partition both \(A\) and \(B\) when calculating a matrix-matrix product.

Project Activity 8.7.

We illustrate the idea of partitioned matrices with an example. Let \(A = \left[ \begin{array}{crcrc} 1\amp -2\amp 3\amp -6\amp 4 \\ 7\amp 5\amp 2\amp -1\amp 0 \\ 3\amp -8\amp 1\amp 0\amp 9 \end{array} \right]\text{.}\)We can partition \(A\) into smaller matrices

\begin{equation*} A = \left[ \begin{array}{crc|rc} 1\amp -2\amp 3\amp -6\amp 4 \\ 7\amp 5\amp 2\amp -1\amp 0 \\ \hline 3\amp -8\amp 1\amp 0\amp 9 \end{array} \right]\text{,} \end{equation*}

which are indicated by the vertical and horizontal lines. As a shorthand, we can describe this partition of \(A\) as

\begin{equation*} A = \left[ \begin{array}{cc} A_{11}\amp A_{12} \\ A_{21}\amp A_{22} \end{array} \right]\text{,} \end{equation*}

where \(A_{11} = \left[ \begin{array}{crc} 1\amp -2\amp 3\\ 7\amp 5\amp 2 \end{array} \right]\text{,}\) \(A_{12} = \left[ \begin{array}{rc} -6\amp 4 \\ -1\amp 0 \end{array} \right]\text{,}\) \(A_{21} = \left[ \begin{array}{crc} 3\amp -8\amp 1 \end{array} \right]\text{,}\) and \(A_{22} = [0 \ 9 ]\text{.}\) The submatrices \(A_{ij}\) are called blocks. If \(B\) is a matrix such that \(AB\) is defined, then \(B\) must have five rows. As an example, \(AB\) is defined if \(B = \left[ \begin{array}{cc} 1\amp 3\\2\amp 0 \\ 4\amp 1\\6\amp 5\\4\amp 2 \end{array} \right]\text{.}\) The partition of \(A\) breaks \(A\) up into blocks with three and two columns, respectively. So if we partition \(B\) into blocks with three and two rows, then we can use the blocks to calculate the matrix product \(AB\text{.}\) For example, partition \(B\) as

\begin{equation*} B = \left[ \begin{array}{cc} 1\amp 3\\2\amp 0 \\ 4\amp 1\\ \hline 6\amp 5\\4\amp 2 \end{array} \right] = \left[ \begin{array}{c} B_{11}\\B_{21} \end{array} \right]\text{.} \end{equation*}

Show that

\begin{equation*} AB = \left[ \begin{array}{cc} A_{11}\amp A_{12} \\ A_{21}\amp A_{22} \end{array} \right] \left[ \begin{array}{c} B_{11}\\B_{21} \end{array} \right] = \left[ \begin{array}{cc} A_{11}B_{11}+A_{12}B_{21} \\ A_{21}B_{11}+A_{22}B_{21} \end{array} \right]\text{.} \end{equation*}

An advantage to using partitioned matrices is that computations with them can be done in parallel, which lessens the time it takes to do the work. In general, we can multiply partitioned matrices as though the submatrices are scalars. That is,

\begin{equation*} \left[ \begin{array}{cccc} A_{11}\amp A_{12}\amp \cdots\amp A_{1m} \\ A_{21}\amp A_{22}\amp \cdots\amp A_{2m}\\ \vdots \amp \vdots \amp \ddots \amp \vdots \\ A_{i1}\amp A_{i2}\amp \cdots\amp A_{im} \\ \vdots \amp \vdots \amp \ddots \amp \vdots \\ A_{k1}\amp A_{k2}\amp \cdots\amp A_{km} \end{array} \right] \left[ \begin{array}{cccccc} B_{11}\amp B_{12}\amp \cdots\amp B_{1j} \amp \cdots \amp B_{1n} \\ B_{21}\amp B_{22}\amp \cdots\amp B_{2j}\amp \cdots \amp B_{2n}\\ \vdots \amp \vdots \amp \ddots \amp \vdots\amp \ddots\amp \vdots \\ B_{m1}\amp B_{m2}\amp \cdots\amp B_{mj} \amp \cdots \amp B_{mn} \end{array} \right] = [P_{ij}]\text{,} \end{equation*}

where

\begin{equation*} P_{ij} = A_{i1}B_{1j} + A_{i2}B_{2j} + \cdots + A_{im}B_{mj} = \sum_{t=1}^{m} A_{it}B_{tj}\text{,} \end{equation*}

provided that all the submatrix products are defined.

Now we can apply Strassen's algorithm to larger matrices using partitions. This method is sometimes referred to as divide and conquer.

Project Activity 8.8.

Let \(A\) and \(B\) be two \(r \times r\) matrices. If \(r\) is not a power of \(2\text{,}\) then pad the rows and columns of \(A\) and \(B\) with zeros to make them of size \(2^m \times 2^m\) for some integer \(m\text{.}\) (From a practical perspective, we might instead just use unequal block sizes.) Let \(n = 2^m\text{.}\) Partition \(A\) and \(B\) as

\begin{equation*} A = \left[ \begin{array}{cc} A_{11}\amp A_{12}\\A_{21}\amp A_{22} \end{array} \right] \ \text{ and } \ B = \left[ \begin{array}{cc} B_{11}\amp B_{12}\\B_{21}\amp B_{22} \end{array} \right]\text{,} \end{equation*}

where each submatrix is of size \(\frac{n}{2} \times \frac{n}{2}\text{.}\) Now we use the Strassen algorithm just as in the \(2 \times 2\) case, treating the submatrices as if they were scalars (with the additional constraints of making sure that the dimensions match up so that products are defined, and ensuring we multiply in the correct order). Letting

\begin{align*} M_1 \amp = (A_{11}+A_{22})(B_{11}+B_{22})\\ M_2 \amp = (A_{21}+A_{22})B_{11}\\ M_3 \amp = A_{11}(B_{12}-B_{22})\\ M_4 \amp = A_{22}(B_{21}-B_{11})\\ M_5 \amp = (A_{11}+A_{12})B_{22}\\ M_6 \amp =(A_{21}-A_{11})(B_{11}+B_{12})\\ M_7 \amp = (A_{12}-A_{22})(B_{21}+B_{22})\text{,} \end{align*}

then the same algebra as in Project Activity 8.6 shows that

\begin{equation*} AB = \left[ \begin{array}{cc} M_1+M_4-M_5+M_7 \amp M_3+M_5 \\ M_2+M_4 \amp M_1+M_3-M_2+M_6 \end{array} \right]\text{.} \end{equation*}

Apply Strassen's algorithm to calculate the matrix product \(AB\text{,}\) where

\begin{equation*} A = \left[ \begin{array}{crr} 1\amp 3\amp -1\\2\amp 4\amp 6\\7\amp -2\amp 5 \end{array} \right] \text{ and } B = \left[ \begin{array}{crc} 2\amp 5\amp 3\\2\amp -4\amp 1\\1\amp 6\amp 4 \end{array} \right]\text{.} \end{equation*}

While Strassen's algorithm can be more efficient, it does not always speed up the process. We investigate this in the next activity.

Project Activity 8.9.

We introduce a little notation to help us describe the efficiency of our calculations. We won't be formal with this notation, rather work with it in an informal way. Big O (the letter “O” ) notation is used to describe the complexity of an algorithm. Generally speaking, in computer science big O notation can be used to describe the run time of an algorithm, the space used by the algorithm, or the number of computations required. The letter “O” is used because the behavior described is also called the order. Big O measures the asymptotic time of an algorithm, not its exact time. For example, if it takes \(6n^2-n+8\) steps to complete an algorithm, then we say that the algorithm grows at the order of \(n^2\) (we ignore the constants and the smaller power terms, since they become insignificant as \(n\) increases) and we describe its growth as \(O{\left(n^2\right)}\text{.}\) To measure the efficiency of an algorithm to determine a matrix product, we will measure the number of operations it takes to calculate the product.

(a)

Suppose \(A\) and \(B\) are \(n \times n\) matrices. Explain why the operation of addition (that is, calculating \(A+B\)) is \(O{\left(n^2\right)}\text{.}\)

(b)

Suppose \(A\) and \(B\) are \(n \times n\) matrices. How many multiplications are required to calculate the matrix product \(AB\text{?}\) Explain.

(c)

The standard algorithm for calculating a matrix product of two \(n \times n\) matrices requires \(n^3\) multiplications and a number of additions. Since additions are much less costly in terms of operations, the standard matrix product is \(O{\left(n^3\right)}\text{.}\) We won't show it here, but using Strassen's algorithm on a product of \(2^m \times 2^m\) matrices is \(O{\left(n^{\log_2(7)}\right)}\text{,}\) where \(n = 2^m\text{.}\) That means that Strassen's algorithm applied to an \(n \times n\) matrix (where \(n\) is a power of \(2\)) requires approximately \(n^{\log_2(7)}\) multiplications. We use this to analyze situations to determine when Strassen's algorithm is computationally more efficient than the standard algorithm.

(i)

Suppose \(A\) and \(B\) are \(5 \times 5\) matrices. Determine the number of multiplications required to calculate the matrix product \(AB\) using the standard matrix product. Then determine the approximate number of multiplications required to calculate the matrix product \(AB\) using Strassen's algorithm. Which is more efficient? (Remember, we can only apply Strassen's algorithm to square matrices whose sizes are powers of \(2\text{.}\))

(ii)

Repeat part i. with \(125 \times 125\) matrices. Which method is more efficient?

As a final note, Strassen's algorithm is approximately \(O{\left(n^{2.81}\right)}\text{.}\) As of 2018, the best algorithm for matrix multiplication, developed by Virginia Williams at Stanford University, is approximately \(O{\left(n^{2.373}\right)}\text{.}\) 15 

Strassen, Volker, Gaussian Elimination is not Optimal, Number. Math. 13, p. 354-356, 1969
Recall from Exercise 5 of Section 5 that the scalar product \(\vu \cdot \vv\) of a \(1 \times n\) matrix \(\vu = [u_1 \ u_2 \ \ldots \ u_n]\) and an \(n \times 1\) vector \(\vv=\left[ \begin{array}{c} v_1\\ v_2\\ \vdots \\ v_n \end{array} \right]\) is \(\vu \cdot \vv = u_1v_1 + u_2v_2 + u_3v_3 + \cdots + u_nv_n\text{.}\)
V. V. Williams, Multiplying matrices in \(O{\left(n^{2.373}\right)}\) time, Stanford University, (2014).