Understand the orthogonal decomposition of a vector with respect to a subspace.

Understand the relationship between orthogonal decomposition and orthogonal projection.

Understand the relationship between orthogonal decomposition and the closest vector on / distance to a subspace.

Learn the basic properties of orthogonal projections as linear transformations and as matrix transformations.

Recipes: orthogonal projection onto a line, orthogonal decomposition by solving a system of equations, orthogonal projection via a complicated matrix product.

Let be a subspace of and let be a vector in In this section, we will learn to compute the closest vector to in The vector is called the orthogonal projection of onto This is exactly what we will use to almost solve matrix equations, as discussed in the introduction to Chapter 7.

Subsection7.3.1Orthogonal Decomposition

We begin by fixing some notation.

Notation

Let be a subspace of and let be a vector in We denote the closest vector to on by

To say that is the closest vector to on means that the difference is orthogonal to the vectors in

In other words, if then we have where is in and is in The first order of business is to prove that the closest vector always exists.

Theorem(Orthogonal decomposition)

Let be a subspace of and let be a vector in Then we can write uniquely as

Let so by this fact in Section 7.2. Let be a basis for and let be a basis for We showed in the proof of this fact in Section 7.2 that is linearly independent, so it forms a basis for Therefore, we can write

where and Since is orthogonal to the vector is the closest vector to on so this proves that such a decomposition exists.

As for uniqueness, suppose that

for in and in Rearranging gives

Since and are subspaces, the left side of the equation is in and the right side is in Therefore, is in and in so it is orthogonal to itself, which implies Hence and which proves uniqueness.

Definition

Let be a subspace of and let be a vector in The expression

for in and in is called the orthogonal decomposition of with respect to and the closest vector is the orthogonal projection of onto

Since is the closest vector on to the distance from to the subspace is the length of the vector from to i.e., the length of To restate:

Closest vector and distance

Let be a subspace of and let be a vector in

The orthogonal projection is the closest vector to in

Now we turn to the problem of computing and Of course, since really all we need is to compute The following theorem gives a method for computing the orthogonal projection onto a column space. To compute the orthogonal projection onto a general subspace, usually it is best to rewrite the subspace as the column space of a matrix, as in this important note in Section 3.3.

Theorem

Let be an matrix, let and let be a vector in Then the matrix equation

in the unknown vector is consistent, and is equal to for any solution

Proof

Let be the orthogonal decomposition with respect to By definition lies in and so there is a vector in with Choose any such vector We know that lies in which is equal to by this important note in Section 7.2. We thus have

and so

This exactly means that is consistent. If is any solution to then by reversing the above logic, we conclude that

Example(Orthogonal projection onto a line)

Let be a line in and let be a vector in By the theorem, to find we must solve the matrix equation where we regard as an matrix (the column space of this matrix is exactly ). But and so is a solution of and hence

In the special case where we are projecting a vector in onto a line our formula for the projection can be derived very directly and simply. The vector is a multiple of say This multiple is chosen so that is perpendicular to as in the following picture.

In other words,

Using the distributive property for the dot product and isolating the variable gives us that

When is a matrix with more than one column, computing the orthogonal projection of onto means solving the matrix equation In other words, we can compute the closest vector by solving a system of linear equations. To be explicit, we state the theorem as a recipe:

Recipe: Compute an orthogonal decomposition

Let be a subspace of Here is a method to compute the orthogonal decomposition of a vector with respect to

Rewrite as the column space of a matrix In other words, find a a spanning set for and let be the matrix with those columns.

Compute the matrix and the vector

Form the augmented matrix for the matrix equation in the unknown vector and row reduce.

This equation is always consistent; choose one solution Then

In the context of the above recipe, if we start with a basis of then it turns out that the square matrix is automatically invertible! (It is always the case that is square and the equation is consistent, but need not be invertible in general.)

Corollary

Let be an matrix with linearly independent columns and let Then the matrix is invertible, and for all vectors in we have

Proof

We will show that which implies invertibility by the invertible matrix theorem in Section 6.1. Suppose that Then so by the theorem. But (the orthogonal decomposition of the zero vector is just so and therefore is in Since the columns of are linearly independent, we have so as desired.

Let be a vector in and let be a solution of Then so

The corollary applies in particular to the case where we have a subspace of and a basis for To apply the corollary, we take to be the matrix with columns

In this subsection, we change perspective and think of the orthogonal projection as a function of This function turns out to be a linear transformation with many nice properties, and is a good example of a linear transformation which is not originally defined as a matrix transformation.

We have to verify the defining properties of linearity in Section 4.3. Let be vectors in and let and be their orthogonal decompositions. Since and are subspaces, the sums and are in and respectively. Therefore, the orthogonal decomposition of is so
Now let be a scalar. Then is in and is in so the orthogonal decomposition of is and therefore,
Since satisfies the two defining properties in Section 4.3, it is a linear transformation.

Any vector in is in the range of because for such vectors. On the other hand, for any vector in the output is in so is the range of

We compute the standard matrix of the orthogonal projection in the same way as for any other transformation: by evaluating on the standard coordinate vectors. In this case, this means projecting the standard coordinate vectors onto the subspace.

For the final assertion, we showed in the proof of this theorem that there is a basis of of the form where is a basis for and is a basis for Each is an eigenvector of indeed, for we have

because is in and for we have

because is in Therefore, we have found a basis of eigenvectors, with associated eigenvalues ( ones and zeros). Now we use the diagonalization theorem in Section 6.4.

We emphasize that the properties of projection matrices would be very hard to prove in terms of matrices. By translating all of the statements into statements about linear transformations, they become much more transparent. For example, consider the projection matrix we found in this example. Just by looking at the matrix it is not at all obvious that when you square the matrix you get the same matrix back.

As we saw in this example, if you are willing to compute bases for and then this provides a third way of finding the standard matrix for projection onto indeed, if is a basis for and is a basis for then

where the middle matrix in the product is the diagonal matrix with ones and zeros on the diagonal. However, since you already have a basis for it is faster to multiply out the expression as in the corollary.