Eigenvalues and eigenvectors

A great many matrices (more generally, linear operators) are characterized by their eigenvalues and eigenvectors. They play a crucial role in all branches of science and engineering. Most of the time, finding them requires resorting to numerical methods, so here we discuss some of the simpler ones.

Characteristic polynomial

Generally speaking, eigenvalues of a square matrix $A$ are roots of the so-called characteristic polynomial:

\begin{displaymath}
\det\vert A - \lambda I\vert = P(\lambda) = 0
\end{displaymath}

That is, start with the matrix $A$ and modify it by subtracting the same variable $\lambda$ from each diagonal element. Then calculate the determinant of the resulting matrix and you get a polynomial. Here is how it works using a $3\times 3$ matrix:

\begin{displaymath}
A =
\left(
\begin{array}{rrr}
1/2 & 3/2 & 0 \\
3/2 & 1/2 & 0 \\
0 & 0 & 1
\end{array}\right)
\end{displaymath}


\begin{displaymath}
\det\vert A - \lambda I\vert = \left\vert
\begin{array}{rrr}
1/2 - \lambda & 3/2 & 0 \\
3/2 & 1/2 - \lambda & 0 \\
0 & 0 & 1 - \lambda
\end{array}\right\vert
= -2 + \lambda + 2\lambda^2 - \lambda^3 = (-1 - \lambda)(1 - \lambda)(2 - \lambda)
\end{displaymath}

The three zeros of this cubic polynomial are $(-1, 1, 2)$, so this matrix has three distinct eigenvalues. For an $n\times n$ matrix we get a polynomial of degree $n$. Why? It is easy to see if we remember from the previous section that the determinant is a sum over products of matrix elements. One of those products runs down the diagonal. Since each diagonal element has a $\lambda$ in it, the diagonal alone gives a polynomial of degree $n$. The other products have fewer diagonal elements, so they can't increase the degree of the polynomial beyond $n$.

A polynomial of degree $n$ has $n$ roots, although some of them may appear more than once. Call the zeros of the characteristic polynomial, i.e., the eigenvalues, $\lambda_i$. If we factor the polynomial in terms of its roots, we get

\begin{displaymath}
P(\lambda) = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \ldots{} (\lambda_n - \lambda).
\end{displaymath}

Notice that the determinant of the matrix itself is the value of the characteristic polynomial at $\lambda = 0$. Plugging $\lambda = 0$ into the factored expression above shows that the determinant of the matrix is the product of its eigenvalues.

\begin{displaymath}
\det\vert A\vert = P(0) = \lambda_1 \lambda_2 \ldots{} \lambda_n
\end{displaymath}
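
As a quick cross-check of these statements (purely illustrative, reusing the $3\times 3$ example above), here is a short Python/NumPy snippet; \texttt{np.linalg.eigvals} and \texttt{np.linalg.det} are standard NumPy routines.

\begin{verbatim}
# Check: eigenvalues of the example matrix, and det A = product of eigenvalues
import numpy as np

A = np.array([[0.5, 1.5, 0.0],
              [1.5, 0.5, 0.0],
              [0.0, 0.0, 1.0]])

eigenvalues = np.linalg.eigvals(A)
print(np.sort(eigenvalues))    # approximately [-1.  1.  2.]

# det A = P(0) = lambda_1 * lambda_2 * ... * lambda_n
print(np.linalg.det(A))        # approximately -2.0
print(np.prod(eigenvalues))    # approximately -2.0
\end{verbatim}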

Eigenvalue equation

When the matrix $A - \lambda I$ has a zero determinant, that is, when $\lambda$ is a root of the characteristic polynomial, we can find a nontrivial (column vector) solution $v$ to the equation

\begin{displaymath}
(A - \lambda I)v = 0
\end{displaymath}

or

\begin{displaymath}
A v = \lambda v
\end{displaymath}

This is the standard equation for eigenvalue $\lambda$ and eigenvector $v$. There can be as many as $n$ linearly independent solutions to this equation, one for each eigenvalue:

\begin{displaymath}
A v_i = \lambda_i v_i   .
\end{displaymath}

Notice that the eigenvector is not unique. We can multiply both sides of the equation by a constant $c$ to see that if $v_i$ is a solution for eigenvalue $\lambda_i$, so is $c v_i$.

Often we deal with real symmetric matrices (the transpose of the matrix is equal to the matrix itself). In that case the eigenvectors form a complete set of orthogonal vectors. They can be used to define the directions of coordinate axes, so we can write any $n$-dimensional vector $x$ as a linear combination

\begin{displaymath}
x = \alpha_1 v_1 + \alpha_2 v_2 + \ldots{} + \alpha_n v_n
\end{displaymath}

where the coefficient $\alpha_i$ is the component of the vector in the direction $v_i$.
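
As an illustration (a minimal sketch, not required by the argument), the following Python/NumPy snippet expands an arbitrary test vector in the orthonormal eigenvector basis of the real symmetric $3\times 3$ example matrix used earlier; \texttt{np.linalg.eigh} is the standard NumPy routine for symmetric matrices, and the test vector is an arbitrary choice.

\begin{verbatim}
# Expand a vector in the eigenvector basis of a real symmetric matrix.
# np.linalg.eigh returns orthonormal eigenvectors as the columns of V,
# so each coefficient alpha_i is just the dot product v_i . x.
import numpy as np

A = np.array([[0.5, 1.5, 0.0],
              [1.5, 0.5, 0.0],
              [0.0, 0.0, 1.0]])
eigenvalues, V = np.linalg.eigh(A)   # columns of V are the eigenvectors v_i

x = np.array([1.0, 2.0, 3.0])        # arbitrary test vector
alpha = V.T @ x                      # alpha_i = v_i . x

x_reconstructed = V @ alpha          # alpha_1 v_1 + ... + alpha_n v_n
print(np.allclose(x, x_reconstructed))   # True
\end{verbatim}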

More generally, if there are $n$ linearly independent eigenvectors $v_i$, this expansion is still possible. Then we have a simple method for finding the eigenvalue with the largest magnitude, namely the ``power method.''

Power method

The power method originates from the general statement that we can use the eigenvectors of a matrix to represent any vector $x$:

\begin{displaymath}
x = \alpha_1 v_1 + \alpha_2 v_2 + \ldots{} + \alpha_n v_n
\end{displaymath}

We multiply by $A$ and get

\begin{eqnarray*}
Ax &=& \alpha_1 Av_1 + \alpha_2 Av_2 + \ldots{} + \alpha_n Av_n \\
&=& \alpha_1 \lambda_1 v_1 + \alpha_2 \lambda_2 v_2 + \ldots{} + \alpha_n \lambda_n v_n
\end{eqnarray*}

So we get a new vector whose coefficients are each multiplied by the corresponding eigenvalue: $\alpha_i \lambda_i$. The coefficients belonging to the eigenvalues of larger magnitude grow relative to those belonging to the eigenvalues of smaller magnitude. So let's say we have sorted the eigenvalues so that the one with the smallest magnitude is $\lambda_1$ and the one with the largest magnitude is $\lambda_n$. If we multiply by $A$ $m$ times, the coefficients become $\alpha_i \lambda_i^m$. If we keep going, the $n$th term (corresponding to the largest eigenvalue) will eventually swamp all the others, and we get (for very large $m$)

\begin{displaymath}
A^m x \approx \lambda_n^m \alpha_n v_n = y
\end{displaymath}

So we get an eigenvector corresponding to the largest eigenvalue. Another way of saying this is that when we hit the vector $x$ with the matrix $A$ we get a new vector that tends to point more in the direction of the leading eigenvector $v_n$. The more factors of $A$ we pile on, the more precisely it points in that direction.

So how do we get the eigenvalue if we have an eigenvector $y$ pointing in the direction of $v_n$? If it points in that direction, we must have $y = c v_n$. Then remember that eigenvectors satisfy

\begin{displaymath}
Ay = A cv_n = \lambda_n c v_n = \lambda_n y   .
\end{displaymath}

That is, the vector $Ay$ has every component of $y$ multiplied by the eigenvalue $\lambda_n$. We can use any of the components to read off the answer.

To turn this process into a practical algorithm, we normalize the vector after each multiplication by $A$. Normalization simply means multiplying by a constant to put the vector in our favorite standard form. A common normalization is to divide by the Cartesian norm so we get a vector of unit length. The normalization we will use here is dividing the whole vector by the component with the largest magnitude. If we take the absolute values of the components, the largest one is called the infinity norm. So we divide by the infinity norm if that component is positive and by minus the infinity norm if it is negative. Here is an example. Suppose we have

\begin{displaymath}
y = (3, 2, -1).
\end{displaymath}

The infinity norm is 3. We divide by 3 to normalize to get $x$:

\begin{displaymath}
x = (1, 2/3, -1/3).
\end{displaymath}

Let's call the component with the largest magnitude the ``leading component''. In the example, the leading component in $y$ is the first one, and it is positive. If it were $-3$ instead, we'd divide by $-3$. The goal is to get a vector proportional to the original vector, but with one component equal to $+1$ and with the rest of the components no larger in magnitude than 1.

The reason we pick this way of normalizing the vector is that we can then easily read off the eigenvalue by looking at what happens to the leading component when we multiply by $A$. Also, the infinity norm is cheaper to compute, since it doesn't require any arithmetic, just comparisons.

Suppose, after normalizing $y$ to get $x$, we multiply by $A$ one more time and get

\begin{displaymath}
Ax = (2, 4/3, -2/3)  .
\end{displaymath}

We can read off the eigenvalue from the leading component. It is 2. Of course, we could check every component to see that each one got multiplied by 2.

So here is the power algorithm:

  Start with any arbitrary vector y.
  Repeat steps 1 - 3 until convergence.
  Step 1: Normalize y to get x.  The leading component is now 1.
  Step 2: Compute y = Ax.
  Step 3: The approximate eigenvalue is the new leading component.
Convergence means that when you normalize $y$, the new $x$ is close enough to the old $x$ to declare victory. You choose what is close enough to suit the requirements of your problem.
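
Here is a minimal Python/NumPy sketch of this algorithm, using the infinity-norm normalization described above. The tolerance, the iteration cap, and the default starting vector are arbitrary illustrative choices, not part of the algorithm itself.

\begin{verbatim}
# Power method with infinity-norm normalization (divide by the signed
# leading component so that the leading component becomes +1).
import numpy as np

def power_method(A, y=None, tol=1e-10, max_iter=1000):
    n = A.shape[0]
    if y is None:
        y = np.ones(n)                    # arbitrary starting vector
    x_old = np.zeros(n)
    eigenvalue = 0.0
    for _ in range(max_iter):
        lead = y[np.argmax(np.abs(y))]    # signed leading component of y
        x = y / lead                      # Step 1: normalize; leading component -> 1
        if np.max(np.abs(x - x_old)) < tol:
            break                         # converged: x barely changed
        x_old = x
        y = A @ x                         # Step 2: multiply by A
        eigenvalue = y[np.argmax(np.abs(y))]  # Step 3: new leading component
    return eigenvalue, x

A = np.array([[0.5, 1.5, 0.0],
              [1.5, 0.5, 0.0],
              [0.0, 0.0, 1.0]])
print(power_method(A))   # eigenvalue approximately 2, eigenvector ~ (1, 1, 0)
\end{verbatim}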

Notice that with the power algorithm, you can start with any arbitrary vector and the method will converge to the same result. Well, that is almost any starting vector. You might be unlucky and pick a starting vector that has a zero component $\alpha_n$ along the leading eigenvector $v_n$. But numerical calculations are usually subject to roundoff error, so even if you unwittingly started with $\alpha_n = 0$, chances are very good that after a few hits with the matrix $A$ you develop a tiny nonzero $\alpha_n$, and then it is just a matter of time before that coefficient grows to dominate the iteration.

Inverse power method

The inverse power method works with the inverse $A^{-1}$, assuming it exists. It is easy to check that the eigenvectors of the matrix $A$ are also eigenvectors of its inverse, but the eigenvalues are the algebraic inverses:

\begin{displaymath}
A^{-1} v_i = \mu_i v_i
\end{displaymath}

where $\mu_i = 1/\lambda_i$. So now the eigenvalue $\mu_i$ with the largest magnitude corresponds to the eigenvalue $\lambda_i$ with the smallest magnitude.

So we can get the largest and smallest eigenvalues. How do we get the ones between? For a matrix whose eigenvalues are all real, we can do this by generalizing the inverse power method. We take the inverse of the shifted matrix $(A - qI)$, where $q$ is any number we like. (We intend to vary $q$.) The eigenvectors of this matrix are still the same as the eigenvectors of $A$:

\begin{displaymath}
(A - qI)^{-1} v_i = \mu_i v_i
\end{displaymath}

where, now, $\mu_i = 1/(\lambda_i - q)$. Which is the largest $\mu_i$? It depends on $q$. If $q$ is close to one of the $\lambda_i$'s, then $\vert\mu_i\vert$ is maximum for that $i$. So if we hold that $q$ fixed and run the power method, we eventually get the eigenvector $v_i$. Then we change $q$ and rerun the power method. It's like tuning a radio dial. As $q$ gets close to a new eigenvalue, we get the next ``broadcast station'', i.e. the next eigenvector. If we keep going, eventually, we get them all.
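
A minimal Python/NumPy sketch of this shifted inverse power method is given below. Rather than forming $(A - qI)^{-1}$ explicitly, it solves a linear system at each step with \texttt{np.linalg.solve}. The shift $q = 0.9$, the tolerance, and the iteration cap are arbitrary illustrative choices, and $q$ is assumed not to be exactly an eigenvalue (so $A - qI$ is invertible).

\begin{verbatim}
# Shifted inverse power method: power iteration with (A - qI)^{-1},
# applied by solving a linear system instead of inverting the matrix.
import numpy as np

def shifted_inverse_power(A, q, tol=1e-10, max_iter=1000):
    n = A.shape[0]
    B = A - q * np.eye(n)                 # shifted matrix (A - qI)
    x = np.ones(n)                        # arbitrary starting vector
    for _ in range(max_iter):
        y = np.linalg.solve(B, x)         # y = (A - qI)^{-1} x
        mu = y[np.argmax(np.abs(y))]      # leading component approximates mu_i
        x_new = y / mu                    # normalize: leading component -> 1
        if np.max(np.abs(x_new - x)) < tol:
            break
        x = x_new
    return q + 1.0 / mu, x_new            # lambda_i = q + 1/mu_i

A = np.array([[0.5, 1.5, 0.0],
              [1.5, 0.5, 0.0],
              [0.0, 0.0, 1.0]])
print(shifted_inverse_power(A, q=0.9)[0])   # approximately 1.0
\end{verbatim}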

Clearly, if any of the eigenvalues are complex, we would have a lot of searching to do, because we'd need to search the entire complex plane, and not just the real line interval between $\lambda_1$ and $\lambda_n$. There are better methods.

Other methods: QR algorithm

The power and inverse power methods are simple and very easy to implement, but if you want all the eigenvalues, they are very inefficient. There are other, much more sophisticated and efficient methods. Here we describe in broad terms the Householder/$QR$ algorithm for real symmetric matrices. For details, please see standard texts on numerical methods.

A real symmetric matrix $A$ can be put into diagonal form by a real orthogonal similarity transform. In other words, there exists a real orthogonal matrix $Q$ such that the product (similarity transform)

\begin{equation}
\Lambda = \tilde Q A Q
\end{equation}

is a diagonal matrix $\Lambda$. (An orthogonal matrix $Q$ is one whose transpose $\tilde Q$ is its inverse: $Q\tilde Q = \tilde Q Q = 1$.) This solves the problem, because the eigenvalues of the matrix $A$ are the diagonal values in $\Lambda$, and the eigenvectors are the column vectors of $Q$. We say that the transform $Q$ ``diagonalizes'' the matrix.
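
As a numerical illustration of this statement (a quick sketch, not part of the algorithm), \texttt{np.linalg.eigh} in NumPy returns just such an orthogonal $Q$ for a real symmetric matrix, and $\tilde Q A Q$ indeed comes out diagonal:

\begin{verbatim}
# Check that Q^T A Q is diagonal, with the eigenvalues on the diagonal.
import numpy as np

A = np.array([[0.5, 1.5, 0.0],
              [1.5, 0.5, 0.0],
              [0.0, 0.0, 1.0]])
eigenvalues, Q = np.linalg.eigh(A)   # Q is orthogonal; its columns are eigenvectors
Lambda = Q.T @ A @ Q
print(np.round(Lambda, 10))          # diagonal matrix with -1, 1, 2 on the diagonal
print(eigenvalues)                   # [-1.  1.  2.]
\end{verbatim}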

Of course, finding the transform $Q$ is a challenge. With the Householder/$QR$ algorithm it is done through an iterative process that eventually converges to the answer. The first step in the process, the Householder step, is to find an orthogonal similarity transform that puts the matrix in tridiagonal form. This can be done exactly in a finite number of steps. The Householder method finds a matrix $P$ that is not only orthogonal, it is symmetric ($P = \tilde P$):

\begin{equation}
\hat A = P A P   .
\end{equation}

The matrix $\hat A$ is tridiagonal (and real and symmetric).

In the next phase, the $QR$ phase, we apply a succession of orthogonal similarity transforms $Q^{(i)}$ on the tridiagonal matrix that make the off-diagonal values smaller. Eventually they become small enough that we can say it is diagonal for all intents and purposes. The first similarity transform is applied to the tridiagonal matrix $\hat A$:

\begin{equation}
\hat A^{(1)} = \tilde Q^{(1)} \hat A Q^{(1)}  .
\end{equation}

The transform is constructed so the resulting matrix $\hat A^{(1)}$ is still tridiagonal, but the off-diagonal elements are smaller. Then we apply the second similarity transform to the result above:
\begin{equation}
\hat A^{(2)} = \tilde Q^{(2)} \hat A^{(1)} Q^{(2)}  .
\end{equation}

We keep going until eventually $\hat A^{(n)}$, for large $n$, is close enough to a diagonal matrix that we can call it our $\Lambda$. Putting all the transforms together, we get
\begin{equation}
\Lambda = \lim_{n\to\infty} \tilde Q^{(n)} \tilde Q^{(n-1)} \ldots{} \tilde Q^{(1)} P A P
Q^{(1)} Q^{(2)} \ldots{} Q^{(n-1)} Q^{(n)}
\end{equation}

It is easy to show that the product of real orthogonal matrices is also real orthogonal. So the product
\begin{equation}
Q = P Q^{(1)} Q^{(2)} \ldots{} Q^{(n-1)} Q^{(n)}
\end{equation}

is the orthogonal matrix that diagonalizes $A$. That is what we wanted.

So where does the $R$ in the $QR$ algorithm come in? At each step the tridiagonal matrix $\hat A^{(i)}$ is factored into a product of an orthogonal matrix $Q^{(i)}$ and an upper triangular matrix $R^{(i)}$

\begin{equation}
\hat A^{(i)} = Q^{(i)} R^{(i)}.
\end{equation}

Hence the name $QR$. This factorization can be done exactly with a finite number of steps. Then one can show that the combination
\begin{equation}
\hat A^{(i+1)} = \tilde Q^{(i)} \hat A^{(i)} Q^{(i)}
\end{equation}

is tridiagonal. Indeed, since $\hat A^{(i)} = Q^{(i)} R^{(i)}$, the similarity transform above is just $\hat A^{(i+1)} = \tilde Q^{(i)} Q^{(i)} R^{(i)} Q^{(i)} = R^{(i)} Q^{(i)}$: we multiply the two factors back together in the reverse order. That gives us the next matrix in the sequence. The same factorization is done on it, and the process continues.
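
Below is a bare-bones Python/NumPy sketch of this iteration. It leaves out two things the full Householder/$QR$ algorithm uses in practice, the initial reduction to tridiagonal form and the shifts that accelerate convergence, so it is slow but easy to follow; \texttt{np.linalg.qr} performs the $QR$ factorization at each step.

\begin{verbatim}
# Bare-bones QR iteration: factor, multiply back in reverse order, repeat.
# No tridiagonal reduction and no shifts, so this is slow but easy to follow.
import numpy as np

def qr_iteration(A, tol=1e-10, max_iter=1000):
    Ak = A.copy()
    Q_total = np.eye(A.shape[0])          # accumulate the product of the Q^(i)
    for _ in range(max_iter):
        Q, R = np.linalg.qr(Ak)           # factor A^(i) = Q^(i) R^(i)
        Ak = R @ Q                        # same as Q^(i)-transpose A^(i) Q^(i)
        Q_total = Q_total @ Q
        off_diag = Ak - np.diag(np.diag(Ak))
        if np.max(np.abs(off_diag)) < tol:
            break
    return np.diag(Ak), Q_total           # eigenvalues, approximate eigenvectors

A = np.array([[0.5, 1.5, 0.0],
              [1.5, 0.5, 0.0],
              [0.0, 0.0, 1.0]])
print(qr_iteration(A)[0])                 # approximately [ 2. -1.  1.]
\end{verbatim}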

As we can see, finding the eigenvalues this way takes some work. It turns out that the rate of convergence depends on the spacing of the eigenvalues. If two eigenvalues are very close to each other (compared with their average spacing), more iterations are needed to get convergence. If they are well separated, fewer iterations are required.

The $QR$ algorithm also works with (complex) Hermitian matrices ($A^\dagger = A$). This covers a vast number of cases in science and engineering. The eigenvalues are still real, but the similarity transform is unitary ( $Q^\dagger Q = Q Q^\dagger = 1$). Here the dagger means the complex conjugate transpose.
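
For completeness, here is a tiny NumPy check of that statement: \texttt{np.linalg.eigh} accepts complex Hermitian input and still returns real eigenvalues and a unitary transform. The $2\times 2$ matrix used here is an arbitrary Hermitian example chosen only for illustration.

\begin{verbatim}
# Real eigenvalues of a complex Hermitian matrix (A^dagger = A).
import numpy as np

A = np.array([[2.0, 1.0 - 1.0j],
              [1.0 + 1.0j, 3.0]])
print(np.allclose(A, A.conj().T))               # True: A is Hermitian
eigenvalues, Q = np.linalg.eigh(A)
print(eigenvalues)                              # real: [1. 4.]
print(np.allclose(Q.conj().T @ Q, np.eye(2)))   # True: Q is unitary
\end{verbatim}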