A great many matrices (more generally linear operators) are
characterized by their eigenvalues and eigenvectors. They play a
crucial role in all branches of science and engineering. Most of the
time, finding them requires resorting to numerical methods, so here we
discuss some of the simpler ones.
Generally speaking, the eigenvalues of a square matrix $A$ are the roots of
the so-called characteristic polynomial
$$p(\lambda) = \det(A - \lambda I) .$$
That is, start with the matrix and modify it by subtracting the
same variable $\lambda$ from each diagonal element. Then calculate
the determinant of the resulting matrix and you get a polynomial in $\lambda$.
Here is how it works using a $3 \times 3$ matrix:
The three zeros of this cubic polynomial are distinct, so this
matrix has three distinct eigenvalues. For an $n \times n$ matrix we
get a polynomial of degree $n$. Why? It is easy to see if we
remember from the previous section that the determinant is a sum over
products of matrix elements. One of those products runs down the
diagonal. Since each diagonal element has a $-\lambda$ in it, the
diagonal product alone gives you a polynomial of degree $n$. The other
products have fewer diagonal elements, so they can't increase the
degree of the polynomial beyond $n$.
A polynomial of degree $n$ has $n$ roots, although some of them may
appear more than once. Call the zeros of the characteristic
polynomial, i.e. the solutions of $p(\lambda) = 0$, the eigenvalues
$\lambda_1, \lambda_2, \ldots, \lambda_n$. If we factor
the polynomial in terms of its roots, we get
$$p(\lambda) = (\lambda_1 - \lambda)(\lambda_2 - \lambda)\cdots(\lambda_n - \lambda) .$$
Notice that the determinant of the matrix itself is the value of the
characteristic polynomial at $\lambda = 0$. Plugging $\lambda = 0$
into the factored expression above leads to the result that the
determinant of the matrix is the product of its eigenvalues:
$$\det A = \lambda_1 \lambda_2 \cdots \lambda_n .$$
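As a quick numerical illustration (a minimal sketch with a made-up $3 \times 3$ matrix, not the example from the text; NumPy's np.poly returns the coefficients of the characteristic polynomial of a square array):

```python
import numpy as np

# Made-up 3x3 example matrix (for illustration only).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

coeffs = np.poly(A)      # coefficients of det(lambda*I - A), highest power first
eigs = np.roots(coeffs)  # the eigenvalues are the roots of that polynomial

print(eigs)
print(np.prod(eigs), np.linalg.det(A))  # product of eigenvalues equals det(A)
```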
When a matrix has a zero determinant, we can find a nontrivial (column
vector) solution to the corresponding homogeneous equation. Here the
matrix with zero determinant is $A - \lambda I$, so for each eigenvalue
there is a nonzero vector $x$ satisfying
$$A x = \lambda x .$$
This is the standard equation for eigenvalue $\lambda$ and eigenvector
$x$. There can be as many as $n$ linearly independent solutions to
this equation, as follows:
$$A x_i = \lambda_i x_i , \qquad i = 1, 2, \ldots, n .$$
Notice that the eigenvector is not unique. We can multiply both sides
of the equation by a constant $c$ to see that if $x_i$ is a solution
for eigenvalue $\lambda_i$, so is $c\,x_i$.
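To see this numerically (a minimal sketch; the matrix is again a made-up example, and np.linalg.eig does the work for us):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

lam, vecs = np.linalg.eig(A)  # eigenvalues and eigenvectors (as columns)
x = vecs[:, 0]                # first eigenvector

print(np.allclose(A @ x, lam[0] * x))              # A x = lambda x      -> True
print(np.allclose(A @ (5 * x), lam[0] * (5 * x)))  # any multiple of x works too
```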
Often we deal with real symmetric matrices (the transpose of the
matrix is equal to the matrix itself, $A^T = A$). In that case the
eigenvectors form a complete set of orthogonal vectors. They can be
used to define the directions of coordinate axes, so we can write any
$n$-dimensional vector $y$ as a linear combination
$$y = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n ,$$
where the coefficient $c_i$ is the component of the vector in the
direction of the $i$th eigenvector $x_i$.
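For a symmetric matrix the coefficients are just dot products with the (orthonormal) eigenvectors, as this small sketch with a made-up matrix shows (np.linalg.eigh returns orthonormal eigenvectors for symmetric input):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])   # made-up symmetric matrix

lam, V = np.linalg.eigh(A)        # columns of V are orthonormal eigenvectors
y = np.array([1.0, -2.0, 0.5])    # an arbitrary vector

c = V.T @ y                       # coefficient c_i is the dot product x_i . y
print(np.allclose(V @ c, y))      # y is recovered from the expansion -> True
```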
More generally, if there are $n$ linearly independent eigenvectors
$x_i$, this expansion is also possible. Then we have a simple method for
finding the eigenvalue with the largest magnitude, namely the ``power
method''.
The power method originates from the general statement that we can use
the eigenvectors of a matrix to represent any vector $y$:
$$y = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n .$$
We multiply by $A$ and get
$$A y = c_1 \lambda_1 x_1 + c_2 \lambda_2 x_2 + \cdots + c_n \lambda_n x_n .$$
So we get a new vector whose coefficients are each multiplied by the
corresponding eigenvalue $\lambda_i$. The coefficients with
the larger eigenvalues get bigger compared with the coefficients with
smaller eigenvalues. So let's say we have sorted the eigenvalues so
the one with smallest magnitude is $\lambda_1$, and the one with the
largest magnitude is $\lambda_n$. If we multiply by $A$ a total of $k$ times,
the coefficients become $c_i \lambda_i^k$. If we keep going, the
$n$th term (corresponding to the largest eigenvalue) will eventually
swamp all the others and we get (for very large $k$)
$$A^k y \approx c_n \lambda_n^k x_n .$$
So we get an eigenvector corresponding to the largest eigenvalue.
Another way of saying this is that when we hit the vector with the
matrix $A$ we get a new vector that tends to point more in the
direction of the leading eigenvector $x_n$. The more factors of $A$
we pile on, the more precisely it points in that direction.
So how do we get the eigenvalue $\lambda_n$ if we have an eigenvector pointing
in the direction of $x_n$? If it points in that direction, we must
have $y = c\,x_n$ for some constant $c$. Then remember that eigenvectors satisfy
$$A x_n = \lambda_n x_n .$$
That is, the vector $A x_n$ has every component of $x_n$ multiplied by the
eigenvalue $\lambda_n$. We can use any of the components to read
off the answer.
To turn this process into a practical algorithm, we normalize the
vector after each multiplication by $A$. Normalization simply means
multiplying by a constant to put the vector in our favorite standard
form. A common normalization is to divide by the Cartesian norm so we
get a vector of unit length. The normalization we will use here is
dividing the whole vector by the component with the largest magnitude.
If we take the absolute values of the components, the largest one is
called the infinity norm. So we divide by the infinity norm if
that component is positive and by minus the infinity norm if it is
negative. Here is an example. Suppose we have a vector $y$ whose
component of largest magnitude is its first one, with the value 3.
The infinity norm is then 3, and we divide by 3 to normalize and get $x$.
Let's call the component with the largest magnitude the ``leading
component''. In the example, the leading component in $y$ is the
first one and it is positive. If it were $-3$, instead, we'd divide by
$-3$. The goal is to get a vector proportional to the original
vector, but with one component equal to 1 and with the rest of the
components no larger in magnitude than 1.
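In code, this signed infinity-norm normalization is a one-liner (a small sketch; the helper name and the example vector are made up for illustration):

```python
import numpy as np

def normalize_leading(y):
    """Divide by the signed component of largest magnitude, so the leading entry becomes 1."""
    return y / y[np.argmax(np.abs(y))]

y = np.array([3.0, -2.0, 1.0])   # infinity norm is 3; the leading component is +3
print(normalize_leading(y))      # approximately [ 1.    -0.667  0.333]
```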
The reason we pick this way of normalizing the vector is that we can
then easily read off the eigenvalue by looking at what happens to the
leading component when we multiply by $A$. Also, the infinity norm is
cheaper to compute, since it doesn't require any arithmetic, just comparisons.
Suppose, after normalizing to get $x$, we multiply by $A$ one more
time and find that the leading component of the new vector $Ax$ is 2.
Then we can read off the eigenvalue from the leading component: it is 2.
Of course, we could check every component to see that each one got
multiplied by 2.
So here is the power algorithm:
Start with any arbitrary vector y.
Repeat steps 1 - 3 until convergence.
Step 1: Normalize y to get x. The leading component is now 1.
Step 2: Compute y = Ax.
Step 3: The approximate eigenvalue is the new leading component.
Convergence means that when you normalize $y$, the new $x$ is close
enough to the old $x$ to declare victory. You choose what is close
enough to suit the requirements of your problem.
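Here is a minimal sketch of the power algorithm in NumPy (the function name, tolerance, and test matrix are illustrative choices, not part of the text above):

```python
import numpy as np

def power_method(A, tol=1e-10, max_iter=1000):
    """Power iteration with signed infinity-norm normalization (illustrative sketch)."""
    y = np.random.rand(A.shape[0])     # step 0: arbitrary starting vector
    lam_old = 0.0
    for _ in range(max_iter):
        idx = np.argmax(np.abs(y))     # position of the leading component
        x = y / y[idx]                 # step 1: normalize; leading component becomes 1
        y = A @ x                      # step 2: multiply by A
        lam = y[idx]                   # step 3: approximate eigenvalue
        if abs(lam - lam_old) < tol:   # convergence test on the eigenvalue estimate
            break
        lam_old = lam
    return lam, x

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])        # made-up symmetric test matrix
print(power_method(A)[0])              # compare with max(abs(np.linalg.eigvals(A)))
```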
Notice that with the power algorithm, you can start with any arbitrary
vector and the method will converge to the same result. Well, that is
almost any starting vector. You might be unlucky and pick a starting
vector that has a zero component $c_n$ along the leading
eigenvector $x_n$. But numerical calculations are usually subject to
roundoff error, so even if you unwittingly started with $c_n = 0$,
chances are very good that after a few hits with the matrix $A$,
you develop a tiny nonzero $c_n$, and then it is just a matter of
time before that coefficient grows to dominate the iteration.
The inverse power method works with the inverse $A^{-1}$, assuming it
exists. It is easy to check that the eigenvectors of the matrix $A$
are also eigenvectors of its inverse, but the eigenvalues are the
reciprocals $1/\lambda_i$ (multiply $A x_i = \lambda_i x_i$ by $A^{-1}$
and divide by $\lambda_i$). So now the eigenvalue of $A^{-1}$ with the
largest magnitude corresponds to the eigenvalue of $A$ with the
smallest magnitude.
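A minimal sketch of the inverse power method (a fixed iteration count rather than a convergence test, and np.linalg.solve in place of an explicit inverse; the matrix is the same made-up example):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

y = np.random.rand(3)
for _ in range(50):                      # fixed number of iterations, for simplicity
    idx = np.argmax(np.abs(y))
    x = y / y[idx]                       # infinity-norm normalization, as before
    y = np.linalg.solve(A, x)            # y = A^{-1} x, without forming the inverse

print(1.0 / y[idx])                      # reciprocal of A^{-1}'s dominant eigenvalue
                                         # = smallest-magnitude eigenvalue of A
```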
So we can get the largest and smallest eigenvalues. How do we get the
ones between? For a matrix whose eigenvalues are all real, we can do
this by generalizing the inverse power method. We take the inverse of
the shifted matrix $A - sI$, where $s$ is any number we like. (We
intend to vary $s$.) The eigenvectors of this matrix are still the
same as the eigenvectors of $A$, but the eigenvalues are shifted and inverted:
$$(A - sI)^{-1} x_i = \frac{1}{\lambda_i - s}\, x_i .$$
Which of the eigenvalues $1/(\lambda_i - s)$ is the largest in
magnitude? It depends on $s$. If $s$ is close to one of the
$\lambda_i$'s, then $1/|\lambda_i - s|$ is maximum for that $i$. So if we hold
that $s$ fixed and run the power method, we eventually get the
eigenvector $x_i$. Then we change $s$ and rerun the power method.
It's like tuning a radio dial. As $s$ gets close to a new eigenvalue,
we get the next ``broadcast station'', i.e. the next
eigenvector. If we keep going, eventually, we get them all.
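A sketch of the shifted inverse power method (the shift value 2.5 and the matrix are arbitrary choices for illustration):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
s = 2.5                                  # the shift: hunt for the eigenvalue nearest 2.5
n = A.shape[0]

y = np.random.rand(n)
for _ in range(50):
    idx = np.argmax(np.abs(y))
    x = y / y[idx]
    y = np.linalg.solve(A - s * np.eye(n), x)   # apply (A - sI)^{-1}

mu = y[idx]                              # dominant eigenvalue of (A - sI)^{-1}
print(s + 1.0 / mu)                      # recovers the eigenvalue of A closest to s
```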
Clearly, if any of the eigenvalues are complex, we would have a lot of
searching to do, because we'd need to search the entire complex plane,
and not just the real line interval between the smallest and the
largest eigenvalue. There are better methods.
The power and inverse power methods are simple and very easy to
implement, but if you want all the eigenvalues, those methods are very
inefficient. There are other, much more sophisticated and efficient
methods, though. Here we describe in broad terms the Householder/QR
algorithm for real symmetric matrices. For details, please see
standard texts in numerical methods.
A real symmetric matrix $A$ can be put into diagonal form by a real
orthogonal similarity transform. In other words, there exists a real
orthogonal matrix $V$ such that the product (similarity transform)
$V^T A V$ is a diagonal matrix $D$. (An orthogonal matrix is one
whose transpose is its inverse: $V^T V = I$.)
This solves the problem, because the eigenvalues of the matrix $A$
are the diagonal values in $D$, and the eigenvectors are the
column vectors of $V$. We say that the transform ``diagonalizes'' $A$.
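Numerically, np.linalg.eigh hands us exactly such a $V$ and $D$ (a quick check with a made-up symmetric matrix):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

lam, V = np.linalg.eigh(A)               # eigenvalues and an orthogonal V
D = V.T @ A @ V                          # the similarity transform
print(np.allclose(D, np.diag(lam)))      # D is diagonal with the eigenvalues -> True
print(np.allclose(V.T @ V, np.eye(3)))   # V is orthogonal                    -> True
```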
Of course, finding the transform is a challenge. With the
Householder/QR algorithm it is done through an iterative process
that eventually converges to the answer. The first step in the
process, the Householder step, is to find an orthogonal similarity
transform that puts the matrix in tridiagonal form. This can be done
exactly in a finite number of steps. The Householder method finds a
matrix $P$ that is not only orthogonal, it is symmetric ($P^T = P$):
$$T_0 = P A P .$$
The matrix $T_0$ is tridiagonal (and real and symmetric).
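To see the tridiagonal form numerically, one option is scipy.linalg.hessenberg, which reduces a matrix to upper Hessenberg form by orthogonal similarity transforms; for a symmetric matrix the result is tridiagonal. (This is only a sketch for checking the idea; the routine builds its orthogonal factor from a product of reflections rather than a single symmetric $P$, and the matrix below is made up.)

```python
import numpy as np
from scipy.linalg import hessenberg

A = np.array([[4.0, 1.0, 2.0, 0.5],
              [1.0, 3.0, 0.0, 1.0],
              [2.0, 0.0, 2.0, 1.0],
              [0.5, 1.0, 1.0, 1.0]])  # made-up real symmetric matrix

T, Q = hessenberg(A, calc_q=True)     # A = Q T Q^T; for symmetric A, T is tridiagonal
print(np.round(T, 3))                 # only the main and first off-diagonals survive
print(np.allclose(Q @ T @ Q.T, A))    # the similarity transform checks out -> True
```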
In the next phase, the QR phase, we apply a succession of orthogonal
similarity transforms on the tridiagonal matrix $T_0$ that make
the off-diagonal values smaller. Eventually they become small enough
that we can say the matrix is diagonal for all intents and purposes. The
first similarity transform is applied to the tridiagonal matrix $T_0$:
$$T_1 = Q_1^T T_0 Q_1 .$$
The transform is constructed so the resulting matrix $T_1$ is
still tridiagonal, but the off-diagonal elements are smaller. Then we
apply the second similarity transform to the result above:
$$T_2 = Q_2^T T_1 Q_2 .$$
We keep going until eventually $T_k$, for large $k$, is close
enough to a diagonal matrix that we can call it our $D$.
Putting all the transforms together, we get
$$D \approx Q_k^T \cdots Q_2^T Q_1^T\, P A P\, Q_1 Q_2 \cdots Q_k .$$
It is easy to show that the product of real orthogonal matrices is
also real orthogonal. So the product
$$V = P\, Q_1 Q_2 \cdots Q_k$$
is the orthogonal matrix that diagonalizes $A$. That is what we wanted.
So where does the ``QR'' in the algorithm come in? At each step the
tridiagonal matrix $T_{k-1}$ is factored into a product of an
orthogonal matrix $Q_k$ and an upper triangular matrix $R_k$:
$$T_{k-1} = Q_k R_k .$$
Hence the name QR. This factorization can be done exactly with a
finite number of steps. Then one can show that the combination
$$R_k Q_k = Q_k^T T_{k-1} Q_k = T_k$$
is tridiagonal. That gives us the next matrix in the sequence. The
same factorization is done on it, and the process continues.
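A bare-bones sketch of the iteration (unshifted QR, the simplest variant, which converges more slowly than the shifted versions used in practice; we start directly from a small made-up tridiagonal matrix):

```python
import numpy as np

T = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])           # a small tridiagonal symmetric matrix

V = np.eye(3)
for _ in range(200):                      # unshifted QR iteration (slow but simple)
    Q, R = np.linalg.qr(T)                # factor T_{k-1} = Q_k R_k
    T = R @ Q                             # recombine: T_k = R_k Q_k = Q_k^T T_{k-1} Q_k
    V = V @ Q                             # accumulate the orthogonal transforms

print(np.round(np.diag(T), 6))            # approximate eigenvalues on the diagonal
print(np.round(T, 6))                     # off-diagonal entries should be tiny
```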
As we can see, finding the eigenvalues this way takes some work. It
turns out that the rate of convergence depends on the spacing of the
eigenvalues. If two eigenvalues are very close to each other
(compared with their average spacing), more iterations are needed to
get convergence. If they are well separated, fewer iterations are
The algorithm also works with (complex) Hermitian matrices
($A^\dagger = A$). This covers a vast number of cases in science and
engineering. The eigenvalues are still real, but the similarity
transform is unitary ($V^\dagger V = I$). Here the
dagger means the complex conjugate transpose.
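np.linalg.eigh handles the Hermitian case as well (a quick sketch with a made-up complex Hermitian matrix):

```python
import numpy as np

A = np.array([[2.0,      1.0 + 1j],
              [1.0 - 1j, 3.0     ]])      # made-up Hermitian matrix: A equals its conjugate transpose

lam, V = np.linalg.eigh(A)
print(lam)                                            # the eigenvalues are real
print(np.allclose(V.conj().T @ A @ V, np.diag(lam)))  # unitary similarity diagonalizes A -> True
```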