One way to minimize a general function $f(\mathbf{a})$ of several variables is to use
the method of steepest descent. This method is based on the fact that
the gradient of a function points opposite the "fall line," the path
of steepest descent. The idea is to start somewhere, say at the
starting vector $\mathbf{a}_{\rm cur}$, and then step off in the direction of
steepest descent by an amount

(38)   $\delta\mathbf{a} = -\,c\,\nabla f(\mathbf{a}_{\rm cur})$

so that the new trial parameter vector is

$\mathbf{a}_{\rm next} = \mathbf{a}_{\rm cur} + \delta\mathbf{a} = \mathbf{a}_{\rm cur} - c\,\nabla f(\mathbf{a}_{\rm cur})$.
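As a minimal sketch of one step of Eq. (38) — where the quadratic test function, the numerical gradient, and the step constant $c = 0.1$ are all illustrative assumptions, not part of the text:

```python
def num_grad(f, a, h=1e-6):
    """Central-difference estimate of the gradient of f at the point a."""
    g = []
    for i in range(len(a)):
        ap, am = list(a), list(a)
        ap[i] += h
        am[i] -= h
        g.append((f(ap) - f(am)) / (2 * h))
    return g

def descent_step(f, a_cur, c=0.1):
    """Eq. (38): delta a = -c * grad f, applied component by component."""
    g = num_grad(f, a_cur)
    return [ai - c * gi for ai, gi in zip(a_cur, g)]

# Illustrative function (assumed): f(a) = a0^2 + 4*a1^2, minimum at (0, 0).
f = lambda a: a[0]**2 + 4*a[1]**2
a_next = descent_step(f, [1.0, 1.0])
```

One such step moves from $(1,1)$ toward the minimum, but how far it moves is entirely controlled by the constant $c$, which is exactly the difficulty discussed next.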
The problem with this method is that we don't know how far to go along
the line of steepest descent, i.e. we don't know what to take for the
constant $c$. Actually the problem isn't just one of choosing a single
constant: we might want to step with a different constant $c_i$ for each
component $a_i$. We could make up an algorithm that chooses a constant
or set of constants and then, if after taking a step we find that
$f$ increases, backs up and takes a smaller step. However, we
can do better. Let's first consider a different approach and then
we'll return to this question.
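The back-up-and-retry scheme just described can be sketched as follows; the function names, the test function, and the halving factor are assumptions for illustration, not anything prescribed by the text:

```python
def backtracking_descent(f, grad, a, c=1.0, shrink=0.5, steps=30):
    """Crude steepest descent: step by the constant c opposite the
    gradient; if f fails to decrease, back up and halve c."""
    for _ in range(steps):
        g = grad(a)
        trial = [ai - c * gi for ai, gi in zip(a, g)]
        if f(trial) < f(a):
            a = trial       # step accepted
        else:
            c *= shrink     # back up and try a smaller constant
    return a

# Illustrative function (assumed): an isotropic bowl with minimum at the origin.
f = lambda a: a[0]**2 + a[1]**2
grad = lambda a: [2*a[0], 2*a[1]]
a_min = backtracking_descent(f, grad, [3.0, -2.0])
```

In this run the first trial step with $c = 1$ overshoots and is rejected, after which the halved constant succeeds, illustrating both the wasted work and the ad hoc character of this rule.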