
When f''(x) > 0, this means that f'(x) increases as we move to the right, and f'(x)
decreases as we move to the left. This means f'(x − ε) < 0 and f'(x + ε) > 0 for small
enough ε. In other words, as we move right, the slope begins to point uphill to the right,
and as we move left, the slope begins to point uphill to the left. Thus, when f'(x) = 0
and f''(x) > 0, we can conclude that x is a local minimum. Similarly, when f'(x) = 0
and f''(x) < 0, we can conclude that x is a local maximum. This is known as the second
derivative test. Unfortunately, when f''(x) = 0, the test is inconclusive. In this case x
may be a saddle point, or a part of a flat region.
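As a quick Python sketch of this test (not part of the original text; the function
f(x) = x**2, the step size h, and the candidate point x0 are arbitrary illustrative
choices, and the derivatives are approximated by finite differences):

    # Numerical illustration of the univariate second derivative test.
    def f(x):
        return x**2                                 # has a local minimum at x = 0

    def first_derivative(f, x, h=1e-5):
        # central-difference approximation of f'(x)
        return (f(x + h) - f(x - h)) / (2 * h)

    def second_derivative(f, x, h=1e-5):
        # central-difference approximation of f''(x)
        return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

    x0 = 0.0
    if abs(first_derivative(f, x0)) < 1e-8:         # critical point: f'(x0) = 0
        fpp = second_derivative(f, x0)
        if fpp > 0:
            print("local minimum")                  # this branch fires for f(x) = x**2
        elif fpp < 0:
            print("local maximum")
        else:
            print("inconclusive")                   # saddle point or flat region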
In multiple dimensions, we need to examine all of the second derivatives of the
function. These derivatives can be collected together into a matrix called the Hessian
matrix. The Hessian matrix H(f)(x) is defined such that

    H(f)(x)_{i,j} = ∂² f(x) / (∂x_i ∂x_j).

Equivalently, the Hessian is the Jacobian of the gradient.
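To make this concrete, here is a brief Python sketch (not from the original text; the
quadratic test function and the evaluation point are arbitrary choices) that builds the
Hessian as the Jacobian of a finite-difference gradient:

    import numpy as np

    def f(x):
        # arbitrary smooth test function of two variables
        return x[0]**2 + 3.0 * x[0] * x[1] + 2.0 * x[1]**2

    def gradient(f, x, h=1e-5):
        # central-difference approximation of the gradient of f at x
        g = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e[i] = h
            g[i] = (f(x + e) - f(x - e)) / (2 * h)
        return g

    def hessian(f, x, h=1e-4):
        # the Hessian is the Jacobian of the gradient: differentiate each
        # component of the gradient with respect to each x_j
        n = x.size
        H = np.zeros((n, n))
        for j in range(n):
            e = np.zeros_like(x)
            e[j] = h
            H[:, j] = (gradient(f, x + e) - gradient(f, x - e)) / (2 * h)
        return H

    x = np.array([1.0, -2.0])
    print(hessian(f, x))   # approximately [[2., 3.], [3., 4.]] for the f above

For this quadratic f, the exact Hessian is constant, so the finite-difference estimate
should match [[2, 3], [3, 4]] closely at any evaluation point.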
Anywhere that the second partial derivatives are continuous, the differential operators
are commutative:

    ∂² f(x) / (∂x_i ∂x_j) = ∂² f(x) / (∂x_j ∂x_i).

This implies that H_{i,j} = H_{j,i}, so the Hessian matrix is symmetric at such points
(which includes nearly all inputs to nearly all functions we encounter in deep learning).
Because the Hessian matrix is real and symmetric, we can decompose it into a set of real
eigenvalues and an orthogonal basis of eigenvectors.
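For example (an illustrative Python sketch; the matrix below simply stands in for a
Hessian evaluated at some point and is not taken from the text), NumPy's eigh routine
is designed for exactly this real, symmetric case:

    import numpy as np

    # An arbitrary symmetric matrix standing in for a Hessian at some point.
    H = np.array([[2.0, 3.0],
                  [3.0, 4.0]])

    assert np.allclose(H, H.T)                      # symmetric, as argued above

    eigenvalues, eigenvectors = np.linalg.eigh(H)   # eigh exploits the symmetric structure
    print(eigenvalues)                              # real eigenvalues
    print(eigenvectors @ eigenvectors.T)            # orthogonal eigenvectors: close to the identity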
Using the eigendecomposition of the Hessian matrix, we can generalize the second
derivative test to multiple dimensions. At a critical point, where ∇_x f(x) = 0, we can
examine the eigenvalues of the Hessian to determine whether the critical point is a local
maximum, local minimum, or saddle point. When the Hessian is positive definite (all its
eigenvalues are positive), the point is a local minimum. This can be seen by observing
that the directional second derivative in any direction must be positive, and making
reference to the univariate second derivative test. Likewise, when the Hessian is negative
definite (all its eigenvalues are negative), the point is a local maximum. In multiple
dimensions, it is actually possible to find positive evidence of saddle points in some
cases. When at least one eigenvalue is positive and at least one eigenvalue is negative,
we know that x is a local maximum on one cross section of f but a local minimum on
another cross section. See Fig. 4.4 for an example. Finally, the multidimensional second
derivative test can be inconclusive, just like the univariate version. The test is
inconclusive whenever all of the non-zero eigenvalues have the same sign, but at least
one eigenvalue is zero. This is because the univariate second derivative test is
inconclusive in the cross section corresponding to the zero eigenvalue.
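The following Python sketch applies this multidimensional test (not from the original
text; the helper name, the tolerance, and the example Hessian of f(x) = x1² − x2² at
the origin are illustrative choices):

    import numpy as np

    def classify_critical_point(H, tol=1e-8):
        # Classify a critical point (where the gradient is zero) from the
        # eigenvalues of the Hessian H evaluated at that point.
        eigenvalues = np.linalg.eigvalsh(H)          # H is assumed symmetric
        if np.all(eigenvalues > tol):
            return "local minimum"                   # positive definite
        if np.all(eigenvalues < -tol):
            return "local maximum"                   # negative definite
        if np.any(eigenvalues > tol) and np.any(eigenvalues < -tol):
            return "saddle point"                    # mixed signs
        return "inconclusive"                        # some eigenvalue is (near) zero

    # f(x) = x1**2 - x2**2 has a critical point at the origin; its Hessian there is:
    H_saddle = np.array([[2.0,  0.0],
                         [0.0, -2.0]])
    print(classify_critical_point(H_saddle))         # prints "saddle point"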
The Hessian can also be useful for understanding the performance of gradient descent.
When the Hessian has a poor condition number, gradient descent performs poorly. This
is because in one direction, the derivative increases rapidly, while in another direction, it