
When f''(x) > 0, this means that f'(x) increases as we move to the right, and f'(x) decreases as we move to the left. At a critical point, where f'(x) = 0, this means f'(x − ε) < 0 and f'(x + ε) > 0 for small enough ε. In other words, as we move right, the slope begins to point uphill to the right, and as we move left, the slope begins to point uphill to the left. Thus, when f'(x) = 0 and f''(x) > 0, we can conclude that x is a local minimum. Similarly, when f'(x) = 0 and f''(x) < 0, we can conclude that x is a local maximum. This is known as the second
derivative test. Unfortunately, when f''(x) = 0, the test is inconclusive. In this case x
may be a saddle point, or a part of a flat region.
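To make the univariate test concrete, here is a minimal numerical sketch (not from the text): the helper name classify_critical_point, the step size h, and the tolerance tol are illustrative choices, and the derivatives are estimated with central finite differences rather than computed exactly.

```python
def classify_critical_point(f, x, h=1e-4, tol=1e-6):
    """Second derivative test at a suspected critical point x of a scalar function f."""
    d1 = (f(x + h) - f(x - h)) / (2 * h)           # finite-difference estimate of f'(x)
    d2 = (f(x + h) - 2 * f(x) + f(x - h)) / h**2   # finite-difference estimate of f''(x)
    if abs(d1) > tol:
        return "not a critical point"   # f'(x) != 0
    if d2 > tol:
        return "local minimum"          # f'(x) = 0 and f''(x) > 0
    if d2 < -tol:
        return "local maximum"          # f'(x) = 0 and f''(x) < 0
    return "inconclusive"               # f''(x) = 0: saddle point or flat region

print(classify_critical_point(lambda x: x**2, 0.0))   # local minimum
print(classify_critical_point(lambda x: -x**2, 0.0))  # local maximum
print(classify_critical_point(lambda x: x**3, 0.0))   # inconclusive (x = 0 is a saddle point)
```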
In multiple dimensions, we need to examine all of the second derivatives of the
function. These derivatives can be collected together into a matrix called the Hessian
matrix. The Hessian matrix H(f)(x) is defined such that
H(f)(x)_{i,j} = ∂²/(∂x_i ∂x_j) f(x).
Equivalently, the Hessian is the Jacobian of the gradient.
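As a sketch of this relationship (the function name hessian and its grad_f argument are illustrative assumptions, not from the text), the Hessian can be estimated by finite-differencing the gradient, i.e. by forming the Jacobian of the gradient numerically:

```python
import numpy as np

def hessian(grad_f, x, h=1e-5):
    """Estimate H(f)(x) as the Jacobian of the gradient, via central differences.

    grad_f: function returning the gradient of f at a point as a 1-D array.
    Column j holds the change of the gradient per unit change in x_j.
    """
    n = x.size
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        H[:, j] = (grad_f(x + e) - grad_f(x - e)) / (2 * h)
    return H

# Example: f(x) = x_0**2 + 3*x_0*x_1 has gradient (2*x_0 + 3*x_1, 3*x_0).
grad_f = lambda x: np.array([2 * x[0] + 3 * x[1], 3 * x[0]])
print(hessian(grad_f, np.array([1.0, 2.0])))  # approximately [[2, 3], [3, 0]]
```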
Anywhere that the second partial derivatives are continuous, the differential opera-
tors are commutative, i.e. their order can be swapped:
∂²/(∂x_i ∂x_j) f(x) = ∂²/(∂x_j ∂x_i) f(x).
This implies that H_{i,j} = H_{j,i}, so the Hessian matrix is symmetric at such points. Most of
the functions we encounter in the context of deep learning have a symmetric Hessian al-
most everywhere. Because the Hessian matrix is real and symmetric, we can decompose
it into a set of real eigenvalues and an orthogonal basis of eigenvectors.
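A small sketch of this decomposition, using NumPy's symmetric eigensolver on the (real, symmetric) example Hessian from the sketch above:

```python
import numpy as np

# Hessian of f(x) = x_0**2 + 3*x_0*x_1 from the sketch above; real and symmetric.
H = np.array([[2.0, 3.0],
              [3.0, 0.0]])

# eigh is specialized to symmetric matrices: it returns real eigenvalues (ascending)
# and an orthonormal matrix Q of eigenvectors, so that H = Q diag(lambda) Q^T.
eigenvalues, Q = np.linalg.eigh(H)
print(eigenvalues)                                     # one negative, one positive eigenvalue
print(np.allclose(Q @ np.diag(eigenvalues) @ Q.T, H))  # True: the decomposition reproduces H
```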
Using the eigendecomposition of the Hessian matrix, we can generalize the second
derivative test to multiple dimensions. At a critical point, where ∇_x f(x) = 0, we can
examine the eigenvalues of the Hessian to determine whether the critical point is a local
maximum, local minimum, or saddle point. When the Hessian is positive definite (all its eigenvalues are positive), the
point is a local minimum. This can be seen by observing that the directional second
derivative in any direction must be positive, and making reference to the univariate
second derivative test. Likewise, when the Hessian is negative definite (all its eigenvalues are negative), the point is a
local maximum. In multiple dimensions, it is actually possible to find positive evidence
of saddle points in some cases. When at least one eigenvalue is positive and at least
one eigenvalue is negative, we know that x is a local maximum on one cross section of
f but a local minimum on another cross section. See Fig. 4.4 for an example. Finally,
the multidimensional second derivative test can be inconclusive, just like the univariate
version. The test is inconclusive whenever all of the non-zero eigenvalues have the same
sign, but at least one eigenvalue is zero. This is because the univariate second derivative
test is inconclusive in the cross section corresponding to the zero eigenvalue.
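The following sketch carries out this multidimensional test at a critical point; the function name second_derivative_test and the tolerance tol are illustrative assumptions, and f(x) = x_0**2 − x_1**2 is used as a simple saddle-point example in the spirit of Fig. 4.4.

```python
import numpy as np

def second_derivative_test(H, tol=1e-8):
    """Classify a critical point (where the gradient is zero) from its Hessian H."""
    eigenvalues = np.linalg.eigvalsh(H)   # real eigenvalues of the symmetric Hessian
    if np.all(eigenvalues > tol):
        return "local minimum"            # Hessian positive definite
    if np.all(eigenvalues < -tol):
        return "local maximum"            # Hessian negative definite
    if np.any(eigenvalues > tol) and np.any(eigenvalues < -tol):
        return "saddle point"             # eigenvalues of mixed sign
    return "inconclusive"                 # a zero eigenvalue, remaining signs agree

# f(x) = x_0**2 - x_1**2 has a critical point at the origin with Hessian diag(2, -2):
print(second_derivative_test(np.diag([2.0, -2.0])))  # saddle point
print(second_derivative_test(np.diag([2.0, 2.0])))   # local minimum
print(second_derivative_test(np.diag([2.0, 0.0])))   # inconclusive
```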
The Hessian can also be useful for understanding the performance of gradient descent.
When the Hessian has a poor condition number, gradient descent performs poorly. This