
We can think of π as a vector (with the probability for each possible value x in the element indexed by x, π(x)) and T as a corresponding stochastic matrix (with row index x′ and column index x), i.e., with non-negative entries that sum to 1 over the elements of each column. Then, the above equation becomes

Tπ = π

an eigenvector equation that says that π is an eigenvector of T with eigenvalue 1. It can be shown (Perron-Frobenius theorem) that this is the largest possible eigenvalue, and the only one with value 1 under mild conditions (for example, T(x′ | x) > 0). We can also see this equation as a fixed point equation for the update of the distribution associated with each step of the Markov chain. If we start a chain by picking x_0 ∼ p_0, then we get a distribution p_1 = T p_0 after one step, and p_t = T p_{t−1} = T^t p_0 after t steps. If this recursion converges (the chain has a so-called stationary distribution), then it converges to the fixed point p_t = π for t → ∞, and the dynamical systems view meets and agrees with the eigenvector view.
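To make the two views concrete, here is a minimal NumPy sketch; the 3-state transition matrix below is a made-up example, not one from the text.

import numpy as np

# T[x_prime, x] = T(x' | x): each COLUMN sums to 1, as described above.
T = np.array([[0.90, 0.10, 0.30],
              [0.05, 0.80, 0.30],
              [0.05, 0.10, 0.40]])
assert np.allclose(T.sum(axis=0), 1.0)

# Eigenvector view: take the eigenvector with the leading eigenvalue
# (which is 1) and normalize it to sum to 1 to obtain pi.
eigvals, eigvecs = np.linalg.eig(T)
pi = eigvecs[:, np.argmax(eigvals.real)].real
pi /= pi.sum()

# Dynamical systems view: p_t = T^t p_0 converges to the same pi.
p = np.array([1.0, 0.0, 0.0])   # an arbitrary initial distribution p_0
for _ in range(1000):
    p = T @ p
print(np.allclose(p, pi))       # True: the two views agree

Because every entry of this particular T is positive, the mild condition above holds, so eigenvalue 1 is unique and the iteration converges to the same π from any p_0.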
This condition guarantees that repeated applications of the transition sampling procedure don't change the distribution over the state of the Markov chain. Running the Markov chain until it reaches its equilibrium distribution is called "burning in" the Markov chain.
Unfortunately, there is no theory to predict how many steps the Markov chain must run before reaching its equilibrium distribution¹, nor any way to tell for sure that this event has happened. Also, even though successive samples come from the same distribution, they are highly correlated with each other, so to obtain multiple samples one should run the Markov chain for many steps between collecting each sample. Markov chains tend to get stuck in a single mode of π(x) for several steps. The speed with which a Markov chain moves from mode to mode is called its mixing rate. Since burning in a Markov chain and getting it to mix well may take several sampling steps, sampling correctly from an EBM is still a somewhat costly procedure.
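As a sketch of how burn-in and mixing translate into a sampling loop, the following hypothetical helper (sample_transition is a placeholder for one draw from T(x′ | x)) discards an initial burn-in period and then leaves many steps between collected samples:

def run_chain(x0, sample_transition, n_samples, burn_in=1000, spacing=100):
    x = x0
    for _ in range(burn_in):          # "burn in": run until near equilibrium
        x = sample_transition(x)
    samples = []
    for _ in range(n_samples):
        for _ in range(spacing):      # many steps between collected samples,
            x = sample_transition(x)  # to reduce their correlation
        samples.append(x)
    return samples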
A more general recipe for building a transition operator with a desired stationary distribution is the Metropolis-Hastings algorithm, which draws a candidate state from an essentially arbitrary proposal distribution and then accepts or rejects it with a probability chosen so that π is left invariant.
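As an illustration (not part of the original text), here is a minimal sketch of one Metropolis-Hastings step for an EBM, assuming a user-supplied energy function with p(x) ∝ exp(−E(x)) and a symmetric Gaussian proposal; with a symmetric proposal the acceptance probability reduces to min(1, p(x′)/p(x)):

import numpy as np

def metropolis_step(x, energy, rng, step_size=0.5):
    # Propose x' from a symmetric Gaussian centered at the current state.
    x_prop = x + rng.normal(scale=step_size, size=x.shape)
    # log p(x')/p(x) = E(x) - E(x'); accept with probability min(1, ratio).
    if np.log(rng.random()) < energy(x) - energy(x_prop):
        return x_prop
    return x

Repeatedly applying metropolis_step plays the role of sampling from T(x′ | x) above; rng would be, e.g., np.random.default_rng().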
Of course, all of this depends on ensuring π(x) = p(x). Fortunately, this is easy so long as p(x) is defined by an EBM. The simplest method is to use Gibbs sampling, in which sampling from T(x′ | x) is accomplished by selecting one variable x_i and sampling it from p conditioned on the remaining variables.
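For instance (a made-up model, not one from the text), for a binary EBM with energy E(x) = −xᵀWx − bᵀx, where W is symmetric with zero diagonal and each x_i ∈ {0, 1}, the conditional for one variable given the rest has a closed form, and one Gibbs step over a NumPy vector x can be sketched as:

import numpy as np

def gibbs_step(x, W, b, rng):
    i = rng.integers(len(x))           # select one variable to resample
    x_rest = x.copy()
    x_rest[i] = 0                      # keep only the other variables' terms
    # From E(x) = -x^T W x - b^T x (W symmetric, zero diagonal):
    # p(x_i = 1 | x_rest) = sigmoid(2 W[i] . x_rest + b_i).
    p_one = 1.0 / (1.0 + np.exp(-(2.0 * W[i] @ x_rest + b[i])))
    x[i] = 1 if rng.random() < p_one else 0
    return x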
¹ Although in principle the ratio of the two leading eigenvalues of the transition operator gives us some clue, and the largest eigenvalue is 1.