x′ and column index x), i.e., with non-negative entries that sum to 1 over the elements of a
column. Then the above equation becomes
Tπ = π,
an eigenvector equation saying that π is an eigenvector of T with eigenvalue 1. It can
be shown (Perron-Frobenius theorem) that this is the largest possible eigenvalue, and
the only one with value 1 under mild conditions (for example, T(x′ | x) > 0). We can also
see this equation as a fixed point equation for the update of the distribution associated
with each step of the Markov chain. If we start a chain by picking x_0 ∼ p_0, then we
get a distribution p_1 = T p_0 after one step, and p_t = T p_{t−1} = T^t p_0 after t steps. If this
recursion converges (the chain has a so-called stationary distribution), then it converges
to a fixed point which is precisely p_t = π for t → ∞, and the dynamical systems view
meets and agrees with the eigenvector view.
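This convergence can be checked numerically. The following sketch (my illustration, using a hypothetical 3-state transition matrix, not one from the text) iterates p_t = T p_{t−1} and compares the result against the eigenvector of T with eigenvalue 1:

```python
# A minimal sketch: repeatedly applying a column-stochastic transition
# matrix T to an initial distribution p_0 converges to the stationary
# distribution pi, the eigenvector of T with eigenvalue 1.
import numpy as np

# Hypothetical 3-state transition matrix; each column is a distribution
# over next states, so the columns sum to 1 (and all entries are positive,
# satisfying the mild condition T(x' | x) > 0 mentioned above).
T = np.array([[0.5, 0.2, 0.1],
              [0.3, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

p = np.array([1.0, 0.0, 0.0])   # p_0: start deterministically in state 0
for _ in range(100):            # p_t = T p_{t-1}
    p = T @ p

# The eigenvector of T associated with the largest (real) eigenvalue, 1
vals, vecs = np.linalg.eig(T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()              # normalize to a probability distribution

print(np.allclose(p, pi))      # True: the fixed point is the eigenvector
```

Because T is column-stochastic, each application of T preserves the total probability mass, so every iterate p_t remains a valid distribution.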
This condition guarantees that repeated applications of the transition sampling pro-
cedure don’t change the distribution over the state of the Markov chain. Running the
Markov chain until it reaches its equilibrium distribution is called “burning in” the
Markov chain.
Unfortunately, there is no theory to predict how many steps the Markov chain must
run before reaching its equilibrium distribution, nor any way to tell for sure that this
event has happened. Also, even though successive samples come from the same distri-
bution, they are highly correlated with each other, so to obtain multiple samples one
should run the Markov chain for many steps between collecting each sample. Markov
chains tend to get stuck in a single mode of π(x) for several steps. The speed with which
a Markov chain moves from mode to mode is called its mixing rate. Since burning in
a Markov chain and getting it to mix well may take several sampling steps, sampling
correctly from an EBM is still a somewhat costly procedure.
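The correlation between successive samples can be seen in a toy chain. Below is a small illustration (the "sticky" 2-state chain and its parameters are my assumptions, not from the text): the chain stays in its current state with high probability, so consecutive samples are strongly correlated, while keeping only every 50th sample yields nearly independent draws.

```python
# Illustration of sample correlation and thinning in a slowly mixing
# 2-state Markov chain (hypothetical example, parameters assumed).
import numpy as np

rng = np.random.default_rng(0)
stay = 0.95                      # probability of remaining in the current state

def run_chain(n_steps, x0=0):
    xs = [x0]
    for _ in range(n_steps - 1):
        x = xs[-1]
        xs.append(x if rng.random() < stay else 1 - x)
    return np.array(xs)

xs = run_chain(20000)

# Lag-1 autocorrelation of successive samples is high (about 2*stay - 1 = 0.9)
rho1 = np.corrcoef(xs[:-1], xs[1:])[0, 1]

# Thinning: keep every 50th sample; these are nearly independent
thin = xs[::50]
rho_thin = np.corrcoef(thin[:-1], thin[1:])[0, 1]

print(rho1, rho_thin)            # high correlation vs. near-zero correlation
```

This is why one runs the chain for many steps between collected samples: the thinned sequence behaves much more like independent draws from the stationary distribution.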
Of course, all of this depends on ensuring π(x) = p(x). Fortunately, this is easy
so long as p(x) is defined by an EBM. The simplest method is to use Gibbs sampling,
in which sampling from T(x′ | x) is accomplished by selecting one variable x_i and
sampling it from p conditioned on its neighbors in G. It is also possible to sample
several variables at the same time so long as they are conditionally independent given
all of their neighbors.
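As a concrete sketch (my illustration, not the book's code), consider a hypothetical EBM over two binary variables that prefer to share the same state, p(x_1, x_2) ∝ exp(w · x_1 · x_2) with x_i ∈ {−1, +1} and an assumed coupling w. Gibbs sampling resamples each variable from its exact conditional given the other:

```python
# Gibbs sampling for a two-variable EBM p(x1, x2) ∝ exp(w * x1 * x2),
# x_i in {-1, +1}: the variables prefer to take the same state.
import numpy as np

rng = np.random.default_rng(0)
w = 1.0                                   # coupling strength (assumed)

def gibbs_step(x):
    # Resample each variable from p conditioned on its neighbor
    for i in range(2):
        other = x[1 - i]
        # p(x_i = +1 | other) = e^{w*other} / (e^{w*other} + e^{-w*other})
        p_plus = 1.0 / (1.0 + np.exp(-2.0 * w * other))
        x[i] = 1 if rng.random() < p_plus else -1
    return x

x = np.array([1, -1])
samples = []
for t in range(20000):
    x = gibbs_step(x)
    if t >= 1000:                         # discard burn-in steps
        samples.append(x.copy())

samples = np.array(samples)
agree = np.mean(samples[:, 0] == samples[:, 1])
# Exact p(x1 == x2) = e^w / (e^w + e^{-w}) ≈ 0.88 for w = 1
print(agree)
```

The empirical agreement rate matches the exact probability under p, confirming that the Gibbs chain has p as its stationary distribution. For large w, the chain rarely flips out of the (+1, +1) or (−1, −1) configuration, which is exactly the slow mode-to-mode mixing described above.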
15.1.1 Markov Chain Theory