
data.
In Sec. 18.2 we saw that one possible explanation for the role of dream sleep in
human beings and animals is that dreams could provide the negative phase samples
that Monte Carlo training algorithms use to approximate the negative gradient of
the log partition function of undirected models. Another possible explanation for
biological dreaming is that it is providing samples from p(h, v) which can be used
to train an inference network to predict h given v. In some senses, this explanation
is more satisfying than the partition function explanation. Monte Carlo algorithms
generally do not perform well if they are run using only the positive phase of the
gradient for several steps and then with only the negative phase of the gradient for
several steps. Human beings and animals are usually awake for several consecutive
hours and then asleep for several consecutive hours. It is not readily apparent how this
schedule could support Monte Carlo training of an undirected model. Learning
algorithms based on maximizing L can be run with prolonged periods of improving q
and prolonged periods of improving θ, however. If the role of biological dreaming
is to train networks for predicting q, then this explains how animals are able to
remain awake for several hours (the longer they are awake, the greater the gap
between L and log p(v), but L will remain a lower bound) and to remain asleep
for several hours (the generative model itself is not modified during sleep) without
damaging their internal models. Of course, these ideas are purely speculative, and
there is no hard evidence to suggest that dreaming accomplishes either of these
goals. Dreaming may also serve reinforcement learning rather than probabilistic
modeling, by sampling synthetic experiences from the animal’s transition model,
on which to train the animal’s policy. Or sleep may serve some other purpose not
yet anticipated by the machine learning community.
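As a rough illustration of this alternating schedule, the toy NumPy sketch below runs prolonged "wake" periods, which improve the generative parameters θ by ascending the bound L with the recognition model held fixed, and prolonged "sleep" periods, which fit the recognition model to dreamed (h, v) samples without touching θ. The linear Gaussian model and all of the names here (theta, phi, wake_step, sleep_step) are illustrative assumptions of this sketch, not anything specified in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
h_dim, v_dim = 4, 8

# Illustrative linear Gaussian generative model v = h W + noise (parameters
# theta) and a linear recognition model h ~ v R (parameters phi).
theta = {"W": rng.standard_normal((h_dim, v_dim)) * 0.1}
phi = {"R": rng.standard_normal((v_dim, h_dim)) * 0.1}

def dream(theta, n):
    """Ancestral sampling of (h, v) pairs from the current generative model."""
    h = rng.standard_normal((n, h_dim))
    v = h @ theta["W"] + 0.1 * rng.standard_normal((n, v_dim))
    return h, v

def wake_step(v_data, lr=1e-2):
    """Awake: improve theta by ascending the reconstruction term of L, using
    the (possibly stale) recognition model to infer h; L remains a lower
    bound on log p(v) even when the recognition model lags behind."""
    h_hat = v_data @ phi["R"]
    theta["W"] += lr * h_hat.T @ (v_data - h_hat @ theta["W"]) / len(v_data)

def sleep_step(n=256, lr=1e-2):
    """Asleep: regress h on v over dreamed samples from p(h, v); the
    generative model itself is not modified during sleep."""
    h, v = dream(theta, n)
    phi["R"] += lr * v.T @ (h - v @ phi["R"]) / n

# Prolonged wake and sleep periods, mirroring hours awake then hours asleep.
v_data = dream(theta, 512)[1]   # stand-in for real observed data
for _ in range(10):
    for _ in range(50):
        wake_step(v_data)
    for _ in range(50):
        sleep_step()
```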
19.5.2 Other Forms of Learned Inference
This strategy of learned approximate inference has also been applied to other
models. Salakhutdinov and Larochelle (2010) showed that a single pass through a
learned inference network could yield faster inference than iterating the mean field
fixed-point equations to convergence in a DBM. The training procedure is based on running the
inference network, then applying one step of mean field to improve its estimates,
and training the inference network to output this refined estimate instead of its
original estimate.
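A minimal NumPy sketch of this training scheme follows, assuming a DBM with two hidden layers (weights W1, W2, no biases) and a bottom-up recognition network (weights R1, R2); the squared-error target and the layerwise gradient are simplifying choices made here for brevity, not the exact procedure of Salakhutdinov and Larochelle (2010).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recognition_pass(v, R1, R2):
    """Single bottom-up pass of the learned inference network."""
    mu1 = sigmoid(v @ R1)
    mu2 = sigmoid(mu1 @ R2)
    return mu1, mu2

def mean_field_step(v, mu1, mu2, W1, W2):
    """One mean field fixed-point update of the DBM posterior, initialized
    at the recognition network's estimates."""
    mu1 = sigmoid(v @ W1 + mu2 @ W2.T)   # h1 receives input from v and h2
    mu2 = sigmoid(mu1 @ W2)              # h2 receives input from h1
    return mu1, mu2

def train_inference_net(v, W1, W2, R1, R2, lr=0.1):
    """Run the inference network, refine its output with one mean field
    step, and nudge the network toward the refined estimate (squared-error
    loss; each layer is trained toward its own target with its input held
    fixed, a simplification made for this sketch)."""
    mu1, mu2 = recognition_pass(v, R1, R2)
    t1, t2 = mean_field_step(v, mu1, mu2, W1, W2)   # refined targets
    d1 = (mu1 - t1) * mu1 * (1.0 - mu1)   # dloss/d(pre-activation), layer 1
    d2 = (mu2 - t2) * mu2 * (1.0 - mu2)   # dloss/d(pre-activation), layer 2
    R1 -= lr * v.T @ d1 / len(v)
    R2 -= lr * mu1.T @ d2 / len(v)
    return R1, R2
```

At test time only recognition_pass is run, which is what makes inference a single cheap feedforward pass rather than an iterative fixed-point computation.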
We have already seen in Sec. 14.8 that the predictive sparse decomposition
model trains a shallow encoder network to predict a sparse code for the input.
This can be seen as a hybrid between an autoencoder and sparse coding. It is
possible to devise probabilistic semantics for the model, under which the encoder