comes to a total of 19,899 values. This means that using the directed graphical model
reduced our number of parameters by a factor of more than 50!
In general, to model n discrete variables each having k values, the cost of the single
table approach scales like O(k
n
), as we’ve observed before. Now suppose we build a
directed graphical model over these variables. If m is the maximum number of variables
appearing in a single conditional probability distribution, then the cost of the tables
for the directed model scales like O(k
m
). As long as we can design a model such that
m << n, we get very dramatic savings.
In other words, so long as each variable has few parents in the graph, the distri-
bution can be represented with very few parameters. Some restrictions on the graph
structure (e.g. it is a tree) can also guarantee that operations like computing marginal
or conditional distributions over subsets of variables are efficient.
It’s important to realize what kinds of information can be encoded in the graph, and
what can’t be. The graph just encodes simplifying assumptions about which variables
are conditionally independent from each other. It’s also possible to make other kinds of
simplifying assumptions. For example, suppose we assume Bob always runs the same
regardless of how Alice performed. (In reality, he might get overconfident and lazy and
run slower if Alice was especially fast, or he might try especially hard to make up time
if Alice was slower than usual). Then the only effect Alice has on Bob’s finishing time
is we must add Alice’s finishing time to the total amount of time we think Bob needs
to run. This could let us make a model with O(k) efficiency instead of O(k
2
). However,
note that t
0
and t
1
are still directly dependent with this assumption. It is a useful
assumption, but not one that can be encoded in a graph. It’s just part of our definition
of the conditional probability distribution p(t
1
| t
0
).
9.2.2 Undirected models
Directed graphical models make a lot of intuitive sense in many situations. Usually
these are situations where we understand the causality, and the causality only flows in
one direction. (Such as in the relay race example: earlier runners cause effects on the
finishing times of later runners; later runners do not cause any effects on the finishing
times of earlier runners) Not all situations fit this model, so directed models are not
always the most natural formalism to use.
Suppose we want to model a distribution over three binary variables–whether or not
you are sick, whether or not your coworker is sick, and whether or not your roommate is
sick. As in the relay race example, we can still make simplifying assumptions about the
kinds of interactions that take place: assuming that your coworker and your roommate
do not know each other, it is very unlikely that one of them will give the other a disease
such as a cold directly. This event can be seen as so rare that it is acceptable not to
model it. However, it is reasonably likely that either of them could give you a cold, and
that you could pass it on to the other. We can model the indirect transmission of a cold
from your coworker to your roommate by modeling the transmission of the cold from
your coworker to you and the transmission of the cold from you to your roommate.
In this case, it’s just as easy for you to cause your roommate to get sick as it is for
106