The arguments presented here are summarized from or inspired by Pearl (1988).
Nearly all activities require some ability to reason in the presence of uncertainty. In
fact, beyond mathematical statements that are true by definition, it is difficult to think
of any proposition that is absolutely true or any event that is absolutely guaranteed to
occur.
One source of uncertainty is incomplete observability. When we cannot observe
something, we are uncertain about its true nature. In machine learning, it is often the
case that we can observe a large amount of data, but there is not a data instance for
every situation we care about. We are also generally not able to observe directly what
process generates the data. Since we are uncertain about what process generates the
data, we are also uncertain about what happens in the situations for which we have not
observed data points. Lack of observability can also give rise to apparent stochasticity.
Deterministic systems can appear stochastic when we cannot observe all of the variables
that drive the behavior of the system. For example, consider a game of Russian roulette.
The outcome is deterministic if you know which chamber of the revolver is loaded. If
you do not know this important information, then it is a game of chance. In many
cases, we are able to observe some quantity, but our measurement is itself uncertain.
For example, laser range finders may have several centimeters of random error.
Uncertainty can also arise from the simplifications we make in order to model real-world
processes. For example, if we discretize space, then we immediately become
uncertain about the precise position of objects: each object could be anywhere within
the discrete cell that we know it occupies.
Conceivably, the universe itself could have stochastic dynamics, but we make no
claim on this subject.
In many cases, it is more practical to use a simple but uncertain rule rather than a
complex but certain one, even if our modeling system has the fidelity to accommodate
a complex rule. For example, the simple rule “Most birds fly” is cheap to develop and
is broadly useful, while a rule of the form, “Birds fly, except for very young birds that
have not yet learned to fly, sick or injured birds that have lost the ability to fly, flightless
species of birds including the cassowary, ostrich, and kiwi. . . ” is expensive to develop,
maintain, and communicate, and, after all of this effort, is still very brittle and prone to
failure.
Given that we need a means of representing and reasoning about uncertainty, it is
not immediately obvious that probability theory can provide all of the tools we want
for artificial intelligence applications. Probability theory was originally developed to
analyze the frequencies of events. It is easy to see how probability theory can be used
to study events like drawing a certain hand of cards in a game of poker. These kinds
of events are often repeatable, and when we say that an outcome has a probability p
of occurring, it means that if we repeated the experiment (e.g., draw a hand of cards)
infinitely many times, then proportion p of the repetitions would result in that outcome.
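The frequency interpretation can be made concrete with a small simulation. The following sketch (not from the text; the card-drawing setup and the choice of "the hand contains at least one ace" as the outcome are illustrative assumptions) repeats an experiment many times in Python and compares the observed proportion of an outcome with its exact probability:

    # A minimal sketch of the frequentist reading of probability p as a
    # long-run frequency: repeat an experiment many times and compare the
    # observed frequency of an outcome with its exact value.
    import random
    from math import comb

    deck = list(range(52))                   # cards 0..51; treat 0-3 as the four aces
    trials = 100_000
    hits = 0
    for _ in range(trials):
        hand = random.sample(deck, 5)        # draw a 5-card hand without replacement
        if any(card < 4 for card in hand):   # outcome: hand contains at least one ace
            hits += 1

    empirical = hits / trials
    exact = 1 - comb(48, 5) / comb(52, 5)    # exact probability, about 0.341
    print(f"empirical frequency: {empirical:.3f}   exact probability: {exact:.3f}")

As the number of trials grows, the observed proportion approaches the exact probability, which is precisely what the frequency interpretation asserts.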
This kind of reasoning does not seem immediately applicable to propositions that are
not repeatable. If a doctor analyzes a patient and says that the patient has a 40% chance
of having the flu, this means something very different: we cannot make infinitely many
replicas of the patient, nor is there any reason to believe that different replicas of the