VII-VIII This book is a monograph [a scholarly piece of
writing of essay or book length on a specific, often limited subject -JLJ] on practical aspects of probabilistic networks
(a.k.a. probabilistic graphical models) and is intended to provide a comprehensive guide for practitioners that wish to understand,
construct, and analyze decision support systems based on probabilistic networks, including a number of different
variants of Bayesian networks and influence diagrams... inference in probabilistic networks is based on a well-established
theoretical foundation of probability calculus and decision theory, and hence provides mathematically coherent methods for
deriving conclusions under uncertainty, where multiple sources of information and complex interaction patterns are involved
p.3 Solving an intellectually challenging task can be characterized
as a process of deriving conclusions (new pieces of knowledge) by manipulating a (large) body of knowledge,
typically including definitions of entities (objects, concepts, events, phenomena, etc.), relations among them, and observations
of states (values) of some of the entities.
p.3 By formulating the physician's knowledge in an appropriate formal (computer) language for which
there exist methods for making inferences to manipulate pieces of knowledge formulated in this language, the reasoning conducted
by the physician can be automated and carried out by a computer. [JLJ - perhaps this concept can be used to "teach"
a machine how to "play" a game]
p.4 Randomness and uncertain judgment [are] inherent in most real-world decision problems.
We therefore need a method (paradigm) that supports representation of quantitative measures of uncertain statements
and a method for combining the measures such that reasoning and decision making under uncertainty can be automated.
p.6 A rule-based system, like any other knowledge representation scheme, represents a certain part
of the world (the problem domain) only up to some precision. This implies that certain (causal) mechanisms
might be ignored as being unimportant for the precision (or level of detail) at which conclusions need to be drawn...
Violating the "causal direction" in formulating rules is, however, not advisable.
p.10 The construction of a Bayesian network thus runs in two phases. First,
given the problem at hand, one identifies the relevant variables and the (causal) relations among them. The
resulting DAG [acyclic directed graph] specifies a set of dependence and independence assumptions that will be enforced on
the joint probability distributions... one for each '"family"... of the DAG.
A Bayesian network can be constructed manually, (semi-) automatically from data, or through a combination
of a manual and a data driven process, where partial knowledge about a structure as well as parameters (i.e., conditional
probabilities) blend with statistical information extracted from databases of cases (i.e., previous joint observations of
values of the variables)... Extensive guidance on how to manually construct a probabilistic network is the core of
this book.
p.17 Probabilistic networks are graphical models of (causal) interactions among a set of variables,
where the variables are represented as vertices (nodes) of a graph and the interactions (direct dependencies) as directed
edges (links or arcs) between the vertices. Any pair of unconnected vertices of such a graph indicates (conditional)
independence between the variables represented by these vertices under particular circumstances that can easily be read from
the graph. Hence, probabilistic networks capture a set of (conditional) dependencies and independence properties associated
with the variables represented in the network.
Graphs have proven themselves an intuitive language for representing such dependence
and independence statements, and thus provide excellent language for communicating and discussing dependence and independence
relations among problem-domain variables.
p.24, 25 Causality plays an important role in the process of constructing probabilistic network
models. There are a number of reasons why proper modeling of causal relations is important or helpful... one does
not have to construct a model where the links can be interpreted as causal relations, it just makes the model much more intuitive,
eases the process of getting the dependence and independence relations right, and significantly eases the process of eliciting
the conditional probabilities of the model.
p.49 The single most important key to efficient inference in probabilistic networks is the ability to take
advantage of the distributive law (i.e., to find optimal (or near optimal) sequences in which the variables are marginalized
out)... Variables of a probabilistic network that have no descendants and are never observed are called barren variables,
as they provide no information relevant for the inference process... and may hence be removed from the network.
p.63 Many real-life situations can be modeled as a domain of entities represented as random variables
in a probabilistic network. A probabilistic network is a clever graphical representation of dependence and independence
relations between random variables... A probabilistic network represents and processes probabilistic knowledge...The
graphical representation of a probabilistic network describes knowledge of a problem domain in a precise manner. The graphical
representation is intuitive and easy to comprehend, making it an ideal tool for communication of domain knowledge between
experts, users, and systems. For these reasons, the formalism of probabilistic networks is becoming an increasingly
popular knowledge representation for reasoning and decision making under uncertainty.
p.74 Decision Making Under Uncertainty
The framework of influence diagrams (Howard & Matheson
1981) is an effective modeling framework for representation and analysis of (Bayesian) decision making under uncertainty.
Influence diagrams provide a natural representation for capturing the semantics of decision making with a minimum
of clutter and confusion for the decision maker (Shachter & Peot 1992). Solving a decision problem amounts to
(i) determining an optimal strategy that maximizes the expected utility for the decision maker and (ii) computing the maximal
expected utility of adhering to this strategy.
p.107 We build knowledge bases in order to formulate our knowledge about a certain problem domain in a structured
way. The purpose of the knowledge base is to support our reasoning about events and decisions in a domain with inherent
uncertainty... An expert system consists of a knowledge base and an inference engine... The knowledge
base is the Bayesian network or influence diagram, whereas the inference engine consists of a set of generic methods that
applies the knowledge formulated in the knowledge base on task-specific data sets, known as evidence, to compute solutions
to queries against the knowledge base. The knowledge base alone is of limited use if it cannot be applied to update
our belief about the state of the world or to identify (optimal) decisions in the light of new knowledge... the knowledge
bases we consider are probabilistic networks.
p.111 Given a query and a set of evidence variables, the contribution from a nuisance variable does
not depend on the observed values of the evidence variables. Hence, if a query is to be solved with respect to multiple
instantiations over the evidence variables, then the nuisance variables (and barren variables) may be eliminated in
a preprocessing step to obtain the relevant network (Lin and Druzdzel 1997). The relevant network consists
of target variables, evidence variables, and variables on paths between target and evidence variables only.
p.122 Probabilistic inference is the task of updating our belief about the state of the world in light of
evidence. Evidence on discrete variables, be it hard or soft evidence, is treated as in the case of discrete Bayesian networks.
p.124 We build decision models in order to support efficient reasoning and decision making under
uncertainty in a given problem domain. Reasoning under uncertainty is the task of computing our updated beliefs in
(unobserved) events given observations on other events whereas decision making under uncertainty is the task of identifying
the (optimal) decision strategy for the decision maker given observations.
p.137 We build decision models in order to support efficient reasoning and decision making under
uncertainty in a given problem domain. Reasoning under uncertainty is the task of computing our updated beliefs in
(observed) events given observations on other events [i.e., evidence] whereas decision making under uncertainty is the task
of identifying the (optimal) decision strategy for the decision maker given observations.
p.144 There are many good reasons to choose probabilistic networks as the modeling framework,
including the coherent and mathematically sound handling of uncertainty and normative decision making, the automated construction
and adaptation of models based on data, the intuitive and compact representation of cause-effect relations and (conditional)
dependence and independence relations, the efficient solution of queries given evidence, and the ability to support a whole
range of analyses of the results produced, including conflict analysis, sensitivity analysis (with respect to both parameters
and evidence), and value-of-information analysis.
p.145 There are four ground characteristics that constitute the foundation of (normative) probabilistic
models:
Graphical representation of causal relations among domain entities (variables).
The notion of causality is central in probabilistic networks, meaning that a directed link from one variable
to another (usually) signifies a causal relation among the two...
Strengths of probabilistic relations are represented by (conditional) probabilities.
Causal relations among variables are seldom deterministic in the sense that if the cause is present, then the effect can be
concluded by certainty...
Preferences are represented as utilities on a numerical scale. All sorts of preferences
that are relevant in a decision scenario must be expressed on a numerical scale...
Recommendations are based on the principle of maximal expected utility. As the
reasoning performed by a probabilistic network is normative, the outcome (e.g., most likely diagnosis or suggested decision)
is guaranteed to provide a recommended course of action that maximizes the expected utility to the extent that the model is
a 'true' representation of problem domain.
p.146-147 we might set up the following criteria to be met for probabilistic networks to potentially be
a good candidate technology for solving the problem at hand:
Well defined variables. The variables and events (i.e., possible values of the
variables) of the problem domain need to be well-defined...
Highly structured problem domain with identifiable cause-effect relations. Well-established
and detailed knowledge should be available concerning structure (variables and (causal) links), conditional probabilities,
and utilities (preferences). In general, the structure needs to be static (i.e., not changing over time), although re-estimation
of structure (often the usage of learning tools; see chapter 8) can be performed...
Uncertainty associated with the cause-effect relations. If all cause-effect relations
are deterministic (i.e., all conditional probabilities either take the value 0 or the value 1), more efficient technologies
probably exist...
Repetitive problem solving. Often, for the (sometimes large) effort invested
in constructing a probabilistic network to pay off, the problem solved should be of a repetitive nature. A physician
diagnosing respiratory diseases, an Internet company profiling their customers, and a bank deciding to grant loans to its
customers are all examples of problems that need to be solved over and over again, where
the involved variables and causal mechanisms are invariant over time, and only the values observed for (some of) the variables
differ...
Maximization of expected utility. For the probabilistic network framework to be
a natural choice, the problem at hand should most probably contain an element of decision making involving a desire to maximize
the expected utility of a decision.
p.148 Constraint variables (see chapter 7) also depend deterministically on its parent variables. Such "artificial"
variables can be handy in many modeling situations, for example, reducing the number of conditional probabilities needed to
be specified or enforcing constraints on the combinations of states among a subset of the variables.
p.149-150 Identifying the variables of a problem domain is not always an easy task, and requires some practicing...
one needs to focus on the problem (possible diagnoses, classifications, predictions, decisions, etc. to be
made) and the relevant pieces of information for solving the problem... In the process of identifying the
variables it can be useful to distinguish between different types of variables:
Problem variables: These are the variables of interest; i.e., those for which we want to
compute their posterior probability given observations of values for information variables (see next item). Usually,
the values of problem variables cannot be observed; otherwise, there would not be any point in constructing a probabilistic
network in the first place...
Information variables: These are the variables for which observations may be available,
and which can provide information relevant for solving the problem. Two sub-categories of information variables can
be distinguished: Background information... Symptom information...
Mediating variables: These are unobservable variables for which posterior probabilities
are not of immediate interest, but which play important roles for achieving correct conditional independence and dependence
properties and/or efficient inference.
p.152-153 Given an initial set of variables identified for a given problem domain, the
next step in the model construction process concerns the identification and verification of (causal) links of the model.
p.170 When constructing a model (probabilistic or not) it is crucial to realize that real-world
problem domains are usually embedded in a complex reality involving interaction with numerous different aspects of the real
world in a way that can never be fully captured in a model. Also, the internal causal mechanisms of a problem domain
can almost always only be approximately described in a model. Thus it is important to bear in mind that all models
are wrong, but that some might be useful.
p.171"In his writings, William of Occam (or Ockham) (1284-1347) stressed the Aristotelian
principle that entities must not be multiplied beyond what is necessary. This principle became known as Occam's
Razor or the law of parsimony; a problem should be stated in its basic and simplest terms. In science, the simplest
theory that fits the facts of the problem is the one that should be selected. This rule is interpreted to mean that the simplest
of two or more competing theories is preferable and that an explanation for unknown phenomena should first be attempted in
terms of what is already known.
p.174 we pointed to the fact that the best models are usually constructed through deliberate
use of the law of parsimony (or Occam's razor).
p.220 An influence diagram is useful for solving problems of decision making under uncertainty.
The variables of an influence diagram consist of a mixture of random variables and decision variables. The
random variables are used for representing uncertainty while the decision variables represent entities under the full control
of the decision maker. The state of a random variable may be observable or hidden while the state of a decision variable is
under the full control of the decision maker.
p.261 It is difficult or even impossible to construct models covering all aspects of (complex) problem
domains of interest. A model is therefore most often an approximation of a problem domain that is designed to be
applied according to the assumptions as determined by the background condition or context of the model. If a model
is used under circumstances not consistent with the background condition, the results will in general be unreliable.