xiii In the last decade, owing partly to advances in graphical models,
causality has undergone a major transformation: from a concept shrouded in mystery into a mathematical object with well-defined
semantics and well-founded logic... Put simply, causality has been mathematized.
This book provides a systematic account of this transformation, addressed
primarily to readers in the fields of statistics, artificial intelligence, philosophy, cognitive science, and the health and
social sciences.
xiii-xiv Ten years ago... I was working within the empiricist tradition. In this tradition, probabilistic
relationships constitute the foundations of human knowledge, whereas causality simply provides useful ways of abbreviating
and organizing intricate patterns of probabilistic relationships. Today, my view is quite different. I now take causal
relationships to be the fundamental building blocks both of physical reality and of human understanding of that reality, and
I regard probabilistic relationships as but the surface phenomena of the causal machinery that underlies and propels our understanding
of the world.
p.42-43 An autonomous intelligent system attempting to build a workable model of its environment
cannot rely exclusively on preprogrammed causal knowledge; rather, it must be able to translate direct observations to cause-and-effect
relationships. However, given that statistical analysis is driven by covariation, not causation, and assuming that
the bulk of human knowledge derives from uncontrolled observations, we must still identify the clues that prompt people
to perceive causal relationships in the data. We must also find a computational model that emulates this perception...
the barometer falls before it rains yet does not cause the rain. In fact, the statistical and philosophical
literature has adamantly warned analysts that, unless one knows in advance all causally relevant factors or unless
one can carefully manipulate some variables, no genuine causal inferences are possible.
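The barometer example is easy to reproduce numerically. Below is a minimal sketch (mine, not the book's): a hypothetical common cause, low atmospheric pressure, drives both a falling barometer and rain, so statistics alone report a strong association between two variables neither of which causes the other.

```python
# Sketch: covariation without causation. Low pressure (a common cause)
# drives both a falling barometer and rain; all numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

low_pressure = rng.random(n) < 0.3
barometer_falls = rng.random(n) < np.where(low_pressure, 0.9, 0.1)
rain = rng.random(n) < np.where(low_pressure, 0.8, 0.1)

# Covariation alone reports a strong association...
print(rain[barometer_falls].mean(), rain[~barometer_falls].mean())  # ~0.66 vs ~0.13

# ...which vanishes once the common cause is held fixed.
s = low_pressure
print(rain[s & barometer_falls].mean(), rain[s & ~barometer_falls].mean())  # both ~0.8
```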
p.43 [N. Cliff quoted] No computer program can take account of variables that are not in the analysis.
p.44 A causal structure serves as a blueprint for forming a "causal model" - a precise specification
of how each variable is influenced by its parents in the DAG [Directed Acyclic Graph], as in the structural equation
model of (1.40). Here we assume that Nature is at liberty to impose arbitrary functional relationships by introducing arbitrary
(yet mutually independent) disturbances. These disturbances reflect "hidden" or unmeasurable conditions and exceptions that
Nature chooses to govern by some undisclosed probability function.
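Read concretely, the structural form behind (1.40) is x_i = f_i(pa_i, u_i). A minimal sketch of such a model (the DAG, functional forms, and parameters below are mine, chosen for illustration):

```python
# Sketch of a causal model over the DAG X -> Y -> Z: each variable is a
# function f_i of its parents pa_i and an independent disturbance u_i.
# (Variable names, functional forms, and parameters are illustrative.)
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Mutually independent disturbances: the "hidden conditions and exceptions".
u_x, u_y, u_z = rng.normal(size=(3, n))

x = u_x                      # X has no parents:  x = f_X(u_x)
y = 2.0 * x + u_y            # Y's parent is X:   y = f_Y(x, u_y)
z = np.tanh(y) + 0.5 * u_z   # Z's parent is Y:   z = f_Z(y, u_z); Nature may
                             # choose arbitrary (here nonlinear) functions
```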
p.49-50 the search for the minimal model then boils down to reconstructing the structure of a DAG
D from queries about conditional independencies, assuming that those independencies reflect d-separation
conditions [a graphical criterion that implies conditional independence] in some underlying DAG D0... the reconstructed DAG will not be unique...
Such a graphical representation was introduced in Verma and Pearl (1990) under the name pattern. A pattern is a partially
directed DAG, in particular, a graph in which some edges are directed and some are nondirected. The directed edges represent
arrows that are common to every member of the equivalence class of D0, while the undirected edges represent
ambivalence.
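To make the non-uniqueness concrete, a small sketch (mine): the chain X -> Y -> Z and the fork X <- Y -> Z imply exactly the same independencies (X and Z dependent, but independent given Y), so no test on observational data can orient those edges; a pattern records them as undirected.

```python
# Sketch: two different DAGs with identical independence signatures.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def corr_and_partial(x, y, z):
    # Marginal corr(X, Z), and corr(X, Z) after regressing Y out of both.
    rx = x - np.polyval(np.polyfit(y, x, 1), y)
    rz = z - np.polyval(np.polyfit(y, z, 1), y)
    return np.corrcoef(x, z)[0, 1], np.corrcoef(rx, rz)[0, 1]

# Chain X -> Y -> Z
x = rng.normal(size=n); y = x + rng.normal(size=n); z = y + rng.normal(size=n)
print(corr_and_partial(x, y, z))   # (~0.58, ~0.0): dependent, screened off by Y

# Fork X <- Y -> Z: the same signature, so these edges stay undirected.
y = rng.normal(size=n); x = y + rng.normal(size=n); z = y + rng.normal(size=n)
print(corr_and_partial(x, y, z))   # (~0.5, ~0.0) again
```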
p.60 Returning to causal inference, our question then amounts to assessing whether there are enough
discriminating clues in a typical learning environment (say, in skill acquisition tasks or in epidemiological studies) to
allow us to make reliable discriminations between cause and effect. This can only be determined by experiments -
once we understand the logic behind the available clues and once we learn to piece these clues together coherently in large
programs that tackle real-life problems.
p.61 The prevailing paradigm in the machine-learning literature has been to define each hypothesis (or theory,
or concept) as a subset of observable instances; once we observe the entire extension of this subset, the hypothesis is defined
unambiguously. This is not the case in causal modeling. Even if the training sample exhausts the hypothesis subset (in our
case, this corresponds to observing P precisely), we are still left with a vast number of equivalent theories, each stipulating
a drastically different set of causal claims. Therefore, fitness to data is an insufficient criterion for validating
causal theories... Causal models should therefore be chosen by a criterion that challenges their stability
against changing conditions; such conditions show up in the data in the form of virtual control variables.
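A sketch of the point (mine): the two linear-Gaussian models below induce exactly the same joint distribution P, yet make opposite causal claims, so no amount of fit to P can choose between them.

```python
# Sketch of "equivalent theories": two models with the same joint P but
# opposite causal claims. (Parameters chosen so the two joints match exactly.)
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

# Theory A: X -> Y.   X ~ N(0,1),  Y = 0.5*X + noise
x = rng.normal(size=n)
y = 0.5 * x + np.sqrt(0.75) * rng.normal(size=n)

# Theory B: Y -> X.   Y ~ N(0,1),  X = 0.5*Y + noise
y2 = rng.normal(size=n)
x2 = 0.5 * y2 + np.sqrt(0.75) * rng.normal(size=n)

# Both give zero means, unit variances, covariance 0.5: the same P.
print(np.cov(x, y))
print(np.cov(x2, y2))
# Yet under A, setting X changes Y; under B it leaves Y untouched. Only
# behavior under changed conditions can tell the theories apart.
```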
p.61 Although few have challenged the principle of minimality (to do so would amount to challenging scientific
induction), objections have been voiced against the way we defined the objects of minimization - namely, causal models. Definition
2.2.2 assumes that the stochastic terms u_i are mutually independent, an assumption that endows each model
with the Markov property: conditioned on its parents (direct causes), each variable is independent of its nondescendants.
This implies, among the other ramifications of d-separation, several familiar relationships between causation and association
that are usually associated with Reichenbach's (1956) principle of common cause - for example, "no
correlation without causation," "causes screen off their effects," and "no action at a distance."
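The Markov property is what licenses the familiar recursive product decomposition of the joint distribution (the three-variable instance below is my own illustration):

$$ P(x_1, \dots, x_n) \;=\; \prod_i P(x_i \mid pa_i) $$

For the chain $X \rightarrow Y \rightarrow Z$ this gives $P(x,y,z) = P(x)\,P(y \mid x)\,P(z \mid y)$, hence $P(z \mid x, y) = P(z \mid y)$: the direct cause $Y$ screens off $X$ from $Z$.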
p.62 Ironically, perhaps the strongest evidence for the ubiquity of the Markov condition [that is, conditioned
on its parents (direct causes), each variable is independent of its nondescendants] can be found in the philosophical program
known as "probabilistic causality" (see Section 7.5), of which Cartwright is a leading proponent. In this
program, causal dependence is defined as a probabilistic dependence that persists after conditioning on some set of
relevant factors... This definition rests on the assumption that conditioning on the right set of factors enables
one to suppress all spurious associations.
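One common formalization of this definition (my paraphrase of the program, not a quotation from the book): X is deemed a genuine cause of Y only if the dependence survives every relevant context, i.e.

$$ P(y \mid x, z) \;\neq\; P(y \mid z) \quad \text{for all contexts } z \text{ of the relevant factors } Z. $$

On this test the barometer fails as a cause of rain: conditioning on atmospheric pressure dissolves the dependence, exactly as in the simulation sketched earlier.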
p.112 The ability to predict the effect of interventions without enumerating those interventions
in advance is one of the main advantages we draw from causal modeling and one of the main functions served by the
notion of causation.
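A closing sketch (mine) of what that ability looks like mechanically: an intervention do(X = x) is modeled by replacing X's structural equation with the constant x while leaving every other mechanism intact, so the model can predict the effect of actions it was never shown.

```python
# Sketch: predicting an intervention's effect from the model alone, by
# severing the intervened variable's equation. Illustrative confounded
# model: U -> X, U -> Y, X -> Y, with true causal coefficient 1.0.
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

def run(do_x=None):
    u = rng.normal(size=n)                        # unobserved common cause
    x = u + rng.normal(size=n) if do_x is None else np.full(n, do_x)
    y = 1.0 * x + 2.0 * u + rng.normal(size=n)    # Y's mechanism is unchanged
    return x, y

# Observation: regressing Y on X mixes the causal effect with confounding.
x, y = run()
print(np.polyfit(x, y, 1)[0])     # ~2.0, not the causal coefficient

# Intervention do(X=1) vs do(X=0): the model predicts the true effect.
_, y1 = run(do_x=1.0)
_, y0 = run(do_x=0.0)
print(y1.mean() - y0.mean())      # ~1.0
```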