The Case for Using Probabilistic Knowledge

p.38"The positional estimate should not be a general-purpose affair; it should be specific to each given situation... Capablanca pointed out that the basis for a positional estimate is the control of fields. Control of fields does not mean control of the whole board, but control of only those fields that may be used in the impending play. Therefore, one must strive for control of the field consisting of those trajectories in which the pieces can move, but have not moved yet.

At the node in the search tree where we find ourselves at a given moment, we must unravel all those sheaves of trajectories which have not yet been developed and determine which player has control of the majority of the fields consisting of the trajectories not yet used in the play. This allows us to forecast the result of the play - the result of a search which, in particular, had to be renounced at the terminal nodes of the variations for lack of resources."

p.38"We shall show later that the positional estimate allows us to solve the question of priorities... Thus the positional estimate, with the development of the sheaf of trajectories, should be produced at every node in the search tree. We may assert that the squares under control define the usable mobility and maneuverability of the pieces. Better maneuverability of pieces often also determines the positional superiority."

p.39"It was assumed that the control of squares involves only those pieces that lie at a distance of one move from the controlled square (and a blockading piece must lie on the square itself)."

p.39"To sum up: The positional estimate is computed at every node of the search tree. The procedure is substantially more complex than the procedure for computing a material score. All sheaves of trajectories included in the play but not yet used (in whole or in part) are taken into account in computing the positional value."

p.39"we see that we may get a first approximation on those trajectories on which the pieces have not yet had time to move.

Thus the basic factor in the positional estimate is proportional to the ratio Kw / Kb, where Kw and Kb are the numbers of squares controlled by White and Black, respectively."
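
The ratio Kw / Kb can be illustrated with a minimal sketch. It assumes we already know, for each piece, which squares it controls; the piece names and square sets below are hypothetical, and deriving them from a real position (including the restriction to squares relevant to the impending play) is not shown. Kw and Kb are taken as the sizes of the unions of those sets.

```python
def controlled_squares(control_by_piece):
    """Return the set of squares controlled by at least one piece of a side."""
    squares = set()
    for piece_squares in control_by_piece.values():
        squares |= set(piece_squares)
    return squares


def control_ratio(white_control, black_control):
    """Return Kw / Kb, the ratio of squares controlled by White and Black.

    Returns float('inf') when Black controls no squares. This is an
    assumption made here to avoid division by zero; the quoted text does
    not address that case.
    """
    kw = len(controlled_squares(white_control))
    kb = len(controlled_squares(black_control))
    if kb == 0:
        return float("inf")
    return kw / kb


# Hypothetical control sets, for illustration only.
white = {"Nf3": {"e5", "d4", "g5", "h4"}, "Bc4": {"d5", "e6", "f7", "d4"}}
black = {"Nc6": {"e5", "d4", "b4", "a5"}}

ratio = control_ratio(white, black)  # Kw = 7 (d4 counted once), Kb = 4
```

A square controlled by two pieces of the same side is counted once here; whether multiply controlled squares should weigh more is a design choice the quoted passage leaves open.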

p.1-2"Reasoning about any realistic domain always requires that some simplification be made. The very act of preparing knowledge to support reasoning requires that we leave many facts unknown, unsaid, or crudely summarized... An alternative to the extremes of ignoring or enumerating exceptions is to summarize them, i.e., provide some warning signs to indicate which areas of the minefield are more dangerous than others. Summarization is essential if we wish to find a reasonable compromise between safety and speed of movement. This book studies a language in which summaries of exceptions in the minefield of judgment and belief can be represented and processed... One way to summarize exceptions is to assign to each proposition a numerical measure of uncertainty and then combine these measures according to uniform syntactic principles"

What if our computer chess program focused its search on moves that create promising positions? We would evaluate each piece by its effectiveness in the game, building a model of 'piece effectiveness' from the relationships each piece has with the other pieces. The model would need to be fast to compute yet reasonably accurate. Probability is a natural basis for such an evaluation, since certainty takes time and we have little time in which to generate our evaluation.

p.12"Our goal is to make intensional systems [Intensional systems deal with uncertainty in a context sensitive manner. They try to model the interdependencies and relevance relationships of the variables in the system. - JLJ] operational by making relevance relationships explicit, thus curing the impotence of declarative statements such as P(B|A)=p [JLJ - notation for the statement: the probability of B given A is p]. As mentioned earlier, the reason one cannot act on the basis of such declarations is that one must first make sure that other items in the knowledge base are irrelevant to B and hence can be ignored. The trick, therefore, is to encode knowledge in such a way that the ignorable is recognizable, or better yet, that the unignorable is quickly identified and is readily accessible... In effect, what network representations offer is a dynamically updated list of all currently valid licenses to ignore, and licenses to ignore constitute permissions to act."

p.313"6.3.1 Information Sources and Their Values: It is generally accepted that information is a useful commodity, that acting in an informed fashion is preferable to acting under ignorance. This is why people accumulate information when it is available and purchase information when it is scarce. People also possess strong intuition about whether one information source is more valuable (more reliable and pertinent) than another... The value of any information source is defined as the difference between the utilities of two optimal strategies, one providing the freedom of choosing different actions for different source outcomes, the other providing no such freedom. This criterion can be used to rate the usefulness of various information sources and to decide whether a piece of information is worth acquiring."
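
The definition in the quote above (the value of a source is the difference between the utilities of two optimal strategies) can be made concrete with a small sketch. Everything in it is an assumption for illustration: the states, outcomes, actions, probabilities and utilities are invented, and the source is treated as free of cost.

```python
def expected_utility(action, state_weights, utility):
    """Sum of weight * utility over states for one action."""
    return sum(w * utility[(action, s)] for s, w in state_weights.items())


def value_of_information(joint, utility, actions):
    """Best expected utility when the action may depend on the source
    outcome, minus the best expected utility with a single fixed action.

    joint maps (state, outcome) -> probability.
    """
    states = {s for s, _ in joint}
    outcomes = {o for _, o in joint}

    # Without the source: one action must serve every outcome.
    prior = {s: sum(joint.get((s, o), 0.0) for o in outcomes) for s in states}
    eu_without = max(expected_utility(a, prior, utility) for a in actions)

    # With the source: choose the best action separately for each outcome.
    # The weights P(state, outcome) already include P(outcome), so the
    # per-outcome maxima can simply be added.
    eu_with = 0.0
    for o in outcomes:
        weights = {s: joint.get((s, o), 0.0) for s in states}
        eu_with += max(expected_utility(a, weights, utility) for a in actions)

    return eu_with - eu_without


# Hypothetical numbers: is the position 'good' or 'bad' for an attack,
# and what does a quick positional check ('pos' or 'neg') report?
joint = {("good", "pos"): 0.4, ("good", "neg"): 0.1,
         ("bad", "pos"): 0.1, ("bad", "neg"): 0.4}
utility = {("attack", "good"): 10, ("attack", "bad"): -5,
           ("defend", "good"): 2, ("defend", "bad"): 2}

voi = value_of_information(joint, utility, ["attack", "defend"])
```

With these numbers the best fixed action is worth 2.5, while acting on the check's outcome is worth 4.5, so the check is worth acquiring whenever it costs less than the difference.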

p.318"6.4.1 Focusing Attention: Control is the process of scheduling the activation of information sources, both external (e.g., acquiring new input) and internal (e.g., invoking rules or updating beliefs). Decision analysis provides a framework for scheduling all computational activities so as to focus on specific goals - updating the belief in a target set of hypotheses, shifting attention to a new set, and terminating the activity once we reach an acceptable level of confidence in a hypothesis.

The main reason for focusing attention on a select set of target hypotheses is to economize the acquisition of new data. Let us imagine a subset S of the nodes (normally the leaves) that are known to be sensory or observable nodes for a given problem domain (e.g., laboratory tests in medical diagnosis). In general, the instantiation of any of these sensory nodes incurs a positive cost, and the utility of the information they convey might be insufficient to justify this cost. Thus, it is important to decide which node in S should be instantiated first, based on the information it contributes to the decision at hand, i.e., the target node. If utility information is available, then the value node naturally is the target. If we lack utility information, we assign priorities to pending information sources based on their degree of informativeness."
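
When utilities are unavailable, one possible measure of "degree of informativeness" is the mutual information between a source and the target hypothesis. The quote does not name a specific measure, so this choice is an assumption, as are the source names and probabilities below; acquisition costs are ignored.

```python
import math


def mutual_information(joint):
    """Mutual information, in bits, between a target and a source.

    joint maps (target_value, source_value) -> probability.
    """
    p_target, p_source = {}, {}
    for (t, s), p in joint.items():
        p_target[t] = p_target.get(t, 0.0) + p
        p_source[s] = p_source.get(s, 0.0) + p
    return sum(
        p * math.log2(p / (p_target[t] * p_source[s]))
        for (t, s), p in joint.items()
        if p > 0
    )


def rank_sources(joints_by_source):
    """Order information sources from most to least informative."""
    return sorted(
        joints_by_source,
        key=lambda name: mutual_information(joints_by_source[name]),
        reverse=True,
    )


# Hypothetical sources, each described by its joint distribution with the target.
sources = {
    "weak": {(True, "pos"): 0.3, (True, "neg"): 0.2,
             (False, "pos"): 0.2, (False, "neg"): 0.3},
    "useless": {(True, "pos"): 0.25, (True, "neg"): 0.25,
                (False, "pos"): 0.25, (False, "neg"): 0.25},
    "strong": {(True, "pos"): 0.4, (True, "neg"): 0.1,
               (False, "pos"): 0.1, (False, "neg"): 0.4},
}

priority = rank_sources(sources)
```

A source that is independent of the target scores zero and falls to the end of the schedule, which matches the intuition that it is not worth instantiating.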

p.326"The task of controlling reasoning activities was formulated as that of finding an optimal schedule for activating information sources. Decision theory provides a framework for assessing the knowledge and computations needed to perform this optimization precisely. It turns out that the knowledge required is often unavailable... Subgoaling strategies emerge as a reasonable compromise; they are computationally tractable... they still provide a focused way of acquiring information."

A model is an effective tool for simplifying a complex situation; by manipulating the model, we can gain strategic insight into the situation itself.

p.1"Real-world problems... are often described as complex... Furthermore... a variety of factors... tend to distort our judgment of a situation.

One way of trying to better handle reality - in spite of these limitations and biases - is to use representations of reality called models."

A model might be constructed to satisfy the needs of an individual or organization, perhaps to obtain insight for planning purposes.

p.2"the purpose of a model is to satisfy the need of some person or organization having a particular interest in one or several aspects of the object, but not in a comprehensive understanding of its properties."

p.3"Definition 2 (Model) A model is a representation of an object, expressed in a specific language and in a usable form, and is intended to satisfy one or several need(s) of some stakeholder(s) of the object."

We use models to produce information and to help us reason about uncertainty.

p.3"Models are thus used to produce information (evaluations, appropriate decisions or actions) on the basis of some input information, considered as valid. This process is called inference."

p.4"the way a model is constructed obviously depends on several factors, such as the nature of the object, the stakeholder's need(s), the available knowledge and information, the time and resources devoted to the model elaboration, etc. Nevertheless, we may identify two invariants in the process of constructing a model... Splitting the object into elements... Saying how it works: the modeling language"

One way to construct a model is with a graph or picture showing the relationships among the various parts.

p.5"most successful or unsuccessful attempts of mankind to overcome the complexity of reality have involved, at some stage, a form [of] a graphical representation."

p.5"During the modeling process, the exact circumstances in which the model is going to be used (especially, what input data the model will process) are, to a large extent, unknown. Also, some of the attributes remain unknown when the model is used: the attributes which are at some stage unknown are more conveniently described by variables."

We can use probability to model our doubt about how strongly one variable influences another.

p.7"Doubt is a typically human faculty which can be considered as the basis of any scientific process... The construction of a probabilistic model requires the systematic examination of all possible values of each variable... it is hard to imagine a more precise representation of an object: each of the theoretically possible configurations of the object is considered, and to each of them is associated one element of the infinite set [0;1]. [JLJ - a probability between 0 and 1]"

If we can split our model into several subsets of variables (ideally independent of each other), it becomes easier to analyze.

p.9"Following Descartes's precept of dividing the difficulties, one may try to split the set of n variables into several subsets of smaller sizes which can relatively be analyzed separately... Then the modeling problem can be transformed into two simpler ones."
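
A rough count shows why splitting the variables helps. The sketch assumes every variable is binary and compares the number of independent probabilities needed for one table over all variables with the number needed when each variable depends only on a few others. The chain of ten variables is a hypothetical example.

```python
def full_joint_parameters(n_variables):
    """Independent probabilities in one joint table over binary variables."""
    return 2 ** n_variables - 1


def factored_parameters(parents):
    """Independent probabilities when each binary variable depends only on
    its listed parents: one number per configuration of the parents."""
    return sum(2 ** len(pa) for pa in parents.values())


# Hypothetical chain X1 -> X2 -> ... -> X10.
chain = {"X1": []}
for i in range(2, 11):
    chain[f"X{i}"] = [f"X{i - 1}"]

full = full_joint_parameters(len(chain))  # 2**10 - 1 = 1023
factored = factored_parameters(chain)     # 1 + 9 * 2 = 19
```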

A Bayesian network is a graphical representation of the influences that one variable has over another variable.

p.11"In the lorry [truck] driver and doped athlete examples, we have identified the most direct and significant influences between the variables, and simplified the derivation of the joint probability distribution. By representing these influences in a graphical form, we now introduce the notion of [a] Bayesian network."

We can draw conclusions from a Bayesian network via the propagation of evidence.

p.26"Inference [section title] The most crucial task of an expert system is to draw conclusions based on new evidence. The mechanism of drawing conclusions in a system that is based on a probabilistic graphical model is known as propagation of evidence. Propagation of evidence involves essentially updating probabilities given observed variables of a model (also known as belief updating)."
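
Belief updating can be shown on a toy network. The sketch below computes a posterior by brute-force enumeration of the joint distribution; this is not the propagation algorithm used in practical systems (enumeration grows exponentially with the number of variables), but it yields the same posterior on small examples. The chain A -> B -> C and its probabilities are invented for illustration.

```python
from itertools import product

# Hypothetical chain A -> B -> C.
# Each entry: (parents, {parent values: P(variable is True)}).
NETWORK = {
    "A": ((), {(): 0.3}),
    "B": (("A",), {(True,): 0.8, (False,): 0.1}),
    "C": (("B",), {(True,): 0.9, (False,): 0.2}),
}


def joint_probability(assignment):
    """Probability of a full assignment: the product of one entry per variable."""
    prob = 1.0
    for var, (parents, cpt) in NETWORK.items():
        p_true = cpt[tuple(assignment[p] for p in parents)]
        prob *= p_true if assignment[var] else 1.0 - p_true
    return prob


def posterior(target, evidence):
    """P(target is True | evidence), summing the joint over all
    assignments consistent with the evidence."""
    names = list(NETWORK)
    p_true = p_all = 0.0
    for values in product([True, False], repeat=len(names)):
        assignment = dict(zip(names, values))
        if any(assignment[v] != val for v, val in evidence.items()):
            continue
        p = joint_probability(assignment)
        p_all += p
        if assignment[target]:
            p_true += p
    return p_true / p_all


prior_a = posterior("A", {})             # belief in A before any evidence
updated_a = posterior("A", {"C": True})  # belief in A after observing C
```

Observing C raises the belief in A from 0.3 to roughly 0.55, even though A and C are linked only through B.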

We might instead use a rule-based system to model our network of interacting pieces on the chessboard. We might feel safer because we would not have to address uncertainty. But avoiding uncertainty does not guarantee that a rule-based model is any better, or even that it is correct. It might simply be wrong, because we do not know enough to be certain how we stand.

p.31"Rule-based systems capture heuristic knowledge from the experts and allow for a direct construction of a classification relation... Rule-based systems may be expected to perform well for problems that cannot be modeled using causality as a guiding principle, or when a problem is too complicated to be modeled as a causal graph."

Bayesian networks are effective tools for modeling processes of medical reasoning. Decision making in an emergency room resembles choosing moves in a chess game: in both, we must act quickly on incomplete information. Bayesian networks have been successfully created to aid decision making in an emergency room. Perhaps they can also aid decision making in a chess game.

p.32"Bayesian networks are recognized as a convenient tool for modeling processes of medical reasoning. There are several features of Bayesian networks that are specially useful in modeling in medicine. One of these features is that they allow us to combine expert knowledge with existing clinical data."

p.54"The use of Bayesian networks in biomedical sciences can be traced as far back as the early decades of the 20th century, when Sewall Wright developed path analysis to aid the study of genetic inheritance. Neglected for many years, Bayesian Networks were reintroduced in the early 1980s as an analytic tool capable [of] encoding the information acquired from human experts. Compared to decision-rule based 'expert-systems' that were limited in their ability to reason under uncertainty, Bayesian networks were probabilistic expert systems that used probability theory to account for uncertainty in automated reasoning for diagnostic and prognostic tasks. This type of probabilistic reasoning was made possible by the development of algorithms to propagate probabilistic information through a network."

p.71"Bayesian networks provide a flexible modeling framework to describe complex systems in a modular way."

p.84"The BN [Bayesian network] can provide useful information for crime risk factor analysis."

p.185"Bayesian networks provide a general and effective framework for knowledge representation and reasoning under uncertainty."

p.210"Once the BN [Bayesian network] has been constructed, it is enlarged by including decision and utility nodes, thus transforming it into an influence diagram."

p.384"Although Bayesian networks are certainly not the Holy Grail of artificial intelligence, they definitely are a solid basis for knowledge engineering. They allow us to use various sources of knowledge, even contradicting ones, to make knowledge embedded in data explicit, to use this knowledge for various types of problem solving, and finally to improve it through online learning.

Artificial intelligence remains a challenge for the next decades. Indeed, intelligence cannot be limited to inference and learning, but requires action. Embedding artificial intelligence systems in the real world is probably the next challenge of artificial intelligence, far beyond simply connecting an offline 'artificially intelligent system' to external sensors and actuators."

Let's now look at quotes from Bayesian Networks and Influence Diagrams by Kjaerulff and Madsen.

If you wish to look at practical applications for Bayesian networks, this book is a good place to start.

p.VII-VIII"This book is a monograph [a scholarly piece of writing of essay or book length on a specific, often limited subject -JLJ] on practical aspects of probabilistic networks (a.k.a. probabilistic graphical models) and is intended to provide a comprehensive guide for practitioners that wish to understand, construct, and analyze decision support systems based on probabilistic networks, including a number of different variants of Bayesian networks and influence diagrams... inference in probabilistic networks is based on a well-established theoretical foundation of probability calculus and decision theory, and hence provides mathematically coherent methods for deriving conclusions under uncertainty, where multiple sources of information and complex interaction patterns are involved"

We build our probabilistic networks with a goal in mind: to solve an intellectually challenging task, or to derive conclusions from a body of knowledge.

p.3"Solving an intellectually challenging task can be characterized as a process of deriving conclusions (new pieces of knowledge) by manipulating a (large) body of knowledge, typically including definitions of entities (objects, concepts, events, phenomena, etc.), relations among them, and observations of states (values) of some of the entities."

We first identify the relevant variables and the causal relations among them. For our chess program, these are essentially the influences each piece exerts in the game - the pressure (constrained or unconstrained) it puts on other pieces and the constraints it places on the ability of the enemy pieces to pressure our pieces.

p.10"The construction of a Bayesian network thus runs in two phases. First, given the problem at hand, one identifies the relevant variables and the (causal) relations among them. The resulting DAG [acyclic directed graph] specifies a set of dependence and independence assumptions that will be enforced on the joint probability distributions... one for each 'family'... of the DAG.
A Bayesian network can be constructed manually, (semi-) automatically from data, or through a combination of a manual and a data driven process, where partial knowledge about a structure as well as parameters (i.e., conditional probabilities) blend with statistical information extracted from databases of cases (i.e., previous joint observations of values of the variables)... Extensive guidance on how to manually construct a probabilistic network is the core of this book."

An important step in evaluating piece pressure is determining which pieces have no influence on the piece in question and can therefore be eliminated from the calculation of effectiveness.

p.49"The single most important key to efficient inference in probabilistic networks is the ability to take advantage of the distributive law (i.e., to find optimal (or near optimal) sequences in which the variables are marginalized out)... Variables of a probabilistic network that have no descendants and are never observed are called barren variables, as they provide no information relevant for the inference process... and may hence be removed from the network."
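
Removing barren variables can be sketched as a simple pruning loop. The graph below is hypothetical, and the sketch also protects query targets from removal, which is an assumption added here (the quote speaks only of variables with no descendants that are never observed).

```python
def remove_barren(parents, observed, targets=()):
    """Return a copy of the DAG with barren variables removed.

    parents maps each variable to the list of its parents. A variable is
    treated as barren when it has no children and is neither observed nor
    a query target; removing one may make its parents barren in turn.
    """
    remaining = {v: list(pa) for v, pa in parents.items()}
    keep = set(observed) | set(targets)
    while True:
        has_children = {p for pa in remaining.values() for p in pa}
        barren = [v for v in remaining if v not in has_children and v not in keep]
        if not barren:
            return remaining
        for v in barren:
            del remaining[v]


# Hypothetical network: A -> B, B -> C, B -> D, D -> E.
dag = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"]}

# We observe C and ask about A; E and then D drop out.
pruned = remove_barren(dag, observed={"C"}, targets={"A"})
```

For the chess model, the analogous step would be discarding pieces that cannot affect the piece under evaluation before any probabilities are computed.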

p.63"Many real-life situations can be modeled as a domain of entities represented as random variables in a probabilistic network. A probabilistic network is a clever graphical representation of dependence and independence relations between random variables... A probabilistic network represents and processes probabilistic knowledge...The graphical representation of a probabilistic network describes knowledge of a problem domain in a precise manner. The graphical representation is intuitive and easy to comprehend, making it an ideal tool for communication of domain knowledge between experts, users, and systems. For these reasons, the formalism of probabilistic networks is becoming an increasingly popular knowledge representation for reasoning and decision making under uncertainty."

p.74"Decision Making Under Uncertainty
The framework of influence diagrams (Howard & Matheson 1981) is an effective modeling framework for representation and analysis of (Bayesian) decision making under uncertainty. Influence diagrams provide a natural representation for capturing the semantics of decision making with a minimum of clutter and confusion for the decision maker (Shachter & Peot 1992). Solving a decision problem amounts to (i) determining an optimal strategy that maximizes the expected utility for the decision maker and (ii) computing the maximal expected utility of adhering to this strategy."

p.107"We build knowledge bases in order to formulate our knowledge about a certain problem domain in a structured way. The purpose of the knowledge base is to support our reasoning about events and decisions in a domain with inherent uncertainty... An expert system consists of a knowledge base and an inference engine... The knowledge base is the Bayesian network or influence diagram, whereas the inference engine consists of a set of generic methods that applies the knowledge formulated in the knowledge base on task-specific data sets, known as evidence, to compute solutions to queries against the knowledge base. The knowledge base alone is of limited use if it cannot be applied to update our belief about the state of the world or to identify (optimal) decisions in the light of new knowledge... the knowledge bases we consider are probabilistic networks."

p.111"Given a query and a set of evidence variables, the contribution from a nuisance variable does not depend on the observed values of the evidence variables. Hence, if a query is to be solved with respect to multiple instantiations over the evidence variables, then the nuisance variables (and barren variables) may be eliminated in a preprocessing step to obtain the relevant network (Lin and Druzdzel 1997). The relevant network consists of target variables, evidence variables, and variables on paths between target and evidence variables only."

p.122"Probabilistic inference is the task of updating our belief about the state of the world in light of evidence. Evidence on discrete variables, be it hard or soft evidence, is treated as in the case of discrete Bayesian networks."

p.124"We build decision models in order to support efficient reasoning and decision making under uncertainty in a given problem domain. Reasoning under uncertainty is the task of computing our updated beliefs in (unobserved) events given observations on other events whereas decision making under uncertainty is the task of identifying the (optimal) decision strategy for the decision maker given observations."

p.137"We build decision models in order to support efficient reasoning and decision making under uncertainty in a given problem domain. Reasoning under uncertainty is the task of computing our updated beliefs in (observed) events given observations on other events [i.e., evidence] whereas decision making under uncertainty is the task of identifying the (optimal) decision strategy for the decision maker given observations."

p.144"There are many good reasons to choose probabilistic networks as the modeling framework, including the coherent and mathematically sound handling of uncertainty and normative decision making, the automated construction and adaptation of models based on data, the intuitive and compact representation of cause-effect relations and (conditional) dependence and independence relations, the efficient solution of queries given evidence, and the ability to support a whole range of analyses of the results produced, including conflict analysis, sensitivity analysis (with respect to both parameters and evidence), and value-of-information analysis."

p.145"There are four ground characteristics that constitute the foundation of (normative) probabilistic models:

Graphical representation of causal relations among domain entities (variables). The notion of causality is central in probabilistic networks, meaning that a directed link from one variable to another (usually) signifies a causal relation among the two...

Strengths of probabilistic relations are represented by (conditional) probabilities. Causal relations among variables are seldom deterministic in the sense that if the cause is present, then the effect can be concluded by certainty...

Preferences are represented as utilities on a numerical scale. All sorts of preferences that are relevant in a decision scenario must be expressed on a numerical scale...

Recommendations are based on the principle of maximal expected utility. As the reasoning performed by a probabilistic network is normative, the outcome (e.g., most likely diagnosis or suggested decision) is guaranteed to provide a recommended course of action that maximizes the expected utility to the extent that the model is a 'true' representation of problem domain."

p.146-147"we might set up the following criteria to be met for probabilistic networks to potentially be a good candidate technology for solving the problem at hand:

Well defined variables. The variables and events (i.e., possible values of the variables) of the problem domain need to be well-defined...

Highly structured problem domain with identifiable cause-effect relations. Well-established and detailed knowledge should be available concerning structure (variables and (causal) links), conditional probabilities, and utilities (preferences). In general, the structure needs to be static (i.e., not changing over time), although re-estimation of structure (often the usage of learning tools; see chapter 8) can be performed...

Uncertainty associated with the cause-effect relations. If all cause-effect relations are deterministic (i.e., all conditional probabilities either take the value 0 or the value 1), more efficient technologies probably exist...

Repetitive problem solving. Often, for the (sometimes large) effort invested in constructing a probabilistic network to pay off, the problem solved should be of a repetitive nature. A physician diagnosing respiratory diseases, an Internet company profiling their customers, and a bank deciding to grant loans to its customers are all examples of problems that need to be solved over and over again, where the involved variables and causal mechanisms are invariant over time, and only the values observed for (some of) the variables differ...

Maximization of expected utility. For the probabilistic network framework to be a natural choice, the problem at hand should most probably contain an element of decision making involving a desire to maximize the expected utility of a decision."

p.148"Constraint variables (see chapter 7) also depend deterministically on its parent variables. Such 'artificial' variables can be handy in many modeling situations, for example, reducing the number of conditional probabilities needed to be specified or enforcing constraints on the combinations of states among a subset of the variables."

p.149-150"Identifying the variables of a problem domain is not always an easy task, and requires some practicing... one needs to focus on the problem (possible diagnoses, classifications, predictions, decisions, etc. to be made) and the relevant pieces of information for solving the problem... In the process of identifying the variables it can be useful to distinguish between different types of variables:

Problem variables: These are the variables of interest; i.e., those for which we want to compute their posterior probability given observations of values for information variables (see next item). Usually, the values of problem variables cannot be observed; otherwise, there would not be any point in constructing a probabilistic network in the first place...

Information variables: These are the variables for which observations may be available, and which can provide information relevant for solving the problem. Two sub-categories of information variables can be distinguished: Background information... Symptom information...

Mediating variables: These are unobservable variables for which posterior probabilities are not of immediate interest, but which play important roles for achieving correct conditional independence and dependence properties and/or efficient inference."

p.152-153"Given an initial set of variables identified for a given problem domain, the next step in the model construction process concerns the identification and verification of (causal) links of the model."

p.170"When constructing a model (probabilistic or not) it is crucial to realize that real-world problem domains are usually embedded in a complex reality involving interaction with numerous different aspects of the real world in a way that can never be fully captured in a model. Also, the internal causal mechanisms of a problem domain can almost always only be approximately described in a model. Thus it is important to bear in mind that all models are wrong, but that some might be useful."

p.171"In his writings, William of Occam (or Ockham) (1284-1347) stressed the Aristotelian principle that entities must not be multiplied beyond what is necessary. This principle became known as Occam's Razor or the law of parsimony; a problem should be stated in its basic and simplest terms. In science, the simplest theory that fits the facts of the problem is the one that should be selected. This rule is interpreted to mean that the simplest of two or more competing theories is preferable and that an explanation for unknown phenomena should first be attempted in terms of what is already known."

p.174"we pointed to the fact that the best models are usually constructed through deliberate use of the law of parsimony (or Occam's razor)."

p.220"An influence diagram is useful for solving problems of decision making under uncertainty. The variables of an influence diagram consist of a mixture of random variables and decision variables. The random variables are used for representing uncertainty while the decision variables represent entities under the full control of the decision maker. The state of a random variable may be observable or hidden while the state of a decision variable is under the full control of the decision maker."

p.261"It is difficult or even impossible to construct models covering all aspects of (complex) problem domains of interest. A model is therefore most often an approximation of a problem domain that is designed to be applied according to the assumptions as determined by the background condition or context of the model. If a model is used under circumstances not consistent with the background condition, the results will in general be unreliable."