ME290M, Spring 1999

ME290M
Expert Systems in Mechanical Engineering

Spring 1999, T-Th 12:30-2:00 pm
1165 Etcheverry Hall, Course Control No. 56369 http://best.me.berkeley.edu/~aagogino/me290m/s99


[ Home | Info | Syllabus | Readings | Students | Homework | Resources | News | Chat]


The Logic of Probability Theory

1 Historical Perspective

Hacking [1975] speculates that gambling may be one of the earliest invention of humankind. Talus bones, the knucklebone or heel bone of a running animal, have been found in many ancient archeological sites. With animals such as deer, horse, oxen, or sheep, this bone is so formed that when it is thrown on a level surface it can only come up in 4 ways. Well polished and often engraved examples are regularly found on the sites of ancient Egypt. Tomb illustrations and scoring boards confirm their use in some kind of gambling.

The ancient Mesoamerican cultures (Mayas, Toltecs, and Aztecs) were also into gaming. They would risk their lives, their families, and all their possessions in games of chance. However, there is no evidence that the Mesoamericans had a concept of probability or if they did there is no evidence that they used it to their advantage. They believed in extreme fatalism. Although there were many events that appeared "random", they believed that the gods controlled everything. Thus whether you won at a game or not, depended less on your skill in calculating the odds and acting accordingly, than on the whims of the gods. A belief in extreme fatalism and causality precludes the development of the intellectual concepts of randomness and probability.

Hacking claims that probability mathematics is of Arabic origin. In fact, the word for chance, namely "hazard", is an Arabic word. The Indians, who were traders with the Arabic cultures, provide the first accounts of concepts of uncertainty and prediction in their written folklore. Consider the 4th century Indian story of Nala:

 

The god, Kali, had taken an interest in a young princess who is promised to marry Nala. In an act of jealousy, Kali is said to take possession of the body and soul of Nala. This is revealed to the public when Nala in a sudden frenzy of gambling loses his kingdom and wanders demented for many years.

Then on the advice of a snake-king that he meets in the forest, Nala takes a job as a charioteer to the foreign potentate Ptuparna. On a journey the king flaunts his mathematical skills by estimating the number of leaves and fruit on two great branches of a large tree.

Ptuparna is reported to have done this on the basis of a single twig that he examines. He estimates that the tree has 2095 fruit. Nala counts all night and is amazed by the accuracy of the king's guess. The king is proud of his skill and responds with

I of dice possess the science
and in numbers are thus skilled

The king agrees to teach Nala this science of gaming in exchange for some lessons in horsemanship. At the end of his training, Nala vomits out all of Kali's poisons, returns to his ever-faithful bride, and regains his kingdom in a single but fierce gambling game.

 

The story of Nala provides strong evidence that 4th century India recognized probability (in terms of dicing or gambling) as a science. It also demonstrates that the Indians recognized the important connection between "dicing" and estimating the number of leaves on a tree.

Even though the story of Nala was circulated in Europe in Sanskrit for many centuries, the Europeans did not make the critical connection between probability and estimation until the 17th century. Like the Aztecs, many point to the European belief in strict causality and determinism, as the cause for this myopia.

Formal probability theory, didn't develop until late into the 17th century with the mathematical and philosophical contributions of Pascal, Huygen, and Bernoulli.

2 Reasoning with Uncertain Beliefs

Statements about the world made by intelligent agents are often beliefs rather than absolute facts. In writing rules about the world in classical logic (e.g., A and A ==> B) we are assuming that out knowledge about the statements are known with certainty (at least we are modeling the knowledge as such). What if the statement A is uncertain or the belief in the rule (IF A B) is not absolute? If we do accept some relaxation of our absolute world of Aristotelian logic, surely we must allow for some measure of "degree of strength of belief" in the statements. Probability theory provides one formal logic to both quantify this uncertainty and a formal logic for propagating it. To extend first-order logic to reason with uncertainty we need to introduce the concept of an uncertain variable in probability theory. The axioms and theorems for logically reasoning with uncertain variables will be introduced in this lecture and associated algebra for manipulating discrete random variables in the Influence Diagram lecture.

3 Foundations of Probability

In the discussion of event algebra in the previous chapter, like our representations in classical logic, we talked about the occurrences or nonoccurrence of events. Many engineering problems are often not effectively representable by simple true-false statements concerning multiple events. Uncertainty occurs in engineering measurements, in design parameters and specifications, in estimation of future events (including failure events), and in modeling complex problems. Probability theory is one means of representing subjective knowledge of uncertainty and provides a mutlivariate logic for combining uncertain evidence and making inferences.

What then is probability? There are at least two points of view on the subject, sometimes called the frequentists and the Bayesian positions. The frequentist point of view is the one often presented in courses on statistics. A probability is simply the frequency of events given a large representative sample size. IEOR Professor Richard Barlow, has a license plate that says "probability does not exist". He represents a Bayesian viewpoint. In this view, a probability is not something physical; you can not directly measure a probability. In the words of the Italian mathematician de Finetti: "probability is a state of mind, not a state of matter".

Engineering-Economic Systems Professor Ron Howard of Stanford University proposes the following definition:

Definition: A probability is a measure assigned to an individual's subjective assessment of the likelihood of an event based on his state of information. The probability measure (1) depends on the state of information, (2) may change with new information, (3) may vary among individuals, and (4) corresponds to the areas on a Venn diagram.

4 Conditional Probability Notation

I will use the following notation to represent probabilities: Pr(A|B,S) is the probability of an event A given B has occurred for a given state of information S.

5 Axioms of Probability

In addition to the axioms of events given previously, only three new axioms are needed to provide a framework for assigning real numbers to the likelihood of occurrences of events for purposes of inference and computation. For purposes of this class, only three axioms will be needed. The first axiom requires that the probability measure is non-negative and the second normalizes the measure to one for certain events. The third axiom requires that the probabilities of two mutually exclusive events be additive. (The Dempster-Shafer theory of evidence is based on a relaxation of the second axiom.)

(1) For any event A, the probability measure is a real non-negative number such that Pr(A|S) >0.

(2) Pr(I|S) = 1, where I = the universe of all events.

(3) If AB=ø (i.e., if A and B are mutually exclusive) then Pr(A+B|S) = Pr(A|S) + Pr(B|S).

Consider what the axioms imply in terms of Venn diagrams of events.

Pr(I|S) = 1

Pr(A|S), Pr(B|S) > 0

Pr(A+B|S) = Pr(A|S) + Pr(B|S) if A and B are mutually exclusive

Figure 1: Venn Diagram of the Probability Axioms

The first axiom restricts the area in any region to be non-negative. The second axiom requires that the area of the entire diagram, corresponding to the universe of all events "I", be normalized to one. The third axiom implies that the area contained in two nonoverlapping regions be the sum of their individual areas.

6 Theorems of Probability

6.1 Fundamental Theorems:

(1) Pr(A'|S) = 1 - Pr(A|S)

(2) Pr(ø|S) = 0

(3) Pr(A + B|S) = Pr(A|S) + Pr(B|S) - Pr(AB|S)

6.2 Conditional Probability

If A and B are events with nonzero probabilities then Pr(B|A,S) is called the conditional probability of B given A and is defined as:

Pr(AB|S)
Pr(B|A, S) = æææææ
Pr(A|S)
Pr(AB|S)
Pr(A|B, S) = æææææ
Pr(B|S)
The conditional probability can be shown in the Venn diagram as the ratio of the area shared by both events (intersection) to the area of the conditioned event. We can look at conditional probability as a means of "renormalizing" the probability measure when new information shows that an event is known with certainty (i.e., with a probability equal to one).

Figure 2: Conditional Probability

6.3 Independence

If the probability of the product of two events is the product of their probabilities, then the two events are independent.

Pr(AB|S) = Pr(A|S) Pr(B|S) implies independence

In other words, A is probabilistically independent of B if having knowledge about B gives you no new information about A and visa versa. Thus

Pr(AB|S)Pr(A|S)Pr(B|S)
Pr(B|A, S) = æææææ = ææææææææ = Pr(B|S)
Pr(A|S)Pr(A|S)

 

Influence Diagrams: Conditional probabilistic influence and independence can be explicitly represented as expansions of joint probability distributions via the use of influence diagrams. The strong piece of information is the lack of an influence or the identification of (conditional) independence. We must assume that influences may be present if no information indicating independence is provided.

 

Definition: Consider two (uncertain) state variables x and y. y does not influence x if x is independent of y given S, i.e., if Pr(x|y,S) = Pr(x|S).

Figure 3: Two Node Influence Diagram

6.4 Chain Rule:

Figure 4: Venn Diagram of Three Events to Illustrate the Chain Rule

The concept of conditional probability can be expanded to accommodate any number of events. For example, with three events A,B, and C:

Pr(ABC|S) = Pr(AB|C,S)Pr(C|S) = Pr(A|BC,S)Pr(B|C,S)Pr(C|S) [Equation 1]

Or Pr(ABC|S) = Pr(BC|A,S)Pr(A|S) = Pr(B|CA,S)Pr(C|A,S)Pr(A|S) [Equation 2]

In fact, there are six different combinations possible (as shown in the next chapter).

Figure 5: Three Influence Diagram Expansions for Pr(ABC|S)

Expansions of Pr(ABC|S) in equations [1,2] are graphically shown in the influence diagram in Figures 5a&b. In fact, there are six different combinations possible if conditional independence is not considered (as shown in the Influence Diagram chapter). If conditional independence exists, any of the three arcs may be deleted. For example, in Figure 5c the joint expansion of Pr(ABC|S) is as described in equation [3] below where B is independent of C:

Pr(ABC|S) = Pr(AB|C,S) Pr(C|S) = Pr(A|BC,S) Pr(B|S)Pr(C|S) [Equation 3]<

7 Probability Trees

A probability tree is a succession of circular nodes (uncertain state variables) with branches. The branches emanating from each node represent the different possible values of the uncertain variables associated with the node. As shown in the probability tree in Figure 6, in representing equation [1], the first assignment in the probability tree is the probability node for the uncertain event C, because it is the only one whose unconditional probability is known. In this case, event C has two outcomes, C1 and C2, which are called the sample space of C,
WC = (C1 , C2). Next the conditional probability of B given C is added for each possible outcome of C. The sample space of event B is WB = (B1 , B2). Finally, the conditional probability of A given B and C is added to the ends of each branch involving A and B. Here event A has three outcomes and the sample space is WA = (A1 , A2, A3). The unconditional joint probability distribution is obtained by multiplying the values along the branches of the tree. The corresponding probability tree representation for equation [2] can be obtained by reversing the order of the events.

Although a probability tree and an influence diagram may correspond to the same joint probability distribution, the influence diagram is much more compact and the conditional independence between variables is explicit in the graphical representation. For example, the probability tree in Figure 6 also corresponds to the expansion in equation [3] and the influence diagram in Figure 5c (in addition to equation [1] and Figure 5a), because the tree does not explicitly differentiate between levels of conditional independence.

Figure 6: Probability Tree - Pr(ABC|S) = Pr(A|BC,S)Pr(B|C,S)Pr(C|S)

 

8 Bayes' Theorem

Bayes' theorem is the single most important relation in probabilistic expert systems. It forms the basis for probabilistic inference. Consider N mutually exclusive and collectively exhaustive events: A1, A2, . . . An

Ai Aj = ø, i­j Œ [1,N] (mutually exclusive)

A1 + A2 + . . . + An = I (collectively exhaustive)

Consider another event B that may overlap several events Ai, i=1,2 . . 6.

Figure 7

If we know all the probabilities Pr(Ai|S) and Pr(B|Ai,S) (i.e., we know all of the corresponding area relationships in the Venn diagram above), then we know

Pr(AiB|S) = Pr(Ai|S)Pr(B|Ai,S)

with a corresponding influence diagram of:

Figure 8: Pr(AB|S) = Pr(B|A,S) Pr (A|S)

 

Suppose we pose the question: what is Pr(Ai|B,S)?

In medical diagnostics, the mutually exclusive and collectively exhaustive events Ai could represent various distinct diseases. The event B might represent a symptom that, by itself, could occur in many of the diseases Ai. The probability Pr(B|Ai,S) is then the probability that symptom B will occur if disease Ai is present. The desired information in medical diagnostic expert systems is: what is the probability of disease Ai given that symptom B is present ? (i.e., what is Pr(Ai|B,S)?) In mechanical diagnostics, the symptoms might be evidence of a failure or of equipment malfunctioning. This may be in the form of sensor readings or human observation. Ai might be the set of mutually exclusive but collectively exhaustive possible causes of the failure.

How do we find the inverse relation Pr(Ai|B,S)? Let us find this relation from the basic rules of probability given in Section 6.

From the conditional probability in Section 6.2:

Pr(Ai, B|S)
Pr(Ai|B, S) = æææææ
Pr(B|S)


Pr(B| Ai, S)Pr(Ai|S)
= æææææææ [equation 3]
Pr(B|S)

Because the events Ai together are collectively exhaustive:

I = A1 + A2 + . . . An

Thus in event algebra,

B = IB = ( A1 + A2 + . . . + An) B

= A1B + A2B + . . . + AnB

Because the events Ai are mutually exclusive: (Ai, B)(Ai, B) = ø for i unequal to j, the probability of B is the sum of the probabilities of all combinations of Ai and B:

Pr(B|S) = Pr(A1, B|S) + Pr(A2, B|S) + . . . + Pr(An, B|S)

N
= S Pr(Aj, B|S)
j=1

N
= S Pr(B|Aj,S) Pr(Aj|S)[Equation 4]
j=1

Substituting equation [4] into [3] gives Bayes' Theorem:

Pr(B|Ai,S)Pr(Ai|S)
Pr(Ai, B |S) = ææææææææææ [Equation 5]
N
S Pr(B|Aj,S) Pr(Aj|S)
j=1

9 Example

In diagnostic expert systems, Bayes' theorem provides the mechanism to infer the probability of a failure or sequence of failures in view of the specific evidence and from the a priori probabilities of the failures and conditional probabilities relating the observations to the possible failures. For example, suppose the sequence of failures Fi is one of N mutually exclusive failure diagnoses possible for the failure of a butterfly valve to open, D is the data vector of specific evidence and sensor readings available, and H (for history) is the common state of knowledge. Then if Pr(Fi|S) is the a priori probability of the ith failure sequence given the present state of knowledge, the posterior probability of a failure given the observed data may be expanded as follows (using a change of variable from equations [3] and [5]):

Pr(D, Fi | S)
Pr(Fi|D,S) = ææææææææææ [Equation 6]
Pr(D | S)
Or

Pr(D|Fi,S)Pr(Fi|S)
Pr(Fi|D,S) = ææææææææææ [Equation 7]
N
S Pr(D | Fj,S)Pr(Fj | S)
j=1
Suppose that D = event that the valve is not open as evidenced by a human observers and the only possible causes of failure are: P =power supply is faulty and S = switch is faulty. A review of maintenance data gives the following frequency estimates of the probability of a failure in any particular day::

Pr(P) = 1/10

Pr(S) = 1/50

(Note: In order to simplify the nomenclature at this point, I will drop the conditioning term for the state of information "S" at this point. It should be implied in all probability representations following, however.)

An expert who is familiar with these valves gives you the following rules:

  1. If the power supply or switch is faulty, the valve will not open. Thus Pr(D|P) = Pr(D|S) = 1.

  2. The only events that could cause the valve to fail are faults in either the power or in the switch. Thus Pr(D | P',S') = 0.

  3. If the power supply is faulty, it may cause a power surge that could disable the switch. The expert gives the following subjective estimate of this event (switch fails given the power fails) to be 40% . Thus Pr(S | P) = 0.4.

The influence diagram in Figure 9 shows how the probabilistic information is represented by the expert from the statements above. Note that state variable "f" is independent of the valve data "d" given that information on the states of "s" and "p" are known.

Figure 9: Influence Diagram of Valve Diagnostic Example (Sample Space of State Variables: s, p, d and f are Ws={S,S'}, Wp={P,P'}, Wd={D,D'}, and Wf={F,F'})

A Venn diagram conditioned on the fact D (that the valve has failed) is given in Figure 10. Note that the events P and S are collectively exhaustive but not mutually exclusive.

Figure 10: P and S are Collectively Exhaustive but not Mutually Exclusive

The first step is to separate the events into mutually exclusive events, Fi as shown in Figure 11, by defining the following:

F1 = SP' = The switch fails but the power does not

F2 = S'P = The power fails but the switch does not

F3 = SP = Both the power and the switch fail

Figure 11: Fi are both Mutually Exclusive and Collectively Exhaustive

The probability of each failure event can be derived from the maintenance data and the expert's rules.

Pr(F1) = Pr(SP') = Pr(S | P') Pr(P')

In order to find Pr(S | P') we must use some of the laws of probability described in this chapter.

Pr(S) = Pr(SP) + Pr(SP') = 1/50

= Pr(S | P) Pr(P) + Pr(S | P') Pr(P') = 1/50

= (0.4) (1/100) + Pr(S | P') (1 - 1/100) = 1/50

Solving for Pr(S | P') gives

Pr(S | P') = (1/50 - 4/1000) (100/99) = 16/990 = 0.01616

Pr(F1) = Pr(SP' | S) = Pr(S | P',S) Pr(P' | S) =(16/990) (99/100) = 16/1000 = 0.016

Pr(F2) = Pr(S'P | S) = Pr(S' | P,S) Pr(P | S) = (1 - Pr(S |P,S)) Pr(P | S) = (1 - 0.4) (1/100) = 6/1000

Pr(F3) = Pr(SP | S) = Pr(S | P,S) Pr(P | S) = (0.4) (1/100) = 4/1000

The probability of the valve failing to open on any particular day is:

N=3

Pr(D) = S Pr(D|Fj) Pr(Fj) = 0.016 + 0.006 + 0.004 = 0.026

j=1

From Bayes Rule, we can find the likelihood of each failure cause given that the valve does not open.

Pr(F1 | D) = Pr(D | F1) Pr(F1) / Pr(D) = (1) (.016)/(0.026) = 0.6154

Pr(F2 | D) = Pr(D | F2) Pr(F2) / Pr(D) = (1) (.006)/(0.026) = 0.2308

Pr(F3 | D) = Pr(D | F3) Pr(F3) / Pr(D) = (1) (.004)/(0.026) = 0.1538

Question: Why is the sum of Fi (i=1,2,3) equal to one? We are really interested in the likelihood that the power supply or the switch is the cause of the failure or in terms of probability we want Pr(P | D) and Pr(S | D). How can we obtain this from Pr(Fi | D)?

Answer: Events Fi (i=1,2,3) are collectively exhaustive and mutually exclusive. Thus by Axioms 2 and 3, the sum of their probabilities is equal to one. From Figure 7, Event S is the union of mutually exclusive events F1 and F3 and P is the union of mutually exclusive events F2 and F3. Thus Pr(S | D) = Pr(F1 | D) + Pr(F3 | D) = 0.6154 + 0.1538 = 0.7692 and Pr(P | D) = Pr(F2 | D) + Pr(F3 | D) = 0.2308 + 0.1538 = 0.3846.

10 Exercises

Like most primitive concepts, probability can be best introduced by an example.

The most familiar example is the flip of a coin. You may have already formulated an idea about the chances of getting a heads or a tail. If I flip it, what will come up on top. How many say heads? How many say tails? There is no trick here. We have learned to expect a 50-50 chance. Until I actually flip several times, though, you might expect some kind of trick. Or I might have a coin that is biased one way or another. What if instead of flipping the coin, I spin it, does that change the odds.

Now consider something a little different: a thumbtack. What are the odds that the point will come up on top? What about the head? First write down what you think the respective probabilities are. Then do some experiments. Are you willing to update your estimates?

(2) The Bayesian Pair of Bolts

Suppose that a "pick and place" robot with two manipulators has been designed to pick two bolts of the same size from a bin and place each pair in separate containers on a belt for assembly. Suppose also that there are three different kinds of pairs of bolts that are used in the assembly of this product:

Pair 1 two bolts both made of alloy A

Pair 2 two bolts both made of alloy B

Pair 3 one of alloy A and the other of alloy B

The alloys look identical and the only way to determine whether one is A or B requires performing a test. Suppose that one of the pairs of bolts falls on the ground and the operator picks it up. The probability of getting any specific pair is 1/3. Now suppose she randomly picks one of the bolts from the pair, performs a test on it and determines that it is made of alloy A. What is the probability that the other bolt is of alloy A?

Hint: A common erroneous answer to this question is 1/2. But this is incorrect! It is important to represent the events properly.

Define the following events:

P1: The event that pair 1 was picked.

P2: The event that pair 2 was picked.

P3: The event that pair 3 was picked.

A1: The event that the first member of the pair is of alloy A.

A2: The event that the second member of the pair is of alloy A.

B1: The event that the first member of the pair is of alloy B.

B2: The event that the second member of the pair is of alloy B.

What we want to know is Pr(A2|A1,S). We know that Pr(P1|S) = Pr(P2|S) = Pr(P3|S) = 1/3. Using Bayes' theorem and conditional probability find Pr(A2|A1,S).

(3) The Curious Competitor

Of three finalists in a design competition, named A, B, and C, one will be announced the official winner the next day. The panel has met and the Judge has made his final decision. Competitor A assigns a probability of 1/3 to the event that any one of the three competitors will win. Being curious, Competitor A tries to get some information out of the Judge without asking directly. He knows that regardless of who wins, at least one of the other two competitors will be losers (both if A is the winner). He asks the Judge to give him the name of one of the losers. The Judge, however, refuses to grant the request, arguing that if A knew which of the other two was a loser, the probability that A would be the winner would increase to 1/2 instead of 1/3. Do you agree with the Judge's argument? If so use conditional probability to verify his statement. If not use the axioms and theorems of probability and event algebra to find the probability that A is the winner given that the Judge names B or C as one of the losers. Assume that if both B and C are losers, the Judge would be equally likely to pick either one.

Hint: In problems of this kind, it is important to define precisely the events. It may be helpful to define the following events:

A = Competitor A wins

B = Competitor B wins

C = Competitor C wins

Ja = Judge says B loses

Jb = Judge says C loses

 

11. References:

Hacking, I., The Emergence of Probability, Cambridge University Press, 1975.

Heckerman, D., "Probabilistic Interpretations for MYCIN's Certainty Factors," Uncertainty in Artificial Intelligence, Kanal, L. and Lemmer, J., Eds., North Holland, 1986.

Howard, R.A., "Probability", Chapter 38 in Mathematics Associated with Systems Engineering, pp. 3-47, Cambridge, MA: MIT Press.

Raiffa, H., Decision Analysis, Menlo Park, CA: Addison-Wesley Publishing Co., 1970 (Chaps. 1 &2).

Savage, L.J., The Foundations of Statistics, New York: Wiley, 1954.

Shortliffe, E.H. and B.G. Buchanan, "A model of Inexact Reasoning in Medicine," Mathematical Biosciences, No. 23, 1975, pp. 351-379.

Siddal, J. N., Probabilistic Engineering Design, New York: Marcel Dekker, Inc, 1983, Chapter 2: "Concepts and Theorems of Probability", pp-5-40.


[ Home | Info | Syllabus | Readings | Students | Homework | Resources | News | Chat]

Last updated: 8 March 99
Send Comments to: Alice Agogino, aagogino@me.berkeley.edu
Copyright © 1999 Alice Agogino; All Rights Reserved.