![]() | ME290M
Spring 1999, T-Th 12:30-2:00 pm
|
| [ Home | Info | Syllabus | Readings | Students | Homework | Resources | News | Chat] |
The ancient Mesoamerican cultures (Mayas, Toltecs, and Aztecs) were also into gaming. They would risk their lives, their families, and all their possessions in games of chance. However, there is no evidence that the Mesoamericans had a concept of probability or if they did there is no evidence that they used it to their advantage. They believed in extreme fatalism. Although there were many events that appeared "random", they believed that the gods controlled everything. Thus whether you won at a game or not, depended less on your skill in calculating the odds and acting accordingly, than on the whims of the gods. A belief in extreme fatalism and causality precludes the development of the intellectual concepts of randomness and probability.
Hacking claims that probability mathematics is of Arabic origin. In fact, the word for chance, namely "hazard", is an Arabic word. The Indians, who were traders with the Arabic cultures, provide the first accounts of concepts of uncertainty and prediction in their written folklore. Consider the 4th century Indian story of Nala:
The god, Kali, had taken an interest in a young princess who is promised to marry Nala. In an act of jealousy, Kali is said to take possession of the body and soul of Nala. This is revealed to the public when Nala in a sudden frenzy of gambling loses his kingdom and wanders demented for many years.
Then on the advice of a snake-king that he meets in the forest, Nala takes a job as a charioteer to the foreign potentate Ptuparna. On a journey the king flaunts his mathematical skills by estimating the number of leaves and fruit on two great branches of a large tree.
Ptuparna is reported to have done this on the basis of a single twig that he examines. He estimates that the tree has 2095 fruit. Nala counts all night and is amazed by the accuracy of the king's guess. The king is proud of his skill and responds with
I of dice possess the science
and in numbers are thus skilled
The king agrees to teach Nala this science of gaming in exchange for some lessons in horsemanship. At the end of his training, Nala vomits out all of Kali's poisons, returns to his ever-faithful bride, and regains his kingdom in a single but fierce gambling game.
The story of Nala provides strong evidence that 4th century India recognized probability (in terms of dicing or gambling) as a science. It also demonstrates that the Indians recognized the important connection between "dicing" and estimating the number of leaves on a tree.
Even though the story of Nala was circulated in Europe in Sanskrit for many centuries, the Europeans did not make the critical connection between probability and estimation until the 17th century. Like the Aztecs, many point to the European belief in strict causality and determinism, as the cause for this myopia.
Formal probability theory, didn't develop until late into the 17th century with the mathematical and philosophical contributions of Pascal, Huygen, and Bernoulli.
What then is probability? There are at least two points of view on the subject, sometimes called the frequentists and the Bayesian positions. The frequentist point of view is the one often presented in courses on statistics. A probability is simply the frequency of events given a large representative sample size. IEOR Professor Richard Barlow, has a license plate that says "probability does not exist". He represents a Bayesian viewpoint. In this view, a probability is not something physical; you can not directly measure a probability. In the words of the Italian mathematician de Finetti: "probability is a state of mind, not a state of matter".
Engineering-Economic Systems Professor Ron Howard of Stanford University proposes the following definition:
Definition: A probability is a measure assigned to an individual's subjective assessment of the likelihood of an event based on his state of information. The probability measure (1) depends on the state of information, (2) may change with new information, (3) may vary among individuals, and (4) corresponds to the areas on a Venn diagram.
(1) For any event A, the probability measure is a real non-negative number such that Pr(A|S) >0.
(2) Pr(I|S) = 1, where I = the universe of all events.
(3) If AB=ø (i.e., if A and B are mutually exclusive) then Pr(A+B|S) = Pr(A|S) + Pr(B|S).
Consider what the axioms imply in terms of Venn diagrams of events.

Pr(I|S) = 1
Pr(A|S), Pr(B|S) > 0
Pr(A+B|S) = Pr(A|S) + Pr(B|S) if A and B are mutually exclusive
Figure 1: Venn Diagram of the Probability Axioms
The first axiom restricts the area in any region to be non-negative. The second axiom requires that the area of the entire diagram, corresponding to the universe of all events "I", be normalized to one. The third axiom implies that the area contained in two nonoverlapping regions be the sum of their individual areas.
(1) Pr(A'|S) = 1 - Pr(A|S)
(2) Pr(ø|S) = 0
(3) Pr(A + B|S) = Pr(A|S) + Pr(B|S) - Pr(AB|S)
| Pr(AB|S) | |
| Pr(B|A, S) = | æææææ |
| Pr(A|S) |
| Pr(AB|S) | |
| Pr(A|B, S) = | æææææ |
| Pr(B|S) |

Figure 2: Conditional Probability
Pr(AB|S) = Pr(A|S) Pr(B|S) implies independence
In other words, A is probabilistically independent of B if having knowledge about B gives you no new information about A and visa versa. Thus
| Pr(AB|S) | Pr(A|S)Pr(B|S) | ||
| Pr(B|A, S) = | æææææ | = ææææææææ | = Pr(B|S) |
| Pr(A|S) | Pr(A|S) |
Influence Diagrams: Conditional probabilistic influence and independence can be explicitly represented as expansions of joint probability distributions via the use of influence diagrams. The strong piece of information is the lack of an influence or the identification of (conditional) independence. We must assume that influences may be present if no information indicating independence is provided.
Definition: Consider two (uncertain) state variables x and y. y does not influence x if x is independent of y given S, i.e., if Pr(x|y,S) = Pr(x|S).

Figure 3: Two Node Influence Diagram

Figure 4: Venn Diagram of Three Events to Illustrate the Chain Rule
The concept of conditional probability can be expanded to accommodate any number of events. For example, with three events A,B, and C:Pr(ABC|S) = Pr(AB|C,S)Pr(C|S) = Pr(A|BC,S)Pr(B|C,S)Pr(C|S) [Equation 1]
Or Pr(ABC|S) = Pr(BC|A,S)Pr(A|S) = Pr(B|CA,S)Pr(C|A,S)Pr(A|S) [Equation 2]
In fact, there are six different combinations possible (as shown in the next chapter).

Figure 5: Three Influence Diagram Expansions for Pr(ABC|S)
Expansions of Pr(ABC|S) in equations [1,2] are graphically shown in the influence diagram in Figures 5a&b. In fact, there are six different combinations possible if conditional independence is not considered (as shown in the Influence Diagram chapter). If conditional independence exists, any of the three arcs may be deleted. For example, in Figure 5c the joint expansion of Pr(ABC|S) is as described in equation [3] below where B is independent of C:
Pr(ABC|S) = Pr(AB|C,S) Pr(C|S) = Pr(A|BC,S) Pr(B|S)Pr(C|S) [Equation 3]<
Although a probability tree and an influence diagram may correspond to the same joint probability distribution, the influence diagram is much more compact and the conditional independence between variables is explicit in the graphical representation. For example, the probability tree in Figure 6 also corresponds to the expansion in equation [3] and the influence diagram in Figure 5c (in addition to equation [1] and Figure 5a), because the tree does not explicitly differentiate between levels of conditional independence.

Figure 6: Probability Tree - Pr(ABC|S) = Pr(A|BC,S)Pr(B|C,S)Pr(C|S)
Ai Aj = ø, ij
[1,N] (mutually exclusive)A1 + A2 + . . . + An = I (collectively exhaustive)
Consider another event B that may overlap several events Ai, i=1,2 . . 6.

Figure 7
If we know all the probabilities Pr(Ai|S) and Pr(B|Ai,S) (i.e., we know all of the corresponding area relationships in the Venn diagram above), then we know
Pr(AiB|S) = Pr(Ai|S)Pr(B|Ai,S)
with a corresponding influence diagram of:
![]()
Figure 8: Pr(AB|S) = Pr(B|A,S) Pr (A|S)
Suppose we pose the question: what is Pr(Ai|B,S)?
In medical diagnostics, the mutually exclusive and collectively exhaustive events Ai could represent various distinct diseases. The event B might represent a symptom that, by itself, could occur in many of the diseases Ai. The probability Pr(B|Ai,S) is then the probability that symptom B will occur if disease Ai is present. The desired information in medical diagnostic expert systems is: what is the probability of disease Ai given that symptom B is present ? (i.e., what is Pr(Ai|B,S)?) In mechanical diagnostics, the symptoms might be evidence of a failure or of equipment malfunctioning. This may be in the form of sensor readings or human observation. Ai might be the set of mutually exclusive but collectively exhaustive possible causes of the failure.
How do we find the inverse relation Pr(Ai|B,S)? Let us find this relation from the basic rules of probability given in Section 6.
From the conditional probability in Section 6.2:
| Pr(Ai, B|S) | |||
| Pr(Ai|B, S) = | æææææ | ||
| Pr(B|S) | |||
| Pr(B| Ai, S)Pr(Ai|S) | |||
| = | æææææææ | [equation 3] | |
| Pr(B|S) |
Because the events Ai together are collectively exhaustive:
I = A1 + A2 + . . . An
Thus in event algebra,
B = IB = ( A1 + A2 + . . . + An) B
= A1B + A2B + . . . + AnB
Because the events Ai are mutually exclusive: (Ai, B)(Ai, B) = ø for i unequal to j, the probability of B is the sum of the probabilities of all combinations of Ai and B:
| Pr(B|S) = | Pr(A1, B|S) + Pr(A2, B|S) + . . . + Pr(An, B|S) | |||
| N | ||||
| = | S Pr(Aj, B|S) | |||
| j=1 | ||||
| N | ||||
| = | S Pr(B|Aj,S) Pr(Aj|S) | [Equation 4] | ||
| j=1 |
Substituting equation [4] into [3] gives Bayes' Theorem:
| Pr(B|Ai,S)Pr(Ai|S) | |||
| Pr(Ai, B |S) = | ææææææææææ | [Equation 5] | |
| N | |||
| S Pr(B|Aj,S) Pr(Aj|S) | |||
| j=1 |
| Pr(D, Fi | S) | |||
| Pr(Fi|D,S) = | ææææææææææ | [Equation 6] | Pr(D | S) |
| Or | |||
| Pr(D|Fi,S)Pr(Fi|S) | |||
| Pr(Fi|D,S) = | ææææææææææ | [Equation 7] | N |
| S Pr(D | Fj,S)Pr(Fj | S) | |||
| j=1 |
Pr(P) = 1/10
Pr(S) = 1/50
(Note: In order to simplify the nomenclature at this point, I will drop the conditioning term for the state of information "S" at this point. It should be implied in all probability representations following, however.)
An expert who is familiar with these valves gives you the following rules:
The influence diagram in Figure 9 shows how the probabilistic information is represented by the expert from the statements above. Note that state variable "f" is independent of the valve data "d" given that information on the states of "s" and "p" are known.

Figure 9: Influence Diagram of Valve Diagnostic Example (Sample Space of State Variables: s, p, d and f are
Ws={S,S'}, Wp={P,P'}, Wd={D,D'}, and Wf={F,F'})A Venn diagram conditioned on the fact D (that the valve has failed) is given in Figure 10. Note that the events P and S are collectively exhaustive but not mutually exclusive.

Figure 10: P and S are Collectively Exhaustive but not Mutually Exclusive
The first step is to separate the events into mutually exclusive events, Fi as shown in Figure 11, by defining the following:
F1 = SP' = The switch fails but the power does not
F2 = S'P = The power fails but the switch does not
F3 = SP = Both the power and the switch fail

Figure 11: Fi are both Mutually Exclusive and Collectively Exhaustive
The probability of each failure event can be derived from the maintenance data and the expert's rules.
Pr(F1) = Pr(SP') = Pr(S | P') Pr(P')
In order to find Pr(S | P') we must use some of the laws of probability described in this chapter.
Pr(S) = Pr(SP) + Pr(SP') = 1/50
= Pr(S | P) Pr(P) + Pr(S | P') Pr(P') = 1/50
= (0.4) (1/100) + Pr(S | P') (1 - 1/100) = 1/50
Solving for Pr(S | P') gives
Pr(S | P') = (1/50 - 4/1000) (100/99) = 16/990 = 0.01616
Pr(F1) = Pr(SP' | S) = Pr(S | P',S) Pr(P' | S) =(16/990) (99/100) = 16/1000 = 0.016
Pr(F2) = Pr(S'P | S) = Pr(S' | P,S) Pr(P | S) = (1 - Pr(S |P,S)) Pr(P | S) = (1 - 0.4) (1/100) = 6/1000
Pr(F3) = Pr(SP | S) = Pr(S | P,S) Pr(P | S) = (0.4) (1/100) = 4/1000
The probability of the valve failing to open on any particular day is:
N=3
Pr(D) =
S Pr(D|Fj) Pr(Fj) = 0.016 + 0.006 + 0.004 = 0.026j=1
From Bayes Rule, we can find the likelihood of each failure cause given that the valve does not open.
Pr(F1 | D) = Pr(D | F1) Pr(F1) / Pr(D) = (1) (.016)/(0.026) = 0.6154
Pr(F2 | D) = Pr(D | F2) Pr(F2) / Pr(D) = (1) (.006)/(0.026) = 0.2308
Pr(F3 | D) = Pr(D | F3) Pr(F3) / Pr(D) = (1) (.004)/(0.026) = 0.1538
Question: Why is the sum of Fi (i=1,2,3) equal to one? We are really interested in the likelihood that the power supply or the switch is the cause of the failure or in terms of probability we want Pr(P | D) and Pr(S | D). How can we obtain this from Pr(Fi | D)?
Answer: Events Fi (i=1,2,3) are collectively exhaustive and mutually exclusive. Thus by Axioms 2 and 3, the sum of their probabilities is equal to one. From Figure 7, Event S is the union of mutually exclusive events F1 and F3 and P is the union of mutually exclusive events F2 and F3. Thus Pr(S | D) = Pr(F1 | D) + Pr(F3 | D) = 0.6154 + 0.1538 = 0.7692 and Pr(P | D) = Pr(F2 | D) + Pr(F3 | D) = 0.2308 + 0.1538 = 0.3846.
The most familiar example is the flip of a coin. You may have already formulated an idea about the chances of getting a heads or a tail. If I flip it, what will come up on top. How many say heads? How many say tails? There is no trick here. We have learned to expect a 50-50 chance. Until I actually flip several times, though, you might expect some kind of trick. Or I might have a coin that is biased one way or another. What if instead of flipping the coin, I spin it, does that change the odds.
Now consider something a little different: a thumbtack. What are the odds that the point will come up on top? What about the head? First write down what you think the respective probabilities are. Then do some experiments. Are you willing to update your estimates?
(2) The Bayesian Pair of Bolts
Suppose that a "pick and place" robot with two manipulators has been designed to pick two bolts of the same size from a bin and place each pair in separate containers on a belt for assembly. Suppose also that there are three different kinds of pairs of bolts that are used in the assembly of this product:
Pair 1 two bolts both made of alloy A
Pair 2 two bolts both made of alloy B
Pair 3 one of alloy A and the other of alloy B
The alloys look identical and the only way to determine whether one is A or B requires performing a test. Suppose that one of the pairs of bolts falls on the ground and the operator picks it up. The probability of getting any specific pair is 1/3. Now suppose she randomly picks one of the bolts from the pair, performs a test on it and determines that it is made of alloy A. What is the probability that the other bolt is of alloy A?
Hint: A common erroneous answer to this question is 1/2. But this is incorrect! It is important to represent the events properly.
Define the following events:
P1: The event that pair 1 was picked.
P2: The event that pair 2 was picked.
P3: The event that pair 3 was picked.
A1: The event that the first member of the pair is of alloy A.
A2: The event that the second member of the pair is of alloy A.
B1: The event that the first member of the pair is of alloy B.
B2: The event that the second member of the pair is of alloy B.
What we want to know is Pr(A2|A1,S). We know that Pr(P1|S) = Pr(P2|S) = Pr(P3|S) = 1/3. Using Bayes' theorem and conditional probability find Pr(A2|A1,S).
(3) The Curious Competitor
Of three finalists in a design competition, named A, B, and C, one will be announced the official winner the next day. The panel has met and the Judge has made his final decision. Competitor A assigns a probability of 1/3 to the event that any one of the three competitors will win. Being curious, Competitor A tries to get some information out of the Judge without asking directly. He knows that regardless of who wins, at least one of the other two competitors will be losers (both if A is the winner). He asks the Judge to give him the name of one of the losers. The Judge, however, refuses to grant the request, arguing that if A knew which of the other two was a loser, the probability that A would be the winner would increase to 1/2 instead of 1/3. Do you agree with the Judge's argument? If so use conditional probability to verify his statement. If not use the axioms and theorems of probability and event algebra to find the probability that A is the winner given that the Judge names B or C as one of the losers. Assume that if both B and C are losers, the Judge would be equally likely to pick either one.
Hint: In problems of this kind, it is important to define precisely the events. It may be helpful to define the following events:
A = Competitor A wins
B = Competitor B wins
C = Competitor C wins
Ja = Judge says B loses
Jb = Judge says C loses
Heckerman, D., "Probabilistic Interpretations for MYCIN's Certainty Factors," Uncertainty in Artificial Intelligence, Kanal, L. and Lemmer, J., Eds., North Holland, 1986.
Howard, R.A., "Probability", Chapter 38 in Mathematics Associated with Systems Engineering, pp. 3-47, Cambridge, MA: MIT Press.
Raiffa, H., Decision Analysis, Menlo Park, CA: Addison-Wesley Publishing Co., 1970 (Chaps. 1 &2).
Savage, L.J., The Foundations of Statistics, New York: Wiley, 1954.
Shortliffe, E.H. and B.G. Buchanan, "A model of Inexact Reasoning in Medicine," Mathematical Biosciences, No. 23, 1975, pp. 351-379.
Siddal, J. N., Probabilistic Engineering Design, New York: Marcel Dekker, Inc, 1983, Chapter 2: "Concepts and Theorems of Probability", pp-5-40.
| [ Home | Info | Syllabus | Readings | Students | Homework | Resources | News | Chat] |
Last updated: 8 March 99