ME290M, Spring 1999

ME290M
Expert Systems in Mechanical Engineering

Spring 1999, T-Th 12:30-2:00 pm
1165 Etcheverry Hall, Course Control No. 56369
http://best.me.berkeley.edu/~aagogino/me290m/s99




Discrete Random Variables

In order to automate reasoning under uncertainty in expert systems on digital computers, the axiomatic theory of event algebra and probability theory presented in the previous chapters must be expressed in discrete mathematics. The concepts of outcome sample spaces and discrete probability mass functions will be presented and related to event algebra, influence diagrams, probability trees, and conditional probability. Expectation and higher order moments of probability mass functions will also be introduced.

1 Outcome Sample Spaces

It is useful in performing probability manipulations to define the random events "Ei" as unions of elementary events that are mutually exclusive and collectively exhaustive. Outcome sample spaces are analogous to Venn diagrams in which the universe, I, is made up of these elementary events. Because the elementary events are mutually exclusive, the probability of any event Ei is the sum of the probabilities of the elementary events that comprise Ei.

2 Example of the Use of Sample Space Representation

Assume that we are to roll two "fair" talus bones; one is the heel bone of a sheep, the other is the heel bone of a deer. As you may recall from Chapter 10, talus bones may have been one of the first "dice" used by humankind. Each one has four sides, and if it is "fair", each side is equally likely to come up on a roll.

Let us further assume that the sides of each talus bone are marked with one, two, three, and four dots, respectively (similar to a conventional die but with four sides instead of six).

Let us try an experiment in which both the sheep and the deer talus bones are thrown randomly on the ground. Let us define the following events:

Si = the number of dots on the up side of the sheep talus bone.

Dj = the number of dots on the up side of the deer talus bone.

If we assume that the throw of the sheep bone is independent of the throw of the deer bone, then Pr(SiDj ) = Pr(Si )Pr(Dj ).

The sample space for the joint events can be represented by the grid in Figure 1. The joint events DiSj (i,j = 1, 2, 3, 4) represent the 16 possible outcomes of independent rolls of the two talus bones. Because we are rolling "fair" talus bones, every outcome is equally probable (the likelihood of getting any side is 1/4 for each bone). Thus the probability associated with each sample point "DiSj" in Figure 1 is

Pr(DiSj ) = Pr(Di ) • Pr(Sj ) = (1/4)(1/4) = 1/16

Figure 1: Sample Space for Roll of Two Talus Bones

Define the state "E" as the sum of the numbers of two rolls and state "F" as the product of the numbers of two rolls. Suppose we are interested in the following two possible outcomes of each state:

E4 = the event that the sum of the two numbers that are face up is four.

F<10 = the event that the product of the two numbers that are face up is strictly less than ten.

The sample space "E" for the sum of the dots on the two talus bones is shown in Figure 2. The elementary events in which the sum is four (event E4) are filled in with dots.

Because the events are mutually exclusive, the probability of event E4 (that the sum of the number of dots facing up is four) is 1/16 times the number of ways a sum of four can occur. As shown in Figure 2, there are three events that sum to four and thus:

Pr(E4) = 3(1/16) = 3/16

Figure 2: Sample Space of "E", the Sum of Dots

The sample space for the product of the number of dots on the two talus bones is shown in Figure 3. The elementary events in which the product is strictly less than ten (event F<10) are filled in with dots.

Figure 3: Sample Space of Product of Number of Dots

Because the events are mutually exclusive, the probability of event F<10 (the event that the product of the number of dots facing up is strictly less than ten) is 1/16 times the number of ways a product less than ten can occur. As shown in Figure 3, there are thirteen events with a product less than ten and thus:

Pr( F<10 ) = 13(1/16) = 13/16
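The counting argument for the fair talus bones can be checked by brute-force enumeration of the 16 sample points. The Python sketch below is illustrative only; the variable names are not part of the original notes.

```python
from fractions import Fraction
from itertools import product

# The 16 equally likely sample points (d, s) for fair deer and sheep talus bones.
sides = [1, 2, 3, 4]
sample_space = list(product(sides, sides))
p_point = Fraction(1, len(sample_space))  # 1/16 per sample point

# Event E4: the sum of the dots is four.
pr_e4 = sum(p_point for d, s in sample_space if d + s == 4)

# Event F<10: the product of the dots is strictly less than ten.
pr_f_lt_10 = sum(p_point for d, s in sample_space if d * s < 10)

print(pr_e4)       # 3/16
print(pr_f_lt_10)  # 13/16
```

Exact fractions are used instead of floating point so that the results can be compared directly with the hand calculation.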

What if the talus bones were biased? Suppose the probability of getting four dots is higher than getting three and so on as defined by the probability distribution below:

$$\Pr(S_i) = \Pr(D_i) = \frac{2i - 1}{16}, \qquad i = 1, 2, 3, 4$$

The associated discrete probability distribution for each roll is the same for each bone and is given in Figure 4.

Because these events are mutually exclusive and collectively exhaustive, the probabilities of all events Si (or Di) sum to one. Consider the sheep talus:

$$\sum_{i=1}^{4} \Pr(S_i) = \frac{1 + 3 + 5 + 7}{16} = \frac{16}{16} = 1$$

Figure 4: Probability Mass Function on Si or Di

The joint probabilities will no longer be 1/16 for each point in the sample space, as with the "fair" talus bones, but will be as shown in Figure 5:

Figure 5: Joint Probabilities, Pr(Si, Dj) x 256
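Under the independence assumption, each entry of the joint grid is the product of the two biased marginal probabilities. The sketch below computes those entries scaled by 256, as the caption of Figure 5 describes; the row and column orientation of the printed grid is an assumption, since the figure itself is not reproduced here, and the names are illustrative.

```python
from fractions import Fraction

sides = [1, 2, 3, 4]

# Biased probability mass function from the text: Pr(side i) = (2i - 1)/16.
pmf = {i: Fraction(2 * i - 1, 16) for i in sides}

# Joint probabilities under independence: Pr(Si, Dj) = Pr(Si) * Pr(Dj).
joint = {(s, d): pmf[s] * pmf[d] for s in sides for d in sides}

# Print the grid scaled by 256, one row per sheep outcome (assumed layout).
for s in sides:
    print([int(joint[(s, d)] * 256) for d in sides])

# Probability that the sum of the dots is four with the biased bones.
pr_e4_biased = sum(p for (s, d), p in joint.items() if s + d == 4)
print(pr_e4_biased)  # 19/256
```

With the biased bones the sample points are no longer equally likely, so the probability of a sum of four is obtained by adding the individual joint probabilities rather than by counting points.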

3 Influence Diagram Representation

How would one represent events E4 and F<10 in influence diagram form? There are many possible and consistent representations. If you find it natural to think of the sum of the outcomes of the tosses as being influenced by the outcome of each individual toss, state E would be represented by the diagram in Figure 6 and expressed mathematically by the following expansion:

Pr(E ) = Pr(Sum(Di and Sj) ) =Pr( Sum(Di and Sj) | Di ,Sj ) • Pr( Di | Sj) • Pr(Sj ) [Equation 1]

Because Di and Sj are assumed to be independent, Equation 1 reduces to the following:

Pr(E) =Pr(Sum(Di and Sj)) =Pr( Sum(Di and Sj) | Di ,Sj ) • Pr( Di) • Pr(Sj ) [Equation 2]

Figure 6: Influence Diagram of Sum of Talus Bones Tosses
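As an illustration of how Equation 2 can be evaluated numerically, the sketch below sums its right-hand side over every combination of Di and Sj for the fair bones to obtain the full distribution of the sum state E. It assumes that Pr(Sum | Di, Sj) is deterministic (1 when the sum equals i + j, 0 otherwise); the names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

sides = [1, 2, 3, 4]
pr_d = {i: Fraction(1, 4) for i in sides}  # fair deer bone
pr_s = {j: Fraction(1, 4) for j in sides}  # fair sheep bone

# Pr(E = e | Di, Sj) is 1 when e == i + j and 0 otherwise, so summing
# Equation 2 over all (i, j) adds Pr(Di) * Pr(Sj) to the bin e = i + j.
pr_e = defaultdict(Fraction)
for i in sides:
    for j in sides:
        pr_e[i + j] += pr_d[i] * pr_s[j]

for e in sorted(pr_e):
    print(e, pr_e[e])
```

The entry for a sum of four reproduces Pr(E4) = 3/16 from Section 2.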

We could make an analogous argument for representing the product state "F", the probability distribution of the product of the outcome of two tosses. Event F<10 (the event that the product of the two numbers that are face up is strictly less than ten) would be one possible outcome of that state space.

Pr(F) = Pr(Product(Di and Sj) )

= Pr( Product(Di and Sj) | Di ,Sj) • Pr( Di | Sj) • Pr(Sj) [Equation 3]

Because Di and Sj are assumed to be independent, Equation 3 reduces to the following:

Pr(F ) = Pr(Product(Di and Sj) )

= Pr( Product(Di and Sj) | Di ,Sj) • Pr( Di )• Pr(Sj ) [Equation 4]

Figure 7: Influence Diagram of Product of Talus Bones Tosses

Note that there is no arc between states D and S in either Figure 6 or Figure 7, representing the assumption that the two events are independent.

How do we represent both events E and F in the same influence diagram? Both diagrams show Di and Sj as being independent, which would remain unchanged for the combined diagram. But should there be an arc between E and F? The question to be answered is: given that we know the outcome of Di and Sj, would knowing the product give us any new information about the sum, and vice versa? The answer is NO, and thus there is no arc between E and F. E and F are said to be conditionally independent of each other given Di and Sj. The lack of an arc in Figure 8 reveals this conditional independence graphically.

Figure 8: Influence Diagram of Sum and Product
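The missing arc states only conditional independence given Di and Sj. It does not say that E and F are independent when the individual tosses are unknown. A minimal sketch, using the fair bones and illustrative names, compares the joint probability of E4 and F<10 with the product of their separate probabilities:

```python
from fractions import Fraction
from itertools import product

sides = [1, 2, 3, 4]
points = list(product(sides, sides))  # (d, s) pairs, each with probability 1/16
p = Fraction(1, 16)

pr_e4 = sum(p for d, s in points if d + s == 4)
pr_f = sum(p for d, s in points if d * s < 10)
pr_e4_and_f = sum(p for d, s in points if d + s == 4 and d * s < 10)

print(pr_e4_and_f)    # 3/16
print(pr_e4 * pr_f)   # 39/256
```

The two values differ, so E4 and F<10 are dependent when D and S are not observed. Once D and S are known, both the sum and the product are fixed, which is why neither can add information about the other.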

4 Probability Tree Representation

The probability tree corresponding to the sample space in Figure 1 is given in Figure 9.

Figure 9: Probability Tree of Talus Toss

5 Expectation of a Random Variable

$$E(x \mid H) = \sum_{\forall x_i} x_i \Pr(x_i)$$

or

$$E(x \mid H) = \langle x \mid H \rangle$$

6 Probability Mass Function Pr(x)

The probability that the random variable x takes on the discrete value xi is defined by means of the probability mass function Pr(x) with the notation below:

Pr(x) = Pr(x=xi) for all xi

7 Cumulative Probability Distribution Pr(x ≤ y | y)

$$\Pr(x \le y \mid y) = \sum_{\forall x \le y} \Pr(x) = \Pr(x \le y \mid y = y_i) \quad \text{for all } y_i$$

8 Complementary Cumulative Distribution

$$\Pr(x > y \mid y) = 1 - \Pr(x \le y \mid y)$$

9 Examples

Examples of the probability mass function and the corresponding cumulative and complementary cumulative probability distributions for the "biased" talus bone are given in Figures 10 through 12.

 

Figure 10: Probability Mass Function

Figure 11: Cumulative Distribution

Figure 12: Complementary Cumulative Distribution
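The values plotted in these three figures can be recomputed from the biased distribution Pr(i) = (2i - 1)/16 defined in Section 2. The following sketch is one way to do so; it is not taken from the original notes and the names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

sides = [1, 2, 3, 4]

# Probability mass function of the biased talus bone.
pmf = [Fraction(2 * i - 1, 16) for i in sides]

# Cumulative distribution: running sum of the mass function, Pr(x <= y).
cdf = list(accumulate(pmf))

# Complementary cumulative distribution: Pr(x > y) = 1 - Pr(x <= y).
ccdf = [1 - c for c in cdf]

for i, p, c, cc in zip(sides, pmf, cdf, ccdf):
    print(i, p, c, cc)
```

The cumulative values are 1/16, 1/4, 9/16, and 1, and the complementary values are 15/16, 3/4, 7/16, and 0.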

10 Expectation of a Function of a Random Variable

Let g(x) be any single-valued function of the random variable x. The expected value of g(x) is designated as E[g(x)] and defined as:

$$E[g(x)] = \sum_{\forall x_i} g(x_i) \Pr(x = x_i) = \langle g(x) \rangle$$

11 Moments

Moments of probability distributions are a generalization of the concept of expected value. Moments can be used to characterize a probability mass function (discrete) or density function (continuous). The quantity $m_k$ is the kth moment about the origin.

The kth moment about the origin of the random variable x is defined as the expected value of $x^k$:

$$E(x^k) = \sum_{\forall x_i} x_i^k \Pr(x = x_i) = \langle x^k \rangle$$

11.1 First Moment: Mean

$$E(x) = \sum_{\forall x_i} x_i \Pr(x = x_i) = \langle x \rangle$$

11.2 Second Moment: Variance about the origin

$$E(x^2) = \sum_{\forall x_i} x_i^2 \Pr(x = x_i) = \langle x^2 \rangle$$

12 Kth Central Moment

Central moments of a random variable x are defined with respect to the mean $\langle x \rangle$. The kth moment about the mean of the random variable x is designated as $c_k$ and defined as the expected value of $(x - \langle x \rangle)^k$:

$$c_k = E\big((x - \langle x \rangle)^k\big) = \langle (x - \langle x \rangle)^k \rangle = \sum_{\forall x_i} (x_i - \langle x \rangle)^k \Pr(x = x_i)$$

12.1 First Central Moment: Mean about the Mean = 0

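$$\begin{aligned}
E(x - \langle x \rangle) &= \langle x - \langle x \rangle \rangle \\
&= \sum_{\forall x_i} (x_i - \langle x \rangle) \Pr(x = x_i) \\
&= \sum_{\forall x_i} x_i \Pr(x = x_i) - \langle x \rangle \sum_{\forall x_i} \Pr(x = x_i) \\
&= \langle x \rangle - \langle x \rangle = 0
\end{aligned}$$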

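12.2 Second Central Moment: Variance (about the mean is implied)

$$\begin{aligned}
E\big((x - \langle x \rangle)^2\big) &= \langle (x - \langle x \rangle)^2 \rangle \\
&= \sum_{\forall x_i} (x_i - \langle x \rangle)^2 \Pr(x = x_i) \\
&= \sum_{\forall x_i} x_i^2 \Pr(x = x_i) - 2 \langle x \rangle \sum_{\forall x_i} x_i \Pr(x = x_i) + \langle x \rangle^2 \sum_{\forall x_i} \Pr(x = x_i) \\
&= \langle x^2 \rangle - 2 \langle x \rangle^2 + \langle x \rangle^2 \\
&= \langle x^2 \rangle - \langle x \rangle^2 = \sigma^2 \quad \text{(variance)}
\end{aligned}$$

The standard deviation is the square root of the variance (about the mean is implied) and is designated as $\sigma$.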
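As a numerical check of the variance identity, the sketch below evaluates the mean, the second moment about the origin, and the variance of the biased talus bone from Section 2, computing the variance both from its definition and from the difference of the second moment and the squared mean. The helper name `moment` is illustrative and not part of the original notes.

```python
from fractions import Fraction

# Biased talus bone: Pr(x = i) = (2i - 1)/16 for i = 1..4.
pmf = {i: Fraction(2 * i - 1, 16) for i in [1, 2, 3, 4]}

def moment(k):
    """kth moment about the origin: sum of x**k * Pr(x) over all x."""
    return sum(x**k * p for x, p in pmf.items())

mean = moment(1)

# Variance from the definition (second central moment).
variance_central = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Variance from the shortcut: second moment minus squared mean.
variance_shortcut = moment(2) - mean**2

print(mean, moment(2), variance_central, variance_shortcut)
# 25/8 85/8 55/64 55/64
```

Both routes give 55/64, as the derivation requires.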

12.3 Third Moment about the Mean: Skewness

The third moment is related to the symmetry of the probability mass function and is sometimes referred to as the skewness of the probability mass function.

$$E\big((x - \langle x \rangle)^3\big) = \langle (x - \langle x \rangle)^3 \rangle = \sum_{\forall x_i} (x_i - \langle x \rangle)^3 \Pr(x = x_i)$$

The coefficient of skewness "a1" is a dimensionless parameter that is sometimes used to provide a relative measure of the third moment (skewness) with respect to the corresponding power of the second moment (variance):

$$a_1 = \frac{c_3}{\sigma^3}$$

where $c_3$ is the third central moment and $\sigma$ is the standard deviation.
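To make the coefficient of skewness concrete, the sketch below evaluates the third central moment and the coefficient for the biased talus bone from Section 2. The function name `central_moment` is illustrative; the coefficient is converted to floating point because the standard deviation is irrational.

```python
from fractions import Fraction
from math import sqrt

# Biased talus bone: Pr(x = i) = (2i - 1)/16 for i = 1..4.
pmf = {i: Fraction(2 * i - 1, 16) for i in [1, 2, 3, 4]}
mean = sum(x * p for x, p in pmf.items())

def central_moment(k):
    """kth moment about the mean."""
    return sum((x - mean) ** k * p for x, p in pmf.items())

c2 = central_moment(2)  # variance
c3 = central_moment(3)  # third central moment

# Coefficient of skewness: third central moment over the cube of the
# standard deviation.
a1 = float(c3) / sqrt(c2) ** 3

print(c3)  # -147/256
print(a1)  # approximately -0.72
```

The negative sign reflects that the distribution places most of its mass on the high outcomes, with a longer tail toward the low outcomes.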

13 Joint Probability Mass Functions: Pr(x,y)

Pr(x,y) = Pr(x=xi, y=yj ) for all values of xi and yj

It is difficult to graphically represent the joint probability mass function of two or more variables in two dimensions. Two possible graphical representations for two variables Di and Sj are given in Figures 13 and 14 for the four-dot talus toss problem presented in Section 2.

14 Marginal Mass Function: Pr(x)

The unconditional probability mass function of the random variable x is sometimes referred to as the marginal mass function when the variable x can occur jointly with other random variables. Suppose we are given the joint probability mass function for random variables x and y. The marginal mass function for x would be the sum of the joint mass function over all the possible values of y:

$$\Pr(x) = \sum_{\forall y_i} \Pr(x, y = y_i)$$

For example, if we sum the joint probability mass function Pr(DiSj) in Figure 13 over all possible values of Di we will get the marginal mass function for Sj.

Figure 13: Joint Probability Mass Function for Two Biased Talus Tosses

Figure 14: Alternate Representation for Joint Probability Mass Function for Two Biased Talus Tosses
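The marginalization over Di can be sketched in a few lines for the two biased talus bones. The joint distribution is built here from the independence assumption of Section 2; the names are illustrative.

```python
from fractions import Fraction

sides = [1, 2, 3, 4]

# Biased marginal distribution shared by both bones: (2i - 1)/16.
pmf = {i: Fraction(2 * i - 1, 16) for i in sides}

# Joint mass function Pr(Di, Sj), assuming independent tosses.
joint = {(d, s): pmf[d] * pmf[s] for d in sides for s in sides}

# Marginal for the sheep bone: sum the joint over every deer outcome d.
marginal_s = {s: sum(joint[(d, s)] for d in sides) for s in sides}

for s in sides:
    print(s, marginal_s[s])
```

Summing out the deer outcome recovers the original biased distribution 1/16, 3/16, 5/16, 7/16 for the sheep bone.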

15 Theoretical Discrete Probability Mass Functions

Although the emphasis in building expert systems is on the use of subjective assessments of uncertainty, sometimes enough statistical data are available to justify the use of standard statistical distributions. One example of a named distribution is given below.

Binomial Distribution

A binomial distribution has only two outcomes, sometimes referred to as "failure" and "success". A constant probability "p" is assumed for one of the outcomes throughout a sequence of trials. Each trial must be conditionally independent in the sense that, given p, the probability of one of the outcomes is independent of the previous history of outcomes in the sequence. The binomial distribution is used to estimate the total number of successes or failures out of a sequence of "n" trials.

Under these conditions, the probability of obtaining the outcome of interest a total of "r" times out of "n" trials, each of which has a probability "p" of producing that outcome, is as follows:

$$\Pr(r \mid n, p) = \frac{n!}{r!\,(n - r)!}\, p^r (1 - p)^{n - r}$$
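The formula can be evaluated directly with Python's standard library. In the sketch below the function name `binomial_pmf` and the example values (four rolls of a fair talus bone, counting how often the one-dot side comes up, so p = 1/4) are illustrative choices, not taken from the original notes.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(r, n, p):
    """Pr(r | n, p): probability of exactly r occurrences in n independent trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

# Example: four rolls of a fair talus bone, counting rolls that show one dot.
n = 4
p = Fraction(1, 4)
dist = [binomial_pmf(r, n, p) for r in range(n + 1)]

for r, pr in enumerate(dist):
    print(r, pr)
```

Because r can only take the values 0 through n, the probabilities over that range sum to one.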

16 Exercises

Consider an extension of the butterfly valve failure problem introduced in Week 8. Let us add another failure mode, "O", an obstruction in the pipe. Suppose that F = event that the valve is not open as evidenced by a human observer, and all of the possible causes of this failure are: P = power supply is faulty, S = switch is faulty, and O = obstruction in the pipe.

A review of the maintenance data gives the following frequency estimates of the probability of a failure in any particular day:

Pr(P ) = 1/100

Pr(S ) = 1/50

Pr(O ) = 1/20

An expert who is familiar with these valves gives you the following rules:

(1) If the power supply or switch is faulty, the valve will not open. This is true regardless of whether anything else has failed. Thus Pr(F |P) = Pr(F |P,O) = Pr(F| S) = Pr(F| S,O) =1.

(2) If there is an obstruction in the valve inlet or outlet pipe, the valve will not open. Thus Pr(F | O) =1.

(3) The only events that could cause the valve to fail are a fault in the power supply or switch or an obstruction in the pipe. Thus Pr(F | O'P'S') = 0.

(4) If the power supply is faulty, it may cause a power surge that could disable the switch. The expert's subjective estimate of the probability of this event (the switch fails given that the power fails) is 40%, and this estimate does not change whether or not an obstruction is known to have occurred. This implies that Pr(S | P) = Pr(S | P,O) = 0.4.

(5) A failure in the power supply has no bearing on whether there is an obstruction in the pipe and vice versa. This is a statement of independence between P and O. Thus Pr(P | O) = Pr(P) and Pr(O | P) = Pr(O ).

(6) A failure in the switch has no bearing on whether there is an obstruction in the pipe. However, our expert is not sure that the reverse is true. This is a statement of limited independence between S and O. Thus Pr(O | S) = Pr(O).

A Venn diagram is given in Figure 15. Note that the failure events P, S, and O are collectively exhaustive, but not mutually exclusive. The first step is to separate the events into N mutually exclusive events Ei.

Define the following collectively exhaustive and mutually exclusive failure events Ei:

E1 = SPO The switch and power both fail and there is an obstruction

E2 = SPO' The switch and power both fail, but there is no obstruction

E3 = SP'O The switch fails but the power does not and there is an obstruction

E4 = SP'O' The switch fails, but the power does not and there is no obstruction

E5 = S'PO The switch does not fail, but the power does and there is an obstruction

E6 = S'P'O The switch and power do not fail, but there is an obstruction

E7 = S'PO' The switch does not fail, but the power does and there is no obstruction

QUESTIONS:

(1) Draw an influence diagram with the state variables: "f", failure state; "s", switch; "p", power; and "o", obstruction.

(2) Find the probability of each of the possible mutually exclusive events listed above, using the theorems of probability.

(3) What is Pr(P | F), Pr(S | F), and Pr(O | F)?



Last updated: 8 March 99
Send Comments to: Alice Agogino, aagogino@me.berkeley.edu
Copyright © 1999 Alice Agogino; All Rights Reserved.