ME290M, Spring 1999, Week 1

ME290M
Expert Systems in Mechanical Engineering

Spring 1999, T-Th 12:30-2:00 pm
1165 Etcheverry Hall, Course Control No. 56369 http://best.me.berkeley.edu/~aagogino/me290m/s99


[ Home | Info | Syllabus | Readings | Students | Homework | Resources | News | Chat]


Introduction to Artificial Intelligence

 

1 Historical Perspective of Machine Intelligence

The nature of intelligence and its automation are controversial subjects. The sometimes conflicting points of view have led to heated debates in diverse circles of: computer scientists, mathematicians, neurobiologists, philosophers, educators, and engineers. I prefer to think of computers as tools of human invention, not all that different in kind from other human artifacts. Human cultural evolution is marked by the ingenious methods we have developed to extend the limitations of our biology. Automobiles and airplanes act as extensions of our legs. Telescopes and radar extend the limitations of our eyes. Ultrasound and acoustic emission sensors extend our hearing beyond the audible range. Our cultural and social organizations allow human civilization to attempt massive projects on a planetary level. We have also developed tools to extend our ability to process information. Oral traditions and the written word allowed us to transfer our knowledge to future generations. Many civilizations independently developed simple devices for computation. The Chinese developed the abacus. The Incas used knotted ropes for numerical computation. The Indians gave us the so-called Arabic numerals, a welcome improvement over Roman numerics for computation. The Renaissance mathematicians expanded the realms of geometry, algebra, and number theory, along with the need for faster and more accurate long calculations. The Renaissance was an age of machinery: intricate clockwork, pumps, and gadgetry. The European upper classes were amused by mechanical novelties, such as windup toothpick dispensers and singing birds. Philosophers in trying to explain the natural world, often used these technologies to explain human thought and questioned whether machines could be made to think. Prior to the twentieth century, several inventors and mathematicians made attempts to transform these ideas into a working calculating machine.

One of the first known inventors to take up the challenge was Wilhelm Schickard, a seventeenth century German professor of mathematics and astronomy and friend of Johannes Kepler. He appears to have succeeded in building a mechanical calculator that could add, subtract, multiply, and divide large numbers. Before Schickard was able to send Kepler his calculating machine, a fire broke out in 1624 where the calculator was being assembled and the device was destroyed. Eventually a working copy was reconstructed from Schickard's notes. By that time Schickard was dead of one of the plagues that scourged Europe during the Thirty Years' War. Further development of his calculator appears to have died with him.

Blaise Pascal, the French mathematician and philosopher, independently designed and built an adding machine that he called the arithmétique. It consisted of a brass box about the size of a loaf of bread, with eight dials on its face that represented decimal places. Although built later in time, Pascal's machine was less sophisticated than Schickard's and could only add and subtract. More importantly, it was so expensive that no one wanted to buy it. Pascal was not of the opinion that human intelligence could be entirely automated and questioned the limits to mechanical analytical reasoning as illustrated in his famous quote:

The heart has its reason that reason does not know.

Pascal is also credited for developing probability theory - a subject to be covered in another chapter.

Three decades later, Gottfried Wilhelm Leibnitz, best known as the inventor of calculus, extended Pascal's calculator to include multiplication and division. He hated the tedium of calculation and grumbled: It is unworthy of excellent men to lose hours like slaves in the labor of calculation that could be safely relegated to anyone else if machines were used [Ritchie:57]. Unfortunately, the economics of Europe had not changed since Pascal's time and the Leibnitz wheel was considered only a toy of the very rich. His key insight was not credited during his lifetime: that given a formal language of thought (specifically, calculus) one could construct a thinking machine to replicate this process. Leibnitz died in 1716, neglected by his countrymen and overshadowed by Newton. The only person to show up for Leibnitz funeral was his private secretary. But Leibnitz insight inspired other philosophers and inventors to construct thinking machines.

Perhaps the most successful mechanical contribution to the history of computers came from the unlikely industry of textile manufacturing. Weaving requires arranging thousands of tiny threads in a precise pattern by means of a repetitive sequence of operations. Joseph Jacquard had the idea of building a weaving machine that could work more efficiently, tirelessly and accurately than human workers. The basis of his idea was the use of a program, a set of instructions for weaving patterned cloth. The program was executed via punched cards that were strung together to form an endless belt that passed over the hooks that did the actual weaving. The holes in the cards determined which warp threads were hooked and pulled downward so that when the shuttle of the loom passed across, the proper threads were crossed over other threads, thus creating the desired woven pattern. Jacquard's loom was introduced in 1805 and was immediately successful as a weaving machine and also formed the basis for the punched card programs, called Hollerith cards after their inventor Herman Hollerith, used in early computers of the twentieth century.

In the following decades, Londoner Charles Babbage became obsessed with the immense potential of computing machines. He conceived the Analytical Engine, a machine that consisted of thousands of intricate geared interlocking cylinders. The "engine" was to be steam powered with two major components: (1) a "mill" to carry out arithmetical operations and (2) a "store" to record the variables and answers to the calculations. He planned to use the Jacquard punch-card system to supply the instructions to both components. The Analytical Engine was remarkably similar to the architecture of today's computers with programs instead of punched cards, processing units as the "mill" and memory units for the "store". Babbage's brilliant friend, Lady Ada Lovelace (daughter of Lord Byron) poetically commented that

. . . the Analytical Engine weaves algebraic patterns just as the Jacquard-loom weaves flowers and leaves.

Although Lady Lovelace is often credited with the idea of using machines for non-numeric symbolic manipulations such as music composition, she also the first to predict the limitations of artificial intelligence:

The Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform.

Unfortunately, precision machining and steam power were not advanced enough to allow the realization of Babbages's dream. The invention of electronic computers in the following century provided the technology for machine intelligence to become a possibility.

 

2 Computer Intelligence

 

There are many definitions of artificial intelligence (AI) in the literature. One somewhat circular definition often given is that artificial intelligence is the study of intelligent behavior. The term artificial intelligence as a discipline was coined by John McCarthy in naming the first conference on the subject at Dartmouth College in 1956 [Charniak & McDermott: 1985, p.10,11] . McCarthy proposed:

. . . a two-month, ten-man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.

McCarthy was reported to have been deeply disappointed in the conference (Grevier [1993: 49]). He had hoped for consensus as to what this new emerging field was and where it was going. None of this happened and most of the researchers left and followed their own directions. One lasting outcome of this workshop was the acceptance of McCarthys's preferred expression for this new area of study – artificial intelligence.

Marvin Minsky's [1968: v] definition is: artificial intelligence is the science of making machines do things that would require intelligence if done by men. Elaine Rich [1983] defines artificial intelligence as the study of how to make computers do things at which, at the moment, people do better. Although these may appear to be ephemeral definitions, the slow progress of computers to outperform people makes them less problematical than they might appear at first glance.

Margaret Broden [1977: 4-5] takes a more humanistic view. She relates:

My own interests in artificial intelligence are biased toward its potential for counteracting the dehumanizing influence of natural science, for suggesting solutions to many traditional problems in the philosophy of mind, and for illuminating the hidden complexities of human thinking and personal psychology. . . . By "artificial intelligence" I therefore mean the use of computer programs and programming techniques to cast light on the principles of intelligence in general and human thought in particular. In other words, I use the expression as a generic term to cover all machine research that is somehow relevant to human knowledge and psychology, irrespective of the declared motivation of the particular programmer concerned.

From an engineering perspective, I prefer to think of artificial intelligence as providing the tools for expanding the role of computers; from numerical processors to more general symbol processors. Although numbers are limited kinds of symbols, so are concepts, ideas, names, etc. that are part of human thought and intelligence. Simon's [1981] physical symbol hypothesis states that (1) human thoughts correspond to language, (2) language can be captured in symbols, and (3) by manipulating symbols on a computer, one can simulate human thought processes [Parsaye & Chignell: 6].

In fact, it was the challenge of the physical symbol hypothesis that motivated early AI researchers to solve complex problems by modeling and solving them in symbolic programming languages. Some of the first problems to be studied by AI researchers were game playing and theorem proving problems. Samuel in 1963 published his checkers-playing program that not only played games with opponents but also used its experience at this games to improve its later performance. The Logic Theorist [Newell, Shaw & Simon: 1963] was an early attempt to prove mathematical theorems. Game playing and theorem proving shared the property that although people who did them well were considered to be displaying intelligence, it appeared that computers could solve them adequately simply by being fast at exploring a large number of solution paths and then selecting the best one. It seemed that this process required very little knowledge and could be easily programmed. As it turned out, this assumption turned out to be false. No computer is fast enough to overcome the combinatorial explosion generated by such problems. For example, a chess board can be modeled as an 8x8 grid with approximately 10070 board positions. Even with the fastest supercomputers available today, the optimal minimax solution of this game solved entirely in software would take over 50 years of CPU time. Even the simple game of tic-tac-toe requires approximately 105 game positions. Yet it can be easily mastered by children. Obviously humans use techniques other than exhaustive search.

 

3 AI Research Areas

 

The following list gives examples of problems that fall within the scope of artificial intelligence:

 

4 AI Techniques

 

As pointed out by AI critics (e.g., [Dreyfus : 1979] and [Dreyfus&Dreyfus:1986]), the predictions of early AI researches are far from coming true. One of the few hard results to come out of the first 20 years of AI research is that intelligence requires knowledge. Knowledge is difficult to encode because it is (1) voluminous, (2) difficult to characterize or quantify, and (3) constantly changing.

An AI technique is a method that exploits knowledge in such a way that:

In the 20 years of research limited implementations of the first requirement were forthcoming. In order to implement the research into something practical, the idea of limiting the generality to specific application domains led to what is presently called expert systems.

 

5 Problem Solving and Problem Characteristics

 

The first step in solving a problem is understand it and identify its characteristics. A few problem characteristics are listed below. We will discuss problem characteristics in more detail as we get into specific applications.

 

5.1 Problem Characteristics

 

  1. Is the task one of classification (C) and derivation or of synthesis (S)?
  2. Decomposable - Can it be broken into smaller parts?
  3. Reversibility - Can solution steps be ignored or undone?
  4. Deterministic (D) or Uncertain (U) Response to Control?
  5. Lack of precision in definition of states - are the states fuzzy (F)?
  6. Combinatorial complexity?

How would you classify the following problems?:

 

5.2 State Space Representation

 

A state space representation provides a framework for modeling and communicating problems and their solution. A representation in a state space is a model of the world, which can vary from the simple to the complex. In fact, the complexity of a problem is often defined by how we represent it. Operators or rules are legal moves that describe the possible transitions between states. The initial state is defined from specification from which the solution process begins and the goal state defines the specifications for a successful solution. The initial state is the root node of the tree. At any state an operator can be applied and the given node is expanded. This node becomes a parent node and the successors it generates upon application of the operator are called children or successor nodes. Thus root nodes are the only nodes with no parents (usually there is only one initial state or root node). A node that has no successors or children is called a leaf node.

Most complex problems allow for a range of operators and the number of successor states generated from the initial state can be quite large. The process for looking for a sequence of moves is called a search. A discussion of AI search strategies will be discussed later in this chapter But first consider a standard AI test problem, the 8-Square Puzzle problem.

 

The 8-Square Puzzle Problem

 

The 8-square puzzle is a square tray in which are placed 8 square tiles. The remaining ninth square is uncovered . Each tile has a number on it. A tile that is adjacent to the blank space can be slid into that space. A game consists of a starting position and a specified goal position.

8-Square Puzzle

Suppose we represent each square and what number is in it as a state. There are 9! combinations (about 400,000). This would be a data intensive approach. Although the 8-square problem is solvable using this representation, the larger 15-square version is not. Elaine Rich [1983: 7-11] gives various algorithms and representation schemes for solving this problem. However, before looking them up, first try the following exercises.

 

Thought Exercises

 

  1. What control strategies might you apply?
  2. Random moves?
  3. How would a different representation simplify the problem?

 

6 Rule-Based Programming and Production Systems

 

A rule is a declarative piece of knowledge that is often of the following form:

IF <circumstances>

THEN <do action, or conclude something>

We will go into more detail concerning both representation issues and rules of inference for rule-based systems in later chapters. We will, however, introduce the special case of a production system in order to discuss some important concepts associated with search techniques in AI systems.

A production system consists of:

(1) One or more databases that contain whatever information is appropriate for the particular task. They are typically a kind of working memory that represents the current state of the system.

(1) A set of rules, each consisting of a left side (a pattern) that determines the applicability of the rule, and a right side that describes the action to be performed if the rule is applied. They are of the form: <working memory pattern> --> <working memory change>.

(3) A control strategy that specifies the order in which the rules will be compared to the database and a way of resolving the conflicts that arise when several rules match at once.

The production system inference engine searches for production rules whose left-hand sides match patterns in working memory. One rule is chosen for action and the working memory is changed accordingly. In the following sections, the classic water jug problem will be used as a working example of a production system and various search strategies will be introduced in order to solve the problem.

 

7 Search Control Strategies

 

Given a particular state, what rule should be applied to perform the next step? There are many ways humans and thus computers solve problems. What are some rules of thumb concerning requirements for a good control strategy?

  1. It should recognize the goal state.
  2. It should cause some motion.
  3. It should be systematic (For example, in the 8-square problem, we could make random moves. We would eventually reach the solution, but we would probably perform many more steps than should be necessary.)

It is this last requirement that motivates the use of a problem-solving metric of performance. One is usually interested in minimizing the number or costs of steps required in finding a solution – this is referred to as the path cost. But it takes time to perform a search and thus the notion of the search cost is also introduced to define the time and memory required to find a solution. The total cost of the search is the sum of the path cost and the search cost. In many cases it may not be "optimal" to find the solution with the lowest path cost if it will take substantial resources to find it. A suboptimal, but "good" solution might be preferred if its search cost is substantially lower. Explicit recognition of this trade-off is called intelligent real time problem solving, where the cost of the solution processing in time and other resources is explicitly balanced against the benefit. (See Bradley & Agogino [1994] for application of this concept to the design process in which they coin the expression intelligent real time design).

Russell & Norvig [1995: 73] list the following criteria: for identifying the right search or control strategy for a particular problem:

Completeness: is the strategy guaranteed to find a solution when there is one?

Time complexity: how long does it take to find a solution?

Space complexity: how much memory does it need to perform the search?

Optimality: does the strategy find the highest quality solution when there are several different solutions.

 

7.1 Water Jug Problem

 

You are given two jugs, a 4-gallon one and a 3-gallon one. Neither has any measuring markers on it. There is a pump that can be used to fill the jugs with water. How can you get exactly 2 gallons of water into the 4-gallon jug?

State Space Representation: The states are represented by (S x y), where x represent the number of gallons of water in the 4-gallon jug (x=0,1,2,3, or 4) and y represents the number of gallons of water in the 3-gallon jug (y=0,1,2 or 3). The starting state is (S 0 0) and the goal state is (S 2 n) for any allowable value of y=n because the problem does not specify how many gallons need to be in the 3-gallon jug.

Production Rules: The production rules for the Water Jug Problem in prefix notation are used to initiate an action for changing the states of the water jug: filling the jugs, emptying the jugs, and pouring water from one jug to the other. The prefix expression (IF A B) means that B can be activated if A is satisfied. The "AND" operator is a logical operator requiring that all arguments must be satisfied for the AND expression to be satisfied.

 

  1. (IF (< x 4) (S 4 y)) Fill the 4 gallon jug
  2. (IF (< y 3) (S x 3)) Fill the 3 gallon jug
  3. (IF (> x 0) (S 0 y)) Empty the 4 gallon jug on the ground
  4. (IF (> y 0) (S x 0)) Empty the 3 gallon jug on the ground
  5. (IF (AND (³ x+y 4) (> y 0)) (S 4 y-(4-x))) Pour water from the 3 gallon jug into the 4 gallon jug until full. (IF (AND (³ x+y 3) (> x 0)) (S x-(3-y) 3)) Pour water from the 4 gallon jug ;into the 3 gallon jug until full.
  6. (IF (AND (² x+y 4) (> y 0)) (S x+y 0)) Pour all of the water from the 3 gallon jug into the 4 gallon jug.
  7. (IF (AND (² x+y 3) (> x 0))(S 0 x+y)) Pour all of the water from the 4 gallon jug into the 3 gallon jug.

A Few Assumptions:

1. You can fill the jug from the pump.

2. We can pour water out of the jug onto the ground.

3. We can pour water from one jug to another.

4. We only allow positive quantities of water in the gallon.

5. more?

 

7.2 Control or Search Strategy

 

A control or search strategy specifies the order in which the rules will be compared to the database and a way of resolving the conflicts that arise when several rules match at once. Here are some attributes that a control strategy should have:

(1) There should be a starting state. In this case the initial state is given as (S 0 0).

(2) It should recognize the goal state. Check at each iteration whether the goal state (S 2 n) has been reached.

(3) There should be some gain at each iteration. This should be obvious. Why make a move if there is no hope of improvement. Suppose we choose the strategy of starting each time at the top of the list and choosing the first one that we can activate.

1: (S 0 0) Initial state

2: (S 4 0) Apply rule 1

3: (S 4 3) Apply rule 2

5: (S 0 3) Apply rule 3

6: (S 4 3) Apply rule 1

7: (S 0 3) Apply rule 3

8: (S 4 3) Apply rule 1

Using this strategy, we would never solve the problem. We would continue indefinitely filling and unfilling the 4 gallon jug with water.

(4). It should be systematic. Consider the control strategy of choosing randomly among the applicable rules. This strategy is better than the first because it causes motion. It also will eventually lead to a solution. However, we are very likely to arrive at the same state several times during the process and intuitively we can see that the strategy may not be very efficient in general (although we may get lucky some times).

 

7.3 Uninformed Search Strategies

 

Uninformed search strategies (Russell & Norvig [1995: 73]) are those in which no information about the number of steps or the path cost from the current state to the goal state is known. This is sometimes also called blind search or weak search.

 

7.3.1 Breadth-First Search

 

Consider the following strategy that meets the rules discussed:

Construct a tree with the initial state as its root. Generate all the offspring of the root by applying each of the applicable rules to the initial state. This represents one level of what is called a breadth-first search.

Tree Representation: The next step of a breadth-first search is to generate all possible branches to each leaf of the tree and so on. All the nodes at depth d in the search tree are expanded before the nodes at depth d +1. The diagram below shows the second level (d=2) of a breadth-first search. Continue this process until some rule produces the goal state.

 

Two Levels of a Breadth-First Search

 

A breadth-first search procedure is guaranteed to find a solution if one exists, provided that there are a finite number of branches of the tree. In addition, it can be shown that if the solution exists, a breadth-first search is guaranteed to find the shortest path to the goal, assuming that there is an equal cost to transversing each arc in the tree.

On the other hand, breadth-first search will be inefficient compared to depth-first search (next section) if there are many paths that lead to a solution, but each of them is relatively long. In addition breadth-first search can require a tremendous amount of memory. The number of nodes at each level of the tree increases exponentially with the level number and must be all be stored.

The branching factor is defined as the average number of nodes that can be reached directly from a single node. Suppose the branching factor for a problem is constant as is the path cost, at b nodes expanded at each level. Thus b2 would be generated at the second level, b3 at the third level, and bn at the nth level. Thus if n was the shortest path length (the one that a breadth-first search is guaranteed to find) then the maximum number of nodes expanded would be:

1 + b + b**2 + . . . b**n

Russell & Norvig [1995: 75] give an example in which the branching factor is 10 and the assuming a cost of 1000 nodes/second and 100 bytes per node. In this case it would take thirty-five years to reach a solution if the depth is twelve. But the memory requirements are an even bigger problem for breadth-first than the execution time. Very few systems have the luxury of storing the ten gigabytes needed for this search at the eighth level.

Depth

Nodes

Time

Memory

0

1

1 millisecond

100 bytes

2

111

.1 seconds

11 kilobytes

4

11,111

11 seconds

111 megabyte

6

106

18 minutes

111 megabytes

8

108

31 hours

11 gigabytes

10

1010

128 days

1 terabyte

12

1012

35 years

111 terabytes

14

1014

350 years

11,111 terabytes

Time and Memory Requirements for Breadth-First Search

(Russell & Norvig [1995: 75])

 

7.3.2 Depth-First Search

 

Suppose on the other hand, we pursue a single branch of the tree until we can go no further or until some pre-specified depth has been reached before we pursue another branch. This is called a depth-first search . Consider an algorithm that starts at the beginning of the database and activates the first possible rule, starts at the beginning again and activates the first possible rule that leads to a new state. Such an algorithm would lead to the solution given in the figure on the following page. The sequence of rules activated is: [1,2,3,7,2,5,3,7].

Can you think of a more efficient solution in terms of minimizing the number of rules activated? How did you find this solution?

Depth-first search algorithms are often easy to implement and have very modest memory requirements. They need only store the single path from the root to a leaf node, along with the remaining unexpanded sibling node for each node on the path. For a state space with a branching factor b and maximum depth m, depth-first search requires storage of only bm nodes. Compare this to the exponential storage required for the breadth-first search. Thus from the example given in the breadth-first from Russell & Norvig [1995: 77] for the case where d=12 the depth-first search would require 12 kilobytes instead of 111 terabytes, a factor of 10 billion times less in space. However, one risks the danger of spending a lot of time exploring long but unfruitful paths before a successful one is discovered. Some problems have very long or infinite search trees. Thus, in general, depth-first search is neither complete or optimal.

Depth-First (with no repetition) Search

7.3.3 Depth-Limited Search

 

On variant of depth-first search is depth-limited search in which a cut-off is imposed on the maximum depth a path is allowed to take. As with depth-first search any solution found is not guaranteed to be optimal. Furthermore, if the depth limit is too small, it is possible that the search will not even be complete, i.e., a solution may not be found even if one exits.

 

7.3.4 Depth-First Iterative-Deepening Search

The iterative-deepening strategy combines features of both breadth-first and depth-first search. The algorithm exhaustively performs a depth-limited search for every possible value of a depth limit, starting at depth 0, then depth 1, and so on. It is optimal and complete yet has the modest memory advantages of depth-first search. Although it might seem wasteful as it expands states many times, since the trees are exponential the repeated searches are performed on the smallest part of the tree. The time complexity is O(bn ) and the space complexity is O(bn), where n is the depth of the shortest solution path. For problems with a large search space and an unknown solution depth, iterative-deepening is the preferred search method (Russell & Norvig [1995: 80]).

 

7.3.5 Generate-and Test

 

Generate-and-test is a depth-first procedure that has been proposed in various forms for solving complex problems. Simmons and Dixon [1983] describe a variant called Design-Redesign for use in mechanical design problems. A simplistic version of generate-and-test is given below.

1. Generate a possible solution.

2. Test to see if this is actually a solution by comparing the chosen point or the endpoint of the chosen path to a set of acceptable goal states.

3. If a solution has been found, quit. Otherwise, return to step 1.

If the generation of possible solutions is done systematically, then this procedure will find a solution eventually, if one exists. Unfortunately, if the problem space is very large, this may take a very long time.

This search procedure is sometimes called the British Museum Algorithm, a reference to the fact that if a sufficient number of chimpanzees were placed in front of typewriters and left alone long enough, they would generate all the books the museum contained.

 

7.3.5 Pruning Out Repeated States

 

Although it is impossible, in general, to prevent reaching the same state twice, search algorithms can be modified to stop the search once a repeated state has been reached. This can be performed in a number of ways such as (1) not allowing paths that cause cycles or (2) not expanding states that have already been reached.

 

7.3.6 Constraint Satisfaction

 

A variation of generate-and-test is one called constraint satisfaction. The difference here is that one might not know the exact form of the goal state, except that certain constraints must be satisfied.

 

7.4 Informed Search Strategies

 

Informed search strategies (Russell & Norvig [1995: 73]) are those in which some information about the number of steps or the path cost from the current state to the goal state is known. One form of informed search is heuristic search. These strategies require some kind of evaluation function that returns information concerning the desirability of following a certain path.

 

7.4.1 Hill-Climbing Procedure

 

Hill climbing is a variant of generate-and-test in which feedback from the test procedure is used to help the generator decide which direction to move in the search space. In a pure generate and test procedure, the test function responds with only a yes or no. But if the test function is augmented with a heuristic evaluation function that provides an estimate of how close a given state is to a goal state, the generate procedure can exploit it as shown in the procedure below.

  1. Generate the first proposed solution in the same way as any generate-and-test procedure. See if it is a solution. If so , quit. Else continue.
  2. From this solution, apply some number of applicable rules to generate a new set of proposed solutions.
  3. From each element of the set do the following:
  4. a). Send it to the test function. If it is a solution, quit.

    b). If not, see if it is the closest to a solution of any of the elements tested so far. If it is, remember it. If it is not forget it.

  5. Take the best element found above and use it as the next proposed solution. This step corresponds to a move through the problem space in the direction that appears to be leading the most quickly toward a goal.
  6. Go back to step 2.

This is analogous to hill-climbing in numerical optimization. As with numerical optimization, there are important events that could happen.

local maxima is a local peak but not as high as the highest mountain top.

plateau is a flat area of the search space, in which a whole set of neighboring states have the same value. On a plateau, it is not possible to determine the best direction in which to move by making a local comparison.

A ridge is an area of the search space that is higher than surrounding areas, but that cannot be traversed by single moves in any one direction.

 

How to avoid a nonoptimal stationary point?

1) Backtrack to some earlier node

2) Make a big jump

3) Start at a new point

Problems: Hill-Climbing is a local technique. You might find the right hill and never know that you have climbed the wrong mountain. It is not very useful if it is a problem with many hills (or in the words of optimization, if it has lots of local minima)

 

7.4.2 Branch-and Bound

 

See Dym & Levitt [1991: 53-56].

 

7.4.3 A* Algorithms: Minimizing the Total Path Cost

 

The A* algorithm minimizes the estimate of the total path cost, which is defined in two parts:

f(n) = g(n) + h(n)

where g(n) is the cost of the path to node n and h(n) is the estimated cost to the goal from node n. With the A* algorithm the strategy is to first expand nodes with the lowest value of f. If one can show that h(n) never overestimates the cost to reach the goal, then it can be shown that the A* algorithm is both complete and optimal. For more detail see Russell & Norvig [1995: 96-101] and Rich [1983: 78-84].

 

7.5 Forward Versus Backward Reasoning

 

We have been starting with the initial state and tried to activate rules in order to achieve our goal node. This is an example of forward reasoning. With forward reasoning, we want to design a computer program that will match the left side of the rules with the root node (or initial state) and use the right hand side to create a new "root" state.

We could have started with the goal and worked backwards, keeping track of the sequence of rules that leads to the initial state. That would be an example of backward reasoning. For backward reasoning, we want to design a computer program that will match the right hand side of each rule with the goal and find new "goal" states. Continue the process until the initial state is generated.

The decision to use backward or forward reasoning depends on the topology of the problem. Elaine Rich [1983:58] lists three factors:

(1) Are there more possible start states or goal states? We would like to move from the smaller set of states to the larger (and thus easier to find) set of states.

(2) In which direction is the branching factor (i.e., the average number of nodes that can be reached directly from a single node) greater? We would like to proceed in the direction with the lower branching factor.

(3) Will the program be asked to justify its reasoning process to a user? If so it is important to proceed in the direction that corresponds more closely with the way the user will think.

Examples:

Why is it easier to drive from an unfamiliar place to home than to drive from home to an unfamiliar place. This might be explained by the branching factor. For purposes of finding our way, there are many more locations that count as being home than there are locations that count as the unfamiliar target place. We can consider all the locations between which and home we already know a route to be equivalent to being home. If we can get to any of them, we can get home easily. But in order to know a route from where we are to an unfamiliar place, we pretty much are aiming at a very small target. This suggests that if our starting position is home and our goal position is an unfamiliar place we should plan our route by reasoning backward from the unfamiliar place.

Now consider the case of symbolic integration. The problem state is a set of formulas some of which contain integral expressions. The start state is a particular formula containing an integral expression. The goal state is a formula that is equivalent to the start state without the integral expressions. If we begin with the start state we face a large number of possible goal states.

Thus to solve this problem, it is better to reason forward using the rules for integration to try to generate an integral-free expression than to start with arbitrary integral-free expressions, use the rules for differentiation, and try to generate the particular integral we are trying to solve. Again we want to head toward the largest target; this time that means chaining forward.

Consider the case of theorem proving. The goal state is a particular theorem. Our initial state is a set of axioms. It could very well be that both states are approximately the same size. But consider the branching factor in each direction. From a small set of axioms we can generate a very large number (if not infinite) set of theorems. On the other hand, this large number of theorems must go back to a small set of axioms. This suggests that it would be much better to reason backward when trying to prove theorems. This is one reason that so many theorem proving programs use backward chaining, as done in one of the earliest AI programs called the Logic Theorist by Newell in 1963.

The third factor to consider in determining the direction of search is the need to explain the reasoning of the program. For examples doctors were unwilling to even consider the advice from the MYCIN program for medical diagnostics unless it could explain its reasoning for its diagnostic reasoning. A doctor might reason as follows:

If an organism has a set of properties Y (as determined by lab results) then it is likely that the organism is X.

Even though the evidence is most likely documented in the reverse direction: If the organism is X, then it is likely the have the set of properties Y.

(IF(ORGANISM X) (PROPERTIES Y) ) with likelihood ß.

The rules justify why certain tests should be performed by the doctor.

The best programs use some combination of forward and backward reasoning. However, one must be careful that you don't miss each other coming and going.

 

 

7.6 Pattern Matching and Conflict Resolution

 

In terms of pattern matching, forward chaining means matching the antecedent or left-hand side of a proposition and backward chaining involves matching the consequent or right-hand side of a proposition.

What happens if there is a choice in either method as to what rules to fire? This is called "conflict resolution" in most AI control schemes. Sometimes the rules are applied in a hierarchical fashion. Some are more important than others and should be activated first.

Consider the ELIZA program written by MIT's Weisenbaum in the 1960's to demonstrate the potential of computers in natural language processing. ELIZA was written to simulate a psychotherapist. ELIZA's knowledge about both English and psychology was coded in a set of simple rules. ELIZA operates my matching the left sides of the rules against the user's last sentence and using the appropriate right side to generate a response. The rules were indexed by keywords so only a few had actually to be matched in any sentence. And a priority was given to the set of keywords. This is sometimes called priority matching. For example, mother was put at the top of the list.

Some of the rules had no left side, so that the rule could be applied anytime and could be used if there was no other match. One such rule would trigger the response:

Tell me more about that.

if it could find no better match.

 

ELIZA SESSION

 

• How do you do? Please tell me your problem.

I am worried about what to say in my AI lecture tomorrow.

• Your AI lecture tomorrow?

Yes, I don't know what to tell my students about you.

• You seem to be quite positive

I am not positive, that is why I wanted to discuss it with you.

• Is it because you are not positive that you came to me?

I just said that.

• You say you just said that.

Don't play games with me.

• Let's discuss further why your AI lecture tomorrow.

I don't know whether I should tell them the truth about you.

• Don't you really know whether you should tell them the truth about I?

Perhaps I should let you tell your own story.

• You don't seem quite certain.

I am afraid you or I might make a mistake.

• How long have you been afraid I or you might make a mistake?

I'd rather not discuss that.

• I am not sure I understand you fully.

You're right, I don't think you understand me fully.

• Why are you concerned over my right?

What rights?

• Why do you ask?

Why do you ask?

• Does that question interest you?

I am getting bored with you.

• Do you believe it is normal to be getting bored with I?

I don't think I should be talking to a computer.

8 Exercises

Formulate a set of necessary characteristics for intelligence. Which computer programs that you know of satisfy these necessary conditions? Design a procedure for providing sufficiency test for intelligence.

9 References

Arden, B.W. (ed.), What Can be Automated: The Computer Science and Engineering Research Study (COSERS), Cambridge, MA: The MIT Press, 1980.

Boden, M., Artificial Intelligence and Natural Man, New York: Basic Books, Inc., 1977.

Charniak, E. and D. McDermott, Introduction to Artificial Intelligence, Reading, MA: Addison Wesley Publishing Co., 1986.

Crevier, D., AI: The Tumultuous History of the Search for Artificial Intelligence, Basic Books, Harper Collins Publisher, 1993.

Dixon, J.R., and M.K. Simmons, "Computers that Design Expert Systems for Mechanical Engineers," Computers in Mechanical Engineering, Nov. 1983, pp. 10-18.

Dreyfus, H.L., What Computers Can't Do: The Limits of Artificial Intelligence, New York: Harper Colophon Books, 1979.

Dreyfus, H. and S.E. Dreyfus, Mind Over Machine, New York: The Fee Press, A Division of Macmillan, Inc., 1986.

Dym, C.L. and R.E. Levitt, Knowledge-Based Systems in Engineering, McGraw Hill, Inc., 1991.

McDermott, J., "Domain Knowledge and the Design Process," Proceedings of the 18th Design Automaton Conference, ACM/IEEE, Nashville, Tenn., 1981, pp. 580-588.

Minsky, M.L., (Ed.), Semantic Information Processing, Cambridge, MA: MIT press, 1968.

Minsky, M.L., "Steps Toward Artificial Intelligence," in Semantic Information Processing, (ed.) Cambridge, MA: MIT press, 1968.

Newell, A. and H. A. Simon, Human Problem Solving, Englewood Cliffs, NJ: Prentice-Hall, 1972.

Newell, A., H.A. Simon and J.C. Shaw, "Empirical Explorations with the Logic Theory Machine: A Case Study in Heuristics," in E.A. Feigenbaum and J. Feldman (Eds.), Computers and Thought, New York: McGraw-Hill, 1963.

Parsaye, K. and M. Chignell, Expert Systems for Experts, New York: John Wiley & Sons, Inc., 1988.

Pearl, Judea, Heuristics: Intelligent Search Strategies for Computer Problem Solving, Addison-Wesley, 1984. (on reserve in the Engineering Library).

Rich, E., Artificial Intelligence, New York: McGraw-Hill, 1983. (Pay particular attention to the A* and the AO* Algorithm.)

Ritchie, D, The Binary Brain: Artificial Intelligence in the Age of Electronics, Boston: Little, Brown and Company, 1984.

Rose, F., Into the Heart of the Mind: An American Quest for Artificial Intelligence, New York: Harper & Row Publishers, 1984.

Russell, S.J. and P. Norvig, Artificial Intelligence: A Modern Approach, Prentice Hall, Englewood Cliffs, New Jersey , 1995.

Simon, H.A., The Sciences of the Artificial, Cambridge, MA: MIT Press, 1981.


[ Home | Info | Syllabus | Readings | Students | Homework | Resources | News | Chat]

Last updated: 30 January 99
Send Comments to: Alice Agogino, aagogino@me.berkeley.edu
Copyright © 1999 Alice Agogino; All Rights Reserved.