Rollins School of Public Health at Emory University
Instructor: Dr. Anthony G. Francis, Jr.
Lecture 11: Expert Systems and Reasoning with Uncertainty
Expert systems automate or assist with expert human tasks by capturing
knowledge held by experts and using AI reasoning techniques to draw
conclusions in novel situations; examples of expert systems include
medical diagnosis and clinical decision support systems. Expert
systems typically reason with logical rules, which in theory can
draw potentially infinite sets of valid conclusions from the given facts.
However, simple reasoning is not enough; expert systems typically have
additional components related to entering, maintaining, and explaining
the application of the knowledge they store. Furthermore, we frequently
cannot specify sound and complete rules for an expert domain; instead,
expert systems use probability models to draw conclusions with varying
degrees of belief that become increasingly tenuous the further we stray
from observed facts. These probability
models, often based on the Bayesian statistical framework, have wide
application across many areas of artificial intelligence, including
expert systems, decision theory, and information retrieval.
Outline
- Expert Systems
- Architecture of Expert Systems
- Successful Expert Systems
- Reasoning with Uncertainty
- Bayesian Networks
- Applications of Bayesian Networks
Readings:
- Artificial Intelligence: Chapters 17 (esp. 17.4) and 19
- Machines Who Think: Chapter 12
Expert Systems
The rise of knowledge-based artificial intelligence
- Early AI: Weak Methods
- Universal mechanisms for solving generic problems
- Computationally intractable for all but the smallest, simplest problems
- Examples:
- Logic Theorist: mathematical proofs
- General Problem Solver: planning
- EPAM: memory retrieval
- 1970's: Strong Methods
- Add task-specific knowledge about particular problems
- Not universal, but may be efficient and effective for given problems
- Examples:
- DENDRAL: spectroscopic analysis
- MACSYMA: mathematical equation solver
- HEARSAY: speech understanding
Expert Systems: Strong Methods applied to Expert Problems
- Focus on tasks normally performed by human specialists
- Solve "expert" rather than "mundane" problems
- Mundane tasks: basic tasks performed by every human every day
- General cognitive, perceptual and motor faculties
- Very hard problems for artificial intelligence
- Involve signal processing, vast knowledge, or real-time control
- Require vast amounts of memory and/or processing power
- Examples: recognizing a face, driving a car, making a sandwich
- Expert problems: symbolic cognitive tasks requiring trained experts
- Specialized tasks employing our general cognitive apparatus
- "Easy" problems for AI because they are abstracted to the symbolic realm
- Involve symbolic information, logical reasoning, and probability
- Require good knowledge representation and inference algorithms
- Examples: playing chess, diagnosing a disease, solving an equation
- Because of their task-specific focus:
- Require support architecture around core AI components
- Require knowledge engineering to collect specialist knowledge
Architecture of Expert Systems
Major Components of Expert Systems
- Knowledge Base
- Working Memory
- Inference Engine
- Explanation System
- Knowledge Base Editor
- User Interface
Users of Expert Systems
- Domain Expert
- Knowledge Engineer
- End User
Knowledge Base:
- Updated through a process of knowledge engineering
- Knowledge engineers interview domain experts
- Initial system is iteratively tested
- Occasionally side-by-side comparisons used
- Incorporates both facts and heuristics
- Facts: Generally public, shared, explicit, vetted knowledge
- Representations: Logical Statements or Network Knowledge
- Semantic Networks
- Frames
- Object-Attribute-Value (OAV) tuples
- Heuristics: Generally private, not shared, implicit, rules of thumb
- Representations: If-Then Rules (see the sketch after this list)
- Simple Rules
- Variabilized Rules
- Uncertain Rules
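To make these rule representations concrete, here is a minimal sketch in Python of a knowledge base of simple if-then rules, a working memory of facts, and a forward-chaining inference engine. The rule and fact names (fever, rash, and so on) are invented for illustration, not drawn from any real clinical system.

```python
# Minimal sketch of a rule-based expert system core: a knowledge base of
# if-then rules, a working memory of facts, and a forward-chaining
# inference engine. All rules and facts here are hypothetical.

# Knowledge base: each rule is (set of antecedents, consequent).
RULES = [
    ({"fever", "rash"}, "suspect_measles"),
    ({"suspect_measles", "unvaccinated"}, "order_serology"),
    ({"fever", "stiff_neck"}, "suspect_meningitis"),
]

def forward_chain(facts, rules):
    """Fire every rule whose antecedents are all in working memory,
    adding its consequent, until no new facts can be derived."""
    memory = set(facts)              # working memory
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if antecedents <= memory and consequent not in memory:
                memory.add(consequent)   # fire the rule
                changed = True
    return memory

if __name__ == "__main__":
    print(forward_chain({"fever", "rash", "unvaccinated"}, RULES))
    # derives 'suspect_measles', then 'order_serology' on the next pass
```

A real shell would wrap the other components above around this core: an explanation system can simply record which rules fired and why, and a knowledge base editor lets the knowledge engineer update the rules without touching the engine.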
Successful Expert Systems
- All too many failures:
Early systems made expert problem solving a "stunt" that disrupted practice
- MYCIN - sessions too long, range too limited, too large and expensive to deploy
- XCON - hard to maintain as system grew to 10,000 rules
- In general: limited to
"Any problem that can be and frequently is solved by your in-house
expert in a 10 to 30 minute phone call." (Morris W. Firebaugh)
- Invisible successes:
Hidden systems, hidden knowledge
- Maxima: freely available symbolic mathematics package
- COLOSSUS: Australian insurance adjustor advisor
- F-16 Maintenance Skills Tutor: has expert system troubleshooter
- Successes in Healthcare
Systems integrated into medical practice:
- PUFF - interprets pulmonary function tests
- PIERS - produces chemical pathology reports
- FocalPoint - scans 10% of all Pap screening slides
Reasoning with Uncertainty
- Requirements for Logical Reasoning
- Consistent Axioms
- Correct Inference Rules
- Valid Initial Facts
- When Logic Fails
- Unknown or unknowable "axioms"
- Uncertain inference rules or process models
- Incomplete knowledge of the world or its state
- Probability Theory
- Probability: degree of belief in a proposition
- Not statistical probability (likelihood that an event will occur)
- Could be derived statistically if population sampling data were available
- Usually such data is unavailable or inaccurate
- Views of Probability
- Frequentist - probabilities derived from experiment
- Objectivist - probabilities are properties of the universe
- Subjectivist - probabilities characterize agent's beliefs
- Reference class problem - the more precisely we objectively fix
a situation, the smaller the range of conditions over which it holds
- Kolmogorov's Axioms of Probability
- 1. All probabilities are between 0 and 1
- 2a. Necessarily true propositions have probability 1
- 2b. Necessarily false propositions have probability 0
- 3. The probability of a disjunction is the sum of the
probabilities of its elements, minus the probability
they both happen at the same time:
P( A v B ) = P( A ) + P( B ) - P( A ^ B )
- Result: sum of all mutually exclusive ways an event can happen must be 1
- Random variables
- One result of Kolmogorov's axioms:
Sum of all mutually exclusive ways an event can happen must be 1
- Random variables enumerate these mutually exclusive outcomes
- Types of random variables:
- Propositional: True or False
- Categorical: take on one of a set of values
- Numerical: take on one of a continuous range of values
- Probability of Random Variables:
- P( v1 )
- Short for P( V=v1 )
- Joint probability of many variables
- P( V1, V2, ..., Vn )
- short for P( V1=v1, V2=v2, ..., Vn=vn )
- equivalent to P( v1 ^ v2 ^ ... ^ vn ) (see the sketch below)
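These definitions can be made concrete with a small sketch in Python: a joint distribution over two propositional random variables stored as a table, a check that the mutually exclusive outcomes sum to 1 as Kolmogorov's axioms require, and marginal probabilities computed by summing out the other variable. The disease/test numbers are invented for illustration.

```python
# Sketch: a joint distribution over two propositional random variables,
# Disease and TestPositive, stored as a table of invented probabilities.
joint = {
    (True,  True):  0.008,   # P( disease ^ positive )
    (True,  False): 0.002,   # P( disease ^ ~positive )
    (False, True):  0.099,   # P( ~disease ^ positive )
    (False, False): 0.891,   # P( ~disease ^ ~positive )
}

# Kolmogorov: the mutually exclusive outcomes must sum to 1.
assert abs(sum(joint.values()) - 1.0) < 1e-9

def marginal(var_index, value):
    """P( V = value ), summing out the other variable."""
    return sum(p for outcome, p in joint.items()
               if outcome[var_index] == value)

print(marginal(0, True))   # P( disease )  = 0.01
print(marginal(1, True))   # P( positive ) = 0.107
```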
- Probabilistic Reasoning
- Prior Probability: likelihood of a proposition, all things being equal:
- P( H ) short for P( Hypothesis ), marginalized over all other variables in the joint distribution
- Conditional Probability: likelihood of one event given another
- P( H | E ) short for P( Hypothesis | Evidence )
- P( H | E ) = P( H ^ E ) / P( E )
- Chain Rule: reasoning across multiple hypotheses
- P( V1, V2, ..., Vn ) = ∏(i=1..n) P( Vi | V1, ..., Vi-1 )
- Bayes' Rule: reasoning back from evidence to hypotheses (worked example below)
- P( H | E ) = P( E | H ) * P( H ) / P( E )
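Continuing the invented disease/test numbers from the sketch above, the worked example below computes a conditional probability directly from its definition and confirms that Bayes' rule gives the same answer when reasoning back from evidence to hypothesis.

```python
# Conditional probability and Bayes' rule over the hypothetical
# disease/test joint distribution sketched above.
p_h_and_e = 0.008   # P( disease ^ positive )
p_h       = 0.010   # P( disease ), the prior
p_e       = 0.107   # P( positive ), marginalized from the joint

# Definition: P( H | E ) = P( H ^ E ) / P( E )
p_h_given_e = p_h_and_e / p_e            # ~0.075

# Bayes' rule reaches the same posterior from the causal direction:
# P( H | E ) = P( E | H ) * P( H ) / P( E )
p_e_given_h = p_h_and_e / p_h            # P( positive | disease ) = 0.8
assert abs(p_e_given_h * p_h / p_e - p_h_given_e) < 1e-9

print(p_h_given_e)   # a fairly sensitive test still yields a modest
                     # posterior when the prior P( disease ) is low
```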
- Problems with Reasoning with Uncertainty
- Ad-hoc methods do not scale
- MYCIN certainty factors give good results for small problems
- Not reliable for larger chains of rules
- Not stable for larger problems
- Fully specified probability theory can be intractable
- Joint probability distributions have combinatorial explosion
- Cannot collect answers for every cell of the matrix
- Could not tractably compute with them if you had them
- Bayesian reasoning often provides an effective method
- Compute using Bayes' rule over Bayesian inference networks
- Uses independence assumptions to streamline reasoning
- Often works almost as well as more accurate approaches
Bayesian Networks
- Represent conditional independence as a directed acyclic graph (DAG)
- Requires conditional independence:
P( H | E1, E2 ) = P( H | E1 ) means H is conditionally independent of E2 given E1
- Enables us to eliminate many unneeded cells of the matrix
- Structure of a Bayesian Network
- Prior probabilities are assigned to nodes without parents
- Conditional probability tables assigned for nodes with parents
- Nodes that are not connected are assumed to be independent
- Reasoning with a Bayesian Network
- Use the chain rule over all conditional dependencies
- E.g., P( C1, C2, P1, P2 ) = P( C1 | P1, P2 )*P( C2 | P2 )*P( P1 )*P( P2 )
- Much simpler than the unconstrained case: the factored form needs only
8 parameters instead of all 16 cells of the joint distribution (see the
sketch below)
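A sketch of this factorization in Python, using the four-node example above with invented prior and conditional probability table (CPT) entries: the 8 CPT parameters (4 + 2 + 1 + 1) determine all 16 cells of the joint distribution, which still sum to 1.

```python
# Sketch: evaluating the factored joint
#   P( C1, C2, P1, P2 ) = P( C1 | P1, P2 ) * P( C2 | P2 ) * P( P1 ) * P( P2 )
# with invented CPT entries for four boolean variables.
from itertools import product

p_p1 = 0.3                          # prior P( P1 )
p_p2 = 0.6                          # prior P( P2 )
p_c1 = {                            # CPT: P( C1=True | P1, P2 )
    (True, True): 0.9, (True, False): 0.7,
    (False, True): 0.4, (False, False): 0.1,
}
p_c2 = {True: 0.8, False: 0.2}      # CPT: P( C2=True | P2 )

def pr(p_true, value):
    """Probability of a boolean value, given P( value=True )."""
    return p_true if value else 1.0 - p_true

def joint(c1, c2, p1, p2):
    """Chain rule over the network's conditional dependencies."""
    return (pr(p_c1[(p1, p2)], c1) * pr(p_c2[p2], c2)
            * pr(p_p1, p1) * pr(p_p2, p2))

# The 8 parameters above determine all 16 joint cells, which sum to 1:
total = sum(joint(*cell) for cell in product([True, False], repeat=4))
assert abs(total - 1.0) < 1e-9
```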
- Other uses of Bayesian Networks
- D-Separation: compute independence given some evidence
- Polytrees: Simpler networks which enable more efficient inference
- Evidence Above: compute probability of hypotheses given evidence
- Evidence Below: compute probability of evidence given hypotheses
Applications of Bayesian Networks
- Expert systems - compute principled probability relations
- Planning - compute the best possible action
- Information retrieval - guess which documents are relevant
Resources
- Expert Systems: ????
- Logic: ????
- Probability: ????
- Utility Theory: ????
- Decision Theory: ????
- Bayesian Reasoning: ????
- Bayesian Networks: ????
- Clinical Decision Support Systems: ????
- AI and Healthcare: ????