14. Reasoning under Uncertainty


References:


M. Ginsberg. Essentials of Artificial Intelligence. Morgan Kaufmann, 1995.

Luger and Stubblefield. Artificial Intelligence: Structures and Strategies for Complex Problem Solving. Addison-Wesley, 2009.

Russell and Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 2003.

In this class


Dealing with the complexity of Bayes' theorem: Bayesian Networks.

Reflecting the intuition of experts: Stanford Certainty Theory.

Dealing with our ignorance and uncertainty: the Dempster-Shafer Theory of Evidence.

Bayesian Networks

Example:


Suppose that we know the probability of a traffic light being green is 0.45, of its being yellow is 0.1, and of its being red is 0.45. Furthermore, suppose that we have a 25% chance of running a red light without being ticketed, and a 5% chance of being ticketed for running a yellow light.


Additionally, suppose that I go on to tell you that if I get a
ticket, there is a 90% chance that I will subsequently be in a
bad mood; if I don’t get a ticket, there is only a 5% chance.
What is the overall probability that I will later be in a bad
mood?

Example from: M. Ginsberg, Essentials of Artificial Intelligence. Morgan Kaufmann, 1995.
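To answer this, here is a minimal Python sketch of the computation (my own, not from the slides). It assumes the 25% figure means p(no ticket | red) = 0.25, so p(ticket | red) = 0.75, and that a green light never produces a ticket, which the example leaves implicit.

# Sketch: p(bad-mood) for the traffic-light example, summing over
# the chain Color -> Ticket -> Mood.
p_color = {"green": 0.45, "yellow": 0.10, "red": 0.45}
# Assumed conditionals: p(ticket|red) = 0.75, p(ticket|yellow) = 0.05,
# p(ticket|green) = 0 (implicit in the example).
p_ticket_given_color = {"green": 0.0, "yellow": 0.05, "red": 0.75}
p_bad_given_ticket = {True: 0.90, False: 0.05}

# p(ticket) = sum over colors of p(color) * p(ticket | color)
p_ticket = sum(p_color[c] * p_ticket_given_color[c] for c in p_color)

# p(bad-mood) = p(ticket) p(bad | ticket) + p(no ticket) p(bad | no ticket)
p_bad = (p_ticket * p_bad_given_ticket[True]
         + (1 - p_ticket) * p_bad_given_ticket[False])

print(f"p(ticket)   = {p_ticket:.4f}")   # 0.3425
print(f"p(bad-mood) = {p_bad:.4f}")      # 0.3410

Under these assumptions, the overall probability of a bad mood comes out to about 0.34.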

Bayesian Networks

Example Cont’d

[Tree diagram: the light is green, yellow, or red; each color branches on ticket / ¬ticket, and each of those on bad-mood / ¬bad-mood.]

In general, p(q) for some query q, given n random variables, requires 2^n different conditional probabilities!


p(bad-mood | green, ticket)
p(bad-mood | green, ¬ticket)
p(bad-mood | red, ticket)
p(bad-mood | red, ¬ticket)
p(bad-mood | yellow, ticket)
p(bad-mood | yellow, ¬ticket)
p(bad-mood | yellow)
p(bad-mood | green)
p(bad-mood | red)
p(yellow)
p(red)
p(green)
Bayesian Networks

Example Cont’d


Assume we also add that I may lose my license if I get a ticket.

Now we have four variables: Color of light, Ticket, Lose license, and Bad mood.

To get the probability of a bad mood given those variables, we would have to measure or specify the probabilities of all combinations of the four random variables. How many in total?

3 × 2 × 2 × 2 = 24 different probabilities!

Bayesian Networks


Bayesian networks allow us to clearly and efficiently represent conditional independence assumptions using directed graphs.

Bayesian Networks

Example Cont’d

[Network: Color of Light → Ticket; Ticket → Bad mood; Ticket → Lose license]

Bayesian Networks

Example Cont’d


A directed link from a node A to a node B reflects a
causal relationship between A and B (A influences B).


Therefore, coherent patterns of reasoning are reflected
as a path in this graph from one node to another.


Interpreting Bayesian Networks


Given a Bayesian network I, we will say that a set E of nodes splits a node x from another node y if every path from x to y passes through E - {x, y}.


If E splits x from y, then y is conditionally independent of x given E.
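As a concrete sketch of this test (my own code, not from the slides; links are treated as undirected for the "passes through" check):

from collections import deque

def splits(neighbors, E, x, y):
    """True if the node set E splits x from y, i.e. every path
    from x to y passes through a node of E - {x, y}."""
    removed = set(E) - {x, y}
    seen, queue = {x}, deque([x])
    while queue:
        node = queue.popleft()
        if node == y:
            return False                 # found a path avoiding E
        for nxt in neighbors[node]:
            if nxt not in seen and nxt not in removed:
                seen.add(nxt)
                queue.append(nxt)
    return True

# The example network, with links treated as undirected:
neighbors = {"Color": ["Ticket"],
             "Ticket": ["Color", "BadMood", "LoseLicense"],
             "BadMood": ["Ticket"],
             "LoseLicense": ["Ticket"]}
print(splits(neighbors, {"Ticket"}, "Color", "BadMood"))  # True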

Bayesian Networks

Example Cont’d


E= {Ticket} splits Color of Light from Bad mood.
Therefore, Bad mood is independent of Color of light
given whether or not we have received a ticket.


Bayesian Networks

Example Cont’d


To calculate p(bad-mood) we need:

1. The unconditional probability of each Color of light (a total of 3).

2. The conditional probabilities of getting and not getting a ticket given a specific Color of light (a total of 6).

3. The conditional probability of a bad mood given that I got a ticket and given that I did not get a ticket (a total of 2).

Compare that (11 numbers in total) with the 24 needed when no independence assumptions are given!


Bayesian Networks

d-separation


Assume we add to our example that I can get a ticket for driving without insurance.

Are Color of light and No insurance independent?

If we know nothing about whether or not I got a ticket, then yes, they are independent.

But once we find out that I did get a ticket, they become dependent. Why? Given that a ticket was issued, learning that the light was green makes missing insurance a more likely explanation for the ticket (and vice versa), so each variable now carries information about the other.

[Network: Color of Light → Ticket ← No insurance; Ticket → Bad mood; Ticket → Lose license]

Bayesian Networks

d-separation


Given a Bayesian network I, we will say that nodes x and y are d-separated if every path p = <x = n0, ..., nk = y> between them is blocked. (Luger et al., 2009)



Bayesian Networks

d-separation


A path is blocked if there is an intermediate node V on the path with either of these properties:

1. The connection is serial or diverging and the state of V is known.

2. The connection is converging and neither V nor any of its children have evidence.




[Diagrams: serial A → V → B; diverging A ← V → B; converging A → V ← B]
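A rough sketch of these two rules in Python (my own simplification; the function names are made up). It classifies the connection at an intermediate node V from the arrow directions and applies the blocking test; the full definition of rule 2 uses all of V's descendants, but this sketch follows the slide in checking V's children:

def connection(parents, a, v, b):
    """Classify the a-v-b connection by arrow directions."""
    if a in parents[v] and b in parents[v]:
        return "converging"                 # a -> v <- b
    if v in parents[a] and v in parents[b]:
        return "diverging"                  # a <- v -> b
    return "serial"                         # a -> v -> b or b -> v -> a

def blocks(parents, a, v, b, evidence):
    """Is a path blocked at V, given the set of evidence nodes?"""
    kind = connection(parents, a, v, b)
    if kind in ("serial", "diverging"):
        return v in evidence                # rule 1: state of V is known
    children = {n for n, ps in parents.items() if v in ps}
    return v not in evidence and not (children & evidence)  # rule 2

# Extended example: Ticket is a converging node between
# Color of Light and No insurance.
parents = {"Color": set(), "NoInsurance": set(),
           "Ticket": {"Color", "NoInsurance"},
           "BadMood": {"Ticket"}, "LoseLicense": {"Ticket"}}
print(blocks(parents, "Color", "Ticket", "NoInsurance", set()))       # True
print(blocks(parents, "Color", "Ticket", "NoInsurance", {"Ticket"}))  # False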

Bayesian Networks

d-connectedness


Back to our traffic-ticket example:








[Network: Color of Light → Ticket ← No insurance; Ticket → Bad mood; Ticket → Lose license. Here Ticket is the converging node V, and Bad mood and Lose license are children of V: evidence at V or at a child of V d-connects Color of Light and No insurance.]

Bayesian Networks

Advantages


"Allow us to conveniently represent the conditional independence assumptions." (M. Ginsberg, Essentials of Artificial Intelligence, Morgan Kaufmann, 1995)

"Reduce the amount of information needed by a probabilistic reasoner." (M. Ginsberg, Essentials of Artificial Intelligence, Morgan Kaufmann, 1995)


The data of a domain can partition and focus reasoning,
thus reducing the complexity of the search.

The Stanford Certainty Factor


Used in the MYCIN diagnostic system.


Stanford certainty theory uses human judgment to associate a confidence measure with a conclusion.

Each rule in the knowledge base is associated with a confidence factor.

Confidence factors reflect the human expert's confidence in the rule's reliability.

Confidence factors may be adjusted to tune the system's performance.

The Stanford Certainty Factor

Confidence Factors for Rules



Example:

(P1 and P2) or P3 → R1 (0.7) and R2 (0.3)

That is, if the premises hold, conclude R1 with confidence 0.7 and R2 with confidence 0.3.

Stanford Certainty Theory


Stanford certainty theory splits the "belief in a hypothesis" from the "disbelief in the hypothesis".

The belief MB and disbelief MD in a hypothesis H given evidence E satisfy either:

1 > MB(H|E) > 0 while MD(H|E) = 0, or

1 > MD(H|E) > 0 while MB(H|E) = 0.

Stanford Certainty Theory


The confidence in a hypothesis H given evidence E combines MB(H|E) and MD(H|E):

CF(H|E) = MB(H|E) - MD(H|E)

CF ranges from -1 (hypothesis known to be false) through 0 (no evidence either way) to 1 (hypothesis known to be true).

Calculating CF from rules


The premises of each rule can be combined using ands and ors.


Given the confidence factors of premises P1 and P2:

CF(P1 and P2) = MIN(CF(P1), CF(P2))

CF(P1 or P2) = MAX(CF(P1), CF(P2))

Given a rule of the form

P → R (CF(rule))

the confidence of the conclusion R given CF(P) is:

CF(P) · CF(rule) if CF(P) > 0

0 otherwise
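A minimal sketch of these rules in Python (the function names and the premise CFs 0.6, 0.4, 0.2 are my own, for illustration):

def cf_and(cf1, cf2):          # CF(P1 and P2)
    return min(cf1, cf2)

def cf_or(cf1, cf2):           # CF(P1 or P2)
    return max(cf1, cf2)

def cf_conclusion(cf_premise, cf_rule):
    """CF of a rule's conclusion: CF(P) * CF(rule) if CF(P) > 0."""
    return cf_premise * cf_rule if cf_premise > 0 else 0.0

# The earlier rule (P1 and P2) or P3 -> R1 (0.7), with hypothetical
# premise CFs:
cf_p = cf_or(cf_and(0.6, 0.4), 0.2)
print(cf_conclusion(cf_p, 0.7))   # 0.4 * 0.7 = 0.28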



Calculating CF from rules


If two different rules produce the same result R with two different confidence factors CF(R1) and CF(R2), then the combined confidence is:

CF(R1, R2) =

CF(R1) + CF(R2) - CF(R1) · CF(R2)   when CF(R1), CF(R2) > 0

CF(R1) + CF(R2) + CF(R1) · CF(R2)   when CF(R1), CF(R2) < 0

(CF(R1) + CF(R2)) / (1 - MIN(|CF(R1)|, |CF(R2)|))   otherwise
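These three cases translate directly into code; a sketch:

def cf_combine(cf1, cf2):
    """Combine the CFs of two rules that reach the same conclusion."""
    if cf1 > 0 and cf2 > 0:
        return cf1 + cf2 - cf1 * cf2
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 + cf1 * cf2
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))

print(cf_combine(0.8, 0.6))    # 0.92: agreeing evidence reinforces
print(cf_combine(0.8, -0.6))   # 0.5: contradictory evidence cancels

Note that cf_combine(1.0, -1.0) would divide by zero; the third case is meant for CFs strictly between -1 and 1.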



Stanford Certainty Theory

Advantages


Easy to compute.

CFs are always between -1 and 1.

Contradictory CFs cancel each other.

The combined CF measure is a monotonically increasing (or decreasing) function, which is consistent with how combined evidence should behave.

The CF attached to a rule is a human estimate of the expert's confidence in that rule.

Stanford Certainty Theory

Disadvantages


The functions used to calculate CF are ad hoc.

The quality of the program depends on the quality of the subjective confidence factors attached to the rules.

MYCIN and Certainty Theory


MYCIN's performance reflects that of the medical experts.

Its limitations seem to be due to the limitations of its knowledge base, not the numerical algorithm.


The Dempster-Shafer Theory of Evidence


Designed to deal with the distinction between ignorance and uncertainty.

Assigns each proposition an interval [belief, plausibility].

The Dempster-Shafer Theory of Evidence


Belief is computed using the belief function bel(X), which measures the probability that the evidence supports a given proposition.

bel(X) ranges from 0, indicating no evidence in support of the proposition, to 1, indicating certainty.

Plausibility is defined as:

pl(X) = 1 - bel(¬X)

The Dempster-Shafer Theory of Evidence

Example


A shady character comes up to you and offers to bet you $10
that his coin will come up heads on the next flip.


Given that the coin might or might not be fair, what belief should you ascribe to the event that it comes up heads?




Since you have no evidence either way:

bel(heads) = bel(¬heads) = 0

pl(heads) = 1 - 0 = 1

pl(¬heads) = 1 - 0 = 1

The Dempster-Shafer Theory of Evidence

Example Cont’d


Now an expert comes along and tells you that, with 90% certainty, the coin is fair.

Now your belief that the coin will turn up heads/tails is updated to:

bel(heads) = bel(¬heads) = 0.9 × 0.5 = 0.45


Now what should you do? Accept the bet or decline it?
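A small sketch tying the numbers together (my own code, using pl(X) = 1 - bel(¬X) as defined above):

def interval(bel_x, bel_not_x):
    """Dempster-Shafer [belief, plausibility] interval for X."""
    return (bel_x, 1 - bel_not_x)

print(interval(0.0, 0.0))      # (0.0, 1.0): total ignorance
print(interval(0.45, 0.45))    # (0.45, 0.55): after the expert's testimony

The interval [0.45, 0.55] still brackets 0.5, so the evidence alone does not settle whether the bet is favorable.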

The Dempster-Shafer Theory of Evidence


The Dempster-Shafer reasoning system treats lack of evidence in a manner consistent with our intuition.

But it does not always provide a clear approach to decision making.

The theory also violates a basic axiom of probability:

p(r) = 1 - p(¬r)

In Dempster-Shafer theory, bel(r) + bel(¬r) may be less than 1; the remainder represents ignorance.