Oct 21, 2013

A Probabilistic Framework for Information Integration and Retrieval on the Semantic Web

by Livia Predoiu, Heiner Stuckenschmidt

Institute of Computer Science, University of Mannheim, Germany

presented by Thomas Packer

Sources of Uncertainty in Automated Processes in the Semantic Web


Uncertain Document Classification


Uncertain Ontology Learning from Text


Uncertain Ontology Matching



Leads to uncertain, unreliable or contradictory information.


Traditional logic cannot handle inconsistency.

Motivational Example


Domain: Bibliography

Use Case: Find publications with keyword “AI”.

Complication: Second ontology does not include the concept of “topic” or “keywords”.

Solution: Use machine learning to categorize documents from the second collection.

Motivational Example (Continued)


Domain: Bibliography

Use Case: Find publications with keyword “AI”.

Complication: The “Report” concept in one ontology only loosely corresponds to “Publication” in the other.

Solution: Map concepts between ontologies.

Approach


Start with a more standard approach, Description Logic Programs (DLPs).

Extend them with probabilistic information.

Call the result Bayesian Description Logic Programs (BDLPs).

BDLPs are a subset of Bayesian Logic Programs.

They also integrate logic programming and description logic knowledge bases.

BDLP Pedigree

Description Logic (DL) + Logic Programs (LPs) → Description Logic Programs (DLPs)

Logic Programs (LPs) + Bayesian Networks (BNs) → Bayesian Logic Programs (BLPs)

Description Logic Programs (DLPs) + Bayesian Logic Programs (BLPs) → Bayesian Description Logic Programs (BDLPs)

Uses of Bayesian Description Logic Programs

Framework for:

information retrieval

information integration

across heterogeneous ontologies.

Description Logic Programs
(Background)


Intersection of:

Description Logics (knowledge representation)

Logic Programming (automated theorem proving)

DLP program contains:

Set of rules

Set of facts

Rules have the form H ← B1 ∧ … ∧ Bn: a conjunction of predicates implies some other predicate.

H and the B’s are atomic formulae.

Predicate arguments are called terms.

Terms are constants or variables.

A ground atom’s terms are all constants.
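The rule and fact machinery above can be sketched in code. This is a minimal illustration (an assumed representation, not the paper’s implementation): atoms are tuples, terms starting with an uppercase letter are treated as variables, and naive forward chaining derives all ground atoms the program entails.

```python
# Sketch of a DLP-style program: facts, rules, and bottom-up grounding.
# Convention (assumption for this sketch): uppercase terms in rules are
# variables; everything else is a constant.

facts = {
    ("publication", "p1"),
    ("keyword", "p1", "ai"),
}

# Rule: about(X, T) <- publication(X) AND keyword(X, T)
rules = [
    (("about", "X", "T"), [("publication", "X"), ("keyword", "X", "T")]),
]

def is_var(term):
    return term[0].isupper()

def unify(atom, fact, binding):
    """Try to match a (possibly non-ground) rule atom against a ground fact."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    binding = dict(binding)
    for a, f in zip(atom[1:], fact[1:]):
        if is_var(a):
            if binding.get(a, f) != f:
                return None
            binding[a] = f
        elif a != f:
            return None
    return binding

def forward_chain(facts, rules):
    """Derive every ground atom entailed by the rules (naive fixpoint)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            # Collect all variable bindings that satisfy the whole body.
            bindings = [{}]
            for atom in body:
                bindings = [b2 for b in bindings for f in derived
                            if (b2 := unify(atom, f, b)) is not None]
            for b in bindings:
                ground_head = (head[0],) + tuple(
                    b[t] if is_var(t) else t for t in head[1:])
                if ground_head not in derived:
                    derived.add(ground_head)
                    changed = True
    return derived

entailed = forward_chain(facts, rules)
```

Here `entailed` contains the original facts plus the derived ground atom `("about", "p1", "ai")`; in the probabilistic extension below, each such ground atom becomes a node in a Bayesian network.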


Description Logic Programs
(Background)


Restricted expressivity

Many existing DL ontologies fit DLP restrictions.

Reasoning in DLP is decidable.

Reasoning has much lower complexity than DL reasoning in general (in theory and in practice).

Bayesian Description Logic Programs


BDLP program contains:

Set of rules

Set of facts

Rules have the form H | B1, …, Bn: a conjunction of predicates implies some other predicate.

“|” replaces “←” to denote conditional probability.

Each rule has a probability distribution specifying the probability of each state of the head atom given the states of the body atoms.

Each ground atom corresponds to a BN node.
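A small numeric sketch of the conditional-probability idea above (hypothetical atoms and numbers, not taken from the paper): a rule’s table gives P(head = true) for each combination of body truth values, and marginalizing over the body states yields the head atom’s probability.

```python
from itertools import product

# Rule (hypothetical): about(X, ai) | classified_ai(X)
# CPT rows are indexed by the tuple of body-atom truth values.
rule_cpt = {
    (True,):  0.9,   # classifier says "AI" -> head holds with prob. 0.9
    (False,): 0.05,  # small chance the classifier missed the topic
}

def head_probability(cpt, body_probs):
    """P(head) by summing over all body states, assuming independent bodies."""
    total = 0.0
    for states in product([True, False], repeat=len(body_probs)):
        weight = 1.0
        for state, p in zip(states, body_probs):
            weight *= p if state else (1.0 - p)
        total += weight * cpt[states]
    return total

# The classifier believes the document is about AI with probability 0.8.
p = head_probability(rule_cpt, [0.8])  # 0.8*0.9 + 0.2*0.05 = 0.73
```

The independence assumption here is only for the sketch; in the actual framework the ground atoms form a Bayesian network, and standard BN inference handles shared dependencies.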

Example BDLP

Example Bayesian Network


Blue: Ontology 2

Cyan: Learned from Ontology 2

Black & White: Ontology 1

Red arcs: Mappings


Where do Probabilities Come From?


Deterministic ontologies:

true = 1.0

false = 0.0

Probabilistic tools:

Naïve Bayes document categorization

Probabilistic ontology mapping

Subjective estimates:

Critics argue that people are inconsistent in their judgment of probabilities.

Using subjective probabilities is still more accurate than forcing people to use Boolean judgments.
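The first probabilistic source above can be made concrete with a toy Naïve Bayes classifier (invented training data): rather than a hard label, it returns a posterior P(class | document) that can be attached to the generated fact.

```python
from collections import Counter
from math import exp, log

# Tiny invented training corpus: two topics, two documents each.
train = {
    "AI": ["neural networks learning agents", "planning search agents"],
    "DB": ["query optimization indexing", "transaction indexing storage"],
}

vocab = {w for docs in train.values() for d in docs for w in d.split()}
priors = {c: 1 / len(train) for c in train}  # uniform class prior
counts = {c: Counter(w for d in docs for w in d.split())
          for c, docs in train.items()}

def posterior(doc):
    """P(class | doc) with Laplace smoothing, computed in log space."""
    logp = {}
    for c in train:
        total = sum(counts[c].values())
        lp = log(priors[c])
        for w in doc.split():
            lp += log((counts[c][w] + 1) / (total + len(vocab)))
        logp[c] = lp
    z = max(logp.values())
    norm = sum(exp(v - z) for v in logp.values())
    return {c: exp(v - z) / norm for c, v in logp.items()}

probs = posterior("learning agents for search")
# probs["AI"] is the soft label: the probability attached to a
# generated fact such as topic(doc, ai) in the probabilistic program.
```

The posterior replaces the 1.0/0.0 values used for deterministic ontology facts, which is exactly what lets classification uncertainty flow into the combined model.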

Example Query


Query for publications about AI.


Non-ground query.

Two valid groundings.

Query BN for probabilities (IR with ranking).
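The ranked-retrieval step can be sketched as follows (hypothetical atoms and probabilities): the non-ground query has several valid groundings, and the probability the Bayesian network assigns to each ground atom serves as its retrieval score.

```python
def answer_query(query_pred, query_args, atom_probs):
    """Return groundings of the query, ranked by BN probability.
    Uppercase query terms are variables (assumption for this sketch)."""
    hits = []
    for (pred, *args), p in atom_probs.items():
        if pred != query_pred:
            continue
        # Every constant in the query must match the atom's term.
        if all(q == a for q, a in zip(query_args, args) if not q.isupper()):
            hits.append((tuple(args), p))
    return sorted(hits, key=lambda h: -h[1])

# Probabilities a (hypothetical) ground BN assigned to the atoms.
atom_probs = {
    ("about", "p1", "ai"): 1.0,   # stated directly in ontology 1
    ("about", "p2", "ai"): 0.73,  # inferred via the learned classifier
    ("about", "p3", "db"): 0.9,   # does not match the query constant
}

# Non-ground query: about(X, ai) -- two valid groundings, ranked.
ranking = answer_query("about", ("X", "ai"), atom_probs)
# ranking == [(('p1', 'ai'), 1.0), (('p2', 'ai'), 0.73)]
```

Ranking by probability is what turns the logic-programming answer set into an information-retrieval result list.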



Conclusion


Strengths:

Actually explains how Bayesian Networks relate to predicates.

Handles integration (which others do not).

Handles IR.

Weaknesses:

DLPs don’t allow for negation or equivalence.

No measured evaluation.

Size of the model, and therefore of the BN, can be exponential in the size of the KB.

Intractable exact inference in BNs with cycles.


Future work


Learn BLP programs from data.


Prune BN to portion relevant to query.


Approximate probabilistic inference.


Parallel/distributed programming.

Questions