
ICS-FORTH, March 30, 2006

Waking from a Dogmatic Slumber - A Different View on Knowledge Management for DLs

Martin Doerr

Center for Cultural Informatics
Institute of Computer Science
Foundation for Research and Technology - Hellas

Workshop on Semantic Interoperability in e-Science
London, UK
March 30, 2006


“There are no new research challenges in DL. There are only the ones
from 30 years ago we still have not solved” (anonymous, ECDL2005)

Apologies:

I’ll be deliberately provocative and possibly incomplete. Don’t take me too seriously.

What are Digital Libraries (or, more generally, Digital Memories)?

Information systems preserving and providing access to source material and to scientific and scholarly information, such as libraries of publications, experimental data collections, and scholarly and scientific encyclopedic or thematic databases or knowledge bases.


Knowledge Management for DLs


Traditional Use Cases


The traditional library task:

Collect and preserve documents and provide finding aids.

The job is solved when the (one, best) document is handed out. “All you want is in this document”.

Implementing the finding aids:

Assumption: User knows a topic, characterized by a noun, or knows associations of the topic uncorrelated to the problem to be solved (e.g. “organic farming” for “host-parasite studies”).

Retrieval goal: Aggregation of like items / nearly equivalent items, best fit.

Means: Metadata, classification (subject hierarchies and precise match with categorical and factual KOS/authorities), keyword extraction.

Semantic interoperability is limited to the aggregation task: Metadata are mainly homogeneous (DC, MARC etc.), challenge is the matching of terminology (KOS).




Knowledge Management for DLs

Problems



No support to solve a problem,
e.g., What species is this object?

No support to learn from the aggregated source, to retrieve by contexts,
e.g., Which professions did the relatives of van Gogh have?
e.g., Which excavation drawings show the finding of this object?
e.g., What resolution did Galileo’s telescope have when he observed ... (in general: how reliable was a scientific observation, and can we correct the values found?)

No support to integrate complementary information in multiple sources into new insight,
e.g., Who were the clients of van Gogh’s paintings?

No support for cross-disciplinary search,
e.g., Ecology, ethnology and biodiversity. Biology and archaeology.



Knowledge Management for DLs

Grand Challenge

DLs should become integral parts of work environments as sources to find integrated knowledge and produce new knowledge.

But how?

Employing “global networks of knowledge”:

Understanding of user questions, scientific discourse:
expressing user questions in terms of core conceptualizations (relationships, core concepts).

Virtual or physical data and metadata integration:
metadata schema integration as complementary elements of a larger global schema, and not as alternative views of the same content;
mapping/merging of terminology*;
matching factual data (“data cleaning”).

Query systems taking into account the knowledge gaps and varying precision or abstraction levels in the integrated data:
processing of “unknown” values and overlapping concepts in data and queries.

* main focus of the Semantic Web
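One way to picture the processing of “unknown” values and varying precision in queries is a three-valued comparison over dates that are only known to lie within bounds. The sketch below is a minimal illustration, not a method from the talk: the names `FuzzyDate`, `before` and `select_before` and the “undated note” record are hypothetical, and the two dated records simply reuse the day-precision and year-precision dates of the Yalta catalogue entries shown later in this deck.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FuzzyDate:
    """A point in time known only to lie within [earliest, latest].

    A bound of None means that this bound is unknown.
    """
    earliest: Optional[date] = None
    latest: Optional[date] = None


def before(a: FuzzyDate, b: FuzzyDate) -> Optional[bool]:
    """Three-valued test 'a happened before b': True, False or None (unknown)."""
    if a.latest is not None and b.earliest is not None and a.latest < b.earliest:
        return True   # every possible value of a precedes every possible value of b
    if a.earliest is not None and b.latest is not None and a.earliest >= b.latest:
        return False  # no possible value of a precedes any possible value of b
    return None       # the recorded precision does not decide the question


def select_before(records, ref, include_possible=False):
    """Names of records dated before ref; optionally also the undecided ones."""
    wanted = (True, None) if include_possible else (True,)
    return [name for name, when in records.items() if before(when, ref) in wanted]


# Catalogue entries of different precision (the undated note is hypothetical).
records = {
    "declaration": FuzzyDate(date(1945, 2, 11), date(1945, 2, 11)),  # day precision
    "photograph": FuzzyDate(date(1945, 1, 1), date(1945, 12, 31)),   # year precision
    "undated note": FuzzyDate(),                                     # no date recorded
}
march_1945 = FuzzyDate(date(1945, 3, 1), date(1945, 3, 1))

certain = select_before(records, march_1945)
possible = select_before(records, march_1945, include_possible=True)
```

Here `certain` contains only the declaration, while `possible` also keeps the photograph and the undated note, because their recorded precision neither confirms nor excludes a date before March 1945. The design choice is that a knowledge gap widens the answer set on request instead of silently dropping records.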



Knowledge Management for DLs

Grand Challenge

Is that a dream?

“Aren’t digital information and human knowledge too diverse, fuzzy, case-dependent?”

“Is the Semantic Web much further than AI decades before?”

We regard suitable knowledge management as the key.

We distinguish:

1. Core ontologies for “schema semantics”, such as: “part-of”, “located at”, “used for”, “made from”. They are small and rich in relationships that structure information and relate content.

2. Ontologies that are used as “categorical data” for reference and agreement on sets of things, rather than as means of reasoning, such as: “basket ball shoe”, “whiskey tumbler”, “burma cat”, “terramycine”. They do not structure information. They aggregate, more than integrate.

3. Factual background knowledge for reference and agreement as objects of discourse, such as particular persons, places, material and immaterial objects, events, periods, names.
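The three kinds of knowledge distinguished above can be kept apart even in a toy data set. The following sketch is purely illustrative: all object names (“tumbler #1”, “Bar X”) and the functions `aggregate` and `integrate` are hypothetical. It only shows that category terms group like items, whereas the small set of core relationships connects particulars to each other.

```python
# 1. Schema semantics: a small, fixed set of relationships (from the list above).
CORE_RELATIONS = {"part of", "located at", "used for", "made from"}

# 2. Categorical data: terms used to agree on sets of things.
CATEGORIES = {"whiskey tumbler", "glass", "bar"}

# 3. Factual knowledge: particular (hypothetical) items, each with its category.
PARTICULARS = {
    "tumbler #1": "whiskey tumbler",
    "tumbler #2": "whiskey tumbler",
    "Bar X": "bar",
}

# Statements relate particulars (or category terms) through core relationships.
STATEMENTS = [
    ("tumbler #1", "made from", "glass"),
    ("tumbler #1", "located at", "Bar X"),
    ("tumbler #2", "located at", "Bar X"),
]


def aggregate(category):
    """Group like items: all particulars classified under one category term."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    return sorted(p for p, c in PARTICULARS.items() if c == category)


def integrate(relation, target):
    """Follow a core relationship: all subjects related to the target."""
    if relation not in CORE_RELATIONS:
        raise ValueError(f"not a core relationship: {relation}")
    return sorted(s for s, r, o in STATEMENTS if r == relation and o == target)
```

`aggregate("whiskey tumbler")` answers a class query, while `integrate("located at", "Bar X")` answers a question about a particular place, regardless of how the items found there are classified.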




Knowledge Management for DLs

Preconceptions and Solutions

“Libraries should not depend on domain-specific needs. Domains are too many and too diverse. DLs need a generic approach.”

This seduces us to employ only intuitive top-down approaches for generic metadata schemata. As a result, when the imagination is exhausted, research stops.

We need deep knowledge engineering, generalizing in a bottom-up manner from real, specific cases to find the true generic structures across multiple domains. We need interdisciplinary work on real research scenarios.

“Ontologies are huge, messy, idiosyncratic and domain dependent. Mapping is the only generic thing we can do.”

We are transfixed by ontologies used as “categorical data” (term lists), for which this statement is mainly true. We overlook the different character of ontologies describing “schema semantics”.

The latter may pertain to generic classes of discourse, rather than to domains. They are the candidates to find generic relationships that integrate both factual and categorical knowledge in a meaningful way, even for specific applications. We need interdisciplinary work.




Knowledge Management for DLs


Preconceptions and Solutions

“Queries are mainly about classes. The main challenge of information integration is the integration of classes (terms).”

We believe this is not sufficiently supported by empirical studies. Query parameters pertain to universals and particulars.

We need to systematically analyze the workflow of research work and the original research questions in each phase. We need to provide access by factual relationships (Amit Sheth), such as “georeferencing”.

“Manual work is not scalable or affordable. Only fully automated methods have a chance.”

This seduces us to discard the quality of manual, intellectual decisions. Yet billions of people produce content manually. Wikipedia demonstrates that the above is not true.

We need to design the interactive processes and the rewarding of users to massively involve Virtual Communities / Organisations in cataloguing, data cleaning and ontology development. We need semiautomatic, highly distributed algorithms. We need interdisciplinary work.






Knowledge Management for DLs

Do we talk about the same thing?

“We need more reasoning!”

This is true. But what sort of reasoning? And before any reasoning can be done, data must be connected in a “global network of knowledge”. We must first clarify:

Do we talk about the same thing?

Requisites for a global network of knowledge:

1. A sufficiently generic global model (core ontology with the relevant relationships).

2. Methods to populate the network: knowledge extraction / data transformation.

3. Theories of negotiating identifier equivalence across contexts (databases).

4. Algorithms for global, massive, distributed, semiautomatic detection of identifier equivalence (data cleaning) across contexts and to curate referential integrity in order to create, maintain and improve the consistency of global networks of knowledge as a continuous process (not making yet another database).

And only then can we do advanced reasoning and intelligent query processing ...



Knowledge Management for DLs

Example: the core ontology ISO21127

The CIDOC Conceptual Reference Model (ISO/FDIS 21127)

is a core ontology describing the underlying semantics of data schemata and structures from all museum disciplines and archives. It is now being merged with IFLA FRBR concepts.

It is the result of long-term interdisciplinary work and agreement.

In essence, it is a generic model of recording of “what has happened” on a human scale, i.e. a class of discourse.

It can generate huge, meaningful networks of knowledge by a simple abstraction: history as meetings of people, things and information.

It bears surprises: more effective metadata structures and linking schemes can be created from it.



Knowledge Management for DLs

Example: History as Meetings of People…

[Figure: space-time diagram (axes S, t). Labels: Caesar’s mother, Caesar, Brutus, Brutus’ dagger, coherence volume of Caesar’s birth, coherence volume of Caesar’s death.]


Knowledge Management for DLs

Example: …Things and Information

[Figure: space-time diagram (axes S, t). Labels: Marathon, Athens, runner, other Soldiers, 1st Athenian, 2nd Athenian, “Victory!!!” (twice), coherence volume of the battle of Marathon, coherence volume of first announcement, coherence volume of second announcement.]


Knowledge Management for DLs

Example: Meetings and Metadata

Type: Text
Title: Protocol of Proceedings of Crimea Conference
Title.Subtitle: II. Declaration of Liberated Europe
Date: February 11, 1945.
Creator: The Premier of the Union of Soviet Socialist Republics
Creator: The Prime Minister of the United Kingdom
Creator: The President of the United States of America
Publisher: State Department
Subject: Postwar division of Europe and Japan

The following declaration has been approved:

The Premier of the Union of Soviet Socialist Republics, the Prime Minister of the United Kingdom and the President of the United States of America have consulted with each other in the common interests of the people of their countries and those of liberated Europe. They jointly declare their mutual agreement to concert …. and to ensure that Germany will never again be able to disturb the peace of the world ……

[Figure labels: Documents, Metadata, About…]


Knowledge Management for DLs

Example: Meetings and Metadata

Type: Image
Title: Allied Leaders at Yalta
Date: 1945
Publisher: United Press International (UPI)
Source: The Bettmann Archive
Copyright: Corbis
References: Churchill, Roosevelt, Stalin

[Figure labels: Photos, Persons, Metadata, About…]


Knowledge Management for DLs

Example: The ISO21127 Solution

[Figure: CIDOC CRM graph. Nodes: E31 Document “Yalta Agreement”; E7 Activity “Crimea Conference”; E65 Creation Event *; E38 Image; E52 Time-Span “February 1945”; E52 Time-Span “11-2-1945”; E53 Place 7012124; E39 Actor (three times). Property labels: P86 falls within, P81 ongoing throughout, P82 at some time within.]
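To make the idea of this slide concrete, the two Yalta catalogue records can be written as a small graph of statements in which both information objects end up connected to the same event and the same actors. The sketch below is a reading aid under stated assumptions. The node labels come from the slides, but the arrows of the original diagram are not recoverable here, so the wiring in `TRIPLES` is a guess. The property names P4, P7, P14, P67 and P94 are CIDOC CRM properties that do not appear on the slide itself. The helper functions `build_index`, `reachable` and `shared_context` are hypothetical.

```python
from collections import defaultdict

# (subject, property, object) statements. The wiring is an illustrative assumption.
TRIPLES = [
    ("Yalta Agreement", "type", "E31 Document"),
    ("Allied Leaders at Yalta", "type", "E38 Image"),
    ("Crimea Conference", "type", "E7 Activity"),
    ("Creation of Yalta Agreement", "type", "E65 Creation Event"),
    ("Creation of Yalta Agreement", "P94 has created", "Yalta Agreement"),
    ("Creation of Yalta Agreement", "P4 has time-span", "11-2-1945"),
    ("11-2-1945", "P86 falls within", "February 1945"),
    ("Crimea Conference", "P4 has time-span", "February 1945"),
    ("Crimea Conference", "P7 took place at", "7012124"),
    ("Allied Leaders at Yalta", "P67 refers to", "Crimea Conference"),
]
for actor in ("Churchill", "Roosevelt", "Stalin"):
    TRIPLES.append(("Crimea Conference", "P14 carried out by", actor))
    TRIPLES.append(("Creation of Yalta Agreement", "P14 carried out by", actor))
    TRIPLES.append(("Allied Leaders at Yalta", "P67 refers to", actor))


def build_index(triples):
    """Undirected adjacency over all statements except class membership."""
    index = defaultdict(set)
    for subject, prop, obj in triples:
        if prop != "type":
            index[subject].add(obj)
            index[obj].add(subject)
    return index


def reachable(index, start, max_hops):
    """All nodes that can be reached from start in at most max_hops steps."""
    seen = {start}
    frontier = {start}
    for _ in range(max_hops):
        frontier = {n for node in frontier for n in index[node]} - seen
        seen |= frontier
    return seen - {start}


def shared_context(index, a, b, max_hops=2):
    """Real-world nodes through which two information objects are connected."""
    return reachable(index, a, max_hops) & reachable(index, b, max_hops)


index = build_index(TRIPLES)
shared = shared_context(index, "Yalta Agreement", "Allied Leaders at Yalta")
```

With this wiring, `shared` contains the three actors and the creation event, so the text record and the photograph are linked through what they are about, although their flat metadata have no field value in common. Raising `max_hops` to 3 also brings in the Crimea Conference and the time-span “February 1945”.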


Knowledge Management for DLs

CIDOC CRM Top-level Entities

[Figure: diagram of the top-level entities. Nodes: E39 Actors, E55 Types, E28 Conceptual Objects, E18 Physical Thing, E2 Temporal Entities, E53 Places, E52 Time-Spans. Link labels: participate in, affect or / refer to, refer to / refine, location, at.]




The CIDOC CRM

A Classification of its Relationships

Identification of real world items by real world names.

Classification of real world items.

Part-decomposition and structural properties of Conceptual & Physical Objects, Periods, Actors, Places and Times.

Participation of persistent items in temporal entities. This creates a notion of history: “world-lines” meeting in space-time.

Location of periods in space-time and physical objects in space.

Influence of objects on activities and products and vice-versa.

Reference of information objects to any real-world item.


The CIDOC CRM

Example: The Temporal Entity Hierarchy


Knowledge Management for DLs

Hypertext is wrong: Documents contain links!

[Figure: Integration by Factual Relations. Layers: Documents in Digital Libraries; CIDOC CRM Core Ontology; real world nodes (KOS). Nodes: Ethiopia, Hadar, Johanson's Expedition, Discovery of Lucy, Lucy, AL 288-1, Donald Johanson, Cleveland Museum of Natural History. Other labels: Deductions; Linking documents by co-reference; Primary link corresponding to one document; Instance of.]


Knowledge Management for DLs

Identifier Equivalence

Query “Friends of a Friend”

[Figure: two sources (Source 1, Source 2), each with its own content.]

1. query
input: “Martin”
Read output: find “Kostas”, guess “Κώστας”

2. query
input: “Κώστας”
output: “George”


Knowledge Management for DLs

Curating Identifier Equivalence

[Figure labels: Source 1, Source 2, Content, local ids, Authority service, Link table, match, find equivalence, id, Dynamic link, Join.]

Join across sources by transitivity of identifier equivalence

query
input: “Martin”
output: “George”
(“Κώστας” / “Kostas”)
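The join across sources by transitivity of identifier equivalence can be sketched with a union-find structure playing the role of the link table. This is a minimal illustration under assumptions, not the algorithm of the talk: the class `LinkTable`, the functions `friends` and `friends_of_friends`, the source names `src1` and `src2`, and the authority identifier `auth:person-1` are all hypothetical. Only the names “Martin”, “Kostas”, “Κώστας” and “George” come from the slides.

```python
class LinkTable:
    """Curated identifier equivalences, kept as a union-find forest."""

    def __init__(self):
        self.parent = {}

    def find(self, ident):
        """Representative of the equivalence class that ident belongs to."""
        self.parent.setdefault(ident, ident)
        while self.parent[ident] != ident:
            # Path halving keeps later lookups short.
            self.parent[ident] = self.parent[self.parent[ident]]
            ident = self.parent[ident]
        return ident

    def declare_same(self, a, b):
        """Record that two identifiers denote the same real-world item."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def same(self, a, b):
        return self.find(a) == self.find(b)


# Each source maps a local identifier to the local identifiers of friends.
SOURCE_1 = {("src1", "Martin"): [("src1", "Kostas")]}
SOURCE_2 = {("src2", "Κώστας"): [("src2", "George")]}
SOURCES = [SOURCE_1, SOURCE_2]


def friends(sources, links, ident):
    """Friends recorded in any source under an identifier equivalent to ident."""
    found = []
    for source in sources:
        for local_id, friend_ids in source.items():
            if links.same(local_id, ident):
                found.extend(friend_ids)
    return found


def friends_of_friends(sources, links, ident):
    """Two-step query that joins across sources through the link table."""
    result = []
    for friend in friends(sources, links, ident):
        result.extend(friends(sources, links, friend))
    return result


links = LinkTable()
before_curation = friends_of_friends(SOURCES, links, ("src1", "Martin"))

# Each source matches its local id against the (hypothetical) authority record.
# The equivalence of the two local ids then follows by transitivity.
links.declare_same(("src1", "Kostas"), ("authority", "auth:person-1"))
links.declare_same(("src2", "Κώστας"), ("authority", "auth:person-1"))
after_curation = friends_of_friends(SOURCES, links, ("src1", "Martin"))
```

Before any equivalence is recorded the query returns nothing, and the user has to guess that “Kostas” and “Κώστας” are the same person. After both local identifiers have been matched to the same authority record, the same query returns George’s entry from the second source. Neither source is changed. Only the link table grows, which fits the view of curation as a continuous process.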



Knowledge Management for DLs

Conclusions

It is feasible to create effective, sustainable, large-scale networks of knowledge:

The CRM and its extensions seem to have the power to integrate historical knowledge in Archives, Libraries and Museums.

The CRM can be applied in surprisingly simple forms (CRM Core).

Is the CIDOC CRM more generally applicable to e-science?

Humanities collect factual knowledge. The CRM is a model of factual relationships at first.

Sciences collect categorical knowledge. But we overlook the record of experimental data, which justifies this knowledge and is by far larger than the resulting categorical knowledge.

Descriptive sciences already produce both categorical and factual knowledge.

Thesis:

The evaluation of, access to and reuse of the record of experimental data of the sciences has a historical dimension: who measured what, when, where, under which circumstances, with which methods, which knowledge, which care?

We overestimate the relevance of domain categories, and completely overlook the relevance of the historical relationships in our scientific reasoning (I am a physicist).

Try the CRM as a starting model for e-science, and see what else is needed.



Knowledge Management for DLs

Conclusions

If we rethink old positions, we will find surprising new answers to

“..an information model for digital libraries that intentionally moves 'beyond search and access’, without ignoring these functions, and facilitates the creation of collaborative and contextual knowledge environments.”
(C. Lagoze, D-Lib Magazine 2005)

But:

We need a massive investment in understanding and generalizing the intellectual processes and original research questions in interdisciplinary work.

We have to do research in dynamic collaborative knowledge organization forms, formal processes and algorithms that converge to higher stages of knowledge integration.

The large networks of integrated knowledge to come will need continuous maintenance with new, specific social organisation forms and GRID-like resource access, and they may look very different from our current systems…

(This is again a 30-year-old challenge. Are we closer now?)