A Logical Model for Digital Archives





By Rathachai Chawuthai










Examination Committee:
Prof. Vilas Wuwongse (Chairperson)
Assoc. Prof. Vatcharaporn Esichaikul (Member)
Dr. Raphael Duboz (Member)













Nationality: Thai
Previous Degree: Bachelor of Computer Engineering, King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand











Scholarship Donor: His Majesty the King’s Scholarship











Asian Institute of Technology
School of Engineering and Technology
Thailand
May 2012


ACKNOWLEDGEMENTS

I would like to express my sincere gratitude to Prof. Vilas Wuwongse, my thesis advisor, for his constant guidance, insightful ideas, and valuable suggestions and instruction throughout this thesis study.

I would also like to take this opportunity to express my gratitude to Prof. Hideaki Takeda, my thesis co-advisor at NII, Japan, for his precious suggestions and practical supervision during my internship in Tokyo.

I sincerely thank Assoc. Prof. Vatcharaporn Esichaikul and Dr. Raphael Duboz for their beneficial comments and suggestions towards improving this thesis.

I would also like to acknowledge the financial support from His Majesty the King’s Scholarship towards my study at AIT, and I am grateful to the NII International Internship Program, which provided a great opportunity for me to study in Tokyo.

During this study, I have collaborated with many classmates and seniors in CSIM, colleagues and consultants at NII, and co-workers at Thomson Reuters, for whom I have great regard, and I wish to extend my warmest thanks to all those who have shared their insightful knowledge with me.

Finally, I wholeheartedly thank my family, my friends, and all others whose names I did not mention for supporting me throughout my study at AIT.


Rathachai Chawuthai




ABSTRACT

[TBD]


Keywords: Digital preservation, Digital archives


TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
1 INTRODUCTION
1.1 Background
1.2 Statement of problem
1.3 Objectives
1.4 Scope of study
2 LITERATURE REVIEW
2.1 Digital preservation
2.2 Species and taxonomy evolution
2.3 Reference and guideline of digital archives
2.4 Underlying Community Knowledge
2.5 Related works
2.6 Semantic technologies
2.7 Third party tools
3 METHODOLOGY
3.1 Overview
3.2 Overview of Underlying Common Community Knowledge
3.3 Evolution of Contextual Knowledge
3.4 Information model
3.5 Application profile for underlying community knowledge
3.6 State contextual knowledge of a concept from information model
3.7 Link of concept from information model
3.8 CKA ontology
4 IMPLEMENTATION
4.1 Summary of system requirements
4.2 Features
4.3 System architecture
4.4 Interoperability of digital archive systems
5 RESULT AND EVALUATION
5.1 Result summary
5.2 Evaluation and discussion
6 CONCLUSIONS AND FUTURE WORKS
REFERENCE


LIST OF FIGURES

Figure 1: Information Package Concepts and Relationships
Figure 2: OAIS Functional Entities
Figure 3: The PREMIS Data Model
Figure 4: Modeling Users, Profiles, Modules and Dependencies
Figure 5: A RDF Graph
Figure 6: Example of RDF graph for two continuous sentences
Figure 7: Linked Open Data Cloud Diagram (as of September 2010)
Figure 8: Sesame’s architecture
Figure 9: A logical model of digital archives
Figure 10: A diagram of concept evolution
Figure 11: RDF model for simple concept evolution
Figure 12: A diagram of relation evolution
Figure 13: An application profile for underlying community knowledge
Figure 14: Example of mechanism of the inference rules
Figure 15: Example of mechanism of the inference linked concept
Figure 16: Classes and properties hierarchies of ontology CKA
Figure 17: A system architecture plan
Figure 18: A realization of the system architecture
Figure 19: Example document process for Master thesis management
Figure 20: Digital resource description on Alfresco
Figure 21: List of concepts which are available in the digital resource
Figure 22: Contextual knowledge of the concept and relevant concepts
Figure 23: Detail about concept evolution
Figure 24: A UI of community knowledge’s metadata definition
Figure 25: A diagram of interoperability between UCKs


CHAPTER 1
INTRODUCTION

1.1 Background

In the real world, every organization, for instance a business, a government agency, or an institute, produces many documents every day in order to record and represent evidence of organizational data, information, and knowledge (Buckland, 1997). Some of these documents may be damaged or lost because of poor document management.

Nowadays, the shift to digital technology has changed how organizational knowledge is recorded, from traditional physical formats to digital resources stored on electronic media (Huber, 1990). To keep digital resources over the long term, an organization needs strategies that ensure long-term access to, and error-free storage of, digital information. Fortunately, much research and many solutions support the sustainable rendering of a digital file over time (Yuan & Banach, 2011).

Even when a digital object can be rendered correctly, it becomes valueless if no one can understand it as originally intended. Preserving the interpretation of a digital resource is therefore a new challenge in the next step of digital archives (Flouris & Meghini, 2007).

A digital archive is a collection of historical records of digital resources. In general, an archival resource is permanently preserved in order to ensure that it is accessible over time. Many capable digital preservation approaches are available, including reference guidelines, metadata standards, and systems such as OAIS [1], PREMIS [2], and DuraSpace [3]. They help ensure that the bit streams of digital resources are preserved and can be rendered in their original form in the future (Rhys-Lewis, 2000).

However, digital preservation is not limited to the physical level; it also covers a logical level, which preserves the interpretation of digital information for the future (Flouris & Meghini, 2007).

A great challenge is how to enable a consumer to understand concepts as originally intended when the consumer’s background knowledge, which depends on a specific time and society, differs from that of the information provider. One solution is to archive contextual knowledge together with digital resources. Flouris and Meghini introduced a formal theory of digital preservation that integrates digital information with underlying community knowledge. The theory also guides how to support the original interpretation by serving proper contextual knowledge to a consumer. Currently, the theory is put to practical use by the Framework Programme projects CASPAR [4] and SHAMAN [5].




[1] OAIS: The Open Archival Information System is an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community (CCSDS, 2003).
[2] PREMIS: PREservation Metadata: Implementation Strategies, an international working group concerned with developing metadata for use in digital preservation (PREMIS, 2004).
[3] DuraSpace: An open technology that provides long-term, durable access to digital resources (Duraspace, 2009).
[4] CASPAR: Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval has built a framework to support the end-to-end preservation life cycle for digital information (CASPAR, 2005).


In addition to preserving contextual knowledge, linked data helps in interpreting a concept (Mulwad, Finin, Syed, & Joshi, 2010). Moreover, linked archival information is a key to open access, which is recommended by the best practices for digital archiving (Hodge, 2004). Many digital libraries, museums, and institutional repositories give priority to exchanging knowledge and establishing relationships among digital resources across repositories, as in the Europeana [6] project.

In conclusion, besides maintaining a digital file, preserving contextual knowledge supports the original understanding of that file. Moreover, linked data contributes to the understanding of a concept. In view of that, this thesis introduces a logical model of digital archives that preserves contextual knowledge together with linked concepts.

1.2 Statement of problem

In general, a concept often lacks a single interpretation. For example, the name concept and the taxonomy concept of the same biological data are recorded differently by several institutes (Mallet, 2007). As a result, biologists find it difficult to discover more knowledge from archival information across communities. Preserving a concept’s interpretation is a proper solution to this issue. However, there are many challenges in sustaining interpretation in practice.

Firstly, differences in background knowledge between a consumer and a provider create a gap in interpretation. Identifying associations between bodies of community knowledge is difficult because each designated community works with its own scheme for recording its own knowledge.

Secondly, a consumer has to link relevant concepts in order to interpret accurately. Because each designated community defines the terms of its concepts independently, linking concepts across designated communities is rarely possible.

Lastly, the two problems above indicate that the actual root cause is poor interoperability among designated communities. For that reason, the lack of common knowledge among underlying community knowledge becomes the fundamental challenge in understanding concepts.

1.3 Objectives

• To develop a theory for digital archives at a logical level.
• To design an information model representing contextual knowledge.
• To develop a prototype system in order to prove the theory.






[5] SHAMAN: Sustaining Heritage Access through Multivalent Archiving develops a digital preservation framework for analyzing, ingesting, managing, accessing and reusing information objects and data across libraries and archives (SHAMAN, 2008).
[6] Europeana: a platform for knowledge exchange between librarians, curators, archivists and museums (Europeana, 2008).


1.4 Scope of study

• Develop a theory by extending the existing theory “Steps towards a theory of information preservation” (Flouris & Meghini, 2007).
• Design the “Contextual Model” of “Underlying Common Community Knowledge” and “Underlying Community Knowledge”, using linked metadata to model contextual knowledge as knowledge evolution.
• Build an archival system which can preserve digital resources along with community knowledge evolution, provide the context of a concept for interpretation, and establish links to other digital resources across digital archives. In this thesis, the system is developed for linking the evolution of biological data among repositories.


CHAPTER 2
LITERATURE REVIEW

2.1 Digital preservation

Digital preservation is the process of managing digital information to ensure that it remains accessible and readable over time. In practice, digital preservation distinguishes two types of digital object: digitized objects and born-digital objects. A digitized object is a digital file scanned from a physical document, whereas a born-digital object is created by computer software in the first place. Many organizations, such as academic institutes, governments, companies, and news providers, pay attention to digital preservation because they need to keep important documents and media permanently. In practice, they use storage technology to preserve their digital resources. Technically, a digital object consists of a sequence of voltage signals called bits, which is saved on electronic media such as magnetic tape, CD, and DVD. The bit streams can therefore be preserved only as long as the media last; at that point, the data must be transferred to newer technology. Preserving a digital object is thus a continuous process rather than a one-time activity (Yuan & Banach, 2011).

Ideally, a preservation process should be considered at three levels (Flouris & Meghini, 2007).

Firstly, bit preservation ensures that the bit stream of a digital object remains unchanged on the electronic media. In practice, this is done by data migration, that is, by periodically refreshing the electronic media.
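As an illustration of bit preservation, the following minimal sketch copies a file to new media and checks that the bit stream is unchanged by comparing checksums. It is not taken from the cited work: the function names `sha256_of` and `refresh` are hypothetical, and the choice of SHA-256 as the checksum is an assumption made for this example.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, target: Path) -> str:
    """Copy a file to new media and confirm the bit stream is unchanged.

    Hypothetical helper: a real archive would also record the checksum
    as preservation metadata.
    """
    before = sha256_of(source)
    shutil.copyfile(source, target)
    after = sha256_of(target)
    if before != after:
        raise IOError(f"bit stream changed while refreshing {source}")
    return after
```

Because media refreshment is a recurring activity, such a check would be repeated each time the data are moved to a new carrier.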

Secondly, data preservation ensures that the digital object is rendered authentically. The process concerns how to construct a file from the bit stream, how to access the file, and how to render the file. Currently, there are two solutions: the first uses an emulator to render the original file, and the other converts the original bit stream into a form that newer software can read.

To preserve bits and data, the people involved in this activity have to consider and plan for common issues such as preservation policy, preservation strategy, content policy, and rights and agreements.

Preservation policy focuses on file format and sustainability. A file format has to be well known and likely to remain in use in the long term, such as AVI, JPEG, MPEG, and TIFF. Sustainability defines the preservation duration: short-term, medium-term, or long-term. Short-term preservation guarantees accessibility of digital objects for a short period before technology changes. Medium-term preservation guarantees accessibility beyond a change of technology, whereas long-term preservation plans for accessibility of digital objects indefinitely.


Alongside preservation policy, preservation strategy makes sure that a digital object will remain alive and accessible in the future. It therefore requires plans for electronic media refreshment, backup systems, secure storage, disaster management, migration, and emulation.

Moreover, the migration process generally has to copy or remodel intellectual property. Hence, it also needs to take into account rights and agreements from the intellectual property owner or custodian.

Lastly, information preservation is a challenging idea in digital preservation. It mainly focuses on how to preserve the interpretation of digital information over time, so that a future consumer can understand the original digital information in the same way as the information provider does.

Information preservation focuses on how to preserve the contextual knowledge of a concept. The term “concept” can be viewed as a unit of thought, which is subjective (SKOS, 2009). The notion of a concept can present the conceptual or intellectual structure of organizational knowledge. Moreover, it can refer to specific ideas or meanings established within a knowledge organization system. Therefore, preserving the contextual knowledge of a concept can help a reader understand the original information in a digital archive.

2.2 Species and taxonomy evolution

More than 1.4 million species throughout the world are distinguished and known through methodical classification (taxonomy) and appropriate naming, which organize knowledge of organisms. A principle of taxonomy is to categorize all individual specimens by their features and behaviors (Winston, 1999). Because of the development of science and the variety of classification methods, biologists may record the same creature with different names and classifications. Thus, knowledge of species and taxonomy lacks a single interpretation of biological reality across institutes and over time (Mallet, 2007).

Species grouped together at a low or high level of a taxonomy share similar characteristics, such as DNA sequence, organs, and living behavior (Darwin, 1859; Winston, 1999). Thus, a biologist can estimate significant details of one species from existing knowledge of a well-known species in the same hierarchy. Although many publications describe species nowadays, the variety of names and classifications of the same species is a barrier to relating knowledge about species. Consequently, naming concepts and taxonomic concepts are key to the interpretation of biological information (Ytow, Morse, & Roberts, 2001).

However, biological naming concepts and taxonomic concepts change constantly because of advances in biological science and differences between taxonomic methods. In the case of advances in biological science, when a taxonomist learns more about a creature, the taxonomist may give it a new name and/or reclassify it. For example, the snowy owl named “Nyctea scandiacus” was renamed “Bubo scandiacus” because its higher rank (genus) “Nyctea” was merged into the genus “Bubo”. The taxonomist then reclassified the snowy owl under the genus “Bubo” (Wink & Heidrich, 1999). As a result, the name of the snowy owl was changed according to the naming convention for species (ICZN, 1997).
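The snowy owl example can be written down as a small record of name changes. The sketch below is only an illustration under assumptions: the `NameChange` record and the `current_name` function are hypothetical names introduced here, not part of any taxonomic standard or of the model proposed later in this thesis. It shows how a consumer who only knows an older name could be led to the name currently in use.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class NameChange:
    """One recorded replacement of a species name (hypothetical record)."""
    old_name: str
    new_name: str
    reason: str
    source: str


def current_name(name: str, changes: List[NameChange]) -> str:
    """Follow recorded name changes until no newer name exists."""
    successor: Dict[str, str] = {c.old_name: c.new_name for c in changes}
    seen = {name}
    while name in successor:
        name = successor[name]
        if name in seen:  # guard against a cyclic record
            raise ValueError("cyclic name history")
        seen.add(name)
    return name


changes = [
    NameChange(
        old_name="Nyctea scandiacus",
        new_name="Bubo scandiacus",
        reason="genus Nyctea merged into genus Bubo",
        source="Wink & Heidrich, 1999",
    ),
]
print(current_name("Nyctea scandiacus", changes))  # prints "Bubo scandiacus"
```

Keeping the reason and the source with each change is what lets a later reader recover why the name differs, not only what the new name is.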


Apart from advances in science, the multiplicity of taxonomic methods causes a creature to be interpreted differently, because each institute may record a species concept on its own without reference to the others. Thus, some specimens may be recorded repeatedly by different institutes with different names and classifications. For instance, Roelofs recorded a bug named “Auletes fumigatus” under the genus “Auletes”, but Voss recorded the same bug with the name “Auletobius fumigatus” under the genus “Auletobius”.

These examples demonstrate that the same specimen is examined by different methods and taxonomies, depending on the perspectives of biologists. Hence, it becomes unclear how to interpret biological data correctly (Mallet, 2007). The name and taxonomy concepts of a species carry knowledge that depends on the naming convention and classification methodology. Thus, changes to biological data, such as name replacement, synonymy, homonymy, misspelling, merging, splitting, reclassification, promotion, and demotion (Franz & Peet, 2008), can lead to misinterpretation of the data.

In order to understand biological data as originally intended, contextual knowledge such as the name, changes of name, the taxonomy, and changes of taxonomy is needed together with the data. Moreover, the contextual knowledge helps a consumer retrieve more knowledge by linking relevant biological concepts. Preserving the interpretation of biological concepts is a new challenge for existing models of digital archives. Therefore, this thesis applies a logical model of digital archives to the evolution of biological concepts.

2.3 Reference and guideline of digital archives

The challenge of preserving digital resources for the future concerns the long term. The need for strategies and solutions that are common to libraries, archives, and museums has become increasingly urgent. Every digital archive should therefore implement common guidelines or standards so that systems can work together. Fortunately, many capable digital archive standards exist, including reference guidelines and metadata standards. This thesis follows the reference guideline OAIS, the metadata standards PREMIS and METS, and the interoperability protocol OAI-PMH, which are acknowledged by leading institutions such as ISO, the Library of Congress, and the Open Archives Initiative.

2.3.1 OAIS (Open Archival Information System)

OAIS (Open Archival Information System) is an ISO standard that serves as a reference guideline for archival information systems; it has been discussed, proposed, and recommended by RLG (Research Libraries Group) and OCLC (Online Computer Library Center) since 2003 (CCSDS, 2003). The main objective of OAIS is to model a system for the long-term preservation of archival information represented in digital format. OAIS serves three key players: a provider, who submits digital information to the system; a consumer, who queries and retrieves digital information from the system; and the management, which manages the system and sets its preservation strategies. Significantly, an archival digital object in the system has to be formed into a package, which is required by the main functions of OAIS: ingest, store, and access. In order to build a system, OAIS suggests designing and implementing the information model as well as the functional model.

Firstly, the information model of OAIS defines three types of package: the Submission Information Package (SIP), which is entered into the system; the Archival Information Package (AIP), which is preserved in the system; and the Dissemination Information Package (DIP), which is distributed from the system. Each package type is based on the same concept of the information package. The package model consists of four elements, Content Information, Preservation Description Information (PDI), Packaging Information, and Descriptive Information, which are illustrated in Figure 1.


Figure 1: Information Package Concepts and Relationships

Content Information contains the digital resource that needs to be preserved, such as text, image, video, or audio. In addition, Content Information contains the metadata of Representation Information (RI), which presents the technical information, such as the digital encoding, needed to interpret the bit stream as readable digital data. The responsibility of RI can be extended to cover representation information about the contextual knowledge relevant to the digital information. Apart from Content Information, PDI contains preservation metadata that tells humans or machines how to prepare the environment for accessing, rendering, or otherwise acting on the digital resource. Thus, PDI covers specific features of digital preservation, such as content, fixity, reference, provenance, and context. Packaging Information then wraps both Content Information and PDI so that they are stored as one object. The last element is Descriptive Information. Because searching directly in the packaged information performs poorly, Descriptive Information presents metadata containing the full set of attributes that must be searchable, and it is indexed to improve search performance.
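The four elements of the information package can be pictured as a nested data structure. The following sketch is a simplified illustration, not the OAIS specification itself: the class names, the field names, and the helper functions `make_package` and `find_by_title` are assumptions introduced for this example, and SHA-256 is assumed as the fixity checksum.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContentInformation:
    data_object: bytes                    # the digital resource to preserve
    representation_info: Dict[str, str]   # what is needed to interpret the bits


@dataclass
class PreservationDescription:
    reference: str    # identifier of the content
    provenance: str   # where the content came from
    context: str      # how it relates to its environment
    fixity: str       # checksum protecting against undocumented change


@dataclass
class InformationPackage:
    content: ContentInformation
    pdi: PreservationDescription
    packaging_info: Dict[str, str]        # how content and PDI are bound together
    descriptive_info: Dict[str, str] = field(default_factory=dict)  # searchable attributes


def make_package(identifier: str, data: bytes, media_type: str,
                 provenance: str, context: str, title: str) -> InformationPackage:
    """Assemble the four elements into one package (illustrative helper)."""
    content = ContentInformation(data, {"media_type": media_type})
    pdi = PreservationDescription(
        reference=identifier,
        provenance=provenance,
        context=context,
        fixity=hashlib.sha256(data).hexdigest(),
    )
    return InformationPackage(
        content=content,
        pdi=pdi,
        packaging_info={"layout": "single-object"},
        descriptive_info={"identifier": identifier, "title": title},
    )


def find_by_title(packages: List[InformationPackage], word: str) -> List[str]:
    """Search only Descriptive Information, never the packaged content."""
    word = word.lower()
    return [p.pdi.reference for p in packages
            if word in p.descriptive_info.get("title", "").lower()]
```

The search helper reads only `descriptive_info`, which mirrors the reason given above for keeping Descriptive Information separate from the packaged content.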


Secondly, the functional model of OAIS consists of six key functions as follows:

• Ingest accepts a SIP from a provider, verifies the SIP, and generates an AIP for archival storage.
• Archival Storage stores and preserves data to ensure that the data remain accessible despite the constraints of media and security, and provides capabilities for disaster recovery.
• Data Management focuses on the administration tasks of database management.
• Administration is responsible for system tasks such as negotiating submission agreements with producers, auditing submissions to ensure that they meet standards, maintaining the configuration of system hardware and software, and overseeing the day-to-day governance of the other OAIS functional entities. The functional entities, information packages, and users are presented in Figure 2: OAIS Functional Entities.
• Preservation Planning monitors the environment of OAIS, such as obsolete hardware and software, recommends preservation activities to users based on the results of preservation monitoring, develops preservation strategies and standards for future changes of technology, and customizes package templates in order to serve the migration goal.
• Access provides a single user interface through which consumers browse, search, and access. The function also generates a DIP from an AIP and responds to consumer requests.
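The flow of packages through the Ingest, Archival Storage, Data Management, and Access functions can be sketched as a small in-memory archive. This is a toy illustration under stated assumptions: the `Package` and `Archive` classes and their method names are hypothetical, Administration and Preservation Planning are left out, and a real OAIS implementation would use durable storage and a proper database.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Package:
    kind: str                  # "SIP", "AIP" or "DIP"
    identifier: str
    payload: bytes
    metadata: Dict[str, str]


class Archive:
    """Toy archive covering four of the six OAIS functions."""

    def __init__(self) -> None:
        self._storage: Dict[str, Package] = {}       # Archival Storage
        self._index: Dict[str, Dict[str, str]] = {}  # Data Management

    def ingest(self, sip: Package) -> str:
        """Ingest: verify the SIP and generate an AIP from it."""
        if sip.kind != "SIP" or not sip.payload:
            raise ValueError("only a non-empty SIP can be ingested")
        metadata = dict(sip.metadata)
        metadata["fixity"] = hashlib.sha256(sip.payload).hexdigest()
        aip = Package("AIP", sip.identifier, sip.payload, metadata)
        self._storage[aip.identifier] = aip
        self._index[aip.identifier] = metadata
        return aip.identifier

    def search(self, key: str, value: str) -> List[str]:
        """Access: look up identifiers through the metadata index."""
        return sorted(i for i, m in self._index.items() if m.get(key) == value)

    def access(self, identifier: str) -> Package:
        """Access: check fixity, then generate a DIP from the stored AIP."""
        aip = self._storage[identifier]
        if hashlib.sha256(aip.payload).hexdigest() != aip.metadata["fixity"]:
            raise IOError(f"fixity check failed for {identifier}")
        return Package("DIP", aip.identifier, aip.payload, dict(aip.metadata))
```

A provider would call `ingest` with a SIP, and a consumer would call `search` followed by `access` to receive a DIP; the AIP itself never leaves the archive.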

Figure 2: OAIS Functional Entities

2.3.2 PREMIS (Preservation Metadata: Implementation Strategies)

PREMIS (Preservation Metadata: Implementation Strategies) (PREMIS, 2004) is a working group which defines metadata for digital preservation. The group works under the Library of Congress (LOC) [7] in the United States of America. PREMIS proposes preservation metadata in the form of a Data Dictionary, represented as an XML schema [8] to support implementation of the Data Dictionary in digital archival systems. The Data Dictionary is designed on top of the information model of OAIS, which specifies content, fixity, reference, provenance, and context. The PREMIS Data Dictionary and its schema consist of entities and semantic units. Entities are things that are considered important in the context of digital preservation: intellectual entities, objects, events, agents, and rights. Semantic units are the properties of each entity that an archival system needs to know and should be able to distribute to other systems. The PREMIS data model is shown in the figure below.

[7] Research library of the United States Congress.
[8] XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself.


Figure 3: The PREMIS Data Model

Intellectual Entities: content that can be described as a unit, such as books, articles, pictures, maps, and videos. However, this entity is not fully described in the PREMIS Data Dictionary because it can be described by another metadata standard such as Dublin Core [9].

Objects: concrete units of intellectual entities in digital form, such as “SemanticWebProgramming.pdf”. Objects have three types: file, representation, and bit stream. A file is a construction of bits that can be rendered by itself. A representation is a collection of many files, such as a web page that contains HTML files, image files, JavaScript files, and CSS files. A bit stream is a sequence of bits that cannot be rendered separately; it needs to work with other bit streams, such as an audio stream in a movie. The semantic units of object entities include unique identifier, fixity information, size, format, original name, creation, inhibitors, software and hardware environment, path to object, digital signature, and relationship.

Agents: people, organizations, or software involved with digital objects. Normally, the semantic units of agents are unique identifier, name, and type. However, another metadata standard can also be used for agents, for example FOAF, vCard, or eduPerson.

Events: activities performed by agents on digital objects in order to carry out preservation actions, for example ingestion, migration, and auditing. The result or outcome of an event must be recorded as well. The event entity contains semantic units such as unique identifier, type, date and time, description, outcome, agents and their roles, and objects and their roles.

Rights: assertions of rights and permissions that are directly relevant to preserving objects and agents in a repository. A right is expressed by the statement pattern “Someone (agent) grants some permission to the repository regarding something (object).” For example, “John Hebeler grants AIT digital library permission to make 10 copies of SemanticWebProgramming.pdf for preservation purposes.” Therefore, the rights entity needs properties such as unique identifier, basis for claiming, description, rights statement, restrictions, terms of grant, time period, objects, and agents.

[9] Dublin Core: A vocabulary that includes properties for use in resource description (Dublin Core).



2.3.3 METS (Metadata Encoding & Transmission Standard)

Metadata Encoding & Transmission Standard (METS) is a standard, maintained by the Library of Congress, for encoding descriptive, administrative, and structural metadata regarding objects within a digital library.






METS metadata has seven major sections: METS header, descriptive metadata, administrative metadata, file section, structural map, structural links, and behavior. This thesis uses only four of these sections for wrapping DC and PREMIS.

1) METS Header contains metadata describing the METS document itself.

2) Descriptive metadata section encodes descriptive metadata such as DC.

3) Administrative metadata section provides information regarding how the files were created and stored, intellectual property rights, etc. This thesis uses this section for wrapping the PREMIS event and rights metadata.

4) File section lists all files together with the location of the digital object, which is described by the object entity of PREMIS.

METS metadata is widely accepted by many well-known digital repositories, especially popular ones such as DSpace¹¹ and Fedora Commons¹².

2.3.4 OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting)

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a low-barrier mechanism for repository interoperability. There are two key players: Data Providers are repositories that expose structured metadata via OAI-PMH, and Service Providers make OAI-PMH service requests to harvest that metadata. OAI-PMH is a set of six verbs, or services, that are invoked over HTTP.

1) GetRecord is a verb that is used to retrieve the metadata of one resource from a repository.

2) Identify is a verb that is used to retrieve information about a repository.

3) ListIdentifiers is a verb that is used to retrieve only the headers of all digital resources from a repository.

4) ListMetadataFormats is a verb that is used to retrieve the metadata formats available from a repository, for example OAI_DC, METS, etc.

5) ListRecords is a significant verb that is used to retrieve the metadata of all digital resources in a repository. For example, http://an.oa.org/OAI-script?verb=ListRecords&from=1998-01-15&metadataPrefix=oai_dc. A repository typically uses this verb to harvest external metadata in order to display that metadata in its own system.

6) ListSets is a verb that is used to retrieve the set structure of a repository.

This thesis takes advantage of OAI-PMH by wrapping METS metadata in order to exchange metadata between repositories.
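As an illustration of how the verbs are invoked over HTTP, the following minimal sketch builds a request URL from a verb and its arguments. The function name build_request and the trailing-underscore convention for the from argument are assumptions made for this example; only the verb names and the sample URL come from the text above.

```python
from urllib.parse import urlencode

# The six OAI-PMH verbs listed above.
OAI_PMH_VERBS = {
    "GetRecord", "Identify", "ListIdentifiers",
    "ListMetadataFormats", "ListRecords", "ListSets",
}


def build_request(base_url, verb, **arguments):
    """Return the HTTP GET URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_PMH_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    for name, value in arguments.items():
        # 'from' is a Python keyword, so callers pass it as 'from_'.
        params[name.rstrip("_")] = value
    return f"{base_url}?{urlencode(params)}"


url = build_request(
    "http://an.oa.org/OAI-script",
    "ListRecords",
    from_="1998-01-15",
    metadataPrefix="oai_dc",
)
print(url)
```

The printed URL has the same form as the ListRecords example given for verb 5.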




11 DSpace is an open source institutional repository application (http://www.dspace.org/).

12 Fedora Commons is a digital asset management (DAM) system for digital preservation (http://fedora-commons.org/).


2.4 Underlying Community Knowledge

OAIS and PREMIS meet some requirements at the physical level of bit preservation and data preservation. Information preservation, by contrast, is often ignored by existing preservation approaches.

The approach named “Steps towards a theory of information preservation” (Flouris & Meghini, 2007) comes up with a formal, mathematical, and logic-based description of information preservation as a scientific discipline. The theory focuses on how people who have different background knowledge can interpret the same data in the same way. This goes beyond language translation. For example, a content creator who is American puts information about the “First floor” in a document. The creator knows that the “First floor” is at ground level because of his background knowledge. However, if another person with different background knowledge, for example a British reader, reads the content, he may understand the “First floor” to be one storey above ground. This issue arises because the provider and the consumer are not in the same Designated Community (DC). If the consumer needs to interpret the original concept, he needs to refer to the knowledge of the creator's DC as well. A DC is a particular group of people who share some common contextual knowledge, for instance language, background, common sense, etc. This collective knowledge in a DC is called Underlying Community Knowledge (UCK).

Therefore, the digital object provider has to combine its own UCK with the digital object. Likewise, an archival system has to provide more contextual knowledge to a consumer by offering both the provider's UCK and the consumer's UCK. As a result, the consumer can understand the concept as the provider intended.

2.5 Related works

2.5.1 CASPAR (Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval)

CASPAR (CASPAR, 2005), Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval, is an Integrated Project co-financed by the European Union within the Sixth Framework Program. It develops a preservation approach based on an understanding of digital resources across different user communities. It uses semantic technology to identify the gap between user communities and to provide the information that is not known by the consumer. The Knowledge Manager is responsible for this feature; it is a component that supports knowledge management services for digital preservation based on semantic technology, in order to support changes of knowledge in a Designated Community. Generally, CASPAR supports knowledge preservation on the Representation Information of an information model.

For instance, in the figure below, user u1 has a profile p1, and user u2 has a profile p2. The profile p1 understands Representation Information m1 (and also m2, m3, and m4 semantically), and the profile p2 understands m2 (and also m3 semantically). In addition, digital object o1 requires m1 (and also m2, m3, and m4 semantically) to be understood, and o2 requires m3 and m4 to be understood.

Figure 4: Modeling Users, Profiles, Modules and Dependencies

Consequently, user u1 can understand digital objects o1 and o2 correctly. On the other hand, if user u2 needs to understand digital object o1, the system should provide knowledge about m1 and m4 to u2. For user u2 to understand o2, the system should provide knowledge about m4.
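The gap computation in this example can be sketched as a set difference over transitive dependencies. The dependency edges in DEPENDS_ON are an assumption, chosen only so that the closures match the description above (the figure itself is not reproduced here); the function names are likewise hypothetical and not part of CASPAR.

```python
# Assumed dependency edges between Representation Information modules.
DEPENDS_ON = {
    "m1": {"m2", "m4"},
    "m2": {"m3"},
    "m3": set(),
    "m4": set(),
}

PROFILE_KNOWS = {"p1": {"m1"}, "p2": {"m2"}}
USER_PROFILE = {"u1": "p1", "u2": "p2"}
OBJECT_REQUIRES = {"o1": {"m1"}, "o2": {"m3", "m4"}}


def closure(modules):
    """Return the given modules plus everything they depend on, transitively."""
    seen = set()
    stack = list(modules)
    while stack:
        module = stack.pop()
        if module not in seen:
            seen.add(module)
            stack.extend(DEPENDS_ON[module])
    return seen


def knowledge_gap(user, obj):
    """Modules the system must supply so that the user can understand the object."""
    known = closure(PROFILE_KNOWS[USER_PROFILE[user]])
    required = closure(OBJECT_REQUIRES[obj])
    return required - known


for user in ("u1", "u2"):
    for obj in ("o1", "o2"):
        print(user, obj, sorted(knowledge_gap(user, obj)))
```

Under these assumed edges, u1 has no gap for either object, while u2 lacks m1 and m4 for o1 and lacks m4 for o2, which agrees with the text.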

2.5.2 SHAMAN (Sustaining Heritage Access through Multivalent ArchiviNg)

SHAMAN (SHAMAN, 2008), Sustaining Heritage Access through Multivalent ArchiviNg, is an Integrated Project co-financed by the European Union within the Seventh Framework Program. An objective of this project is to preserve the ability to communicate all activities on digital objects in the future. It also includes supporting preservation components for analyzing, ingesting, managing, accessing, and reusing digital information across libraries, archives, and other repositories. The SHAMAN theory of preservation focuses on the ability to maintain the context arrangement and management of digital resources in the preservation environment itself. The preservation environment captures context information during the document process, including user-specific or domain-specific knowledge associated with a document. The framework represents the contextual knowledge of the document process in the form of an RDF-based ontology that extends PREMIS metadata. The context consists of things relevant to digital objects, such as persons, abstracts, sessions, topics, affiliations, person roles, contributor roles, organizer roles, reviewer roles, document processes, the result of each process, document versions, etc. The benefits of SHAMAN are semantic integration and long-term access through capturing context during the document process, interpreting preserved digital information, managing control flow, and maintaining the historical changes of an object.


2.5.3 Europeana

Europeana is an internet portal that acts as an interface to many intellectual properties, such as books, paintings, films, museum objects, and archival records (Europeana, 2008). The main function of Europeana is to provide access to millions of items of various types from different types of cultural heritage organizations. These organizations, such as libraries, museums, and archives, collect digital objects using different standards. Europeana provides a common metadata standard called Europeana Semantic Elements for mapping the different types of metadata in order to make all information searchable.

The model is expressed in RDF format and serves to integrate collections, connecting and enriching the descriptions provided by Europeana content providers. Its many elements, including classes and properties, become a standard metadata set for content providers joining the Europeana information space. Most of the elements are provided by Europeana, whereas some of them are reused from the following namespaces:

• Resource Description Framework (RDF).

• RDF Schema (RDFS).

• OAI Object Reuse and Exchange (ORE).

• Simple Knowledge Organization System (SKOS).

• Dublin Core namespaces for elements (DCTERMS and DCMITYPE).

The strongest point of Europeana is that it provides a metadata standard for accessing millions of digital resources from many cultural heritage organizations in one single portal.

2.6 Semantic technologies

2.6.1 Semantic web

The Semantic Web is the next generation of current web technology. The content of a web page is given well-defined meaning, which can be understood by humans and computers (Berners-Lee, Hendler, & Lassila, 2001). The semantic web provides a common framework that allows data to be shared and reused across communities, such as applications and organizations. The concept was proposed by Tim Berners-Lee, who uses mathematical theory to model “things” and “relationships between things”, represented in the RDF language. Information providers can share human-readable information in the form of RDF in order to make computers understand web content and to help humans find and infer web content accurately.

2.6.2 RDF (Resource Description Framework)

RDF (W3C, 2004), Resource Description Framework, is a standard model for data interchange on the web. The structure of RDF is based upon the idea of forming a statement with three parts: subject, predicate (property or verb), and object. Technically, a statement can be viewed as a graph. RDF recommends identifying everything by a URI (Uniform Resource Identifier). However, an object element can also be a literal value.

Figure 5: An RDF Graph

Every natural-language statement can be represented in RDF. For example, “Rathachai studies in AIT” and “AIT stands for Asian Institute of Technology” can be translated into RDF as follows:

:rathachai :study :ait .
:ait :stand_for "Asian Institute of Technology" .

Together, the RDF statements form a graph:

Figure 6: Example of RDF graph for two continuous sentences
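To illustrate how the two statements join into one graph, the sketch below stores each statement as a plain (subject, predicate, object) tuple and follows the shared node :ait from the first statement to the second. The helper objects_of is a hypothetical name introduced for this example; no RDF library is assumed.

```python
# Each RDF statement is kept as a (subject, predicate, object) tuple.
triples = [
    (":rathachai", ":study", ":ait"),
    (":ait", ":stand_for", "Asian Institute of Technology"),
]


def objects_of(graph, subject, predicate):
    """Return every object linked to the subject by the predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


# The object of the first statement is the subject of the second,
# so the two statements form one connected graph.
institute = objects_of(triples, ":rathachai", ":study")[0]
full_name = objects_of(triples, institute, ":stand_for")[0]
print(institute, full_name)
```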

2.6.3 Spatio-Temporal RDF

Spatio-temporal RDF is RDF data with spatial and temporal dimensions. In general, the spatio-temporal RDF model extends normal RDF by integrating the triple with relationships of space and time between named entities (Buraga & Ciobanu, 2002). That research expressed the spatial part as spatial data in RDF format. Furthermore, the most important aspect of this topic is the complexity of temporal RDF data.

Grandi defined the notion of time in temporal RDF by using an ordered set of time points. Thus, each time interval is an ordered pair which is a subset of the universe of time (Grandi, 2011). He also expressed temporal RDF as a set of triples with time intervals, as follows.

$RDF_{TDB} = \{ (s, p, o \mid T_i) \mid T_i \subseteq T \}$

$RDF_{TDB}$ denotes a set of temporal triples. The symbols $s$, $p$, and $o$ are the subject, predicate, and object of a triple, whereas $T_i$ is a time interval which is a subset of the universe of time $T$.

Furthermore, Gutierrez described a temporal RDF model similar to the model of Grandi. Gutierrez also defined entailment for temporal RDF graphs and extended SPARQL in order to access temporal RDF (Gutierrez, Hurtado, & Vaisman, 2005). For example, a query asking for “Students applying for jobs at time t after finishing their Ph.D. program in no more than 4 years” can be:

(






)



(








)
























The notations t_i, t_f, and t denote the begin time, end time, and current time, respectively.

This thesis benefits from the temporal RDF approach in order to record changes of triples. Moreover, the thesis applies the spatial approach by using a set of resources for representing designated communities.
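A minimal sketch of recording a change of a triple with time intervals is given below. It assumes half-open intervals [begin, end) measured in years and uses infinity for an open-ended bound; the class TemporalTriple, the function valid_at, and the resource names are hypothetical and are not taken from Grandi or Gutierrez. The data reuses the genus merger of 1999 that this thesis discusses later.

```python
from typing import NamedTuple


class TemporalTriple(NamedTuple):
    """A triple (s, p, o) annotated with a validity interval [begin, end)."""
    s: str
    p: str
    o: str
    begin: float  # inclusive
    end: float    # exclusive


def valid_at(database, t):
    """Return the plain triples whose interval contains time point t."""
    return [(x.s, x.p, x.o) for x in database if x.begin <= t < x.end]


# Illustrative data: the genus of the snowy owl changes in 1999.
# The open-ended bounds are placeholders, not historical dates.
database = [
    TemporalTriple(":snowy_owl", ":genus", ":Nyctea", float("-inf"), 1999),
    TemporalTriple(":snowy_owl", ":genus", ":Bubo", 1999, float("inf")),
]

print(valid_at(database, 1990))
print(valid_at(database, 2005))
```

Because the intervals are half-open, exactly one of the two triples is valid at any time point, including the year of the change itself.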

2.6.4 Ontology

Ontology is a systematic structuring of knowledge, which views the knowledge as a set of concepts within a domain, together with their relationships, in order to describe and represent an area of concern. It can be viewed as a vocabulary with constraints, relationships, and rules. An ontology includes a number of different components.

Individuals, also called instances or objects, are the basic components of an ontology. They are used to describe things of interest such as people, animals, plants, etc.

Concepts, also called classes, collections, types of objects, or kinds of things, are the core components of most ontologies. One concept (or class) may be a sub-concept (or sub-class) of another concept (or class). Any member of the sub-class is also a member of its parent classes. Moreover, a sub-class inherits the properties of the parent class. For example, the class Dog might have sub-classes called Chihuahua and Golden Retriever, so the members of the classes Chihuahua and Golden Retriever are also members of the class Dog. In addition, the classes Chihuahua and Golden Retriever inherit the properties of the class Dog, such as bark(), fur_color(), or species().

Attributes, also called aspects, properties, features, characteristics, or parameters, belong to classes and objects. The value of an attribute can be either a simple data type, such as string, date, or Boolean, or a complex data type, such as a list of values. For example, as a complex data type, the attribute gender of a dog can only be one of the values in the list {male, female}.

Relations describe the ways in which individuals and classes relate to each other, for example between a super-class and a sub-class, which allows a hierarchical structure to be built from an individual up to its parent class.

2.6.5 Linked data

Linked data connects data (graphs) across the web by linking things that share a semantic concept. In practice, Tim Berners-Lee recommends four principles in order to link data around the world.

1. Use URIs as names for things.

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).

4. Include links to other URIs, so that they can discover more things.

Linked data on the web originated from the efforts of the Semantic Web research community, as well as the W3C Linking Open Data (LOD) project, which was founded in January 2007. Currently, numerous data sets are published on the web as linked data, as shown in the figure below.

Figure 7: Linked Open Data Cloud Diagram (as of September 2010)


2.7 Third-party tools

2.7.1 Alfresco

Alfresco is an open source Enterprise Content Management (ECM) system. It offers full functionality for documentation, for example document management, collaboration, records management, and knowledge management (Alfresco).

Alfresco document management provides organizations with services for creating, converting, managing, and sharing digital resources. It also offers version management, search capabilities, and association between digital resources.

Alfresco records management offers a secure, auditable environment for manipulating records of digital resources.

Collaboration provides a user interface for users who want to discuss content using user forums and comments.

Knowledge management (KM) refers to a range of practices used by organizations to identify, build, represent, and share knowledge for learning across the organization.

Alfresco can be enhanced to be a digital archive by defining preservation metadata such as PREMIS for digital resources, because Alfresco allows an administrator to customize the metadata structure for digital resources in each document space. In addition, it can perform preservation activities by modifying a document in order to render it in the present computer environment.

The project named “New York Philharmonic Digital Archives” is a case study of using Alfresco as a digital archive. The project preserves American cultural heritage (New York Philharmonic). This case indicates that Alfresco can support terabytes of data and provides a capable, high-performance search engine.

In addition, Alfresco provides Content Management Interoperability Services (CMIS), including Web Service and RESTful bindings, that can be used by other applications to work with content in Alfresco.

2.7.2 Sesame

Sesame is a de facto standard framework for processing RDF data. The Sesame system is a web-based architecture that provides useful services such as parsing, storing, inferring, and querying RDF data. It offers a SOAP API through which an RDF repository can be accessed via the Internet (Sesame).

Figure 8: Sesame Architecture

The persistent storage of the RDF data repository is designed for maintainability and scalability, so its functionality is separated into modules.

Admin Module allows users to manipulate RDF data and schema by adding, updating, and deleting.

Query Module provides a SPARQL endpoint service.

Export Module allows users to export all triples from the repository.

Repository Abstraction Layer (RAL) provides an API that offers functionality to load data into, retrieve data from, or remove data from the repository.

Lastly, the Repository is the RDF storage. By default, Sesame provides Native Sesame for storing RDF. However, another triple store can be used instead of Native Sesame.

2.7.3 Jena

Jena is a Java framework for developing semantic web applications. Jena offers Java libraries that help developers build semantic web applications, linked-data applications, and servers. The Jena libraries provide the following features (Jena).

• An API for processing RDF from an RDF repository in XML, N-Triples, and Turtle formats.

• Inference over RDF data with ontologies such as RDFS and OWL.

• A rule-based inference engine for reasoning over RDF with Jena rules. A Jena rule has a simple structure similar to an if-then statement. It has very useful built-in functions, for example regular expressions, creating a new instance, mathematical functions, etc.

• A connection between the application and the RDF repository for extracting data from and writing data to RDF graphs, which are represented as a “model”.

Furthermore, Jena provides easy-to-use functions for inference with many well-known schemas, such as DC, FOAF, SKOS, etc. Therefore, Jena is widely used within the semantic web development and research communities (McBride, 2002).


CHAPTER 3

METHODOLOGY

3 Methodology

This chapter describes the idea and the methodology used to develop a theory of a logical model of digital archives.

3.1 Overview

This thesis aims to propose a theory of a logical model of digital archives. The main idea is to extend the theory of “Steps towards a theory of information preservation” (Flouris & Meghini, 2007) by proposing underlying common community knowledge (UCCK), which is a reference for all underlying community knowledge (UCK). Thus, this thesis presents a method that associates all UCK with each other through UCCK, in order to establish relations between relevant digital resources across digital archives.

This chapter has two main parts. Firstly, it introduces a logical model of digital archives, which is represented by an ontology. The model becomes a template of UCK and UCCK for capturing the evolution of contextual knowledge, and it extends the ontology of an event (Gustman, et al., 2002) (Hayes, et al., 2005). Secondly, it presents the workflow and rules used to interpret concepts under a UCK and to link concepts across UCK via UCCK.

3.2 Overview of Underlying Common Community Knowledge

According to the ideas towards a theory of digital preservation (Flouris & Meghini, 2007), the UCK of each designated community is represented by an ontology. The theory also introduces the idea of capturing changes between UCK by using techniques of ontology evolution. A consumer can interpret a concept across designated communities by comparing the contextual knowledge of the source community with the consumer's own. In practice, a designated community archives its UCK using its own scheme, which may not be compatible with those of other communities. Hence, exchanging data and contextual knowledge between different schemes is rarely possible because of the lack of common reference knowledge (Zeng & Chan, 2004).

Therefore, this thesis extends the theory of Flouris by introducing a reference contextual knowledge called “Underlying Common Community Knowledge” or “UCCK”. The UCCK becomes a key player that coordinates all UCK, so that they can be exchanged and associated with each other.

For example, an institute C1 did research and recorded information about a species named “Nyctea scandiacus” under UCK1, while an institute C2 did other research and archived information under the name “Bubo scandiacus” under UCK2. When a consumer from C1 accesses a document from C2, the consumer may not have the proper background knowledge to interpret “Bubo scandiacus” correctly. In fact, although Nyctea scandiacus and Bubo scandiacus are the same snowy owl, the consumer may not associate these concepts, because the two concepts are recorded with different names and contexts.

In this case, UCCK plays an important role by recording the association between the named concepts Nyctea scandiacus and Bubo scandiacus. Moreover, the archive also gives the reason for this change. For example, the association between these two species is caused by the merging of the genera Nyctea and Bubo in 1999 (Wink & Heidrich, 1999).

Regarding this example and some suggestions about common knowledge in distributed systems (Halpern & Moses, 1990), the common knowledge should correspond to a fact being “publicly known” and provide relationships between conceptual knowledge across a variety of UCK.

Thus, the behaviors of UCCK are:

• To be described in the common language.

• To be a reference for all UCK as common knowledge.

• To present contextual knowledge which every community globally needs to know.

• To record an evolution of contextual knowledge which every community needs to be aware of.

• To identify associations between UCK.

• To capture change or evolution of UCCK itself.

• To be able to link concepts across designated communities by means of common contextual knowledge.

3.3 Evolution of Contextual Knowledge

According to Flouris (2007), the ability to understand archival information depends on interpreting a term or concept placed in its correct context. In general, contextual knowledge preservation captures knowledge evolution within a specific time and space. In practice, contextual knowledge records information surrounding a concept, such as attribute, part-whole, classification, and membership (Stevens, Goble, & Bechhofer, 2000). Consequently, this thesis defines an evolution of contextual knowledge as an ontology (Yildiz, 2006). In addition, the model must be able to deal with the differences among repositories (Ferrandina, Meyer, Zicari, Ferran, & Madec, 1995).

Therefore, the evolution of contextual knowledge is described by a relationship between knowledge evolution, time, and community (space). In particular, there are two main categories of knowledge evolution. Firstly, a concept evolution is a change of a concept, such as replacement, merger, and split (Franz & Peet, 2008). Secondly, an evolution of a relation between concepts is a change of a value of a statement, such as property and classification (Stevens, Goble, & Bechhofer, 2000).

From now on, this chapter describes the roles of time, community, concept evolution, and relation evolution. Moreover, it explains how to state contextual knowledge for a specific time and community and how to link relevant concepts.


3.3.1 Time

This thesis reuses some ideas from the approach of temporal RDF (Grandi, 2011). That approach presents a universe of time as the union of Cartesian products of time intervals for multi-temporal purposes. The model in this thesis requires only a uni-temporal dimension, so it redefines the universe of time as follows:


















[







)









Term

Definition

T

A
universe of time is represented by set of ordered time points.

Ti

A time interval
which has

endpoint





and



.





A begin time of a time
interval





An end time of a time interval

3.3.2 Designated Community

This thesis defines the notion of a designated community to be a named element and lets the universe set be the set of all designated communities. For example:






{

















}


This thesis
usually

gives



as a sub set of



Term

Definition




A universe of designated communities.

A set of designated communities.

An element of designated community.


3.3.3 Concept evolution

A concept evolution is a model which records the events of replacing, splitting, and merging concepts (Yildiz, 2006). In general, an evolution of a concept is a relation between the concepts before the change and the concepts after the change.




{
(






|





)
|























}

This notation states that a concept evolution is a relationship between a set of old concepts and a set of new concepts that holds true in a time interval and in some communities.
Term

Definition




A set of concept evolutions, which depends on time and community.

A set of concepts before change.

A set of concepts after change.

A universe of concepts.

For example, the genus Nyctea has been merged into Bubo since 1999. This knowledge is shared by a designated community named Yale University. It can be expressed by:

(
{



}

{

}
|
[



)

{



}
)







In addition, a function mapping can be defined for each of the three operations: splitting, merging, and replacing. Each function mapping is expressed by δ (delta), with one function for replacing, one for splitting, and one for merging.

Replacing is an operation which substitutes one concept with another concept. Both the domain and the range of replacing are sets of concepts containing exactly one element, taken from the power set of concepts.







(

)




(

)

Splitting is an operation which replaces one concept with two or more concepts. The domain of splitting is a set of concepts containing exactly one element, whereas its range is a set of concepts containing two or more elements, both taken from the power set of concepts.







(

)




(

)

Merging is an operation which replaces two or more concepts with one concept. The domain of merging is a set of concepts containing two or more elements, whereas its range is a set of concepts containing exactly one element, both taken from the power set of concepts.







(

)




(

)

For the example of merging the genera Bubo and Nyctea, the function mapping can be expressed as follows:



(
{



}
)

{

}

Finally, the thesis defines the delta function as the union of the results of all three functions. This definition is used for linking concepts, as described in the section about linking related concepts.


(


)




(


)



(


)



(


)
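The three operations can be told apart purely by the sizes of the old and new concept sets, and the combined delta function is the union of whatever the three mappings return. The sketch below illustrates this under stated assumptions: the names classify_evolution, replacing, splitting, merging, and delta are hypothetical, and each mapping is stored as a Python dictionary keyed by the old concept set.

```python
def classify_evolution(old, new):
    """Name the operation from the sizes of the old and new concept sets."""
    old, new = frozenset(old), frozenset(new)
    if not old or not new:
        raise ValueError("both concept sets must be non-empty")
    if len(old) == 1 and len(new) == 1:
        return "replacing"
    if len(old) == 1:
        return "splitting"   # one concept becomes two or more
    if len(new) == 1:
        return "merging"     # two or more concepts become one
    raise ValueError("many-to-many change is not one of the three operations")


# One mapping per operation, keyed by the set of old concepts.
replacing = {}
splitting = {}
merging = {frozenset({"Nyctea", "Bubo"}): frozenset({"Bubo"})}


def delta(concepts):
    """Union of the results of the replacing, splitting, and merging mappings."""
    key = frozenset(concepts)
    result = set()
    for mapping in (replacing, splitting, merging):
        result |= mapping.get(key, frozenset())
    return result


print(classify_evolution({"Nyctea", "Bubo"}, {"Bubo"}))
print(delta({"Nyctea", "Bubo"}))
```

For the genus example, the change is classified as a merging and the delta function maps the set {Nyctea, Bubo} to {Bubo}; a concept set with no recorded change maps to the empty set.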

3.3.4 Relation evolution

In the semantic web (Berners-Lee, Hendler, & Lassila, 2001), a relation between resources is basically constructed from a triple: subject, predicate, and object. The predicate is sometimes called a “property” or “function”, which maps from one resource (subject) to another resource (object).

In practice, an evolving relation can be a classification relation, a part-whole relation, a membership relation, or an attribution relation (Stevens, Goble, & Bechhofer, 2000). Each type of relation is identified by a predicate, such as higherClass, partOf, memberOf, color, etc. However, predicates are not limited to a few controlled vocabularies. A designated community should be free to choose whatever predicates its own repository requires. This thesis therefore designs the model to be generic, in order to support statements of any kind.

This approach models a change of relation as a change of the value of a triple statement.





{
(








|





)
|






























}

This notation states that a relation evolution is a relationship between a subject concept, a property, an old value, and a new value that holds true in a time interval and in some designated communities.

Term

Definition




A set of relation evolutions, which depends on time and community.

A concept which is a subject of a relation.

A property of relation.

A value before change.

A value after change.

A universe of properties.

For example, after the merging of the genus Nyctea into Bubo, a species may be reclassified under a new higher taxon. This example is expressed by:

(












|
[



)

{



}
)







3.3.5 To state a contextual knowledge