Subject access in Czechia

M. Balikova, NL CR

Cyfrowość bibliotek i archiwów
Warszawa, 2009

Marie.Balikova@nkp.cz

Outline

- Knowledge Organization Systems
- Czech Subject Authority File (CZENAS)
- Conspectus Categorization Scheme (CCS)
- Uniform and Subject Information Gateways
- Topic Map of Library Collection
- CZENAS in Digital Collections at the NL CR
- DL in European context


Traditional KOS

- represent lists of words and phrases, or notation symbols, organized according to explicit rules, with different levels of hierarchical structure (from very limited to highly sophisticated)
- are used to tag units of information so that they may be more easily retrieved by a search
- solve the problems of homographs, synonyms and polysemes
- ensure consistency when the same concept can be given different names
- consistency of terms is one of the most important aspects of the organization and management of information
- controlled vocabularies and classification schemes are meant for human users
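
The consistency role described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the terms, the `VARIANTS` and `HOMOGRAPHS` tables and the `normalize_tag` helper are hypothetical and are not taken from any real authority file. Variant names are folded into one preferred term, and a homograph is returned as a list of qualified senses from which the indexer chooses.

```python
# Hypothetical mini-vocabulary: variant (non-preferred) term -> preferred term.
VARIANTS = {
    "automobiles": "cars",
    "motor cars": "cars",
    "cars": "cars",
}

# Hypothetical homograph: one spelling, several qualified concepts.
HOMOGRAPHS = {
    "mercury": ["mercury (element)", "mercury (planet)"],
}


def normalize_tag(tag: str) -> list[str]:
    """Return the preferred term(s) for a free-text tag.

    A synonym yields exactly one preferred term, a homograph yields every
    qualified sense so that the indexer can pick one, and an unknown tag
    yields an empty list.
    """
    key = " ".join(tag.lower().split())  # fold case and repeated whitespace
    if key in HOMOGRAPHS:
        return list(HOMOGRAPHS[key])
    if key in VARIANTS:
        return [VARIANTS[key]]
    return []
```

Tagging "Automobiles" and "motor cars" therefore produces the same access point, which is the kind of consistency a controlled vocabulary is meant to ensure.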

Non-traditional KOS

Ontologies

- as a form of knowledge representation, are defined as a systematic account of existence, a specification of a conceptualization
- describe concepts and relationships in programmatic ways and enable arbitrary relationships
- represent Knowledge Organization Systems which try to capture and describe real-world entities and relationships in a machine-readable and machine-understandable manner

Other searching systems and techniques

- the user wants not whole documents but brief answers to specific questions: How old is the President? When did Jan Hus die? What is the anthem of the European Union?
- answering short questions becomes a problem of finding the best combination of word-level information retrieval (IR) and syntactic/semantic-level natural language processing (NLP) techniques


Participants in the project

TRUST: Multilingual Semantic and Cognitive Search Engine for Text Retrieval Using Semantic Technologies (IST-1999-56416)

- question-answering method is used
- monolingual / multilingual
- you could search in: French, Italian, Polish, Portuguese

M-CAST: + two languages: English, Czech

- question-answering method
- M-CAST answer: block of answers
- exact, direct answer
- snippet
- visualization of resource page
- prototype of the system


What is the anthem of the European Union?

Question in English, answer in Polish: „Ody do radości“






Quel est le drapeau de l'Union Européenne? (What is the flag of the European Union?)

Question in French, answer in Portuguese: „círculo de doze estrelas douradas sobre fundo azul“ (a circle of twelve golden stars on a blue background)





Czech National Subject Authority File

- a structured controlled vocabulary identifying the basic semantic relationships (equivalence, hierarchical and associative) between terms in natural language, designed for both post-coordination and pre-coordination
- an integrated indexing and retrieval tool in which verbal (controlled) terms are linked to equivalent UDC notations and English terms
- a standardized system of controlled terms which can serve the needs of professionals (cataloguers, indexers) as well as non-professionals, e.g. web content creators
- offers them an organizing tool not only to retrieve material but also to tag it
- non-professionals would like to use a standardized indexing and retrieval tool, but one that is simple in structure (like the Dublin Core format) and in syntax (descriptor-type system), and that has up-to-date terminology
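
The record structure described above (a controlled term with equivalence, hierarchical and associative relationships, linked to a UDC notation and an English equivalent) might be modelled roughly as follows. This is only a sketch under stated assumptions: the `AuthorityRecord` class, its field names and the sample values are hypothetical and do not reproduce the actual CZENAS record format or its content. The only external fact relied on is that UDC class 51 is Mathematics.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorityRecord:
    """Hypothetical, simplified subject authority record."""

    preferred_term: str   # controlled term (Czech)
    english_term: str     # English equivalent
    udc_notation: str     # linked UDC number
    variants: list[str] = field(default_factory=list)  # equivalence (use-for)
    broader: list[str] = field(default_factory=list)   # hierarchical, upward
    narrower: list[str] = field(default_factory=list)  # hierarchical, downward
    related: list[str] = field(default_factory=list)   # associative

    def access_points(self) -> set[str]:
        """All strings under which the record can be found or used as a tag."""
        return {
            self.preferred_term,
            self.english_term,
            self.udc_notation,
            *self.variants,
        }


# Illustrative values only, not copied from the authority file.
example = AuthorityRecord(
    preferred_term="matematika",
    english_term="mathematics",
    udc_notation="51",
    narrower=["algebra", "geometrie"],
)
```

Keeping the verbal term, the English equivalent and the UDC notation in one record is what lets the same entry serve both tagging and retrieval, in either language or by class number.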


UDC - a complementary tool, mapping process

Universal Decimal Classification

- covers all subjects
- provides context to search terms
- supports interoperability between information systems
- enables browsing and navigation, broadening and narrowing of searches, multilingual access to collections, and language-independent coding

Mapping process

- mapping between Czech verbal expressions and UDC numbers is done intellectually
- candidates for controlled terms are chosen with the document in hand (from the bottom up), in order to suggest terms as specific as needed (not as specific as possible)
- single or complex (pre-combined) UDC numbers are linked
- English equivalents of preferred terms are added
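
Broadening and narrowing a search through UDC can be sketched in Python, assuming the usual property of simple UDC main numbers that each added digit denotes a narrower class (for example 5, 51, 512). The helper names `format_udc`, `broaden` and `narrower_matches` are hypothetical, and complex (pre-combined) numbers built with signs such as the colon are deliberately out of scope.

```python
from typing import Optional


def format_udc(digits: str) -> str:
    """Insert the conventional point after every third digit."""
    groups = [digits[i:i + 3] for i in range(0, len(digits), 3)]
    return ".".join(groups)


def broaden(udc: str) -> Optional[str]:
    """Return the next broader simple UDC number, or None at the top level."""
    digits = udc.replace(".", "")
    if not digits.isdigit():
        raise ValueError("only simple UDC main numbers are handled in this sketch")
    if len(digits) <= 1:
        return None
    return format_udc(digits[:-1])


def narrower_matches(udc: str, indexed: list[str]) -> list[str]:
    """Return the indexed numbers equal to or narrower than the given class."""
    prefix = udc.replace(".", "")
    return [n for n in indexed if n.replace(".", "").startswith(prefix)]
```

Because the notation is numeric, the same truncation works whatever language the verbal terms are in, which is the sense in which the coding is language-independent.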



Structure of authority record

- individual entities
- link to Wikipedia provides additional information


Example of the application of geographic coordinates in an authority record for places



Example of a UDC index of formal descriptors in both Czech and English

Conspectus Categorization Scheme

- concordance tables between UDC and DDC
- three hierarchical levels:
  1. 24 Conspectus divisions
  2. 584 Conspectus categories
  3. topical authority terms

Uniform Information Gateway (UIG): a nationwide portal which unifies access to online library services in Czechia

Topic map of library collections: a user-friendly form of subject access for inexperienced library users and for those who prefer to get information on the location of documents directly


Digital library Kramerius: more than 6 million scanned pages.

The goal: to digitise and make accessible the periodicals and monographs comprising the national cultural heritage

The metadata of digitised documents in the catalogue of the NL CR contain subject access points integrated in the Czech Subject Authority File; the metadata which form part of the Kramerius digital library contain UDC codes only

Digital archive of Czech web resources, which are collected with the aim of their long-term preservation

Conspectus categories scheme (in Czech version only)

Example of subject access data in a full-level record of the WebArchiv digital collection, with a hyperlink to the original web page. For this, a special agreement is necessary



Manuscriptorium: a system for collecting and making accessible on the internet information on historical book resources, linked to a virtual library of digitised documents.

Searches in the Manuscriptorium database are provided by a variety of access points, such as country, settlement, repository, author, place of origin, etc.



The European Library is a free service that offers access to the resources of the 48 national libraries of Europe in 35 languages

Resources

- digital: books, posters, maps, sound recordings, videos, etc.
- bibliographical

Quality and reliability are guaranteed by the collaborating national libraries of Europe


Search term: lesní moudrost

Preferred form: woodcraft

Query expansion by synonyms (variant form)
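
The query expansion shown on this slide can be sketched as follows. The pair "lesní moudrost" / "woodcraft" comes from the slide; the `SYNONYM_RING` table and the `expand_query` helper are hypothetical names, and the actual portal may implement expansion quite differently.

```python
# Preferred form -> variant forms; the single entry is the slide's example.
SYNONYM_RING = {
    "woodcraft": ["lesní moudrost"],
}


def expand_query(term: str) -> list[str]:
    """Return every form to search for, with the preferred form first.

    If the term is a known preferred or variant form, the whole ring is
    returned; otherwise the (normalized) term is searched on its own.
    """
    key = term.strip().lower()
    for preferred, variants in SYNONYM_RING.items():
        forms = [preferred, *variants]
        if key in forms:
            return forms
    return [key]
```

A search for the variant form is thus run against both strings, so records indexed under the preferred form are still found.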



Europeana: a portal of the collections of European memory institutions

Descriptive and subject access points (author, place, date of creation, etc.) are added by cooperating institutions.



Czech Digital Library: a concept covering digitisation, long-term preservation of and access to the entire national cultural heritage in digital form.

The National Digital Library covers an important part of the national cultural heritage and operates in the broader context of the Czech Digital Library



Conclusion



- Traditional Knowledge Organization Systems used in Czechia can be applied in the organization and management of the collections of both traditional and digital memory institutions.
- Controlled indexing languages (thesauri, subject heading systems) and classification systems are still very useful, even necessary.
- They serve both professionals (specialists in knowledge organisation systems, librarians, archivists and curators) and end users.
- They are important for the development of KOS based on semantic technologies and ontologies, and they support the standardisation and harmonisation of terminology in specific domains.



Thank you!