Implementing an Institutional Repository for Digital Archive Communities


DC 2006

Implementing an Institutional Repository
for Digital Archive Communities



Hsueh-Hua Chen, Jieh Hsiang & Chiung-min Tsai



National Taiwan University


Experience from National Taiwan University



Outline


Introduction of Digital Archives Project
in National Taiwan University


Digital Archives Resource Center (DARC)



DARC as Institutional Repository


Challenges for DARC


DSpace-based IR system





Implementation of DARC






National Taiwan University

Digital Archives Project


One of the institutional projects of the National Digital Archives Program (NDAP)


7 participating organizations (6 content development, 1 technology service)


NTU Library


TAI Herbarium


Insect Museum


Geosciences Museum


Museum of Anthropology


Zoological Museum


Computer and Information Networking Center


The amount of data is more than 100,000



National Taiwan University

Digital Archives Project



preserving NTU natural/cultural heritages



promoting NTU academic research




public access to NTU legacy collections


Technical Support and Services


3 major missions


Provide the technology services for the other 6
projects


Deal with the standards of information organization


Build the NTU Digital Archives Resource Center
(DARC)


Set up the collaboration and communication
mechanisms


Integrated searching system


Knowledge management system


Provide backup service


DARC: the portal website

http://www.darc.ntu.edu.tw


NTU Digital Archives Collections


Taiwan Historical Collections
  Tan-Hsin-Tang-An (6703)
  Taiwan Cultural Relic Rubbing (189)
  Ino Kanori (2350)

Botanical Collections
  Type Specimen (1080)
  Documents of Type Specimen (700)
  General Specimen (23109)
  Flora of Taiwan vol. 1-6

Entomological Collections
  Type Specimen (1239)
  Common insect species (618)
  Photo Collections (190)

Geosciences Collections
  Minerals (572)
  Rocks (450)

Anthropological Collections
  Artifacts and photos of the aborigine groups (2129)

Zoological Collections
  Amphibians (33)
  Birds (550)
  Mammals (83)

(as of 2006/6)


Challenges

DARC holds a wide variety of content and data types that reflect the research fields they come from, so improving interoperability across these heterogeneous collections is important.


Every content holder serves the needs of its own community and works independently, with only loose collaboration and integration.


Every content holder has its own digital archive system, with its own data structure, metadata standards, management policy and search interface. As a result, data cannot be transformed and integrated across systems transparently.


DARC as Institutional Repository


Institutional Repository


A university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members.


Source: Clifford A. Lynch (February 2003), “Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age”, ARL Bimonthly Report 226: 1-7.
http://www.arl.org/newsltr/226/ir.html


DARC as Institutional Repository

digital repository



Design


Broaden Public Access


Personalize Use


Services


searching


storage


usage


documentation


Source: Digital Library Federation (Jul. 2004). “Digital library content and course management
systems: Issues of interoperation.”
http://www.diglib.org/pubs/cmsdl0407/cmsdl0407check.htm



DARC as Institutional Repository


Institutional Repository


DSpace (MIT, HP)


Eprints (University of Southampton)


arXiv (Cornell University)


ETD-db (Virginia Tech University)


GreenStone (University of Waikato)


……



DSpace


Support from MIT Libraries, Hewlett Packard Labs


Open Source


Java


Relational database / SQL


APIs


CNRI (Corporation for National Research Initiatives) “handles” as identifiers


OAI-PMH


Management


Latest version: DSpace v.1.4 alpha (July 2006)
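
The OAI-PMH support listed above is what lets an outside service harvest a repository's metadata over plain HTTP. As a rough illustration only, the sketch below builds a ListRecords request URL. The verb and argument names (`verb`, `metadataPrefix`, `set`, `resumptionToken`) come from the OAI-PMH protocol; the endpoint URL and the helper name `list_records_url` are assumptions made for this example and do not appear in the slides.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (illustrative helper)."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it must not be combined with metadataPrefix or set.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, for illustration only.
BASE = "http://repository.example.org/oai/request"
print(list_records_url(BASE))
print(list_records_url(BASE, set_spec="geosciences"))
```

A harvester would fetch the first URL, then keep requesting with the returned resumption token until the server stops supplying one.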



DSpace Data Contents



Preprints, articles


Technical Reports


Working Papers


Conference Papers


E-theses


Datasets


statistical, geospatial, matlab, etc.


DSpace-based System


Advantages:


Easy installation


Lower cost


More Functions


Allowing Modifications



For developing a prototype in a short time, DSpace is a good choice.


DARC: DSpace-based IR Implementation


System Architecture


Data Ingestion


Metadata


Harvesting



Mapping


Preservation


Indexing and Searching


User Interface


Chinese Enhancement




Implementation



Red Hat Linux 8.0 3.2-7


DSpace 1.1.1


Apache, Tomcat,


Java 1.3/1.4, JSP 1.2, Servlet 2.3


PostgreSQL 7.3.x, JDBC (rdbms)


CNRI Handle System 5 (persistent ids)


Lucene 1.2 (index/search)


Log4j (logging)


Implement OAI-PMH and Handle System
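
To make the role of the Handle System as a source of persistent identifiers more concrete, here is a minimal sketch that turns a handle of the form `prefix/suffix` into a resolvable URL through the public Handle proxy at hdl.handle.net. The handle value `123456789/42` and the function name `handle_to_url` are placeholders assumed for illustration; the slides do not show DARC's actual handle prefix.

```python
def handle_to_url(handle, proxy="https://hdl.handle.net"):
    """Return a proxy URL for a handle written as 'prefix/suffix'."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a valid handle: {handle!r}")
    return f"{proxy}/{prefix}/{suffix}"


# Placeholder handle, not a real DARC identifier.
print(handle_to_url("123456789/42"))
```

Because the item is cited by its handle and not by a server path, the citation keeps working if the repository later moves to a different host.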



The Integrated Framework of DARC

[Figure: integrated framework of the NTU Digital Archives Project (DAP-NTU Integrated System). Labels, in the order shown: Content Providers (complete metadata, digital objects); converting process; DC metadata, simple metadata, low-resolution digital objects; OAI Data Provider; data processing and indexing; OAI Service Provider; Centralized Database; Web Server (CGI); HTTP; User.]


System Architecture

[Figure: system architecture. Labels: Digital Archives Content Providers; Metadata records; Digital objects; Dublin Core metadata records; Low-resolution digital objects; OAI-PMH; FTP; Metadata repository (records convert & index); Handle System; DARC cross-catalog and browse interface; HTTP; users.]
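
The architecture above moves Dublin Core records from the content providers into a central metadata repository by way of OAI-PMH. As a sketch of what the harvesting side has to do with each response, the code below parses a ListRecords reply in the standard `oai_dc` format into plain dictionaries. The XML namespaces are those defined by OAI-PMH and Dublin Core; the sample record and the function name `parse_records` are invented for illustration and are not taken from DARC.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response, trimmed to the parts the parser reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample mineral specimen</dc:title>
          <dc:subject>Geology</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return a list of {'identifier': ..., 'dc': {element: [values]}}."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        fields = {}
        dc_root = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc_root is not None:
            for child in dc_root:
                # Tags look like '{namespace}title'; keep the local name.
                name = child.tag.split("}", 1)[1]
                fields.setdefault(name, []).append((child.text or "").strip())
        records.append({"identifier": identifier, "dc": fields})
    return records


for record in parse_records(SAMPLE):
    print(record["identifier"], record["dc"])
```

Each Dublin Core element maps to a list because the schema allows any element to repeat.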

Data Ingestion


Web Submit UI or FTP



Batch Item Importer



Handles
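
For the batch route listed above, DSpace's item importer reads one directory per item, each holding a `dublin_core.xml` file and a `contents` file that names the bitstreams. The sketch below writes those two files for a single item. The directory name, the metadata values and the helper name `write_item` are assumptions for illustration, and the exact layout should be checked against the documentation of the DSpace version in use.

```python
import os
import tempfile
from xml.sax.saxutils import escape, quoteattr


def write_item(item_dir, dc_values, bitstreams):
    """Write dublin_core.xml and contents for one item directory.

    dc_values: list of (element, qualifier, value) tuples.
    bitstreams: file names that must also be copied into item_dir;
    this sketch writes only the two manifest files.
    """
    os.makedirs(item_dir, exist_ok=True)
    lines = ["<dublin_core>"]
    for element, qualifier, value in dc_values:
        lines.append(
            f"  <dcvalue element={quoteattr(element)} "
            f"qualifier={quoteattr(qualifier)}>{escape(value)}</dcvalue>"
        )
    lines.append("</dublin_core>")
    with open(os.path.join(item_dir, "dublin_core.xml"), "w",
              encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    with open(os.path.join(item_dir, "contents"), "w",
              encoding="utf-8") as handle:
        for name in bitstreams:
            handle.write(name + "\n")


# Illustrative values only.
demo_root = tempfile.mkdtemp()
demo_item = os.path.join(demo_root, "item_000")
write_item(
    demo_item,
    [("title", "none", "Sample mineral specimen"),
     ("subject", "none", "Geology")],
    ["specimen_thumb.jpg"],
)
print(sorted(os.listdir(demo_item)))
```

Generating these directories from each museum's export is one way to load many records at once without going through the web submission form.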



Batch Item Importer


Metadata

Use Dublin Core schema for data search and keep
the original metadata for display


Metadata harvesting: to preserve the original
data as much as possible regardless of the
metadata formats.


Metadata mapping and conversion: to provide
applications for mapping original metadata into
Dublin Core Schema, which is needed for object
management and service development.
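
The approach described above, searching on Dublin Core while keeping the original record for display, can be sketched as a simple crosswalk. The field names in `CROSSWALK` are hypothetical examples of what a specimen database might use; they are not the actual DARC mappings, which the slides do not list.

```python
# Hypothetical crosswalk: original field name -> Dublin Core element.
CROSSWALK = {
    "scientific_name": "title",
    "collector": "contributor",
    "collection_date": "date",
    "locality": "coverage",
}


def to_dublin_core(original):
    """Map an original record to Dublin Core and keep the source intact."""
    dc = {}
    for field, value in original.items():
        element = CROSSWALK.get(field)
        if element and value:
            dc.setdefault(element, []).append(value)
    # Unmapped fields are not lost: the whole original record is stored
    # next to the Dublin Core view and can be shown on the item page.
    return {"dc": dc, "original": dict(original)}


specimen = {
    "scientific_name": "Example species",
    "collector": "Example collector",
    "catalog_no": "X-0001",
}
result = to_dublin_core(specimen)
print(result["dc"])
print(result["original"]["catalog_no"])
```

Here `catalog_no` has no Dublin Core counterpart in the crosswalk, so it is absent from the search view but still available for display.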


Metadata by Projects List

Project title                  Attribute/Data type   Subject                       Metadata standard
DARC                           Varied                Humanities/Science & Biology  Dublin Core
Taiwan Historical Collections  Archives, Rubbing     History                       Dublin Core
TAI Herbarium                  Botany                Botany                        HISPID
Insect Museum                  Zoology               Zoology                       SPECIES 2000
Geosciences Collections        Specimen              Geology                       N/A
Anthropological Collections    Manuscript, Object    Ethnology/Archeology          N/A
Museum of Zoology              Zoology               Zoology                       SPECIES 2000


Metadata Mapping

[Screenshot: metadata mapping interface. Column labels: Original metadata, Mapping to DC, Operation.]


Metadata Preservation and Display




Indexing and Searching



Indexing and Searching





search results


Search results: DARC



fulltext search


User Interface


Different data, different display


Search: Dublin Core elements


Browse: thumbnails


Display: the original data elements


The display sequence can be customized
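
A customizable display sequence, as mentioned above, amounts to keeping an ordered list of field names per collection and applying it when a record is rendered. The sketch below shows one way this could work; the field names and the function `order_for_display` are assumptions for illustration and do not describe DARC's actual configuration.

```python
def order_for_display(record, display_order):
    """Return (field, value) pairs: configured fields first, rest after."""
    ordered = [(name, record[name]) for name in display_order if name in record]
    remaining = [(name, value) for name, value in record.items()
                 if name not in display_order]
    return ordered + remaining


# Hypothetical record and per-collection display configuration.
record = {"locality": "Taipei", "scientific_name": "Example species",
          "collector": "Example collector"}
display_order = ["scientific_name", "collector"]

for name, value in order_for_display(record, display_order):
    print(f"{name}: {value}")
```

Fields that are missing from the configuration still appear, after the configured ones, so nothing from the original record is hidden by accident.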


DARC UI


Basic Search

Browse

Advanced Search


User Interface


search result



User Interface



Full display result (screenshot with callouts a, b, c)




classification tree



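Citation export

Minerals

The Department of Geosciences at National Taiwan University was founded during the Japanese colonial period, and most of its holdings were inherited from the Taihoku Imperial University era. The collection is very large and diverse, including about 500 mineral specimens, about 450 rocks, about 2,300 fossils and several dozen mineral gemstones, and it serves important functions in preservation and maintenance, research and education.

(Source: NTU DARC, https://140.112.30.244/darc/intro/1918-212.jsp)

[Screenshot callouts: steps 1, 2, 3; click, copy, paste]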



Digital Content Licensing


Chinese Enhancement


Browse

Search


Chinese Enhancement


Basic Search

Browse

Advanced Search
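
The slides do not say how searching and browsing were enhanced for Chinese. One common technique for indexing text that has no spaces between words is to split runs of Chinese characters into overlapping two-character tokens (bigrams). The sketch below illustrates that general technique only; it is an assumption, not a description of what DARC implemented, and the function names are invented here.

```python
def is_cjk(ch):
    """True for characters in the CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def cjk_bigrams(text):
    """Split each run of CJK characters into overlapping bigrams.

    A run of length one yields that single character. Non-CJK text is
    skipped in this sketch; a real analyzer would tokenize it as well.
    """
    tokens = []
    run = []
    for ch in text + " ":  # trailing space flushes the final run
        if is_cjk(ch):
            run.append(ch)
            continue
        if len(run) == 1:
            tokens.append(run[0])
        for i in range(len(run) - 1):
            tokens.append(run[i] + run[i + 1])
        run = []
    return tokens


print(cjk_bigrams("礦物標本"))
```

With this scheme a query for 礦物 produces a single bigram that also occurs among the tokens of 礦物標本, so the longer phrase is found without a word dictionary.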


Future Work

DARC is the institutional repository for the management and dissemination of digital archives.

[Figure: DARC and three service areas (Communication, Digital Archiving, Value-added service). Component labels: E-Office, Digital Archiving system, KM system, backup service, long-term preservation mechanism, DRM system, cataloging system, membership management mechanism, integrated searching and browsing, user feedback mechanism.]

Future Work


Interoperability for the digital archives (DARC-based) between institutes


Define an overall technical framework


Develop a core set of systems to support
digital collections


Metadata


Knowledge Organization System


Thank You