Developing the Digital Institutional Repository at HKUST

Hong Kong University of Science & Technology Library

Diana Chan, Head of Reference
KT Lam, Head of Systems
HKUST Library

December 9, 2003

Outline of Presentation

1. Why Create the Institutional Repository?
2. Demonstration
3. How Did HKUST Develop It?
4. Challenges
5. IR Software Evaluation
6. DSpace Implementation at HKUST
7. Q&A

1. Why Create the Institutional Repository?

What is an Institutional Repository (IR)?

A “digital collection capturing and preserving the intellectual output of a single or multi-university community”.

- adopted from “The case for institutional repositories: a SPARC position paper” prepared by Raym Crow.
- <http://www.arl.org/sparc/IR/ir.html>



1. Why Create the IR?

Budapest Open Access Initiative
<http://www.soros.org/openaccess/index.shtml>

Recommends 2 Strategies:

1. Self-archiving in Open Electronic Archives
2. Open Access Journals


1. Why Create the IR?

Dual Open-Access Strategy
<http://www.ecs.soton.ac.uk/~harnad/Temp/berlin.htm>

- BOAI-2 ("gold"): Publish your article in a suitable open-access journal whenever one exists.
- BOAI-1 ("green"): Otherwise, publish your article in a suitable toll-access journal and also self-archive it.

1. Why Create the IR?

Why have an IR at HKUST?

To create a permanent record of the scholarly output of HKUST

- No access to some scholarly works published by our own faculty
- Collections of working papers, technical reports, research reports floating around
- Some of our scholarly works are in the public domain




1. Why Create the IR?

Why have an IR at HKUST?

To help the international Open Access effort, because the mission of disseminating knowledge is only half complete if it is not widely and readily available to society.

- Adapted from the Berlin Declaration
  <http://www.zim.mpg.de/openaccess-berlin/berlindeclaration.html>

1. Why Create the IR?

The Contribution Must Satisfy 2 Conditions:

1. The author…grants to all users a free…right of access to, and a license to copy, use, distribute, transmit and display the work publicly …
2. A complete version of the work is deposited in…at least one online repository

- From the Berlin Declaration

2. Demonstration

HKUST Institutional Repository
<http://repository.ust.hk/>

- DSpace interface
- Sample record
- Submission form
- Search in OAIster

Collection Type and Size

Communities            18
Collections            37

Book chapters           1
Conference papers      85
Journal articles       66
Patents                62
Presentations          40
Preprints              12
Technical reports     109
Theses                110
Working papers         35
Miscellaneous           6

Total                 520

3. How Did HKUST Develop It?
<http://library.ust.hk/staffman/ref-man/IR/ir.html#stages>

1. Planning & Policies
2. Technical Developments
3. Harvesting and Promotion
4. Work Teams
5. Negotiations with Publishers

3.1 Planning and Policies

- Task Force: software, scope, policies, database structure, problems, action plans.
- Information Services Committee: guidelines on different types of publications, publishers’ policies, data formats, faculty concerns.
- Library Administrative Committee: final approvals.



3.2 Technical Developments

Will be discussed by KT Lam in parts 5 & 6

3.3 Harvesting and Promotion

Within HKUST:

1st Stage: Prototype
- 105 Computer Science Technical Reports

2nd Stage:
- Target Group: Faculty who already posted their publications on the Web
- Emailed 80; 49 agreed. Harvested 144 documents.

3.3 Harvesting and Promotion

Within HKUST:

3rd Stage:
- Target Group: All Faculty
- Emailed all to encourage direct submission. 2 documents submitted. Notes from the Library

4th Stage:
- Target Group: All Faculty
- Emailed all telling which publishers allow post-refereed self-archiving (IEEE, ACM, Emerald, SPIE…).
- 3 documents submitted

3.3 Harvesting and Promotion

In Cyberspace:

- Harvested 53 US Patents
- Harvested 21 journal articles from Emerald
- Harvested 10 articles from DOAJ
- Joined OAIster


3.3 Harvesting and Promotion

Planned:

- Will harvest proceedings of conferences held at HKUST and published by HKUST
- Will cover PhD theses with signed permissions
- Will contact departments for preprints, working papers, technical reports, etc.
- Will contact faculty whose publications have not been posted
- Departmental visits


3.4 Work Teams

- Subject Librarians
- Data Entry Team



3.4 Work Teams

Subject Librarians

1. Liaison With Faculty
2. Check Faculty’s Publication Lists
3. Harvest Documents
4. Ascertain Publishers’ Policies
5. Verify Document Versions
6. Do Indexing


3.4 Work Teams

Data Entry Team

1. Verify and Convert PDF Documents
2. Data Entry Using Submission Form
3. Set PDF Document Security & Properties
4. Create Folders & Upload Files

Flowchart on Data Entry

1. Get indexed documents from librarians
2. Screen and Convert Files
3. Input Data
4. Check for Errors
5. Set PDF Document Security & Properties
6. Final Check
7. Group to Different Folders
8. Define Communities & Collections in DSpace & Upload Files

3.5 Negotiations with Publishers

Collection Development Librarian wrote to:

- INFORM
- ProQuest
- Wiley
- Springer
- IEEE
- AAAS
- Elsevier

Result: No good news yet.

4. Challenges

Faculty:

- Low awareness of Open Access Initiative (OAI)
- Concern over copyright issues
- Apathy toward direct submission
- Lack of willingness to negotiate on non-exclusive rights and to publish in open access journals
- Lack of willingness to provide the right versions of documents (pre- or post-refereed)
- Only a small % of scholarly work can be archived


4. Challenges

Institution:

- Needs to mandate the deposit of all research outputs in the Institutional Repository
- Needs to give financial support to faculty who submit papers to open access journals


4. Challenges

Publishers:

- In the RoMEO project, only 34 out of 80 allow some sort of archiving
- Many have no policy (Camford, Genetics Society of America)
- Many have an unclear policy
- Some:
  - Decline to give permissions (Springer, AAAS)
  - Give no response (INFORM)
  - Give a wrong answer (Wiley)
- Need to include self-archiving in license agreements with publishers


4. Challenges

The Library continues to:

- Provide support for university research self-archiving
- Promote the IR
- Educate users and faculty about the IR
- Showcase the IR
- Find champions and partners among faculty
- Seek institutional mandate and support
- Harvest documents

5. IR Software Evaluation

1. Background
2. EPrints
3. DSpace
4. Why Did We Choose DSpace?
5. Evaluation Guide
6. Other Software and Selection Criteria

5.1 Background

- Institutional repository software is also known as institutional archive-creating software, or digital repository software.
- HKUST Library started IR software evaluation in late December 2002.
- Two products were evaluated: EPrints and DSpace.
- Decided to use DSpace in mid-February 2003.

5.2 EPrints
<http://software.eprints.org/>

- Developed by University of Southampton.
- The very first freely available institutional repository software; available since 2000.
- GNU software and thus open source.
- Has the largest installed base.
- Written in Perl, with MySQL and Apache.

5.3 DSpace
<http://www.dspace.org/>

- Jointly developed by MIT Libraries and Hewlett-Packard Company.
- Open source, available since late December 2002, after two years of development.
- Written in Java, with PostgreSQL, Lucene, and Apache/Tomcat.
- Still under development.

5.4 Why Did We Choose DSpace?

- DSpace was developed based on the experience gained from EPrints.
- It has a well-defined data model:
  - Community + Collection + Item + Metadata + Bundle + Bitstream
- UTF-8 capable.
- Well-organized web interface.
- Metadata in Dublin Core format.
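
The data model listed above can be pictured as nested containers. The following Python sketch is only an illustration under assumptions: the class and field names are hypothetical and do not reproduce DSpace's actual Java classes or database schema, and the sample data is invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Bitstream:
    """A single stored file, e.g. one PDF."""
    name: str
    size_bytes: int


@dataclass
class Bundle:
    """A named group of bitstreams that belong together."""
    name: str
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class Item:
    """One archived work with Dublin Core style metadata."""
    metadata: Dict[str, List[str]] = field(default_factory=dict)
    bundles: List[Bundle] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Community:
    name: str
    collections: List[Collection] = field(default_factory=list)


def count_items(community: Community) -> int:
    """Total number of items across all collections of a community."""
    return sum(len(c.items) for c in community.collections)


# Hypothetical example data, for illustration only.
report = Item(
    metadata={"title": ["A Sample Technical Report"], "type": ["Technical Report"]},
    bundles=[Bundle("ORIGINAL", [Bitstream("report.pdf", 123456)])],
)
cs = Community("Computer Science", [Collection("Technical Reports", [report])])
print(count_items(cs))
```

Each level only holds references to the level below it, which is why a two-level Community and Collection hierarchy is easy to browse but cannot express deeper nesting.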

5.5 Evaluation Guide

“A Guide to Institutional Repository Software” by Open Society Institute
<http://www.soros.org/openaccess/software/OSI_Guide_to_Institutional_Repository_Software_v1.htm>


5.6 Other Software & Selection Criteria

Other IR Software:

- CDSware, from CERN.
- I-TOR, from Netherlands Institute for Scientific Information Services.
- MyCoRe, from University of Essen.

Selection Criteria:

- Open source.
- Complies with OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting).
- Currently released and publicly available.
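
To make the OAI-PMH criterion concrete: a harvester talks to a compliant repository through plain HTTP requests that carry a `verb` parameter. The sketch below only builds such request URLs. The base URL is a hypothetical placeholder (a real repository's OAI-PMH endpoint may differ), and the helper name is illustrative; `Identify`, `ListRecords` and the `oai_dc` metadata prefix come from the OAI-PMH specification.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real OAI-PMH base URL of a repository may differ.
BASE_URL = "http://repository.example.org/oai/request"


def oai_request_url(base_url: str, verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL for the given verb and arguments."""
    params = {"verb": verb}
    params.update(arguments)
    return base_url + "?" + urlencode(params)


# Ask the repository to describe itself.
identify_url = oai_request_url(BASE_URL, "Identify")

# Ask for all records in unqualified Dublin Core.
list_url = oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

print(identify_url)
print(list_url)
```

A service provider such as OAIster issues requests of this kind to collect metadata from many repositories.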

6. DSpace Implementation at HKUST

1. DSpace Server
2. Problems
3. Limitations



6.1 DSpace Server
<http://repository.ust.hk/>

- PC with Pentium 4, 2.4GHz, 1GB RAM
- RedHat Linux, with standalone Tomcat, PostgreSQL database, and Lucene search engine.
- DSpace Version 1.1.1.
- Live since late February 2003.

6.2 Problems

- Faculty Submission Form
  - DSpace’s built-in submission interface is too complicated.
  - We had to develop our own submission form, then use DSpace’s Item Importer to load the data.
- CJK Search Failure
  - Fixed by modifying the DSpace Java source code.
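
As an illustration of the submission-form route described above, the sketch below writes one item in the directory layout that DSpace's batch importer reads according to its documentation: a `dublin_core.xml` file of `dcvalue` elements plus a `contents` file listing the attached files. This is a minimal sketch under assumptions, not the Library's actual script: the function name, the form data and the file names are all hypothetical, and the exact layout expected by a given DSpace version should be checked against that version's documentation.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_item_dir(item_dir, dc_values, filenames):
    """Write dublin_core.xml and a contents file for one item.

    dc_values is a list of (element, qualifier, value) tuples.
    """
    os.makedirs(item_dir, exist_ok=True)
    root = ET.Element("dublin_core")
    for element, qualifier, value in dc_values:
        node = ET.SubElement(
            root, "dcvalue", {"element": element, "qualifier": qualifier}
        )
        node.text = value
    ET.ElementTree(root).write(
        os.path.join(item_dir, "dublin_core.xml"),
        encoding="utf-8",
        xml_declaration=True,
    )
    # The contents file lists one bitstream filename per line.
    with open(os.path.join(item_dir, "contents"), "w", encoding="utf-8") as fh:
        for name in filenames:
            fh.write(name + "\n")


# Hypothetical data as it might arrive from a custom submission form.
form_data = [
    ("title", "none", "A Sample Working Paper"),
    ("contributor", "author", "Author, Example"),
    ("date", "issued", "2003"),
]
out_dir = os.path.join(tempfile.mkdtemp(), "item_000")
write_item_dir(out_dir, form_data, ["paper.pdf"])
print(sorted(os.listdir(out_dir)))
```

The document file itself (here the hypothetical `paper.pdf`) would be copied into the same directory before the importer is run.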

6.2 Problems

- CNRI Handle
  - Required registration at CNRI for a handle prefix. Our prefix is 1783.1.
- Custom Authentication
  - Added Java code to query HKUST’s LDAP server.
- Handling of non-English Characters
  - Uses the approach adopted in our Electronic Theses Database.
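
A handle is the prefix, a slash, and a local suffix, and it can be turned into a persistent link through the public Handle System proxy at hdl.handle.net. The sketch below shows that composition using the prefix given above. The suffix `42` and the function names are hypothetical, chosen for illustration only.

```python
HANDLE_PREFIX = "1783.1"          # prefix stated on the slide
PROXY = "http://hdl.handle.net/"  # public Handle System proxy


def handle_url(suffix: str, prefix: str = HANDLE_PREFIX) -> str:
    """Return the proxy URL for the handle prefix/suffix."""
    return f"{PROXY}{prefix}/{suffix}"


def split_handle(handle: str):
    """Split 'prefix/suffix' into its two parts."""
    prefix, _, suffix = handle.partition("/")
    if not prefix or not suffix:
        raise ValueError(f"not a handle: {handle!r}")
    return prefix, suffix


# Hypothetical item suffix, for illustration only.
print(handle_url("42"))
print(split_handle("1783.1/42"))
```

Because the link goes through the proxy rather than the repository's own host name, it can stay valid even if the server address changes.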

6.2 Problems

- Server Hanging Problem
- Other Software Bugs

6.3 Limitations

- Flat Community+Collection structure
  - 2-level only, not deep enough.
- Linked Collection
  - A collection that belongs to more than one community.
- Unable to Cross Search
  - Search multiple collections from different communities.

6.3 Limitations

- Query Syntax Not Apparent to Users, e.g.
  - +water +rapid [for exact word match]
  - "vapor generator" [for phrase search]
- Limited Capability on Sorting Search Results.
- Cannot Display the Number of Items in the Repository, in a Community, and in a Collection.
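
One way to keep this syntax away from users is a small helper that turns plain input into the two query forms shown above. This is a minimal sketch, assuming a hypothetical search box placed in front of the repository; the function names are illustrative and are not part of DSpace or Lucene. In Lucene's classic query syntax a leading `+` marks a term as required, and double quotes mark a phrase.

```python
def required_terms(words):
    """Prefix each word with '+' so that every term must appear."""
    return " ".join("+" + w for w in words)


def phrase(text):
    """Wrap the text in double quotes for a phrase search."""
    return '"' + text.replace('"', "") + '"'


def build_query(user_input: str, as_phrase: bool = False) -> str:
    """Turn plain user input into a query string."""
    words = user_input.split()
    if not words:
        return ""
    if as_phrase:
        return phrase(" ".join(words))
    return required_terms(words)


print(build_query("water rapid"))
print(build_query("vapor generator", as_phrase=True))
```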

Related Websites

- American-Scientist September Forum
  <http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/index.html>
- Open Access Presentation
  <http://www.ecs.soton.ac.uk/~harnad/Temp/openaccess.ppt>
- Self-Archiving FAQs
  <http://www.eprints.org/self-faq/>
- SPARC Institutional Repository Checklist & Resource Guide
  <http://www.arl.org/sparc/IR/IR_Guide.html>