Building the Digital Library Environment at the University of Kansas Lawrence






Digital Library

Technical Infrastructure Task Force



Building the Digital Library Environment at
the University of Kansas - Lawrence



Working Document, ver. 1.0


Report to the

Digital Library Executive Group


November 10, 2000







Rick Clement

Wes Hubert

John Miller

Jerry Niebaum

Beth Forrest Warner

Table of Contents

Executive Summary
Introduction
Conceptual Foundations of the KU Digital Library
Implementation of the KU Digital Library
Strategies
Functions
Roles and Responsibilities
Architecture Components and Standards
Component Selection
Strategic Components
Local Repositories
Resource Naming Services
Object Classes and Services
Standards for Content and Metadata
Archiving
Interface Considerations
Interoperability Considerations
Conceptual Architecture Model
Supported Software Tools
Support Issues
Project Selection and Prioritization
Selection Guidelines:
Process
On-going Services Evaluation
Recommendations

Appendix A: Charge to the Task Force
Appendix B: KU Digital Library Mission and Goals
Appendix C: Object Classes & Behaviors - Background Information
Appendix D: Textual Markup Languages - Background Information
Appendix E: Tools - Background Information
Appendix F: Archiving - Background Information
Appendix G: User Interaction Scenarios

Table of Figures

Figure 1: Digital Library Functions
Figure 2: Strategic Components Diagram
Figure 3: Conceptual Architecture Model






Acknowledgements / Disclaimer



While many websites and articles were reviewed, the Task Force would like to
especially acknowledge the following institutions for providing information that
was used to help shape the discussions of the Task Force and this document:

• California Digital Library
• Digital Library Federation
• Harvard University
• Library of Congress
• University of Arizona
• University of California-Berkeley
• University of Michigan

Thank you.

Any mistakes or misinterpretations of information are solely the responsibility of
the Task Force Report authors.



KU Digital Library Technical Infrastructure Task Force Report
Working Draft, November 10, 2000

Executive Summary


The meaning of the term “digital library” is less transparent than one might expect. The
University of Kansas Digital Library Executive Group and Digital Library Advisory Group
were formed in 1999 to begin the process of defining and shaping the concept of “digital
library” in the KU environment.

As KU prepares to step into the digital library arena, it is helpful to shape a common
understanding of the concept by referring to our Digital Library Vision, Mission, and
Principles statements, available on the KU Digital Library Initiatives website
(http://kudiglib.ukans.edu).

In May 2000, the Digital Library Executive Group formed the Digital Library Technical
Infrastructure Task Force. This report is a result of Task Force discussions on the
technical issues involved in creating a digital library environment for the University of
Kansas. Given the volatile nature of technology in general, and digital libraries in
particular, this report should be viewed as a framework for technical implementation.
Key points from the report are summarized below.


Roles and Responsibilities


There are many roles in a digital library program, and there is seldom a one-to-one
relationship between these roles and individual people. These roles fall into several
broad categories including Management, Requirements Analysis & Design, Core
Technical Support, User Support, and Legal and Policy Issues Support. The digital
library team must have a balance of skills across a variety of roles, with individual staff
members often wearing several hats at once. It is important to recognize that each of
these roles is critical to the success of the initiative.


Architectural Components and Standards


One of the hallmarks of a library is the ability to provide coherence and context for
access to disparate collections of information resources. This is a critical principle to
carry forward into the digital environment and a distinguishing characteristic that
separates digital libraries from simple collections of links to electronic objects.

There are several critical factors involved in being able to provide a coherent, contextual
environment for digital resources:

• Use of standards for creating objects
• Use of standards for describing resources (for access)
• A common methodology for access to object types, and
• An understanding of object behaviors and user interactions




A successful digital library environment is not an isolated, self-sufficient entity that exists
and operates apart from the rest of the information and technology environment of the
institution. It is critically dependent on, and must work within, the resources and
decisions made in many areas including networking, computing support, information
resource support, and information technology policies. Access to and use of
information resources for the University depends upon a solid foundation that provides
an exceptional network and computing services infrastructure.

A key factor in the provision of a coherent, unified environment is the use of standards,
whenever possible, for the many different aspects of storing and accessing digital
information, including standards for: interoperability, data format, resource identification,
resource description, and data archiving.

Building upon this basic institutional infrastructure, a successful digital library
environment should provide additional architectural components including:


Local Repositories



The conceptual model of a repository as a set of services and related facilities
may be defined to include:

• A datastore containing digital objects (content and metadata) created by,
  or under the auspices of, the KU-DLI
• Services necessary to the smooth operation of satisfying requests for
  objects residing in the datastore
• Services for effective long-term management of objects in the datastore.
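
The three parts of this repository model can be illustrated with a minimal sketch. It is
only one possible reading of the model, not a design proposed by the Task Force: the class
names, method names, and the use of a SHA-256 checksum as a management service are all
assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass
class DigitalObject:
    """A digital object: content plus its descriptive metadata."""
    object_id: str
    content: bytes
    metadata: dict = field(default_factory=dict)


class Repository:
    """Hypothetical repository: a datastore plus request and management services."""

    def __init__(self):
        self._datastore = {}   # object_id -> DigitalObject
        self._checksums = {}   # object_id -> SHA-256 digest recorded at deposit

    def deposit(self, obj):
        """Store an object and record a checksum for later integrity checks."""
        self._datastore[obj.object_id] = obj
        self._checksums[obj.object_id] = hashlib.sha256(obj.content).hexdigest()

    def get(self, object_id):
        """Request service: return the stored object (raises KeyError if absent)."""
        return self._datastore[object_id]

    def find(self, key, value):
        """Request service: sorted ids of objects whose metadata[key] equals value."""
        return sorted(
            oid for oid, obj in self._datastore.items()
            if obj.metadata.get(key) == value
        )

    def verify(self, object_id):
        """Management service: True if content still matches the deposit checksum."""
        obj = self._datastore[object_id]
        return hashlib.sha256(obj.content).hexdigest() == self._checksums[object_id]
```

In this sketch the datastore is an in-memory dictionary; a centralized or distributed
store could sit behind the same three services without changing how callers use them.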


Datastore facilities and services may be either centralized or distributed, with a
hybrid approach being the most likely and practical. Remote resources will be included
and, potentially, local resources hosted on non-DL servers as well. While these may be
included from an access perspective only, certain standards for persistent access
should be met for materials to be included under the KU-DL umbrella.


Resource Naming Services and Standards


The lack of permanence of object names over time is both a hallmark of digital
resources and the bane of efforts to provide persistent access to them. To facilitate
the use of digital object names within the KU digital library environment, the KU-DLI
should adopt the concept of name resolution services; develop and provide a
name resolution server for the campus; and develop and provide a set of
services that permit the KU community to create and manage names for their
digital objects. In addition to developing naming conventions, policies and
processes must be developed to create organizational naming authorities and
outline the responsibilities for maintaining the validity of names over time.
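
The idea behind name resolution can be sketched briefly: a persistent name stays fixed
while the location it resolves to may change. The sketch below is a hedged illustration
only. The report does not select a naming scheme, so the `authority/local_name` format,
the class and method names, and the example.org URLs are all hypothetical.

```python
class NameResolver:
    """Hypothetical name resolution service: persistent names -> current locations."""

    def __init__(self):
        self._authorities = set()   # registered naming authorities
        self._locations = {}        # persistent name -> current URL

    def add_authority(self, authority):
        """Register an organizational naming authority."""
        self._authorities.add(authority)

    def register(self, authority, local_name, url):
        """Create a persistent name under a known authority and bind it to a URL."""
        if authority not in self._authorities:
            raise ValueError(f"unknown naming authority: {authority}")
        name = f"{authority}/{local_name}"   # assumed name format, for illustration
        self._locations[name] = url
        return name

    def relocate(self, name, new_url):
        """Update the location of an existing name; the name itself never changes."""
        if name not in self._locations:
            raise KeyError(name)
        self._locations[name] = new_url

    def resolve(self, name):
        """Return the current location for a persistent name."""
        return self._locations[name]
```

Citations and links would carry only the persistent name, so moving an object to a new
server requires one `relocate` call by the responsible naming authority rather than
changes to every reference.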





Content and Metadata Standards


Standards in the creation of objects and metadata allow common storage,
access, and management processes to be used and economies of scale to be
realized for the institution. The use of recommended standards should be
required for objects and metadata created locally under the auspices of the KU
Digital Library and stored in the central object / metadata repositories. These
same standards should be strongly encouraged for objects and metadata created
outside the control or coordination of the KU-DL and stored / accessed remotely
(either on or off campus). Objects not adhering to published KU-DLI standards
should be accepted for long-term management only under exceptional
circumstances. Specific content and metadata standards are outlined in detail
in the report.


Archiving


The mission of digital archiving initiatives is to preserve the integrity of objects
and ensure their persistence. This seemingly simple statement, however, raises
a wide range of questions, most of which focus on standards, responsibility for
archiving, technical strategies, the connection between archiving and access,
and economic models.

Preservation in the digital world is not exclusively a matter of longevity of storage
media. The viability of digitized files is much more dependent on the life
expectancy of the access system. Institutions must prepare to migrate digitized
resources from one generation of technology to subsequent generations. The
use of digital technologies from a preservation perspective requires a deep and
longstanding institutional commitment to long-term access, the full integration of
the technology into information management procedures and processes, and
significant leadership in developing appropriate definitions and standards for
digital preservation.


Services and Policies


Tying the various individual components together are the services offered
through the digital library environment. These are the tools and processes
whose ultimate goal is to provide accurate, seamless, and 'transparent' access
across the various repositories and systems for the user community.

Services and policies initially provided for the digital library environment should
include object repository registration processes and guidelines, metadata
advisory and creation processes and guidelines, naming convention guidelines
and services, object advisory and creation processes and guidelines,
common navigation processes, common tool sets, etc.




Bringing these technical, service, and policy components together to provide a coherent
whole can be illustrated with the following diagram of a conceptual architectural model
for the digital library environment:

[Figure: Conceptual Architecture Model]



Selection and Prioritization


The number of meritorious projects proposed will always far outweigh the resources
available to address them. In order to ensure that resources are invested wisely in
digitizing and managing the most significant and useful materials at the lowest possible
cost without placing the institution at legal or social risk, a mechanism for selecting
projects and giving them priority needs to be developed at the outset, in consultation
with the primary stakeholders. The following guidelines are proposed as an initial set of
selection criteria for projects being considered for implementation within the KU-DLI:




• consistent with the mission and goals of the institution and fits within the
  strategic focus of the KU-DLI
• has clearly defined goals and outcomes
• responds to known campus needs
• is collaborative in nature, leverages resources, and supports partnerships on
  campus or with other institutions, groups, or individuals with similar interests
• cost/benefit analysis over the short- and long-term is positive
• takes advantage of available outside funding, or positions the KU-DLI to
  obtain outside funding
• enhances the diversity of resources available or the audience served, within
  the supported technical infrastructure and standards
• facilitates innovation in teaching and the curriculum
• facilitates innovation in research
• facilitates innovation in new modes of scholarly communication
• facilitates creative, innovative, and interesting concepts and approaches
  within the technical architecture framework
• utilizes or provides the opportunity to build on hardware and software
  solutions existing within the University
• preserves and enables continuing access to significant rare or unique
  collections
• supports selection and/or creation of information products in disciplines or
  sub-disciplines where KU is recognized as a leader or is pre-eminent [note:
  this would not be determined solely by ‘rankings’]
• enhances curricular development in emerging areas (e.g., indigenous studies)
• maximizes economies of scale
• uses the University’s fiscal, human, and infrastructure (space, hardware, etc.)
  resources effectively
• is consistent with KU Libraries’ scholarly communications and collection
  development and management principles
• enhances KU’s systems capability to support users in self-sufficiency
• benefits a significant number of users
• follows the technical architecture parameters of the KU-DLI and provides
  digital objects that meet the highest technical standards the institution can
  afford


In order to apply these guidelines / criteria to selecting and prioritizing projects, a
selection process must be established. This process should include an objective
method of applying criteria. In addition, it should involve a number of campus
stakeholders in both developing and approving the process including the Digital Library
Executive Group and the Digital Library Advisory Group. Selection and prioritization
processes will need to follow somewhat different criteria and process paths depending
on the source of the materials and the parties involved.
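
One way an objective method of applying criteria might work is a weighted score per
project. The sketch below is purely illustrative and is not a method the Task Force
proposes: the 0-5 rating scale, the weights, the criterion keys, and the function names
are all assumptions, and an actual process would be defined by the stakeholders named above.

```python
def score_project(ratings, weights):
    """Weighted average of a project's ratings over the criteria named in `weights`.

    `ratings` maps criterion name -> rating (assumed 0-5 scale);
    `weights` maps criterion name -> relative importance (assumed values).
    """
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[c] * ratings[c] for c in weights) / total_weight


def rank_projects(projects, weights):
    """Return project names ordered from highest to lowest weighted score."""
    return sorted(
        projects,
        key=lambda name: score_project(projects[name], weights),
        reverse=True,
    )


# Hypothetical criteria and ratings, for illustration only.
example_weights = {"campus_need": 3, "outside_funding": 1}
example_projects = {
    "Project A": {"campus_need": 4, "outside_funding": 0},
    "Project B": {"campus_need": 2, "outside_funding": 5},
}
ranking = rank_projects(example_projects, example_weights)
```

Because different sources of material may follow different process paths, each path
could use its own set of weights while sharing the same scoring function.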


Implications for Resources


As with any project, the vision can quickly overwhelm the resources available. In order
to create a successful digital library environment for the University, it is critical that the
initial scope of the initiative be carefully defined in order to maximize use of the
resources available.






Recommendations


Based on the background information and discussion presented in this report, the
Digital Library Technical Infrastructure Task Force recommends the following actions to
facilitate the establishment of a robust digital library environment for the University of
Kansas - Lawrence campus:

1) Adopt the Principles statement as the basis for central Digital Library support for
   the University of Kansas - Lawrence campus.
2) Adopt the Implementation Strategies for the Digital Library Goals as the initial
   implementation framework for the DLI.
3) Adopt the conceptual architecture model for the KU-DL. Recognize that
   implementation of this model will require a phased, iterative approach whose
   dimensions will be determined by the scope of the KU-DLI.
4) Adopt the concept of local repositories for the storage and management of local
   digital resources. Appoint a group to develop a detailed plan and bring it to the
   DLEG for approval.
5) Adopt the concept of Name Resolution Services for the naming of and access to
   resources in the digital library. Appoint a group to select a name resolution service
   scheme and develop a detailed implementation plan.
6) Adopt the concept of Object Classes and Services as a methodology for
   standardizing treatment of and interaction with digital library resources.
7) Adopt the guidelines & process for component/service/tool selection for the DLI.
8) Adopt the concept of a software toolkit and support levels and the criteria for
   inclusion. Specific recommendations/phasing will be determined by the scope of
   the KU-DLI and decisions on other recommendations in this report.
9) Adopt the basic recommendations for content format standards for locally
   produced objects and commercially purchased/licensed resources, whenever
   possible.
10) Adopt the basic recommendations for metadata standards for resources to be
    included in the KU-DL. Appoint a group to select/develop metadata tagset
    definitions (minimal levels at least) and crosswalks, and procedures for creating
    and maintaining KU-DL metadata.
11) Determine the scope of KU’s archiving / preservation commitments for digital
    resources.
12) Adopt the project selection guidelines and process. Appoint a group to develop
    detailed procedures for submission and selection of projects.
13) Select initial project(s) for implementation. Following selection, appoint a group to
    develop detailed task plans, timelines, and resource needs.
14) Approve the use of existing, and/or purchase of new, equipment and software for
    the initial DL implementation once the DLI scope and initial projects are
    determined.
15) Obtain funding for start-up costs.
16) Appoint a group to develop service evaluation guidelines and process as the DLI
    progresses.




Introduction


“The meaning of the term ‘digital library’ is less transparent than one might expect. The
words conjure up images of cutting-edge computer and information science research.
They are invoked to describe what some assert to be radically new kinds of practices for
the management and use of information. And they are used to replace earlier
references to ‘electronic’ and ‘virtual’ libraries.” [1]

In order to help shape their common understanding of the concept, members of the
Digital Library Federation crafted a working definition of digital libraries:

    Digital libraries are organizations that provide the resources, including the
    specialized staff, to select, structure, offer intellectual access to, interpret,
    distribute, preserve the integrity of, and ensure the persistence over time
    of collections of digital works so that they are readily and economically
    available for use by a defined community or set of communities. [2]

“This is a full definition by any measure, and a good working definition because it is
broad enough to comprehend other uses of the term. […] other definitions focus on one
or more of the features included in the DLF definition, while ignoring or de-emphasizing
the rest. For example, the term ‘digital library’ may refer simply to the notion of
‘collection,’ without reference to its organization, intellectual accessibility, or service
attributes. This is the particular sense that seems to be in play when we hear the World
Wide Web described as a digital library. But the words might refer as well to the
organization underlying the collection, or, even more specifically, to the computer-based
system in which the collection resides. The latter sense is most clearly in use in the
National Science Foundation’s Digital Library Initiative. Yet again, institutions may be
characterized as digital libraries to distinguish them from digital archives when the intent
is to call attention to the differences in the nature of their collections.” [3]


While the concept of digitized resources in libraries has been in evidence for several
decades, the current use of the term ‘digital library’ stems from the federally funded
(NSF/ARPA/NASA) Digital Libraries Initiative in 1994. Since then, many excellent
digital library efforts have sprung up across the country, and around the world, that have
helped refine the concepts, practices, systems, and tools that go into building a digital
library.

The University of Kansas is now preparing to step into the digital library arena.
Commercially produced digital resources have been made available through the Library
for a number of years. An increasing appetite within the university community for
access to more digital resources and an interest in digitizing locally held materials has
highlighted the need for a cooperative approach to designing and supporting a coherent
digital library environment for the University.




In response to this need, the Digital Library Executive Group and the Digital Library
Advisory Group were formed in 1999. In May 2000, the Digital Library Executive Group
formed the Digital Library Technical Infrastructure Task Force with the following charge:

The Digital Library Technical Infrastructure Task Force is charged to develop specific
recommendations and plans for the KU Digital Library Technical Infrastructure, the
collection of common systems and services that make it possible to store, organize, and
access digital materials:

• Develop architectural principles and standards for KU shared digital collections.
  These should be consistent with relevant industry, Kansas Digital Library, and
  other University standards, provide a framework that facilitates the creation of
  integrated systems, and provide the flexibility to foster innovations in scholarly
  communication. After DLEG approval, submit to the State Architecture process
  for consideration.
• Identify appropriate elements for the development of an integrated digital library
  system which accommodates all formats - metadata, full-text, images, numeric
  data, geospatial data, etc.
• Develop principles and specifications for the identification, evaluation, selection,
  and implementation of online tools and services for sharing, accessing,
  manipulating, integrating, and archiving electronic scholarly content in all forms.
• Recommend an appropriate strategy for moving campus-developed tools to the
  KU Digital Library.
• Recommend a process for prioritizing services to be implemented.
• Develop principles and guidelines for the type and frequency of evaluation of
  tools and services, both before and after they are implemented.

The Task Force will present their report and work plan to the DLEG by September 1, 2000.
The plan should present specific recommendations, task plans, and timelines for
accomplishment.


This report is a result of Task Force discussions on the technical issues involved in
creating a digital library environment for the University of Kansas and draws on draft
Digital Library Executive Group documents defining the overall programmatic direction
for the KU Digital Library. While the Task Force focused on efforts on the Lawrence
campus, the Initiatives will be coordinated with the KU Medical Campus whenever
appropriate. Given the volatile nature of technology in general and digital libraries in
particular, this report should be viewed as a framework for technical implementation.
While specific technical recommendations are based on current industry standards and
“best practices”, these will undoubtedly evolve as the industry and the initiative mature.



Conceptual Foundations of the KU Digital Library


Although this Task Force concentrated primarily on defining the technical aspects of
KU’s Digital Library, it is critical to have a shared understanding of the basic
philosophical and conceptual foundations upon which these technical discussions and
recommendations are based. These foundation documents are available on the KU
Digital Library Initiatives website (http://kudiglib.ukans.edu).


Implementation of the KU Digital Library


Strategies


The goals of the KU Digital Library Initiatives (KU-DLI) will be achieved through a
variety of implementation strategies. The KU-DLI goals are restated below with
specific implementation strategies outlined for each.

Goal 1: Develop digital collections -- expanding over time in number and scope --
created from the conversion to digital form of documents contained in our and other
libraries and archives, and from the incorporation of holdings already in electronic
form.

• Identify needs and initiate projects to acquire or create appropriate digital
  resources
• Identify existing digital projects / resources on campus and incorporate them
  into the framework of the Digital Library to the extent possible
• Encourage the capture and centralized management of primary university
  scholarly data including resources born digitally. As appropriate, manage
  resources in conjunction with other University programs, such as the Data
  Warehousing Initiative and records management efforts.
• Implement and/or develop scalable systems for handling classes of materials
• Provide advisory services and training for campus users in standards-based
  methods and procedures for creating and managing digital materials
• Provide a means for identification of resources associated with the KU-DLI


Goal 2: Establish a collaborative management structure to coordinate and guide the
implementation and ongoing maintenance of the digital library collections; to set
policy regarding participation, funding, development and access; to encourage and
facilitate broad involvement; and to address issues of policy and practice that may
inhibit full citizen access.

• Provide a management focus and technical infrastructure for digital
  information projects on campus
    o Provide central coordinating and management services for campus
      projects
    o Provide and encourage use of centralized server and storage services
    o Provide and encourage use of centrally supported developer tools and
      services
• Work with the campus community to identify and prioritize project areas for
  digital collections
• Actively seek partnerships with campus units and departments, the University
  Press of Kansas, regional institutions, centers, and organizations and, in doing
  so, market IS’s expertise and information resources
• Encourage greater interoperability and integration of digital collections across
  campus and within the state, consistent with national and international
  initiatives
• Initiate and participate in consortial activities as appropriate
• Provide universal and open access to the campus and beyond to the extent
  technically feasible and within appropriate licensing provisions
• Develop policies and guidelines, as appropriate, for the management of and
  access to the Digital Library


Goal 3: Develop a funding strategy that addresses the need for support from both
public and private sources to provide the means to launch new initiatives.

• Advocate investment consistent with mission:
    o adequate staff for managing and supporting the KU-DLI
    o funding for collections, both acquired and created
    o adequate technical infrastructure funding and support
• Develop economic models to support a sustainable digital library environment
  for the campus
• Explore, identify, and secure external resources (e.g., funding) and expertise
  to support the Digital Library.
    o work with various campus funding and fund-raising groups, as
      appropriate, to secure contributions
    o pursue grant opportunities


Goal 4: Form selection guidelines that will accommodate local initiatives and
projects; and ensure that the digital library collections comprise a significant corpus
of materials. Provide for coordination with the Kansas Digital Library.

• Develop criteria and processes for project and resource selection.
• Work in cooperation with KUMC and the Kansas Digital Library to shape
  standards, processes, and initiatives that meet both the needs of the
  University and the state as a whole.


Goal 5:

Adopt common standards and best practices to ensure full informational
ca
pture; guarantee universal accessibility and interchangeability; simplify retrieval
and navigation; and facilitate archiving and enduring access.



KU Digital Library Technical Infrastructure Task Force Report

Page
11

Working Draft


November 1
0, 2000



Provide guidance for organization of campus electronic information
resources, especially in support of dista
nce learning



Incorporate appropriate tools and practices from the larger technical
environment for creating, managing, accessing, and analyzing content



Emphasize and encourage the use of standards and/or “best practices”, as
appropriate, for the creation a
nd management of digital resources



Provide a common interface to the KU
-
DL to the extent it is technically
feasible and legally possible



Apply appropriate technical standards consistent with established and
emerging standards and “best practices” (e.g., m
etadata, file formats, search
protocols, networking/telecommunications, etc.) for compatibility and
interoperability (e.g., minimize the number of interfaces, software
applications, etc.)



Coordinate the use of appropriate technical standards and “best practices”
with KUMC and Kansas Digital Library



Provide a secure environment for access (I/A/A) and for resources



Facilitate long-term access to digital collections by developing and
maintaining a content migration / preservation strategy



Provide support facilities for resource management and tools

o Recommend centralized server support

o Provide guidelines for distributed local servers that participate as
resource repositories

o Provide basic support for a recommended DL toolkit, conversion
services, and metadata creation and management


Goal 6: Establish an ongoing and comprehensive evaluation program



Monitor and evaluate national and international efforts toward developing
criteria for digital library service evaluation



Develop evaluation criteria appropriate for assessment by the University and
external funding agencies



Work with the University community through the Digital Library Advisory
Group [4], campus surveys, usage monitoring, etc. to evaluate the success of
the KU-DLI



Functions


Looking at digital library definitions broadly, the digital library can be distilled into three
essential aspects:




Selected and managed digital collections



Schema for organization and access



Supporting infrastructure and services [5]


It is instructive to extrapolate them into their basic components: Access, Mediation,
Collection Development, and Archiving.




Figure 1: Digital Library Functions [6]


By building upon these functional foundations, it becomes possible to further define the
roles and responsibilities and the technical infrastructure needed to develop a coherent
digital library environment for the campus.


[Figure 1 labels: Archive; Mediation; Collection Development; Access; integrity;
preservation; rights mgt; distributed storage; distributed access; economic models;
publishing; agents; commerce; interfaces; information retrieval; metadata; links;
structure]



Roles and Responsibilities


There are many roles in a digital library program, and there is seldom a one-to-one
relationship between these roles and individual people. These roles fall into several
broad categories, including Management, Requirements Analysis & Design, Core
Technical Support, User Support, and Legal and Policy Issues Support. The digital
library team must have a balance of skills across a variety of roles, with individual staff
members often wearing several hats at once. It is important to recognize that each of
these roles is critical to the success of the initiative.


Management


User sponsorship

User sponsors are usually project initiators. Their role is to help make and then
support key project scope decisions. The user sponsor(s) will work closely with
program coordinator(s) to gain support and resources for the project.


Program Coordination

The program coordination role is to work closely with the user sponsor(s) to achieve
success. Primary responsibilities of program coordination include:



educating University leadership and the campus community at large about the
application and impacts of digital libraries



gaining economic support for the program



leading the process of identifying and prioritizing applications



defining budgets and schedules



working with project managers to ensure project success



monitoring industry trends and identifying emerging technologies and
standards that should be adopted



representing the KU-DLI to the broader library, IT, and academic communities



promoting the work and resources of the DL to the University and beyond


Project management

Project management is responsible for day-to-day direction of project tasks and
activities, including resource coordination, status tracking, and communication of
project progress and issues, working closely with staff and users involved with the
project. Project management responsibilities include:



create, manage, and adjust project plans



define overall architecture and set standards



evaluate and select hardware platforms



evaluate and select networking facilities



evaluate and select middleware




Requirements Analysis & Design


User requirements analysis

This role is responsible for leading user requirements definition activities and then
representing those requirements as the digital library environment is developed.
During the scope phase of the project, the user requirements analyst’s role is to
collect, consolidate, organize, and prioritize the needs and problems the user
community presents. The objective is to create a set of requirements that ensures
the project is a success from the user’s perspective, not just a technical
success.


Technical architecture

This role is responsible for the design and oversight of the technical infrastructure
and security strategy to support the digital library and provide the overall
cohesiveness to ensure that the components will fit together. Close coordination
with staff who manage other IT infrastructures is important.


Content specialization / source analysis

The content specialist / source data analyst role is responsible for reviewing source
objects in a variety of formats (digital or non-digital) and from various sources
(locally created/maintained, remote access), and recommending appropriate
conversion and/or integration processes. The analyst also works with end-user
application specialists to develop tools and access systems.


Metadata development / management

Metadata modeling and management is the process of determining the metadata
element requirements and their collection, organization, and maintenance
processes. This role is involved in developing / acquiring the digital library’s
metadata management system(s).


Repository design and management

The role is responsible for designing, implementing, and maintaining standards and
procedures for metadata and digital object repositories. Responsibilities include
development of object naming conventions, deposit procedures, metadata registry
procedures, and migration/archiving methods.


Core Technical Support


Conversion/integration format expertise

The format expertise role is responsible for working with the content analyst(s) to
convert and/or integrate objects into the digital library. Personnel in this role are
expert in the appropriate format standards and ‘best practices’ for creating
and processing content.






Quality assurance

The quality assurance (QA) analyst ensures that the data loaded into the digital
library is accurate. This role identifies potential data errors and resolves them, and
performs all QA tasks necessary for application development.


Database administration and physical database design

The function of applying formal guidelines and tools to manage the information
resource is referred to as database administration. The database administrator
(DBA) is often responsible for day-to-day operational support of the database,
ensuring data integrity, database availability, and performance. This role can be
split into design and production roles. The DBA typically performs these tasks:



creates, and modifies as necessary, the physical database structure



evaluates and selects database management software



runs load scripts to handle database loading



monitors query and database performance, and query repetitions



tunes the database for performance by analyzing response-time problems
and how the database structure can be modified to make it run faster



performs backup and restore operations as necessary



administers user access and security



monitors database capacity



creates proactive monitoring and preventive action systems to avoid outages


Programming - middleware, applications support

As important as the database are the middleware applications and desktop
tools used for querying, reporting, online analytical processing, or object
manipulation. This role implements / creates and maintains these types of end user
applications and tools. This role must:



determine which of several different implementation strategies makes the
most sense in a specific environment and why



evaluate software



follow design and specification guidelines



develop, test, and document applications


Dataloading

Programmers are needed to construct and automate the data staging and load
processes. Primary responsibilities can include:



developing and implementing acquired data re-engineering software



programming data acquisition and transformation processes



developing and documenting test plans



automating the load process



maintaining, updating, and monitoring acquisition and loading



searching for causes of incompatibility






Operations support / backup

This role provides basic ‘24x7’ systems support and system and data backup
services. Personnel in this role are responsible for initial problem response /
resolution and referral of more complex system problems as appropriate. They are
also responsible for implementing backup schedules and recovery procedures.


R&D, testing, experimentation

This role is responsible for researching, testing, documenting, demonstrating, and
proposing new technologies for potential application by the DLI.


Training

Staff must be educated on the new technologies, standards, and procedures, as well
as existing system capabilities, data content, and end user applications. This role
typically develops initial course materials and delivers them on an ongoing basis.


User Support


Consulting / advising

Personnel in this role are knowledgeable in the basics of various end user tools,
data conversion standards and processes, metadata systems, etc. and are able to
respond to or refer user questions as appropriate.


Training

End users must be educated on the system capabilities, data content, and end user
applications. This role typically develops the initial education course materials and
delivers the education on an ongoing basis.


Technical support

These specialists are involved in early stages of the digital library to perform
resource and capacity planning. During product selection, they ensure compatibility
with the existing technical environment. Once technology has been selected, they
are involved in the installation and configuration of the new components.


Legal and Policy Issues Support


This role is responsible for developing policy and procedures for rights procurement
and IP management; documentation; and monitoring and responding to intellectual
property concerns and access issues over the life of the program.



Architecture Components and Standards


One of the hallmarks of a library is the ability to provide coherence and context for
access to disparate collections of information resources. This is a critical principle to
carry forward into the digital environment and a distinguishing characteristic that
separates digital libraries from simple collections of links to electronic objects.


There are several critical factors involved in being able to provide a coherent, contextual
environment for digital resources:




Use of standards for creating objects



Use of standards for describing resources (for access)



A common methodology for access to object types, and



An understanding of object behaviors and user interactions


Component Selection


In order to build this environment, a number of components must be evaluated and
brought together into an overall architecture. The specific recommendations for
components, services, and tools listed in this report should be viewed as a starting
point rather than an inclusive list that restricts what can be done or used over the
long term. These initial recommendations are weighted toward those specific areas
where standards are already developed or industry ‘best practices’ have emerged.


As the KU-DL grows and the environment matures, these initial recommendations
will undoubtedly evolve. However, in general, the primary criteria for inclusion of a
tool, component, or service in the future should be:



Is there a demonstrated, user-driven need?



Does it support an accepted standard or process within the KU-DL
infrastructure that is not already adequately supported?



Is it required to support interoperability with a critical University, state or
regional cooperative effort?



Is adequate support available for its use?


Incremental changes in components and tools should be determined and
documented within the general day-to-day management process of the KU-DL with
regular reports to the Digital Library Executive Group.


Changes that could impact the integrity or focus of the KU-DL should be approved
by the Digital Library Executive Group in consultation with DL managers and
technical staff.


Based on strategic considerations, selection guidelines, and current digital library
community developments, the initial components needed to build a digital library
environment are described below.




Strategic Components


A successful digital library environment is not an isolated, self-sufficient entity that
exists and operates apart from the rest of the information and technology
environment of the institution. It is critically dependent on, and must work within, the
resources and decisions made in many areas including networking, computing
support, information resource support, and information technology policies. A
simplified model of the tiers of dependency can be seen in Figure 2.



Figure 2. Strategic Components Diagram [7]

[Figure 2 tiers, from top to bottom: Applications & Environments; Information
Resources and Systems; Information Infrastructure; Distributed Network &
Computing Infrastructure]

Access to and use of information resources for the University depends upon a solid
foundation that provides an exceptional network and computing services
infrastructure. Two critical examples of this dependency are:




Bandwidth


An excellent networking infrastructure is critical to the support and delivery of
digital resources to the user community. Given the realities of the digital
resource environment, information resources will be distributed across physical
locations rather than centralized. This distributed nature emphasizes the need
for a universally high level of network support since not only will users be
accessing resources in a distributed fashion, but the resources themselves will
be served from a variety of locations. Without a solid delivery system, even the
best set of information resources and discovery tools will present frustrations and
appear inadequate to the user community.


Access Management

One objective of the Digital Library Initiatives is to make materials available to as
wide an audience as possible. However, licensing restrictions or other
considerations may require limiting access to some material available through
the digital library to users specifically associated with the University.


Access management for network-accessible resources has several components:
user authentication, user profiling or authorization, and resource-specific access
protocols.


Authentication is the process of validating the user identity associated with a
request to perform a given operation on a specified resource. Authorization, on
the other hand, is the process that associates a given set of privileges with an
authenticated identity. Privileges are generally assigned on the basis of a user
profile, made up of characteristics associated with the authenticated identity. In
combination, the two processes answer the questions “Are you who you say you
are?” and “Are you permitted to do what you have requested?”


Authentication and authorization services should not be specific to the Digital
Library technical infrastructure; rather, the Digital Library should make use of
mechanisms developed for the institution as a whole. For example, services
such as certificates and LDAP [8]-accessible profile directories could be created
and maintained at the institutional level and applied in the Digital Library
environment.
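
The distinction between the two processes can be sketched in a few lines of Python. This is a minimal illustration only, not a proposed implementation: the in-memory DIRECTORY stands in for an institutional (e.g., LDAP-accessible) profile directory, the user names, plain-text secrets, resource classes, and POLICY table are all hypothetical, and a real service would rely on the institution's own credential mechanisms.

```python
# Hypothetical institutional directory; a dict stands in for an LDAP-accessible service.
DIRECTORY = {
    "jdoe": {"secret": "s3cret", "profile": {"affiliation": "faculty"}},
    "visitor": {"secret": "guest", "profile": {"affiliation": "public"}},
}

# Hypothetical policy: affiliations allowed to perform an operation on a resource class.
POLICY = {
    ("licensed-journal", "read"): {"faculty", "staff", "student"},
    ("open-collection", "read"): {"faculty", "staff", "student", "public"},
}


def authenticate(user_id, secret):
    """Authentication: 'Are you who you say you are?' Returns a profile or None."""
    entry = DIRECTORY.get(user_id)
    if entry is None or entry["secret"] != secret:
        return None
    return entry["profile"]


def authorize(profile, resource_class, operation):
    """Authorization: 'Are you permitted to do what you have requested?'"""
    allowed = POLICY.get((resource_class, operation), set())
    return profile["affiliation"] in allowed


profile = authenticate("visitor", "guest")
print(authorize(profile, "open-collection", "read"))   # True
print(authorize(profile, "licensed-journal", "read"))  # False
```

Note that authorization here depends only on the profile, not on the identity itself, which is what allows privileges to be assigned by characteristics such as University affiliation.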


Network-accessible resources are made available through a unique set of access
protocols and mechanisms (e.g., URLs, cookies, login scripts, session IDs)
necessary to access a resource or resource class. While these mechanisms can
enhance use of digital library services, neither they nor A/A services should
maintain a record linking an individual’s identity to resources accessed, in order
to ensure privacy for the individual.
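
One way to reconcile usage monitoring with this privacy requirement is to record only aggregate counts. The sketch below is an assumption about how that might look, not a description of any existing KU service; the resource name and profile fields are hypothetical.

```python
from collections import Counter

# Aggregate usage counts keyed by (resource name, affiliation); no user identity is stored.
usage_counts = Counter()


def record_access(resource_name, profile):
    """Count an access using only the coarse affiliation from the user's profile."""
    usage_counts[(resource_name, profile["affiliation"])] += 1


record_access("kudl:maps:campus-1920", {"user_id": "jdoe", "affiliation": "faculty"})
record_access("kudl:maps:campus-1920", {"user_id": "asmith", "affiliation": "faculty"})
print(usage_counts[("kudl:maps:campus-1920", "faculty")])  # 2
```

The user identifier is available at request time but is never written to the usage record, so the counts cannot be traced back to an individual from this table alone.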


Building upon these basic underpinnings, a solid information services infrastructure
is needed. Information infrastructure technical activities center around developing a
collection of common systems and services that make it possible to reliably store,
organize, and access digital objects. The adequacy of the services and support,
hardware and software support, standards, tools, training, archiving, etc. will
determine how well the user community is able to make the fullest use of the
resources provided.


Once the infrastructure is in place, it must be populated with resources that address
the needs of the user community. Resources should be added in such a way that
they will address long-term access requirements as well as the immediate needs.


Finally, tying it all together, applications and environments must be built to provide
access to the resources provided. A variety of access and analysis tools,
specialized interfaces for diverse user community needs, general access as well as
personalized spaces, are all components of this tier.





Local Repositories


The KU Digital Library Initiatives has among its goals the creation of enduring
KU-specific digital resources, authored by, available to, and supported by KU faculty,
staff, students, and organizational units. Within this context, the conceptual model
of a repository as a set of services and related facilities may be defined. These
services and facilities include:




A datastore containing digital objects (content and metadata) created by, or
under the auspices of, the KU-DLI



Services necessary to the smooth operation of satisfying requests for objects
residing in the datastore



Services for effective long-term management of objects in the datastore.


Repository services and facilities can be classed into three basic types or layers:




Core services and facilities that are integral to the basic functionality of the
digital library



Specialized services and facilities that are not part of the basic functionality
of the digital library but are regularly supported as services available to users,
most likely for an additional fee



Customized services and facilities that are developed and/or customized
for users under an appropriate fee structure for this value-added effort


Services and facilities offered under the KU-DL must satisfy the needs of a
recognized segment of the KU-DL user community and should be somewhat
narrowly defined, at least initially.




These repository services and facilities must be further defined within the business
and service models for their support and long-term continuation within the KU
environment. However, the common, base technical infrastructure should support:




Persistent storage



Persistent access and retrieval



Persistent object names



Wide availability, as appropriate within copyright or other legal constraints



Access control (authentication / authorization, rights management)


Datastore facilities and services may be either centralized or distributed, with a
hybrid approach being the most likely and practical. Not only will remote resources
be included but, potentially, local resources hosted on non-DL servers as well. While
these may be included from an access perspective only, certain standards for
persistent access should be met for materials to be included under the KU-DL
umbrella. Specific policies and guidelines should be developed to address:




Standards for content and metadata for deposited or accessed objects



Standards for naming objects within the KU-DL environment



Requirements for depositing objects within the institutional repository



Requirements for managing these objects


Other considerations will include development of economic models and incentives
for participation in the repository by other institutional units.


Resource Naming Services


The lack of permanence of object names over time is both a hallmark of the current
environment and the bane of efforts to create persistent access to digital resources.



The current method of discovering and locating resources on the Web relies on
allocating an identifier to all resources. At present, these identifiers are primarily
Uniform Resource Locators or URLs, and are allocated according to the location of
the resource. Although URLs have been serving the combined purpose of
identifying a resource and describing its location for some time now, they are not a
satisfactory means of uniquely identifying a digital resource. The URL simply points
to the current location of the resource. If a resource is moved to a new location, the
previous URL is no longer useful. A persistent and unique identifier, specific to a
given digital resource, that preserves access to that resource regardless of its
location, is necessary for supporting a long-term digital library. [9]


Names are persistent, location-independent identifiers for network-accessible
resources. Names are preferable to URLs since they can be used without regard to
the physical location of the resource.




A Uniform Resource Name (URN) is a standard, persistent, and unique identifier
for digital resources. In a sense, URNs are analogous to call numbers on books in
the library. A call number identifies a book, but does not tell you where it can be
physically found without a guide to the stacks. In the network environment, this
guide is a name resolution server. If the library needs to rearrange the books, only
the stack guide needs to be updated, not the call number on every book. Similarly,
when digital objects need to be moved to a different machine or directory, only the
name resolution service needs to be updated.
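
The call-number analogy can be made concrete with a small sketch of a name resolution table. This is an illustration under stated assumptions rather than a design for the campus service: the NameResolver class, the name "urn:example:kudl-photo-0001", and the server URLs are all hypothetical.

```python
class NameResolver:
    """Maps persistent names to current locations (the 'guide to the stacks')."""

    def __init__(self):
        self._locations = {}

    def register(self, name, url):
        """Assign a location to a new name; names are never reused."""
        if name in self._locations:
            raise ValueError(f"name already registered: {name}")
        self._locations[name] = url

    def relocate(self, name, new_url):
        """Record that the object moved; the name itself does not change."""
        if name not in self._locations:
            raise KeyError(name)
        self._locations[name] = new_url

    def resolve(self, name):
        """Return the current location for a persistent name."""
        return self._locations[name]


resolver = NameResolver()
name = "urn:example:kudl-photo-0001"  # hypothetical persistent name
resolver.register(name, "http://server-a.example.edu/photos/0001.tif")
resolver.relocate(name, "http://server-b.example.edu/archive/0001.tif")
print(resolver.resolve(name))
```

Any citation that uses the name continues to work after the move, because only the resolver's table was updated.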


Several persistent naming schemas have been developed including:



Handles (CNRI)



URNs



DOIs



PURLs (OCLC)


To facilitate the use of digital object names within the KU digital library environment,
the KU-DLI should adopt the concept of name resolution services; develop and
provide a name resolution server for the campus; and develop and provide a set of
services that permit the KU community to create and manage names for their digital
objects.


Unfortunately, there is no common practice or agreement on the use of the various
naming schemas as yet. The creation of an enduring model and service for digital
objects will require the creation of naming services that can be used now, and
migrated as commonly used standards emerge in the future. A potential model that
should be explored further within the KU context is under development at the Office
for Information Systems, Harvard University Library (HUL). [10]


The HUL model involves development of a system of naming domains called
namespaces. A namespace defines a set of rules under which names can be
formulated. Every namespace has a unique identifier (e.g., hdl for Handles). Names
created within a given namespace, by definition, do not conflict with names created
in another namespace, because the namespace identifier is always part of the name
and differentiates it.


A name has three components: a namespace identifier (nid), a separator character
(:), and the namespace string (nss or name) itself. Syntactically, the form of the
name is:



nid:nss


Within the namespace, additional rules should be defined that permit name creation
and maintenance to be easily distributed within the organization, depending, of
course, on the rules governing access to the repositories. These namespace rules
depend on the concept of naming authorities and authority paths. Authority paths
are used to define compartmentalized subsets of the 'nid' namespace in which
names are created. Thus, the full syntax for a name under this scheme would be:







nid:authority-path:resource-name



Since this model allows for both central and decentralized management of objects in
the namespace, in addition to developing naming conventions, policies and
processes must be developed to create organizational naming authorities and
outline the responsibilities for maintaining the validity of names over time.
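
The three-part name syntax described above can be sketched as a small parser. The example is a hedged illustration only: the "kudl" namespace identifier, the dotted form of the authority path, and the choice to allow further colons inside the resource name are assumptions made here, not rules taken from the HUL model.

```python
from typing import NamedTuple


class Name(NamedTuple):
    nid: str             # namespace identifier
    authority_path: str  # compartment of the namespace owned by a naming authority
    resource_name: str   # name assigned by that authority


def parse_name(text):
    """Split 'nid:authority-path:resource-name' into its three components."""
    # Split on the first two separators only, so a resource name may itself contain ':'.
    parts = [part.strip() for part in text.split(":", 2)]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected nid:authority-path:resource-name, got {text!r}")
    return Name(*parts)


def format_name(name):
    """Rebuild the string form of a parsed name."""
    return ":".join(name)


parsed = parse_name("kudl:libraries.spencer:photo-0001")
print(parsed.nid, parsed.authority_path, parsed.resource_name)
```

Because the namespace identifier and authority path are part of every name, two units can each assign "photo-0001" without conflict, which is what makes decentralized name creation workable.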


Object Classes and Services


Early digital resource development has sometimes been referred to as the ‘age of
digital incunabula’ [11]. Faced with a lack of standards for resource creation and
navigation, many unique approaches were taken to move resources into the digital
arena. While these efforts were invaluable in exploring a range of possibilities, they
often presented obstacles when trying to combine a variety of resources in a broader
context. By defining and adhering to standards for resource creation, some of these
obstacles can be removed.


Along with unique methods of resource creation, a number of approaches to
providing access to these resources have been tried. A common approach has
been to pre-define collections of materials and build tailored systems to access
them. While this approach works adequately for small numbers of resources and
systems, it becomes less useful as the numbers grow and users are required to
select and individually search many sites, each with its own idiosyncrasies.


An emerging trend is to provide a more generalized approach to classes of objects,
based on the common characteristics of those objects and user interactions with
them. One hypothetical class / behavior schema would divide objects into the
following categories (see Appendix C: Object Classes & Behaviors - Background
Information):



Bibliographic



Text (Monographs, Reference, Journals, Dictionary)



Images (Bi-tonal, Continuous tone, Video)



Audio



Geospatial



Numeric


These classes/subclasses have common object and/or user behaviors associated
with them that can be drawn on to develop common interfaces and sets of access
and manipulation services. By standardizing the systems and tools designed for
object classes, development efforts can be maximized, management can be more
efficient, and the access environment can be made more coherent for users.
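
As a rough sketch of what class-based services might mean in practice, the object classes listed above can be mapped to shared and class-specific behaviors. The class names follow the list above; the behavior names are placeholders invented for this illustration and are not taken from the report or its appendix.

```python
from enum import Enum


class ObjectClass(Enum):
    BIBLIOGRAPHIC = "bibliographic"
    TEXT = "text"
    IMAGE = "image"
    AUDIO = "audio"
    GEOSPATIAL = "geospatial"
    NUMERIC = "numeric"


# Placeholder behaviors for illustration only (hypothetical, not from the report).
COMMON_SERVICES = ["discover", "retrieve"]
CLASS_SERVICES = {
    ObjectClass.BIBLIOGRAPHIC: [],
    ObjectClass.TEXT: ["search full text"],
    ObjectClass.IMAGE: ["zoom"],
    ObjectClass.AUDIO: ["play"],
    ObjectClass.GEOSPATIAL: ["overlay layers"],
    ObjectClass.NUMERIC: ["tabulate"],
}


def services_for(object_class):
    """Return the shared services plus those specific to one object class."""
    return COMMON_SERVICES + CLASS_SERVICES[object_class]


print(services_for(ObjectClass.IMAGE))  # ['discover', 'retrieve', 'zoom']
```

A new collection of images would then reuse the existing image services instead of requiring a tailored access system of its own.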


Standards for Content and Metadata




A key factor in the provision of a coherent, unified environment is the use of
standards, whenever possible, for the many different aspects of storing and
accessing digital information, including standards for: interoperability, data format,
resource identification, resource description, and data archiving.


Standards in the creation of objects and metadata allow common storage, access,
and management processes to be used and economies of scale to be realized for
the institution. Common information search & retrieval protocols provide a more
easily mastered retrieval environment for users. Archival migration of objects can
be made easier by the use of standards during object and metadata creation.


The use of recommended standards should be required for objects and metadata
created locally under the auspices of the KU Digital Library and stored in the central
object / metadata repositories. These same standards should be strongly
encouraged for objects and metadata created outside the control or coordination of
the KU-DL and stored / accessed remotely (either on or off campus). Objects not
adhering to published KU-DLI standards should be accepted for long-term
management only under exceptional circumstances. Initial recommended standards
for the KU Digital Library include:


Content


Supported Formats and Recommended Standards




Images

The creation of image files is not an end in itself; they are generally
created to enhance access to institutional resources, extend the reach
of institutional collections, provide digital surrogates or replacements
for original resources, and perhaps minimize institutional costs through
collaborative efforts. Before creating digital images, several questions
must first be answered regarding the intended lifetime of the object, the
intended use, and the anticipated audiences. Key among these
questions is whether the intent is to create a digital replacement for the
original object (if the original is not “born-digital”) or a usable surrogate
(based on specified, documented criteria). The requirements for
digitization, and long-term management, are quite different depending
on the answer.


It is important to note that different standards / best practices are
emerging for different source materials; text images should be treated
differently than visual images such as photographs, for instance. While
standards for archival/preservation images are not yet set, community
best practices are emerging [12]. In addition to scanning bit-depth, it is
important to choose a lossless compression process to ensure no loss
of information between the original and subsequent files.




o Source of Images / Use:

  Master [13], [14]

    TIFF using LZW compression (lossless)

    o Textual images
        600 ppi for 1-bit (black & white)
        400 ppi for 8-bit (grayscale)
        300 ppi for 24-bit (color, RGB)

    o Photographs
        5000 lines (8-bit grayscale)
        5000 lines (24-bit color, RGB)

    o Maps / Plans / Oversized
        300 ppi (8-bit grayscale)
        300 ppi (24-bit color, RGB)

  Previewing / Viewing (dynamic generation whenever possible)

    JPEG (continuous tone)
    GIF (multiple levels depending on use)
    PDF (600 dpi)
    MrSID (wavelet compression)



Audio
15

o

Downloadable files



wav (Microsoft format)

o

Streaming files



RealAudio



• Video [16]
    o Moderate-resolution downloadable files
        ▪ Image size: 320x240 pixels
        ▪ Frame rate: 30 fps
        ▪ Data rate: ca. 1.2 Mb/s (ca. 150 KB/s)
        ▪ Compression: MPEG-1
        ▪ Format: mpg
    o Low-resolution downloadable files
        ▪ Image size: 160x120 pixels
        ▪ Color depth: 24 bits/pixel
        ▪ Data rate: ca. 100 KB/s
        ▪ Format: QuickTime (Apple Computer format)
        ▪ File extension: mov
    o Streaming video
        ▪ RealVideo



• Text (marked-up, see Supported Markup Languages)
    o ASCII
    o Unicode
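
The storage implications of the master image resolutions above can be estimated with simple arithmetic. The sketch below is illustrative only: the 8.5 x 11 inch page size is an assumption, not a figure from this report, and the function names are hypothetical. It also shows that the two figures given for the moderate-resolution video data rate agree if the first is read as megabits per second.

```python
def uncompressed_image_bytes(width_in: float, height_in: float,
                             ppi: int, bits_per_pixel: int) -> int:
    """Size of an uncompressed raster scan, rounded up to whole bytes."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8


def megabits_to_kilobytes_per_second(megabits_per_second: float) -> float:
    """Convert a data rate from megabits/s to kilobytes/s (decimal units)."""
    return megabits_per_second * 1_000_000 / 8 / 1_000


# Assumed page size: 8.5 x 11 inches (not specified in the report).
for label, ppi, depth in [("1-bit", 600, 1), ("8-bit", 400, 8), ("24-bit", 300, 24)]:
    size = uncompressed_image_bytes(8.5, 11, ppi, depth)
    print(f"{label} at {ppi} ppi: {size / 1_000_000:.1f} MB uncompressed")

print(megabits_to_kilobytes_per_second(1.2))
```

Under these assumptions a single color page at 300 ppi needs roughly six times the storage of a 1-bit page at 600 ppi before compression, which matters when planning repository capacity.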


Supported Markup Languages and Recommended Standards




• SGML (archival content)





• XML (archival and display content)
• HTML (display content)
• XHTML (archival and display, when standards are finalized and display
  mechanisms are available)
• MathML (archival and display content)


Metadata


Simply put, metadata is data about data. To convey this data in a
meaningful way, three elements are needed:

• semantics, or meaning, as defined by a community to meet specific needs
• syntax, a systematic arrangement of data elements that facilitates the
  exchange and use of metadata among various applications
• structure, the formal arrangement of the syntax with the goal of
  consistent representation of the semantics.


One of the more familiar examples of metadata is the library catalog
record, which provides information about physical or electronic objects in, or
accessible by, the library. This type of metadata has served to provide:

• discovery and retrieval
• identification / veracity / provenance.


As the library environment has evolved in the online arena, the concept of
metadata has also evolved to include additional functions such as:



• rights management
• interoperability
• structure / viewing and
• preservation / longevity


This evolution of the functions that metadata serves has resulted in the definition
of several basic categories of metadata, including:

Descriptive:
    primarily aimed at searching, discovering, and retrieving the digital
    object

Administrative:
    primarily aimed at managing, preserving, and perpetually identifying the
    digital object, including creation data and data that uniquely identifies
    a version / edition / instantiation

Structural:
    primarily aimed at storing and presenting the digital object, including
    navigation, behaviors and use of the object

Intellectual / Rights Management:
    primarily aimed at controlling access to the digital object and
    protecting and rewarding the intellectual property rights holders


Within these categories, metadata can be hierarchical (i.e., metadata within
metadata and metadata nesting) in order to “accommodate the diversity of digital
objects and to propagate data with some efficiency” [17]. Metadata elements may
be supplied at multiple levels. Levels of metadata as defined by the Library of
Congress are:




Set-level:
    • applies to a broader “collection” formed from aggregates that group
      objects by content and custodial responsibility
    • applies to all objects within the set

Aggregate:
    • a group of objects organized by digital type and custodial
      responsibility
    • can be a digital “collection”
    • applies to all objects within an aggregate

Primary Object:
    • specific item
    • usually the digital equivalent of physical library items
    • applies to all intermediate objects

Intermediate object:
    • a view or component of the primary object, e.g., a book (primary
      object) can be presented as page images (one intermediate object) and
      as plain text (another intermediate object)
    • allows the gathering of digital files and metadata for the creation of
      presentations

Terminal object:
    • the digital content file or files that is the object
    • terminal metadata is primarily structural, e.g., size, extension,
      bit-depth, etc.
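
One way to picture these levels is as a chain of nodes in which each level's metadata applies to everything beneath it. The sketch below is a minimal illustration, not a Library of Congress data model: the class name, the inheritance rule (a lower level overrides a higher one on conflict), and all sample values are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Levels ordered from broadest to most specific, as listed above.
LEVELS = ("set", "aggregate", "primary", "intermediate", "terminal")


@dataclass
class MetadataNode:
    level: str
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)
    parent: Optional["MetadataNode"] = None

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level}")
        if self.parent and LEVELS.index(self.parent.level) >= LEVELS.index(self.level):
            raise ValueError("parent must sit at a broader level than its child")

    def effective_metadata(self) -> Dict[str, str]:
        """Metadata inherited from broader levels, overridden by this node's own."""
        inherited = self.parent.effective_metadata() if self.parent else {}
        return {**inherited, **self.metadata}


# Hypothetical sample hierarchy.
archive = MetadataNode("set", "Example archive", {"rights": "contact the archive"})
scans = MetadataNode("aggregate", "Yearbook scans", {"custodian": "Archives"}, archive)
book = MetadataNode("primary", "Yearbook", {"title": "Yearbook"}, scans)
pages = MetadataNode("intermediate", "Page images", {"view": "page images"}, book)
tiff = MetadataNode("terminal", "page001.tif", {"extension": "tif", "bit_depth": "8"}, pages)

print(tiff.effective_metadata())
```

Recording rights or custodial data once at the set or aggregate level, rather than on every file, is the efficiency that hierarchical metadata is meant to provide.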


In order to maximize the sharing and use of resources, interoperability of
metadata has become increasingly important, which has led to the development
of standards for metadata data structures, content element syntax, and data
communication.


Standards are mutually agreed-upon statements that help control an action or
product. Data standards promote the consistent recording of information and are
fundamental to the efficient exchange of information. They provide the rules for
structuring information, so that the data entered into a system can be reliably
read, sorted, indexed, retrieved, communicated between systems, and shared.
They also help protect the long-term value of the data. Standards are the work of
communities and are necessary so that communities can work together.




Several metadata standards are already accepted and in wide use in the digital
library community. Their development has usually centered around their primary
application community and has tended to focus on specific content formats.
Some of the most commonly used standards include:



• MARC / MARC21 (Machine-Readable Cataloging)
  http://lcweb.loc.gov/marc/
• Dublin Core
  http://purl.org/DC/
• TEI or TEI Lite DTD (Text Encoding Initiative Document Type Definition)
  http://www.tei-c.org/
• EAD (Encoded Archival Description)
  http://lcweb.loc.gov/ead/
• VRA Core (Visual Resources Association)
  http://www.gsd.harvard.edu/~staffaw3/vra/vracore3.htm
• CIMI (Consortium for the Computer Interchange of Museum Information)
  http://www.cimi.org/standards/index.html
• CSDGM (Content Standard for Digital Geospatial Metadata, Federal
  Geographic Data Committee)
  http://www.fgdc.gov/metadata/contstan.html



An emerging metadata standard that provides recommendations for
interoperability among document archives is the Open Archives Metadata Set
(OAMS), which is a component of the Santa Fe Convention. The semantics of this
metadata set have purposely been kept simple in the interest of easy creation
and widest applicability. The expectation is that individual archives will maintain
metadata with more expressive semantics to allow more in-depth access
(http://www.openarchives.org/sfc/sfc_oams.htm).


In addition to the data structures used to convey metadata, there are a number of
standard resources available for determining the semantics and syntax of the
metadata content. Again, these are usually application- or data format-community
specific and include resources such as:



• AACR2 (Anglo-American Cataloging Rules)
• LCSH (Library of Congress Subject Headings)
• AAT (Art and Architecture Thesaurus)
  http://shiva.pub.getty.edu/aat_browser/
• CDWA (Categories for the Description of Works of Art)
  http://www.getty.edu/gri/standard/cdwa/
• TGN (Getty Thesaurus of Geographic Names)
  http://shiva.pub.getty.edu/tgn_browser/
• ULAN (Union List of Artist Names)
  http://shiva.pub.getty.edu/ulan_browser/




The advent of the Internet and the exponential growth in electronic resources have
increased users' demand for the ability to search across many different metadata
structures simultaneously. Access to the universe of online resources has now
become the goal of many institutions that manage information resources. This
has motivated institutions either to convert their metadata to a format more
readily accessible, or to provide a single interface to search many heterogeneous
databases at the same time. Whether the plan is to design search interfaces or
to convert data to a new standard, the first step is to analyze the information
elements in each database and correlate the discrete information fields in the
different databases that have the same or similar meaning. This is sometimes
referred to as metadata mapping or semantic mapping. Crosswalks are the
visual representations or "maps" that show these relationships. Mapping
supports the ability of a search engine to query fields with the same or similar
content in different databases; in other words, it supports "semantic
interoperability." Crosswalks are not only important for supporting the demand
for "one-stop shopping," or cross-domain searching; they are also instrumental in
converting data from one format to another that is more widely accessible [18]. The
development of crosswalks is important in that they eliminate the need for
monolithic, universally adopted standards and shift the focus to flexibility and
interoperability.
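
In code, a crosswalk reduces to a lookup table that correlates fields with the same or similar meaning. The sketch below is a hedged illustration rather than an authoritative mapping: the table is a small subset loosely modeled on published Dublin Core to MARC crosswalks, the function names and sample record are hypothetical, and MARC indicators, qualifiers, and repeatability rules are ignored.

```python
from typing import Dict, List, Tuple

MarcField = Tuple[str, str, str]  # (tag, subfield code, value)

# Illustrative subset only (an assumption for this example); a production
# crosswalk covers many more elements and handles indicators and qualifiers.
DC_TO_MARC: Dict[str, Tuple[str, str]] = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
}
MARC_TO_DC = {target: element for element, target in DC_TO_MARC.items()}


def dc_to_marc(record: Dict[str, List[str]]) -> List[MarcField]:
    """Map each Dublin Core value to its MARC tag and subfield."""
    fields: List[MarcField] = []
    for element, values in record.items():
        if element in DC_TO_MARC:
            tag, code = DC_TO_MARC[element]
            fields.extend((tag, code, value) for value in values)
    return fields


def marc_to_dc(fields: List[MarcField]) -> Tuple[Dict[str, List[str]], List[MarcField]]:
    """Map MARC fields back to Dublin Core, returning unmapped fields separately."""
    record: Dict[str, List[str]] = {}
    unmapped: List[MarcField] = []
    for tag, code, value in fields:
        element = MARC_TO_DC.get((tag, code))
        if element is None:
            unmapped.append((tag, code, value))
        else:
            record.setdefault(element, []).append(value)
    return record, unmapped


# Hypothetical record; the extra 300 field has no target in this table.
dc_record = {"title": ["Example title"], "creator": ["Example, Author"], "date": ["1925"]}
marc_fields = dc_to_marc(dc_record)
round_tripped, lost = marc_to_dc(marc_fields + [("300", "a", "12 p.")])
print(round_tripped == dc_record)
print(lost)
```

The unmapped list makes the cost of moving from a richer scheme to a simpler one visible: any field without a counterpart is either dropped or must be handled by a local rule.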


Examples of existing bi-directional crosswalks include:

• Dublin Core to MARC
    o Nordic Metadata Project
    o National Library of the Netherlands projects
    o OCLC CORC
• Dublin Core to GILS
• Dublin Core to CSDGM
• CSDGM to MARC
• GILS to MARC
• MARC to SGML


Regardless of the apparent benefits of crosswalks, some essential principles to
bear in mind when considering them include:




• Crosswalks fill a fundamental need:
    o This is especially true among the various descriptive metadata formats
      / standards. It is essential to be able to constructively migrate one to
      another
    o Granularity and specificity of content designation are crucial
    o Conversions are never perfect; there are problems with converted
      records:
        ▪ Complex vs. simple schemes: some data and content
          designation may be lost