Putting it all

tansygoobertownInternet and Web Development

Dec 8, 2013 (3 years and 8 months ago)

69 views

Putting it all
together for
Digital

Assets

Jon Morley

Beck Locey

LDS Church History Library


Our collections consist of manuscripts, books, Church records,
photographs, oral histories
,
architectural drawings, pamphlets,
newspapers, periodicals, maps, microforms, and audiovisual
materials. The collection continues to grow annually and is a
prime resource for the study of Church history. Our
collections
contains approximately:


270,000 books, pamphlets, magazines, and newspapers


240,000 collections of original, unpublished records (journals,
diaries, correspondence, minutes, etc.)


3.5 million
blessings
for Church members


13,000 photograph collections


23,000 audiovisual items


Digitization Objective


Patrons



Globalization


To provide
access to library content
to 14 million church
members and the public throughout
worldwide. Remote sites for new content.


Internal Operations



To create an
automated
digital pipeline

from digitizing
content to patron consumption.

Extending to crowdsourcing in future.

Agenda


The first half of this presentation
will focus on “what” we want to
accomplish with an automated
digital pipeline.



The second half will focused on
“how” we built the digital pipeline.

DIGITAL PIPELINE

What we want to accomplish

Digital Content

Aleph

(Master
Record)

Primo

(Discovery)

Rosetta

EAD Tool

(Encoded
Archival

Description)

Church

History

Library

Physical Assets

2010

2011

Rosetta Dual Role

Two instances of Rosetta

1.

DRPS


Digital Records Preservation
System (dark archive)


2.
DCMS


Digital Content Management
System (public display)




Digital Pipeline

Rosetta

EAD Tool

(Encoded
Archival

Description)

Aleph

(Master
Record)

Primo

Ingest Collection

PID

555 tag

Harvest

& Index

855 tag

EAD Tool
Interface

Click on
Thumbnail

Patron can a item to be
digitized.


Organizing and
displaying

Collections

EAD Tool

(
Encoded
Archival
Description)


A finding aid that adds a viewable structure to a collection.






-
Many
large, complex
collections

-
Aleph stores only the
collection

level descriptive
metadata

-
Add
/ edit / delete
component

level
metadata

-
Rosetta 3.0 doesn’t have enough functionality for our
collections.



EAD Tool
-

Staff View



EDIT

Drag and drop

XML or CSV files

Challenges


Configuration is fairly complex within and between systems


Large Batch Ingest has been difficult


Re
-
ingesting content into Rosetta
has been
problematic
(preservation system)


Collections Management


Built
EAD tool
to manage collections


Serials?


PDF
Content


Progressive PDF
download in Adobe Reader X
is not fully
supported
until Rosetta v3



2 searches to see text within a PDF


Successes


Ingested over 57,000 Family history books ranging
from 1 MB to 800 MB for the Family History
department.
(books.familysearch.org)


Ingested 700,000 files. Most are linked to
collections for Church History library.


Consistent user experiences with large files


Good response times


75,000 views / Month


Once the pipelines are established, they work well.


Able to create customizable solutions for multiple
institutions.


Under Development


Restricted content


Based on IP addresses


authentication


Multiple viewing experiences (Responsive
Design)


Reporting capabilities


Monitor usage

DIGITAL PIPELINE

How we glued it all together

Digital Pipeline

Aleph

(Master
Record)

Primo

Rosetta

EAD Tool

(Encoded
Archival

Description)

Church

History

Library

Catalog in Aleph


Physical assets are
cataloged in Aleph
(collection level).



Staff assigns a call
number and Aleph
assigns a BIB number.



While browsing, a patron
requests that the asset be
digitized.

Aleph

Call #: MS 2877 4

BIB #: 000114027

Digitized into Rosetta

Rosetta


The asset(s) from the
Aleph collection are
digitized.


A custom tool (SIP tool)
queries the Aleph SRU
server using BIB # to get
collection level metadata.


Digitized assets, BIB #
(CMS ID) and metadata
are ingested into Rosetta.
PIDs get assigned.

Aleph SRU

BIB #: 000114027

Call #: MS 2877 4

Title: Parley P. Pratt…

000114027

MS 2877 4

Parley P. Pratt…

+

Query

Response

Rosetta to Aleph / EAD Tool


Every night a custom script
(
cron

job) queries Rosetta
using the OAI harvester.



The PIDs and item titles from
Rosetta are inserted into the
EAD Tool or prepared for
Aleph (856 tags).


Every night a custom script
(
job_list
) ingests the metadata
into Aleph.

Rosetta OAI

Aleph

Query

Response

Dublin

Core

EAD Tool

From Aleph to Primo


Every night Aleph publishes
all data sets to Primo (publish
-
06 in
job_list
).



Aleph data sets include
collection level metadata plus
856 tags (Rosetta PID) or 555
tags (EAD link).


During harvest, Primo creates
links from 555 or 856 tags
which point to the EAD Tool
and Rosetta respectively.

Publish

MARC

XML

Aleph

Primo

Digital Pipeline Notes


Aleph BIB # provides the linkage between
the various systems.


Call # provides the reference


Linux scripting glues it all together.


Cron

jobs,
job_list
,
wget

and custom logs work
together to get data, re
-
format data, move data
and start new jobs.


NFS shares allow us to move data around
easily. Rights are a bit of a hassle.


Timing matters.


Pull from Rosetta, update EAD Tool, send to
Aleph, publish to Primo, and run Primo pipes.

Backup Slides

Rosetta OAI Harvester


Linux script using
wget

requests
Rosetta data from last 24 hours


Query: by publication set with from / until


Response: Dublin Core

Wget

http://<host>.<domain>.org/oaiprovider/request?verb=ListRecords&
metadataPrefix
=
oai_dc&set
=<pubset>&from=2012
-
08
-
20T20:00:00Z&

until=2012
-
08
-
21T19:59:59Z


Rosetta OAI

Query

Response

Linux script

Aleph SRU Server


Aleph SRU server response to Rosetta


Query: by BIB # (CMIS ID)


Response: Dublin Core (or MARC XML)

Query

Response

Rosetta

Aleph SRU

Aleph SRU Server URL


http://<your
-
aleph
-
host>.<your
-
domain>.org:5661/
<xyz01>?version=1.1&operation=searchRetrieve...


…&query=
dc.callno
=“MS 318”&
maximumRecords
=1


…&query=rec.id=000082419&maximumRecords=1


…&query=
dc.title
=“Book of
Mormon”&
maximumRecords
=5


…&query=
dc.subject
=
Apostles&maximumRecords
=10


…&query=
dc.creator
=“John
Taylor”&
maximumRecords
=10