Erin Kinney, Wyoming State Library

8 Dec 2013

Motivation


#1 priority that came out of the 2004 statewide digitization meeting


WSL received many reference questions, obituary requests and ILL requests

Digitize all newspapers published in Wyoming 1849-1922* and make them easily accessible over the internet.

*Preserved on microfilm at the Wyoming State Archives


Project


1,436 microfilm reels


850,000 full pages


8,000,000 clippings
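
For a rough sense of scale, the three figures above imply the averages computed below. This is a back-of-the-envelope sketch: the variable names are illustrative, and the averages are derived here rather than reported in the slides.

```python
# Figures taken from the "Project" slide.
REELS = 1_436
PAGES = 850_000
CLIPPINGS = 8_000_000

# Simple averages; real reels and pages will vary widely.
pages_per_reel = PAGES / REELS
clippings_per_page = CLIPPINGS / PAGES

print(f"about {pages_per_reel:.0f} pages per reel")
print(f"about {clippings_per_page:.1f} clippings per page")
```

That works out to roughly 592 pages per reel and 9.4 clippings per page.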


Funding


Applied for a 2006 NDNP grant, which was not funded.


The Wyoming State Legislature appropriated $940,000 to the State Library in FY07-08.


Requested additional funding in FY09-10, which was later denied.

The Partners

Wyoming State Library

Wyoming State Archives

University of Wyoming

American Heritage Center

Wyoming Press Association

Wyoming State Historical Society

Partners


Wyoming State Archives provided copies of master microfilm reels


Wyoming State Historical Society provided metadata workers


The company picked to do the work was PTFS, from Bethesda, MD


Why PTFS?


Expertise: people, process, software, hardware


More than ten years imaging experience


All media types, qualities, formats


Many hardware and software configurations


R&D


Development of an archiving system has helped PTFS perfect imaging capabilities

Technical Requirements


All text searchable


Content management system (CMS) with a web interface and a customizable thesaurus


Very powerful search engine

Technical Standards


Project followed 2007 NDNP best practices


High Accuracy OCR & Auto-Zoning


400 dpi grayscale


Enhanced metadata


Articles, legal/land notices, and advertisements clipped
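
To show why 400 dpi grayscale leads to large files, here is a minimal sketch of the uncompressed size of one scanned page. The page dimensions and the 8 bits per pixel are assumptions made for illustration, not values from the slides, and `raw_grayscale_bytes` is a hypothetical helper.

```python
def raw_grayscale_bytes(width_in: float, height_in: float,
                        dpi: int = 400, bits_per_pixel: int = 8) -> int:
    """Uncompressed scan size: pixel count times bytes per pixel."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Hypothetical 15 x 22 inch newspaper page (assumed, not from the slides).
size = raw_grayscale_bytes(15, 22)
print(f"{size / 1e6:.1f} MB uncompressed")
```

For this assumed page size the result is 52.8 MB. Smaller pages or compression would reduce it.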


Digitization Processes

Receive newspaper microfilm reels; inventory control

Categorize, sort, prepare

Scan microfilm at 400 dpi

Export to USB external drive

Enhance metadata

Post image processing

Data formatting for system

QC/QA

OCR images

ArchivalWare Approval Server

Zone, crop & de-skew full images

Create image/text PDFs: full page & clippings

Image Sizes

File                                  Approx. Size
1 reel strip image (~1000 pages)      50 GB
2 page up TIFF images                 50 MB
Archive TIFF images                   30 MB
Uncompressed page level PDFs          5 MB
Compressed page level PDFs            500-900 KB
Clipping PDFs (uncompressed)          100-800 KB
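
Combining these approximate sizes with the project totals (1,436 reels, 850,000 pages) gives a rough storage estimate. This is a sketch only: it assumes decimal units (1 TB = 1,000 GB), treats every file as being at the listed approximate size, and ignores backups and intermediate copies.

```python
# Project totals from the "Project" slide.
PAGES = 850_000
REELS = 1_436

# Approximate per-file sizes from the "Image Sizes" table.
ARCHIVE_TIFF_MB = 30
REEL_STRIP_GB = 50
COMPRESSED_PDF_KB = (500, 900)  # low and high end of the listed range

# Decimal units assumed: 1 TB = 1,000 GB = 1,000,000 MB = 1e9 KB.
archive_tiff_tb = PAGES * ARCHIVE_TIFF_MB / 1_000_000
reel_strip_tb = REELS * REEL_STRIP_GB / 1_000
pdf_low_tb, pdf_high_tb = (PAGES * kb / 1e9 for kb in COMPRESSED_PDF_KB)

print(f"Archive TIFFs:        {archive_tiff_tb:.1f} TB")
print(f"Reel strip images:    {reel_strip_tb:.1f} TB")
print(f"Compressed page PDFs: {pdf_low_tb:.2f}-{pdf_high_tb:.2f} TB")
```

Under these assumptions the archive TIFFs alone come to about 25.5 TB and the reel strip images to about 71.8 TB, while the compressed page PDFs stay under 1 TB.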

PTFS Confidential

Challenges


Image Quality, OCR Accuracy


Difficult to achieve high OCR accuracy


Original text quality varies: yellowed paper, bleed-through, faded print, bound page curvature


Microfilm quality varies


Dark borders, washed out sections, out of focus


Misc: Scratches, Thumbs, Tape, Staples!!


Grayscale is best, but results in large file sizes
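
One common way to put a number on OCR accuracy is character accuracy: one minus the edit distance between the OCR output and a hand-checked transcription, divided by the length of the transcription. The slides do not say how accuracy was measured on this project, so the sketch below is an illustration only; the function names and sample strings are made up.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Share of reference characters the OCR got right, floored at 0."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


# Made-up sample: the OCR read "l" as "1" and "i" as "l".
truth = "legal notice"
ocr = "1egal notlce"
print(f"{character_accuracy(truth, ocr):.1%}")
```

Checking a small hand-transcribed sample per title and decade would show where film quality hurts accuracy most.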

Challenges


Rules for zoning (for clipping) are complicated to design and execute


Newspaper formats vary widely from title to title & year to year


Determine zoning rules and follow them consistently

Challenges


“NDNP Ready”


Imaging & metadata standards, XML packets


Massive storage requirements

many, many terabytes of storage


File types: TIFF and PDF


Browse hierarchy


Determines organization of collection


Supports logical presentation


Page & clipping relationships



Solutions


Browse Hierarchy


Organized by county/city, then newspaper title, year, month, date


Pages and clippings will be presented together


Page & clipping relationship


Clippings linked to pages


Archive quality image location


Archive quality images transported via USB external drive and backed up to tape (twice!)
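
The browse hierarchy and the page-to-clipping link described above could be modelled roughly as follows. This is a sketch under assumptions: the class names, field names and sample values are hypothetical and do not reflect how ArchivalWare stores its records.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Page:
    place: str        # county or city
    title: str        # newspaper title
    issue_date: date
    page_number: int

    def browse_path(self) -> tuple:
        """County/city, then title, year, month, date."""
        d = self.issue_date
        return (self.place, self.title,
                f"{d.year:04d}", f"{d.month:02d}", f"{d.day:02d}")


@dataclass(frozen=True)
class Clipping:
    page: Page        # every clipping links back to its source page
    kind: str         # e.g. "article", "legal notice", "advertisement"
    sequence: int


# Hypothetical sample data.
page = Page("Example County", "Example Gazette", date(1905, 3, 14), 1)
clippings = [Clipping(page, "article", 1), Clipping(page, "advertisement", 2)]

# Group clippings by page so both can be presented together.
clippings_by_page = defaultdict(list)
for clip in clippings:
    clippings_by_page[clip.page].append(clip)

print("/".join(page.browse_path()))
```

Keeping the page reference on each clipping means a search hit on a clipping can always lead back to the full page it came from.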


Lessons Learned


Lots of open communication between all partners, contractor and sub-contractors


Start looking for money early, but make sure you have your ducks in a row. Don’t get discouraged.


Lessons Learned


Think of the long-term implications of decisions made at the beginning of the project


Decisions made at the beginning of the project can have unforeseen, and often huge, implications.

Opportunities


Fill in gaps in coverage


Orphan papers: publishers and even towns that no longer exist


“New to us” newspaper titles that haven’t been located and microfilmed yet

Wyoming Newspaper Project
Contact


Erin Kinney, Digital Initiatives Librarian

erin.kinney@wyo.gov

http://wyonewspapers.org