As the Library of Congress began to deal with increasing amounts of digital content, they faced some issues:

How do they know what files they have and who they belong to?

How do they get files from where they are to where they need to be?

The Library of Congress Repository Development Center began working on a solution, a set of tools for transfer activities including:

Adding digital content to the collections (whether internal or external data)

Moving digital content between storage systems

Reviewing digital files for fixity, quality and/or authoritativeness

Inventorying and recording transfer life cycle events for digital files


Origin: It started with a simple need



Here is what Leslie Johnston (Library of Congress contributor) and John Kunze (California Digital Library co-creator) shared about the project’s origin:

Origin: It evolved naturally from that need




The name comes from the concept of “bag it and tag it”. BagIt allows for the transfer of digital files by packaging them into a digital “bag” that is accessible for the library to download.

A bag is like a folder or directory on a computer; it can hold documents, photos, movies, music, or even other folders.


Bags are composed of three main elements:

1. A bag declaration text file (like a seal of authenticity)

2. A text-file manifest (tag) listing the files in the collection

3. A subdirectory filled with the digital content


A bag can also contain an optional text file with a small amount of administrative metadata (e.g. contact info for the collection owner and a description of the collection).
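
The layout described above can be sketched in a few lines of Python. This is a minimal illustration, not one of the Library's tools: the function name `make_minimal_bag`, its parameters, and the sample metadata are hypothetical, and MD5 is simply one checksum algorithm a manifest may use. The file names (`bagit.txt`, `manifest-md5.txt`, `data/`, `bag-info.txt`) are taken from the BagIt specification rather than from these slides.

```python
import hashlib
from pathlib import Path


def make_minimal_bag(bag_dir, payload, info=None):
    """Write a small bag to disk and return its path.

    payload: dict mapping relative file names to bytes (assumed shape).
    info: optional dict of administrative metadata labels and values.
    """
    bag = Path(bag_dir)
    data = bag / "data"
    data.mkdir(parents=True, exist_ok=True)

    # 1. The bag declaration file.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # 3. The content goes under data/; 2. the manifest lists each file
    # with its checksum, one "checksum  path" entry per line.
    manifest_lines = []
    for name, content in sorted(payload.items()):
        target = data / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.md5(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}\n")
    (bag / "manifest-md5.txt").write_text("".join(manifest_lines), encoding="utf-8")

    # Optional administrative metadata as "Label: value" lines.
    if info:
        (bag / "bag-info.txt").write_text(
            "".join(f"{label}: {value}\n" for label, value in info.items()),
            encoding="utf-8",
        )
    return bag
```

Calling `make_minimal_bag("mybag", {"report.txt": b"..."}, {"Contact-Name": "Example Owner"})` would produce a folder that can be browsed like any other directory.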


Once a bag is sent, the receiving computer can analyze the manifest and run checksums on the contents; if the checksums match (i.e. the files are unchanged), the transfer is successful.


It’s that simple!
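
The receiving side's check can be sketched as follows. This is an illustrative outline only, assuming an MD5 manifest named `manifest-md5.txt` with one "checksum path" entry per line; the function name `verify_checksums` is hypothetical and is not the API of any BagIt tool.

```python
import hashlib
from pathlib import Path


def verify_checksums(bag_dir, algorithm="md5"):
    """Return the manifest paths that are missing or whose checksum differs."""
    bag = Path(bag_dir)
    failures = []
    manifest = bag / f"manifest-{algorithm}.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, rel_path = line.split(None, 1)
        rel_path = rel_path.strip()
        target = bag / rel_path
        if not target.is_file():
            failures.append(rel_path)
            continue
        # Hash the file in chunks so large files do not fill memory.
        digest = hashlib.new(algorithm)
        with target.open("rb") as handle:
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected.lower():
            failures.append(rel_path)
    return failures
```

An empty list means every listed file arrived unchanged; any entry in the list names a file that should be re-sent.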

Origin: But what is it exactly?




Working with John Kunze of the California Digital Library, Andy Boyko, Justin Littman, Liz Madden, and Brian Vargas of the Library produced a draft version of BagIt (initially referred to as the “LC Package Specification”) in December 2008.


This was posted on the LOC and California Digital Library sites and as an internet “Request for Comment” (RFC).


It was also promoted on blogs, in conference presentations, articles, etc. NDIIPP strongly encouraged partners to “bag” their content for transfer.



Through the process, project managers began learning what was still missing and where the specification needed clarification.


The team then launched a Digital Curation Google group to support the activities of this participatory community and encourage open, public discussion.


BagIt is now on version 0.97, having undergone several iterative revisions (6 drafts to date).


Evolution: Community involvement




BagIt was intended to be simple enough for users to work with directly. However, the community increasingly began to request tools to help with the use of BagIt, as well as the source code so that they could develop their own further tools.


The LOC developed three initial scripts (key utilities for the movement and validation of bagged content) and released them through SourceForge on December 18, 2008 under a BSD license (an open-source license).

These tools have been rather popular with 4,617 downloads to date (31 this week).


The Parallel Retriever: automates the retrieval of remote resources such as web pages, files on an FTP server, or files on a network drive, and then wraps them into a package that meets the BagIt specification.


The Bag Validator Script:
checks that a bag meets the standards of the specification (i.e. all files listed in
the manifest are in the data directory, there are no files in the directory not in the manifest, and there
are no duplicate entries in the manifest)


VerifyIt Script: verifies the checksums of files in a bag against the manifest each time the files are moved or copied.
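
The three conditions listed for the Bag Validator Script (every manifest entry exists in the data directory, nothing in the data directory is unlisted, and no entry is duplicated) can be sketched as below. This is not the Library's script; the function name `check_completeness` and the report keys are hypothetical, and the manifest is assumed to contain well-formed "checksum path" lines.

```python
from collections import Counter
from pathlib import Path


def check_completeness(bag_dir, manifest_name="manifest-md5.txt"):
    """Compare the manifest's file list with what is actually under data/."""
    bag = Path(bag_dir)

    # Paths named in the manifest (second field of each non-blank line).
    listed = [
        line.split(None, 1)[1].strip()
        for line in (bag / manifest_name).read_text(encoding="utf-8").splitlines()
        if line.strip()
    ]

    # Paths actually present in the payload, relative to the bag root.
    on_disk = {
        path.relative_to(bag).as_posix()
        for path in (bag / "data").rglob("*")
        if path.is_file()
    }

    counts = Counter(listed)
    return {
        "missing_from_data": sorted(set(listed) - on_disk),
        "not_in_manifest": sorted(on_disk - set(listed)),
        "duplicates": sorted(path for path, n in counts.items() if n > 1),
    }
```

A bag passes this structural check when all three lists in the returned report are empty; checksum comparison is a separate step.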


They later released the BagIt Library (BIL), a Java library to support key functionality such as creating, manipulating, validating, and verifying Bags, and reading from and writing to a number of formats.


A client-side Bagger application was also underway in 2009. Bagger is intended to provide a graphical desktop interface for the bagging of content, and ideally will require no client-side IT support or infrastructure.






Evolution: Tools



The BagIt tool set became the LOC’s first open source software release. Since then, several BagIt-specific tools have been created to simplify the process in several programming environments (it was originally designed for use with Unix utilities):



Python BagIt Library: at least two recent versions exist for this, one completed by Andrew Hankinson (https://github.com/ahankinson/bagit) and the other by Ed Summers (https://github.com/edsu/bagit). These libraries can be used to create BagIt-style packages programmatically in Python or from the command line.


Drupal: Mark Jordan developed a Drupal module for BagIt (http://drupal.org/project/bagit).


Ruby: Francesco Lazzarino at the Florida Center for Library Automation developed a Ruby adaptation for BagIt (https://github.com/tipr/bagit).


PHP: A PHP implementation of BagIt was created by Wayne Graham and Mark Jordan (https://github.com/scholarslab/BagItPHP).


RESTful Bag Storage Proposal: Chris Adams developed this draft protocol for serving BagIt repositories RESTfully (https://github.com/acdha/restful-bag-server).


Evolution: Adaptations





“Why are such transfer tools and processes so important? Transfer processes are not surprisingly linked with preservation, as the tasks performed during the transfer of files must follow a documented workflow and be recorded in order to mitigate preservation risks... While initial interest in this problem space came from the need to better manage transfers from external partners to the Library, the transfer and transport of files within the organization for the purpose of archiving, transformation, and delivery is an increasingly large part of daily operations. The digitization of an item can create one or hundreds of files, each of which might have many derivative versions, and which might reside in multiple locations simultaneously to serve different purposes. Developing tools to manage such transfer tasks reduce the number of tasks performed and tracked by humans, and automatically provides for the validation and verification of files with each transfer event.”

-- from “Releasing Open Source at the Library of Congress” by Leslie Johnston


Practicalities: Where does BagIt fit?






Bags are uncomplicated, and are therefore able to transcend differences in institutional data, data architecture, formats and practices.


Bags have built-in inventory checking (validation) to help ensure that the content is transferred unchanged and fully intact.


Unlike other packaging tools like zip or tar, BagIt does not require special software to extract the files.


Additionally, zip and tar condense all included files into a single archive file, whereas BagIt creates a logical package where files maintain their individuality and are simply stored in a traditional folder or directory container.


There is no limit to the number or type of files that can be transferred through the use of BagIt.


Bags are flexible and can work in many different settings, including situations when the content is located in many different places.


A bag’s metadata is machine readable, meaning that data can be ingested automatically.


Bags can be used over computer networks or through the use of portable storage devices.
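
To illustrate the point above about machine-readable metadata, here is a small sketch of reading a bag's optional metadata file automatically. It assumes the file holds "Label: value" lines in which an indented line continues the previous value; the function name `parse_bag_info` and the sample labels and values in the usage note are hypothetical.

```python
def parse_bag_info(text):
    """Parse metadata text into a list of (label, value) pairs.

    A list is returned rather than a dict because a label may appear
    more than once in the same file.
    """
    entries = []
    for raw in text.splitlines():
        if not raw.strip():
            continue  # ignore blank lines
        if raw[0] in " \t" and entries:
            # Indented line: fold it into the previous entry's value.
            label, value = entries[-1]
            entries[-1] = (label, value + " " + raw.strip())
        else:
            label, _, value = raw.partition(":")
            entries.append((label.strip(), value.strip()))
    return entries
```

Given the text `"Contact-Name: Jane Doe\nExternal-Description: A sample\n  collection of maps\n"`, the function returns one pair for the contact and one pair whose value is the full two-line description, ready to be loaded into an ingest system.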


Practicalities: What’s so special about BagIt?


As of 2009, a significant percentage of the 130 NDIIPP partners were already utilizing the BagIt specification in their preservation transfers to the Library.



A few of the organizations that are using BagIt include:


The University of Virginia Libraries


The Stanford Digital Repository


Archivematica


Ghent University Library


The Dryad Data Repository


The University of North Texas


Central Connecticut State University


Towards Interoperable Preservation Repositories (including the Florida Center for Library
Automation, Cornell University, and New York University)


Practicalities: Who Is Using BagIt?


The Stanford Digital Repository: Having had success using BagIt to move geospatial data from the National Geospatial Digital Archive project from Stanford to the Library of Congress, they settled on BagIt as the primary transfer format for content being deposited into their repository (ingest stage of OAIS) (http://www.dlib.org/dlib/september10/cramer/09cramer.html).



Ghent University Library: They currently use BagIt as the archival format for their digital collections. They also use it as an interchange format for the addition of new external collections (e.g. Google Books) to the local repositories (http://www.slideshare.net/hochstenbach/grep-ghent-university-repository).




The Dryad Data Repository: This repository of data underlying scientific publications is using the BagIt specification to share data and related metadata with TreeBASE, a repository of phylogenetic information (http://wiki.datadryad.org/BagIt_Handshaking).




Towards Interoperable Preservation Repositories (TIPR): A partnership between the Florida Center for Library Automation, Cornell University, and New York University to develop, test and promote a standard interchange format for exchanging information packages among OAIS-based repositories. The proposed format uses the BagIt specification to exchange package bundles via HTTP (http://wiki.fcla.edu:8000/TIPR; https://github.com/tipr/bagit/).



Practicalities: BagIt Usage Highlights




The North Carolina State Archives has provided a set of 10 thorough tutorials to explain the BagIt process. The first video includes a summary of the steps involved; the second set explains the installation process; and the third details creation and verification step-by-step: http://www.youtube.com/playlist?list=PL1763D432BE25663D&feature=plcp




The NDIIPP-funded GeoMAPP project has published a BagIt User Guide that can be found at: http://www.geomapp.net/docs/Using_BagIt_ver2_geomapp_FINAL_20110321.pdf



The Library of Congress NDIIPP Partner Tools and Services Inventory page includes a brief description of BagIt, a PDF of the latest version of the BagIt specification, links to some of the BagIt tools, and a brief video demonstrating the BagIt process: http://www.digitalpreservation.gov/partners/resources/tools/index.html#b



The Process: Tutorials

Four Steps to use BagIt

The process is as simple as 1, 2, 3, 4…

1. Prepare Files for Transfer

2. Create & Verify Bag

3. Copy & Verify Bag

4. Extract Files for Use

Image courtesy of the GeoMapp.net BagIt Guide: http://www.geomapp.net/docs/Using_BagIt_ver2_geomapp_FINAL_20110321.pdf












Prepare files for transfer

A bag must have three things: a bag declaration, a list of the content files (manifest), and the content itself

Validate content and metadata

Perform virus check (suggested)












Create and verify the bag

Attach portable drive to computer (or use shared drive)

Create a new folder to serve as the holding place for your bag

Use the “BagIt” command to create the bag on this drive

Verify the bag by using the “verifyvalid” command















Copy and verify the bag

Copy the bag to a staging area

Validate the received bag

Run virus check software on the bag











Extract files for use

Unpack the bag


Your files are now ready for use!
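
The last two steps (copy the bag to a staging area, validate it there, then unpack it) can be sketched together in Python. This is only an outline under stated assumptions: an MD5 manifest named `manifest-md5.txt`, a payload directory named `data`, and a hypothetical function name `copy_verify_extract`. It does not run a virus check and is not a substitute for the command-line tools mentioned in the slides.

```python
import hashlib
import shutil
from pathlib import Path


def copy_verify_extract(src_bag, staging_dir, dest_dir, algorithm="md5"):
    """Copy a bag to staging, verify it there, then unpack its payload.

    dest_dir must not exist yet. Raises ValueError if a listed file is
    missing or its checksum differs after the copy.
    """
    src = Path(src_bag)
    staged = Path(staging_dir) / src.name
    shutil.copytree(src, staged)  # step 3: copy to the staging area

    # Step 3, continued: re-hash every file named in the manifest.
    manifest = staged / f"manifest-{algorithm}.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(None, 1)
        rel_path = rel_path.strip()
        target = staged / rel_path
        if not target.is_file():
            raise ValueError(f"missing after copy: {rel_path}")
        # Reads the whole file into memory; acceptable for a small sketch.
        actual = hashlib.new(algorithm, target.read_bytes()).hexdigest()
        if actual != expected.lower():
            raise ValueError(f"checksum mismatch: {rel_path}")

    # Step 4: unpacking is just copying the payload directory out.
    dest = Path(dest_dir)
    shutil.copytree(staged / "data", dest)
    return sorted(path for path in dest.rglob("*") if path.is_file())
```

Because the payload is an ordinary directory, the "extract" step needs no archive tool; the files are usable as soon as the copy completes.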


Lack of information: The LOC website contains little information aside from what is included in their brief 3-minute video and short printed description. It is also hard to find much more from outside online sources. It would be useful to have further example implementations to really understand how it can be used and what the advantages are over other formats such as zip files.



Learning curve: Most of the documentation language is complicated, and would not be easy for the average person to understand. BagIt doesn’t currently have an easy-to-use GUI to make the process simple for non-technical users. Bagger may help with this, but there is little information available about the Bagger interface.


Challenges: Limiting Usage Factors







And that concludes our tour of BagIt



Any Questions?




Additional Sources

"BagIt File Packaging Format." IETF Documents. Internet Engineering Task Force, 15 Apr 2011. Web. 1 Apr 2012. <http://tools.ietf.org/html/draft-kunze-bagit-06>.

BagIt: Transferring Content for Digital Preservation. 2009. video. The Library of Congress, Washington, DC. Web. 1 Apr 2012. <http://www.digitalpreservation.gov/multimedia/videos/bagit0609.html>.

Johnston, Leslie. "Releasing Open Source at the Library of Congress." OCLC Systems & Services: International Digital Library Perspectives. 26.2 (2010): 94-102.

Johnston, Leslie, and John Kunze. "BagIt funding and versions." 29 Mar 2012. N.p., Online Posting to Digital Curation Google Group. Web. 1 Apr 2012. <http://groups.google.com/group/digital-curation/browse_thread/thread/ace8eafae819762b?pli=1>.

Lavoie, Brian. "The Open Archival Information System Reference Model: Introductory Guide." Technology Watch Report. 04-01 (2004).

Lazorchak, Butch. "From There to Here, from Here to There, Digital Content is Everywhere!" The Signal: Digital Preservation. The Library of Congress, 3 Jan 2012. Web. 1 Apr 2012. <http://blogs.loc.gov/digitalpreservation/2012/01/from-there-to-here-from-here-to-there-digital-content-is-everywhere/>.

Willett, Perry. "BagIt File Packaging Format." California Digital Library, 10 Feb 2012. Web. 1 Apr 2012. <https://wiki.ucop.edu/display/Curation/BagIt>.