Implementing a Backup Catalog…on a Student Budget



Setting Up a CatBackup Machine


Intended Audience: Novice


You have some SQL experience and some Perl, are comfortable with everyday Unix commands, and can use a Unix-based text editor.

Intended Installation: Academic institution.


Hardware and O.S.

Obtain a suitable PC. It doesn’t have to be very big or very fast, although build times for a larger collection will benefit from a faster machine. Our first machine ran at 450 MHz. The rack machine we presently use is a 900 MHz Pentium III. It has three drives in a RAID configuration, resulting in a “drive” of about 33 GB, similar in size to the first machine. We’ve used RedHat 7.2 and 7.3 in the past, and are now using 8.0 on our production machine.


RedHat still has RedHat Linux (currently version 9) as we know it available through April 2004. Go to http://www.redhat.com/download/products.html to download it (free). After that, I imagine that it will be available from the many mirror sites for some time to come. If you want to buy it, you’ll probably want to buy it now. Fedora is what’s replacing it; RedHat says it’s for “developers and early high-tech enthusiasts using Linux in non-critical computing environments”. I think that’s their way of saying that they’re not going to officially support it. Go to http://www.redhat.com/solutions/migration/rhl/ for details on Fedora. To download Fedora, go to http://fedora.redhat.com/download/. If you’re feeling adventurous, you could try one of the many other Linux distributions.


The instructions here address RedHat Linux 8, but should be the same, or at least similar, for other versions or distributions.


If you want to download, you’ll need a fast connection. You could do it over a phone modem, but it would take a very long time (days), if it worked at all. You’ll be downloading around two gigabytes of data in three pieces. Select “burn disk image” when you put the .iso files on your blank CDs.


If you’re new to system administration, I recommend you look at the RedHat installation guide. If you feel uncomfortable about doing some of that, having a knowledgeable person assist you might be a good idea. Security consideration: you may want to consult with someone knowledgeable to make sure that you will have an at least reasonably secure configuration.


Install Linux on your machine (start with disk 1). The install process (wizard) walks you through the steps. You’ll want to set up adequately large partitions, except for /usr/local, which should be “fill to disk” or “free space”. We have the following partitions on our catbackup machine:




    Partition      Approx. size (MB)
    /              1024
    /boot          96
    /home          1024
    /opt           1024
    /dev/shm       256
    /tmp           1024
    /usr           3072
    /var           1024
    /usr/local     free or remaining space

The amounts in the right column are approximate allocations. You can use either Disk Druid or fdisk to do this.



Software

Your installation probably includes Perl and Apache; you may want to install newer (more secure?) versions. I had Perl 5.6.1 and Apache 1.3.23 (the test bed machine is not accessible outside the university, so that’s ok; the production machine has newer versions installed).


You’ll need to make some changes to the httpd.conf file for Apache; do this as root. This file is often found in /etc/httpd/conf. If it’s not there:

    find / -name httpd.conf -print

This should show you where to find it.


Changes needed for httpd.conf:

(ServerAdmin section)

    ServerAdmin some_suitable_account@your.IP.address.here

(ServerName section)

    ServerName your.IP.address

(various places)

    Options FollowSymLinks ExecCGI

    (occurs more than once; leave the defaults as is. You may need to add ExecCGI.)

(search for this line and uncomment it)

    AddHandler cgi-script .cgi

(ScriptAlias section)

    ScriptAlias /cgi-bin/ "/var/www/cgi-bin/"

(Aliases section, add the following)

    Alias /images/ "/var/www/images/"

(DocumentRoot section)

    DocumentRoot /var/www/html


Your IP address refers to your catbackup machine. Some of these settings may already be as desired. If you will have multiple databases on your catbackup machine, you’ll have to create virtual URLs for each database/catalog. You should find templates for this in the file. I used name-based virtual hosting. (Some of the above lines will be in each virtual host definition section, and will be different for each virtual host. You still should do the above.)


For the remainder of the software, you’ll want the newest stable versions. In each case, I got the xxx.tar.gz files.


Postgres

Download Postgres from the Postgres site, http://www.postgres.org.


Install it:


(These instructions are for version 7.3.4; the commands may be the same for your version.)

    gunzip file.tar.gz
    tar -xvf file.tar

cd to the postgresql-v.v.v directory (the source distribution), which will be a subdirectory one level “down” from the one in which you unzipped and untar’d the original file (the v’s represent the version number).


Enter the following commands:

    ./configure
    gmake
    su (to root, if you’re not already)
    gmake install
    useradd SOBackup
    mkdir /usr/local/pgsql/data
    chown SOBackup /usr/local/pgsql/data
    su - SOBackup
    /usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
    /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data >logfile 2>&1 &
    /usr/local/pgsql/bin/createdb test
    /usr/local/pgsql/bin/psql test
    \q


The createdb and psql commands create a test database and log you into it; \q logs you out of the database. While you’re logged in, issue the \l (backslash ell) command to get a listing of databases. You should see a database called SOBackup. If you don’t, log out and use the createdb command above, with SOBackup in place of test, to create it.

Test things out to make sure all is ok.


Making sure Postgres starts on system bootup:


Log in as root.

    cd /etc/rc.d/init.d
    cp xxx/contrib/start-scripts/linux postgresql

where xxx is the complete path of your source distribution directory referenced above.



You’ll have to edit this postgresql file: PGUSER should be SOBackup.

    chmod 755 postgresql


To verify that Postgres starts up, reboot and then:

    ps -ef | grep postgres


As root, give the SOBackup account a password:

passwd SOBackup


Supporting Software:

Get the tar’d and gzip’d modules, following this procedure:

    go to the URL for each piece of software
    click on the distribution for the module you’re getting
    click the download button on the following page.


Get DBD (for Postgres) and DBI from the CPAN site:

    http://search.cpan.org/author/TIMB/ for the DBI module (1.34 9/03)
    http://search.cpan.org/author/DWHEELER/ for the DBD-pg module (1.22 9/03)

Get the Unicode modules from CPAN also:

    http://search.cpan.org/author/MSCHWARTZ/ for Unicode-Map (0.112 9/03)
    http://search.cpan.org/author/GAAS/ for Unicode-Map8 and Unicode-String (2.07 9/03)
    http://search.cpan.org/author/DANKOGAI/ for Jcode (0.83 9/03)
    http://search.cpan.org/author/SNOWHARE/ for Unicode-MapUTF8 (1.09 9/03)


The numbers in parentheses at the ends of the lines indicate the most current version available as of that date.

DBI and DBD enable the catbackup software to talk to the Postgres database(s). The other modules are used for the Unicode character translation (you may not need them, depending on your Voyager version and Perl version).


These modules must be installed in the specified order:

    DBI
    DBD-pg
    Jcode
    Unicode-String
    Unicode-Map8
    Unicode-Map
    Unicode-MapUTF8


The procedure is the same for each and you should be root.

gunzip and untar the module, then cd to its directory just created.

Look at the README file; make sure your version of Perl is new enough.

Issue the following commands, and scan the output from each to make sure that things are ok:



perl Makefile.PL


make


make test


make install
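The per-module procedure above can be repeated in a loop. The following is only a sketch: the SRC_DIR location, the DRY_RUN switch, and the versioned directory names (for example DBI-1.34, and DBD-Pg as the assumed name of the unpacked DBD directory) are assumptions for illustration, not part of the catbackup distribution. It stops at the first module that fails, so you can still scan the output of each step.

```shell
#!/bin/sh
# Sketch: build and install already-unpacked CPAN modules in the required order.
# SRC_DIR, DRY_RUN and the directory naming are assumptions made for this example.
SRC_DIR=${SRC_DIR:-/usr/local/src}
MODULES="DBI DBD-Pg Jcode Unicode-String Unicode-Map8 Unicode-Map Unicode-MapUTF8"

run() {
    # With DRY_RUN=1, only print the command instead of running it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

install_module() {
    # $1 is the module's unpacked source directory.
    ( cd "$1" &&
      run perl Makefile.PL &&
      run make &&
      run make test &&
      run make install )
}

install_all() {
    for m in $MODULES; do
        # Unpacked directories are expected to carry a version suffix.
        for dir in "$SRC_DIR/$m"-[0-9]*; do
            [ -d "$dir" ] || { echo "missing: $m" >&2; return 1; }
            echo "== $m"
            install_module "$dir" || return 1
        done
    done
}
```

Run install_all as root once all seven archives are unpacked under SRC_DIR; setting DRY_RUN=1 first prints the commands without executing them.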





Creating the Home Environment

Log in as user SOBackup, and in that home directory,
create the following directories:


pwd (should show /home/SOBackup, if not cd to it)


mkdir Build


mkdir control


cd Build


mkdir logs


Protect the SOBackup directories to facilitate web access:



cd ..


chmod 755 control


cd /home


chmod 755 SOBackup


Log in as root, create the following directories, and make them belong to user SOBackup:



cd /usr/local


mkdir SOB


chown SOBackup SOB


chgrp SOBackup SOB


cd SOB


mkdir indata


chown SOBackup indata


chgrp SOBackup indata


mkdir temp


chown SOBackup temp


chgrp SOBackup temp


mkdir words


chown SOBackup words


chgrp SOBackup words


Log in as SOBackup:



cd /home/SOBackup (if you’re not already there)


ln -s /usr/local/SOB/indata indata


ln -s /usr/local/SOB/temp temp


ln -s /usr/local/SOB/words words


These last three lines create soft links to these new directories. Above, you created directories in /usr/local. Creating these links to the actual directories makes it look as if these directories are in /home/SOBackup. These directories will contain a number of huge files that would not fit in the /home partition. Recall that when /usr/local was created, it got all the remaining space on the hard drive. This is where all the incoming and in-process data is located. Recall from the Postgres setup that its data also resides in the /usr/local partition.
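The directory-and-link steps above can be collected into one small function. This is a minimal sketch, and the function name and its two arguments are made up for illustration. It does not set ownership, which still has to be done as root with chown and chgrp as shown earlier.

```shell
#!/bin/sh
# Sketch: create the data directories on the large partition and link them
# into the home directory. Function and argument names are illustrative only.
link_data_dirs() {
    base=$1    # where the data really lives, e.g. /usr/local/SOB
    home=$2    # where the links should appear, e.g. /home/SOBackup
    for d in indata temp words; do
        mkdir -p "$base/$d" || return 1
        # Create the link only if nothing with that name exists yet.
        if [ ! -e "$home/$d" ] && [ ! -L "$home/$d" ]; then
            ln -s "$base/$d" "$home/$d" || return 1
        fi
    done
}
```

For the layout in this document the call would be link_data_dirs /usr/local/SOB /home/SOBackup. Running it a second time changes nothing, since existing names are left alone.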


(If you will be using multiple databases, you’ll want to keep things separate. I created subdirectories under words and indata; temp is ok as is.)


In order to facilitate SOBackup’s use of Postgres, you’ll need to add its binaries’ location to your path.

Edit .bash_profile (unless you set up a different shell) in /home/SOBackup.

You’ll find a line in there that looks something like the following, but may contain more data:



PATH=$PATH:$HOME/bin

Using this example, you should end up with the following:



PATH=$PATH:$HOME/bin:/usr/local/pgsql/bin


Note that the colon “:” separates the individual entries in the line.
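If you would rather not edit the PATH line by hand, a guarded append does the same job without adding the entry twice. This is a small sketch, not something shipped with catbackup; the function name path_append is made up for illustration.

```shell
#!/bin/sh
# Sketch: append a directory to PATH only if it is not already listed.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

path_append /usr/local/pgsql/bin
export PATH
```

Wrapping PATH in colons before matching means an entry is recognized whether it sits at the start, in the middle, or at the end of the list.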



Getting Catbackup Specific Files


There are a number of files that are unique to the catbackup implementation. These are available from http://homepages.wmich.edu/~zimmer

Go to the section for EUGM 2004. Click on catbackup.tar.gz to download it. gunzip and untar it (see earlier instructions). This process will create, off of your current directory, a directory called generic. You should print out the readme.txt file there and follow its instructions on customization. Further subdirectories contain the catbackup files, organized by category.



Files for the Build Process


Login as SOBackup



cd /home/SOBackup/Build


Put the build files in here:



buildX.script


buildX


buildnew.pl


dobuild


marc2unicode.pl


creserve.pl


split.pl


Xcallbibload.sql


dict.pl


Xindexes.sql


dept.map


where X represents the database. This is useful when more than one database is being used on your catalog backup machine. If you have only one database, you can leave the X part off. The rest of the installation refers to a single database installation. The order of the files shown above is also the order of execution.

marc2unicode.pl is presently required with Perl 5.8.0. Once you upgrade to a Voyager version that works with Unicode, this step probably will not be needed, and your build process will take appreciably less time.


buildX.script is the “cron” script that calls buildX.

buildX merely provides a convenient way to call buildnew.pl, supplying it with X.

buildnew.pl handles the marching of the database versions through several generations, and creates the current version of the database.

dobuild is the backbone of the build process, and logs all activity.

The incoming data file is gunzip’d.

marc2unicode.pl does the MARC to Unicode translation on the data file.

creserve.pl processes the course reserve and OPAC suppression information in there. Some housekeeping/clean up is done.

split.pl takes this preprocessed data and prepares it for use on the catbackup machine.

Xcallbibload.sql creates callnumber and bib tables and indexes. This runs when split.pl is done, but some overlap in processing is possible. You may need to adjust the value of the “wait” variable in the dobuild script to avoid this.

dict.pl creates a dictionary of searchable words.

Xindexes.sql builds the other tables and indexes that make up a backup catalog database.

Then “pointers” to the generations of the database are updated.

Lastly, the oldest version of the database is deleted.

dept.map contains data used for the course reserve handling. There are two fields: the first is a department abbreviation (6 characters), and the second is the full department name (remaining characters). This file is used in split.pl. You will have to populate this file with data for your institution.
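The general pattern described above, where steps run in a fixed order and every step is logged, might look like the sketch below. It is not taken from the supplied dobuild script; the LOG path, the function name run_step, and the step names in the closing comment are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: run build steps in order, log each one, stop at the first failure.
# LOG and run_step are illustrative names, not part of the catbackup files.
LOG=${LOG:-./build.log}

run_step() {
    name=$1; shift
    echo "$(date '+%Y-%m-%d %H:%M:%S') start $name" >>"$LOG"
    if "$@" >>"$LOG" 2>&1; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') done  $name" >>"$LOG"
    else
        status=$?
        echo "$(date '+%Y-%m-%d %H:%M:%S') FAILED $name (exit $status)" >>"$LOG"
        return "$status"
    fi
}

# Chaining with && preserves the order and skips later steps after a failure.
# The real scripts' arguments are not shown here, so they are omitted:
#   run_step translate perl marc2unicode.pl &&
#   run_step reserves  perl creserve.pl &&
#   run_step split     perl split.pl
```

Because both stdout and stderr of each step go to the log, a failed run leaves the error message next to the step that produced it.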


Note on creserve.pl: this program, along with the dept.map file, ensures that items suppressed in OPAC, but on course reserve, do show up in the backup catalog.
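To check a dept.map file before a build, the two-field layout (a six-character abbreviation followed by the department name) can be read back with a few lines of shell. This is a sketch under the assumption that short abbreviations are padded with spaces to six columns; the function name and the sample departments in the usage note are made up.

```shell
#!/bin/sh
# Sketch: print "abbreviation|name" for each dept.map line read from stdin,
# assuming the abbreviation occupies exactly the first six columns.
parse_dept_map() {
    while IFS= read -r line; do
        [ -n "$line" ] || continue                  # skip empty lines
        abbr=$(printf '%s\n' "$line" | cut -c1-6 | sed 's/ *$//')
        name=$(printf '%s\n' "$line" | cut -c7- | sed 's/^ *//')
        printf '%s|%s\n' "$abbr" "$name"
    done
}
```

For example, parse_dept_map <dept.map would turn a line such as "CHEM  Chemistry" into "CHEM|Chemistry", which makes misaligned entries easy to spot.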


Apply the following protections:


chmod 755 *.pl *.sql *.script dobuild


chmod 664 dept.map wmuDB


HTML File

This goes in /var/www/html.

It’s what the user sees when first interacting with your catbackup implementation. The user enters parameters and starts a search here. This file should be called index.html; alternatively, you can call it something else, or put it elsewhere, and make index.html a symlink to that other file (see the use of the ln command above). You will have to customize it for your installation, particularly the “Form action…” line, which needs a URL that references your catbackup machine.

Protect this file:

    chmod 755 index.html


For multiple databases, you’ll need separate HTML files for each database. I put them in their own directories under /var/www.


Images

Login as root.


cd /var/www/html




mkdir images


cd images


Put these files there:


search4.gif



searchagainbutton.gif


searchagainexplain.gif


refinesearchbutton.gif


refinesearchexplain.gif


go2topbutton.gif


xxxsmall_logo.gif


xxxinstitution_logo.gif


The xxxsmall_logo.gif is an optional file supplied by you. It would contain a small copy of an image associated with this catalog (the one you’re backing up on this machine). You’ll probably want a different one for each catalog if you have more than one. The xxxinstitution_logo.gif file is also optional. It typically would be an image of your institution logo.


cgi-bin

Install XSearchMe.cgi and, as root, apply the needed protection:

    cd /var/www/cgi-bin
    chmod 755 XSearchMe.cgi

Edit it, searching for http, then change the URL in those lines to point to your machine. For multiple databases, you’ll have multiple XSearchMe.cgi files here; otherwise drop the X identifier.



HOW TO FEED YOUR CATBACKUP MACHINE


On Your Voyager Machine

There are several files needed for the extract process:

extractX.script gets scheduled to run in cron. It calls doextract.

doextract does just what its name implies. You’ll need to put the read-only username and password for your database in the script.

getX.sql queries the Voyager database to get the data for this backup extract, and sorts and gzips the results.

X.transfer handles the secure FTP file transfer.

trigger.X is copied to the transfer directory. When the cron process on catbackup finds it, the build process starts up.

/m1/scratch is used as a data staging area for the extract process.
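The trigger-file handshake described above can be pictured with the sketch below. It is not the supplied buildX.script; the function name, the transfer directory in the closing comment, and the idea of deleting the trigger once it has been seen are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: start a build only when the trigger file has arrived.
# check_trigger and its arguments are hypothetical names for this example.
check_trigger() {
    dir=$1     # transfer directory on the catbackup machine
    db=$2      # database identifier, the "X" in trigger.X
    trigger="$dir/trigger.$db"
    [ -f "$trigger" ] || return 1     # nothing has arrived yet
    rm -f "$trigger"                  # consume it so the build runs only once
}

# In a cron script this might read (path and build command are placeholders):
#   check_trigger /home/SOBackup/indata wmu && ./dobuild
```

Sending the trigger only after the data file has been transferred keeps the build from starting on a partially copied extract.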



Scripts for cron

On your Voyager machine, run extractX.script from cron. On your catbackup machine, buildX.script should be scheduled in cron. It should run as SOBackup.


How to Use cron

A “cron job” is a task scheduled to run via Unix’s cron process. It takes its instructions from a crontab file. crontab -l lists the contents of your crontab file; crontab -e invokes vi (sorry!) on the file so that you can edit it. The contents of the crontab file follow a specific format:

    [minute] [hour] [day of month] [month] [day of week] task-to-run
    0-59     0-23   1-31           1-12    0-6 (starting with Sunday at 0)


Here’s the crontab entry to run the build for one of our catbackup databases:



0 3 * * 1,2,4,5 /home/SOBackup/Build/buildwmu.script >/tmp/wmubuild.log 2>&1


buildwmu.script will run at 3 AM on Monday, Tuesday, Thursday and Friday of each week. It can run on any day of the month, so we specify “*” for the day of the month, and it will run every month, thus the second “*”. The >/tmp… redirects program output that would otherwise appear on the screen (stdout) to the specified path and file, and 2>&1 sends error output (stderr) to the same place. If we experience problems, we can look at /tmp/wmubuild.log to check for error messages, etc. You should schedule your build runs via cron as user SOBackup.
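To double-check a day-of-week field such as 1,2,4,5 against the numbering above, a short helper can test whether a given day number is in the list. This is only a sketch: the function name runs_on_day is made up, and it handles simple comma-separated lists, not ranges such as 1-5 or the “*” wildcard.

```shell
#!/bin/sh
# Sketch: test whether a day-of-week number (0 = Sunday) appears in a
# comma-separated crontab day list. Ranges and "*" are not handled.
runs_on_day() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

today=$(date +%w)                      # 0-6, Sunday is 0
if runs_on_day "1,2,4,5" "$today"; then
    echo "build is scheduled today"
else
    echo "no build today"
fi
```

With the schedule used in this document, Wednesday (3) and the weekend days (0 and 6) report no build.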



USE


When the appropriate scripts have been correctly set up with cron, you’ll have a fresh backup image on your catbackup machine whenever the scripts run. In our case, we run the extract and build process on Monday, Tuesday, Thursday, and Friday. We omit Wednesday because we have a full backup of our Voyager machine running that night. The build process is finished by around 7 a.m.


Searches of the backup catalog are logged. I created logcat.pl to provide usage reports, useful for those times when we rely on the backup catalog as the catalog. The search process creates monthly log files. You tell logcat.pl which logfile to use and which days to look at. The report, as supplied, is set up for our situation with two institutions on two databases in one Voyager implementation. You’ll need to edit logcat.pl for your use. Searching for “wmu” and “kvcc” will show you which areas to change.



PERFORMANCE


Our Voyager box is a Sun UltraSPARC II with 4 processors, 4 GB of RAM, and over 80 GB of drive space at the time of this writing. The backup machine specs are listed earlier in this document.

We have somewhat over a million and a quarter bib records. The extract takes about two and a half hours. The build process takes about three and three quarter hours. Searches are very fast. Your mileage will vary.





CAVEATS:


I may have forgotten some details. You’ll have to be resourceful (like I was when I put this together and modified the basic process, a lot). If something does not work, check permissions on related files and directories. Looking at the web server log file may also be helpful. In the setup detailed above, that is /etc/httpd/logs/error_log. I will not provide support. I have a job. I might answer simple quick questions. You implement this software and anything associated with it at YOUR risk.



COMMENTS ON COST:


    Price of PC                  as much as several thousand, or possibly free if you have an available machine
    Price of Software            free
    Cost of Labor                free (built into overhead)
    Benefit of Implementation    priceless, when you need it! (and it might not cost you anything!)