Report Writing on a Budget: Using Perl

Hallett German
GTE Laboratories, Inc.
NESUG '92 Proceedings
Introduction
At last year's NESUG, Marge Scerbo presented an interesting paper showing how a few simple SAS® DATA step statements could be used to generate powerful and customizable reports.
As I read through the paper, I wondered, "Gee, I could do most of this in Perl. Or could I?" This paper is a response to that thought. The following is an outline of the paper:
1. What is Perl?
2. How can I learn more about Perl?
3. Perl Concepts
4. Basic Reports -- SAS vs. Perl
   * Input Forms
   * Reports
5. Conclusions
6. References
After reading the paper, you should have a good
overview of Perl's reporting capabilities and
hopefully be encouraged to create your own reports
with this command language.
What is Perl?
Perl was developed by Larry Wall starting in 1986. It officially stands for Practical Extraction and Report Language. [But there are those who say that, like SAS, it is a group of letters with no meaning in itself. You be the judge.]
Perl is a powerful command language that has elements of C, UNIX shells, awk, sed, and much more. The result is a self-contained, portable language. Perl is now almost a de facto standard among UNIX system administrators. [It also is used internally at the SAS Institute.]
Part of Perl's appeal is that it is distributed with source code and is freely available as GNU public software. It can be obtained via e-mail or from various anonymous ftp sites. Perl can now be found under AmigaOS, Atari OS, DOS (it runs fine under MS-Windows), Macintosh, UNIX, and VMS.
Perl contains many different elements:
-- Over 100 built-in functions
-- A rich built-in library
-- Networking capabilities
-- Database capabilities
-- C interfaces
-- A debugger
-- Report capabilities
-- Converters (awk, sed, and C header files to Perl)
Many utilities and interfaces have been built with Perl. These include interfaces to Oracle, Sybase, Curses, and X Windows.
How can I learn more about Perl?
Here are some places to look:
* A free man (help) document has over 100 pages on Perl. A formatted copy can be obtained from the anonymous ftp site chem.bu.edu.
* Various conferences give tutorials on Perl. These include USENIX, SUG (Sun), and DECUS (DEC).
* The Usenet group comp.lang.perl is a treasure trove of Perl tips. Perl's creator, Larry Wall, actively posts useful messages there.
* Once a month, a FAQ (frequently asked questions) list is posted on comp.lang.perl.
* The Wall and Schwartz book (see References) is considered the source on Perl. An advanced Perl book is planned.
* The German book (Command Language Cookbook; see References) covers Perl portability and has a healthy number of Perl references.
Perl Concepts
Before looking at our first Perl report, it is helpful to understand the following:
* Perl is case-sensitive. Its built-in statements and functions must be written in lowercase; names you create, such as filehandles and subroutines, may use uppercase.
* Perl statements must end with a semicolon. (Making SAS users feel right at home.)
* A series of statements may be processed as a block. A block is enclosed in braces, i.e. { }.
* Comments begin with a #.
* Perl supports a number of data types, each with its own unique identifier:
  - $ -- Scalar variables may contain numbers (including decimals), characters, or Boolean values (1, 0). Scalars also may hold the elements of simple and associative arrays.
    examples:
      $a = 1;              # Assigned a number
      $a = "dog";          # Assigned a string
  - @ -- Simple arrays. Can contain elements with numbers or characters. Each element is designated by a numeric key marking its position in the array, starting at 0.
    examples:
      @array1              # Entire array
      $array1[0]           # First element in array
      $array1[$#array1]    # Last element in array
  - % -- Associative arrays. Can contain elements with numbers or characters. Each element is designated by a numeric OR character key rather than by its position in the array. Associative arrays are beyond the scope of this paper.
* The following are some of the functions that are used in these examples:
  - CLOSE. Closes an open file.
  - DIE. Ends the program with an optional message, typically when some condition is met (for example, a file cannot be opened). A WARN function, which prints the message without ending the program, is also available.
  - OPEN. A powerful command. May open a file for reading (the default), writing, or both. An alias for the file (a filehandle) is assigned by the user, like SAS's libref or fileref in a LIBNAME or FILENAME statement. OPEN may also be used like SAS's LIBNAME PIPE/FILENAME PIPE statements, to read the output of an operating system command or to send output to one.
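The concepts above can be tried out in a few lines. The following is a minimal sketch rather than an example from the paper; it assumes a modern Perl 5 interpreter (hence use strict, use warnings, and my, which the paper's Perl 4-era code does not use) and writes a hypothetical scratch file, concepts.txt, which it deletes afterwards. It exercises scalars, a simple array with $#array1, and the OPEN, CLOSE, and DIE functions.

```perl
use strict;
use warnings;

# Scalars: the same variable may hold a number or a string.
my $scalar = 1;                     # assigned a number
$scalar = "dog";                    # now assigned a string

# Simple array: elements are addressed by numeric position, starting at 0.
my @array1 = ("Brown University", "Cornell", "UCLA");
my $first  = $array1[0];            # first element
my $last   = $array1[$#array1];     # last element ($#array1 is the last index, 2)

# OPEN for writing (">"); DIE ends the program and $! holds the system error.
my $file = "concepts.txt";          # hypothetical scratch file
open(OUT, ">$file") || die "Can't open $file: $!\n";
foreach my $name (@array1) {
    print OUT "$name\n";            # one element per line
}
close(OUT);

# OPEN for reading (the default mode); list context reads every line.
open(IN, $file) || die "Can't open $file: $!\n";
my @lines = <IN>;
close(IN);
unlink($file);                      # remove the scratch file

print "first=$first last=$last lines=", scalar(@lines), "\n";
```

Running it prints first=Brown University last=UCLA lines=3.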
Basic Reports -- SAS vs. Perl: Input Forms
[Do note that all examples shown are "standard Perl" and should be portable across operating systems. I created these examples on MS-DOS or a Macintosh and ran them on UNIX "as is."]
Data may be input in two different ways: interactively and non-interactively.
Interactively:
The following is a simple program that takes user input and appends it to a file (input.txt). The chop function removes the newline.
open(FILE1, ">>input.txt") || die "Can't open input.txt $!\n";   # ">>" appends
$cnt = 0;
while (1) {
    print "Enter the NAME of the University\n";
    $univ = substr(<STDIN>, 0, 21);
    chop($univ);        # drops the newline (or the 21st character of a long entry)
    print "Enter the CITY of the University\n";
    $city = substr(<STDIN>, 0, 16);
    chop($city);
    print FILE1 "$univ $city\n";
    $cnt++;
    print "Do you wish to enter another record? Y/N\n";
    $choice = substr(<STDIN>, 0, 1);
    last unless ($choice eq "Y" || $choice eq "y");
}
close(FILE1);
print "$cnt records added\n";
This approach is well suited to small databases, and a rich range of data checking is possible before each record is written.
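As an illustration of the data checking mentioned above, here is a minimal sketch of a hypothetical check_record subroutine (not part of the original program) that rejects an entry before it would be written to input.txt. The rules, and the limits of 20 characters for the name and 15 for the city, are assumptions chosen to match the substr calls in the interactive program; sample records stand in for keyboard input.

```perl
use strict;
use warnings;

# Hypothetical validator: returns "" if the record is acceptable,
# otherwise a short message. The rules and limits are assumptions.
sub check_record {
    my ($univ, $city) = @_;
    return "NAME is empty"    if $univ !~ /\S/;
    return "CITY is empty"    if $city !~ /\S/;
    return "NAME is too long" if length($univ) > 20;
    return "CITY is too long" if length($city) > 15;
    return "CITY has digits"  if $city =~ /\d/;
    return "";
}

# Sample entries stand in for keyboard input.
my @samples = (["Brown University", "Providence"],
               ["", "Ithaca"],
               ["Univ of Maryland", "Baltimore 21201"]);
my @results;
foreach my $rec (@samples) {
    my $err = check_record(@$rec);
    push(@results, $err);
    my $msg = ($err eq "") ? "OK: @$rec\n" : "REJECTED ($err): @$rec\n";
    print $msg;
}
```

In an interactive program, a call like this could sit just before the record is printed to the file, with the prompts repeated whenever the returned message is not empty.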
Non-interactively:
For smaller files, you can pre-build an array that contains the values:
@array1 = ("Brown University Providence", "Cornell Ithaca");
For larger files, compressed files or DBM files are recommended:
* Compressed (binary) files. Variable-length values are packed into fixed-length records, and unpacked again, with the pack/unpack functions. This is shown a little later in the paper. Because every record has the same length, such files can also be set up as random-access files.
* DBM files. DBM stands for Data Base Management. DBM is available in some form for all Perl interpreters except those on the Amiga and the Macintosh. DBM files are accessed through associative arrays, which are beyond the scope of this paper.
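The random-access idea can be sketched as follows. Because every packed record has the same length, record n starts at byte n times the record length, so seek can jump straight to it without reading the records before it. This minimal sketch assumes the 35-byte "A20 A15" layout of Example 2 and a hypothetical file name, ra.dat; the helper read_record is illustrative, not a built-in.

```perl
use strict;
use warnings;

my $reclen = 35;                    # assumed layout "A20 A15": 20 + 15 bytes
my $file   = "ra.dat";              # hypothetical file name

# Write three fixed-length records, one after another, with no separators.
open(OUT, ">$file") || die "Can't open $file: $!\n";
binmode(OUT);
print OUT pack("A20 A15", "Brown University", "Providence");
print OUT pack("A20 A15", "Cornell",          "Ithaca");
print OUT pack("A20 A15", "UCLA",             "Los Angeles");
close(OUT);

# Hypothetical helper: fetch record $n (counting from 0) without reading
# the records before it. Returns an empty list if there is no such record.
sub read_record {
    my ($n) = @_;
    open(IN, $file) || die "Can't open $file: $!\n";
    binmode(IN);
    seek(IN, $n * $reclen, 0) || die "Can't seek: $!\n";   # 0 = from start of file
    my $got = read(IN, my $buf, $reclen);
    close(IN);
    return () unless defined($got) && $got == $reclen;
    return unpack("A20 A15", $buf);
}

my ($univ, $loc) = read_record(2);  # third record
my @missing      = read_record(5);  # past the end of the file
print "$univ, $loc\n";
unlink($file);
```

It prints UCLA, Los Angeles; the request for record 5 returns an empty list because read finds no bytes there.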
Basic Reports -- SAS vs. Perl: Reports
Report #1 -- A Simple List
The following report should be produced:
BROWN UNIVERSITY
PROVIDENCE
CORNELL
ITHACA
UNIV OF MARYLAND
BALTIMORE
UCLA
LOS ANGELES
COLUMBIA
NYC
SYRACUSE UNIV.
SYRACUSE
To do this, the program will also: 1) split the "fields" of the "record" to appear on two lines and 2) convert the values of these fields to uppercase, regardless of their original case.
Here is the program that creates both the input
record and the report:
# Example 1 -- Standard Approach
#
########################################
# a. Create an array
#    (a "*" separates the two fields)
########################################
$fileo = "ex1.txt";                 # Set value for file
@array1 = ("Brown University*Providence",
           "Cornell*Ithaca",
           "Univ of Maryland*Baltimore",
           "UCLA*Los Angeles",
           "Columbia*NYC",
           "Syracuse Univ.*Syracuse");
########################################
# b. Open a file for writing
########################################
open(EX1, ">$fileo") || die "Can't open $fileo $!\n";
foreach $cnt (0 .. $#array1) {
    ####################################
    # c. Split the "record" into two fields
    ####################################
    ($univ, $loc) = split(/\*/, $array1[$cnt]);
    ####################################
    # d. Translate record to uppercase
    ####################################
    ($university = $univ) =~ tr/a-z/A-Z/;
    ($location = $loc) =~ tr/a-z/A-Z/;
    ####################################
    # e. Write out record
    ####################################
    print EX1 "$university\n$location\n";
}
close(EX1);                         # close file
Note that a scalar variable ($fileo) holds the file name. This allows you to change a file name IN ONE PLACE ONLY when needed.
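Example 1 converts each field to uppercase by copying it and translating the copy, in the form ($university = $univ) =~ tr/a-z/A-Z/. The parenthesized assignment yields $university itself, so tr/// changes only the copy and $univ keeps its original case. The short sketch below checks this. It also shows the uc function, an alternative from Perl 5 that the paper does not use, which returns an uppercased copy, and the fact that tr/// returns the number of characters it changed.

```perl
use strict;
use warnings;

my $univ = "Univ of Maryland";

# Copy, then translate the copy: $univ itself keeps its original case.
(my $university = $univ) =~ tr/a-z/A-Z/;

# Alternative not used in the paper: uc returns an uppercased copy.
my $also = uc($univ);

# tr/// returns the number of characters it changed.
my $changed = (my $tmp = $univ) =~ tr/a-z/A-Z/;

print "$univ -> $university ($changed letters changed)\n";
```

It prints Univ of Maryland -> UNIV OF MARYLAND (12 letters changed).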
Report #2 -- A Formatted List
A formatted list like the one below can also be created with Perl.
BROWN UNIVERSITY          PROVIDENCE
CORNELL                       ITHACA
UNIV OF MARYLAND           BALTIMORE
UCLA                     LOS ANGELES
COLUMBIA                         NYC
SYRACUSE UNIV.              SYRACUSE
Note that it would be easy to add the UNIV text as in
Marge's example.
The following part creates the binary file:
# Example 2 -- Fixed Records (Use Pack/Unpack)
# Input Part
########################################
# a. Create an array
########################################
@univs = ("Brown University", "Providence",
          "Cornell",          "Ithaca",
          "Univ of Maryland", "Baltimore",
          "UCLA",             "Los Angeles",
          "Columbia",         "NYC",
          "Syracuse Univ.",   "Syracuse");
########################################
# b. Open a file for writing
########################################
open(EX2, ">ex2.txt") || die "Can't open ex2.txt $!\n";   # exception handling
########################################
# c. Go through array
########################################
foreach $i (0 .. $#univs) {
    ####################################
    # d. If university (even position),
    #    then assign to $university.
    ####################################
    if ($i % 2 == 0) {
        $university = $univs[$i];
    }
    ####################################
    # e. If location (odd position),
    #    then assign to $location and
    #    write out the "packed" record
    ####################################
    else {
        $location = $univs[$i];
        $line = pack("A20 A15", $university, $location);
        print EX2 $line;
    }
}
close(EX2);                         # close file
This example is used to retrieve and unpack the
records from the file and create the report:
# Example 2 -- Fixed Records (Use Pack/Unpack)
# Report Part
########################################
# a. Open file and retrieve packed line
#    (the records have no newlines, so
#    the file reads as one long line)
########################################
open(EXP2, "ex2.txt") || die "Can't open ex2.txt $!\n";
$line = "";
while (<EXP2>) {
    $line .= $_;
}
close(EXP2);
########################################
# b. Loop through line:
#    Unpack each 35-byte record (20 + 15)
#    ("A" strips the trailing blanks)
#    Set fields to uppercase
#    Print line
########################################
$len = length($line);
for ($offset = 0; $offset < $len; $offset += 35) {
    $lin = substr($line, $offset, 35);
    ($univ, $loc) = unpack("A20 A15", $lin);
    $univ =~ tr/a-z/A-Z/;           # Change to uppercase
    $loc =~ tr/a-z/A-Z/;
    printf("%-20s %15s\n", $univ, $loc);   # formatted
}
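With the "A20 A15" template of Example 2, every record occupies 20 + 15 = 35 bytes, which is why the report part steps through the data 35 bytes at a time. The sketch below, using sample values from the example, makes this visible: pack pads each string with blanks to its field width (cutting off longer strings), and unpack with "A" removes the trailing blanks again. Every blank, including the one inside "Los Angeles", is shown as a dot purely for display.

```perl
use strict;
use warnings;

# Pack one record: "A20" pads the first value to 20 bytes, "A15" the second to 15.
my $rec = pack("A20 A15", "UCLA", "Los Angeles");
my $len = length($rec);                   # 20 + 15 = 35 bytes

# Display the record with every blank shown as a dot.
(my $shown = $rec) =~ tr/ /./;
print "[$shown] length=$len\n";

# Unpack: "A" removes the trailing blanks that pack added.
my ($univ, $loc) = unpack("A20 A15", $rec);
print "univ=[$univ] loc=[$loc]\n";

# A value longer than its field is cut off at the field width.
my ($cut) = unpack("A20", pack("A20", "University of Maryland at Baltimore"));
print "truncated=[$cut]\n";
```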
Report #3 -- Creating a Formatted Report Using Perl
Perl has a powerful report facility that can do much of what SAS does with PUT statements. Here is a simple example:
        University List
University           State Zip
BROWN UNIVERSITY     RI
UNIV. OF MARYLAND    MD    21201
UCLA                 CA
COLUMBIA             NY    10005
SYRACUSE UNIV        NY    13112
This is the data as stored in the input file, ex3a.txt: [Note the * used as a field delimiter.]
Brown University*ri*
Univ. of Maryland*md*21201
UCLA*CA*
Columbia*ny*10005
Syracuse Univ*ny*13112
This is the Perl script that generated it: [Note that you first create a template and then use it.]
# Example 3 -- Using Formatted Reports
# Create a header format. A line holding only a period ends a format.
format HEAD1 =
        University List
University           State Zip
.
# Define the report format. "@" starts a field and each "<" adds one
# more left-justified character; the line below the fields lists the
# variables to print in them.
format EX3B =
@<<<<<<<<<<<<<<<<<<< @<    @<<<<
$un,                 $st,  $zip
.
open(EX3A, "ex3a.txt")  || die "Can't open ex3a.txt $!\n";
open(EX3B, ">ex3b.txt") || die "Can't open ex3b.txt $!\n";
# System variables: $^ -- header format name
#                   $~ -- report format name
select(EX3B); $^ = "HEAD1"; $~ = "EX3B"; select(STDOUT);
while (<EX3A>) {
    chop;
    ($unn, $stt, $zipp) = split(/\*/, $_);    # Parse fields
    ($un = $unn) =~ tr/a-z/A-Z/;               # Set to uppercase
    ($st = $stt) =~ tr/a-z/A-Z/;
    ($zip = $zipp) =~ tr/a-z/A-Z/;
    write(EX3B);                               # Write out report
}
close(EX3A);
close(EX3B);
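One detail of Example 3 is worth a closer look: records such as "Brown University*ri*" end with an empty zip field. By default, split discards trailing empty fields, so $zipp is simply undefined for those lines and the Zip column prints blank. The sketch below shows the difference; passing a negative limit such as -1 as split's third argument keeps the empty field as an empty string, which avoids "uninitialized value" warnings when warnings are turned on.

```perl
use strict;
use warnings;

my $rec = "Brown University*ri*";      # a record with an empty zip field

my @default = split(/\*/, $rec);        # trailing empty field is discarded
my @kept    = split(/\*/, $rec, -1);    # negative limit keeps it

print scalar(@default), " fields: ", join("|", @default), "\n";
print scalar(@kept),    " fields: ", join("|", @kept),    "\n";
```

The first line reports 2 fields (Brown University|ri) and the second 3 fields (Brown University|ri|), the last one empty.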
Here is a list of report variables:
$|   0 (default): output is buffered. Nonzero: the buffer is flushed
     after every write or print on the currently selected filehandle.
$%   Current page number.
$=   Current page length. Default = 60.
$-   Number of lines left on the page available for writing.
$~   Current report format.
$^   Current header format.
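The page variables in the table can be watched at work by making a page very short. This minimal sketch uses hypothetical format names (PAGEHEAD and BODY), a hypothetical output file (pages.txt), and an assumed page length of 5 lines set through $=. The header prints the current page number $%, and because the header takes 2 of the 5 lines, Perl starts a new page (with a form feed and a fresh header) after every 3 records.

```perl
use strict;
use warnings;

our ($name, $state);                 # formats read these package variables

# Header format: $% supplies the current page number.
format PAGEHEAD =
University List                 Page @<<
$%
University           State
.

# Body format: one line per record.
format BODY =
@<<<<<<<<<<<<<<<<<<< @<
$name,               $state
.

my $file = "pages.txt";              # hypothetical output file
open(REPORT, ">$file") || die "Can't open $file: $!\n";
my $old = select(REPORT);            # $^, $~ and $= apply to the selected handle
$^ = "PAGEHEAD";                     # header format
$~ = "BODY";                         # report format
$= = 5;                              # assumed page length: 5 lines
select($old);

foreach my $rec (["BROWN UNIVERSITY", "RI"], ["UNIV. OF MARYLAND", "MD"],
                 ["UCLA", "CA"], ["COLUMBIA", "NY"], ["SYRACUSE UNIV", "NY"]) {
    ($name, $state) = @$rec;
    write(REPORT);
}
close(REPORT);

# Read the report back, show it, and collect the page numbers printed.
open(IN, $file) || die "Can't open $file: $!\n";
my $text = join("", <IN>);
close(IN);
unlink($file);
print $text;
my @pages = ($text =~ /Page (\d+)/g);
```

With the default page length of 60, the same five records would all fit under a single header.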
Many other capabilities are possible such as sorting
records, changing lines per page, and generating
footers. Unfortunately, it would take far more pages
than I have to cover that material.
Conclusions
This can only be the briefest of introductions to Perl's reporting capabilities. Perl offers a strong (and free) alternative to SAS for simple reports. The reader is encouraged to try the examples and read the suggested references. Posters in future years may discuss some of Perl's advanced reporting capabilities and how to create interactive Perl applications.
Getting in touch with me/Trademarks
Hallett German
GTE Laboratories Inc
40 Sylvan Road
Waltham, MA 02254
617-466-2290
hhg1@bunny.gte.com
SAS® and all other SAS products mentioned are registered trademarks of the SAS Institute.
References [Annotated]
Bates, Douglas. "Data Manipulation in Perl." Unpublished paper, pp. 1-6.
[Strongly recommended. Has a good section on how to use Perl to clean up data files. Some of this capability was added into the 6.07 release.]
German, Hallett. Command Language Cookbook. Van Nostrand Reinhold, 1992, pp. 247-305.
[Has plenty of Perl references and a good discussion of Perl portability.]
Scerbo, Marge. "Data Step Reporting." NESUG '91 Proceedings, 1991, pp. 60-66.
[If you want to see how to generate the same examples using SAS, look at Marge's paper.]
Wall, Larry and Randal L. Schwartz. Programming Perl. O'Reilly & Associates, 1991, pp. 1-42, 106-118.
[The Perl "bible". Also called the Camel book because of what is on the cover. A reference, tutorial, and code ideas book all in one place. Strongly recommended.]