Programming Language Principles


13 Dec 2013


CSC 3120 Lab
Programming Language Principles

HW #8: Intro to Regular Expressions in Perl




In this week’s lab we will use a Perl script that searches through a file for words that match certain patterns specified using regular expressions.

This is a solo assignment; you may discuss it with
your peers only in general terms.

You are not writing a script today; you are simply determining a series of regular expressions the script can use to retrieve certain information.


At the end of the lab session, you must turn in a file containing as many of the regular expressions as you have worked out during the lab period.

The final submission will just include the regular expressions and the words they extract; you do not need to return the starter file.


Understanding the Starter File




The starter file opens a file for reading with the open() function. Initially, that file is usr_share_dict_words.txt, which holds the contents of the Linux dictionary file /usr/share/dict/words.




The script then uses a do..while loop to read one line at a time from the file:

$somestring = <$fh>;
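The starter file itself is not reproduced here, so this is only a minimal sketch of the open-and-read pattern described above. To stay self-contained it opens an in-memory string (a hypothetical stand-in for usr_share_dict_words.txt) instead of the real file. Because a do..while body runs once before its condition is checked, the body has to cope with the undef that <$fh> returns at end of file.

```perl
use strict;
use warnings;

# Hypothetical stand-in for usr_share_dict_words.txt: Perl can open a
# reference to a scalar as if it were a file.
my $text = "apple\nbanana\ncherry\n";
open(my $fh, '<', \$text) or die "Cannot open input: $!";

my @lines;
my $somestring;
do {
    $somestring = <$fh>;            # read one line; undef at end of file
    if (defined $somestring) {
        chomp $somestring;          # remove the trailing newline
        push @lines, $somestring;
    }
} while (defined $somestring);
close($fh);

print scalar(@lines), " lines read\n";    # prints "3 lines read"
```

With the real dictionary you would pass the file name to open() instead of the string reference.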




To test whether a given string matches a regular expression, use the “=~” operator:

if ( $somestring =~ /regular-expression-here/ )
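As a small illustration (the sample words are made up for this sketch, not taken from the starter file), here is =~ inside an if statement; the pattern /ght/ succeeds whenever "ght" appears anywhere in the string:

```perl
use strict;
use warnings;

my @sample = qw(daylight ghost highlight sigh);   # illustrative input
my @matches;
for my $somestring (@sample) {
    # =~ applies the pattern on the right to the string on the left.
    if ($somestring =~ /ght/) {
        push @matches, $somestring;
    }
}
print "@matches\n";    # prints "daylight highlight"
```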




To make the match case-insensitive, append the flag “i”:

if ( $somestring =~ /regular-expression-here/i ) …
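A quick comparison on a made-up list shows what the flag changes. This matters if the dictionary contains capitalized entries such as proper nouns:

```perl
use strict;
use warnings;

my @sample = qw(Perl PERL pearl perl);      # illustrative input
my @exact  = grep { /perl/ }  @sample;      # case-sensitive: only "perl"
my @any    = grep { /perl/i } @sample;      # case-insensitive
print scalar(@exact), " vs ", scalar(@any), "\n";   # prints "1 vs 3"
```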




At the end, the script prints how many words matched, or “No matches found” if there were no matches.
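A minimal sketch of the counting and reporting step. Only the “No matches found” text comes from this handout; the wording of the non-zero message and the helper names count_matches and report are hypothetical.

```perl
use strict;
use warnings;

# Count how many words match a pattern built with qr//.
sub count_matches {
    my ($pattern, @words) = @_;
    my $count = 0;
    for my $word (@words) {
        $count++ if $word =~ $pattern;
    }
    return $count;
}

# End-of-run message; the "Matches found" wording is an assumption.
sub report {
    my ($count) = @_;
    return $count > 0 ? "Matches found: $count" : 'No matches found';
}

my @sample = qw(apple banana cherry);
my $found  = report(count_matches(qr/an/, @sample));
my $none   = report(count_matches(qr/zz/, @sample));
print "$found\n";    # prints "Matches found: 1"
print "$none\n";     # prints "No matches found"
```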


Using the starter file, create regular expressions in Perl to accomplish the following tasks. Your submission for this assignment is a file containing, for each problem:

a. Your regular expression(s)
b. The words you extracted


1. Find all words that contain the three-letter string “ghi”. Name your favorite three.


2. There are several words in the English language that contain the consecutive letters N-A-C-L. Name any three of them.


3. If you were asked to name a familiar word that contains two S’s, followed by another letter, and then two more S’s, you might say ASSESSOR. What common word contains two D’s, followed by another letter, and then two more D’s?
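In a regular expression, . matches exactly one character, so the ASSESSOR example can be written as /ss.ss/. The sample list is made up for illustration; swapping s for d gives the pattern for the actual question.

```perl
use strict;
use warnings;

# "." stands for any single character: two s's, one letter, two more s's.
my @sample = qw(ASSESSOR classes);
my @hits   = grep { /ss.ss/i } @sample;   # /i because ASSESSOR is uppercase
print "@hits\n";                          # prints "ASSESSOR"
```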



4. Find all words that contain the two-letter string “yz” not followed by an e or an i. Name your favorite three.
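One way to express "not followed by" is a negative lookahead, (?!...), which checks the next character without consuming it. A simpler character-class version, /yz[^ei]/, behaves differently when yz ends the word, because it demands that some character follow. The strings below are made up to show the difference.

```perl
use strict;
use warnings;

my @sample    = qw(xyza xyze xyzi xyz);         # made-up test strings
my @lookahead = grep { /yz(?![ei])/ } @sample;  # next char is not e/i, or none
my @charclass = grep { /yz[^ei]/ }    @sample;  # requires one more character
print "@lookahead | @charclass\n";              # prints "xyza xyz | xyza"
```

If lines read from the file still end in a newline, [^ei] would also match that newline, which is one more reason to chomp each line first.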


5. Are there any words in which your initials show up, not necessarily consecutively, with your first initial starting the word, your last initial ending it, and your middle initial somewhere in between? (If not, and you want to check that you wrote the regular expression correctly, try mine: LNM. That should get you 59 matches, the last of which is “lysenkoism.”)
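Using the LNM initials from the hint, a sketch of the pattern (substitute your own initials; the other sample words are just illustrations):

```perl
use strict;
use warnings;

# ^ pins the first initial to the start, $ pins the last initial to the
# end, and .* allows any letters (or none) in between.
my @sample = qw(lysenkoism lemon Lyonnaise);
my @hits   = grep { /^l.*n.*m$/i } @sample;
print "@hits\n";    # prints "lysenkoism"
```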


6. How about a word in which each of your initials shows up once, and only once (in order, but not necessarily starting or ending the word)?
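One way to require each initial exactly once is to fill every gap with a negated character class, so no other copy of an initial can appear anywhere in the word. Again using LNM as a stand-in for your own initials:

```perl
use strict;
use warnings;

# [^lnm]* means "any run of letters other than l, n, or m", so each
# initial can occur only at its own position in the pattern.
my $once = qr/^[^lnm]*l[^lnm]*n[^lnm]*m[^lnm]*$/i;

my @sample = qw(lysenkoism linoleum salmon);
my @hits   = grep { $_ =~ $once } @sample;
print "@hits\n";    # prints "lysenkoism"
```

Here linoleum fails because it has a second l, and salmon fails because m comes before n.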


7. Find all words that contain four consecutive vowels. Name your favorite three.
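A quantifier such as {4} repeats the item before it, so a vowel class followed by {4} finds four vowels in a row. This sketch assumes only a, e, i, o, and u count as vowels (not y); the x-strings are made up:

```perl
use strict;
use warnings;

my @sample = qw(xaeiox banana xaeix);
my @hits   = grep { /[aeiou]{4}/i } @sample;   # four vowels in a row
print "@hits\n";    # prints "xaeiox"
```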


8. In the old days, we used the number pad on phones without QWERTY keyboards to enter “text” messages. Most number keys on such phones stand for three or more letters, e.g., (2) ABC. Newer text-entry schemes emerged, such as T9, which stands for Text on 9 keys:

T9's objective is to make it easier to type text messages. It allows words to be entered by a single keypress for each letter, as opposed to the multi-tap approach used in the older generation of mobile phones in which several letters are associated with each key, and selecting one letter often requires multiple keypresses. It combines the groups of letters on each phone key with a fast-access dictionary of words. It looks up in the dictionary all words corresponding to the sequence of keypresses. (wikipedia.com)

What words can the phone number 666-7666 spell? Hint: 6 stands for the letters “mno” and 7 for the letters “pqrs.”
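Each digit becomes a character class listing its letters, and the ^ and $ anchors make the whole word exactly seven letters long. The sample strings below are made up, not dictionary words:

```perl
use strict;
use warnings;

# 666-7666: three of m/n/o, one of p/q/r/s, then three more of m/n/o.
my $phone = qr/^[mno]{3}[pqrs][mno]{3}$/i;

my @sample = qw(mnopmno mnopmnoo nnnnnnn);
my @hits   = grep { $_ =~ $phone } @sample;
print "@hits\n";    # prints "mnopmno"
```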


9. What words can your cell number spell? (Note: you may have to shorten your number, e.g., what words can your last four digits spell?)
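Writing the character classes by hand gets tedious for longer numbers. The sketch below builds the pattern from a digit string. The helper name digits_to_regex and the phone number are hypothetical, and the keypad table assumes the standard layout (2 = abc through 9 = wxyz, with no letters on 0 or 1).

```perl
use strict;
use warnings;

# Assumed standard keypad layout; 0 and 1 carry no letters.
my %keypad = (
    2 => 'abc', 3 => 'def', 4 => 'ghi', 5 => 'jkl',
    6 => 'mno', 7 => 'pqrs', 8 => 'tuv', 9 => 'wxyz',
);

# Hypothetical helper: turn a digit string into an anchored pattern string,
# one character class per digit. Returns undef if a digit has no letters.
sub digits_to_regex {
    my ($digits) = @_;
    my $pattern = '^';
    for my $d (split //, $digits) {
        return unless exists $keypad{$d};
        $pattern .= '[' . $keypad{$d} . ']';
    }
    return $pattern . '$';
}

my $number = '5551237666';              # made-up phone number
my $last4  = substr($number, -4);       # "7666"
my $re     = digits_to_regex($last4);
print "$re\n";                          # prints "^[pqrs][mno][mno][mno]$"
```

Inside the word-search loop the result could be used as $somestring =~ /$re/i. A number containing 0 or 1 cannot spell a word under this layout, which is one more reason to try just part of the number.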


10. Are there any words in which the same three letters at the beginning occur in reverse order at the end? For example: despised. Again, name your favorite three.
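Parentheses capture a letter, and the backreferences \1, \2, \3 require those same letters to appear again. Capturing the first three letters and then demanding them in reverse at the end gives a pattern like this one, checked against the handout's own example:

```perl
use strict;
use warnings;

# (.)(.)(.) captures the first three letters; \3\2\1 repeats them reversed.
my $mirror = qr/^(.)(.)(.).*\3\2\1$/i;

my @sample = qw(despised designed);
my @hits   = grep { $_ =~ $mirror } @sample;
print "@hits\n";    # prints "despised"
```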


11. Are there any words in which a pair of letters is repeated later in the string, followed by two repeats of the letters in reverse order? (There can be other letters in between, so the sequence will look like ….AB….AB….BA….BA.)
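The same backreference idea extends to the AB….AB….BA….BA shape. As written, the basic pattern also accepts a pair of identical letters (e.g., aaaaaaaa); if the pair should be two different letters, a negative lookahead (?!\1) can rule that out. Both versions are tried on made-up strings:

```perl
use strict;
use warnings;

# (.)(.) captures A and B; \1\2 is AB again, \2\1 is BA.
my $loose    = qr/(.)(.).*\1\2.*\2\1.*\2\1/;
# (?!\1) makes sure the second captured letter differs from the first.
my $distinct = qr/(.)(?!\1)(.).*\1\2.*\2\1.*\2\1/;

for my $s (qw(abxabybazba aaaaaaaa abab)) {
    printf "%-12s loose=%d distinct=%d\n",
        $s, ($s =~ $loose ? 1 : 0), ($s =~ $distinct ? 1 : 0);
}
```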


12. There’s a well-known old puzzle to name words that contain the vowels A, E, I, O, and U, each exactly once, in that order. What are your favorite three?
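Placing [^aeiou]* between the vowels allows any consonants there but no additional vowels, and the ^ and $ anchors stop extra vowels from slipping in at either end. This sketch treats only a, e, i, o, u as vowels; the first sample string is made up:

```perl
use strict;
use warnings;

my $vowels_in_order =
    qr/^[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*$/i;

my @sample = qw(xaxexixoxux education);
my @hits   = grep { $_ =~ $vowels_in_order } @sample;
print "@hits\n";    # prints "xaxexixoxux"
```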


Next, create Perl regular expressions that look for anagrams (rearrangements) of words, using the following reasoning:

An anagram contains the letters of the original word.

An anagram contains just those letters, with no extra ones.





Therefore, we can use two regular expressions: one to match potential anagrams, and one to rule out those that have extra letters. One example, for the word dear, would be:


my $patternOne = '^[dear]{4}$';   # one of d,e,a,r repeated 4 times
my $patternTwo = '(.).*\1';       # any char, followed by 0+ chars, and a repeat


To test whether a word is made up only of the letters in dear (and has the right length):

if ( $word =~ $patternOne ) …

To test whether a word does NOT match the second pattern (that is, has no repeated letter):

if ( $word !~ $patternTwo ) …
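Putting the two tests together for dear gives a sketch like the following; the candidate list is illustrative rather than read from the dictionary file. Both patterns are ordinary strings, which Perl compiles as regular expressions when they appear on the right of =~ or !~.

```perl
use strict;
use warnings;

my $patternOne = '^[dear]{4}$';   # one of d,e,a,r repeated 4 times
my $patternTwo = '(.).*\1';       # any char, followed by 0+ chars, and a repeat

my @candidates = qw(dear read dare deer reads ear);
my @anagrams;
for my $word (@candidates) {
    # Keep words built only from d,e,a,r that never repeat a letter.
    if ($word =~ $patternOne and $word !~ $patternTwo) {
        push @anagrams, $word;
    }
}
print "@anagrams\n";    # prints "dear read dare"
```

Here deer passes the first test but is rejected by the second (its e repeats), while reads and ear fail the length check in the first pattern.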


13. Find the anagram of kitchen. List both regular expressions and the word.


14. Find the anagram of fitness. List both regular expressions and the word. This one is harder, because the letter s is allowed to repeat but not to occur three times.
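One way to handle a letter that may appear twice (but not three times) is to split the second pattern into two alternatives: forbid repeats of the other letters, and separately forbid three copies of the doubled letter. This is shown on sleep, which has a double e, rather than on the assigned word; the candidate list is made up:

```perl
use strict;
use warnings;

my $patternOne = '^[slep]{5}$';            # five letters drawn from s,l,e,p
my $patternTwo = '([slp]).*\1|e.*e.*e';    # s/l/p repeated, or e three times

my @candidates = qw(sleep peels sleps eeeps steep);
my @anagrams   = grep { $_ =~ $patternOne and $_ !~ $patternTwo } @candidates;
print "@anagrams\n";    # prints "sleep peels"
```

With five letters drawn from s, l, e, p, at most one each of s, l, p, and at most two e's, the only possible combination is exactly s, l, p, e, e.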


Extra Credit


Use the same technique to find answers to these last few.


15. What is the only common, uncapitalized, seven-letter English word that contains just a single vowel and does not have the letter S anywhere in it? Hint: the answer starts with the letter T.
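The same two-pattern technique fits here: the first pattern keeps seven-letter lowercase words starting with t (leaving out /i so capitalized entries are excluded), and the second rules out any word with two vowels or an s. This sketch assumes only a, e, i, o, u count as vowels, and the candidate words are illustrative:

```perl
use strict;
use warnings;

my $patternOne = '^t[a-z]{6}$';          # seven lowercase letters, starting with t
my $patternTwo = '[aeiou].*[aeiou]|s';   # two or more vowels, or any s

my @candidates = qw(trumpet thrusts Tuesday tbcdafg);   # last one is made up
my @hits = grep { $_ =~ $patternOne and $_ !~ $patternTwo } @candidates;
print "@hits\n";    # prints "tbcdafg"
```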


16. When you’re writing in script, there are four letters of the alphabet that can’t be completed in one stroke: ‘i’ and ‘j’ (which require dots) and ‘t’ and ‘x’ (which require crosses). Can you find a common English word that uses each of these letters exactly once?
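A sketch in the same spirit: four simple matches confirm that i, j, t, and x all appear, and a repeat pattern like the one from the anagram problems rejects any word where one of them occurs twice. The helper name uses_each_once and the test strings are made up:

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the word contains i, j, t, and x exactly once each.
sub uses_each_once {
    my ($word) = @_;
    for my $letter (qw(i j t x)) {
        return 0 unless $word =~ /$letter/;   # each letter must be present
    }
    return $word =~ /([ijtx]).*\1/ ? 0 : 1;   # none of them may repeat
}

for my $s (qw(tixjo jinx tixjt)) {
    print "$s: ", uses_each_once($s), "\n";
}
```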




BEFORE YOU LEAVE, turn in what you have to the Blackboard link for this lab/homework.

BEFORE FRIDAY’S CLASS, turn in your completed assignment to the Blackboard link for this lab. Remember that the lab programs must follow the Programming Standard for the department.

The due date for this assignment is the start of the next Friday class.