cs421 Theory

Yoshii

Week 2 Thursday
========================================
NEW TOPICS:
/More on Languages
/Grammars
/Formal Notation of a Grammar
/L(G)
/Review
========
We have reviewed mathematical induction, which we will use to
prove various theorems (usually based on string lengths).
We have also reviewed Proof by Contradiction.
We have also learned what it means
to generate sentences and recognize sentences.
The language L is a subset of E^*.
We want a machine M such that:

given x which is a member of L,
then M(x) will return yes.

given y which is not a member of L,
then M(y) will return no.
This machine M is a Recognizer.
##Inter1* L is Recursively Enumerable if the recognizer is ???
##Inter2* L is Recursive if the recognizer is ???
/More on Languages
==================
Here are some more examples of languages.
Alphabet E = {a,b}
L1 = {a^n b^n | n >= 0 }
an infinite language; a's followed by the same number of b's
/\ (the empty string) is part of this language
L2 = {a^n b a^n | n >= 0}
an infinite language; b surrounded by the same number of a's before
and after. There may be no a's.
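To connect these two examples to the idea of a Recognizer M, here is a minimal Python sketch. The function names recognize_L1 and recognize_L2 are made up for this illustration; each one returns True for "yes" and False for "no". This is only one possible way to write such a machine.

```python
def recognize_L1(x: str) -> bool:
    """Return True iff x is in L1 = {a^n b^n | n >= 0}."""
    n = len(x) // 2
    # x must be exactly n a's followed by n b's (n = 0 gives the empty string).
    return x == "a" * n + "b" * n


def recognize_L2(x: str) -> bool:
    """Return True iff x is in L2 = {a^n b a^n | n >= 0}."""
    n = (len(x) - 1) // 2
    # x must be n a's, then one b, then n a's (n = 0 gives just "b").
    return x == "a" * n + "b" + "a" * n


print(recognize_L1(""), recognize_L1("aabb"), recognize_L1("aab"))
print(recognize_L2("b"), recognize_L2("aabaa"), recognize_L2("abaa"))
```

Each line prints True True False: the first two strings are members and the third is not.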
##Inter3* Describe in English what this language is
## L = {a^n b a^m | n = 2m and n >= 0, m >= 0 }
L^R is the set of all L strings reversed = {w^R | w is in L }
Example: L = {a^n b^n | n >= 0 }
L^R = {b^n a^n | n >= 0 }
L1L2 is the concatenation of 2 languages i.e. its strings are concatenations of
strings from L1 with those from L2 = {xy | x is in L1 and y is in L2}
LL is L concatenated with itself = {xy | x is in L and y is in L}
L^n is L concatenated with itself n times. L^0 is { /\ }
L^* is the union of all L^n for n >= 0 i.e. L^0 U L U LL U LLL U LLLL U ......
Example: L = {a, b}
L^2 = LL = {aa, ab, ba, bb}
Example: L = {a^n b^n | n >= 0 }
L^2 = LL = {a^n b^n a^m b^m | n >= 0, m >= 0 }
##Inter4* Why do we have to have m and not just n in L^2 ??
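For finite languages, the operations in this section can be tried out directly. The following is a small Python sketch that represents a language as a set of strings; the function names concat, power and reverse are chosen here for illustration only, and "" stands for the empty string /\.

```python
from itertools import product


def concat(L1: set[str], L2: set[str]) -> set[str]:
    """L1L2 = {xy | x is in L1 and y is in L2}."""
    return {x + y for x, y in product(L1, L2)}


def power(L: set[str], n: int) -> set[str]:
    """L^n: L concatenated with itself n times; L^0 = {""}."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result


def reverse(L: set[str]) -> set[str]:
    """L^R = {w^R | w is in L}."""
    return {w[::-1] for w in L}


L = {"a", "b"}
print(sorted(power(L, 2)))              # ['aa', 'ab', 'ba', 'bb']
print(sorted(power(L, 0)))              # ['']
print(sorted(reverse({"ab", "aabb"})))  # ['ba', 'bbaa']
```

L^* itself is infinite whenever L contains a non-empty string, so a program like this can only build it up to some chosen n.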
/GRAMMARS
=========
The concept of a grammar was originally formalized by linguists
in their study of natural languages.

we want to define what is or is not a valid sentence of a language.

we want to provide structural descriptions of the sentences.
We still do not have a definitive grammar for English that
a computer can use for natural language processing.

what constitutes a valid English sentence is not clear.

we cannot interpret/parse an English sentence unambiguously without using
semantic information.
But we can write grammar rules for a subset of English sentences.
Each grammar rule has the form LHS -> RHS.
Below, things in <> are non-terminals. (syntactic part)
Things not in <> are terminals. (actual words)
English Subset Example Grammar:
1 <sentence> -> <np> <vp>
2 <np> -> <adj> <np>
3 <np> -> <adj> <singular noun>
4 <vp> -> <singular verb> <adverb>
5 <adj> -> The
6 <adj> -> little
7 <singular noun> -> boy
8 <singular noun> -> girl
9 <singular verb> -> ran
10 <adverb> -> quickly
Here, alternative rules for the same left side are separately listed.
However, one may use | to mean OR to list all alternatives on the
right side separated by |'s.
Example: <singular noun> -> boy | girl
In a parse tree, <> will be the internal nodes (non-terminals)
and actual words will be the leaves (terminals).
**************************************************************
If you can construct a parse tree for a sentence using
the grammar, then the sentence is in the language.
**************************************************************
##Inter5* Given "The little boy ran quickly",
construct a parse tree. Start with the terminals and work your
way up toward <sentence>.
Each time you use a rule to get a new non-terminal node,
write its number next to the non-terminal node.
##Inter6* Generate another sentence by starting with
<sentence> and repeatedly replacing each non-terminal by the right side
of one of its rules until you reach the terminals.
Each time you use a rule to expand a non-terminal,
write its number next to the non-terminal node.
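Generating a sentence from the numbered rules can also be done mechanically. The Python sketch below is one hypothetical way to do it: the dictionary RULES copies the ten rules of the English subset grammar, and the function generate (a name made up here) applies a given list of rule numbers, each time replacing the leftmost occurrence of the rule's left side.

```python
# The ten numbered rules of the English subset grammar: number -> (LHS, RHS).
RULES = {
    1: ("<sentence>", ["<np>", "<vp>"]),
    2: ("<np>", ["<adj>", "<np>"]),
    3: ("<np>", ["<adj>", "<singular noun>"]),
    4: ("<vp>", ["<singular verb>", "<adverb>"]),
    5: ("<adj>", ["The"]),
    6: ("<adj>", ["little"]),
    7: ("<singular noun>", ["boy"]),
    8: ("<singular noun>", ["girl"]),
    9: ("<singular verb>", ["ran"]),
    10: ("<adverb>", ["quickly"]),
}


def generate(rule_numbers):
    """Apply each numbered rule to the leftmost occurrence of its left side."""
    form = ["<sentence>"]
    for number in rule_numbers:
        lhs, rhs = RULES[number]
        i = form.index(lhs)   # raises ValueError if the rule cannot be applied
        form[i:i + 1] = rhs   # replace the left side by the right side
        print(number, " ".join(form))
    return " ".join(form)


sentence = generate([1, 3, 5, 8, 4, 9, 10])
```

Each printed line shows the rule number used and the result of that step; the last line printed is "10 The girl ran quickly".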
Thus, the complete grammar for English should be able to
parse ALL and ONLY the grammatical English sentences and
should be able to generate ALL and ONLY the grammatical English sentences.
This complete set of rules does not exist.
However, for a programming language, this complete set of rules does exist.
Let's look at the grammar notation we will use for programming languages.
/Formal Notation of a Grammar
=============================
G = (V, T, S, P)
V = a finite set of non-terminals (variables)
We will use capital letters such as A, B and C for variables
instead of < >.
T = a finite set of terminal symbols.
We will use lower case letters such as a, b, and c
i.e. strings/tokens of the language are made up of these symbols.
P = a finite set of production rules (i.e. grammar rules)
(we will number the rules P1, P2, P3, etc.)
S = the start symbol of G (it is a member of V)
Example: G = ({S, A, B}, {a,b}, S, P) (P rules are not listed here)
N = V U T (all symbols of the grammar, non-terminals and terminals)
e.g. N = {S,A,B,a,b}
N^* = the set of strings made up of terminal and non-terminal symbols
e.g. {Sab, aAb, etc.}
Each P rule is of the form: LHS -> RHS
where LHS is a member of N^* V N^* (at least one non-terminal)
RHS is a member of N^* (mixture of terminals and non-terminals)
Read "->" as "goes to"
Production rules are used to derive the right side from the
left side.
Example grammar rules:
S -> A B (all non-terminals)
A -> a (non-terminal goes to a terminal)
aBa -> aba (there are terminals on the left side)
S -> aSb (the same non-terminal is repeated on the right side)
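The 4-tuple G = (V, T, S, P) and the condition on the form of each rule can be written down as data. The following Python sketch is only an illustration: the class name Grammar and the function is_valid_rule are invented here, and every symbol is assumed to be a single character.

```python
from typing import NamedTuple


class Grammar(NamedTuple):
    V: frozenset   # non-terminals (variables)
    T: frozenset   # terminal symbols
    S: str         # start symbol, a member of V
    P: tuple       # production rules as (LHS, RHS) pairs of strings


def is_valid_rule(g: Grammar, lhs: str, rhs: str) -> bool:
    """LHS must be in N^* V N^* and RHS in N^*, where N = V U T."""
    N = g.V | g.T
    all_in_N = all(sym in N for sym in lhs + rhs)
    has_nonterminal = any(sym in g.V for sym in lhs)
    return all_in_N and has_nonterminal


G = Grammar(
    V=frozenset("SAB"),
    T=frozenset("ab"),
    S="S",
    P=(("S", "AB"), ("A", "a"), ("aBa", "aba"), ("S", "aSb")),
)

for lhs, rhs in G.P:
    print(lhs, "->", rhs, is_valid_rule(G, lhs, rhs))
print(is_valid_rule(G, "ab", "S"))  # False: no non-terminal on the left side
```

All four example rules print True. The last check fails because its left side "ab" contains only terminals.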
***************************************************************************
LHS derives RHS: In each application of a rule, the left side is replaced by
the right side. e.g. S is replaced by A B
In other words, S expands into A B
***************************************************************************
*****************************************************************
Derivation path: A path of derivations from S to a string in L
by applying production rules.
*****************************************************************
/Derivations and Sentential Forms
=================================
A derivation path is a horizontal representation of a parse tree.
Sentential forms are in N^* (made up of non-terminals and terminals)
=> relates two sentential forms if the
second can be obtained from the first
by the application of a single production.
x => y (x directly derives y)
=*=> relates two sentential forms if the second can be obtained from the first
by the application of 0 or more productions.
x =*=> y (x derives y in 0 or more steps)
##Inter7* what do you think =+=> means?
Derivation Example:
G = ({S}, {a,b}, S, P)
P is
1 S -> aSb
2 S -> /\
Each production rule “expands” S.
A derivation path S => aSb => aaSbb => aabb i.e. S =*=> aabb
The first two => are from applying rule 1
The last => is from applying rule 2.
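The relation => can be checked by a program. The Python sketch below is a rough illustration for this particular P: direct_derivations (a made-up name) returns every y with x => y, and is_derivation_path checks that each sentential form in a list directly derives the next one. The empty string /\ is written as "".

```python
RULES = [("S", "aSb"), ("S", "")]  # rule 1 and rule 2; "" is the empty string


def direct_derivations(x: str) -> set[str]:
    """Return every y such that x => y (one application of one rule)."""
    results = set()
    for lhs, rhs in RULES:
        start = x.find(lhs)
        while start != -1:
            # Replace this occurrence of the left side by the right side.
            results.add(x[:start] + rhs + x[start + len(lhs):])
            start = x.find(lhs, start + 1)
    return results


def is_derivation_path(path: list[str]) -> bool:
    """True if each form in the path directly derives the next one."""
    return all(y in direct_derivations(x) for x, y in zip(path, path[1:]))


print(sorted(direct_derivations("aSb")))                  # ['aaSbb', 'ab']
print(is_derivation_path(["S", "aSb", "aaSbb", "aabb"]))  # True
print(is_derivation_path(["S", "aabb"]))                  # False
```

The last call is False because S => aabb does not hold in a single step, even though S =*=> aabb does.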
In deriving a string,
many derivation steps may be involved.
S => x1 => x2 => x3 => ....... => w
S, x1, x2, x3, ....., w are all called sentential forms and may
consist of both non-terminals and terminals.
##Inter8* Give the derivation path for S =*=> aaabbb using the P above
/Some Useful Techniques in Writing Grammar Rules
================================================
A -> a A is a recursive rule and each time it is applied
one more a is created.
A -> a will take A to one a so that you can take
a a a a A => a a a a a
A -> /\ will take A to an empty string so that you can take
a a a a A => a a a a
S -> a S a is a recursive rule and each time it is applied
two more a's are generated
S => a S a => a a S a a => a a a S a a a
Thus you will get an even number of a's.
S -> /\ will take S to an empty string so that you can take
a a a S a a a => a a a a a a
S -> a S b thus will generate one a and one b each time
Note that all a's will be before b's
Thus, you will get the same number of a's and b's.
But
S -> A B
A -> a A
B -> b B
A -> /\
B -> /\
will expand A and B independently.
So, you can get a different number of a's and b's.
S -> a C b C the two different C's are expanded independently
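The difference between S -> a S b and the independent S -> A B rules can be seen by listing the short strings each grammar generates. The Python sketch below is an illustration only: language_up_to is a hypothetical helper, it assumes upper-case letters are non-terminals and lower-case letters are terminals, it assumes every rule has a single non-terminal on the left, and it assumes each recursive rule adds at least one terminal (otherwise the search would not stop).

```python
from collections import deque


def language_up_to(rules: dict[str, list[str]], start: str, max_len: int) -> set[str]:
    """Collect all terminal strings of length <= max_len derivable from start."""
    found, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        # Find the leftmost non-terminal, if any.
        i = next((k for k, c in enumerate(form) if c.isupper()), None)
        if i is None:
            found.add(form)  # only terminals are left
            continue
        for rhs in rules[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            terminals = sum(c.islower() for c in new)
            # Drop forms that already have too many terminals.
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found


same = language_up_to({"S": ["aSb", ""]}, "S", 4)
independent = language_up_to({"S": ["AB"], "A": ["aA", ""], "B": ["bB", ""]}, "S", 2)
print(sorted(same, key=lambda w: (len(w), w)))         # ['', 'ab', 'aabb']
print(sorted(independent, key=lambda w: (len(w), w)))  # ['', 'a', 'b', 'aa', 'ab', 'bb']
```

The first grammar only produces strings with equal numbers of a's and b's, while the second also produces strings such as "a" and "bb". In both, all a's come before the b's.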
/Grammar Examples
==================
S -> a A (generates one a and goes to A)
A -> a A | a (generates 1 or more a's)
##Inter9: Give a set former description for the language.
S -> a S a | a (generates 2 a's from opposite ends and ends with a)
##Inter10: Give a set former description for the language.
S -> a S b | /\ (generates a and b from opposite ends and ends with /\)
##Inter11: Give a set former description for the language.
S -> aa A bb B (A and B are expanded independently)
A -> aa A | /\
B -> bb B | /\
##Inter12: Give a set former description for the language.
/Definition of L(G)
====================
L(G)
= the language GENERATED by the grammar G
= {w | w is in T^* and S =*=> w}
i.e. w is made up of terminals
and S derives w.
To prove that L' is the language generated by G,
1. Prove that for every w in L', S =*=> w (G can generate L')
2. Prove that every x, where S =*=> x, is in L' (G generates only L')
End.