cs421 Theory - Yoshii - Week 2 Thursday
========================================

NEW TOPICS:
/More on Languages
/Grammars
/Formal Notation of a Grammar
/L(G)


Oct 25, 2013 (3 years and 7 months ago)





/Review

========


We have reviewed mathematical induction, which we will use to
prove various theorems (usually by induction on string length).

We have also reviewed Proof by Contradiction.

We have also learned what it means
to generate sentences and to recognize sentences.


The language L is a subset of E^*, where E is the alphabet.
We want a machine M such that:

- given x which is a member of L,
  then M(x) will return yes.

- given y which is not a member of L,
  then M(y) will return no.

This machine M is a Recognizer.


##Inter1* L is Recursively Enumerable if the recognizer is ???


##Inter2* L is Recursive if the recognizer is ???


/More on Languages

==================


Here are some more examples of languages.


Alphabet E = {a,b}



L1 = {a^n b^n | n >= 0 }
     an infinite language; a's followed by the same number of b's.
     /\ (the empty string, the case n = 0) is part of this language.

L2 = {a^n b a^n | n >= 0}
     an infinite language; b surrounded by the same number of a's before
     and after. There may be no a's.
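The recognizer idea from the Review can be tried out on these two languages. The following is only a minimal sketch in Python; the function names in_L1 and in_L2 are made up for illustration, and there are many other ways to test membership.

```python
def in_L1(w: str) -> bool:
    """Return True if w is in L1 = {a^n b^n | n >= 0}."""
    n = len(w) // 2
    # w must be exactly n a's followed by n b's
    return w == "a" * n + "b" * n


def in_L2(w: str) -> bool:
    """Return True if w is in L2 = {a^n b a^n | n >= 0}."""
    n = (len(w) - 1) // 2
    # w must be n a's, then one b, then n a's
    return w == "a" * n + "b" + "a" * n


print(in_L1(""), in_L1("aabb"), in_L1("aab"))
print(in_L2("b"), in_L2("aabaa"), in_L2("ab"))
```

Each function answers yes (True) for members and no (False) for non-members, which is what we asked of the machine M.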


##Inter3* Describe in English what this language is
##        L = {a^n b a^m | n = 2m and n >= 0, m >= 0 }



L^R is a set of all L strings reversed = {w^R | w is in L }


Example: L = {a^n b^n | n >=0 }


L^R = {b^n a^n | n >=0 }


L1L2 is the concatenation of 2 languages, i.e. its strings are concatenations of
strings from L1 with those from L2 = {xy | x is in L1 and y is in L2}

LL is L concatenated with itself = {xy | x is in L and y is in L}
L^n is L concatenated with itself n times. L^0 is { /\ }
L^* is the union of all L^n for n >= 0, i.e. { /\ } U L U LL U LLL U LLLL U ......


Example: L = {a, b}


L^2 = LL = {aa, ab, ba, bb}


Example: L = {a^n b^n | n >=0 }


L^2 = LL = {a^n b^n a^m b^m | n >=0 m>=0 }


##Inter4* Why do we have to have m and not just n in L^2 ??
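For finite languages, the operations above can be computed directly. This is a small sketch using Python sets of strings; the function names are hypothetical, and star_up_to only builds a finite piece of L^* because L^* itself is usually infinite.

```python
def reverse(L):
    """L^R = {w^R | w is in L}."""
    return {w[::-1] for w in L}


def concat(L1, L2):
    """L1L2 = {xy | x is in L1 and y is in L2}."""
    return {x + y for x in L1 for y in L2}


def power(L, n):
    """L^n, starting from L^0 = { empty string }."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result


def star_up_to(L, k):
    """Finite approximation of L^*: L^0 U L^1 U ... U L^k."""
    out = set()
    for n in range(k + 1):
        out |= power(L, n)
    return out


L = {"a", "b"}
print(sorted(power(L, 2)))        # ['aa', 'ab', 'ba', 'bb']
print(sorted(reverse({"ab", "aabb"})))
print(sorted(star_up_to(L, 2)))
```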



/GRAMMARS

=========


The concept of a grammar was originally formalized by linguists
in their study of natural languages.
- we want to define what is or is not a valid sentence of a language.
- we want to provide structural descriptions of the sentences.

We still do not have a definitive grammar for English that
a computer can use for natural language processing.
- what constitutes a valid English sentence is not clear.
- we cannot interpret/parse an English sentence unambiguously without using
  semantic information.

But we can write grammar rules for a subset of English sentences.
Each grammar rule has the form LHS -> RHS.

Below, things in <> are non-terminals. (syntactic part)
Things not in <> are terminals. (actual words)


English Subset Example Grammar:

 1 <sentence> -> <np> <vp>
 2 <np> -> <adj> <np>
 3 <np> -> <adj> <singular noun>
 4 <vp> -> <singular verb> <adverb>
 5 <adj> -> The
 6 <adj> -> little
 7 <singular noun> -> boy
 8 <singular noun> -> girl
 9 <singular verb> -> ran
10 <adverb> -> quickly


Here, alternative rules for the same left side are listed separately.
However, one may use | to mean OR and list all alternatives on the
right side separated by |'s.

Example: <singular noun> -> boy | girl

In a parse tree, <> will be the internal nodes (non-terminals);
actual words will be the leaves (terminals).


**************************************************************
If you can construct a parse tree for a sentence using
the grammar, then the sentence is in the language.
**************************************************************
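The English subset grammar can also be written down as data and used to generate sentences. The sketch below is only one possible encoding: the dictionary name GRAMMAR and the function generate are made up for illustration, each non-terminal maps to its list of alternative right sides, and alternatives are picked at random.

```python
import random

# Each non-terminal maps to a list of alternative right sides (the | notation).
GRAMMAR = {
    "<sentence>": [["<np>", "<vp>"]],
    "<np>": [["<adj>", "<np>"], ["<adj>", "<singular noun>"]],
    "<vp>": [["<singular verb>", "<adverb>"]],
    "<adj>": [["The"], ["little"]],
    "<singular noun>": [["boy"], ["girl"]],
    "<singular verb>": [["ran"]],
    "<adverb>": [["quickly"]],
}


def generate(symbol, rng):
    """Expand symbol until only terminals (actual words) remain."""
    if symbol not in GRAMMAR:          # a terminal: nothing to expand
        return [symbol]
    rhs = rng.choice(GRAMMAR[symbol])  # pick one alternative right side
    words = []
    for s in rhs:
        words.extend(generate(s, rng))
    return words


rng = random.Random(0)
print(" ".join(generate("<sentence>", rng)))
```

Note that these rules also allow word orders such as "little The boy ran quickly", because rule 2 lets any <adj> come before any other <adj>. A small grammar like this one generates more than the sentences we had in mind.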


##Inter5* Given "The little boy ran quickly",
   construct a parse tree. Start with the terminals and work your
   way up toward <sentence>.
   Each time you use a rule to get a new non-terminal node,
   write its number next to the non-terminal node.


##Inter6* Generate another sentence by starting with
   <sentence> and repeatedly replacing non-terminals by the right side of a rule
   until only terminals remain.
   Each time you use a rule to expand a non-terminal,
   write its number next to the non-terminal node.


Thus, the complete grammar for English should be able to
parse ALL and ONLY the grammatical English sentences and
should be able to generate ALL and ONLY the grammatical English sentences.


This complete set of rules does not exist.


However, for a programming language, this complete set of rules does exist.


Let's look at the grammar notation we will use for programming languages.


/Formal Notation of a Grammar

=============================


G = (V, T, S, P)

V = a finite set of non-terminals (variables)
    We will use capital letters such as A, B and C for variables
    instead of < >.

T = a finite set of terminal symbols.
    We will use lower case letters such as a, b, and c,
    i.e. strings/tokens of the language are made up of these symbols.

P = a finite set of production rules (i.e. grammar rules)
    (we will number the rules P1, P2, P3, etc.)

S = the start symbol of G (it is a member of V)

Example: G = ({S, A, B}, {a,b}, S, P)   (P rules are not listed here)

N = V U T (all symbols of the grammar, non-terminals and terminals)
    e.g. N = {S,A,B,a,b}

N^* = the set of all strings made up of terminal and non-terminal symbols
    e.g. Sab, aAb, etc.

Each P rule is of the form: LHS -> RHS

    where LHS is a member of N^* V N^* (at least one non-terminal)
          RHS is a member of N^* (mixture of terminals and non-terminals)


Read "->" as "goes to".
Production rules are used to derive the right side from the left side.


Example grammar rules:

S -> A B      (all non-terminals)
A -> a        (non-terminal goes to a terminal)
aBa -> aba    (there are terminals on the left side)
S -> aSb      (the same non-terminal appears again on the right side)
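The 4-tuple and the rule form can be checked mechanically. The sketch below is an illustration only: the class name Grammar and the function is_valid_rule are hypothetical, and it assumes every symbol is a single character so that a string such as "aBa" can stand for a sequence of symbols.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    V: frozenset   # non-terminals (variables)
    T: frozenset   # terminals
    S: str         # start symbol, a member of V
    P: tuple       # production rules as (LHS, RHS) pairs


def is_valid_rule(g, lhs, rhs):
    """LHS must be in N^* V N^* (at least one non-terminal); RHS must be in N^*."""
    N = g.V | g.T
    all_in_N = all(c in N for c in lhs + rhs)
    has_nonterminal = any(c in g.V for c in lhs)
    return all_in_N and has_nonterminal


rules = (("S", "AB"), ("A", "a"), ("aBa", "aba"), ("S", "aSb"))
g = Grammar(frozenset("SAB"), frozenset("ab"), "S", rules)

print(all(is_valid_rule(g, lhs, rhs) for lhs, rhs in g.P))   # True
print(is_valid_rule(g, "ab", "a"))    # False: no non-terminal on the left side
```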


***************************************************************************
LHS derives RHS: In each application of a rule, the left side is replaced by
   the right side. e.g. S is replaced by A B
   In other words, S expands into A B
***************************************************************************

*****************************************************************
Derivation path: A path of derivations from S to a string in L
   by applying production rules.
*****************************************************************


/Derivations and Sentential Forms

=================================


A derivation path is a horizontal representation of a parse tree.

Sentential forms are in N^* (made up of non-terminals and terminals)

=> relates two sentential forms if the second can be obtained from the first
   by the application of a single production.

   x => y (x directly derives y)

=*=> relates two sentential forms if the second can be obtained from the first
   by the application of 0 or more productions.

   x =*=> y (x derives y in 0 or more steps)


##Inter7* what do you think =+=> means?


Derivation Example:



G = ({S}, {a,b}, S, P)

P is
  1 S -> aSb
  2 S -> /\

Each production rule “expands” S.

A derivation path: S => aSb => aaSbb => aabb   i.e. S =*=> aabb

The first two => are from applying rule 1.
The last => is from applying rule 2.
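The derivation path for aabb can be replayed step by step. This is a minimal sketch, assuming rules are stored by number and each step rewrites the leftmost occurrence of the left side; the names RULES, apply_rule and derive are made up for illustration.

```python
RULES = {
    1: ("S", "aSb"),   # 1 S -> aSb
    2: ("S", ""),      # 2 S -> /\ (the empty string)
}


def apply_rule(form, rule_number):
    """Replace the leftmost occurrence of the rule's left side by its right side."""
    lhs, rhs = RULES[rule_number]
    if lhs not in form:
        raise ValueError(f"rule {rule_number} does not apply to {form!r}")
    return form.replace(lhs, rhs, 1)


def derive(start, rule_numbers):
    """Return the list of sentential forms obtained by applying the rules in order."""
    path = [start]
    for r in rule_numbers:
        path.append(apply_rule(path[-1], r))
    return path


path = derive("S", [1, 1, 2])
print(" => ".join(path))   # S => aSb => aaSbb => aabb
```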



In deriving a string, many derivation steps may be involved.

S => x1 => x2 => x3 => ....... => w

S, x1, x2, x3, ....., w are all called sentential forms and may
consist of both non-terminals and terminals.


##Inter8* Give the derivation path for S =*=> aaabbb using the P above


/Some Useful Techniques in Writing Grammar Rules

================================================



A -> a A   is a recursive rule and each time it is applied
           one more a is created.

A -> a     will take A to one a so that you can take
           a a a a A => a a a a a

A -> /\    will take A to an empty string so that you can take
           a a a a A => a a a a

S -> a S a is a recursive rule and each time it is applied
           two more a's are generated
           S => a S a => a a S a a => a a a S a a a
           Thus you will get an even number of a's.

S -> /\    will take S to an empty string so that you can take
           a a a S a a a => a a a a a a

S -> a S b thus will generate one a and one b each time.
           Note that all a's will be before b's.
           Thus, you will get the same number of a's and b's.

But

S -> A B
A -> a A
B -> b B
A -> /\
B -> /\

will expand A and B independently.
So, the numbers of a's and b's need not be the same.

S -> a C b C   the two different C's are expanded independently
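The difference between expanding together and expanding independently can be seen by replaying the rules. This is a small sketch; the function names are hypothetical, and i, j and n simply count how many times each recursive rule is applied.

```python
def expand_independently(i: int, j: int) -> str:
    """S -> A B, then A -> a A used i times and B -> b B used j times."""
    form = "AB"                               # S => A B
    for _ in range(i):
        form = form.replace("A", "aA", 1)     # A -> a A
    form = form.replace("A", "", 1)           # A -> /\ (empty string)
    for _ in range(j):
        form = form.replace("B", "bB", 1)     # B -> b B
    form = form.replace("B", "", 1)           # B -> /\ (empty string)
    return form


def expand_together(n: int) -> str:
    """S -> a S b used n times, then S goes to the empty string."""
    form = "S"
    for _ in range(n):
        form = form.replace("S", "aSb", 1)    # S -> a S b
    return form.replace("S", "", 1)           # S -> /\ (empty string)


print(expand_independently(3, 1))   # aaab
print(expand_together(2))           # aabb
```

With S -> a S b there is only one counter n, so the counts of a's and b's are tied together. With S -> A B there are two counters i and j, which may or may not be equal.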




/Grammar Examples

==================


S -> a A          (generates one a and goes to A)
A -> a A | a      (generates 1 or more a's)

##Inter9: Give a set-former description for the language.

S -> a S a | a    (generates 2 a's from opposite ends and ends with a)

##Inter10: Give a set-former description for the language.

S -> a S b | /\   (generates a and b from opposite ends and ends with /\)

##Inter11: Give a set-former description for the language.

S -> aa A bb B    (A and B are expanded independently)
A -> aa A | /\
B -> bb B | /\

##Inter12: Give a set-former description for the language.


/Definition of L(G)

====================



L(G) = the language GENERATED by the grammar G
     = {w | w is in T^* and S =*=> w}
     i.e. w is made up of terminals and S derives w.

To prove that L' is the language generated by G,

1. Prove that for every w in L', S =*=> w (G can generate L')

2. Prove that every x in T^*, where S =*=> x, is in L' (G generates only L')
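Before attempting such a proof, both directions can be spot-checked on short strings. The sketch below is a sanity check and not a proof. It assumes a grammar whose left sides are single non-terminals, with upper case letters as non-terminals and lower case letters as terminals. The function name generate and the slack parameter (a cap on sentential form length so that the search always stops) are made up for illustration, and for grammars with many non-terminals a small slack may miss some strings.

```python
from collections import deque


def generate(rules, start, max_len, slack=2):
    """Collect terminal strings w with S =*=> w and len(w) <= max_len."""
    seen = {start}
    queue = deque([start])
    language = set()
    while queue:
        form = queue.popleft()
        if not any(c.isupper() for c in form):
            language.add(form)            # only terminals left: form is in L(G)
            continue
        for i, c in enumerate(form):
            for lhs, rhs in rules:
                if c != lhs:
                    continue
                new = form[:i] + rhs + form[i + 1:]
                terminals = sum(1 for ch in new if ch.islower())
                # Prune forms that are already too long to yield a short string.
                if terminals > max_len or len(new) > max_len + slack:
                    continue
                if new not in seen:
                    seen.add(new)
                    queue.append(new)
    return language


# Grammar from the Derivation Example: 1 S -> aSb, 2 S -> empty string
rules = [("S", "aSb"), ("S", "")]
generated = generate(rules, "S", max_len=6)

# Candidate L' = {a^n b^n | n >= 0}, restricted to length <= 6
expected = {"a" * n + "b" * n for n in range(4)}

print(expected <= generated)   # step 1 on short strings: G can generate L'
print(generated <= expected)   # step 2 on short strings: G generates only L'
```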



End.