ss_unit1_ch2 - WordPress.com

mangledcobwebSoftware and s/w Development

Dec 14, 2013 (3 years and 8 months ago)


Chapter 2

Data Structure for Language Processing

Classification of Data structures


A language processor makes frequent use of search operations over its data structures.

The data structures used in language processing can be classified on the basis of the following criteria:

1. Nature of a DS: whether it is a linear or a nonlinear DS.
2. Purpose of a DS: whether it is a search DS or an allocation DS.
3. Lifetime of a DS: whether it is used during language processing or during target program execution.

Linear data structure



A linear DS consists of a linear arrangement of elements in memory.

A linear DS requires a contiguous area of memory for its elements, as shown in fig. (a).

Problem: The size of a data structure is often difficult to predict. In such a situation the designer is forced to overestimate the memory requirements of the linear data structure, which leads to wastage of memory.




Nonlinear data structure

The elements of a nonlinear data structure need not occupy contiguous areas of memory, which avoids the memory allocation problem seen in the context of linear DS.

Fig. (b) shows the allocation of four nonlinear data structures, E, F, G and H, where F is stored in 3 different memory areas.

The nonlinear arrangement of elements leads to lower search efficiency.



Search Data Structure

A search DS (or search structure) is a set of entries, each entry accommodating the information concerning one entity in the source program.

Each entry is assumed to contain a key field, which forms the basis for a search operation.

Search DSs are used to construct various tables of information.

Search DSs are used during language processing to maintain attribute information of different entities in the source program.




Entry formats:

Entries consist of two parts, a fixed part and a variant part. Each part consists of a set of fields.

Fields of the fixed part exist in each entry of the search structure.

The value in the tag field of the fixed part determines the information to be stored in the variant part of the entry.

For example, entries in the symbol table of a compiler have the following fields:

1. Fixed part: the fields symbol and class (class is the tag field).
2. Variant part: its contents depend on the class, e.g. variable, operator, procedure name, function name, etc.
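
The entry format described above can be sketched as a small record type. This is only an illustration: the name `SymbolEntry`, the helper functions, and the variant fields (`type`, `length`, `param_count`, `address`) are assumptions made for the example and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SymbolEntry:
    # Fixed part: these fields exist in every entry.
    symbol: str
    sym_class: str  # tag field ("class" itself is a Python keyword)
    # Variant part: which fields appear here depends on the tag value.
    variant: Dict[str, Any] = field(default_factory=dict)


def make_variable(name, type_name, length):
    # Hypothetical variant fields for an entry whose class is "variable".
    return SymbolEntry(name, "variable", {"type": type_name, "length": length})


def make_procedure(name, param_count, address):
    # Hypothetical variant fields for an entry whose class is "procedure name".
    return SymbolEntry(
        name, "procedure name", {"param_count": param_count, "address": address}
    )


x = make_variable("x", "real", 4)
p = make_procedure("sort", 2, 0x100)
```

Code that reads an entry would first inspect `sym_class` and only then interpret the variant part.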




Algorithm (Generic search procedure):

1. Make a prediction concerning the entry of the search DS which symbol ‘s’ may be occupying. Let this be entry e.
2. Let s_e be the symbol occupying the e-th entry. Compare ‘s’ with s_e. Exit with success if the two match.
3. Repeat steps 1 and 2 till it can be concluded that the symbol does not exist in the search DS.

Each comparison of step 2 is called a probe (p).

P_s: Number of probes in a successful search
P_u: Number of probes in an unsuccessful search
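
A minimal sketch of the generic procedure, counting probes as it goes. The function name and the idea of passing the predictions in as a sequence of entry numbers are assumptions made for illustration; entry numbers here are 0-based Python indices.

```python
def generic_search(table, s, predictions):
    """Search for symbol s, trying the predicted entries in order.

    table:       list of symbols, one per entry
    predictions: iterable of entry numbers to try (the "predictions" of step 1)
    Returns (entry_number, probes); entry_number is None on failure.
    """
    probes = 0
    for e in predictions:
        probes += 1            # each comparison of s with s_e is one probe
        if table[e] == s:
            return e, probes   # exit with success
    return None, probes        # predictions exhausted: the symbol is absent


table = ["alpha", "beta", "gamma"]
found = generic_search(table, "gamma", range(len(table)))
missing = generic_search(table, "delta", range(len(table)))
```

The table organizations differ only in how the sequence of predictions is produced.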






Operations on search structure:

The following operations are performed on a search structure:

1. Operation add: Add the entry of a symbol.
2. Operation search: Search for and locate the entry of a symbol.
3. Operation delete: Delete the entry of a symbol.

The entry for a symbol is created only once, but it may be searched for a large number of times during the processing of a program.





Table organization

A table is a linear data structure. Two points can be made concerning tables as search structures:

Given the location of an entry of the table, it is easy to move to the next or the previous entry of the table during a search.

For tables using the fixed-length entry organization, the address of an entry in a table can be determined from its entry number.
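
As a small illustration of the second point, the address can be computed directly. The function and parameter names are hypothetical, and entry numbers are assumed to start at 1.

```python
def entry_address(base_address, entry_number, entry_size):
    # Assumes entries are numbered from 1 and stored contiguously,
    # each occupying entry_size memory units.
    return base_address + (entry_number - 1) * entry_size


first = entry_address(1000, 1, 8)
fifth = entry_address(1000, 5, 8)
```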



The 3 main types of table organization are:

1. Sequential search organization
2. Binary search organization
3. Hash table organization



Sequential search organization:

It uses the generic search procedure to search for any symbol in the table.

The figure shows a typical state of a table using the sequential search organization.




Sequential search organization (operations):

Search for a symbol (f is the number of occupied entries in the table):

P_s = f/2 for a successful search (on average)
P_u = f for an unsuccessful search

Add a symbol:

The symbol is added to the first free entry in the table. The value of f is updated accordingly.

Delete a symbol:

1. Physical deletion: The entry is deleted by erasing or by overwriting. Thus, if the d-th entry is to be deleted, entries d+1 to f can be shifted ‘up’ by one entry each. This would require (f - d) shift operations in the symbol table.
2. Logical deletion: It is performed by adding some information to the entry to indicate its deletion. This can be implemented by introducing a new field to indicate whether an entry is active or deleted.
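
The operations above can be sketched as follows. The class and method names are assumptions made for the example; entry numbers are 1-based to match the text, and `f` counts the occupied entries.

```python
class SequentialTable:
    def __init__(self, capacity):
        self.entries = [None] * capacity  # each entry: [symbol, active_flag]
        self.f = 0                        # number of occupied entries

    def add(self, symbol):
        if self.f == len(self.entries):
            raise OverflowError("table is full")
        self.entries[self.f] = [symbol, True]  # first free entry
        self.f += 1

    def search(self, symbol):
        """Return (entry_number, probes); entry_number is None if absent."""
        probes = 0
        for i in range(self.f):
            probes += 1
            sym, active = self.entries[i]
            if active and sym == symbol:
                return i + 1, probes
        return None, probes

    def delete_physical(self, d):
        """Delete entry d by shifting entries d+1..f up; return the shift count."""
        shifts = 0
        for i in range(d, self.f):        # list index i holds entry number i+1
            self.entries[i - 1] = self.entries[i]
            shifts += 1
        self.entries[self.f - 1] = None
        self.f -= 1
        return shifts

    def delete_logical(self, d):
        """Mark entry d as deleted without moving anything."""
        self.entries[d - 1][1] = False


t = SequentialTable(5)
for name in ["a", "b", "c", "d"]:
    t.add(name)
```

Physical deletion keeps the table compact at the cost of shifting, while logical deletion is cheap but leaves the deleted entry in place to be probed by later searches.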



Binary search organization:

All entries in a table are assumed to satisfy an ordering relation.

Algorithm (Binary search):

1. start := 1; end := f;
2. While start <= end
   a) e := (start + end)/2, rounded to an integer.
      Exit with success if s = s_e.
   b) If s < s_e then end := e - 1;
      else start := e + 1;
3. Exit with failure.
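
A runnable sketch of the algorithm above, assuming the table is a Python list already sorted in ascending order. Entry numbers are 1-based as in the algorithm, the midpoint is rounded down, and the probe counter is an addition for illustration.

```python
def binary_search(table, s):
    """Return (entry_number, probes); entry_number is None on failure."""
    start, end = 1, len(table)       # end plays the role of f
    probes = 0
    while start <= end:
        e = (start + end) // 2       # midpoint, rounded down
        probes += 1
        s_e = table[e - 1]           # list index is entry number minus 1
        if s == s_e:
            return e, probes         # exit with success
        if s < s_e:
            end = e - 1
        else:
            start = e + 1
    return None, probes              # exit with failure


sorted_table = ["a", "c", "e", "g", "i"]
hit = binary_search(sorted_table, "g")
miss = binary_search(sorted_table, "b")
```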




Hash table organization:

The search prediction depends on the value of s.

3 possibilities exist for the predicted entry:

1. The entry may be occupied by s,
2. The entry may be occupied by some other symbol, or
3. The entry may be empty.

Algorithm (Hash table organization):

1. e := h(s)
2. Exit with success if s = s_e, and with failure if entry e is unoccupied.
3. Repeat steps 1 and 2 with different hashing functions (multiplication functions, division functions, etc.).
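
The following sketch illustrates the algorithm with two hashing functions tried in turn. The specific functions, the constants 157 and 256, and the helper `hash_add` are assumptions chosen to keep the example small; they are not taken from the slides.

```python
def h_division(s, size):
    # Division method: remainder of a numeric key by the table size.
    return sum(ord(ch) for ch in s) % size


def h_multiplication(s, size):
    # Multiplication-style method using integer arithmetic only.
    key = sum(ord(ch) for ch in s)
    return ((key * 157) % 256) * size // 256


HASH_FUNCTIONS = [h_division, h_multiplication]


def hash_search(table, s):
    """Return (entry_number, probes); entry_number is None on failure."""
    probes = 0
    for h in HASH_FUNCTIONS:
        e = h(s, len(table))         # step 1: e := h(s)
        probes += 1
        if table[e] == s:
            return e, probes         # success: s = s_e
        if table[e] is None:
            return None, probes      # failure: entry e is unoccupied
        # entry holds another symbol: retry with the next hashing function
    return None, probes              # no hashing functions left


def hash_add(table, s):
    for h in HASH_FUNCTIONS:
        e = h(s, len(table))
        if table[e] is None or table[e] == s:
            table[e] = s
            return e
    raise OverflowError("no free entry found for " + s)


hash_table = [None] * 8
slot_a = hash_add(hash_table, "a")   # "a" and "i" collide under h_division
slot_i = hash_add(hash_table, "i")
```

With only two hashing functions the sketch gives up quickly; a practical table would need a longer sequence of functions or another collision-handling scheme.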



Allocation Data Structure

We will discuss two allocation data structures: stacks (linear) and heaps (nonlinear).

Stack:

A stack is a linear data structure which satisfies the following properties:

1. Allocations and deallocations are performed in a last-in-first-out (LIFO) manner.
2. Only the last entry is accessible at any time.

The following figure illustrates the stack allocation and deallocation process.

Extended stack

Sometimes an extension of the simple stack model is needed because all entities may not be of the same size. The size of an entity is assumed to be an integral multiple of the size of a stack entry.

The following figure shows the extended stack model. In addition to SB (stack base) and TOS (top of stack), two new pointers exist in the model:

A record base pointer (RB) pointing to the first word of the last record in the stack.

The first word of each record is a reserved pointer. This pointer is used for housekeeping purposes as explained below.


Fig.: Extended stack model, (b) allocation, (c) deallocation

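Allocation time actions:

No.  Statement
1.   TOS := TOS + 1;
2.   TOS* := RB;
3.   RB := TOS;
4.   TOS := TOS + n;

Deallocation time actions:

No.  Statement
1.   TOS := RB - 1;
2.   RB := RB*;

Here TOS* and RB* denote the stack words pointed to by TOS and RB, and n is the size, in stack entries, of the entity being allocated.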
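
The allocation and deallocation actions can be simulated on a Python list standing in for the stack memory. This is a sketch under assumptions: the class name, the use of -1 for an empty stack and `None` for "no previous record" are choices made for the example. On deallocation, RB is restored from the reserved pointer stored in the first word of the record being released.

```python
class ExtendedStack:
    def __init__(self, size):
        self.mem = [None] * size
        self.tos = -1     # index of the last occupied word; SB is word 0
        self.rb = None    # first word of the last record; None if no record

    def allocate(self, n):
        """Push a record with n data words; return the index of its first data word."""
        if self.tos + 1 + n > len(self.mem) - 1:
            raise OverflowError("stack overflow")
        self.tos += 1                   # TOS := TOS + 1
        self.mem[self.tos] = self.rb    # TOS* := RB (save the previous record base)
        self.rb = self.tos              # RB := TOS
        self.tos += n                   # TOS := TOS + n
        return self.rb + 1

    def deallocate(self):
        """Pop the last record."""
        if self.rb is None:
            raise IndexError("stack is empty")
        saved = self.mem[self.rb]       # reserved pointer of the record being popped
        self.tos = self.rb - 1          # TOS := RB - 1
        self.rb = saved                 # RB := previous record base


stack = ExtendedStack(16)
first_record = stack.allocate(3)
second_record = stack.allocate(2)
```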


Heap

A heap is a nonlinear DS which permits allocation and deallocation of entities in a random order.

An allocation request returns a pointer to the allocated area in the heap.

A deallocation request must present a pointer to the area to be deallocated.

Memory management: Memory management thus consists of:

1. Identifying the free memory areas (or holes).
2. Reusing free memory areas.



Identifying the free memory areas:

Two popular techniques used to identify free memory areas are:

1. Reference counts
2. Garbage collection

Reference Counts

In the reference count technique, the system associates a reference count with each memory area to indicate the number of its active users.

The count is incremented when a new user gains access to that area and is decremented when a user finishes using it. The area is known to be free when its reference count drops to zero.

Advantage: The reference count technique is simple to implement.

Disadvantage: Incremental overheads, i.e. overheads at every allocation and deallocation.
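
A minimal sketch of the reference count technique. The class and method names are hypothetical, and memory areas are represented simply by string labels.

```python
class RefCountedHeap:
    def __init__(self):
        self.ref_counts = {}     # area label -> number of active users
        self.free_areas = set()  # areas known to be free

    def allocate(self, area):
        self.ref_counts[area] = 1        # the requester is the first user

    def add_user(self, area):
        self.ref_counts[area] += 1       # a new user gains access

    def release(self, area):
        self.ref_counts[area] -= 1       # a user finishes using the area
        if self.ref_counts[area] == 0:   # count dropped to zero: area is free
            del self.ref_counts[area]
            self.free_areas.add(area)


heap = RefCountedHeap()
heap.allocate("A")
heap.add_user("A")
heap.release("A")
still_in_use = "A" not in heap.free_areas
heap.release("A")
```

The bookkeeping in `add_user` and `release` is the incremental overhead mentioned above: it is paid on every change of users.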



Garbage Collection:

Garbage collection makes two passes over the memory to identify unused areas.

In the first pass it traverses all pointers pointing to allocated areas and marks the memory areas which are in use.

The second pass finds all unmarked areas and declares them to be free.

The garbage collection overheads are not incremental. They are incurred every time the system runs out of free memory to allocate to fresh requests.
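
The two passes can be sketched as follows. The representation is an assumption made for the example: `areas` maps each allocated area to the areas it points to, and `roots` lists the areas the program points to directly.

```python
def collect_garbage(areas, roots):
    """Return the set of areas that are not reachable from the roots."""
    # First pass: follow pointers and mark every area that is in use.
    marked = set()
    pending = list(roots)
    while pending:
        area = pending.pop()
        if area not in marked:
            marked.add(area)
            pending.extend(areas[area])
    # Second pass: every unmarked area is declared free.
    return {area for area in areas if area not in marked}


areas = {"A": ["B"], "B": [], "C": ["D"], "D": ["C"], "E": []}
free = collect_garbage(areas, roots=["A", "E"])
```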



Reuse of memory:

When a free list is used, two techniques can be used to perform a fresh allocation:

1. First fit technique: Select the first free area whose size is >= n words, where n is the number of words to be allocated.
2. Best fit technique: Select the smallest free area whose size is >= n words.
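
Both techniques can be sketched over a free list held as a Python list of (start, size) pairs. That representation and the function names are assumptions made for illustration; each function returns the position of the chosen area in the free list, or `None` if no area is large enough.

```python
def first_fit(free_list, n):
    # Take the first free area that is large enough.
    for i, (start, size) in enumerate(free_list):
        if size >= n:
            return i
    return None


def best_fit(free_list, n):
    # Scan the whole list and keep the smallest area that is large enough.
    best = None
    for i, (start, size) in enumerate(free_list):
        if size >= n and (best is None or size < free_list[best][1]):
            best = i
    return best


free_list = [(0, 50), (100, 20), (200, 30)]
ff = first_fit(free_list, 18)
bf = best_fit(free_list, 18)
```

For a request of 18 words, first fit stops at the 50-word area, while best fit examines every area and selects the 20-word one.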