5. File Structure and Indexing

naivenorthΤεχνίτη Νοημοσύνη και Ρομποτική

8 Νοε 2013 (πριν από 3 χρόνια και 9 μήνες)

71 εμφανίσεις

5. File Structure and Indexing

5
-

1

5. File Structure and Indexing


File Structure

Background

Overall System Structure

Disk Manager

File Manager

Buffer Manager


Indexing

Introduction

B
-
tree

Hasing

5. File Structure and Indexing

5
-

2

Background



The database technology is very useful to deal with very
large

volume of data
in many applications such as on
-
line transaction

processing, file management systems, mailing systems, etc.




A DBMS has a
data access system
as its lower subsystem.

A data access system manages various
storage structures
such as

sequential file and indices to be used for efficient key
-
associative

accesses to large volume of data. And it provides method to access

the storage structures for upper subsystem, i.e., a query processor.

Moreover, data access system has
concurrency control
and

recovery facility
to support transaction concepts (atomic action).

5. File Structure and Indexing

5
-

3



Data Access Systems


(1) RSS (Relational Storage System)
-

1976 Astrahan


-

Data access system of System R (IBM's first RDBMS)


-

B+ tree, concurrency control, recovery facility


-

generally accepted as one of pioneers


(2) WiSS (Wisconsin Storage System)
-

1985 Chou


-

Interface to UNIX


-

B+ tree, extensible hash as indices


-

Support a storage structure to store long data items.


-

MWiSS (Gamma Relational Database Machine, 1989)


(3) RSAM (Relational Storage Access Method)


-

1987 Informix Inc.


-

Lower subsystem of Informix
-
Turbo


-

Manage disk instead of UNIX file system.

Background
(cont'd)

5. File Structure and Indexing

5
-

4

Overall System Structure

user

query parser

strategy selector

recovery manager

concurency controller

lock table

log

buffer manager

file manager

data files

indices

system catalog

user transaction

buffer (main memory)

DISK MANAGER (OS)

5. File Structure and Indexing

5
-

5



Query Parser

translates statements in a query language into lower
-
level

language



Strategy Selector

transform a user's request into an equivalent and efficient



access plan,

thus finding a good strategy for executing the query.



Buffer Manager

replace pages using LRU algorithm.



File Manager

manage database file such as data files, system catalog.

* Indices are usually managed by
Index Manager.

Overall System Structure

(cont'd)

5. File Structure and Indexing

5
-

6



Recovery Manager

ensures that the database remains in consistent (correct)

state despite system failures.



Concurrency Controller

ensures that the database interactions with the database

proceed without conflicting with one another.



Data Files

database itself.



Indices

provide for fast access to data items holding particular values.



System Catalog

store information about the structure of database, and

authorization information, constraints and statistical data.

Overall System Structure

(cont'd)

5. File Structure and Indexing

5
-

7



Objectives of DBMS Performance

-

Minimize disk access time


(micro floppy disk : 400 ms, Main Frame Disk : 30 ms)

--
> needed good
storage structure
and
access method



Access Procedure

Database Access

Disk

Disk Manager

File Manager

user request (set of records)

request records

records

request pages

pages

I/O operation

data move

FM decides pages which

have given records

DM decides the place in disk

device and move it in memory

5. File Structure and Indexing

5
-

8



Functions


-

Mapping page number into physical disk address.




Operations


-

Rerieval page P from a page set


-

Insert page P from a page set


-

Delete page P from a page set

Disk Manager

5. File Structure and Indexing

5
-

9



Functions


-

Manage stored files




Operations


-

Rerieval record R from a stored file (set of records)


-

Insert record R into a stored file


-

Delete record R from a stored file


-

Create a stored file F


-

Delete a stored file F


File Manager

5. File Structure and Indexing

5
-

10



Basic Idea


-

Locate related data in the same or adjacent disk place


-

very important in system performance




Example


We have read record
R1
. Next, We need record
R2
. Suppose
R1


is stored in page
P1

and

R2
is stored page
P2
.

(1) If R1 and R2 are the same page,


we have only one access.


(2) If R1 and R2 are not same page but adjacent,


we cn reduce the I/O seek time.


Data Clustering

R1

R2

5. File Structure and Indexing

5
-

11



Example of Page Management

Page Management

UNIV

database

STUDENT

COURSE

ENROL

Sno Year Year Dept


100 Na 4 CE

200 Lee 3 EE

300 Jung 1 CE

Cno Cname Credit Lecturer


C123 PL 3 Kim

C312 DS 3 Hwang

C324 File 3 Lee

Sno Cno Grade


100 C413 A

100 E412 A

200 C123 B

300 C312 A

300 C324 C

S1

S2

S3

C1

C2

C3

E1

E2

E3

E4

E5

* Suppose one record is stored in one page

5. File Structure and Indexing

5
-

12



Disk Structure

Page Management
(cont'd)

0

1

2

3

4

5

6

7

8

9

10

11

12

13

14

page number

S1 S2 S3 C1

C2 C3 E1 E2 E3

E4 E5

disk directory

free page



page = block



Transfer unit Memory


to Disk

5. File Structure and Indexing

5
-

13

Page Management
(cont'd)



Loading UNIV database


-

Disk Manager allocate the STUDENT table,


COURSE table, ENROL table.



Insertion Record


-

Insert S4 (Sno = 600)


=> allocate S4 in page #12 (free page), connect pointer



Deletion Record


-

Delete S2 (Sno = 200)


=> delete S2 (page #2) return to free page set.


* if another insertion is happened, the record is allocated in


front of free page set (page #2).

5. File Structure and Indexing

5
-

14



Disk Structure
after

insertion and deletion operations

Page Management
(cont'd)

0

1

2

3

4

5

6

7

8

9

10

11

12

13

14

page number

S1 S3 C1

C2 C3 E1 E2 E3

E4 E5

disk directory

free page

S4

5. File Structure and Indexing

5
-

15



Disk Directory :
control information to manage pages

Page Management
(cont'd)

0

next page pointer / null

Page set

Address


Free page set

2

STUDENT

1

COURSE

4

ENROL

7

page number

5. File Structure and Indexing

5
-

16



Example of Record Management :

Suppose several records are stored in one page



Loading STUDENT file in a page

Record Management

p

next page pointer

page number

S1

S2

S3

Free space

5. File Structure and Indexing

5
-

17



Insertion Record


-

Insert S4 (Sno = 400)


=> allocate S4 in free space of a page



Records are identified by RID (Record IDentifier)


-

RID = Page Number + Offset

Record Management
(cont'd)

p

next page pointer

page number

S1

S2

S3

Free space

S4

5. File Structure and Indexing

5
-

18



Deletion Record


-

Delete S2 (Sno = 200) and Compact remaing records

Record Management
(cont'd)

p

next page pointer

page number

S1

S3

S4

Free space

5. File Structure and Indexing

5
-

19



Structure of Data File

Record Management
(Implementation View))

record 1 record 2 ......

Slot[2] Slot[1] RIDCnt

Free FileID ThisPage PrevPage NextPage

recor

Slot[2] Slot[1] RIDCnt

Free FileID ThisPage PrevPage NextPage

DATA

Control

Information

Data Page

FileDesc


LastPageID

FirstPageID


....

5. File Structure and Indexing

5
-

20



Field Information

Record Management
(Implementation View))

char

Data[ ]

Data area

short

Slot[ ]

Start address (offset) of records

short

RIDCnt

Number of records

short

Free


Start address of free space



T_FID FileID

File Identifier including this page

T_FID ThisPage

This page Identifier

T_FID PrevPage

Previous page address

T_FID NextPage

Next page address

typedef struct {

} T_DATAPAGE;

5. File Structure and Indexing

5
-

21



Record Management Functions

Record Management
(Implementation View))

(1) Add Record


-

insert new record in
Free Space


-

update
Free pointer and Slot


-

increase
RIDCnt

record 1 record 2

Slot[3] Slot[2] Slot[1] RIDCnt

Free FileID ThisPage PrevPage NextPage

DATA

Data Page

new

5. File Structure and Indexing

5
-

22

Record Management
(Implementation View))

(2) Delete Record


-

delete record 2


-

Fill negative integer in
record's length field


-

decrease RIDCnt

record 1

Slot[3] Slot[2] Slot[1] RIDCnt

Free FileID ThisPage PrevPage NextPage

DATA

Data Page

record 3

-
Length / Data

5. File Structure and Indexing

5
-

23

Record Management
(Implementation View))

(3) Update Record


-

Update record 1


-

if length of new and old record is equal, overwrite record 1



else, delete old record and add new record in the end of


last record.

Free FileID ThisPage PrevPage NextPage

record 1

Slot[3] Slot[2] Slot[1] RIDCnt

updated data

Data Page

record 3


record 2

5. File Structure and Indexing

5
-

24

Indexing

5. File Structure and Indexing

5
-

25

Introduction

To find some information in a
book



Sequential Scan Operation


-

search the entire book



Index Scan Operation


-

search the index and go to the page

To find some data in the
database



Sequential Scan

화일

자체가

순서화
,
색인

존재

안함



Index Scan


데이타

화일은

무순
,
별도

색인

화일

존재


+ faster than a sequential scan


-

more slow than a sequential scan in insert/delete operation


much storage occupation than sequential file







5. File Structure and Indexing

5
-

26

Classification

Three types of access method



Sequential File Method



Index File Method


(B+ tree index structure)



Hashing




Sequential File Method

-

Logical order is equivalent to Physical order

-

All the records may be moved to maintain the physical order


for Insertion / Deletion operation

-

application : batch processing

-

See Fig. 6
-
12 in Lee S. H.

5. File Structure and Indexing

5
-

27

Classification
(cont'd)



Index File Method

-

Concept : Index File + Data File

-

We have
several indices
in
one

data file (multikey file)


(1)
Inverted File


-

There exist many index in a data file


-

See Fig. 6
-
16


(2)
Multilist File


-

records connected by pointer


-

See Fig. 6
-
17

5. File Structure and Indexing

5
-

28

B
+

tree

(3) B
+

tree


-

most popular techniques for organizing an index structure


-

duplication of the keys in index part and sequential order


-

direct access + sequential access


-

balanced tree



Component



Index part


-

the internal nodes


-

for rapid direct access to the keys in the leaves


Sequence set


-

the leaves


-

the linked list of the keys in sequential order

5. File Structure and Indexing

5
-

29

B
+

tree
(Definition)

Definition




Root have 0, 2, or n/2 to n subtree



All the node except root have n/2 ~ n subtree



All leaf nodes have the same depth



Internal node have (n
-

1) key values



Leaf node connected in sequential

차수
,
포인터















5. File Structure and Indexing

5
-

30

B
+

tree
(operations)

Structures



Leaf node









Internal node



P
1

K
1

P
2

K
2

P
3

K
3

.... P
n
-
1

K
n
-
1

P
n

file record with

search
-
key K
i

next leaf node

P
1

K
1

P
2

K
2

P
3

K
3

.... P
n
-
1

K
n
-
1

P
n

bucket pointers

(node) which have key value <= K
1

bucket pointers

(node) which have key value > K
n
-
1

5. File Structure and Indexing

5
-

31

B
+

tree
(operations)

Operations


Retrieval


log
n/2
(K) n : # of pointer,





K : # of total key value in a file


(ex) K = 1,000,000 n = 20 ~ 200



Number of Retrieval? 3 ~ 6


Insertion (Fig. 6.21)

(1) search and store (insertion ‘115’)

(2) search and reorder (insertion ‘129’)

(3) split (insertion ‘90’)




*

n/2



기존노드
,
나머지



새노드



기존

노드의



데이터는

부모

노드에

첨가

Also see Korth book p. 256







5. File Structure and Indexing

5
-

32

B
+

tree
(operations)



Deletion

(1) search and delete key only leaf node(deletion ‘132’)

(2) merge






삭제하니

노드가

empty


경우
(
부모에

포함
):



부모

노드에

삭제된

데이터가

있으면

삭제



(deletion ‘130’)





삭제하니

노드가

empty


경우

(
부모에

미포함
):



(deletion ‘110’)



부모

노드에

삭제된

데이터가

없으면




부모

노드를

합병
,
경우에

따라

리프

노드도

조정




Also see Korth book p. 256

5. File Structure and Indexing

5
-

33

Hashing

-

Hash
함수

값으로

원하는


(
데이타
)




번만에

접근

-

Hash
함수

값의

범위가

대체로

크고

(
무한대
),


값의집합은

적다
.

-

Hash
함수를



설계하여

가능한



값이

고르게

분포되게

한다
.



open problem (overflow)


종류


Static Hashing


-

버켓주소

집합

B


고정


-

초기에는

공간

낭비
,
데이타가

증가하면

성능

저하


-

주기적으로

hash
구조

재조정

(
소요시간
,
사용금지



단점
)


Dynamic Hashing (Extendible Hashing
대표적
; Fagin 1979, ACM TODS)


-

DB


축소
,
확장됨에

따라

버켓을

분리

융합하여



DB


변화에

대처한다
.


-

저장공간의

효율성


-

버켓의

분리
,
융합에

따른

부단이

다소

있으나
,



static hashing


비하면

경미하다
.

5. File Structure and Indexing

5
-

34

Hashing
(operation)



0

Directory

(
주소

테이블
)


0

bucket

(
데이타
)

depth:
접근에

필요한

bit

depth:
같은

비트



(1)
초기화

1


1

(2)
입력


0


0


1

Round (0101)


1

Perry(1111)

down(1010)

Red(1011) ?

5. File Structure and Indexing

5
-

35

Hashing
(operation)


2


1

(2)
입력

00

01

10

11

Round (0101)


2

down(1010)

Red(1011) ?


2

Perry(1111)

0



10




11

5. File Structure and Indexing

5
-

36

연습문제

1.
사용자가

데이타베이스에

접근하는

과정을

단계별로

설명하라
.


2. B+
트리의

특성은
?


3.
확장성

해싱의

동작

방법
?


4.
페이지

관리

기법을

설명하라
.