Database Design &Implementation

closebunkieΤεχνίτη Νοημοσύνη και Ρομποτική

15 Νοε 2013 (πριν από 3 χρόνια και 6 μήνες)

80 εμφανίσεις

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
1
>

Database Design &Implementation


First priority of database management:


Never lose the content of an operational database.


This ability to automatically preserve data off
-
line
is called
persistence
.


Hybrid Object
-
Relational DataBases (ORDB’s)
are one way to solve the persistence problem
when implementing object
-
oriented applications
with persistent data content.


Understanding ORDBs requires more details
about database architecture
(next slide).

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
2
>

Hybrid Object
-
Relational DataBases
(ORDB’s
)


ORDB’s can preserve network
-
structured data content by
what is called ‘
pointer swizzling
’: Every address pointer in
a memory
-
resident structured data object is mapped to/from
an object key or unique identifier in external RDB tables.
This must identify the table/class type and the row/object
number within the table.


The COOL framework supports ORDB management
with separate components for data and method access:


GEN generates C/C++ code for persistent I/O and navigation
through data structures in the ORDB. This code depends only
on the database model or schema, not on the application code.


LCP captures event
-
mediated flow of control information in a
state model database. Application programmers write the State
Actions

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
3
>

LCP (Life Cycle Prototype) Interpreter


LCP controls application behavior by capturing event
-
mediated flow
-
of
-
control information in a state model
database. It dispatches events to object instances by

delegating’

execution to a prototype ‘Active Class’.
This is one way of implementing inheritance.




LCP supports method access inside prototype object
instances by (compile
-
time) mapping of named state
actions to function pointers for each Active Class.
Developers write the application
-
specific code for the
State Actions.

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
4
>

LCP Interpreter Performance

LCP performance depends on two kinds of overhead:



Control
-
flow interpretation from the State Model
database: this overhead increases as state actions
become more fine
-
grained.



Event
-
dispatching overhead: this is inherent in
distributed and multi
-
processor (non
-
shared
-
memory) applications, but not in shared
-
memory
multi
-
threaded ones.

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
5
>

Relational Databases


An RBD ‘instance’ (a snapshot of its value at time t) is a set
of ‘tuples’; each tuple contains a sequence of scalar attribute
values for its coordinates or fields.


One tuple is stored externally as a record in a ‘flat’ file, and
internally as a sequence of data members for a single object.


Any homogeneous set of tuples can be viewed conceptually
as a table with rows and columns, with one column per field
or data member and one row per tuple or object instance.


‘Tuples generalize Vectors in the sense that coordinates of
the tuple space can be different types, while vectors have
homogeneous type for their coordinates.


A table of tuples has an abstract geometrical interpretation
as a set of points that ranges over a multi
-
typed coordinate
space.



Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
6
>

Relational Databases


An RBD is a set of relations, each of which is a set of
‘tuples’.


Each tuple represents an object with scalar attributes.


Each relation may be stored externally as a set of similarly
structured records in a ‘flat’ file.


A single relation or file can be viewed conceptually as a
table of rows.


The tuples of one relation define a geometrical set of points
in a multi
-
typed coordinate space. Tuples generalize Vectors
in the sense that coordinates of the tuple space can be
different types, while vectors have one homogeneous type
for all of their coordinates.

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
7
>

Database Normalzation


Complex structured data types (and object instances) are
decomposed or

normalized’

into simple parts of
‘Second
Normal Form’ (
2NF
): (
no composite=structured attributes or
repeating groups of values are allowed
).




For maintenance and reliability reasons, the design is further
normalized
(
3NF
):

(
There are no redundant or indirectly
computable field values and all properties are stored in only
one place.)





[Refs: Sanders Ch. 3 & Appndx A; Kroenke Chapter 2 ]

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
8
>

Composite pkeys in RDB’s


Every tuple must have a unique field (or set of fields) called
its
primary key

(pkey) which uniquely identifies it.


A composite pkey for a child or component tuple is often
built by concatenating multiple key fields from a chain of
ancestors. This complicates pkey
-
to
-
fkey matching.


Example:

Dept
---
>Course
---
>Section (ERD on next slide)


CS has Dept# = 91 and OOAD has course# = 91.523.


Many courses have the same section # 201, so 201 is only a unique
identifier within sections of a
particular

course, just as 523 is only
a unique course# within a
particular

Department.


In my syllabus the course id 05f523 assumes only one Dept (91)
and one Section (291), but it combines a semester id (05f) for the
term=Fall 2005 with the course number 523 so I can partition
directories for this course over multiple terms.

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
9
>

Composite Pkeys (2 Examples)

A unique pkey for my section of
SWEngI is a composite of Dept,
Course and Section number:


CS Dept.

pkey: 91

SWEngCourse

pkey: 91+523

Section
(one of many)

pkey: 91+523+291

(These are ’instance diagrams’, not ERD’s.

They show field values for a single table row;

an ERD only shows entity and field types.

Term

pkey: 05f

RJL Course

pkey: 523

Examples
:
SWEngI: (Schedule id vs. my directory id


My directory id for SWEngI
91.523 in fall 2005 associates
05f with 523:

Directory (one)

Pkey: 01f523

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
10
>

Surrogate Keys in RDB’s


The composite key for my section of OOAD in the Student
Information System (SIS) Database is a composite of Dept,
Course and Section number: 91.523.291.


For IBM’s RDB, EFCodd advocated a hidden ‘surrogate’
pkey to replace the user
-
defined composite keys. This
improves code quality and performance
(by expediting the
fundamental RDB operation ‘join’: match pkeys to fkeys).



[Ref: EFCodd: Extending the RDB Model…, ACM TODS Dec’79]


Example:

Entity with old and new key name and value:


Entity:

alternate (old pkey):

surrogate (name = value):


DEpt

deptNo = 91


DEid

= DE0000xx


COurse

courseNo = 91+523

COid

= CO000yyy


SEction

sectNo= 91+523+291

SEid

= SE000zzz

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
11
>

RDB with Surrogate pfkeys:


A GEN Example
:
CS Dept View of SIS Database ERD:


Entity:

alternate (old pkey):

surrogate (name = value):


Dept

deptNo = 91


DEid

= DE000001


Course

courseNo = 91+523

COid

= CO000220


Section

sectNo= 91+523+291

SEid

= SE002601*

Department DE

DE000001

91

Course CO

CO000220

DE000001

523

Section SE*

SE002601

CO000220

291

A Persistence Requirement

(WHY?):


Each table has a mnemonic
abbreviation (DE,CO,SE) encoded
into the pkey value of its objects.

(Note that the fkey only
references the
immediate
ancestor

or container of
an object or tuple.)

(*Sections of
ALL Courses
are in table
SE)

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
12
>

Surrogate Keys in COOL/GEN


GEN uses surrogate pkeys

and matching
fkeys, but does not hide them. (OK for
CAD/CASE tools with hi
-
tech users.)*



Pkeys can never be re
-
used for new objects,

as long as fkeys exist that can reference their
former object (in old but still
-
in
-
use
database versions).

____________

* CJDate: The RDB Model (A
-
W 2001) faults EFCodd for
recommending that users should not see surrogate pfkey values.


Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
13
>

Persistent Object Identifiers


C++ and Java runtime objects have an
object
-
id

(oid),
typically represented by its virtual memory address on the
heap. This oid corresponds at least conceptually to the
pkey of an RDB tuple. This type of oid is not visible and
not persistent, because it disappears when the program
terminates
.


Large RDBMS’s can avoid loss of information and
achieve persistence by taking over or duplicating OS
virtual memory management functions: moving large
segments of virtual memory to/from mass storage in a
fail
-
safe manner, saving relative addresses on hard disk.


Another way to achieve persistence is to convert
pkey/fkey relationships to/from object references during
database import/export. This is done by COOL/GEN.

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
14
>

Persistent Databases


Persistence

means that pkeys and fkeys are preserved
during export to mass storage or remote sites and re
-
import by the same or another DataBase Management
System (DBMS)


A relational database (RBD) supports inter
-
object
relationships by
foreign key

(fkey) fields. These are
both user
-
visible and persistent: they get saved in mass
storage if the program terminates.


The process of mapping RDB pkey
-
fkey associations to
and from C++ pointers is called ‘pointer swizzling’.

Database

in Main

Memory

Database

in Mass

Storage

import

export

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
15
>

Referential Integrity


The principle of
’Referential Integrity’:



To maintain valid database content, all fkey values
must match the unique primary key or object
identifier of another tuple, or else have the reserved
‘null’ (unknown or undefined) value.


N
-
ary relations (N
-
way associations) can be implemented
by a new
associative entity
, whose tuples contain exactly N
fkeys (plus optional non
-
key attributes).


Most relations are binary (N = 2). Note that fkeys may refer
to the same or different types.


See Example 2, Composite Pkeys (slide 9) and next slide 16

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
16
>

N
-
ary Relation (ERD Styles)


N
-
ary relations are many
-
to
-
many associations among N
object instances (of the same or different types).


N
-
way associations can be implemented by introducing a new
associative entity
, whose tuples contain exactly N fkeys (plus
optional non
-
key attributes).
(Most relations are binary: N = 2).



The diamond indicates a ternary relation among types AA, BB
and CC. [
It is superfluous if N=2, if the relation is one to many, or if
an associative entity replaces it.]

Optional

attributes

AA

BB


CC

AA

BB

CC

AABBCC

New Entity AABBCC gives these attributes a decent
home, and replaces the diamond.

(3 fkeys


inside)

Example

for N=3:

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
17
>

Extended ER Diagrams


When an RDB implements an Extended ERD (EERD), a
tuple’s fkeys or inter
-
object

cross
-
references can identify
either a super
-
class
object

or an associated parent or
container
object

(instance of a class).


These two fkey types have different semantic interpretations.


The principle of ‘Referential Integrity’ constrains fkey values to
the pkey value of some existing parent (AA or BB) instance.


To improve readability, EERD’s use different link layouts
for inheritance and for instance
-
level associations:

AA

BB

CC

0..*



In this example, CC both

inherits from AA and is a

component of the

composite entity BB.

It contains two fkeys,

(say) AAid and BBid.

1..1

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
18
>

Multiple Inheritance on EERD’s


Multiple inheritance requires an fkey to each superclass
object whose properties (attributes or methods) are
inherited.


In a prototype or delegation implementation of multiple
inheritance, superclass object[s] actually exist apart from
their corresponding subclass object[s]. Each sub
-
object has
one fkey to each of its direct ancestor object types.


For a C or C++, an object with multiple inheritance paths to
the same superclass type must be constrained to inherit data
from the same instance as well. (Path divergence is risky!)


In an ORDB, fkeys also support dynamic method
delegation. The COOL/LCP interpreter implements such a
dynamic map
(from a concrete object to its generic or prototype
Active Instance, from object class to generic Active Class).

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
19
>

ORDB via Prototype Delegation


An Extended ERD (EERD) can be implemented as either a
relational RDB, object
-
oriented OODB, or object
-
relational
ORDB. An OODB is supported by its own class
-
based data
representations.*


An ORDB can be class
-
based or prototype
-
based with
delegation. (GEN is prototype
-
based.)**


Prototype delegation does not rely on Class membership for
method inheritance
-

it creates object
-
level relationships to
support method delegation: ANY client object can ‘delegate’
any of its behavior to another server object via the oid
equivalent of an fkey.

*
CJDate insists that OODBs are fully supported by the relational
model without any extensions. (I agree :
-
) [Ibid]

** Breugg:OOSE (10.4.4) calls this a vertical mapping.

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
20
>

GEN Database: Persistence

Our GEN tool imports an external RDB to a memory
-
resident
object
-
relational database (ORDB):


Its external
persistent

RDB format is a union of records
representing tuples of different types.


During import, fkeys are augmented or replaced by parent
and first
-
child and next
-
sibling object reference pointers,
which follow strict GEN naming conventions.


During export, pointers are discarded; fkeys are preserved
or restored for persistent storage in external RDB tuples.

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
21
>

GEN Database: Schema Constraints


The external RDB schema (or EER Diagram) is first
converted to Third Normal Form.


Other attributes that would normally comprise a user
-
defined
(and typically composite) primary key can be removed
during schema or EERD conversion to Third Normal Form.


This eliminates redundant attributes that functionally depend
on some fkey instead of the pkey attribute.


The corresponding EERD network tends to have links which
are all 1 to many relations with mandatory parents and
optional children (1..1 to 0..* on UML class diagrams).


Inheritance links are 1..1 to 0..1 on class diagrams, because
at most one subclass instance can inherit from any given
ancestor class instance.

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
22
>

GEN Database: External Format


Our GEN tool imports an external RDB to a memory
-
resident object
-
relational database (ORDB):


Its external RDB format is a union of records representing
tuples of different types:


Every tuple record has an integral and immutable
‘surrogate’ primary key attribute (and object id).


Different tuple types have pairwise disjoint pkey ranges.


All foreign keys (fkeys) use this surrogate pkey value to
refer to their parent (container or superclass) record type.


Chgen and gencpp are external
-
database
-
compatible.

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
23
>

GEN Database: Internal Format


During import, fkeys are augmented or replaced by direct
parent object pointers plus first
-
child and next
-
sibling
object reference pointers. These are constructed from fkey
names following strict GEN naming conventions.


This results in an internal ORDB format which is a set of
multiply
-
threaded linked lists of parent
-
to
-
children and
super
-
to
-
subclass object (tuple instance) reference pointers.


Parent
-
pointers support direct access to parent table
attributes, replacing pair
-
wise join queries in an RDB.


For each 1
-
to
-
many parent
-
child relationship, chgen
provides a child_loop macro while gencpp provides a for
-
each iterator.


Gencpp has two optional modes: chgen or C++/STL.

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
24
>

GEN Database: Import/Export

GEN creates two schema
-
based import/export utilities:



pr_load parses tuples and imports an external RDB into a
memory
-
resident object
-
relational database (ORDB);


pr_dump exports the modified ORDB back to the persistent
external RDB.


During import, fkeys are augmented or replaced by direct
parent pointers plus first
-
child and next
-
sibling object
reference pointers. These are constructed from fkey names
following strict GEN naming conventions, Super
-

and sub
-
class objects are also connected in the same way.


This results in an internal ORDB format which is a set of
multiply
-
threaded linked lists from each parent through each
of its child
-
sets, that supports parent
-
child JOINs.

[Ref: Jiri Soukup: Taming C++, A
-
W 1994; see also $PH/JiriSoukup]

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
25
>

Importing RDB’s to C++/Java


If the RDB is imported to an object
-
relational database
implemented in C++ or Java, then during import the fkey
fields of RDB tuple types should be converted to
corresponding C++/Java object reference types.


Caveat
/
pre
-
condition:

All fkeys implied by links on the
RDB’s data model or EERD must conform to inheritance
and type constraints of the language (C++ or Java).


Fkeys in an RDB can also support non
-
exhaustive or over
-
lapping subclasses (going beyond C++ constraints).


Fkeys and object references can also support dynamic
migration (of an object among the subclasses of its class).


Example: An object may make transitions among OLC
states (states become subclasses of the object’s class).

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
26
>

Object
-
Relational Databases
-


Prototypes and Delegation

The last few slides were inspired by Shlaer
-
Mellor
-
User
Group email related to Divergent Inheritance (parallel
hierarchies). This motivates the use of
prototypes

and
delegation

to explain the static information architecture that is
supported by COOL’s chGEN/GENcpp code generator, and
illustrates concurrent sub
-
state machine models for dynamic
behavior.



To: shlaer
-
mellor
-
users@projtech.com


Subject: Re: (SMU) Polymorphic events and other
paranormal activity


Message 10/734 From lschneid@eng.delcoelect.com


Sep 04, 01 08:45:33 AM


responding to Fontana: . . .


Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
27
>

Divergent Hierarchies



responding to Fontana:


> I think Jay was driving at divergent hierarchies, not multiple
inheritance, eg:


> relationship S1
-

supertype Dog, subtypes BigDog and SmallDog


> relationship S2
-

supertype Dog, subtypes BlackDog and WhiteDog


Relationship S1:

(BLACK xor WHITE)

(Mutex and exhaustive):

DOG CLASS

Relationship S2:

(BIG x
or

SMALL)


(Mutex and exhaustive):

Black Dog

White Dog

Big Dog

Small Dog

Divergent Hierarchies Example:


> relationship S1
-

supertype Dog, subtypes BigDog xor SmallDog

> relationship S2
-

supertype Dog, subtypes BlackDog xor WhiteDog

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
28
>

OLC’s with Concurrent Sub
-
states


> Assume each of the 4 subclasses has its own ‘object lifecycle’ (OLC):


> BigDog: Woofing <
--
> Sleeping


> SmallDog: Yipping <
--
> Skittering


> BlackDog: Panting <
--
> Drooling


> WhiteDog: Shedding <
--
> Scratching


> Now create one instance of Dog
-

let's say it is a big black dog, with a


> dogId = 13. It must be in one of the BigDog states (Woofing or Sleep
-
> ing),


> AND in one of the BlackDog states (Panting or Drooling).

Dog #13 (Big and Black):

Woofing

Sleeping

Panting

Drooling

Big:

Black:

(Illustrates direct product composition of concurrent STDs)


Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
29
>

Merging OLC Behaviors of

Concurrent Subclasses:


Each of the 4 subclasses has its own ‘object lifecycle’ (OLC);


E.g. every Big&Black Dog must be in
one

of the BigDog states (Woofing
or Sleeping), AND in
one

of the BlackDog states (Panting or Drooling).


Dog #13 (Big and Black) has the behavior/activity of
both

BigDogs and
BlackDogs:

Woofing

Sleeping

Panting

Drooling

Big Dog OLC:

Black Dog OLC:

Woof&

Pant

Woof&

Drool

Sleep

&Pant

Sleep&

Drool

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
30
>

Divergent Hierarchies
-

revisited (1)


C++ does not support divergent class hierarchies.


One alternate is prototype objects with delegation.


RDB’s
can

support prototypes and delegation:


In our example, each dog object belongs to one subclass for color,
and

simultaneously to another subclass for size.


That is, a ‘real’ dog object simultaneously belongs to, and inherits
from, exactly one of the subclasses in each inheritance tree above.


The next slide shows (by its messiness) that multiple inheritance is
best avoided.

Partition S1:

(BLACK xor WHITE)

(Mutex and exhaustive):

DOG CLASS

Partition S2:

(BIG x
or

SMALL)


(Mutex and exhaustive):

Black Dog

White Dog

Big Dog

Small Dog

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
31
>

Divergent Hierarchies
-

revisited (2)


Level 3 models concrete ‘leaf’ objects or ‘real’ dogs, which
simultaneously belong to a distinct pair of subclasses at level 2 of
the inheritance tree (compositional inheritance of properties).


So there are really 4 leaf classes at level 3, below level 2 above.


Each leaf class instance at level 3 has exactly two paths up to level
1; both paths must end up at the
same

root object (Dog instance).

Partition S1:

(BLACK xor WHITE)

(Mutex and exhaustive):

DOG CLASS

Partition S2:

(BIG x
or

SMALL)


(Mutex and exhaustive):

Black Dog

White Dog

Big Dog

Small Dog

Big Black Dog

Big White Dog

Small Black Dog

Small White Dog

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
32
>


Composition or Implementation
Inheritance


With compositional inheritance, dogs will inherit from two
‘component’ classes: Color and Size.


This is ‘impure’ multiple inheritance in C++ ( impure
because the two ancestor classes have nothing in common
with animals, which may not behave well as clients of
Color or Size ancestor methods).


Java does not have multiple inheritance
-

but any class may
‘implement’ the interfaces Color’ and ‘Size’ instead.


Dogs must then be eligible to inherit (C++) or implement
(Java) all the methods of the Color and Size classes
-

an
undesirable compromise. Over
-
riding only hides the mis
-
match between class Dog and Color or Size classes.

Obj
-
RelDBv05f.ppt
-

RJL 050907
-

<
33
>

References


Frank & Ulrich: ”Delegation: An Important Concept for the
Appropriate Design of Object Models”, JOOP June 2000 (p.13
-
17,44)


Eliens:
Principles of OO Software Dev.

2ed., AWL 2000 (Sect. 5.4:
Prototypes
-

delegation vs. inheritance)


Kilov/Ross:
Information Models
, PH 1994 (Not about delegation, but
covers multiple/concurrent/overlapping subclass membership.)


Lee &Tepfenhart:
UML and C++: A Practical Guide to OO Dev,

2ed,
PH 2001(pp206
-
210) (Multiple Inheritance examples Fig. 12
-
4,12
-
5)


Sanders:
Data Modeling,

Boyd
-
Fraser/ITP 1995 (Ch. 3 and Appendix)


Jiri Soukup: Taming C++, A
-
W 1994; see also $PH/JiriSoukup


CJDate: The RDB Model, A
-
W 2001


CJDate: Database Systems, 7ed! A
-
W 2000


EFCodd: Extending the RDB Model…, ACM TODS Dec’79


Kroenke: Database Concepts, 2nd Ed, P
-
H 2005 Chapter 2


Breugge & Dutoit: OO Software Engineering, A
-
W 2004.