Object-Oriented Database

18 Nov 2013

CIS-552

Introduction

Object-Oriented Database

New Database Applications

Object-Oriented Data Models

Object-Oriented Languages

Persistent Programming Languages

Persistent C++ Systems


New Database Applications


Data models designed for data-processing-style applications are not adequate for new technologies such as computer-aided design, computer-aided software engineering, multimedia and image databases, and document/hypertext databases.

These new applications require the database system to handle features such as:

  Complex data types

  Data encapsulation and abstract data structures

  Novel methods for indexing and querying


Object-Oriented Data Model

Loosely speaking, an object corresponds to an entity in the E-R model.

The object-oriented paradigm is based on encapsulating code and data related to an object into a single unit.

The object-oriented data model is a logical model (like the E-R model).

It is an adaptation of the object-oriented programming paradigm (e.g., Smalltalk, C++) to database systems.


Object Identity


An object retains its identity even if some or all of the values of its variables or the definitions of its methods change over time.

Object identity is a stronger notion of identity than in programming languages or data models not based on object orientation.

  Value: data value; used in relational systems.

  Name: supplied by user; used for variables in procedures.

  Built-in: identity built into the data model or programming language.

    No user-supplied identifier is required.

    This is the form of identity used in object-oriented systems.


Object Identifiers

Object identifiers are used to uniquely identify objects.

  Can be stored as a field of an object, to refer to another object.

  E.g., the spouse field of a person object may be an identifier of another person object.

  Can be system generated (created by the database) or external (such as a social-security number).
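
The difference between value-based and built-in identity can be sketched in plain C++. This is only an illustration under stated assumptions: `Oid`, `ObjectStore`, and `same_value` are hypothetical names, not part of any real OO-DBMS API, and the "database" is an in-memory map that hands out system-generated identifiers.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical system-generated object identifier (an assumption for this sketch).
using Oid = std::uint64_t;
constexpr Oid kNoObject = 0;

struct Person {
    std::string name;
    Oid spouse = kNoObject;  // identifier of another Person object
};

// Toy in-memory "database" that assigns an identifier when an object is created.
class ObjectStore {
public:
    Oid create(const Person& p) {
        Oid id = next_++;
        objects_[id] = p;
        return id;
    }
    Person& get(Oid id) { return objects_.at(id); }

private:
    Oid next_ = 1;
    std::unordered_map<Oid, Person> objects_;
};

// Value-based comparison: looks only at contents, as a relational system would.
bool same_value(const Person& a, const Person& b) {
    return a.name == b.name && a.spouse == b.spouse;
}
```

Two `Person` objects created with the same contents compare equal by value yet receive different identifiers, and changing a name through `get` leaves the identifier (and any `spouse` field that stores it) valid.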


Object Containment


Each component in a design may contain other components.

  Can be modeled as containment of objects. Objects containing other objects are called complex or composite objects.

Multiple levels of containment create a containment hierarchy: links are interpreted as is-part-of, not is-a.

Allows data to be viewed at different granularities by different users.

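[Figure: containment hierarchy for a bicycle design. bicycle contains wheel, brake, gear, and frame; wheel contains rim, spokes, and tire; brake contains lever, pad, and cable.]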
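
A containment hierarchy of this kind can be sketched as a recursive C++ type. The `Component` struct and the helper functions below are hypothetical, and the grouping of parts under wheel and brake is an assumption read off the slide's figure.

```cpp
#include <string>
#include <vector>

// A composite object: each component may contain other components (is-part-of).
struct Component {
    std::string name;
    std::vector<Component> parts;
};

// Collect component names down to a chosen depth, so that different users can
// view the same design at different granularities.
void names_to_depth(const Component& c, int depth, std::vector<std::string>& out) {
    out.push_back(c.name);
    if (depth == 0) return;
    for (const Component& part : c.parts) names_to_depth(part, depth - 1, out);
}

// Build the bicycle example (grouping of sub-parts is assumed).
Component make_bicycle() {
    return Component{"bicycle", {
        Component{"wheel", {Component{"rim", {}}, Component{"spokes", {}}, Component{"tire", {}}}},
        Component{"brake", {Component{"lever", {}}, Component{"pad", {}}, Component{"cable", {}}}},
        Component{"gear", {}},
        Component{"frame", {}},
    }};
}
```

Calling `names_to_depth` with depth 1 yields the coarse view (the bicycle and its four direct parts); depth 2 adds the parts of the wheel and the brake.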


Object-Oriented Languages

Object-oriented concepts can be used as a design tool, and be encoded into, for example, a relational database (analogous to modeling data with an E-R diagram and then converting to a set of relations).

The concepts of object orientation can be incorporated into a programming language that is used to manipulate the database.

  Object-relational systems: add complex types and object orientation to relational languages.

  Persistent programming languages: extend an object-oriented programming language to deal with databases by adding concepts such as persistence and collections.


OO-DBMS

Save objects created by an OOP language to disk (make objects persistent).

Ensure that if an object is saved, all of the objects it references are saved.

Allow saved objects (and the objects they reference) to be retrieved from disk.

Provide transaction management and concurrency control to maintain data integrity.


Persistent Programming Language


Persistent programming languages:

  Allow objects to be created and stored in a database without any explicit format changes (format changes are carried out transparently).

  Allow objects to be manipulated in memory; there is no need to explicitly load from or store to the database.

  Allow data to be manipulated directly from the programming language without having to go through a data manipulation language like SQL.

Drawbacks:

  Due to the power of most programming languages, it is easy to make programming errors that damage the database.

  Complexity of languages makes automatic high-level optimization more difficult.

  Do not support declarative querying very well.


Persistence of Objects


Approaches to make transient objects persistent include establishing persistence by:

  Class: declare all objects of a class to be persistent; simple but inflexible.

  Creation: extend the syntax for creating transient objects to create persistent objects.

  Marking: an object that is to persist beyond program execution is marked as persistent before program termination.

  Reference: declare (root) persistent objects; objects are persistent if they are referred to (directly or indirectly) from a root object.
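
Persistence by reference amounts to a reachability computation over the object graph. The following is a minimal sketch, assuming a hypothetical `Object` node type with raw-pointer references; a real system would traverse its own object representation.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical object-graph node; `refs` are references to other objects.
struct Object {
    int id;
    std::vector<const Object*> refs;
};

// An object persists if it can be reached, directly or indirectly,
// from a declared root object. Visited objects are recorded in `seen`,
// which also guards against cycles.
void collect_reachable(const Object* obj, std::unordered_set<const Object*>& seen) {
    if (obj == nullptr || !seen.insert(obj).second) return;  // null or already visited
    for (const Object* next : obj->refs) collect_reachable(next, seen);
}

// Compute the set of objects that must be made persistent for the given roots.
std::unordered_set<const Object*> persistent_set(const std::vector<const Object*>& roots) {
    std::unordered_set<const Object*> seen;
    for (const Object* root : roots) collect_reachable(root, seen);
    return seen;
}
```

An object that merely points at a root, without being referenced from one, is not in the resulting set and would stay transient.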


Object Identity and Pointers


A persistent object is assigned a persistent object identifier.

Degrees of permanence of identity:

  Intraprocedure: identity persists only during the execution of a single procedure.

  Intraprogram: identity persists only during execution of a single program or query.

  Interprogram: identity persists from one program execution to another.

  Persistent: identity persists through program executions and structural reorganizations of data; required for object-oriented systems.


Object Identity and Pointers (Cont.)


In O-O languages such as C++, an object identifier is actually an in-memory pointer.

Persistent pointer: persists beyond program execution; can be thought of as a pointer into the database.


Storage and Access of Persistent Objects

How to find objects in the database:

  Name objects (as you would name files): cannot scale to a large number of objects.

    Names are typically given only to class extents and other collections of objects, but not to individual objects.

  Expose object identifiers or persistent pointers to the objects: these can be stored externally.

    All objects have object identifiers.


Storage and Access of Persistent Objects (Cont.)

How to find objects in the database (Cont.):

  Store collections of objects and allow programs to iterate over the collections to find required objects.

    Model collections of objects as collection types.

    Class extent: the collection of all objects belonging to the class; usually maintained for all classes that can have persistent objects.
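
A class extent can be approximated in ordinary C++ by having every object register itself in a per-class collection. The sketch below is an in-memory analogy only: the `Customer` class and its `extent()` accessor are hypothetical, and a real OO-DBMS would maintain the extent in the database rather than in a static vector.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

class Customer {
public:
    // Every new object joins the class extent.
    explicit Customer(std::string name) : name_(std::move(name)) {
        extent().push_back(this);
    }
    // A destroyed object leaves the extent.
    ~Customer() {
        std::vector<Customer*>& all = extent();
        all.erase(std::remove(all.begin(), all.end(), this), all.end());
    }
    // Copying is disabled so that each extent entry refers to a distinct object.
    Customer(const Customer&) = delete;
    Customer& operator=(const Customer&) = delete;

    const std::string& name() const { return name_; }

    // The class extent: the collection of all live Customer objects.
    static std::vector<Customer*>& extent() {
        static std::vector<Customer*> all;
        return all;
    }

private:
    std::string name_;
};
```

A program can then find customers by iterating over `Customer::extent()` instead of looking each one up by name.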



Persistent C++ System


The C++ language allows support for persistence to be added without changing the language:

  Declare a class called Persistent_Object with attributes and methods to support persistence.

  Overloading: the ability to redefine standard function names and operators (e.g., +, -, and the pointer dereference operator ->) when applied to new types.

Providing persistence without extending the C++ language is

  relatively easy to implement

  but more difficult to use.
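
How overloading lets persistence be added as a library can be sketched as follows. Everything here is an assumption made for illustration: `ToyDb` stands in for the database, `AccountRef` is a non-template stand-in for a persistent pointer, and the members of `Persistent_Object` are invented placeholders.

```cpp
#include <map>
#include <memory>

// Hypothetical base class; a real system would add an OID, I/O hooks, etc.
class Persistent_Object {
public:
    virtual ~Persistent_Object() = default;
    bool dirty = false;
};

struct Account : Persistent_Object {
    int balance = 0;
};

// Toy stand-in for the database: maps an integer id to a stored balance.
struct ToyDb {
    std::map<int, int> balances;
    int loads = 0;  // counts how many times an object was fetched
};

// Persistent-pointer sketch: overloads -> and * so that use looks like a raw
// pointer, while the object is fetched from the store on first access.
class AccountRef {
public:
    AccountRef(ToyDb& db, int id) : db_(&db), id_(id) {}
    Account* operator->() { return load(); }
    Account& operator*() { return *load(); }

private:
    Account* load() {
        if (!cached_) {  // fetch only once, then reuse the in-memory copy
            cached_ = std::make_unique<Account>();
            cached_->balance = db_->balances.at(id_);
            ++db_->loads;
        }
        return cached_.get();
    }
    ToyDb* db_;
    int id_;
    std::unique_ptr<Account> cached_;
};
```

Client code writes `ref->balance` as if `ref` were an ordinary pointer; the load from the store happens inside the overloaded operator.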


ODMG C++ Object Definition Language


Standardized language extensions to C++ to support persistence.

The ODMG standard attempts to extend C++ as little as possible, providing most functionality via template classes and class libraries.

  Template class Ref<class> is used to specify references (persistent pointers).

  Template class Set<class> is used to define sets of objects. It provides methods such as insert_element and delete_element.

The C++ object definition language (ODL) extends the C++ type definition syntax in minor ways.

  Example: use the notation inverse to specify referential integrity constraints.


ODMG C++ ODL: Example

class Person : public Persistent_Object {
public:
    String name;
    String address;
};

class Customer : public Person {
public:
    Date member_from;
    int customer_id;
    Ref<Branch> home_branch;                              // persistent pointer to a Branch
    Set<Ref<Account>> accounts inverse Account::owners;   // kept consistent with Account::owners
};


ODMG C++: Example (Cont.)

class Account : public Persistent_Object {
private:
    int balance;
public:
    int number;
    Set<Ref<Customer>> owners inverse Customer::accounts;   // kept consistent with Customer::accounts
    int find_balance();
    int update_balance(int delta);
};
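
The referential-integrity constraint that an inverse declaration expresses can be imitated in standard C++, which has no inverse keyword. The sketch below is an assumption-laden illustration: it uses raw pointers and `std::set` instead of the ODMG `Ref` and `Set` templates, and the helpers `link`, `unlink`, and `consistent` are hypothetical.

```cpp
#include <set>

struct Account;

struct Customer {
    std::set<Account*> accounts;  // intended inverse of Account::owners
};

struct Account {
    std::set<Customer*> owners;   // intended inverse of Customer::accounts
};

// Update both sides together so the two sets never disagree.
void link(Customer& c, Account& a) {
    c.accounts.insert(&a);
    a.owners.insert(&c);
}

void unlink(Customer& c, Account& a) {
    c.accounts.erase(&a);
    a.owners.erase(&c);
}

// Integrity check: every account a customer lists must list that customer back.
bool consistent(Customer& c) {
    for (Account* a : c.accounts) {
        if (a->owners.count(&c) == 0) return false;
    }
    return true;
}
```

Inserting into only one side, as in `c.accounts.insert(&b)`, leaves the pair inconsistent; routing all updates through `link` and `unlink` avoids that.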


ODMG C++ Object Manipulation Language


Uses persistent versions of C++ operators such as new(db).

    Ref<Account> account = new(bank_db) Account;

  new allocates the object in the specified database, rather than in memory.

Dereference operator ->: when applied to a Ref<Customer> object, loads the referenced object into memory (if not already present) and returns an in-memory pointer to the object.

Constructor for a class: a special method to initialize objects when they are created; called automatically when new is executed.

Destructor for a class: a special method that is called when objects in the class are deleted.


ODMG C++ OML: Example

int create_account_owner(String name, String address) {
    Database * bank_db;
    bank_db = Database::open("Bank-DB");
    Transaction Trans;
    Trans.begin();

    // Allocate both objects in the database, not in transient memory.
    Ref<Account> account = new(bank_db) Account;
    Ref<Customer> cust = new(bank_db) Customer;
    cust->name = name;
    cust->address = address;
    cust->accounts.insert_element(account);
    account->owners.insert_element(cust);
    // … Code to initialize customer_id, account number, etc.
    Trans.commit();
}


ODMG C++ OML: Example of Iterators

int print_customers() {
    Database * bank_db;
    bank_db = Database::open("Bank-DB");
    Transaction Trans;
    Trans.begin();
    // Iterate over the class extent of Customer.
    Iterator<Ref<Customer>> iter = Customer::all_customer.create_iterator();
    Ref<Customer> p;
    while (iter.next(p)) {
        print_cust(p);
    }
    Trans.commit();
}

The Iterator construct helps step through objects in a collection.


Mapping of Objects to Files


Mapping objects to files is similar to mapping tuples to files in a relational system; object data can be stored using file structures.

Objects in O-O databases may lack uniformity and may be very large; such objects have to be managed differently from records in a relational system.

  Set fields with a small number of elements may be implemented using data structures such as linked lists.

  Set fields with a larger number of elements may be implemented as B-trees, or as separate relations in the database.

  Set fields can also be eliminated at the storage level by normalization.


Mapping of Objects to Files (Cont.)


Objects are identified by an object identifier (OID); the storage system needs a mechanism to locate an object given its OID.

  Logical identifiers do not directly specify an object's physical location; the system must maintain an index that maps an OID to the object's actual location.

  Physical identifiers encode the location of the object so the object can be found directly. Physical OIDs typically have the following parts:

    1. a volume or file identifier

    2. a page identifier within the volume or file

    3. an offset within the page
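
The two kinds of identifier can be contrasted in a short sketch. The field widths (16-bit volume, 32-bit page, 16-bit offset) and the names `PhysicalOid`, `LogicalOid`, and `OidIndex` are assumptions chosen for illustration, not a layout prescribed by any particular system.

```cpp
#include <cstdint>
#include <unordered_map>

// Physical OID: encodes the object's location directly.
struct PhysicalOid {
    std::uint16_t volume;  // volume or file identifier
    std::uint32_t page;    // page within the volume or file
    std::uint16_t offset;  // byte offset within the page
};

// Pack the three parts into one 64-bit value (field widths are assumptions).
std::uint64_t pack(const PhysicalOid& oid) {
    return (static_cast<std::uint64_t>(oid.volume) << 48) |
           (static_cast<std::uint64_t>(oid.page) << 16) |
           static_cast<std::uint64_t>(oid.offset);
}

PhysicalOid unpack(std::uint64_t bits) {
    return PhysicalOid{static_cast<std::uint16_t>(bits >> 48),
                       static_cast<std::uint32_t>((bits >> 16) & 0xFFFFFFFFu),
                       static_cast<std::uint16_t>(bits & 0xFFFFu)};
}

// Logical OID: an opaque number that must be looked up in an index.
using LogicalOid = std::uint64_t;

class OidIndex {
public:
    // Record (or update) where an object currently lives.
    void place(LogicalOid id, const PhysicalOid& where) { index_[id] = where; }

    // Look up an object's location; returns false if the OID is unknown.
    bool locate(LogicalOid id, PhysicalOid& out) const {
        auto it = index_.find(id);
        if (it == index_.end()) return false;
        out = it->second;
        return true;
    }

private:
    std::unordered_map<LogicalOid, PhysicalOid> index_;
};
```

The trade-off shows up when an object moves: with a logical OID only the index entry changes (a second `place` call), whereas every stored copy of a physical OID would point at the old location.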


Management of Persistent Pointers


Physical OIDs may have a unique identifier. This identifier is also stored in the object and is used to detect references via dangling pointers.

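[Figure: (a) General structure: a physical object identifier consists of Vol., Page, Offset, and Unique-Id; the object stores its Unique-Id followed by its data. (b) Example of use: the object at location 6.32.45608 stores unique-id 51 and its data; the OID (6.32.45608, 51) is a good OID, while the OID (6.32.45608, 50) is a bad OID.]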
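
The dangling-pointer check can be sketched as a comparison between the unique identifier carried in the OID and the one stored with the object at that location. The types and the `dereference` function below are hypothetical; storage is modeled as a simple in-memory map keyed by location.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

struct Location {
    int volume;
    int page;
    int offset;
    bool operator<(const Location& o) const {
        return std::tie(volume, page, offset) < std::tie(o.volume, o.page, o.offset);
    }
};

struct PhysicalOid {
    Location loc;
    std::uint32_t unique_id;  // must match the id stored in the object
};

struct StoredObject {
    std::uint32_t unique_id;
    std::string data;
};

// Toy storage: whatever object currently lives at each location.
using Storage = std::map<Location, StoredObject>;

// Returns the object only if its stored unique-id matches the OID's;
// a mismatch means the location has been reused and the OID is dangling.
const StoredObject* dereference(const Storage& s, const PhysicalOid& oid) {
    auto it = s.find(oid.loc);
    if (it == s.end()) return nullptr;                           // nothing stored there
    if (it->second.unique_id != oid.unique_id) return nullptr;   // dangling reference
    return &it->second;
}
```

With the values from the figure, an OID for location 6.32.45608 with unique-id 51 resolves to the object, while one with unique-id 50 is rejected.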


Management of Persistent Pointers (Cont.)


Implement persistent pointers using OIDs; persistent pointers are substantially longer than in-memory pointers.

Pointer swizzling cuts down on the cost of locating persistent objects already in memory.

Software swizzling (swizzling on pointer dereference):

  When a persistent pointer is first dereferenced, it is swizzled (replaced by an in-memory pointer) after the object is located in memory.

  Subsequent dereferences of the same pointer become cheap.

  The physical location of an object in memory must not change if swizzled pointers point to it; the solution is to pin pages in memory.

  When an object is written back to disk, any swizzled pointers it contains need to be unswizzled.
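
Swizzling on dereference can be sketched with a pointer wrapper that remembers the in-memory address after the first lookup. `BufferPool`, `SwizzlingPtr`, and `Obj` are hypothetical names, and the lookup counter merely stands in for the cost of locating an object by OID.

```cpp
#include <cstdint>
#include <unordered_map>

struct Obj {
    int value;
};

// Toy buffer pool: objects already resident (and assumed pinned) in memory.
struct BufferPool {
    std::unordered_map<std::uint64_t, Obj> resident;
    int lookups = 0;

    Obj* fetch(std::uint64_t oid) {
        ++lookups;  // stands in for the costly OID-to-address lookup
        return &resident.at(oid);
    }
};

// A pointer field that holds a persistent OID and, once swizzled,
// a direct in-memory pointer as well.
class SwizzlingPtr {
public:
    explicit SwizzlingPtr(std::uint64_t oid) : oid_(oid) {}

    Obj* deref(BufferPool& pool) {
        if (ptr_ == nullptr) ptr_ = pool.fetch(oid_);  // swizzle on first dereference
        return ptr_;                                   // later dereferences are cheap
    }

    bool swizzled() const { return ptr_ != nullptr; }

    // Before the containing object is written back to disk, drop the
    // in-memory address so that only the OID is stored.
    void unswizzle() { ptr_ = nullptr; }

    std::uint64_t oid() const { return oid_; }

private:
    std::uint64_t oid_;
    Obj* ptr_ = nullptr;
};
```

The sketch relies on the resident object staying at the same address while the pointer is swizzled, which is the role that pinning pages plays in a real buffer manager.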


Hardware Swizzling


Persistent pointers in objects need the same amount of space as in-memory pointers; extra storage external to the object is used to store the rest of the pointer information.

Uses the virtual memory translation mechanism to efficiently and transparently convert between persistent pointers and in-memory pointers.

All persistent pointers in a page are swizzled when the page is first read in.

  Thus programmers have to work with just one type of pointer, i.e., the in-memory pointer.

  Some of the swizzled pointers may point to virtual memory addresses that are currently not allocated any real memory.


Hardware Swizzling


A persistent pointer is conceptually split into two parts: a page identifier and an offset within the page.

  The page identifier in a pointer is a short indirect pointer: each page has a translation table that provides a mapping from the short page identifiers to full database page identifiers.

  The translation table for a page is small (at most 1024 pointers in a 4096-byte page with 4-byte pointers).

  Multiple pointers in a page to the same page share the same entry in the translation table.


Hardware Swizzling (Cont.)


Page image when on disk (before swizzling)

[Figure: a page containing Object 1, Object 2, and Object 3, whose pointers hold the (Page ID, Off.) pairs (2395, 255), (4867, 020), and (4867, 170).]

Translation Table

PageID    FullPageID
2395      679.34.28000
4867      519.56.84000



Hardware Swizzling (Cont.)

When an in-memory pointer is dereferenced, if the operating system detects that the page it points to has not yet been allocated storage, a segmentation violation occurs.

A handler function is registered to be called on a segmentation violation (on POSIX systems this is done with sigaction for SIGSEGV, while mmap is used to reserve and map the page's address range).

The function allocates storage for the page and reads in the page from disk.

Swizzling is then done for all persistent pointers in the page (located using object type information).

  If a pointer points to a page not already allocated a virtual memory address, a virtual memory address is allocated (preferably the address in the short page identifier if it is unused). Storage is not yet allocated for the page.

  The page identifier in the pointer (and the translation table entry) is changed to the virtual memory address of the page.


Hardware Swizzling (Cont.)

Page image after swizzling

[Figure: the same page after swizzling; the pointers in Object 1, Object 2, and Object 3 now hold the (Page ID, Off.) pairs (5001, 255), (4867, 020), and (4867, 170).]

Translation Table

PageID    FullPageID
5001      679.34.28000
4867      519.56.84000

The page with short page identifier 2395 was allocated address 5001. Observe the change in the pointers and the translation table.

The page with short page identifier 4867 has been allocated address 4867. No change in the pointers or the translation table.
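
The rewrite of page identifiers during swizzling can be simulated without any real virtual-memory machinery. In the sketch below, `AddressSpace` is a hypothetical allocator that prefers the address equal to the short page identifier and otherwise hands out the next free one; the starting value 5001 and the premise that address 2395 is already taken are assumptions chosen to reproduce the slide's example.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

struct ShortPtr {
    std::uint32_t page_id;  // short page identifier (later: virtual address)
    std::uint32_t offset;   // offset within the page
};

struct Page {
    std::vector<ShortPtr> pointers;                    // pointers held by objects in the page
    std::map<std::uint32_t, std::string> translation;  // short page id -> full page id
};

// Toy virtual-address allocator (addresses are plain numbers here).
struct AddressSpace {
    std::set<std::uint32_t> used;                     // addresses already assigned
    std::map<std::string, std::uint32_t> by_full_id;  // full page id -> address
    std::uint32_t next_free = 5001;                   // assumed fallback start

    std::uint32_t address_for(const std::string& full_id, std::uint32_t preferred) {
        auto it = by_full_id.find(full_id);
        if (it != by_full_id.end()) return it->second;  // page already has an address
        std::uint32_t addr = preferred;                 // prefer the short id itself
        if (used.count(addr) != 0) {
            while (used.count(next_free) != 0) ++next_free;
            addr = next_free;
        }
        used.insert(addr);
        by_full_id[full_id] = addr;
        return addr;
    }
};

// Rewrite every pointer and translation-table entry to the assigned address.
void swizzle(Page& page, AddressSpace& vm) {
    std::map<std::uint32_t, std::uint32_t> remap;  // old short id -> address
    std::map<std::uint32_t, std::string> new_table;
    for (const auto& entry : page.translation) {
        std::uint32_t addr = vm.address_for(entry.second, entry.first);
        remap[entry.first] = addr;
        new_table[addr] = entry.second;
    }
    for (ShortPtr& p : page.pointers) p.page_id = remap.at(p.page_id);
    page.translation = new_table;
}
```

When every page receives the address equal to its short identifier, `swizzle` leaves the page byte-for-byte unchanged, which is the case in which the page can be written back without deswizzling.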



Hardware Swizzling (Cont.)

After swizzling, all short page identifiers point to the virtual memory addresses allocated for the pages.

  Functions accessing the objects need not know that they contain persistent pointers!

  Can reuse existing code and libraries that use in-memory pointers.

If all pages are allocated the same address as in the short page identifier, no changes are required in the page!

  No need for deswizzling: the page after swizzling can be saved back directly to disk.

A process should not access more pages than the size of virtual memory; reuse of virtual memory addresses for other pages is expensive.


Disk versus Memory Structure of Objects


The format in which objects are stored in memory may be different from the format in which they are stored on disk in the database. Reasons are:

  Software swizzling: the structures of persistent and in-memory pointers are different.

  The database may be accessible from different machines, with different data representations.

Make the physical representation of objects in the database independent of the machine and the compiler.

Can transparently convert from the disk representation to the form required on the specific machine, language, and compiler, when the object (or page) is brought into memory.


Large Objects


Very large objects are called binary large objects (blobs) because they typically contain binary data. Examples include:

  text documents

  graphical data such as images and computer-aided designs

  audio and video data

Large objects may need to be stored in a contiguous sequence of bytes when brought into memory.

  If an object is bigger than a page, contiguous pages of the buffer pool must be allocated to store it.

  It may be preferable to disallow direct access to data, and only allow access through a file-system-like API, to remove the need for contiguous storage.
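
A file-system-like interface removes the contiguity requirement because callers only ever ask for byte ranges. The `Blob` class below is a hypothetical sketch: pages are modeled as separately allocated strings of a fixed size, and `read` copies a requested range across page boundaries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy large object stored as fixed-size pages that need not be contiguous.
class Blob {
public:
    explicit Blob(std::size_t page_size) : page_size_(page_size) {
        assert(page_size > 0);
    }

    // Append bytes, starting a new page whenever the last one is full.
    void append(const std::string& bytes) {
        for (char ch : bytes) {
            if (pages_.empty() || pages_.back().size() == page_size_) pages_.emplace_back();
            pages_.back().push_back(ch);
        }
        size_ += bytes.size();
    }

    // File-system-like read: copy up to `count` bytes starting at `offset`.
    std::string read(std::size_t offset, std::size_t count) const {
        std::string out;
        std::size_t end = std::min(size_, offset + count);
        for (std::size_t pos = offset; pos < end; ++pos) {
            out.push_back(pages_[pos / page_size_][pos % page_size_]);
        }
        return out;
    }

    std::size_t size() const { return size_; }
    std::size_t page_count() const { return pages_.size(); }

private:
    std::size_t page_size_;
    std::size_t size_ = 0;
    std::vector<std::string> pages_;
};
```

Because clients never receive a pointer into the object's storage, the pages can live anywhere in the buffer pool and be fetched one at a time.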


Modifying Large Objects


Use B-tree structures to represent the object: this permits reading the entire object as well as updating, inserting, and deleting bytes from specified regions of the object.

Special-purpose application programs outside the database are used to manipulate large objects:

  Text data is treated as a byte string manipulated by editors and formatters.

  Graphical data is represented as a bit map or as a set of geometric objects; it can be managed within the database system or by special software (e.g., VLSI design).

  Audio/video data is typically created and displayed by separate application software and modified using special-purpose editing software.

  The checkout/checkin method is used for concurrency and version control.