Logical Database Design

radiographerfictionΔιαχείριση Δεδομένων

31 Οκτ 2013 (πριν από 4 χρόνια και 8 μέρες)

89 εμφανίσεις

Database Systems

Logical Database Design


Fusheng Wang


Department of Biomedical Informatics

Department of Mathematics and Computer Science

Emory University

Database Systems

Objectives of Database Management


Data availability

make an integrated collection of
data available to a wide variety of users


Data integrity

insure correctness and validity


Privacy (the goal) and security (the means)


Management control

DBA: lifecycle control,
training, maintenance


Data independence (a relative term)
--

avoids
reprogramming of applications, allows easier
conversion and reorganization


physical data independence

program unaffected by
changes in the storage structure or access methods


logical data independence

program unaffected by changes
in the schema

Overview

Database Systems

Database Life Cycle


Requirements analysis


data required for processing


natural data relationships


software platform (OS, DBMS)


Logical database design


conceptual modeling


view integration


transformation of conceptual data model to tables


normalization of tables


Physical database design


selection of indexes (access methods)


partitioning and clustering


Database implementation,
monitoring,&modification

Overview

Database Systems

Database

Life Cycle

Overview

Database Systems

Requirements Analysis


Purpose: identify the real
-
world situation in enough
detail to be able to define database components.
Collect two types of data


natural data: input to the database


processing data: output from the database


Data and process dictionary entries


entity description


attribute description


relationship description


Process (application) description


Interviews


interviews at different levels: management,
employee, end user


Requirements Analysis

Database Systems

Requirement Analysis
(cont’d)

Requirements Analysis


Example requirement analysis document:


Pathology Image Database:

http://confluence.cci.emory.edu:8090/display/PIDB/Requirements


Database Systems

Conceptual Data Modeling



The data requirements are analyzed and modeled
by using an ER or UML diagram


Logical Design

Database Systems

View Integration


Multiple views of the model are consolidated into a
single model

Logical Design

Database Systems

Transformation of the Conceptual Data
Model to SQL Tables


Based on data modeling constructs and mapping
rules, each relationship and its associated entities
are transformed into a set of DBMS
-
specific
candidate


relational


tables

Logical Design

Database Systems

Normalization of SQL Tables


Decomposition of tables and removal of update
anomalies


Logical Design

Database Systems

Physical Design


The purpose of physical design is to then optimize
performance


Indexing


Clustering


Partitioning


Materialized views


Denormalization


Physical Design

Database Systems

Database Implementation, Monitoring,
and Modification


Database created with the data definition language
(DDL)


The data manipulation language (DML) can be used
to query and update the database, as well as to set
up indexes and establish constraints


Monitoring indicates whether performance
requirements are being met; If not, modifications
should be made

Implementation

Database Systems

Conceptual Data Modeling


Driving component of logical database design


The goal is to capture real
-
world data requirements
in a simple and meaningful way


Schema diagrams were formalized in the 1960s by
Charles Bachman


The
entity
-
relationship (ER)
approach was first
presented in 1976 by Peter Chen


The
Unified Modeling Language (UML)
was
introduced in 1997 by Grady
Booch

and James
Rumbaugh
, now a standard for specifying and
documenting large
-
scale software systems


The data modeling component has many similarities
with ER model


ER Model

Database Systems

Basic ER Modeling Concepts


Entity
-

a class of real world objects having
common characteristics and properties about which
we wish to record information


Relationship
-

an association among two or more
entities


occurrence

-

instance of a relationship is the collective
instances of the related entities


degree
-

number of entities associated in the
relationship (binary, ternary, other n
-
ary
)


connectivity

-

one
-
to
-
one, one
-
to
-
many, many
-
to
-
many


existence dependency
(constraint)
-
optional/mandatory

ER Model Concepts

Database Systems

Basic ER Modeling Concepts
(cont’d)


Attribute:
a characteristic of an entity or relationship


Identifier
-

uniquely determines an instance of an entity


Identity dependence
-

when a portion of an identifier is
inherited from another entity


Multi
-
valued
-

same attribute having many values for one
entity


Surrogate
-

system created and controlled unique key (e.g.
Oracle’s “create sequence”)


ER Model Concepts

Database Systems

Basic ER Modeling Concepts
(cont’d)

ER Model Concepts

Database Systems

ER Model Concepts


Role: the function an entity plays in a relationship

Degree

Database Systems

Connectivity and Existence

ER Model Concepts

Database Systems

Generalization:
Supertypes

and Subtypes


Similarities are generalized to a super
-
class entity,
differences are specialized to a subclass entity


Subclass inherits the primary key of the super
-
class, super
-
class has common
nonkey

attributes,
each subclass has specialized non
-
key attributes



Overlapping subtypes

Disjoint subtypes

ER Model Relationships

Database Systems

Aggregation


“part
-
of” relationship among entities to a higher
type aggregate entity (“contains” the inverse)


There are no inherited attributes in aggregation;
each entity has its own unique set of attributes

ER Model Relationships

Database Systems

Ternary Relationships


One
-
to
-
one
-
to
-
one







One
-
to
-
one
-
to
-
many


ER Model Relationships

Database Systems

Ternary Relationships
(cont’d)


One
-
to
-
many
-
to
-
many






One
-
to
-
many
-
to
-
many







The number of “one” entities implies the number of
FDs used to define the relationship semantics


ER Model Relationships

Database Systems

Example Relational Model

Database Systems

Example ER Model

Database Systems

Modeling Tools


CA
ERwin

Data Modeler
(ER model)


Enterprise Architect
(UML)


IBM
InfoSphere

Data Architect
(ER. $6,270!)


Formally
Rational Data
Architect. Rational
developed
Unified
Modeling
Language (by Grady
Booch
,
I
var

Jacobson and James
Rumbaugh
) and was acquired by IBM for $2.1 billion in 2003.


Oracle SQL Developer
(ER)


PowerDesigner

(ER, Sybase)


ArgoUML

(UML, open source)





Mix of ER and UML


http://en.wikipedia.org/wiki/Comparison_of_data_modeling_tools

http://www.databaseanswers.org/modelling_tools.htm

Modeling Tools

Database Systems

Modeling Tools

Example Logical Model with IBM
Infosphere Data Architect (ER)

Database Systems

Modeling Tools

Example Logical Model with Enterprise
Architect (UML)

Database Systems

IDA: Entity

“Foreign key relationship”: the key of the parent entity will be

copied to the child entity as its foreign key (1:n or 1:1)

Logical Modeling with IDA

Database Systems

IDA Relationship


Identifying foreign key relationship
:
the
existence of the child entity is dependent on the
parent


Weak entity

Logical Modeling with IDA

Database Systems

IDA Relationship

Non
-
Identifying foreign key relationship
:
relationships
between two independent
entities

(1:n relationship)


Optional
: a parent instance is not required.


Mandatory:
a parent instance is
required.
e
.g
: an
employee must be assigned to a
department.

Logical Modeling with IDA

Database Systems

IDA Relationship

Non
-
Identifying one
-
to
-
one foreign key
relationship


One instance in one entity has only a relationship
with a single instance in the other entity.

Logical Modeling with IDA

Database Systems

IDA Relationship


Many
-
to
-
many
: the

relationship

between two
entities in which each row in one entity can be
related to more than one row in the other entity.
For example, each
employee can
have multiple
projects,
and multiple
employees
can work on a
project.

Logical Modeling with IDA

Database Systems

IDA Relationship


Generalization: Represent
a grouping of common
attributes into distinct
and specific
entity types


Is
a form of 1:1 relationship


Subtype
identified by a
defining
attribute

Logical Modeling with IDA

Database Systems

Transformation to Physical Models

Database Systems

Transformed Physical Model

Transformation Rules

Database Systems

Generate SQL (DDL)

Transformation Rules

Database Systems

Generate SQL (DDL)

Transformation Rules

Database Systems

Transforming Conceptual Data Model to
SQL



Three types of tables will be produced after the
transformation


SQL table derived from an entity, with same
information


SQL table with the embedded foreign key of the
parent entity. Always occurs for entities from the
“many” (child) side of
one
-
to
-
many

relationship, or
from an entity of a
one
-
to
-
one

relationship


SQL table derived from a relationship, containing FKs
of all the entities in the relationship (
many
-
to
-
many

relationship)

Transformation Rules

Database Systems

Entities


Create a table for each entity


The table name is E (conventionally: E + ’s’).


The columns of this table are the attributes of the
entity.


The primary key of the table is the primary key of
the entity type.


If E’s key is composite, so will be the relational key. If
E has no key, add an artificial key to the table.

Transformation Rules

Database Systems

Weak Entities


Weak entities differ from entities only
in their need for keys from other
entities to establish their uniqueness


When a weak entity is translated, the
key attributes of the owner entity are
added as a key and foreign key


It makes sense to specify
ON

DELETE
CASCADES

for the foreign key: if the
parent entity is deleted, all the weak
entities will be removed from the DB
state, too

Transformation Rules

Database Systems

Referential A
ctions

A DBMS ensures
data integrity if rows in a referenced
table are to be
deleted

(or
updated
)


CASCADE:
Whenever
rows in the master (referenced) table
are deleted (resp. updated), the respective rows of the child
(referencing) table with a matching foreign key column will get
deleted (resp. updated) as
well


RESTRICT:
A
value cannot be updated or deleted when a row
exists in a foreign key table that references the value in the
referenced
table


NO
ACTION:
very much alike RESTRICT.
The main difference
is
that with NO ACTION the referential integrity check is done
after trying to alter the
table


SET DEFAULT , SET
NULL:
the
same for both ON DELETE or
ON UPDATE: The value of the affected referencing attributes is
changed to NULL for SET NULL, and to the specified default
value for SET
DEFAULT





Transformation Rules

Database Systems

One
-
to
-
Many Relationships


Add the key of the “one” side as a column to the
“many” table to implement R


This column will be a foreign key referencing a row
in the table representing the related entity


Use relationship and role to name FK column

Transformation Rules

Database Systems

One
-
to
-
Many Relationships
(2)


If the parent is mandatory: minimum cardinality is
1, null values are not allowed in the foreign key
column


If the parent is optional, minimum cardinality is 0,
null values are allowed in the foreign key column


The foreign key is null for those entities that do not
participate in R at all

Transformation Rules

Database Systems

One
-
to
-
Many Relationships: Relationship
Attributes


To transform one
-
to
-
many relationship attribute(s),
store the relationship attribute(s) together with the
reference to the related entity

Transformation Rules

date

date
date
,

Database Systems

Many
-
to
-
Many Relationships


A new table R will be added, which contains the
keys of both participating entity types


These columns act as foreign keys referencing the
entity tables, and also form a composite key for R

Transformation Rules

Database Systems

One
-
to
-
One Relationships


When both entities are mandatory, each entity
becomes a table, and the key of either entity can
appear in the other entity’s table as a foreign key


Transformation Rules

Database Systems

One
-
to
-
One Relationships
(2)


One of the entities in an
optional
relationship
should contain the foreign key of the other entity in
its transformed table


The other entity could also contain a foreign key
(dept no) with nulls allowed, more storage needed

Transformation Rules

optional

Database Systems

Recursive Relationships


Same rules as other binary relationships

Transformation Rules

Database Systems

Recursive Relationships
(2)

Transformation Rules

Database Systems

Transformation Rules

Database Systems

Transformation Rules

Database Systems

Transformation Rules

Database Systems

UML Based Data Modeling


Unified Modeling Language (UML) is a standardized
general
-
purpose modeling language in the field of
object
-
oriented software engineering


Different types of UML diagrams: structure and
behavior; 14 types for UML 2.2


The
class

and the
activity

diagram types are
particularly useful for database design


UML class diagrams and ER models are similar in
both form and semantics


UML

Database Systems

Basic Class Diagram Notation


Classes







Relationships (instance level)

UML

Database Systems

Association


Association represents the static relationship shared
among the objects of two classes.


It can be named, and the ends of an association
can be adorned with role names, multiplicity, etc





Associations are always assumed to be bi
-
directional


In a
uni
-
directional association, two classes are
related, but only one class knows that the
relationship exists

UML

Database Systems

Aggregation


Aggregation

is a variant of the "has a" or
association relationship; It is an association that
represents a part
-
whole or part
-
of relationship


Aggregation

can occur when a class is a collection
or container of other classes, but where the
contained classes do not have a strong

life cycle
dependency

on the
container



UML

Database Systems

Aggregation Class

Relationships between objects of two classes,


with attributes on
relationships



UML

Database Systems

Composition


Composition

is a stronger variant of the "owns a" or
association relationship; composition is more
specific than aggregation


Lifecycle responsibility: If the container is
destroyed, normally every instance that it contains
is destroyed as well


UML

Database Systems

Association
vs

Composition
vs

Aggregation:
Practical Perspective


Y
ou
can use straight Associations for 80%+ of the
relationships you're likely to want to model


Composition probably accounts for most of the
remaining scenarios


Dependency can be useful in some circumstances


You can get by quite happily without ever using
Aggregation


UML

Database Systems

Class Level Relationship: Generalization


The generalization relationship ("is a") indicates
that one class (the

subclass) is considered to be a
specialized form of the other class (the

super type)
and superclass is considered as ‘generalization'

of
subclass


This means that any instance of the subtype is also
an instance of the
superclass


UML

Database Systems

Example UML


UML