DBMS Concepts

cuttlefishblueData Management

Dec 16, 2012 (4 years and 5 months ago)

136 views



Introduction

Database Management System

[Topics

DBMS & Database

Data Modals

RDBMS & SQL

Relationships

Codd’s Rules]


What is a Database

Database
: a collection of organized used by the applications/systems of an organization.

A database consists of:

Data

Re
lationships between the data

Constraints on the data


or restrictions e.g. Quantity must be greater than 0

A schema


describes all the objects in the database (e.g. a table has columns, a column has a data
-
type and a size)


-

so can say that a database i
s a self
-
describing collection of data elements, and relationships
between those elements.



What is a DMBS?

A
DBMS

(
D
ata
b
ase
M
anagement
S
ystem) is a collection of interrelated data and a set of programs
to access that data.

A DBMS provides an environment

that allows storage and retrieval of data & information, and
provides ways of carrying out database administration tasks.


Functions provided by a DBMS (parentheses indicate what we will cover in this course that will
address this issue):


Storage & retri
eval



can be done independent of internal structures of the db

Catalog



describes all the data items stored in the db, which are accessible to users


includes data
definitions e.g. for a column, what is the data
-
type and what is the size

Shared update



to support concurrency i.e. when more than 1 user are updating the database at the
same time

Recovery



if the db is damaged, need to be able to restore a working copy. A DBMS provides
backup and restore functions. It is usually possible to schedule a b
ackup to occur on a regular basis
e.g. every night or every 4 hours

Security



access restricted to authorised users; users assigned permissions to carry out certain
actions (e.g. to update or delete data); usually password
-
protected access. Data can also

be
encrypted for further protection

Integrity



mechanisms to ensure data integrity and referential integrity. Data types, formats, check
constraints and key constraints all used for this

Data independence



the manipulation of the data is independent o
f where the data is physically
stored


in other words, data manipulation works with logical view of the data and the process that
is manipulating the data does not need to know
where

or
how

the data is stored

Utility services



provides ways to import &
export data, query the data etc


Database Models

You looked at a number of
data models

in the DBMS course


a data model defines a
structure

for storing and manipulating data in a database.



The relational model is the one most in use today.


1.1

Relational M
odel

A collection of relations


a relation is a table; a table has attributes (columns) and rows (records).
A table corresponds to an entity in the application e.g. a Department, an Employee. Each row in the
table represents an instance of the entity.

Eac
h row is uniquely identified by one or more of the attribute values in it


this is called the
primary key.

Each column in a table has a unique name.

A column has a defined data type e.g. integer, character

Relationships link rows in different tables, base
d on key values e.g. a Department has Employees.

Relationships can be 1
-
to
-
1, 1
-
to
-
many or many
-
to
-
many.

SQL provides way of manipulating tables, using the relationships between them, and of storing and
retrieving data from the database


Example:

Tables

De
partment

(Department Number, Name, Manager)

Employee

(Employee Number, Name, Salary, Department Number)

Relationships

Department has many employees

Department has 1 Manager


The Relational Model was preceded by these models:


1.2

Network Model

Records organis
ed into tables

Relationships modelled by linking records together in chains e.g. Department
-
> Employee1
-
>
Employee2
-
> Employee3

Data is stored in files that contain records


1.3

Hierarchical Model

A tree structure

Data is stored in files that contain record
s


Department



Manager





Employee 1

Employee 2

Employee 3



Another type of model that has emerged since the Relational Model:


1.4

Object
-
oriented Model

Same concepts as OO programming


the database is seen as a collection of objects; objects that
have

similar properties are grouped into a class

So, an entity is represented as a class



A class has attributes and behaviour

The value of an attribute can be an object that is an instance of another class in the model

An object is an instance of class

Subclas
ses inherits attributes from the parent class


Example:

Department class



has attributes Department Number, Name, Manager

Employee class



has attributes Employee Number, Name, Salary

An instance of the Department class has one or more Employee members.

Manager is a sub
-
class of Employee


it inherits the attributes of an Employee.


RDBMS

In this course, we will be working with databases that follow the relational model, and we will be
using one or more DBMS systems to work with those databases.

The
RDBMS

(
R
elational
D
ata
b
ase
M
anagement
S
ystem) is now a core part of many information
systems, particularly e
-
Commerce/internet applications and client
-
server systems.

We will also be learning how to use SQL (Structured Query Language) to store, manipulate and
r
etrieve data in an RDBMS.


This section looks at some of the aspects of relational databases.


1.5

SQL


Standards

Any DBMS should have a language for defining the structures in the database


this is called the
DDL

(Data Definition Language). This is used by
the database designer/administrator to build the
database.

It should also have a language for manipulating the data in the database


this is called the
DML

(Data Manipulation Language). This language is used by users of the database to insert, retrieve,
m
odify and delete data in the database.


In a relational DBMS, the DDL and DML are unified into SQL (Structured Query Language). SQL
is the generic language for querying relational databases.


The standards for SQL are defined by
ANSI

(American National St
andards Institute) and are also
supported by the ISO (International Standards Organization).

The current standard is called
SQL
-
92

(as it was updated in 1992). There is also a newer set of
standards called
SQL
-
99
, but this has not yet been implemented in m
any DBMS packages.


An RDBMS package should conform to the SQL
-
92 standards; future releases will conform to the
SQL
-
99 standards.


We will be making use of Microsoft SQL Server 2000, which is SQL
-
92 compliant.


Other RDBMS packages available are Oracle (
current version is 9i), MySQL (an open source
package) and PostgreSQL.


Some makers of RDBMS packages implement their own additions to SQL, to support their own
product and its features. Some are as follows:


Microsoft SQL Server has T
-
SQL


Transact
-
SQL

O
racle

has
PL/SQL



Procedural Language/SQL

PostgreSQL

has
PL/pgSQL



Procedural Language/postgreSQL





1.6

Sets & Rows

In the relational data model, SQL programs operate on logical
sets

of data.


This is different to other database manipulation languages (e.g.

Xbase or VB)


which use a
row
processing
or
procedural programming
style.

In these DMLs, the programmer must tell the program exactly what to do with the data, on
record/row at a time:

Program loops through list of records

Each record processed one at a

time, or logic applied to each record, one at a time.


SQL uses
set processing



in a SQL query, data is always selected FROM a table.

The data is selected from a
set

called a table.



The programmer just has to tell the database
what

is required; not
how

to process each
individual row in the set.

This is sometimes called

declarative programming



because the programmer declares what data is
necessary


e.g. “Give me all employees in the Computer Science Department”.


Criteria can be specified to narrow t
he data down to a more specific set e.g. “Give me all employees
in the Computer Science Department who are not eligible for incentive.”



1.7

Atomicity

In the relational model, data is shown logically as a two
-
dimensional table that describes a single
entity e
.g. a Department.


Data is displayed in
columns

and
rows
.


A column describes a specific
attribute

of the entity e.g. the name of a Department.

A row describes a specific entity (or instance of the entity) e.g. the Computer Science Department.


The specifi
c values of each attribute should contain one, and only one, value. This is known as
atomicity

If the table is constructed in such a way that the intersection of a row and column can contain more
than one value, then the model needs to be changed so that t
he columns can have only one value.

This occurs when there is a many
-
to
-
many relationship between two tables.


1.8

Domain/Datatypes

The values in a column also have rules of behaviour


the values in a given column must share a
common
domain



also called
dat
atype
.

For example, the DepartmentNumber field should only contain whole numbers (integers), no alpha
-
characters.

Therefore, this field would be defined as having an
integer

datatype.



The values in a column can also be controlled by applying
rules



for
example, a SQL rule could
specify that the DepartmentNumber be greater than 0 and less than 1000.


Relationships between Relational Tables


Relationships link the rows in different tables, based on key values.

Consider some of the tables in the
Pubs

databa
se, as shown in
Figure
1

Let us look at some of these
relationships. In this diagram (created using the SQL Server diagram wizard), the key symbols
indicate a ‘1’ side of a relationship.






Figure
1



som
e of the tables in the Pubs database, showing the relationships between them


5.1 1
-
to
-
many

In a 1
-
to
-
many relationship between table A and table B, a row in table A can have many matching
rows in table B. A row in table B can have only one matching row i
n table A.


In the diagram above, there is a 1
-
to
-
many relationship between
Publishers

and
Titles
. Each
Publisher produces many Titles, but a Title can come from only one Publisher. The relationship is
between the primary key of
publishers.pub_id

and the f
oreign key
titles.pub_id
.

A 1
-
to
-
many relationship is created when only one of the related columns is a primary key or has
unique values.


This type of relationship is most commonly used in relational databases.



1.2

Many
-
to
-
many

In a many
-
to
-
many relationshi
p between table A and table B, a row in table A can have many
matching rows in table B and vice verse.

In the pubs example, a Title can have many Authors. An Author can also write many Titles. So this
is a many
-
to
-
many relationship.

This type of relationsh
ip is modelled by introducing a third table between the related tables. The
primary keys from the related tables become foreign keys in the new table.

We then have a 1
-
to
-
many relationship between each of the related tables and the third table. The
third
table is called a
junction

or
intersection

table.

In this example, the junction table is
titleauthor
. It forms a link between the
titles

and
authors

tables.

The primary key of the junction table is the combination of both the foreign keys


au_id

and
title
_id

in this case.




1.3

1
-
to
-
1

In a 1
-
to
-
1 relationship between table A and table B, a row in table A can have no more than one
matching row in table B and vice verse.

In the above tables, there is a 1
-
to
-
1 relationship between
Publishers.pub_id

and
pub_info.p
ub_id



because a Publisher has one set of publisher information (pub_info). Conversely, a pub
-
info record
is for only one corresponding Publisher.

A 1
-
to
-
1 relationship is created if the column values are unique in each table


which means they
are primar
y keys or have a uniqueness constraint on them.


Consider this relationship


it would also be possible to store the
logo

and
pr_info

fields in the
publishers table.

Because this is often the case, 1
-
to
-
1 relationships are not used very frequently.

Howeve
r, they can be used in the following scenarios:

Where the data applies to only some of the records in one of the tables e.g. if only some Publishers
have a logo and associated pr_info.

Where it may be necessary to be able to quickly and easily delete the e
xtra data about the entity in
one of the tables e.g. the
pr_info

table could be deleted if the pr_info data was no longer required.
This is easier than deleting columns from the Publishers table.


Codd’s Rules for a Truly Relational Database System


E.F. C
odd was one of the originators of the Relational Model.

He developed Twelve Principles of Relational Databases. If a DBMS package does not meet all of
these criteria, then it is not fully relational.

These rules do not apply to applications development, b
ut they do determine if the database engine
is truly
relational
.

A good understanding of these principles helps a database programmer in the proper development
and design of relational databases (RBDs).


1.

Information is represented logically in tables.

2.

Dat
a must be logically accessible by table, primary key, and column.

3.

Null values must be uniformly treated as ‘missing information’, not as empty strings,
blanks or zeroes.

4.

Metadata (data about the database) must be stored in the database just as regular data

is.

5.

A single language must be able to define data, views, integrity constraints, authorization,
transactions and data manipulation (DDL


Data Definition Language, DML


Data
Manipulation Language).

6.

Views must show the updates of their base tables and vic
e versa.

7.

A single operation must be able to retrieve, insert, update or delete data.

8.

Batch and end
-
user operations are logically separate from physical storage and access
methods. Batch and end
-
user operations can change the database schema without having
to recreate it or the applications built upon it.

9.

Integrity constraints must be available and stored in the RDB metadata, not in an
application program.

10.

The DML of the relational system should not care where or how the physical data is
distributed and shou
ld not require alteration if the physical data is centralized or
distributed.

11.

Any row processing done in the system must obey the same integrity rules and
constraints that set
-
processing operations do.