Databases

righteousgaggleData Management

Jan 31, 2013 (4 years and 6 months ago)

126 views

MD240

Data and Databases

Agenda


Hierarchy of Data


Representing the Real World as Data


Simple Data Management Approaches


Database Management Systems (DBMSs)


Database Design


Database Jobs


Data Hierarchy

Logical Structure of Data

Hierarchy of Data

Hierarchy of data for a computer
-
based file

Record

Field

Field

Byte

Byte

Bit

Bit

Record

Field

Field

Byte

Byte

Bit

Bit

File

Database

File

Logical Structure of Data

Hierarchy of Data


Logical Structure: how we as users view
information stored in databases


Character
: smallest unit of information


Field
: group of related characters


Record
: group of related fields


File
: group of related records


Database
:
group of logically associated files


Data Warehouse
: aggregation of databases from
across a whole organization


Data Mart
: subset of Data Warehouse … oriented
toward data needs of a functional area

Representing the Real World

as Data

Representing the Real World as Data

What Data Are Businesses Interested In?


Entity


a person, place, thing, or event on which we
maintain information


Attribute


characteristic or quality describing a particular
entity


corresponds to a field


Key Field


a file that uniquely identifies that record


an identifier field

Translating Real
-
World Objects
into Stuff Stored in Databases


Entity

HAS


Attributes

OF WHICH SOME ARE


Unique Identifiers


Record

HOLDS


Fields

ARE SORTED BY


Key Field

Real World Thing

Database Representation

Simple Data Management
Approaches

Simple Approaches to Managing
Data


Text File


text strings contain fields in a record


fields in each line of text are separated by “TAB” or “,”


ex: files created by Microsoft’s Notepad (*.txt)

Simple Approaches to Managing
Data


Spreadsheet


rows contain fields in a record

Why Not Use Simple Approaches?

Problems of Data in Text Files and Spreadsheets


Text Files and Spreadsheets do not pass the
“ACID Test”


ACID

test
-

for an adequate database system


Atomicity
:

all changes [intended to be made to your database]
should be made, or none should be made (i.e. partial transactions
should not be made!)


Consistency
:

when making a transaction, the database should
change from one valid state to another (i.e. it should never allow
illegal transactions, where illegal is defined by the user)


Isolation
:

transactions are invisible to other transactions


Durability
:

once completed, transactions should be permanent
and should be able to survive system and media failures (i.e.
physical storage problems)


Why Not Use Simple Approaches?

Data Problems Arising from Environment


Problems Arising From End
-
User Computing
Environment


Data Redundancy
:
the same piece of information could be
duplicated by end users in several files


Data Inconsistency
:
the actual values across various copies of the
data may no longer agree


Data Isolation
:
data files are organized differently by different end
-
users, stored in different formats, and often physically inaccessible to
other applications


Data Integrity
:
data values must often meet integrity constraints


Security
:
new applications may be added to the system on an ad hoc
basis, text files can be easily opened and read by anyone, leading to
security holes


Application/Data Independence
:
application should not have to
be developed with regard to how the data are stored (end
-
user should
be able to access data without knowing how database is storing it)

Better Data Management
Practices

Databases and Database
Management Systems (DBMSs)


Database


an organized logical grouping of related files


designed to eliminate
data redundancy
,
data isolation
, and
data
inconsistency



Database Management Systems (DBMSs)


a program that provides access to a database


permits an organization to centralize data, manage them efficiently,
and provide access to the stored data by application programs


should satisfy
ACID test
, and
eliminate security
,
data integrity
problems
,
application/data independence

Example of a Database


DEMO


MS Access example database: “Northwind Traders
Database” contains ...


Tables
: categories, customers, employees, etc.


Queries
: Invoices, Category Sales, Sales by Year, etc.


Data Entry Forms
: Categories, Customers, etc.


Report Generators
: Customer Mailing Labels, Sales by Year

Example of a Database

Specialized Types of Databases


Geographical Information Database


contains locational data for overlaying on maps or
images


data source for a Geographic Information System (GIS)


Knowledge Database


stores decision rules used to evaluate situations and helps
users make decisions like an expert


data source for a Knowledge Based System


Multimedia Database


stores data on many media


sounds, video, images,
graphic animation, and text

DBMS Components

DBMS Components


Three major components of a DBMS


Data Definition Language (DDL)


Data Manipulation Language (DML)


Data Dictionary

Data Definition Language (DDL)


Programming language used to specify content
and structure of database


ex:
SQL

can be used to generate a database


DBMS user defines the structure of a database
through the …


Database Schema


the logical description of the entire database and the listing of
all the data items and the relationships among them


Database Subschema


the specific set of data from the database that is required by
each application

Data Manipulation Language (DML)


DML provides users with the ability to
retrieve
,
sort
,
display
, and
delete

the contents
of the database


Languages used to manipulate data


ex:
Structured Query Language (SQL)


combines both DML and DDL features


offers the ability to perform complicated searches with
relatively simple statements


ex:
Query
-
by
-
example (QBE)


allows user to select a table and chooses the fields to be
included in the answer as the example she provided

Example of SQL


DDL:

create table mailing_list (


email varchar(100) not null primary key,


name varchar(100)

);


DML:

insert into mailing_list (name, email)


values (‘Greg Heim’, ‘heimgr@bc.edu’);

select * from mailing_list;

Data Dictionary


Data Dictionary


a file that stores


definitions of data elements


data characteristics such as usage, physical
representation, ownership, authorization, and
security

Types of Database Models


Hierarchical Database Model


Relates data by rigidly structuring data into an
inverted “tree”


Database records contain:


A single root or master field (Key)


identifies the type location, or ordering of the records


A variable number of subordinate fields


defines the rest of the data within a record


Found in old DBMSs

Network Database Model


Rigid relationships among data through a linked
-
list structure in which subordinated records can be
linked to more than one parent


“Pointers” are used to define the structure of the
database


link subordinates and parents


possible to create many
-
to
-
many relationships using the
network database model


Found in old DBMSs

Types of Database Models

Relational Database Model


Based on the simple concept of tables in order to capitalize
on characteristics of rows and columns of data, which is
consistent with real
-
world business situations


Theory of RDBMS
: relational algebra


Reality of RDBMS
:

“This all you need to know to be a Caveman
Database Programmer: A relational database is a big spreadsheet that
several people can update simultaneously.”

-

Philip Greenspun


Relations :
table


Record :
row of data


Field :
column of data

Relational Database Model

Ex: IBM DB2

Associative Database Model


Emerging database modeling approach


Developed in UK


Incorporated into Lazy Software’s
Sentences DBMS


Database is designed through specifying
sentences related to the data, such as

Employee

has a

Birthday


Associative Database Model

ex: Lazy Software’s Sentences DBMS

Other Database Models


Object
-
Oriented Model


Treats data records as “objects”


Benefit:

both program and data are “object
-
oriented”, meaning that
you use one consistent development philosophy


Object
-
Relational Model


Hybrid of Relational Model and O
-
O Model


Benefit:

performance benefits of both relational and object
approaches


Semi
-
Structured Model


Data and information about data (schema) are stored together


Ex: The WWW is a semi
-
structured database

Relative Importance of Database
Models

1.
Relational Databases (RDBMS)

2.
Object
-
Relational Databases (ORDBMS)

3.
Object Oriented Database (OODBMS)

4.
???

5.
???

6.


Most

Users

Few

Users

Designing Databases:

Conceptual Model vs. Physical
Implementation

Creating Databases


To create a database, designers must develop both a



Conceptual Design


Abstract model of the database from the user or business
perspective


Specified by database programmers and end
-
user needs


Describes how the data elements in the database are to be grouped



Physical Design


Shows how the database is actually arranged on direct access
storage devices


Managed by Database Administrators (DBAs)

Physical vs. Logical Data View


Logical View


represents data in a format that is meaningful to a user
and to the software programs that process that data


end
-
user programs need to know this in order to
retrieve and save data



Physical View


deals with the actual, physical arrangement and location
of data in the direct access storage devices (DASD)


end
-
users shouldn’t have to know this

Physical vs. Logical Data View
Hiding the Physical View from the User


Technologies for hiding the physical view


Open Database Connectivity (ODBC)


Java Database Connectivity (JDBC)


Data Access Objects (DAO)


ActiveX Data Objects (ADO)


Advantages


Hides the structure of the database from the end
-
user


Allows end
-
user programs to access data via a single, consistent
interface to all databases


End user (program) only needs to know …


Name and location of database


Logical view of database (what data is in the database)


How to query data using SQL

Physical vs. Logical Data View
Hiding the Physical View from the User

ex: Windows 2000’s ODBC Administration

Conceptual Design

Conceptual Design


General Approach


Decide
which data are relevant

to your business
problem


Use a
visual modeling tool

to lay out the
structure of the data


Use database
normalization

methods to
eliminate data redundancy as best as possible


Build and deploy actual database

A Common Modeling Approach:

Entity
-
Relationship Modeling


Entity

is something that can be identified in the users’ work
environment


Instance of an entity

is the representation of a particular entity


Attributes

describe the entity’s characteristics


Identifiers

are attributes that identify entity instances


Relationships

can include many entities

Entity
-
Relationship Diagram

Table

Class

Student

Professor

Name

ID Number

Major

Department

Class Taught

Name

Attribute

Entity

Relationship

1:N

One to many

Entity
-
Relationship Diagram

Basis for Database Contents


Based on the E
-
R Diagram, the database would contain …


Student data


Student ID Number


unique identifier


Name


Major


possibly other fields


Professor data


Faculty ID Number


unique identifier


Name


Department


Class Taught


possibly other fields


Relation
: ID Numbers provide a means for linking students and
professors so the “one student can have many professors”

Entity Relationship Diagram

Entity Relationship Diagram

ChilliSource Database Design Studio

Data Structure Diagram

ChilliSource Database Design Studio

Data Definition Language

ChilliSource Database Design Studio

Normalizing Relational Databases


Issue:



initial database design may contain redundant data (BAD)


Objective:


we want to eliminate redundant data elements


want to eliminate functional dependencies


a relationship between or among attributes, where, given the value
of one attribute, we can obtain the value of another attribute


Method:


break big, redundant tables into sets of small, stable tables


Several levels of normalization


1NF (first normal form), 2NF, etc.

Normalizing Relational Databases


Microsoft Access

Physical Database
Organization

Physical Database Organization

Physical Layout


Physical Layout in a Single Computer/Location


RAID (
R
edundant
A
rrays of
I
nexpensive
D
isks
) storage


database partitioned and backed up across multiple
physical hard drives


if one drive fails, you pull it out, throw it away, and
plug in a new one (without losing any data)


several levels of RAID storage (RAID 0, RAID 1,
RAID 5, etc.)

Physical Database Organization
Data Warehouses


Aggregate’s together organization’s data


Designed to support DSS, EIS, online analytical process
(OLAP), and other end
-
user activities.


Can provide an “executive view” of data and a unified
corporate picture to the end
-
users by combining the data
from many operational systems and incompatible databases
without affecting the performance of the running
operational systems


“Data Mart”: subset of data warehouse, usually focused on
functional needs

Data Warehouse

Physical Database Organization

Database Topology


Centralized Database

has all the related files in one
physical location


Distributed Database

has complete copies of a database,
or portions of a database, in more than one locations


Replicated Database

has complete copies of the entire
database in several locations


Partitioned Database

is subdivided, so that each location
has a portion of the entire database

Distributed Database

Get a Job, Get a Job ...

Database Jobs

Which Database to Learn?

National job openings for database technology

(Source: SkillMarket
-

http://www.mshiltonj.com/sm/)

Database Jobs

Which Database to Learn?


Very Impressive


Used for Enterprise Applications


Oracle 8i/9i (+ PL/SQL query language)


MS SQL Server 7 (+ Transact
-
SQL query language)


IBM DB2


MySQL


PostgreSQL


Less Impressive


Mainly Desktop Use


MS Access


Filemaker Pro


Adabas (in Sun StarOffice)

Database Jobs

Employment in Database Management


Database Design


Database Implementation


Database Administrators (“DBA”)


responsible for ensuring that the database fulfills that
user’s business needs, in terms of functionality as well
for the data itself


huge responsibility … great pay


24/7 type of job … responsible when database gets
corrupted or system crashes

Database Jobs

Employment in Database Management


Types of DBA


Systems DBA/Operational DBA


Monitors all of the instances and servers and makes structural changes
to the databases themselves, by adding users, tablespaces, etc.


Architectural DBA


Works closely with an application development team to provide an
optimal database solution


Application DBA


Codes in SQL or PL/SQL (plus Java, C, HTML, etc.) to provide end
-
user solutions or foundations for application development teams


Data DBA


Specializes in data integrity and database cleansing issues, especially
within a data warehouse or data mart environment


(Source:
Oracle DBA 101
, Theriault et al., 2000)

Database Jobs

Employment in Database Management


Do you have what it takes to be a DBA?


Are you able to function at top efficiency with little or no sleep


often for
days at a time?


Do you like getting phone calls in the middle of the night that aren’t
wrong numbers?


Do you enjoy being the “invisible” person in your organization, that is,
until something goes wrong?


Do you enjoy life under a microscope?


Do you enjoy long lines of people outside your door?


Do you enjoy spending hours on the telephone with technical support?


Do you love solving puzzles that make no sense?


Is your favorite meal Twinkies and Coca
-
Cola?


Do you love being on a
perpetual steep learning curve
?


Do you enjoy the
challenges of moving between computer platforms
?


Do you love
teaching yourself about computer applications

(e.g., TCL,
PL/SQL, Java, etc.?


(Source:
Oracle DBA 101
, Theriault et al., 2000)

Jobs in Database Management

Web
-
DBA

Compensation (2000)


Source:
arsdigita.com

(Cambridge)


limited programming experience, non
-
CS major: $70,000/year


CS major, 1 year experience: $100,000


3
-
5 years programming experience, skills in project management:
$115,000
-
$125,000


Source:
computerworld.com, 14th Annual Salary
Survey

(September 2000)


Entry Level Database Analyst: $65,000 on average, range:
$59,000
-
$87,000



Free training:


http://www.arsdigita.com/boot
-
camp/

Jobs in Database Management

Enterprise DBAs (2001
-
2002)


Skills


Oracle Certified Professional, Oracle Certified Master


Microsoft Certified Database Administrator


Working knowledge of SQL language and data models


Critical to have experience working with business users


Responsibilities


Enterprise Resource Planning (ERP) databases


Enterprise E
-
Commerce Systems and associated databases


Salary


Average starting salary: $83,000
-
$114,000


Maximum junior DBA salary: $150,000


Maximum senior DBA salary: $200,000


(Moad, J.,
eWeek
, “Following the Data to a DBA Job,” Dec. 13, 2001, p. 32)