slides

voltaireblingΔιαχείριση Δεδομένων

20 Νοε 2013 (πριν από 3 χρόνια και 8 μήνες)

70 εμφανίσεις

DATABASE

SYSTEMS

Lecture 21

Dr. Adam P. Anthony

Overview


Database Fundamentals


System components


Schemas


Database Models


The Relational Model


Survey of Database Products


Data Mining


Social Issues

What is a Database?


Non
-
Technical: A computer program that stores
data


Examples of Databases using this definition?


Technical: A computer program that stores
structured
data
in an organized fashion, and provides the
ability to grant
limited, or restricted access
to
different
users

based on their
data requirements.


Examples of Databases using this definition?


Anything ruled out that we came up with above?

Database Systems


Initially: each department
maintained their own
“database”


Sometimes just a text or
Excel file


Problems with duplication?


Problems with sharing
information?


Now: One database holds
everything, and each
department is given access
only to the data they need


Schemas


Schema


full description of all the database,
including what information is provided and how it
interacts


Used to maintain and program the database


Provides a nice overview of what data a business collects
and (perhaps) how they use that data


Subschema


Partial description of only a few relevant
items


Anything in a Subschema will also appear in the database’s
Schema


Based on providing only what the user needs to do their job


Provides basic security: payroll record example

The Database Management System


Software applications can be enhanced by working with a
database


Especially on the web! (Google, Amazon, etc.)


Database Management System (DBMS): has all the
necessary code for organizing, manipulating data


When application needs data, it just asks the DBMS to provide it

More on Database Management
Systems


Database Management System

(DBMS): A software
layer that manipulates a database in response to requests
from applications


Schemas and
subschemas

are created, managed by the
DBMS


Can have a different subschema for each application that
accesses the DBMS


Distributed Database:

A database stored on multiple
machines


Data is shared between each machine over a network


DBMS will mask this organizational detail from its users


Data independence:

The ability to change the
organization of a database without changing the
application software that uses it

Database Models


Bits and bytes are hard to talk about!


Want to reflect the organization of our data without
thinking about the complex programming techniques
needed to keep our data organized


A
Database Model
provides a conceptual view of the
database that gives us a ‘general idea’ of how the
data is organized


Flat File Model (not really a DB)


Hierarchical Model


Object
-
Oriented Model


Relational Model

The Relational Database Model


Pretty Simple, on the surface


A
Relation
is just a rectangular table of data


An
Attribute

is any
column

in that table


A
Tuple

is any
row

in the table


There is complexity not in the table, but in how two
different tables can
interact


If two
relations

have a common
Attribute
, then we can
combine
tuples

from the two tables

to get a more
complex piece of data!

9
-
10

A
relation containing employee
information

Designing a Good Relational
Database


When we set up the DBMS, we could just put all our
data in one relation, like a big Excel file!


But that’s no fun!


And, it can cause problems


Redundancy:
When the same data is repeated
unnecessarily


Deletion Anomalies:

When deleting a
tuple

results in
the loss of important information


We can prevent some of these problems using a multi
-
relation design!

9
-
12

A
relation containing redundancy

9
-
13

An
employee database consisting
of three relations

The Employee Database Schema

Employee

Empl

ID

Name

Address

SSN

Job

Job ID

Job Title

Skill Code

Dept

Assignment

Empl

ID

Job ID

Start Date

Term Date

Combining Data From Multiple Tables


In which departments has G. Jerry Smith worked?

Designing a Relational Database


Given a data problem:


What are the entities/objects in this problem?


How do those entities interact?


Make one relational table for each entity


Make one relational table for each interaction

Identifying/Avoiding Redundancy


Sometimes it’s obvious in the design


Two columns with same/similar names


Sometimes you need to focus on how the data will be
used


Common perpetrators:



Addresses

more than one person can live at same
address


Organizations

more than one person belongs to a single
organization


Owners

a person/company can own more than one asset


Creators

a creator may have many creations

In
-
Class Example (from book)


Chapter Review, question 18 (similar to assigned
problem 20)


Design a relational database containing information
about music performers, their recordings, and the
composers of the music they recorded (Avoid
redundancies similar to those in Figure 9.4)

SQL (Structured

Query Language)


Developed at IBM as a part of their Relational DBMS
project


Basic language humans can use to express, with
precision, the data that they want from each (separate)
relational table, and how it should be combined:

select
EmplId
, Dept

from ASSIGNMENT, JOB

where
ASSIGNMENT.JobId

=
JOB.JobId


and
ASSIGNMENT.TermData

=

*


Survey of Database Products


MySQL


Microsoft Access


Microsoft SQLServer


Oracle DB


MySQL


Free, Open
-
Source database product


Little support from creators, minimal software for managing the system


Must know a great deal about database design fundamentals in order to
use successfully on a large scale


Can be done, but requires expert knowledge


Good Choice for:


temporary data solutions


Zero
-
budget scenarios (but only if you have expert users to run it!)


Hiring people to run the system costs money!


Internet applications


Facebook
??


Poor Choice for:


Non
-
Expert users


MASSIVE data problems (but
Facebook
??)

Microsoft Access


Cheap, business
-
oriented database product


Emphasis on user
-
friendliness, ease of development


Packaged with ‘data templates’ that might automatically fit your data needs, no
design required!


very nice application
-
development features that are popular in the business
world


Limits on amount of data that can be stored in a single database


Data may be as complex as on any other DBMS, just limited in size


Good choice for:


Small business settings


Non
-
Expert Users


Temporary data solutions


Poor choice for:


Large, enterprise
-
level data solutions


Sharing data with a large number of people

Microsoft SQLServer and Oracle DB


Large, expensive enterprise
-
level DBMS’s


Distributed DBMS support


Provides both expert
-
level and non
-
technical tools
for interaction with DBMS


SQLServer more
-
so
than Oracle


Good For:


Everything (if you can afford them!)


Large jobs


Bad for:



Low
-
budget scenarios (except for SQLServer Express!)


Data Mining


Data Mining = Buzz word that covers any action that seeks to detect
a previously unknown pattern from a large data set


Encompasses many areas of Computer Science, Mathematics, Statistics


Class Description:


Divide data set into pre
-
determined groups (e.g., Mexican food lovers
and Italian food lovers)


Characterize each group by identifying common patterns


“What type of person enjoys Mexican food?”


Class Discrimination:


Start with an initial data set, generate a class description for each
group


Given a new (incomplete) item, predict which group it will fall into:


“Will this new customer prefer Mexican or Italian food?”


Data Mining, Continued


Cluster Analysis:


Similar to Class Description but now the groups aren’t
predetermined!


Must find the most distinct groups AND describe them at the
same time!


Much more difficult


Association Analysis


Find patters
between
groups!


Big in market research


“Are Mexican food lovers also lovers of imported beers?”


Social Impact of Databases


Problems


Massive amounts of personal data are being collected


Often without knowledge or meaningful consent of affected people


Data merging produces new, more invasive information


Errors are widely disseminated and hard to correct


Remedies


Existing legal remedies often difficult to apply


Negative publicity may be more effective