Lecture 2: Relational Model

basesprocketΔιαχείριση Δεδομένων

31 Οκτ 2013 (πριν από 3 χρόνια και 7 μήνες)

74 εμφανίσεις

CS511 Advanced Database Management Systems

1



Lecture 2:

Relational Model



Aug. 30 2006

ChengXiang Zhai

Most slides are adapted from Kevin Chang’s lecture slides

CS511 Advanced Database Management Systems

2


Data Model


Mathematical formalism providing:


notation & structure for describing data


set of operations used to manipulate the data


A way of organizing a collection of facts pertaining to
a system being managed


A way of thinking about the world and the
phenomena


Fundamental “paradigm” of data management

CS511 Advanced Database Management Systems

3


The State of the World: Pre Rel. Model


Network/hierarchical DBMS, 1960’s


IDS network DBMS: Bachman at GE, 1961


IMS hierarchical DBMS: IBM in 1968


Still exists today!


CODASYL approach to data management, 1960’s


CODASYL: Conf. Of Data System Languages, set up by US DOD, to
standardize software applications


COBOL (comm. bus. oriented lang.) defined by CODASYL


ruled the business data processing world


incorporating prior data definition languages


DBTG (Database Task Group, under CODASYL), 1971


closely aligned with COBOL


DBTG Report would standardize network model


(Bachman got Turing award in 1973 for the network model)

CS511 Advanced Database Management Systems

4


Network Model: DBTG Report


Network DB:


a collection of
record
s


record = collection of fields


similar to an entity in E
-
R model


records connected by binary, many
-
to
-
one
link
s


similar to binary relationships in E
-
R model


simulate one
-
to
-
one, many
-
to
-
many by many
-
to
-
one

CS511 Advanced Database Management Systems

5


Network Model: Implementation


Student record linked to enrollment record






A lot of linkage pointers


ring
-
structured ptrs implements many
-
one links


Data manipulation is thus navigational

Johnson



CS001

A+

CS308

B

CS511 Advanced Database Management Systems

6


DBTG Query Example

SQL
: select name from student where dept = “CS”


DBTG
:

student.dept = “CS”;

find

any

student
using

dept;

while
DB
-
status

= 0 do

begin

get

student;

print (student.name);

find

duplicate

student
using

dept;

end

CS511 Advanced Database Management Systems

7


DBTG Query Example: Predicates

SQL
: select name from student where dept = “CS” and
grade <= “C”


DBTG:

student.dept = “CS”;

find

any

student
using

dept;

while
DB
-
status

= 0 do

begin

get

student;

if student.grade <= “C”



print (student.name);

find

duplicate

student
using

dept;

end

CS511 Advanced Database Management Systems

8


DBTG Query Example: Navigation

SQL
: select E.grade from student S, enrollment E


where S.name = “Johnson” and E.id = S.id


DBTG
:

student.name = “Johnson”;

find

any

student
using

name;

find first

enrollment
within

StudentEnroll

while
DB
-
status

= 0 do

begin

get

enrollment;

print (enrollment.grade);

find

next

enrollment
within
StudentEnroll;

end

CS511 Advanced Database Management Systems

9


What’s Wrong?


CS511 Advanced Database Management Systems

10


What’s Wrong?


Data dependence: Application program highly
depends on the data representation


As a result,


Programming is difficult and complex


Application can become incorrect once there’s a
change in data representation


Just like programming in assembly languages (as
opposed to high
-
level programming languages)



CS511 Advanced Database Management Systems

11


Data Dependence


Ordering dependence


Applications may rely on a particular ordering of the
stored data


Indexing dependence


Applications may rely on the availability of certain
indices, but indices are semantically redundant and
only necessary for “optimization”


Access path dependence


Applications would hard code access paths to data, so
would rely on the continued existence of the used
access paths

CS511 Advanced Database Management Systems

12


Codd’s Seminal Paper


Concepts of data independence and declarative queries


argues for more natural data
-
centric modeling


argues for declarative queries


Complete definition of relational model


data structure based on relations


algebra for manipulating data


Formal theoretic notions


expressive power, redundancy, and consistency


starting research on “database theories”

CS511 Advanced Database Management Systems

13


DBMS: Levels of Abstraction??


Physical implementation


storage structures, access methods


Logical data model


conceptual data structure and manipulation


Views


different portions of the database



?? Who should see each different level?

CS511 Advanced Database Management Systems

14


Relational Model Provide
Independence?


Ordering dependence?


Indexing dependence?


Access path dependence?

CS511 Advanced Database Management Systems

15


A Relational Model of Data


Relations


given sets S
1
, S
2
,

, S
n

(not distinct)


relation R is a subset of the Cartesian product S
1

x S
2

x


x S
n


S
j

is jth domain of R,
n

is degree of R


Relations as tables


each row represents an n
-
tuple of R


ordering of rows is immaterial


all rows are distinct


ordering of columns is significant


label each column with the name of the corresponding domain

CS511 Advanced Database Management Systems

16


Data Manipulation


Relational algebra


operations


Relational calculus


semantics in terms of logics


Essential Beauty of Relational Model:


Named Relations

Expressible Relations

CS511 Advanced Database Management Systems

17


Operations on Relations


Usual set operations:


since relations are sets of homogeneous tuples

CS511 Advanced Database Management Systems

18


Operations on Relations: Relation
Deriving


Permutation


interchange the columns of an n
-
ary relation


Projection


select columns and remove any duplication in the rows


Join


selectively combining tuples in two relations


as a

class


of new relations that losslessly take some columns
from either source relations
--

why such a def suffices for here?


Composition


join two relations and remove join columns


Restriction


filter one relation with another

CS511 Advanced Database Management Systems

19


Algebra: Questions


What’s missing in this set of operators?


Is it minimal?


How is it different from “current” algebra?

CS511 Advanced Database Management Systems

20


The Algebra Primitives: NOW


Selection


Projection


Set union


Set difference


Cartesian product


Renaming

CS511 Advanced Database Management Systems

21


Redundancy and Consistency


Redundancy


redundant if something can be “derived” from others


foundation: what operations allowed in “derivation”


Consistency


data snapshot must satisfy some constraints



Set the landscape for database theory research


e.g., normal forms; normalization

CS511 Advanced Database Management Systems

22


What’s Good about the Relational
Model?


Simplicity!


Mathematically complete data model


Declarative query languages


queries can be automatically compiled, executed, and
optimized without resorting to low
-
level programming

CS511 Advanced Database Management Systems

23


Unexpected Benefits


Client
-
server architecture


SQL request/response enables high
-
level, compact
exchange between clients and server


clients: input and output, application logics


server: data processing


Parallel processing: relations in and out


pipeline: piping the output of one op into the next


partition: N op
-
clones, each processes 1/N input


Graphical user interfaces


relations fits the spreadsheet (table) metaphor


CS511 Advanced Database Management Systems

24


The Rising of Relational Model


Codd’s paper in 1970


resistance even within IBM


First implementations, 1973


System R at IBM San Jose Lab


INGRES at UC.Berkeley


The “Great Debate” in 1975 SIGMOD conf.



Codd got Turing award in 1981


CS511 Advanced Database Management Systems

25


The Great Debate (1975, SIGMOD)


COBOL/CODASYL


Relational


too mathematical (to understand)



Relational


COBOL/CODASYL


too complicated (to program)

CS511 Advanced Database Management Systems

26


Relational Model/System Impact


Codd’s paper published in 1970


First implementations, 1973
--


System R at IBM San Jose Lab, 1974
-
1978


INGRES at UC.Berkeley, 1973
-
1977


System R influence:


IBM DB2


HP ALLBASE


Oracle: started from published spec. of System R


INGRES:


member later funded Sybase


evolved into Microsoft SQL server by buying code from Sybase

CS511 Advanced Database Management Systems

27


What Have Changed Over the Years?


Changed by implementations


First implementations (System R and INGRES) started
1973, 74


Changed over time

CS511 Advanced Database Management Systems

28


What Have Changed Over the Years?


Columns identified by position instead of names


ordering of columns was significant


names (as role.domain) were simply “user interface”


Rows were distinct
--

set semantics of relation


SQL: “select distinct” to eliminate duplicates

CS511 Advanced Database Management Systems

29


What Have Changed Over the Years?


Semantically defined non
-
algebraic join


Codd’s joins can be “plural”


joins are now defined “operationally”

CS511 Advanced Database Management Systems

30


What Have Changed Over the Years?


Non
-
simple domains: i.e. complex objects


allowed only built
-
in data types


new: object
-
relational DB, multimedia DB


Generations of relations: temporal aspect


temporal databases


e.g.: query GPA at the end of year 2000

CS511 Advanced Database Management Systems

31


Problems with the Relational Model


Too simple?


unable to handle beyond business data processing?


Data is often hierarchical/complex in nature


normalization is unnatural decomposition of data for
storage, to be assembled by joins at query time

CS511 Advanced Database Management Systems

32


Network/Hierarchical Model Coming
Back?


Web is a huge navigational database!


XML is both navigational and hierarchical


<student>


<name>John Smith</name>



<dept>CS</dept>



<enrollments>




<enrollment>





<course>CS311</course>


<grade>A+</grade>


</enrollment>




… … //
more enrollments



</enrollment>


<student>

CS511 Advanced Database Management Systems

33


All About Paradigm Shift


Bachman:

offline sequential access
--
> online navigation

This revolution in thinking is changing the programmer from a stationery
viewer of objects passing before him in core into mobile navigator who is
able to probe and traverse a database at will.




The Programmer as Navigator, 1973 Turing Lecture


Codd:

navigation over records & links
--
> declaration over relations

The most important motivation for this research work is to provide a sharp and
clear boundary between the logical and physical aspects of data
management. … We call this data independence objective.


Relational Databases: A Practical Foundation for Productivity,


1981 Turing Award Lecture

What will be the next paradigm shift?

CS511 Advanced Database Management Systems

34


What You Should Know


Relational model, relational algebra


What is the fundamental difference between the
relational model and the previous data models?


What are some major advantages of the relational
model?


What are some limitations of the relational model?

CS511 Advanced Database Management Systems

35


Carry Away Messages


Raise important research questions


See deficiencies in the current state of the world (data
dependency)


Propose a change to the world that would address
some of the deficiencies (declarative queries)


Leverage principled/mathematical tools (relational
algebra)