C20.0046 - NYU Stern School of Business

righteousgaggleData Management

Jan 31, 2013 (4 years and 9 months ago)

173 views

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

1

C20.0046: Database
Management Systems

Lecture #1

M.P. Johnson

Stern School of Business, NYU

Spring, 2008

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

2

What Is a Database?


A large, integrated collection of data


which models a real
-
world
enterprise:



Entities


students, courses, instructors, TAs



Relationships


Hillary is currently taking C20.0046


Barack is currently teaching C20.0046


John is currently TA
-
ing C20.0046 but
took

it last semester



A
Database Management System (DBMS)

is a
software package that stores and manages DBs

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

3

Databases are everywhere: non
-
web


criminal/terrorist: TIA



NYPD’s CompStat


Tracking crime stats by precinct



airline bookings



Retailers: Wal
-
Mart, etc.


when to re
-
order, purchase patterns, data
-
mining



Genomics

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

4

Databases are everywhere: web


retail: Amazon, etc.


data
-
mining: Page You Made



search engines



searchable DBs: IMDB, tvguide.com, etc.



Web2.0 sites:


flickr = images + tags



CMS systems (Wikis, blog & forum software, etc.)

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

5

Databases involved in ordering a pizza?

1.
Pizza Hut’s DB


2.
Credit card records


3.
CC


approval by
credit agencies


4.
phone company’s records


(“Pull his LUDs, Lenny.”)


5.
Caller ID


Error
-
checking, anticrime


M.P. Johnson, DBMS, Stern/NYU, Spring 2008

6

Your wallet is full of DB
records



Driver’s license


Credit cards


NYUCard


Medical insurance card


Social security card


Money (serial numbers)


Photos (ids on back)


Etc…

“You may not be interested in
databases, but databases are
interested in you.”
-

Trotsky

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

7

Example of a Traditional DB App


Suppose we build a system



We store:


checking accounts


savings accounts


account holders


state of each of each person’s accounts

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

8

Can we do without a DBMS?

Sure! Start by storing the data in files:


checking.txt savings.txt customers.txt




Now write C or Java programs to implement
specific tasks…

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

9

Doing it without a DBMS...


Transfer $100 from George’s savings to
checking:

Read savings.txt

Find&update the line w/“George”


balance
-
= 100

Write savings.txt

Read checking.txt

Find&update the line w/“George”


balance += 100

Write checking.txt

Write a C program to do the following:

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

10

Problems without an DBMS...

1.

System crashes:






Same problem
even if reordered


High
-
volume


(Rare


frequent)


2.

Simultaneous access by many users


George and Dick visit ATMs at same time


Lock checking.txt before each use

what is the problem?


Read savings.txt

Find&update the line w/ “George.”

Write savings.txt

Read checking.txt

Find&update the line w/ “George”

Write checking.txt

CRASH!

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

11

Problems without a DBMS...

3.

Large data sets (100s of GBs, or TBs, …)



No indices


Finding “George” in huge flatfile is expensive



Modifications intractable without better data
structures


“George”


“Georgie” is very expensive


Deletions are very expensive

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

12

Problems without an DBMS...

5.

Security?


File system may lack security features


File system security may be coarse


6.

Application programming interface (API)?


Interfaces, interoperability


7.

How to query the data?

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

13

In homebrew system, must support


failover/rovery


concurrent use


deal with large datasets?


security


interop?


querying in what?





DBMS as application


Q: How does a DBMS solve these problems?


A: See third part of course, but for now…


M.P. Johnson, DBMS, Stern/NYU, Spring 2008

14

One big issue: Transaction processing


Grouping of several queries (or other DB operations)
into one
transaction



ACID test

properties


Atomicity


all or nothing


Consistency


constraints on relationships


Isolation


concurrency control


simulated solipsism


Durability


Crash recovery


M.P. Johnson, DBMS, Stern/NYU, Spring 2008

15

Atomicity & Durability


Avoiding inconsistent state


A DBMS prevents this outcome


xacts are
all or nothing



One simple idea: log progress of
and plans
for

each xact


Durability: changes stay made (with log…)


Atomicity: entire xact is
committed

at once

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

16

Isolation


Many users


concurrent execution


Disk access is slow (compared to CPU)




don’t waste CPU


keep running



Interweaving actions of different user programs




but can
lead to inconsistency:


e.g., two programs simultaneously withdraw from the same
account



For each user, should look like a single
-
user system


Simulated solipsism

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

17

Isolation


Contrast with a file in two Notepads


Strategy:
ignore multiple users


whichever saves last wins


first save is overwritten



Contrast with a file in two Words


Strategy:
blunt isolation


One can edit


To the other it’s read
-
only

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

18

Consistency


Each xact (on a consistent DB) must leave it
in a consistent state



can define
integrity constraints



checks that the defined claims about the data



Only xacts obeying them are allowed


M.P. Johnson, DBMS, Stern/NYU, Spring 2008

19

A level up: data models


Any DBMS uses a data model: collection of
concepts for describing data



Relational data model: basically universal


Oracle, DB2, SQLServer, other SQL DBMSs


Relations
: table of rows & columns


a rel’s schema defines its fields



Though some have OO extensions…

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

20

Data Schemas


Schema: description of partic set of data,
using some data model



“Physical schema”


Physical files on disk


Schema


Set of relations/tables, with structure


Views

(“external schema”)


Virtual tables generated for user types

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

21

Schema e.g.: college registrar


Schema:


Students
(ssn: string, name: string, login: string, age: int,
gpa: real)


Courses
(cid: string, cname: string, credits: int)


Enrolled
(sid:string, cid:string, grade: string)



Physical schema:


Relations stored as unordered text files.


Indices on first column of each rel



Views:


My_courses
(cname: string, grade: string, credits: int)


Course_info
(ssn: string, name: string, status: string)


M.P. Johnson, DBMS, Stern/NYU, Spring 2008

22

How the programmer sees the DBMS


Start with SQL DDL to
create tables
:






Continue with SQL to
populate tables:

CREATE TABLE Students (


Name CHAR(30)


SSN CHAR(9) PRIMARY KEY NOT NULL,


Category CHAR(20)

);

INSERT INTO Students

VALUES('Hillary', '123456789', 'undergraduate');

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

23

How the programmer sees the DBMS








Ultimately files, but complex

SSN

Name

Category

123
-
45
-
6789

Hillary

undergrad

234
-
56
-
7890

Barak

grad








Students:

CID

CName

C20.0046

Databases

C20.0056

Advanced Software



Courses:

SSN

CID

semester

123
-
45
-
6789

C20.0046

Spring,
2004

123
-
45
-
6789

C20.0056

Spring,
2004

234
-
56
-
7890

C20.0046

Fall, 2003







Takes:

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

24

Querying:
S
tructured
Q
uery
L
anguage


Find all the students who have taken C20.0046:




Find all the students who C20.0046
previously
:





Find the students’
names
:




SELECT SSN FROM Takes

WHERE CID='C20.0046';

SELECT SSN FROM Takes

WHERE CID='C20.0046' AND
Semester='Fall, 2005';

SELECT Name FROM Students, Takes

WHERE Students.SSN=Takes.SSN AND

CID='C20.0046' AND Semester='Fall, 2005';

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

25

Database Industry


Relational databases are based on
set theory



Commercial DBMSs: Oracle, IBM’s DB2,
Microsoft’s SQL Server, etc.


Opensource: MySQL, PostgreSQL, etc.



DBAs manage these


Programmers write apps (CRUD, etc.)



XML (“semi
-
structured data”) also important


M.P. Johnson, DBMS, Stern/NYU, Spring 2008

26

The Study of DBMS


Primary aspects:


Data modeling


SQL


DB programming


DBMS implementation



This course covers all four (tho less of #4)



Also will look at some more advanced areas


XML, websearch, column
-
oriented DBs, RAID,
RegExs, MapReduce…

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

27

Course outline


Database design:


Entity/Relationship models


Modeling constraints



The relational model:


Relational algebra


Transforming E/R models to relational schemas



SQL


DDL & query language


M.P. Johnson, DBMS, Stern/NYU, Spring 2008

28

Course outline


Programming for databases


Some DB implementation


Indexes, sorting, xacts


Advanced topics…



May change as course progresses


partly in response to audience



Also “current events”


Slashdot/whatever, Database Blog, etc.

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

29

Textbook:
Database Management Systems


by Raghu Ramakrishnan &
Johannes Gehrke


3 edition (August 14, 2002)


Available:


NYU bookstore


Amazon/BN (may be cheaper)


Amazon.co.uk (may be cheaper
still)


Links on class page


Difficult but good

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

30

SQL Readings


Many SQL references available online



Good online (free) SQL tutorials include:



A Gentle Introduction to SQL (
http://sqlzoo.net/
)



SQL for Web Nerds
(
http://philip.greenspun.com/sql/
)


M.P. Johnson, DBMS, Stern/NYU, Spring 2008

31

Communications


M. P. Johnson


mjohnson@stern


Office hours: after class



To receive class mail:


Activate account:
http://start.stern.nyu.edu


Forward mail:
http://simon.stern.nyu.edu



M.P. Johnson, DBMS, Stern/NYU, Spring 2008

32

Communications


Web page:
http://pages.stern.nyu.edu/~mjohnson/dbms/


syllabus


course policies


antecedent courses



Blackboard web site


Some materials will be available here


Discussion board


send general
-
interest messages here to benefit all


Go to
http://sternclasses.nyu.edu


Click on C20.0046


M.P. Johnson, DBMS, Stern/NYU, Spring 2008

33

Grading


Prerequisites:


Light programming experience


A bit of “mathematical maturity”


Interest in IT/CS


Requirements & base score:


Homework 15%: O(3)


Project: 30%
-

see below…


Midterm (closed book/notes): 20%


Final (closed book/notes; likely
2hrs in class
): 25%


Class participation/pop
-
quizzes: 10%


Stern Curve


Consistent class attendance is required


Absences will seriously affect your total grade.


Final score = base score


2
n
-
1


where n = # missed quizzes (if n>0)

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

34

The Project: design end
-
to
-
end DB web app


data model


Identify entities & their relationships




relations



creation of DB in Oracle/MySQL


Population with real(alistic) data



web app for accessing/modifying data


Identification of “interesting” questions & actions


Produce DBMS interface



Work in pairs (/threes)


Choose topic on your own


Start forming your group today!


M.P. Johnson, DBMS, Stern/NYU, Spring 2008

35

Collaboration model


Homework and exams done
individually


Project done with
your team members only
, though
can in general use any tools



Non
-
cited use of others’ problem solutions, code,
etc. = plagiarism


See Stern’s stern academic honesty policy


Contact me if you’re at all unclear before a particular
case


Cite any materials used if you’re at all unclear
after

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

36

On
-
going Feedback


Don’t be afraid to ask questions


Some parts will be abstract/mathematical



Topic selection will be partly based on
student interest


M.P. Johnson, DBMS, Stern/NYU, Spring 2008

37

So what is this course about, really?


Languages: SQL (some XML …)


Data modeling


Some theory! (rereading…)


Functional dependencies, normal forms


e.g., how to find most efficient schema for data


Some DB
MS

implementation (algs & data structs)


Algorithms and data structures (in the latter part)


e.g., indices make data much faster to find


how?


Lots of DB implementation and hacking for the
project

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

38

For next time


Get the book



Skim chapter 1


Start reading chapter 2

M.P. Johnson, DBMS, Stern/NYU, Spring 2008

39

For right now/tonight: email survey


Send to SOMEWHERE

1.
name

2.
previous cs/is/math/logic courses

3.
previous programming experience

4.
career plans: programmer, DBA, MBA, etc.

5.
why taking class/what you’re interested in
learning about