DBS Lecture#1 - Course Overview - CSIS - BITS Pilani

photofitterInternet και Εφαρμογές Web

4 Δεκ 2013 (πριν από 3 χρόνια και 8 μήνες)

307 εμφανίσεις

Database Systems:
Course Overview

Professor Navneet Goyal & K Hari Babu

Department of Computer Science & Information Systems

BITS, Pilani

©
Prof. Navneet Goyal, BITS, Pilani

Text Book


Hector

G

Molina,

Jeffrey

D
.
Ullman

&

Jennifer

Widom
.


Database

Systems



The

Complete

Book,

Pearson

Education,

2002
.




Home

Page
:


http
:
//www
-
db
.
stanford
.
edu/~ullman/dscb
.
html

©
Prof. Navneet Goyal, BITS, Pilani

Reference Books


Ramakrishna R. & Gehrke J.


Database Management Systems, 3e, Mc
-
Graw Hill,
2003.


http://www/cs.wisc.edu/~dbbook



Silberschatz A, Korth H F, & Sudarshan S.


Database System Concepts, 5e, TMH, 2005.


http://www.db
-
book.com


http://www.mhhe.com/silberschatz



Elmarsi R, & Navathe S B.


Fundamental of Database System, 5e, Pearson
Education, 2008.


http
:
//www
.
aw
.
com/cssupport

©
Prof. Navneet Goyal, BITS, Pilani

Course Website


http://csis/faculty/goel/Database Systems

©
Prof. Navneet Goyal, BITS, Pilani

Topics


Evolution of Databases


Data, Database, DBMS, & DBS


Data Modeling


Relational Databases


Schema Design & Normalization


Query Languages


Storage & Indexing


Query Processing & Optimization


Concurrency


Crash Recovery


Advanced Topics

©
Prof. Navneet Goyal, BITS, Pilani

Databases Everywhere!!!


DBMS contains information about a particular enterprise


Collection of interrelated data


Set of programs to access the data


An environment that is both
convenient

and
efficient

to use


Database Applications:


Banking: all transactions


Airlines: reservations, schedules


Universities: registration, grades


Sales: customers, products, purchases


Online retailers: order tracking, customized recommendations


Manufacturing: production, inventory, orders, supply chain


Human resources: employee records, salaries, tax
deductions


Databases touch all aspects of our lives

Tsunami of Data


Telecom data (


4.6 bn mobile subscribers)


There are 3 Billion Telephone Calls in US each day,

30 Billion emails daily, 1 Billion SMS, IMs.


IP Network Traffic: up to 1 Billion packets per hour per router.
Each ISP has many (hundreds) routers!


WWW


Weblog data (160 mn websites)


Email data


Satellite imaging data


Social networking sites data


Genome data


CERN’s LHC (15 petabytes/year)

Tsunami of Data


In 2005, mankind created 150 exabytes of
data


In 2010, it will create 1200 exabytes*






* 2008 study by International Data Corp. (IDC)


Tsunami of Data


No. of
pics

on
Facebook


15
bn

unique photos


60
bn

photos stored (4 sizes)


Imageshack

(20
bn
)


Photobucket

(7.2
bn
)


Flickr

(3.4
bn
)


Multiply (3
bn
)

©
Prof. Navneet Goyal, BITS, Pilani

Biggest OLTP System


SABRE


Sabre

is a
computer reservations
system
/global distribution system
(GDS) used by
airlines
,
railways
,
hotels
,
travel agents

and other
travel companies


Used by more than 200 airlines

©
Prof. Navneet Goyal, BITS, Pilani

DBMS



Is it a Dry Area?


The area of DBMS is a microcosm of
computer science in general


The issues addressed and the
techniques used span a wide
spectrum including


Languages


Object
-
orientation & other
programming paradigms

©
Prof. Navneet Goyal, BITS, Pilani

DBMS



Is it a Dry Area?


Compilation


Operating systems


Concurrent programming


Data structures


Algorithm


Parallel & distributed computing


User interfaces


Expert systems & AI


Statistical techniques & Dynamic
programming

Reference: DBMS by Raghurama Krishna & Gherke, 3e

©
Prof. Navneet Goyal, BITS, Pilani

Basic Definitions


Database
: A collection of related data.


Data
: Known facts that can be recorded and
have an implicit meaning.


Mini
-
world
: Some part of the real world about
which data is stored in a database. For example,
student grades and transcripts at a university.


Database Management System (DBMS)
: A
software package/ system to facilitate the
creation and maintenance of a computerized
database.


Database System
: The DBMS software
together with the data itself. Sometimes, the
applications are also included.

©
Prof. Navneet Goyal, BITS, Pilani

DBMS Functionalities


Define a database : in terms of data types,
structures and constraints


Construct or Load the Database on a
secondary storage medium


Manipulating the database : querying,
generating reports, insertions, deletions
and modifications to its content


Concurrent Processing and Sharing by a set
of users and programs


yet, keeping all
data valid and consistent


Crash Recovery

©
Prof. Navneet Goyal, BITS, Pilani

File System vs. DBMS


A company has 500 GB of data on
employees, departments, products, sales,
& so on..


Data is accessed concurrently by several
employees


Questions about the data must be
answered quickly


Changes made to the data by different
users must be applied consistently


Access to certain parts of the data be
restricted

©
Prof. Navneet Goyal, BITS, Pilani

File System vs. DBMS


Data stored in operating system files


Many drawbacks!!!

©
Prof. Navneet Goyal, BITS, Pilani

File System vs. DBMS


These drawbacks have prompted the
development of database systems


Database systems offer solutions to
all the above problems?


©
Prof. Navneet Goyal, BITS, Pilani

Advantages of a DBMS


Program
-
Data Independence


Insulation between programs and data:

Allows
changing data storage structures and operations
without having to change the DBMS access programs.


Efficient Data Access


DBMS uses a variety of techniques to store & retrieve
data efficiently


Data Integrity & Security


Before inserting salary of an employee, the DBMS can
check that the dept. budget is not exceeded


Enforces access controls that govern what data is
visible to different classes of users

©
Prof. Navneet Goyal, BITS, Pilani

Advantages of a DBMS


Data Administration


When several users share data , centralizing the
administration offers significant improvement


Concurrent Access & Crash Recovery


DBMS schedules concurrent access to the data in such
a manner that users think of the data as being
accessed by only one user at a time


DBMS protects users from the ill
-
effects of system
failures


Reduced Application Development Time


Many important tasks are handled by the DBMS


©
Prof. Navneet Goyal, BITS, Pilani

Major Players


Oracle


9i,10g, 11i


Microsoft


SQL SERVER 200x


IBM DB2


MySQL AB


PostgreSQL


Oracle 44.3% vs. 21% for IBM vs.
18.5% for Microsoft

Some Recent Topics


MapReduce* & Parallel Databases:
Friends or Foes?

Michael Stonebraker et. al.

Comm. of the ACM, Jan. 2010, Vol. 53(1)


*
MapReduce: Simplified Data Processing on Large Clusters


Jeffrey Dean and Sanjay Ghemawat @ Google Inc. 2004

©
Prof. Navneet Goyal, BITS, Pilani

Some Recent Topics


MapReduce paradigm for large
-
scale, massively parallel, data
-
access & processing


Will it relegate Relational technology to
the status of a legacy technology?


MapReduce actually complements DBMSs
since DBs are not designed for ETL tasks,
a MapReduce speciality


Connection with Facebook?

©
Prof. Navneet Goyal, BITS, Pilani

©
Prof. Navneet Goyal, BITS, Pilani

Benchmarking DBs


The term transaction is often applied to a
wide variety of business and computer
functions. Looked at as a computer
function, a transaction could refer to a set
of operations including disk read/writes,
operating system calls, or some form of
data transfer from one subsystem to
another

©
Prof. Navneet Goyal, BITS, Pilani

Benchmarking DBs


While TPC benchmarks certainly involve the
measurement and evaluation of computer
functions and operations, the TPC regards a
transaction as it is commonly understood in the
business world: a commercial exchange of
goods, services, or money. A typical transaction,
as defined by the TPC, would include the
updating to a database system for such things
as inventory control (goods), airline reservations
(services), or banking (money).


©
Prof. Navneet Goyal, BITS, Pilani

Benchmarking DBs


In these environments, a number of customers
or service representatives input and manage
their transactions via a terminal or desktop
computer connected to a database. Typically,
the TPC produces benchmarks that measure
transaction processing (TP) and database (DB)
performance in terms of how many transactions
a given system and database can perform per
unit of time, e.g., transactions per second (tpsC)
or transactions per minute (tpmC)

©
Prof. Navneet Goyal, BITS, Pilani

Benchmarking DBs


Results

©
Prof. Navneet Goyal, BITS, Pilani

Levels of Abstraction


Databases provide users with an
abstract view of data


©
Prof. Navneet Goyal, BITS, Pilani

Relational Query
Languages


Query languages
:

Allow manipulation
and
retrieval of data

from a database.


Relational model supports simple,
powerful QLs:


Strong formal foundation based on logic.


Allows for much optimization.


Query Languages
!=

programming
languages!


QLs not expected to be “Turing complete”.


QLs not intended to be used for complex
calculations.


QLs support easy, efficient access to large data
sets.

©
Prof. Navneet Goyal, BITS, Pilani

Formal Relational
Query Languages


Two mathematical Query
Languages form the basis for “real”
languages (e.g. SQL), and for
implementation:


Relational Algebra
:

More
operational
,
very useful for representing execution
plans.


Relational Calculus
:

Lets users
describe what they want, rather than
how to compute it. (
Non
-
operational,
declarative
.)

©
Prof. Navneet Goyal, BITS, Pilani

The SQL Query
Language


SQL has been influenced by both
Relational Algebra (RA) & Relational
Calculus (RC)


More so by RC, particularly Tuple
relational Calculus (TRC)


The other variant of RC is Domain
Relational Calculus (DRC) which has
greatly influenced Query By Example
(QBE)

©
Prof. Navneet Goyal, BITS, Pilani

The SQL Query
Language


SQL consists of:


DDL (Data Definition Language)


Create conceptual schema


DML (Data Manipulation Language)


Relational operators


Insert, Delete, Update


VDL (View Definition Language)


Specify user views & their mapping to the
conceptual schema (in most DBMSs, done by
DDL)


SDL (Storage Definition Language)


File organization


Indexes


©
Prof. Navneet Goyal, BITS, Pilani

The SQL Query
Language


SDL is being removed from SQL


DMLs


High
-
level or nonprocedural (declarative)


Can be entered at the SQL > or can be embedded in a
general purpose programming language


Can specify & retrieve many records uin a single DML
statement & are hence called
Set
-
at
-
a
-
time

DMLs


Low
-
level or procedural


Must be embedded in a general purpose
programming language


Typically retrieves individual records or objects from
the DB & processes each separately


Therefore it need PL constructs like looping


Record
-
at
-
a
-
time
DMLs


©
Prof. Navneet Goyal, BITS, Pilani

The SQL Query
Language

DMLs


Whenever DML statements are embedded in a PL,
that language is called as the
host language

and
the DML is called the
Data Sublanguage


In object DBs, the host language & data
sublanguage form one integrated language


for
eg. C++ with some extensions to support
database functionality


Some RDBMSs also provide integrated languages


for eg. ORACLE’s PL/SQL.

©
Prof. Navneet Goyal, BITS, Pilani

Operations on
Relations


Restrict


Project


Join


Divide



Union


Intersection


Difference


Product

Closure

Property

Relational Operations

Set Operations

©
Prof. Navneet Goyal, BITS, Pilani

Steps in Query
Processing

1.
Parsing and translation

2.

Optimization

3.

Evaluation


©
Prof. Navneet Goyal, BITS, Pilani

Steps in Query
Processing

©
Prof. Navneet Goyal, BITS, Pilani

Steps in Query
Processing


Parsing and translation


Translate the query into its internal form.


Translation is similar to the work performed
by the parser of a compiler


Parser checks syntax, verifies relations


Parse tree representation


This is then translated into RA expression


©
Prof. Navneet Goyal, BITS, Pilani

Steps in Query
Processing


Query Execution Plan


In SQL, a query can be expressed is several ways


Each SQL query can itself be translated into RA expression
in many ways


An RA expression only partially tells you how to evaluate a
query


Several ways to evaluate RA expression


Annotate RA expression with instructions specifying how to
evaluate each operation


Annotation may state the algorithm to be used for a specific
operation or the particular index to use


Annotated RAE is called an
evaluation primitive


Sequence of primitive operations is a QEP

©
Prof. Navneet Goyal, BITS, Pilani

Steps in Query
Processing


Evaluation


The query
-
execution engine takes a query
-
evaluation plan, executes that plan, and returns
the answers to the query.

©
Prof. Navneet Goyal, BITS, Pilani

Steps in Query
Processing


Example



select
balance



from
account



where
balance

< 2500


RAEs



balance

2500
(

balance
(
account))



balance
(

balance

2500
(
account))


E.g., we can use an index on
balance

to find
accounts with balance < 2500,


or can perform complete relation scan and
discard accounts with balance


2500

©
Prof. Navneet Goyal, BITS, Pilani

Steps in Query
Processing

Query Execution Plan

©
Prof. Navneet Goyal, BITS, Pilani

Query Optimization


Different QEPs for a given query can
have different costs


Users not expected to write their
queries in a way that suggests the most
efficient QEP


It is the system’s responsibility to
construct a QEP that minimizes the cost


This is
Query Optimization

©
Prof. Navneet Goyal, BITS, Pilani

Query Optimization


For optimizing a query, the Query
Optimizer must know the cost of each
operation


Cost is hard to compute


Depends on many parameters such as
actual memory available to the
operation


Systems work with rough estimates

©
Prof. Navneet Goyal, BITS, Pilani

Query Optimization


Ideally:

Want to find best plan.
Practically:

Avoid worst plans!

Overall System Structure

Figure taken from Silberschatz: Databse
System Concepts, 5e, McGraw Hill

Q & A

Thank You