select

townripeΔιαχείριση Δεδομένων

31 Ιαν 2013 (πριν από 4 χρόνια και 4 μήνες)

187 εμφανίσεις

Advanced Database Techniques

Martin.Kersten @ cwi.nl

Stefan Manegold@cwi.nl


Sandor Heman @ cwi.nl

Jennie Zhang @ cwi.nl

Romulo Goncalves @cwi.nl

Administrative details


The website evolves as during the course


Exam material is marked explicitly


Lab work deadlines are strict



Email is the preferred way to communicate


Tomorrow the assistants will be available in
person between 11:00
-
12:00, room REC
-
P.123

Relational systems


A database system should simplify the
organization, validation, sharing, and bookkeeping
of information


Prerequisite knowledge


Relational data model and algebra


Data structures (B
-
tree, hash)


Operating system concepts


Using a SQL database system


What is your practical experience
?


[Ruby on Rails expertise needed]

Applications


Bread
-
and
-
butter applications?


Web
-
shop


Banking systems


Inventory systems


Production systems


Shopping systems


Government systems


Health systems


Multimedia systems


Science systems …

Advanced Applications


Bread
-
and
-
butter applications ???


Banking systems


What happens if you install a stock trading system
which should handle >100K transactions/minute


How to derive trading advice using compute
intensive applications


How to warn thousands of users about their trading
opportunity


…. Need for parallel, distributed main
-
memory
database technology…

Advanced application requirements


Bread
-
and
-
butter applications


Inventory applications


How to install a battlefield inventory systems


How to deliver goods just in time?


How to keep track of moving objects/persons ?



… need for sensor
-
based database support and
RFID tags … need for a new DBMS ?…

Advanced Applications


Production systems


How to interact with component suppliers


How to manage the production workflow


How to avoid bad production steps


How to maintain a database with 12000 tables
(SAP)



… need for interoperability between autonomous
systems… datamining and knowledge discovery…

Advanced Applications


Health information systems


How to monitor your health over 30 years


How to enable quick response to a heart attack



…need for interoperable database systems …

HELP

The Ambient Home

HELP

The Ambient Home

911 called

MonetDB DataCell

MonetDB DataCell

911 called

nucleus

A Shared Tuple Space

using an SQL DBMS

Advanced applications


Government systems


Security


Biometric data management issues, finger/image
matching


Public safety


Forensics, manipulate complex objects using
proprietary algorithms



…need for extensible database technology…need
to support unstructured data…

Advanced Applications


Science systems


The new accelerator in CERN


how to handle >1PTByte files


The Sloan Digital Skyserver schema is 200
pages and the catalogued data 2.5Tb


How to query this efficiently



..need for P2P and … a novel way to organize
data…

LOFAR central processor specs


Streaming Data


Input: 320 Gbit/s


Internally within correlator: 20 Tbit/s


Into storage: 25 Gbit/s = 250 TByte/day


Final products: 1
-
3 TByte/day



High Performance Computing


Correlation: 15 Tflops


Pre processing and filtering: 5 Tflops


Off
-
line processing (calibration, analysis): 5
-
10 Tflops


Visualisation, control, scheduling etc: 2 Tflops



Storage


On
-
line temporal storage: 500 TByte


Archive: PByte range of data stored in Grid

Technological challenges


Data is often not structured as tables


XML and XQuery



Data does not always fit on one system


Distributed and parallel databases



Querying is more like world
-
wide searching


Continuous and streaming queries



A database tells more than facts


Datamining and knowledge discovery

Code bases


Database management systems are BIG software
systems


Oracle, SQL
-
server, DB2 >1 M lines


PostgreSQL 300K lines


MySQL 500 K lines


MonetDB 200
-
800 K lines


SQLite 40K lines



Programmer teams for DBMS kernels range from
a few to a few hundred

Performance components


Hardware platform


Data structures


Algebraic optimizer


SQL parser


Application code


What is the total cost of execution ?


How many tasks can be performed/minute ?


How good is the optimizer?


What is the overhead of the datastructures ?

Not all are equal

Gaining insight


Study the code base (inspection + profiling)


Often not accessible outside development lab



Study individual techniques (data structures +
simulation)


Focus of most PhD research in DBMS


Detailed knowledge becomes available, but ignores
the total cost of execution.



Study as a functional black box


Analyse a small application framework


The Jack The Ripper Project


Study the snippet of the database technology and
design an XQuery and SQL application



What is the schema?



What are the queries?



What are unorthodox solutions?

Learning points


My poor knowledge on relational database? Read
the chapters on SQL and relational algebra.
Knowledge on data structures comes in handy.



Database systems are much more than
administrative bookkeeping systems

Learning points



Advanced application challenge the technology
provided by a DBMS



Many techniques do not easily scale in size,
complexity, functionality



Effectiveness of a DBMS is determined by
many tightly interlocked components