townripeData Management

Jan 31, 2013 (5 years and 5 months ago)


Advanced Database Techniques

Martin.Kersten @


Sandor Heman @

Jennie Zhang @

Romulo Goncalves

Administrative details

The website evolves as during the course

Exam material is marked explicitly

Lab work deadlines are strict

Email is the preferred way to communicate

Tomorrow the assistants will be available in
person between 11:00
12:00, room REC

Relational systems

A database system should simplify the
organization, validation, sharing, and bookkeeping
of information

Prerequisite knowledge

Relational data model and algebra

Data structures (B
tree, hash)

Operating system concepts

Using a SQL database system

What is your practical experience

[Ruby on Rails expertise needed]


butter applications?


Banking systems

Inventory systems

Production systems

Shopping systems

Government systems

Health systems

Multimedia systems

Science systems …

Advanced Applications

butter applications ???

Banking systems

What happens if you install a stock trading system
which should handle >100K transactions/minute

How to derive trading advice using compute
intensive applications

How to warn thousands of users about their trading

…. Need for parallel, distributed main
database technology…

Advanced application requirements

butter applications

Inventory applications

How to install a battlefield inventory systems

How to deliver goods just in time?

How to keep track of moving objects/persons ?

… need for sensor
based database support and
RFID tags … need for a new DBMS ?…

Advanced Applications

Production systems

How to interact with component suppliers

How to manage the production workflow

How to avoid bad production steps

How to maintain a database with 12000 tables

… need for interoperability between autonomous
systems… datamining and knowledge discovery…

Advanced Applications

Health information systems

How to monitor your health over 30 years

How to enable quick response to a heart attack

…need for interoperable database systems …


The Ambient Home


The Ambient Home

911 called

MonetDB DataCell

MonetDB DataCell

911 called


A Shared Tuple Space

using an SQL DBMS

Advanced applications

Government systems


Biometric data management issues, finger/image

Public safety

Forensics, manipulate complex objects using
proprietary algorithms

…need for extensible database technology…need
to support unstructured data…

Advanced Applications

Science systems

The new accelerator in CERN

how to handle >1PTByte files

The Sloan Digital Skyserver schema is 200
pages and the catalogued data 2.5Tb

How to query this efficiently

..need for P2P and … a novel way to organize

LOFAR central processor specs

Streaming Data

Input: 320 Gbit/s

Internally within correlator: 20 Tbit/s

Into storage: 25 Gbit/s = 250 TByte/day

Final products: 1
3 TByte/day

High Performance Computing

Correlation: 15 Tflops

Pre processing and filtering: 5 Tflops

line processing (calibration, analysis): 5
10 Tflops

Visualisation, control, scheduling etc: 2 Tflops


line temporal storage: 500 TByte

Archive: PByte range of data stored in Grid

Technological challenges

Data is often not structured as tables

XML and XQuery

Data does not always fit on one system

Distributed and parallel databases

Querying is more like world
wide searching

Continuous and streaming queries

A database tells more than facts

Datamining and knowledge discovery

Code bases

Database management systems are BIG software

Oracle, SQL
server, DB2 >1 M lines

PostgreSQL 300K lines

MySQL 500 K lines

MonetDB 200
800 K lines

SQLite 40K lines

Programmer teams for DBMS kernels range from
a few to a few hundred

Performance components

Hardware platform

Data structures

Algebraic optimizer

SQL parser

Application code

What is the total cost of execution ?

How many tasks can be performed/minute ?

How good is the optimizer?

What is the overhead of the datastructures ?

Not all are equal

Gaining insight

Study the code base (inspection + profiling)

Often not accessible outside development lab

Study individual techniques (data structures +

Focus of most PhD research in DBMS

Detailed knowledge becomes available, but ignores
the total cost of execution.

Study as a functional black box

Analyse a small application framework

The Jack The Ripper Project

Study the snippet of the database technology and
design an XQuery and SQL application

What is the schema?

What are the queries?

What are unorthodox solutions?

Learning points

My poor knowledge on relational database? Read
the chapters on SQL and relational algebra.
Knowledge on data structures comes in handy.

Database systems are much more than
administrative bookkeeping systems

Learning points

Advanced application challenge the technology
provided by a DBMS

Many techniques do not easily scale in size,
complexity, functionality

Effectiveness of a DBMS is determined by
many tightly interlocked components