Course «Advanced Databases» Syllabus [2011-2012 academic year]

Government of the Russian Federation

Federal State Autonomous Institution of Higher Professional Education
"National Research University Higher School of Economics"

Faculty of Business Informatics
School of Software Engineering

Course Program
"Advanced Databases"
(in English)

for the Master's degree program 231000.68 "Software Engineering"

Author of the program: Assoc. Prof., Cand. Sc. (Tech.) A.D. Breyman, abreyman@hse.ru

Recommended by the section of the Academic Methodological Council (UMS) for Business Informatics
Chair: Yu.V. Taratukhina
«_____» ___________________ 2011

Approved at the meeting of the Department of Software Development Management (URPO)
Head of Department: S.M. Avdoshin
«____» ______________________ 2011

Approved by the Academic Council of NRU HSE
Academic Secretary: V.A. Fomichev
«____» ___________________ 2011

Moscow




I. Summary


Author of the Program:

Assoc. Prof., PhD. Alexander D. Breyman
(responsible lecturer)


General Information about Training Course:

The course is offered to students of the Master's program «System and Software Engineering (SSE)» (code 231000.68) under the chief subjects «Methods and Theory of Software Engineering» and «Management of Software Development» in the School of Software Engineering, Faculty of Business Informatics of the National Research University Higher School of Economics (HSE).

This mandatory course belongs to the special subject curricula unit (М.2.Б unit / Base module [Special subject disciplines M.2] of the 2011-2012 academic year's working syllabus) covered by the list of training courses of the master's program (1st academic year).


It is a double-module course, delivered in modules #3 and #4 of the first academic year.

The number of credits is 5. The total course length is 180 academic hours, including 60 classroom hours (30 lecture (L) hours and 30 seminar (S) hours) and 120 self-study (SS) hours. The academic control forms are one home assignment, one test, and written exams after modules #3 and #4.


Prerequisites:

The course builds on the foundations of discrete mathematics (including logic and set theory), computer science, and computer programming. Students are assumed to be familiar with databases and to have a sound knowledge of database design and implementation, including entity-relationship data modeling, the relational model, relational algebra and calculus, functional dependencies and normalization theory, and relational query languages, including SQL.


Course Objectives:

The objective of the Advanced Databases course is to form professional competencies related to the design and implementation of non-relational databases, including object-oriented, object-relational, deductive, multidimensional and semistructured database models. Students will get a grasp of the strengths and weaknesses of a wide spectrum of approaches to data storage, search and retrieval, enabling an informed choice of database model.

Abstract:

The course is based on the “Advanced databases” course of Eindhoven University of Technology (Eindhoven, Netherlands), Faculty of Math and Computer Science; author and principal lecturer: Prof. Dr. Toon Calders.

While the Databases course of the bachelor curriculum covered many of the core concepts behind database modeling, design and implementation, there are many other considerations that should be addressed for a successful scientific or industry career in this field. The main objective of this course is to expand students' view and introduce advanced topics, including object-oriented extensions, deductive databases and appropriate query languages, data warehouses and online analytical data processing, and XML. The additional topics covered in this course will help students become more proficient in choosing appropriate technologies for projects and will broaden their understanding of the field. By the end of the course, students should have a solid grasp of business intelligence tools and XML, which will prove invaluable as they progress further in computer science studies.

This course studies different database models and their properties. The models that will be discussed
are:





• Object-oriented databases;
• Deductive databases (Datalog);
• Data warehousing and online analytical processing (ROLAP, MOLAP and HOLAP);
• The XML data model (query languages XQuery and XPath, DTDs).

For these conceptual models the course will concentrate on the following points:

• Why was the database model introduced? Which of the shortcomings of other models does it address?
• What are the most important concepts and notions for the database model?
• How is the model implemented? Which are the main techniques? The importance of understanding the internals of a particular database model cannot be overemphasized, as it is closely connected to its limitations.


Training Objectives:

After taking this course, the student should have achieved the following objectives:

• Knows the shortcomings and restrictions of the relational data model.
• Can reason about the expressibility of relational query languages using the notion of locality.
• Knows data storage methods usable for object-oriented software systems, including pure object database systems and object-relational mappers, and their advantages and disadvantages.
• Knows models and methods for organizing deductive databases using the Datalog language. Knows implementation and optimization techniques for a Datalog translator.
• Knows the reference architectures of data warehouses and is aware of the basic functionality offered by available commercial and free data warehousing systems. Masters methods and tools for creating analytical database solutions. Can choose dimensions for a multidimensional database, group them into hierarchies and define aggregates. Knows the principles of choosing views for materialization.
• Knows the XML document model, DTD and XML Schema. Can write and understand XPath and XQuery expressions. Can create and interpret XSLT transformations.

Students should be able to understand the language of the studied models, choose and use appropriate models and programming languages, and implement systems using the chosen models, methods and tools.


II. Topic-wise Curricula Plan

No.  Title of the topic                                  Total hours  Lectures  Seminars  Self-study
                                                                      (classroom hours)

Module #3 (hours: 20)
1    Introduction and overview of the course.            14           2         2         10
2    Expressibility of the relational algebra and SQL.   14           2         2         10
3    Object-oriented databases.                          28           4         4         20
4    Deductive databases.                                24           2         2         20
     Module #3 totals                                    80           10        10        60

Module #4 (hours: 40)
5    Data warehousing and online analytical processing.  46           8         8         30
6    Semistructured data and XML.                        52           10        12        30
7    Conclusion.                                         2            2
     Module #4 totals                                    100          20        20        60

     Total:                                              180          30        30        120


III. Base Textbook(s) and Recommended Readings



Printed Sources:



• Silberschatz A., Korth H.F., Sudarshan S. Database System Concepts, 6th ed., McGraw-Hill, 2010. 1376 pp.
• Garcia-Molina H., Ullman J., Widom J. Database Systems: The Complete Book, 2nd Edition, Prentice Hall, 2009. 1248 pp.
• Libkin L. Expressive power of SQL. // Theoretical Computer Science, 296(3), pp. 379-404, 2003.
• Elmasri R., Navathe S.B. Fundamentals of Database Systems, 6th ed., Addison Wesley, 2010. 1200 pp.
• Blaha M. Patterns of Data Modeling (Emerging Directions in Database Systems and Applications), CRC Press, 2010. 261 pp.
• Cattell R.G. et al. The Object Data Standard: ODMG 3.0, Morgan Kaufmann, 2000. 280 pp.
• Edlich S. et al. The Definitive Guide to db4o, Apress, 2006. 512 pp.
• Rattz J., Freeman A. Pro LINQ: Language Integrated Query in C# 2010, Apress, 2010. 840 pp.
• Marguerie F. et al. LINQ in Action, Manning Publications, 2008. 600 pp.
• Kuate P.H. et al. NHibernate in Action, Manning Publications, 2009. 400 pp.
• Nilsson U., Maluszynski J. Logic, Programming and Prolog, 2nd Edition, John Wiley and Sons, 2000. 294 pp. (Also online: http://www.ida.liu.se/~ulfni/lpp/)
• Golfarelli M., Rizzi S. Data Warehouse Design: Modern Principles and Methodologies, McGraw-Hill Osborne Media, 2009. 480 pp.
• Inmon W.H., Krishnan K. Building the Unstructured Data Warehouse: Architecture, Analysis, and Design, Technics Publications, 2011. 216 pp.
• Celko J. Joe Celko's Analytics and OLAP in SQL, Morgan Kaufmann, 2006. 208 pp.
• Smith B.C., Clay C.R. Microsoft SQL Server 2008 MDX Step by Step, Microsoft Press, 2009. 400 pp.
• Melton J., Buxton S. Querying XML: XQuery, XPath, and SQL/XML in Context, Morgan Kaufmann, 2006. 848 pp.
• Ben-Gan I. et al. Inside Microsoft SQL Server 2008: T-SQL Programming, Microsoft Press, 2009. 832 pp.
• Hunter D. XML. Basic Course [XML. Базовый курс]. Kiev: Dialektika, 2009. 1344 pp. (in Russian)
• Meyer B. Object-Oriented Construction of Software Systems [Объектно-ориентированное конструирование программных систем]. Moscow: Russkaya Redaktsiya, 2005. 1198 pp. (in Russian)
• Berger A. Microsoft SQL Server 2005 Analysis Services. OLAP and Multidimensional Data Analysis [OLAP и многомерный анализ данных]. St. Petersburg: BHV-Petersburg, 2007. 905 pp. (in Russian)
• Bratko I. Artificial Intelligence Algorithms in PROLOG [Алгоритмы искусственного интеллекта на языке PROLOG]. Moscow: Williams, 2004. 637 pp. (in Russian)


IV. Forms of Control

Current control: attendance record, seminar-based knowledge control, group project control;

Intermediate control: written test by the end of Module 3, group project presentation by the end of Module 4;


Final control: exam by the end of Module 4;

The final course grade is a weighted sum of the following elements:

1) attendance record (A);
2) practice activities (reports, discussions, business cases) (S);
3) group project (P);
4) written test (T);
5) exam (E).


The overall and accumulated course grades Go and Ga (10-point scale each) are calculated as follows:

Ga = 0.15A + 0.15S + 0.3P + 0.4T;

Go = 0.1A + 0.1S + 0.2P + 0.2T + 0.4E.


The overall and accumulated course grades Go and Ga (10-point scale each) include the results achieved by students in their attendance record A, practice activities S, group project P, written test T and exam E; each grade is rounded up to an integer number of points. The rounding procedure accounts for students' practice activities during seminars. Intermediate assessment retakes are not allowed.
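
As an illustration only, the two weighted sums above can be sketched in Python. The function names and the sample component grades are hypothetical, and the sketch deliberately stops before rounding, because the syllabus ties the rounding procedure to seminar activity and does not spell it out.

```python
def accumulated_grade(a: float, s: float, p: float, t: float) -> float:
    """Ga = 0.15A + 0.15S + 0.3P + 0.4T, before rounding."""
    return 0.15 * a + 0.15 * s + 0.3 * p + 0.4 * t


def overall_grade(a: float, s: float, p: float, t: float, e: float) -> float:
    """Go = 0.1A + 0.1S + 0.2P + 0.2T + 0.4E, before rounding."""
    return 0.1 * a + 0.1 * s + 0.2 * p + 0.2 * t + 0.4 * e


if __name__ == "__main__":
    # Hypothetical component grades on the 10-point scale.
    print(accumulated_grade(a=8, s=6, p=7, t=9))
    print(overall_grade(a=8, s=6, p=7, t=9, e=8))
```

In both formulas the weights add up to 1, so a student with the same grade in every component gets that grade as the unrounded result.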

Conversion of the final rounded grade to the five-point scale is done in accordance with the following table:

Summary Table: Correspondence of ten-point (10) to five-point (5) system's marks

Ten-point scale [10]        Five-point scale [5]
1 - unsatisfactory          Unsatisfactory (UnS) - 2
2 - very bad
3 - bad
4 - satisfactory            Satisfactory (S) - 3
5 - quite satisfactory
6 - good                    Good (G) - 4
7 - very good
8 - nearly excellent        Excellent (E) - 5
9 - excellent
10 - brilliant



V. Course Contents


Topic 1. Introduction and overview of the course.

Lecture outline:

• Structure of the course.
• Restrictions of the relational model.
• Object-oriented databases.
• Deductive databases.
• Data warehousing and OLAP.
• Semistructured data and XML.
• Project.
• Grading.


Practical studies:


1. Relational query languages refresher
• Formulate queries for a given typical relational database using relational algebra, SQL and Datalog.
• Express graph-theoretic queries (e.g., give all nodes that do not have any incoming edges) in either SQL or relational algebra.

Main reference/books/reading:

1. Silberschatz A., Korth H.F., Sudarshan S. Database System Concepts, 6th ed., McGraw-Hill, 2010. 1376 pp.

Additional reference/books/reading:

1. Garcia-Molina H., Ullman J., Widom J. Database Systems: The Complete Book, 2nd Edition, Prentice Hall, 2009. 1248 pp.
2. Elmasri R., Navathe S.B. Fundamentals of Database Systems, 6th ed., Addison Wesley, 2010. 1200 pp.


Topic 2. Expressibility of the relational algebra and SQL.

Lecture outline:

• Why the relational model is insufficient.
• Expressiveness of relational algebra.
• Limitations of relational query languages concerning the inexpressibility of certain classes of queries, e.g. transitive closure and other recursive queries.
• Gaifman locality.
• Historical overview of non-relational models.
• Nested relational algebra.
• Nesting in SQL dialects.
• Extending SQL with recursion.

Practical studies:

1. Locality
• Construct the Gaifman graph for a given database and identify the spheres and neighborhoods for given tuples.
• Construct recursive SQL queries.
• Determine the locality rank for a given query.

2. Limitations of the relational model
• Write an SQL query using a recursive view definition.
• Express queries using nested relational algebra.
• Rewrite nested relational algebra expressions into flat ones.
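
To make the recursive-query exercise concrete, here is a minimal sketch using Python's standard sqlite3 module, assuming a SQLite build with recursive common table expressions (3.8.3 or later). The table name edge, the helper reachable_from and the sample graph are hypothetical and do not come from the course materials. The query computes the nodes reachable from a start node, which is exactly the kind of transitive-closure query that plain relational algebra cannot express.

```python
import sqlite3


def reachable_from(edges, start):
    """Return, sorted, all nodes reachable from `start` via one or more edges."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT dst FROM edge WHERE src = ?          -- base case: direct successors
            UNION                                       -- UNION drops duplicates, so cycles terminate
            SELECT e.dst FROM edge AS e
            JOIN reach AS r ON e.src = r.node           -- recursive step: follow one more edge
        )
        SELECT node FROM reach ORDER BY node
        """,
        (start,),
    ).fetchall()
    conn.close()
    return [node for (node,) in rows]


# Hypothetical sample graph: a -> b -> c -> a, and d -> a.
EDGES = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]

if __name__ == "__main__":
    print(reachable_from(EDGES, "d"))
```

Using UNION rather than UNION ALL matters here: on a cyclic graph, UNION ALL would keep re-deriving the same nodes and the recursion would not stop.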

Main reference/books/reading:

1. Silberschatz A., Korth H.F., Sudarshan S. Database System Concepts, 6th ed., McGraw-Hill, 2010. 1376 pp. [Chapter 9]
2. Libkin L. Expressive power of SQL. // Theoretical Computer Science, 296(3), pp. 379-404, 2003.
3. Libkin L. On the forms of locality over finite models. // Logic in Computer Science, 1997. LICS’97. Proceedings of the 12th Annual IEEE Symposium, pp. 204-215, 1997.
4. Hella L., Libkin L., Nurmonen J. Notions of locality and their logical characterizations over finite models. // Journal of Symbolic Logic, 64(4), pp. 1751-1773, 1999.

Additional reference/books/reading:


1. Garcia-Molina H., Ullman J., Widom J. Database Systems: The Complete Book, 2nd Edition, Prentice Hall, 2009. 1248 pp.
2. Elmasri R., Navathe S.B. Fundamentals of Database Systems, 6th ed., Addison Wesley, 2010. 1200 pp.
3. Ben-Gan I. et al. Inside Microsoft SQL Server 2008: T-SQL Programming, Microsoft Press, 2009. 832 pp.


Topic 3. Object-oriented databases.

Lecture outline:

• Applications that require storage and manipulation of complex data.
• Object-oriented programming languages for manipulating complex objects.
• Mapping objects to tuples in relations.
• Impedance mismatch.
• Extending SQL with complex types: collections, structures, inheritance, references.
• Methods for complex types.
• Notion of persistence.
• Persistent programming languages.
• Persistent objects.
• Object identity.
• Query languages for object-oriented databases.
• Object-relational databases.
• Object-relational mapping.

Practical studies:

1. Object-oriented databases
• Define an ODMG-compliant database schema.
• Write queries for the db4o system.

2. Object-relational mappers
• Write LINQ queries for Microsoft SQL Server.
• Write JDO-enabled Java programs.
• Create .NET classes with NHibernate-based persistence.


Main reference/books/reading:

1. Silberschatz A., Korth H.F., Sudarshan S. Database System Concepts, 6th ed., McGraw-Hill, 2010. 1376 pp. [Chapter 22]
2. Cattell R.G. et al. The Object Data Standard: ODMG 3.0, Morgan Kaufmann, 2000. 280 pp.
3. Jordan D., Russell C. Java Data Objects. O’Reilly Media, 2003. 384 pp.

Additional reference/books/reading:

1. Garcia-Molina H., Ullman J., Widom J. Database Systems: The Complete Book, 2nd Edition, Prentice Hall, 2009. 1248 pp.
2. Elmasri R., Navathe S.B. Fundamentals of Database Systems, 6th ed., Addison Wesley, 2010. 1200 pp.
4. Ben-Gan I. et al. Inside Microsoft SQL Server 2008: T-SQL Programming, Microsoft Press, 2009. 832 pp.
5. Edlich S. et al. The Definitive Guide to db4o, Apress, 2006. 512 pp.


6. Rattz J., Freeman A. Pro LINQ: Language Integrated Query in C# 2010, Apress, 2010. 840 pp.
7. Marguerie F. et al. LINQ in Action, Manning Publications, 2008. 600 pp.
8. Bauer C. et al. NHibernate in Action, Manning Publications, 2009. 400 pp.
9. Van den Bussche J. Simulation of the nested relational algebra by the flat relational algebra, with an application to the complexity of evaluating powerset algebra expressions. Theor. Comput. Sci., 254(1-2), pp. 363-377, 2001.


Topic 4. Deductive databases.

Lecture outline:

• Logic programming and Prolog.
• Combining rules and facts in one database.
• Intensional and extensional relations.
• Datalog syntax.
• Semantics of a Datalog program.
• Non-recursive Datalog programs.
• Negation.
• Safety of rules.
• Model-theoretic semantics.
• Recursive Datalog programs.
• Fixpoint evaluation.
• Stratified programs and strata.
• Aggregation.
• Equivalence of relational algebra and safe Datalog with negation, without recursion and aggregation.
• Evaluation and optimization of Datalog programs: avoiding repeated and unnecessary inferences, filtering with “magic sets”, indexing, materialization.

Practical studies:

1. Datalog without recursion
• Write a Datalog program that defines intensional views for certain kinship types.
• Give relational algebra expressions for views defined by given Datalog programs.
• Give a minimal model of a Datalog program.
• Reason about the safety of a Datalog program.
• Give the contents of the stratified model of a Datalog program.

2. Recursive Datalog
• Show the steps of naïve and semi-naïve evaluation of recursion for computing a given intensional view.
• Give relational algebra expressions for views defined by given Datalog programs.
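
As a rough illustration of semi-naïve evaluation, the sketch below computes the recursive view path(X,Y) :- edge(X,Y). path(X,Y) :- path(X,Z), edge(Z,Y). in plain Python. The predicate names edge and path and the function name are hypothetical choices for this example; a real Datalog engine works on arbitrary rule sets, while this sketch hard-codes the two rules.

```python
from collections import defaultdict


def transitive_closure_seminaive(edges):
    """Semi-naive fixpoint for:
         path(X, Y) :- edge(X, Y).
         path(X, Y) :- path(X, Z), edge(Z, Y).
    """
    edges = set(edges)
    successors = defaultdict(set)      # index on the join column of edge(Z, Y)
    for z, y in edges:
        successors[z].add(y)

    path = set(edges)                  # result of the non-recursive rule
    delta = set(edges)                 # facts that are new in the latest round
    while delta:
        # Join only the NEW path facts with edge; older facts were already
        # joined in earlier rounds, which is what avoids repeated inferences.
        derived = {(x, y) for (x, z) in delta for y in successors[z]}
        delta = derived - path
        path |= delta
    return path


if __name__ == "__main__":
    print(sorted(transitive_closure_seminaive([(1, 2), (2, 3), (3, 4)])))
```

A naïve evaluator would instead join the whole of path with edge in every round; both reach the same fixpoint, but the naïve version re-derives every earlier fact each time.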

Main reference/books/reading:

1. Silberschatz A., Korth H.F., Sudarshan S. Database System Concepts, 6th ed., McGraw-Hill, 2010. 1376 pp. [Chapter ]
2. Ramakrishnan R., Ullman J. A Survey of Research on Deductive Databases. // Journal of Logic Programming 23(2), pp. 125-149, 1995.


3. Bancilhon F., Maier D., Sagiv Y., Ullman J. Magic Sets and Other Strange Ways to Implement Logic Programs. // In Proceedings ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS’86), pp. 1-15, 1986.
4. Minker J. Logic and Databases: A 20 Year Retrospective. // In D. Pedreschi and C. Zaniolo (eds.), Proceedings International Workshop on Logic in Databases (LID’96), Springer-Verlag, New York, pp. 5-52, 1996.
5. Nilsson U., Maluszynski J. Logic, Programming and Prolog, 2nd Edition, John Wiley and Sons, 2000. 294 pp. (Also online: http://www.ida.liu.se/~ulfni/lpp/)

Additional reference/books/reading:

1. Garcia-Molina H., Ullman J., Widom J. Database Systems: The Complete Book, 2nd Edition, Prentice Hall, 2009. 1248 pp.
2. Elmasri R., Navathe S.B. Fundamentals of Database Systems, 6th ed., Addison Wesley, 2010. 1200 pp.
3. Naqvi S., Tsur S. A Logic Language for Data and Knowledge Bases. Computer Science Press, New York, 1989.
4. Mumick I.S., Finkelstein S.J., Pirahesh H., Ramakrishnan R. Magic is Relevant. // In Proceedings of the 1990 ACM SIGMOD international conference on Management of data.


Topic 5. Data warehousing and OLAP.

Lecture outline:

• Data management requirements of decision support systems.
• Historical, summarized, integrated data.
• Statistical and analytical queries.
• Business intelligence applications.
• Three-tier architecture.
• Extract-transform-load process.
• Data warehouse, data mart.
• Online analytical processing (OLAP).
• Conceptual models for decision support.
• Multidimensional view of the data.
• Cross-tabulation.
• Data cubes.
• Operations with data cubes: roll-up, drill-down, pivot, slice & dice, select.
• Query languages for supporting OLAP.
• SQL extensions: group by cube, group by rollup.
• Multidimensional expressions (MDX).
• Data explosion problem.
• View materialization: optimal set of views.
• Partial order on views, cost model, greedy algorithm.
• Relational OLAP (ROLAP): star schema, snowflake schema, snowflake constellation.
• Multi-dimensional OLAP (MOLAP): multicubes and hypercubes, sparse and dense dimensions.
• Indexing of dimensions: B-tree, bitmap, join indexes.
• Hybrid OLAP (HOLAP).
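
The data cube and "group by cube" items in the outline above can be illustrated with a small Python sketch. It is only a toy under stated assumptions: the function cube, the ALL marker and the sample SALES rows are hypothetical, and the measure is simply summed. For every subset of the dimensions it produces the corresponding group-by, with the dropped dimensions replaced by ALL.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # marker for a dimension that has been aggregated away


def cube(rows, dims, measure):
    """Sum `measure` for every group-by over every subset of `dims`."""
    result = defaultdict(int)
    for r in range(len(dims) + 1):
        for kept in combinations(dims, r):
            for row in rows:
                # Keep the value of grouped dimensions, collapse the others.
                key = tuple(row[d] if d in kept else ALL for d in dims)
                result[key] += row[measure]
    return dict(result)


# Hypothetical fact table with two dimensions and one measure.
SALES = [
    {"item": "pen", "city": "Moscow", "qty": 3},
    {"item": "pen", "city": "Kazan", "qty": 2},
    {"item": "ink", "city": "Moscow", "qty": 5},
]

if __name__ == "__main__":
    for cell, total in sorted(cube(SALES, ["item", "city"], "qty").items()):
        print(cell, total)
```

With two values per dimension a fully dense cube would have (2+1)*(2+1) = 9 cells; the sample data yields 8, because no row combines "ink" with "Kazan". The gap between the two counts is the dense versus sparse distinction, and its growth with the number of dimensions is the data explosion problem named in the outline.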

Practical studies:


1. Data warehousing and datacubes
• Give a relational schema for a certain data warehouse with sales data.
• Express datacube aggregation in SQL:1999.
• Count datacube cells for a given relation in dense and sparse settings.
• Determine sizes of views for a given datacube.
• Choose views to materialize by applying the greedy algorithm.

2. ROLAP, MOLAP and HOLAP
• Choose a set of views for materialization for a demographic database based on the uniform access path distribution assumption.
• Choose a set of views for materialization for a demographic database based on a known workload.
• Propose a suitable physical organization for a given datacube.
• Construct bitmap indexes and use them for query answering.
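
For the bitmap-index exercise, a minimal sketch in Python follows. It assumes one bitmap per distinct column value, stored as a Python integer whose bit i is set when row i has that value; the helper names and the PEOPLE rows are hypothetical. A conjunctive selection is then answered with a bitwise AND of the relevant bitmaps.

```python
def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int used as a bitmap over row positions."""
    index = {}
    for pos, row in enumerate(rows):
        value = row[column]
        index[value] = index.get(value, 0) | (1 << pos)
    return index


def matching_rows(bitmap):
    """Return the row positions whose bit is set in `bitmap`."""
    return [pos for pos in range(bitmap.bit_length()) if (bitmap >> pos) & 1]


# Hypothetical demographic rows; the list position is the row identifier.
PEOPLE = [
    {"gender": "F", "region": "north"},
    {"gender": "M", "region": "north"},
    {"gender": "F", "region": "south"},
    {"gender": "F", "region": "north"},
]

gender_idx = build_bitmap_index(PEOPLE, "gender")
region_idx = build_bitmap_index(PEOPLE, "region")

# gender = 'F' AND region = 'north' becomes a single bitwise AND.
hits = matching_rows(gender_idx["F"] & region_idx["north"])

if __name__ == "__main__":
    print(hits)
```

Disjunctions and negations map to bitwise OR and to AND with a complemented bitmap in the same way, which is why bitmap indexes suit low-cardinality dimension attributes.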

Main reference/books/reading:

1. Silberschatz A., Korth H.F., Sudarshan S. Database System Concepts, 6th ed., McGraw-Hill, 2010. 1376 pp. [Chapter 20]
2. Golfarelli M., Rizzi S. Data Warehouse Design: Modern Principles and Methodologies, McGraw-Hill Osborne Media, 2009. 480 pp.
3. Celko J. Joe Celko's Analytics and OLAP in SQL, Morgan Kaufmann, 2006. 208 pp.
4. Smith B.C., Clay C.R. Microsoft SQL Server 2008 MDX Step by Step, Microsoft Press, 2009. 400 pp.
5. Chaudhuri S., Dayal U. An overview of data warehousing and OLAP technology. // SIGMOD Record, v.26 n.2, pp. 507-508, 1997.
6. Harinarayan V., Rajaraman A., Ullman J. Implementing Data Cubes Efficiently. // In Proceedings of the 1996 ACM SIGMOD international conference on Management of data (SIGMOD '96), pp. 205-216, 1996.
7. Jensen C., Pedersen T., Thomsen C. Multidimensional Databases and Data Warehousing, Morgan & Claypool Publishers, 2010. 111 pp.

Additional reference/books/reading:

1. Garcia-Molina H., Ullman J., Widom J. Database Systems: The Complete Book, 2nd Edition, Prentice Hall, 2009. 1248 pp.
2. Elmasri R., Navathe S.B. Fundamentals of Database Systems, 6th ed., Addison Wesley, 2010. 1200 pp.
3. Rainardi V. Building a Data Warehouse: With Examples in SQL Server. Apress, 2008. 541 pp.
4. Inmon W.H., Krishnan K. Building the Unstructured Data Warehouse: Architecture, Analysis, and Design, Technics Publications, 2011. 216 pp.
5. Kimball R., Ross M. The Data Warehouse Toolkit, Wiley, 2002. 447 pp.


Topic 6. Semistructured data. XML.

Lecture outline:

• Semistructured data for human and machine consumption.
• XML: tags, elements, attributes, values.
• Well-formed and valid documents.




• Namespaces.
• XPath: axes, node-tests, predicates.
• XQuery: expressions, functions.
• FLWOR expressions.
• XQuery data model.
• DTD and XML Schema.
• Simple and complex types of XML Schema.
• Light XQuery.
• Extensible Stylesheet Language Transformations (XSLT): templates, parameters, variables.
• XML data management.

Practical studies:

1. XML and XPath
• Correct errors in an XML document and build its tree representation.
• Write XPath expressions for selecting node sets.
• Check and prove equivalence of XPath expressions.

2. XQuery expressions and functions
• Explain the results of given XQuery expressions.
• Write XQuery expressions for given queries.
• Write XQuery functions that create a new document based on an existing document.
• Write XQuery functions that transform attributes into elements.
• Write an XQuery function that computes the transitive closure of a graph.

3. Advanced XQuery
• Check and prove equivalence of XQuery expressions.
• Write an XQuery expression for a complex transformation of an XML document.

4. Efficiently querying large XML repositories
• Build an element tree, annotated for indexing, for an XML document.
• Write expressions for axis navigation in an annotated element tree.
• Divide the nodes of an annotated element tree into equivalence classes.
• Encode an edge-labelled tree for storing it in a relational table.
• Write SQL queries for computing the results of given path expressions.
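
As a small, hedged illustration of the XPath exercises, the sketch below uses Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath (child and descendant steps and simple attribute predicates), not the full language covered in the lectures. The document content and element names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample document.
DOC = """
<library>
  <book year="2010"><title>Intro to XML</title></book>
  <book year="2006"><title>Datalog Basics</title></book>
  <journal year="2010"><title>Data Notes</title></journal>
</library>
"""

root = ET.fromstring(DOC)

# Child axis, node test and attribute predicate: titles of books from 2010.
titles_2010 = [t.text for t in root.findall("./book[@year='2010']/title")]

# Descendant step: every title element, in document order.
all_titles = [t.text for t in root.findall(".//title")]

if __name__ == "__main__":
    print(titles_2010)
    print(all_titles)
```

The first path does not match the journal even though its year is 2010, because the node test book restricts the step to book elements; comparing the two results is a quick way to see how node tests and predicates interact.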

Main reference/books/reading:

1. Silberschatz A., Korth H.F., Sudarshan S. Database System Concepts, 6th ed., McGraw-Hill, 2010. 1376 pp. [Chapter 23]
2. Melton J., Buxton S. Querying XML: XQuery, XPath, and SQL/XML in Context, Morgan Kaufmann, 2006. 848 pp.
3. Ben-Gan I. et al. Inside Microsoft SQL Server 2008: T-SQL Programming, Microsoft Press, 2009. 832 pp.
4. Gou G., Chirkova R. Efficiently Querying Large XML Data Repositories: A Survey. // IEEE Transactions on Knowledge and Data Engineering, v. 19, no. 10, 2007.

Additional reference/books/reading:

1. Garcia-Molina H., Ullman J., Widom J. Database Systems: The Complete Book, 2nd Edition, Prentice Hall, 2009. 1248 pp.
2. Elmasri R., Navathe S.B. Fundamentals of Database Systems, 6th ed., Addison Wesley, 2010. 1200 pp.


3. Thuraisingham B. XML Databases and the Semantic Web. CRC Press, 2002. 300 pp.


Topic 7. Conclusion.

Lecture outline:

The last theory lecture will conclude the course with a summary of the material covered during the course. Students will get the opportunity to ask questions. We will review and discuss some past exams.


VI. Assignment topics

Home assignment for group project:

The Advanced Databases course includes a group project, compulsory for all students. Students will work in groups of up to 5 students on either one of the suggested topics or a subject of their own choice.

Projects can be of two types:

1. Application: build an application using the techniques studied in the course. Motivate the choices you made during the project; e.g., why did you choose to store user data in XML, why do you use an object-oriented database instead of a relational one, etc. The grade is determined not only by the complexity of your application, but also (and even more) by the appropriate use of the right technologies.

2. Research: study some of the topics in more depth; e.g., one of the topics is OLAP, and it is argued that such systems can handle ad-hoc analytical queries more efficiently. Test this by constructing a database and benchmarking the performance of certain types of queries in both systems. What is the influence of a particular indexing technique? Another example: take one of the research papers and validate its results by repeating the experiments; make a critical analysis, and show weaknesses and opportunities for the approach proposed in the paper.

The only restriction on the subject of the project is that it needs to involve the techniques that have been studied in the course; i.e., the subject needs to relate to at least one of the following topics:

1. Object-oriented databases (this includes persistent programming languages) or object-relational extensions to relational databases,
2. Datalog or other rule-based deduction engines,
3. Semistructured data (XML),
4. Data warehousing and Online Analytical Processing (OLAP).

In the project students can use available implementations and tools; i.e., you do not have to write your own XQuery processor or your own Datalog engine. You can use existing database management systems, such as MySQL, PostgreSQL, MonetDB for relational support; Galax, MonetDB/XQuery for XPath and XQuery functionality; available Java and PHP libraries; and object-oriented databases such as Objectivity, ObjectStore, etc.

A good guideline to take into account when formulating a project proposal is that the workload is expected to be between 30 and 40 hours per group member. Please note that the main workload should be on course-related topics; e.g., spending two weeks on making a dazzling GUI might turn out to be less effective for getting a good grade than working a couple of days on improving the database structure.

To successfully complete the project, several deliverables need to be provided:

Project proposal. Latest date proposals will be accepted: 01.04.2012. This document must contain the following information:


1. Group members;

2. Project description:

(application) Which data are you planning to use? What will be the functionalities of the application? Make a detailed list of the functionalities.

(research) Which papers will you discuss? What are the claims you will test? What experiments will you consider? How will you evaluate the results?

3. A plan: how long do you expect to work on each of the components?

4. Overview of the technologies you are planning to use.

The complete project proposal report is expected to be around 2 pages (excluding figures).

Final report. Due date: 01.06.2012.

(application) This document describes the inside of the application: how is the data stored? What are the DTDs/schemas that have been used? Which queries are used by the system in order to retrieve the data? In general, include all technical details that you think show your ability to use the material that was lectured.

(research) Which papers have been read? Give a short summary in your own words. List the claims that have been tested. Explain the experimental setup. Discuss the outcome.

There are no strict lower or upper bounds on the length of this document, but a good guideline is around 15 pages, including illustrative queries, XML fragments, screenshots, etc. Please note that not only completeness but certainly also succinctness is an important quality of a good report.


Written test:

The written test is a testing assignment based on the covered topics.


VII. Topics for course final quality assessment


Exam topics:

• Expressiveness of relational algebra.
• Limitations of relational query languages.
• Gaifman locality.
• Nested relational algebra.
• Extending SQL with recursion.
• Mapping objects to tuples in relations.
• Extending SQL with complex types: collections, structures, inheritance, references.
• Persistent programming languages.
• Query languages for object-oriented databases.
• Object-relational mappers.
• Intensional and extensional relations.
• Semantics of a Datalog program.
• Notion of safety of a Datalog rule.
• Model-theoretic semantics.
• Recursive Datalog programs.
• Fixpoint evaluation.
• Stratified programs and strata.
• Evaluation and optimization of Datalog programs: avoiding repeated and unnecessary inferences.
• Evaluation and optimization of Datalog programs: filtering with “magic sets”.




• Evaluation and optimization of Datalog programs: indexing and materialization.
• Data management requirements of decision support systems.
• Extract-transform-load process.
• Conceptual models for decision support.
• Multidimensional view of the data.
• Operations with data cubes: roll-up, drill-down, pivot, slice & dice, select.
• Query languages for supporting OLAP.
• SQL extensions: group by cube, group by rollup.
• Multidimensional expressions (MDX).
• View materialization: optimal set of views, partial order on views, cost model, greedy algorithm.
• Relational OLAP (ROLAP): star schema, snowflake schema, snowflake constellation.
• Multi-dimensional OLAP (MOLAP): multicubes and hypercubes, sparse and dense dimensions.
• Indexing of dimensions: bitmap indexes.
• Indexing of dimensions: join indexes.
• Hybrid OLAP (HOLAP).
• XML: tags, elements, attributes, values.
• Well-formed and valid documents.
• XPath: axes, node-tests, predicates.
• XQuery expressions.
• XQuery functions.
• FLWOR expressions.
• XQuery data model.
• DTD.
• XML Schema: simple and complex types.
• Extensible Stylesheet Language Transformations (XSLT): templates, parameters, variables.


The author of program: _____________________ A.D.Breyman