CS 730R: Advanced Database Systems

and their application to biomedical problems

Spring 2011


Fusheng Wang



Center for Comprehensive Informatics

Challenges


New applications and data types:


Spatial data and temporal data


XML data over the Web


Complex biomedical data


Semantic Web and ontologies


“Big data”


What are the limitations of RDBMSs and SQL? To what
extent can they be extended?


What are the effective models, languages and
indexing methods to support new data types?


How to manage and integrate biomedical data?


How to support semantics-enabled data management?


How can DBMSs be scaled (if possible) to manage and
query big data? What about MapReduce?



Course Introduction


The course covers recent advances in DBMSs and
their application to biomedical problems


Extensibility and extensions of database systems


XML databases and the XQuery language


Spatial databases and medical applications


Temporal data modeling and queries


Biomedical data management and integration


Semantics enabled data management


Parallel and distributed databases



Course Information


Schedule: MW 4:00-5:15pm


Prerequisites


Basic data structures and database background (CS 377)


Familiarity with Java preferred but not required


Grading: Homework + Project (no exams)


Project-driven


Course projects from biomedical research environments:
real data and databases


Students will be mentored on projects


IBM DB2 used for all projects


Successfully completed projects may lead to publications





Projects


Optimization of spatial queries for large-scale
biomedical databases


Comparative study of parallel databases and
MapReduce


Semantics-enabled queries for an XML-based
biomedical database


Design and development of a relational and spatial
database for the Annotation and Image Markup (AIM)
standard


Modeling and implementation of a pathology image
database based on the latest DICOM standard









Rotation Projects and Student Job


These projects could also be taken as rotation
projects



Student job opening: advanced biomedical
database research and development


Data modeling, database extensions, query
optimization, database maintenance, and scaling
databases to petabytes of data on cluster
infrastructure




Course Wiki:


https://web.cci.emory.edu/confluence/display/CS730R




Questions?










Topics



Advanced SQL Queries and Database Extensibility


Advanced SQL queries


OLAP queries


Recursive queries


Database extensibility


Object relational databases


User defined functions


Stored procedures and PL/SQL


Database extenders
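As a small illustration of the recursive queries listed above, here is a minimal sketch, assuming a hypothetical `is_a` table that stores a term hierarchy (for example, part of an ontology). The recursive common table expression starts from a term's direct parents and repeatedly joins back to `is_a` until no new ancestors appear. SQLite is used only because it ships with Python; the course uses DB2, whose recursive-query syntax may differ in details.

```python
import sqlite3

# In-memory database with a hypothetical "is_a" term hierarchy
# (table, column, and term names are illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE is_a (child TEXT, parent TEXT)")
conn.executemany(
    "INSERT INTO is_a VALUES (?, ?)",
    [
        ("glioblastoma", "glioma"),
        ("glioma", "brain tumor"),
        ("brain tumor", "neoplasm"),
        ("melanoma", "neoplasm"),
    ],
)

# Recursive CTE: the first SELECT seeds the result with direct parents;
# the second SELECT climbs one level per iteration. UNION removes
# duplicates, so evaluation stops once no new ancestors are found.
ANCESTORS_SQL = """
WITH RECURSIVE ancestors(term) AS (
    SELECT parent FROM is_a WHERE child = ?
    UNION
    SELECT is_a.parent
    FROM is_a JOIN ancestors ON is_a.child = ancestors.term
)
SELECT term FROM ancestors
"""

def ancestors(term):
    """Return the set of all transitive ancestors of `term`."""
    return {row[0] for row in conn.execute(ANCESTORS_SQL, (term,))}

print(sorted(ancestors("glioblastoma")))  # ['brain tumor', 'glioma', 'neoplasm']
```

The same transitive closure would need an unknown number of self-joins in plain SQL; the recursive form expresses it once, independent of the hierarchy's depth.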





XML Data Management


Introduction to XML


XML query languages: XPath and XQuery


Native XML databases


XML data indexing methods


XML for biomedical applications
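To make the XPath topic concrete, the sketch below runs a path expression over a small XML document. The `<study>`, `<image>`, `<annotation>`, and `<area>` elements are hypothetical and do not follow any real biomedical schema. Python's standard `xml.etree.ElementTree` supports only a limited XPath subset (no XQuery), which is enough to show descendant navigation (`.//`) and an attribute predicate (`[@type='tumor']`).

```python
import xml.etree.ElementTree as ET

# A small, hypothetical XML document (element and attribute names are
# illustrative only).
DOC = """
<study id="S1">
  <image id="img1">
    <annotation type="tumor"><area unit="mm2">12.5</area></annotation>
    <annotation type="necrosis"><area unit="mm2">3.0</area></annotation>
  </image>
  <image id="img2">
    <annotation type="tumor"><area unit="mm2">7.25</area></annotation>
  </image>
</study>
"""

root = ET.fromstring(DOC.strip())

# './/' descends to any depth; [@type='tumor'] keeps only elements whose
# "type" attribute equals "tumor".
tumors = root.findall(".//annotation[@type='tumor']")

# Read each tumor annotation's <area> child and convert its text to a float.
tumor_areas = [float(a.find("area").text) for a in tumors]

print(tumor_areas)       # [12.5, 7.25]
print(sum(tumor_areas))  # 19.75
```

A native XML database would evaluate such paths with structural indexes rather than by walking the whole tree as this in-memory version does.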




Spatial Data Management


Spatial logical models and query languages


Spatial access methods


Spatial joins


Spatial databases for biomedical imaging
applications
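Spatial joins are commonly evaluated in two steps: a cheap filter step on minimum bounding rectangles (MBRs), then a refine step that tests the exact geometries of the surviving candidate pairs. The sketch below shows only the filter step with a naive nested loop, assuming hypothetical MBRs for segmented nuclei and annotated image regions; the names `Rect` and `mbr_join` are illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Rect:
    """Axis-aligned minimum bounding rectangle (MBR)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "Rect") -> bool:
        # Disjoint only if one box lies entirely left, right, below, or
        # above the other; boxes that share an edge count as intersecting.
        return not (self.xmax < other.xmin or other.xmax < self.xmin or
                    self.ymax < other.ymin or other.ymax < self.ymin)

def mbr_join(r: List[Tuple[str, Rect]],
             s: List[Tuple[str, Rect]]) -> List[Tuple[str, str]]:
    """Filter step: candidate (r_id, s_id) pairs whose MBRs intersect.

    Naive nested loop, O(|r| * |s|); this compares every pair.
    """
    return [(rid, sid) for rid, rbox in r for sid, sbox in s
            if rbox.intersects(sbox)]

# Hypothetical MBRs of segmented nuclei and of annotated regions.
nuclei = [("n1", Rect(0, 0, 2, 2)), ("n2", Rect(5, 5, 6, 6)),
          ("n3", Rect(9, 9, 10, 10))]
regions = [("tumor", Rect(1, 1, 6, 6)), ("stroma", Rect(8, 0, 12, 4))]

candidates = mbr_join(nuclei, regions)
print(candidates)  # [('n1', 'tumor'), ('n2', 'tumor')]
```

MBR overlap does not guarantee that the actual shapes overlap, so each candidate pair still needs the refine step. Practical systems also replace the nested loop with a spatial index such as an R-tree, or with a plane-sweep algorithm.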




Temporal Data Management


The structure of time and temporal data types


Temporal logics


Temporal modeling and databases in XML


Temporal management of RFID data


Temporal modeling and reasoning for biomedical
applications
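One basic temporal data type is a valid-time period attached to each fact. The sketch below assumes the half-open convention `[start, end)` and a hypothetical medication history. It shows two common operations: a timeslice (which facts hold at a given instant) and a period-overlap test. The `Period` class and `timeslice` function are illustrative names, not a specific system's API.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Period:
    """Valid-time period using the half-open convention [start, end)."""
    start: date
    end: date

    def contains(self, t: date) -> bool:
        return self.start <= t < self.end

    def overlaps(self, other: "Period") -> bool:
        # Half-open periods overlap iff each one starts before the other ends,
        # so periods that merely meet (end == start) do not overlap.
        return self.start < other.end and other.start < self.end

# Hypothetical valid-time history of a patient's medications.
history = [
    ("drugA", Period(date(2010, 1, 1), date(2010, 6, 1))),
    ("drugB", Period(date(2010, 5, 1), date(2010, 9, 1))),
    ("drugC", Period(date(2010, 9, 1), date(2011, 1, 1))),
]

def timeslice(rows, t):
    """Names of the rows whose valid-time period contains instant t."""
    return [name for name, period in rows if period.contains(t)]

print(timeslice(history, date(2010, 5, 15)))  # ['drugA', 'drugB']
```

The half-open convention lets consecutive periods such as drugB and drugC share a boundary date without being counted as overlapping.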




Semantic Data Modeling and Management


Overview of Semantic Web


Biomedical ontologies


Metadata and common data elements


Use of ontologies and CDEs in biomedical data
integration




Biomedical Data Management and Integration


Biomedical data management overview


SciPort: an extensible platform for biomedical data
management and integration


PAIS: developing data model standards and high
performance databases for analytical medical
imaging


Biomedical data integration overview


caGrid: cancer Biomedical Informatics Grid




Parallel and Distributed Databases


Introduction to parallel databases and distributed
databases


DB2 data partitioning


Overview of MapReduce


Integration of SQL with MapReduce
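To connect MapReduce with SQL, the sketch below simulates the map, shuffle, and reduce phases in a single process and uses them to compute what a `GROUP BY` count would return. The `annotations` records, their `type` field, and the helper names are hypothetical. A real framework such as Hadoop runs mappers and reducers in parallel on many nodes and moves the grouped data across the network during the shuffle; this sketch only shows the dataflow.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Single-process simulation of the map, shuffle, and reduce phases."""
    # Map phase: each record may emit any number of (key, value) pairs.
    # Shuffle phase: group the emitted values by key.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce phase: combine each key's list of values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical (slide_id, annotation_type) records.
annotations = [
    ("slide1", "tumor"),
    ("slide1", "necrosis"),
    ("slide2", "tumor"),
    ("slide3", "tumor"),
]

# Same result as: SELECT type, COUNT(*) FROM annotations GROUP BY type
def count_mapper(record):
    _slide_id, ann_type = record
    yield ann_type, 1

def count_reducer(_key, values):
    return sum(values)

counts = map_reduce(annotations, count_mapper, count_reducer)
print(counts)  # {'tumor': 3, 'necrosis': 1}
```

The mapper plays the role of the grouping-key projection and the reducer plays the role of the aggregate function. This correspondence is what systems that compile SQL onto MapReduce build on.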