Download - Courses - University of California, Berkeley

townripeData Management

Jan 31, 2013 (4 years and 8 months ago)

247 views


2008.11.04 SLIDE
1



IS 257


Fall 2008


JDBC and Java Access to DBMS

&

Introduction to Data Warehouses

University of California, Berkeley

School of Information

IS 257: Database Management


2008.11.04 SLIDE
2



IS 257


Fall 2008


Lecture Outline


Review:


Object
-
Relational DBMS


OR features in Oracle


OR features in PostgreSQL


Extending OR databases (examples from
PostgreSQL)


Java and JDBC


Introduction to Data Warehouses


2008.11.04 SLIDE
3



IS 257


Fall 2008


Lecture Outline


Object
-
Relational DBMS


OR features in Oracle


OR features in PostgreSQL


Extending OR databases (examples from
PostgreSQL)


Java and JDBC


Introduction to Data Warehouses


2008.11.04 SLIDE
4



IS 257


Fall 2008


Object Relational Data Model


Class, instance, attribute, method, and
integrity constraints


OID per instance


Encapsulation


Multiple inheritance hierarchy of classes


Class references via OID object
references


Set
-
Valued attributes


Abstract Data Types



2008.11.04 SLIDE
5



IS 257


Fall 2008


Object Relational Extended SQL (Illustra)


CREATE TABLE tablename {OF TYPE
Typename}|{OF NEW TYPE typename}
(attr1 type1, attr2 type2,…,attrn typen)
{UNDER parent_table_name};


CREATE TYPE typename (attribute_name
type_desc, attribute2 type2, …, attrn
typen);


CREATE FUNCTION functionname
(type_name, type_name) RETURNS
type_name AS sql_statement


2008.11.04 SLIDE
6



IS 257


Fall 2008


Object
-
Relational SQL in ORACLE


CREATE (OR REPLACE) TYPE
typename AS OBJECT (attr_name,
attr_type, …);



CREATE TABLE OF typename;


2008.11.04 SLIDE
7



IS 257


Fall 2008


Example


CREATE TYPE ANIMAL_TY AS OBJECT
(Breed VARCHAR2(25), Name
VARCHAR2(25), Birthdate DATE);


Creates a new type


CREATE TABLE Animal of Animal_ty;


Creates “Object Table”


2008.11.04 SLIDE
8



IS 257


Fall 2008


Constructor Functions


INSERT INTO Animal values
(ANIMAL_TY(‘Mule’, ‘Frances’,
TO_DATE(‘01
-
APR
-
1997’, ‘DD
-
MM
-
YYYY’)));


Insert a new ANIMAL_TY object into the
table


2008.11.04 SLIDE
9



IS 257


Fall 2008


PostgreSQL Classes


The fundamental notion in Postgres is that of a
class, which is a named collection of object
instances. Each instance has the same
collection of named attributes, and each attribute
is of a specific type. Furthermore, each instance
has a permanent object identifier (OID) that is
unique throughout the installation. Because SQL
syntax refers to tables, we will use the terms
table and class interchangeably. Likewise, an
SQL row is an instance and SQL columns are
attributes.


2008.11.04 SLIDE
10



IS 257


Fall 2008


Creating a Class


You can create a new class by specifying the
class name, along with all attribute names and
their types:


CREATE TABLE weather (


city varchar(80),


temp_lo int,
--

low temperature


temp_hi int,
--

high temperature


prcp real,
--

precipitation


date date

);


2008.11.04 SLIDE
11



IS 257


Fall 2008


PostgreSQL


Postgres can be customized with an arbitrary
number of user
-
defined data types.
Consequently, type names are not syntactical
keywords, except where required to support
special cases in the SQL92 standard.


So far, the Postgres CREATE command looks
exactly like the command used to create a table
in a traditional relational system. However, we
will presently see that classes have properties
that are extensions of the relational model.


2008.11.04 SLIDE
12



IS 257


Fall 2008


Inheritance

CREATE TABLE cities (


name text,


population float,


altitude int
--

(in ft)

);


CREATE TABLE capitals (


state char(2)

) INHERITS (cities);




2008.11.04 SLIDE
13



IS 257


Fall 2008


Inheritance


In Postgres, a class can inherit from zero
or more other classes.


A query can reference either


all instances of a class


or all instances of a class plus all of its
descendants


2008.11.04 SLIDE
14



IS 257


Fall 2008


Non
-
Atomic Values
-

Arrays


The preceding SQL command will create a class
named SAL_EMP with a text string (name), a
one
-
dimensional array of int4 (pay_by_quarter),
which represents the employee's salary by
quarter and a two
-
dimensional array of text
(schedule), which represents the employee's
weekly schedule


Now we do some INSERTSs; note that when
appending to an array, we enclose the values
within braces and separate them by commas.


2008.11.04 SLIDE
15



IS 257


Fall 2008


PostgreSQL Extensibility


Postgres is extensible because its operation is catalog
-
driven


RDBMS store information about databases, tables, columns,
etc., in what are commonly known as system catalogs. (Some
systems call this the data dictionary).


One key difference between Postgres and standard
RDBMS is that Postgres stores much
more

information
in its catalogs


not only information about tables and columns, but also
information about its types, functions, access methods, etc.


These classes can be modified by the user, and since
Postgres bases its internal operation on these classes,
this means that Postgres can be extended by users


By comparison, conventional database systems can only be
extended by changing hardcoded procedures within the DBMS
or by loading modules specially
-
written by the DBMS vendor.


2008.11.04 SLIDE
16



IS 257


Fall 2008


Rules System


CREATE RULE name AS ON event


TO object [ WHERE condition ]


DO [ INSTEAD ] [ action | NOTHING ]



Rules can be triggered by any event
(select, update, delete, etc.)



2008.11.04 SLIDE
17



IS 257


Fall 2008


Views as Rules


Views in Postgres are implemented using the
rule system. In fact there is absolutely no
difference between a


CREATE VIEW myview AS SELECT * FROM
mytab;


compared against the two commands

CREATE TABLE myview (same attribute list as
for mytab);

CREATE RULE "_RETmyview" AS ON SELECT
TO myview DO INSTEAD


SELECT * FROM mytab;


2008.11.04 SLIDE
18



IS 257


Fall 2008


Extensions to Indexing


Access Method extensions in Postgres


GiST: A Generalized Search Trees


Joe Hellerstein, UC Berkeley



2008.11.04 SLIDE
19



IS 257


Fall 2008


Indexing in OO/OR Systems


Quick access to user
-
defined objects


Support queries natural to the objects


Two previous approaches


Specialized Indices (“ABCDEFG
-
trees”)


redundant code: most trees are very similar


concurrency control, etc. tricky!


Extensible B
-
trees & R
-
trees
(Postgres/Illustra)


B
-
tree or R
-
tree lookups only!


E.g. ‘WHERE movie.video < ‘Terminator 2’


2008.11.04 SLIDE
20



IS 257


Fall 2008


GiST Approach


A generalized search tree. Must be:


Extensible in terms of queries


General (B+
-
tree, R
-
tree, etc.)


Easy to extend


Efficient (match specialized trees)


Highly concurrent, recoverable, etc.


2008.11.04 SLIDE
21



IS 257


Fall 2008


GiST Applications


New indexes needed for new apps...


find all supersets of S


find all molecules that bind to M


your favorite query here (multimedia?)


...and for new queries over old domains:


find all points in region from 12 to 2 o’clock





find all text elements estimated relevant to a query
string


2008.11.04 SLIDE
22



IS 257


Fall 2008


Lecture Outline


Review


Object
-
Relational DBMS


OR features in Oracle


OR features in PostgreSQL


Extending OR databases (examples from
PostgreSQL)


Java and JDBC


Introduction to Data Warehouses


2008.11.04 SLIDE
23



IS 257


Fall 2008


Java

and

JDBC


Java is probably the high
-
level language
used in instruction and development today
one of the earliest “enterprise” additions to
Java was JDBC


JDBC is an API that provides a mid
-
level
access to DBMS from Java applications


Intended to be an open cross
-
platform
standard for database access in Java


Similar in intent to Microsoft’s ODBC


2008.11.04 SLIDE
24



IS 257


Fall 2008


JDBC Architecture


The goal of JDBC is to be a generic SQL
database access framework that works for
any database system with no changes to
the interface code

Oracle

MySQL

Postgres

Java Applications

JDBC API

JDBC Driver Manager

Driver

Driver

Driver


2008.11.04 SLIDE
25



IS 257


Fall 2008


JDBC


Provides a standard set of interfaces for
any DBMS with a JDBC driver


using
SQL to specify the databases operations.

Resultset

Statement

Resultset

Resultset

Connection

PreparedStatement

CallableStatement

DriverManager

Oracle Driver

ODBC Driver

Postgres Driver

Oracle DB

Postgres DB

ODBC DB

Application



2008.11.04 SLIDE
26



IS 257


Fall 2008


JDBC Simple Java Implementation

import java.sql.*;

import oracle.jdbc.*;


public class JDBCSample {



public static void main(java.lang.String[] args) {



try {



// this is where the driver is loaded



//Class.forName("jdbc.oracle.thin");



DriverManager.registerDriver(new OracleDriver());



}


catch (SQLException e) {



System.out.println("Unable to load driver Class");



return;


}


2008.11.04 SLIDE
27



IS 257


Fall 2008


JDBC Simple Java Impl.



try {



//All DB access is within the try/catch block...



// make a connection to ORACLE on Dream



Connection con = DriverManager.getConnection(


"jdbc:oracle:thin:@dream.sims.berkeley.edu:1521:dev",






“mylogin", “myoraclePW");



// Do an SQL statement...



Statement stmt = con.createStatement();



ResultSet rs = stmt.executeQuery("SELECT NAME FROM DIVECUST");





2008.11.04 SLIDE
28



IS 257


Fall 2008


JDBC Simple Java Impl.


// show the Results...



while(rs.next()) {



System.out.println(rs.getString("NAME"));



}






// Release the database resources...



rs.close();



stmt.close();



con.close();


}


catch (SQLException se) {



// inform user of errors...



System.out.println("SQL Exception: " + se.getMessage());



se.printStackTrace(System.out);


}


}

}


2008.11.04 SLIDE
29



IS 257


Fall 2008


JDBC


Once a connection has been made you
can create three different types of
statement objects


Statement


The basic SQL statement as in the example


PreparedStatement


A pre
-
compiled SQL statement


CallableStatement


Permits access to stored procedures in the
Database


2008.11.04 SLIDE
30



IS 257


Fall 2008


JDBC Resultset methods


Next() to loop through rows in the resultset


To access the attributes of each row you
need to know its type, or you can use the
generic “getObject()” which wraps the
attribute as an object


2008.11.04 SLIDE
31



IS 257


Fall 2008


JDBC “GetXXX()” methods

SQL data type

Java Type

GetXXX()

CHAR

String

getString()

VARCHAR

String

getString()

LONGVARCHAR

String

getString()

NUMERIC

Java.math.
BigDecimal

GetBigDecimal()

DECIMAL

Java.math.
BigDecimal

GetBigDecimal()


BIT

Boolean

getBoolean()

TINYINT

Byte

getByte()


2008.11.04 SLIDE
32



IS 257


Fall 2008


JDBC GetXXX() Methods

SQL data type

Java Type

GetXXX()

SMALLINT

Integer (short)

getShort()

INTEGER

Integer

getInt()

BIGINT

Long

getLong()

REAL

Float

getFloat()

FLOAT

Double

getDouble()

DOUBLE

Double

getDouble()

BINARY

Byte[]

getBytes()

VARBINARY

Byte[]

getBytes()

LONGVARBINARY

Byte[]

getBytes()


2008.11.04 SLIDE
33



IS 257


Fall 2008


JDBC GetXXX() Methods

SQL data
type

Java Type

GetXXX()

DATE

java.sql.Date

getDate()

TIME

java.sql.Time

getTime()

TIMESTAMP

Java.sql.Timestamp

getTimeStamp()


2008.11.04 SLIDE
34



IS 257


Fall 2008


Large Object Handling


Large binary databytes can be read from a
resultset as streams using:


getAsciiStream()


getBinaryStream()


getUnicodeStream()

ResultSet rs = stmt.executeQuery(“SELECT IMAGE FROM PICTURES WHERE


PID = 1223”));

if (rs.next()) {


BufferedInputStream gifData = new BufferedInputSteam(


rs.getBinaryStream(“IMAGE”));



byte[] buf = new byte[4*1024]; // 4K buffer


int len;


while ((len = gifData.read(buf,0,buf.length)) !=
-
1) {



out.write(buf, 0, len);


}

}


2008.11.04 SLIDE
35



IS 257


Fall 2008


JDBC Metadata


There are also methods to access the
metadata associated with a resultSet


ResultSetMetaData rsmd = rs.getMetaData();


Metadata methods include…


getColumnCount();


getColumnLabel(col);


getColumnTypeName(col)


2008.11.04 SLIDE
36



IS 257


Fall 2008


JDBC access to MySQL


The basic JDBC interface is the same, the
only differences are in how the drivers are
loaded



public class JDBCTestMysql {


public static void main(java.lang.String[] args) {


try {



// this is where the driver is loaded



Class.forName("com.mysql.jdbc.Driver").newInstance();


}


catch (InstantiationException i) {



System.out.println("Unable to load driver Class");



return;


}


catch (ClassNotFoundException e) {



System.out.println("Unable to load driver Class"); …




2008.11.04 SLIDE
37



IS 257


Fall 2008


JDBC for MySQL



try {



//All DB access is within the try/catch block...



// make a connection to MySQL on Dream



Connection con = DriverManager.getConnection(


"jdbc:mysql://localhost/


(this is really one line)

MyDatabase?user=MyLogin&password=MySQLPW");



// Do an SQL statement...



Statement stmt = con.createStatement();



ResultSet rs = stmt.executeQuery("SELECT NAME FROM DIVECUST");












Otherwise everything is the same as in the
Oracle example


For connecting to the machine you are running
the program on, you can use “localhost” instead
of the machine name


2008.11.04 SLIDE
38



IS 257


Fall 2008


Demo


JDBC for MySQL


Demo of JDBC code on Harbinger


Code will be made available on class web
site



2008.11.04 SLIDE
39



IS 257


Fall 2008


Lecture Outline


Review


Object
-
Relational DBMS


OR features in Oracle


OR features in PostgreSQL


Extending OR databases (examples from
PostgreSQL)


Java and JDBC


Introduction to Data Warehouses


2008.11.04 SLIDE
40



IS 257


Fall 2008


Overview


Data Warehouses and Merging
Information Resources


What is a Data Warehouse?


History of Data Warehousing


Types of Data and Their Uses




2008.11.04 SLIDE
41



IS 257


Fall 2008


Problem: Heterogeneous Information Sources

“Heterogeneities are
everywhere”


Different interfaces


Different data representations


Duplicate and inconsistent information

Personal

Databases

Digital Libraries

Scientific Databases

World

Wide

Web

Slide credit: J. Hammer


2008.11.04 SLIDE
42



IS 257


Fall 2008


Problem: Data Management in Large Enterprises


Vertical fragmentation of informational
systems (vertical stove pipes)


Result of application (user)
-
driven
development of operational systems

Sales Administration

Finance

Manufacturing

...

Sales Planning

Stock Mngmt

...

Suppliers

...

Debt Mngmt

Num. Control

...

Inventory

Slide credit: J. Hammer


2008.11.04 SLIDE
43



IS 257


Fall 2008


Goal: Unified Access to Data

Integration System


Collects and combines information


Provides integrated view, uniform user interface


Supports sharing

World

Wide

Web

Digital Libraries

Scientific Databases

Personal

Databases

Slide credit: J. Hammer


2008.11.04 SLIDE
44



IS 257


Fall 2008


The Traditional Research Approach

Source

Source

Source

. . .

Integration System

. . .

Metadata

Clients

Wrapper

Wrapper

Wrapper


Query
-
driven (lazy, on
-
demand)

Slide credit: J. Hammer


2008.11.04 SLIDE
45



IS 257


Fall 2008


Disadvantages of Query
-
Driven Approach


Delay in query processing


Slow or unavailable information sources


Complex filtering and integration


Inefficient and potentially expensive for
frequent queries


Competes with local processing at sources


Hasn’t caught on in industry

Slide credit: J. Hammer


2008.11.04 SLIDE
46



IS 257


Fall 2008


The Warehousing Approach

Data

Warehouse

Clients

Source

Source

Source

. . .

Extractor/

Monitor

Integration System

. . .

Metadata

Extractor/

Monitor

Extractor/

Monitor


Information
integrated in
advance


Stored in WH
for direct
querying and
analysis

Slide credit: J. Hammer


2008.11.04 SLIDE
47



IS 257


Fall 2008


Advantages of Warehousing Approach


High query performance


But not necessarily most current information


Doesn’t interfere with local processing at
sources


Complex queries at warehouse


OLTP at information sources


Information copied at warehouse


Can modify, annotate, summarize, restructure, etc.


Can store historical information


Security, no auditing


Has

caught on in industry

Slide credit: J. Hammer


2008.11.04 SLIDE
48



IS 257


Fall 2008


Not Either
-
Or Decision


Query
-
driven approach still better for


Rapidly changing information


Rapidly changing information sources


Truly vast amounts of data from large
numbers of sources


Clients with unpredictable needs

Slide credit: J. Hammer


2008.11.04 SLIDE
49



IS 257


Fall 2008


Data Warehouse Evolution

TIME

2000

1995

1990

1985

1980

1960

1975

Information
-

Based

Management

Data

Revolution

“Middle

Ages”

“Prehistoric

Times”

Relational

Databases

PC’s and

Spreadsheets

End
-
user

Interfaces

1st DW

Article

DW

Confs.

Vendor DW

Frameworks

Company

DWs

“Building the

DW”

Inmon (1992)

Data Replication

Tools

Slide credit: J. Hammer


2008.11.04 SLIDE
50



IS 257


Fall 2008


What is a Data Warehouse?


A Data Warehouse is a


subject
-
oriented,


integrated,


time
-
variant,


non
-
volatile

collection of data used in support of
management decision making
processes
.”

--

Inmon & Hackathorn, 1994: viz. Hoffer, Chap 11


2008.11.04 SLIDE
51



IS 257


Fall 2008


DW Definition…


Subject
-
Oriented:


The data warehouse is organized around the
key subjects (or high
-
level entities) of the
enterprise. Major subjects include


Customers


Patients


Students


Products


Etc.


2008.11.04 SLIDE
52



IS 257


Fall 2008


DW Definition…


Integrated


The data housed in the data warehouse are
defined using consistent


Naming conventions


Formats


Encoding Structures


Related Characteristics


2008.11.04 SLIDE
53



IS 257


Fall 2008


DW Definition…


Time
-
variant


The data in the warehouse contain a time
dimension so that they may be used as a
historical record of the business



2008.11.04 SLIDE
54



IS 257


Fall 2008


DW Definition…


Non
-
volatile


Data in the data warehouse are loaded and
refreshed from operational systems, but
cannot be updated by end
-
users


2008.11.04 SLIDE
55



IS 257


Fall 2008


What is a Data Warehouse?

A Practitioners Viewpoint


“A data warehouse is simply a single,
complete, and consistent store of data
obtained from a variety of sources and
made available to end users in a way they
can understand and use it in a business
context.”


--

Barry Devlin, IBM Consultant

Slide credit: J. Hammer


2008.11.04 SLIDE
56



IS 257


Fall 2008


A Data Warehouse is...


Stored collection of diverse data


A solution to data integration problem


Single repository of information


Subject
-
oriented


Organized by subject, not by application


Used for analysis, data mining, etc.


Optimized differently from transaction
-
oriented db


User interface aimed at executive decision
makers and analysts


2008.11.04 SLIDE
57



IS 257


Fall 2008


… Cont’d


Large volume of data (Gb, Tb)


Non
-
volatile


Historical


Time attributes are important


Updates infrequent


May be append
-
only


Examples


All transactions
ever

at WalMart


Complete client histories at insurance firm


Stockbroker financial information and portfolios

Slide credit: J. Hammer


2008.11.04 SLIDE
58



IS 257


Fall 2008


Warehouse is a Specialized DB

Standard DB


Mostly updates


Many small transactions


Mb
-

Gb of data


Current snapshot


Index/hash on p.k.


Raw data


Thousands of users (e.g.,
clerical users)

Warehouse


Mostly reads


Queries are long and
complex


Gb
-

Tb of data


History


Lots of scans


Summarized, reconciled
data


Hundreds of users (e.g.,
decision
-
makers,
analysts)

Slide credit: J. Hammer


2008.11.04 SLIDE
59



IS 257


Fall 2008


Summary

Operational Systems

Enterprise

Modeling

Business

Information Guide

Data

Warehouse

Catalog

Data Warehouse

Population

Data

Warehouse

Business Information

Interface

Slide credit: J. Hammer


2008.11.04 SLIDE
60



IS 257


Fall 2008


Warehousing and Industry


Warehousing is big business


$2 billion in 1995


$3.5 billion in early 1997


Predicted: $8 billion in 1998 [Metagroup]


Wal
-
Mart is said to have the largest warehouse


1000
-
CPU, 583 Terabyte, Teradata system
(InformationWeek, Jan 9, 2006)


“Half a Petabyte” in warehouse (Ziff Davis Internet,
October 13, 2004)


1 billion rows of data or more are updated every day
(InformationWeek, Jan 9, 2006)


Some Government and Scientific database are larger,
however

Slide credit: J. Hammer


2008.11.04 SLIDE
61



IS 257


Fall 2008


Other Large Data Warehouses


Not including Wal
-
Mart and Ebay


(InformationWeek, Jan 9, 2006)


2008.11.04 SLIDE
62



IS 257


Fall 2008


Types of Data


Business Data
-

represents meaning


Real
-
time data (ultimate source of all business data)


Reconciled data


Derived data


Metadata
-

describes meaning


Build
-
time metadata


Control metadata


Usage metadata


Data as a product*
-

intrinsic meaning


Produced and stored for its own intrinsic value


e.g., the contents of a text
-
book

Slide credit: J. Hammer


2008.11.04 SLIDE
63



IS 257


Fall 2008


Next Time


More on Data Warehouses


Introduction to data mining