Using Databases in ICS4U

carenextΛογισμικό & κατασκευή λογ/κού

18 Νοε 2013 (πριν από 3 χρόνια και 4 μήνες)

88 εμφανίσεις

Using Databases in ICS4U

Grant Hutchison

TDSB

Agenda


Motivation


Why teach Database Technologies?


Teaching DBMS using Apache Derby?


Alternative Data Technologies


Big Data /
NoSQL

/ R

Motivation

There is a Data Explosion Occurring…



and is
NOT

about to slowdown!!


Volume

Velocity

Variety


Where is the data coming from?

and many more…

Database Systems (K
-
12)


Non
-
CS Subjects


Limited to MS Access, Excel for many



Computer Science


secondary


primarily data structures / objects and files


Key/Value based


Python dictionaries, App Inventor (
TinyDB
,
TinyWebDB
)


Big Data module has been added to revised IB CS Update

Database Systems Post Secondary

Non
-
CS


Business / Math (statistics/analytics)


Sciences (Bioinformatics)


Computer Science


College


Database courses often integrated


University


Undergraduate


electives


Graduate program speciality


Curriculum Integration


ICS4U

A3.1 demonstrate the ability to read from, and
write to, an external file (e.g., text file, binary
file,
database
, XML file) from within a
computer program
;

Pre
-
DB Units in ICS$U


Object
-
Oriented Design


UML,
CRC


Software Engineering Principles


Implementation


Java
, C++, Python, C#
-

typical languages


Algorithms


design : sort, search


analysis : predict and test runtime expectations



Data


Files
:

reading, writing, parsing (basics only)


Structures : Lists, Stacks, Queues, Trees (basics only)


Scalable data serving components : RDBMS,
NoSQL
, etc…


Data (Preservation and Access)

File based solutions

Objects

can be stored in
flat files
where each
line of data represents an instance of an
object
.



Problem

: data (a simple order)

Bob, Smith, 2 Main Street, Toronto, G,
"Jan 15, 2012",C, Hammer, 2, 9.25, Lumber
(4ft), 3, 10.85, .13, 57.69

When to use a Database in
student projects?


Applications that need to :


store

and
retrieve

data quickly and reliability
while users interact with the provided user
interface (UI).


Problems with simply using Files
for Objects


no query based access


highly structured

Object Design

Would you design a Class with these attributes?


cust_name


cust_street


cust_city


order_num


item_num


description


cost


quantity


total



Aggregate

Customer

Order

Item

Relational Database



1980

New and Hot

Quick adoption

1990

Defacto

Standard DBMS

Object
Relational

Object Oriented
DBMS

Parallel
RDBMS

2000

Relational /
XML Hybrid
DB

Scalable
Transactions
(OLTP)

Data warehouse

In Memory DBs

Open Source
DBs

2010
++

Big Data


Volume


Velocity


Variety

NoSQL

Analytical DBs

Introduction to SQL

Declarative

programming language designed
for querying and managing data stored in
RDBMS (Relational DBMS).


Declarative ?

Describe the
what

and not the
how

!!


SQL Query


SELECT

<attributes>


FROM

<one or more relations>


WHERE

<conditions>


Not case
-
sensitive


Must know the schema of the database


Tools usually used to manage


Entity
-
Relationship Diagrams (ER Diagrams)

SQL

Data Definition Language (DDL)


Create/alter/delete tables and their
attributes


Data Manipulation Language (
DML)


Access one
or more
tables


Insert/delete/modify
tuples/records
in tables

CRUD


SQL operations

Operation

SQL
statement

C

Create

INSERT

R

Read

SELECT

U

Update

UPDATE

D

Delete

DELETE

SCHEMA
-

Table Structure

DESCRIPTION

PRICE

CATEGORY

MANUFACTURER

PRODUCT

CREATE TABLE
PRODUCT

(


DESCRIPTION

VARCHAR(40),
PRICE

DECIMAL(9,2),

CATEGORY

VARCHAR (40),
MANUFACTURER

VARCHAR (40))

Table
Name

Attribute /
Column

Data type

Table Structure

DESCRIPTION

PRICE

CATEGORY

MANUFACTURER

Hammer

$19.99

Hand Tools

Stanley

Lumber (4
ft
)

$29.99

Raw Materials

ForestWorld

PRODUCT

INSERT INTO
PRODUCT

VALUES (


"Hammer", 19.99, "Hand Tools", "Stanley")


INSERT INTO
PRODUCT

VALUES (


"Lumber (4
ft
) ", 29.99
,
"Raw Materials",

"
ForestWorlds
")

Tables Explained

The
schema

of a table is the table name and its
attributes.


A
key

is an
attribute whose values are
used to
ensure uniqueness of the tuple

or assist in
defining relationships with out tables.

Data Types in SQL

Strings


CH
AR, VARCHAR, LONG VARCHAR, CLOB

Numeric


SMALLINT, INT, BIGINT, DECIMAL, REAL,
FLOAT,DOUBLE

Dates and Times


DATE, TIME, TIMESTAMP

Binary


CHAR FOR BIT DATA, VARCHAR FOR BIT
DATA,
LONG VARCHAR FOR BIT DATA,
BLOB

SQL Query


SELECT

<attributes>


FROM

<one or more relations>


WHERE

<conditions>


Not case
-
sensitive


Must know the schema of the database


Tools usually used to manage


Entity
-
Relationship Diagrams (ER Diagrams)

Simple SQL Query

SELECT

*

FROM

PRODUCT

WHERE

CATEGORY="Hand Tools"

DESCRIPTION

PRICE

CATEGORY

MANUFACTURER

Hammer

$19.99

Hand Tools

Stanley

Lumber (4
ft
)

$29.99

Raw Materials

ForestWorld

DESCRIPTION

PRICE

CATEGORY

MANUFACTURER

Hammer

$19.99

Hand Tools

Stanley

Simple SQL Query

SELECT

DESCRIPTION

FROM

PRODUCT

WHERE

PRICE >

20

DESCRIPTION

PRICE

CATEGORY

MANUFACTURER

Hammer

$19.99

Hand Tools

Stanley

Lumber (4
ft
)

$29.99

Raw Materials

ForestWorld

DESCRIPTION

PRICE

CATEGORY

MANUFACTURER

Lumber (4
ft
)

$29.99

Raw Materials

ForestWorld

The
LIKE

operator

% = any sequence of characters

_ = any single character

SELECT

*

FROM

PRODUCTS

WHERE


DESCRIPTION
LIKE

‘%Lumber%’

Ordering the Results

SELECT

DESCRIPTION, PRICE

FROM

PRODUCT

WHERE

CATEGORY=‘Hand Tools’
AND price >
20

ORDER BY

PRICE


Ordering is ascending, unless you
append the
DESC

keyword.

Process of designing tables to avoid data
duplication and maintain consistency.

Normalization



1NF


Remove multiple items for each tuple/row

2NF


Remove duplicate data for each tuple/row

3NF


Ensure attributes are dependant on the key
and nothing but the key

4NF


Not usually needed

Object Design

Would you design a Class with these attributes?


cust_name


cust_street


cust_city


order_num


item_num


description


cost


quantity


total



Aggregate

Customer

Order

Item

Joins



CUST (
cid
,
fname
,
lname
, street, city, type)

ORDER (
oid
,
cid
,
o_date
, status,
tax_r
, total )

LINEITEM (
oid
,
iid
,
prod_id
, quantity)

PRODUCT (
pid
, description, price)




Bob, Smith, 2
Main,
Toronto, G,

"Jan 15, 2012",C, .13,
57.69,

"Hammer", 2, 9.25,

"Lumber (4ft)", 3, 10.85


Complex Joins


Answer Questions

CUST (
cid
, name, street, city, type)

ORDER (
oid
,
cid
,
o_date
, status,
tax_r
, total )

LINEITEM (
oid
,
iid
,
prod_id
, quantity)

PRODUCT (
pid
, description, price)




SELECT

avg
(
o.total
)

FROM


CUST c, ORDER o

WHERE


c.type

= "P" AND



c.cid

=
o.cid

AND



YEAR(
o.o_date
) = "2012
"


Find the Average Order for preferred customers in 2012.

SQL Aggregation Functions

SELECT

count(*)

FROM

Product

WHERE

year > 1995

SELECT

avg(price)

FROM

Product

WHERE

maker=“Toyota”

There are many more aggregation functions.

Simple Aggregations

Purchase

Product

Date

Price

Quantity

Bagel

10/21

1

20

Banana

10/3

0.5

10

Banana

10/10

1

10

Bagel

10/25

1.50

20

SELECT

Sum(price * quantity)

FROM

Purchase

WHERE

product = ‘bagel’

50 (= 20+30)

Deletions

DELETE FROM

CUST

WHERE

name = "Hutchison"

Careful as DELETE could remove:



NO ROWS


ONE ROW


ALL ROWS

Updates

UPDATE


PRODUCT

SET



price
=
price

* .5

WHERE


category = "Hand Tools"

Apache Derby

Apache derby is an open source, small
footprint, SQL standards
-
based Java relational
database
engine.


How Small?


only 2 MB (derby.jar)

History of

Cloudscape / Apache Derby

1997

Cloudscape
Inc.

1999

Acquired by
Informix

2001

Acquired by
IBM

2004

Open Source
Apache
Derby

Derby Modes

Embedded or Networked

Networked Mode


The Derby server can be started in
its own JVM and then multiple
Java applications can use the same
database.

Connecting Java Apps to Derby


Add
derby.jar

to your
CLASSPATH


Use Standard JDBC APIs to access data


JDBC documentation is in the JDK

Tools for Derby

ij


interactive command line interface to Derby


good for SQL testing and script creation


NOTE: script lines must end with a semicolon

Eclipse plug
-
ins


Add Apache Derby nature to existing Java
projects and you are off and running…

Teaching With Derby


Use
Eclipse


Install Apache Derby Eclipse plug
-
ins


Provide
ij

scripts and sample JDBC source
code for students to explore


Use similar data files from


SORTING and ALGORITHM UNITS


Create indexes on queried columns and
compare results

Alternative Data Technologies

Big Data Systems


clustered file system
(
Hadoop
)


analysis usually using
MapReduce

programs

(not SQL)


Batch oriented


Analytics not transactions









NoSQL


document style of data
query and storage (JSON
usually)


scalable across servers


complex queries can be
challenging


Used primarily for web
apps


source: http://www.kdnuggets.com/2013/06/kdnuggets
-
annual
-
software
-
poll
-
rapidminer
-
r
-
vie
-
for
-
first
-
place.html

source:
http://www.nytimes.com/2009/01/07/technology/busines
s
-
computing/07program.html?pagewanted=all&_r=0


Statistical Analysis / Models


Almost Final Thoughts


Consider teaching a short unit on database
technologies in ICS4U


If you use Java then Apache Derby will be
simple to use (no install required)

Benefits go beyond learning SQL


Ethical


Data mining (companies, governments)


Legal


Software licensing


Open Source community (Apache) and
companies (IBM, Yahoo, Google, etc. )
working together.



Resources