Databases & Data Mining

CPS 181s

April 3, 2003

Databases in eCommerce

The move to eCommerce is in
part driven by the ability to gather
data that benefits the business

What is a Database?


A system that stores data


“Persistent” - exists beyond immediate use


Centralized storage


Single or multiple users


Advantages


Reduces redundancy


Reduces inconsistency


Shared


Data representation standards can be enforced


Enables security restrictions

More Advantages


Integrity maintained


Valid cross-references between records


Allows data-independent applications


Applications ignorant of how data is stored
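
The integrity point above ("valid cross-references between records") can be illustrated with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in DBMS; the customer and purchase tables and their columns are hypothetical and not part of the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE purchase ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id),"
    " item TEXT NOT NULL)"
)
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")
# Valid cross-reference: customer 1 exists.
conn.execute("INSERT INTO purchase (customer_id, item) VALUES (1, 'book')")


def try_insert_orphan(connection):
    """Try to store a purchase that points at a customer that does not exist."""
    try:
        connection.execute(
            "INSERT INTO purchase (customer_id, item) VALUES (99, 'lamp')"
        )
    except sqlite3.IntegrityError:
        return False  # the DBMS rejected the dangling reference
    return True


orphan_accepted = try_insert_orphan(conn)
purchase_count = conn.execute("SELECT COUNT(*) FROM purchase").fetchone()[0]
print(orphan_accepted, purchase_count)
```

The application code never checks the reference itself. The rule lives in the schema, so every application sharing the database gets the same guarantee.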

DBMS


Database Management System


Examples


Oracle


IBM DB2


Microsoft SQL Server


Sybase


MySQL

DBMS Features


Optimizes queries


Manages memory


Controls concurrent data access

Client-Server Architecture


2-tier architecture


Client


Application server & DBMS


Advantages


Rapid development


Mature tools


Less network traffic

[Figure: 2-tier diagram showing Client (User Interface), Business Rules, and Server (Data Access)]

Client-Server Architecture


3-tier architecture


Client


Applications


DBMS


Database


Advantages


Distributed processing


Replication


Update multiple DBMSs


Variety of data sources


Attach transaction priorities


Robust security

[Figure: 3-tier diagram showing Client, Web Server, DBMS, and Database, with connections labeled HTTP, URL, HTML, and Data]
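
A minimal sketch of the three tiers, assuming an in-memory SQLite database in place of a real DBMS and a plain function in place of a real web server. The product table, the handle_request function, and the shop.example URL are all hypothetical names chosen for illustration.

```python
import sqlite3
from html import escape
from urllib.parse import parse_qs, urlparse

# Data tier: an in-memory SQLite database stands in for the DBMS.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
db.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(1, "Mug", 7.5), (2, "T-shirt", 15.0)],
)


def find_product(product_id):
    """Data access: return (name, price) for the product, or None."""
    return db.execute(
        "SELECT name, price FROM product WHERE id = ?", (product_id,)
    ).fetchone()


def handle_request(url):
    """Middle tier: turn the URL sent by the client into an HTML page."""
    query = parse_qs(urlparse(url).query)
    product_id = int(query.get("id", ["0"])[0])
    row = find_product(product_id)
    if row is None:
        return "<html><body><p>Product not found</p></body></html>"
    name, price = row
    return f"<html><body><h1>{escape(name)}</h1><p>${price:.2f}</p></body></html>"


# Client tier: a browser would send this URL over HTTP and render the HTML.
page = handle_request("http://shop.example/product?id=2")
print(page)
```

The client only ever sees a URL going out and HTML coming back. The SQL and the storage format stay behind the middle tier.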

Why Construct an eCommerce Database?


Time pressure of new economy business


Pace of data acquisition


Continuous quality improvement


Cost containment


Competitive advantage

eCommerce Data Systems


The collection, analysis, and discerning interpretation of data are essential for e-business to survive and flourish


A well-designed data system can:



Increase market reach


Ensure regulatory compliance


Serve business processes


Help use resources efficiently


Spot emerging trends


Improve customer relations (CRM)

Database Technologies


Static webpages


HTML


Dynamic webpages


Client-side scripts (JavaScript)


Server-side includes (SSI markers)


Server-side scripts (JSP, CGI, ASP, PHP)


Database Construction Criteria


Flexibility and power


Developer expertise required


Development and testing time


Adaptability to change


Life-cycle costs


Operational risks


CPU overhead (computing resources consumed)


Compatibility


Types of Databases


Flat-file


Relational


Object-Oriented


Hybrid


Flat-File Database


Spreadsheets


Use columns and rows to organize small pieces of data into lists called tables


No metadata


Lname    Fname     Age  Salary  Employ Date  Employ number
Nelson   Williams  45   $5000   6/1/89       0001
Fulcher  Cleo      50   $4500   11/30/89     0002

Columns are fields; rows are records (tuples)
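
A short sketch of reading such a flat file, assuming it is stored as comma-separated text. The compact header names (EmployDate, EmployNumber) and the salary values without the dollar sign are assumptions made for the example.

```python
import csv
import io

# The slide's table as a comma-separated flat file (layout is an assumption).
FLAT_FILE = """Lname,Fname,Age,Salary,EmployDate,EmployNumber
Nelson,Williams,45,5000,6/1/89,0001
Fulcher,Cleo,50,4500,11/30/89,0002
"""

reader = csv.DictReader(io.StringIO(FLAT_FILE))
fields = reader.fieldnames   # column names taken from the header row
records = list(reader)       # one dict per record (tuple)

# No metadata: every value arrives as text, so the program itself
# has to know that Age is a number and convert it.
ages = [int(record["Age"]) for record in records]
print(fields)
print(len(records), sum(ages) / len(ages))
```

Because the file carries no type information, each program that reads it must repeat this kind of conversion, which is one source of the inconsistency a DBMS is meant to reduce.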

Relational Database


Relations are two-dimensional data


Reduce data redundancy, duplication of effort, and storage space


Increase speed and versatility


Microsoft Access, IBM DB2, Oracle, Microsoft SQL Server, MySQL

Relational Database

HR Table

Lname    Fname     Age  Salary  Employ Date  Employ number
Nelson   Williams  45   $5000   6/1/89       0001
Fulcher  Cleo      50   $4500   11/30/89     0002

Softball Team Table

Dept.  email              Team Member  Team Position  Employ number
3      Nelson@email.com   yes          Pitcher        0001
1      Fulcher@email.com  no                          0002

Key Field: Employ number
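
The two tables can be combined through the key field. The sketch below assumes SQLite through Python's sqlite3 module; the table and column names (hr, softball, employ_number, and so on) are renderings of the slide's labels chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hr ("
    " employ_number TEXT PRIMARY KEY, lname TEXT, fname TEXT,"
    " age INTEGER, salary INTEGER, employ_date TEXT)"
)
conn.execute(
    "CREATE TABLE softball ("
    " employ_number TEXT PRIMARY KEY REFERENCES hr(employ_number),"
    " dept INTEGER, email TEXT, team_member TEXT, team_position TEXT)"
)
conn.executemany(
    "INSERT INTO hr VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("0001", "Nelson", "Williams", 45, 5000, "6/1/89"),
        ("0002", "Fulcher", "Cleo", 50, 4500, "11/30/89"),
    ],
)
conn.executemany(
    "INSERT INTO softball VALUES (?, ?, ?, ?, ?)",
    [
        ("0001", 3, "Nelson@email.com", "yes", "Pitcher"),
        ("0002", 1, "Fulcher@email.com", "no", None),  # no position on the slide
    ],
)

# Join the two tables on the key field to list team members by name.
rows = conn.execute(
    "SELECT hr.lname, hr.fname, softball.team_position"
    " FROM hr JOIN softball ON hr.employ_number = softball.employ_number"
    " WHERE softball.team_member = 'yes'"
).fetchall()
print(rows)
```

Names are stored once in the HR table and team details once in the softball table. The key field lets a query bring them together without duplicating either.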

Object-Oriented Database


Data assigned to categories called classes


Each piece of data is an object


Limited query capabilities, but handles non-text data well because it enables the creation of new data types


Store binary large objects efficiently

Hybrid Database


Object-relational systems


Handle both text and non-text data well


Thin object layer above the relational structures

What Can be Learned by Data Mining (patterns in large data)?


Characterization - summarize characteristics


E.g., traffic over lunch


Prediction - value of attribute based on relation to other attributes


E.g., book orders based on location on Amazon’s welcome page


Class comparison - discover discrimination rules


E.g., comparison of search engine results
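
Characterization, the first item above, can be sketched with the lunch-traffic example. The timestamps below are made-up sample data, and treating 12:00 to 14:00 as "lunch" is an assumption of this example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical request timestamps taken from a site's access log.
timestamps = [
    "2003-04-03 09:15:00",
    "2003-04-03 12:05:00",
    "2003-04-03 12:20:00",
    "2003-04-03 12:45:00",
    "2003-04-03 13:10:00",
    "2003-04-03 16:30:00",
]

# Summarize the data: number of requests in each hour of the day.
hits_per_hour = Counter(
    datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").hour for stamp in timestamps
)

# Requests that fall in the assumed lunch window, 12:00 up to 14:00.
lunch_hits = sum(count for hour, count in hits_per_hour.items() if 12 <= hour < 14)
lunch_share = lunch_hits / len(timestamps)
print(hits_per_hour[12], lunch_hits, round(lunch_share, 2))
```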


What Can be Learned by Data Mining? (continued)


Association rules - one pattern implies another


E.g., lunch traffic and Dilbert site hits


Classification - learning models


E.g., learn to recognize “fence sitters” and offer them a coupon


Time Series Analysis


E.g., users who do X and then Y usually do Z next
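
For association rules, one common way to score "one pattern implies another" is with support and confidence. The sketch below uses made-up browsing sessions; the page names and the support and confidence helpers are hypothetical and only illustrate the idea.

```python
# Hypothetical browsing sessions: the set of pages each visitor requested.
sessions = [
    {"lunch_menu", "dilbert"},
    {"lunch_menu", "dilbert", "news"},
    {"lunch_menu", "news"},
    {"dilbert"},
    {"news"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= transaction for transaction in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also
    contain the consequent. Assumes the antecedent occurs at least once."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Rule under test: visiting lunch_menu implies visiting dilbert.
rule_support = support({"lunch_menu", "dilbert"}, sessions)
rule_confidence = confidence({"lunch_menu"}, {"dilbert"}, sessions)
print(round(rule_support, 2), round(rule_confidence, 2))
```

A rule is usually kept only if both numbers clear thresholds chosen by the analyst. High confidence with very low support may just reflect a handful of sessions.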

Web Mining


Web servers have ability to log all requests


Generate vast amounts of data - www.privacy.net/anonymizer


Benefits of web log analysis


Facilitates personalization/adaptive sites


Learn about users


Improve site design


Predict users’ actions (allows prefetching)


Fraud/intrusion detection
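
A minimal sketch of web log analysis, assuming the server writes entries in Common Log Format. The log lines are invented sample data, and the pattern is a simplified one that covers only well-formed entries.

```python
import re
from collections import Counter

# Hypothetical log lines in Common Log Format, plus one malformed line.
LOG_LINES = [
    '192.0.2.1 - - [03/Apr/2003:12:01:10 -0500] "GET /index.html HTTP/1.0" 200 1043',
    '192.0.2.7 - - [03/Apr/2003:12:01:12 -0500] "GET /products.html HTTP/1.0" 200 2310',
    '192.0.2.1 - - [03/Apr/2003:12:02:45 -0500] "GET /products.html HTTP/1.0" 200 2310',
    '192.0.2.9 - - [03/Apr/2003:12:03:00 -0500] "GET /missing.html HTTP/1.0" 404 210',
    "not a log line",
]

LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse(lines):
    """Yield one dict per line that matches the pattern; skip the rest."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            yield match.groupdict()


entries = list(parse(LOG_LINES))
# Successful requests per page: a starting point for improving site design.
hits_per_page = Counter(e["path"] for e in entries if e["status"] == "200")
# Distinct client addresses: a rough way to learn about users.
visitors = {e["host"] for e in entries}
# Client errors: broken links, or probing worth a closer look.
errors = [e["path"] for e in entries if e["status"].startswith("4")]
print(hits_per_page.most_common(1), len(visitors), errors)
```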