new_DecisSupp - Department of Computer Science and ...


Nov 25, 2013


Decision Support Systems for E-commerce



Working Definition of DSS

A DSS is an integrated, interactive computer system,
consisting of analytical tools and information
management capabilities, designed to aid decision
makers in solving relatively large, unstructured problems



Decision-Making Examples



What were the sales volumes by region and product category
for the last year?



How did the share price of computer manufacturers correlate
with quarterly profits over the past 10 years?
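
Both sample questions reduce to simple computations once the data sits in one place. The following is a minimal sketch, not a real DSS query: the record layout, the function names, and all numbers are made up for illustration.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical sales records: (region, product_category, volume).
sales = [
    ("East", "TV", 120),
    ("East", "PC", 80),
    ("West", "TV", 60),
    ("East", "TV", 30),
]

def volume_by_region_and_category(rows):
    """Sum sales volume for each (region, category) pair."""
    totals = defaultdict(int)
    for region, category, volume in rows:
        totals[(region, category)] += volume
    return dict(totals)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical quarterly profits and share prices for one manufacturer.
profits = [1.0, 2.0, 3.0, 4.0]
prices = [10.0, 20.0, 30.0, 40.0]

totals = volume_by_region_and_category(sales)
correlation = pearson(profits, prices)
```

The first question is a grouped aggregation and the second a correlation over historical data, which is why they call for different tools (OLAP and statistical analysis).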


Central Issue in DSS

support and improvement of decision making


Management Decision Making

Strategic


CEO, board of directors, top executives


Develop overall strategies of organization

Tactical


Regional managers, plant managers, division
supervisors


Carry out strategic managers' plans

Operational


Direct managers, team leaders


Carry out tactical managers' plans


Different technologies are invented to
meet different decision-making goals!



The Big Picture: DBs, Data Warehouse,
OLAP & Data Mining

[Figure: Operational DBs and other sources are extracted, transformed, loaded, and refreshed into the Data Warehouse (data storage). The warehouse serves an OLAP engine (OLAP server), which feeds the front-end tools: analysis, query, reports, and data mining.]

Evolutionary Step | Technologies | Providers

Data Collection (1960s) | Computers, tapes, disks | IBM, CDC

Data Access (1980s) | Relational databases, SQL, ODBC | Oracle, Sybase, Informix, IBM, Microsoft

Data Warehousing & Decision Support systems (1990s) | On-line Analytic Processing (OLAP), multidimensional databases (cubes) | Cognos, Arbor, Pilot, Microstrategy, ORACLE, IBM

Data Mining (Present) | Statistics, Machine Learning, AI | SAS, SPSS, IBM, ORACLE, Cognos, Microsoft

Why Build a Data Warehouse?


Separate transactional and analysis systems:
to make tactical or even strategic decisions
for regional managers or CEOs



Easy formulation of complex queries


Access to historical data (not in operational
systems)


Improved data quality (fewer errors and missing
values)


Access to data from multiple sources, giving a
comprehensive data collection




Potential Applications of Data
Warehousing and Mining in EC

Analysis of user access patterns and buying patterns

Customer segmentation and target marketing

Cross selling and improved Web advertisement

Personalization

Association (link) analysis

Customer classification and prediction

Time-series analysis


Typical event sequence and user behavior pattern
analysis

Transition and trend analysis

Data Warehousing

The phrase data warehouse was coined by
William Inmon in 1990

A data warehouse is a decision support
database that is maintained separately from
the organization’s operational database

Definition: A DW is a repository of integrated
information from distributed, autonomous, and
possibly heterogeneous information sources
for query, analysis, decision support, and data
mining purposes

Characteristics (cont’d)

Integrated


There is no consistency in encoding, naming conventions,
… among application-oriented data from different
legacy systems and heterogeneous data sources


When data is moved to the warehouse, it is
consolidated, converted, and encoded
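
As an illustration of consolidation, the sketch below maps inconsistent source encodings onto one warehouse encoding. The source names, the "gender" field, and the code tables are all hypothetical; a real warehouse would load such mappings from metadata.

```python
# Hypothetical per-source code tables mapping each legacy encoding
# of a "gender" field onto a single warehouse encoding.
SOURCE_ENCODINGS = {
    "billing": {"m": "M", "f": "F"},
    "crm": {"1": "M", "0": "F"},
    "web": {"male": "M", "female": "F"},
}

def to_warehouse_encoding(source, raw_value):
    """Convert one raw value to the warehouse code, or None if unknown."""
    table = SOURCE_ENCODINGS[source]
    return table.get(str(raw_value).strip().lower())

# The same attribute arrives in three different encodings.
records = [("billing", "M"), ("crm", 1), ("web", "Female")]
unified = [to_warehouse_encoding(src, val) for src, val in records]
```

Unknown values come back as None so that they can be flagged during cleaning rather than silently stored.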

Characteristics (cont’d)

Non-volatile


New data is always appended to the
database, rather than replaced


The database continually absorbs new data,
integrating it with the previous data


In contrast, operational data is regularly
accessed and manipulated one record at a time,
and updates are made directly to data in the
operational environment

Characteristics (cont’d)

Time-variant



Operational databases contain current-value data.


Operational data is valid only at the moment of
access, capturing a moment in time.



The time horizon for the data warehouse is
significantly longer than that of operational systems.


Data warehouse data is nothing more than a
sophisticated series of snapshots, taken as of some
moment in time.
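
The non-volatile and time-variant properties can be pictured as an append-only store of dated snapshots. This is a minimal sketch under that reading; the `SnapshotStore` class, its methods, and the sample prices are hypothetical.

```python
from datetime import date

class SnapshotStore:
    """Append-only store of dated snapshots (illustrative only)."""

    def __init__(self):
        self._rows = []  # (as_of, key, value); rows are never updated

    def append(self, as_of, key, value):
        """Record a new snapshot; earlier snapshots stay untouched."""
        self._rows.append((as_of, key, value))

    def value_as_of(self, key, when):
        """Return the most recent value recorded on or before `when`."""
        candidates = [(d, v) for d, k, v in self._rows if k == key and d <= when]
        return max(candidates)[1] if candidates else None

    def history(self, key):
        """Return every snapshot of `key`, oldest first."""
        return sorted((d, v) for d, k, v in self._rows if k == key)

store = SnapshotStore()
store.append(date(2013, 1, 1), "price:tv", 500)
store.append(date(2013, 6, 1), "price:tv", 450)
```

An operational system would overwrite the price in place; here both values remain available, so a query can ask what the price was on any given date.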


System Architecture




[Figure: data sources (Legacy, Flat-file, RDBMS, OODBMS, . . .), each paired with a Detector, feeding the warehouse; the End User works with Analysis, Query Reports, and Data Mining]

Data Warehouse Back-End Tools and Utilities

Data extraction:


Extract data from multiple, heterogeneous, and external
sources

Data cleaning (scrubbing):


Detect errors in the data and rectify them when possible

Data converting:


Convert data from legacy or host format to warehouse
format

Transforming:


Sort, summarize, compute views, check integrity, and
build indices

Refresh:


Propagate the updates from the data sources to the
warehouse
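
The five back-end steps can be chained into a small pipeline. The sketch below is one possible arrangement, not a description of any particular tool: the function names, the row layout, and the sample sources are assumptions, and `refresh` simply recomputes everything, whereas real systems usually propagate only the changes.

```python
def extract(sources):
    """Gather raw rows from every source into one list."""
    return [row for rows in sources.values() for row in rows]

def clean(rows):
    """Drop rows with a missing amount (a simple stand-in for scrubbing)."""
    return [r for r in rows if r.get("amount") not in (None, "")]

def convert(rows):
    """Convert legacy fields to the warehouse format (upper-case region, float amount)."""
    return [{"region": r["region"].strip().upper(), "amount": float(r["amount"])}
            for r in rows]

def transform(rows):
    """Sort and summarize: total amount per region."""
    summary = {}
    for r in sorted(rows, key=lambda r: r["region"]):
        summary[r["region"]] = summary.get(r["region"], 0.0) + r["amount"]
    return summary

def refresh(warehouse, sources):
    """Propagate the current source contents into the warehouse summary."""
    warehouse.clear()
    warehouse.update(transform(convert(clean(extract(sources)))))
    return warehouse

# Hypothetical sources with inconsistent formats and one missing value.
sources = {
    "legacy": [{"region": " east", "amount": "10.5"}, {"region": "west", "amount": ""}],
    "rdbms": [{"region": "East", "amount": 4.5}, {"region": "West", "amount": 7}],
}
warehouse = refresh({}, sources)
```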


On-Line Analytical Processing (OLAP)

Front end to the data warehouse, allowing
easy data manipulation


Allows conducting inquiries over the data at
various levels of abstraction



Fast and easy because some aggregations
are computed in advance

No need to formulate the entire query
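
The speed-up from computing aggregations in advance can be sketched as follows: totals are built once in a single pass, and later queries become lookups. The fact rows, `precompute`, and `total_for` are hypothetical names and values chosen for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (country, quarter, sales).
facts = [
    ("U.S.A", "1Qtr", 100),
    ("U.S.A", "2Qtr", 150),
    ("Canada", "1Qtr", 40),
]

def precompute(rows):
    """Build per-country, per-quarter, and grand totals in one pass."""
    by_country, by_quarter, total = defaultdict(int), defaultdict(int), 0
    for country, quarter, sales in rows:
        by_country[country] += sales
        by_quarter[quarter] += sales
        total += sales
    return {"country": dict(by_country), "quarter": dict(by_quarter), "all": total}

aggregates = precompute(facts)

def total_for(level, key=None):
    """Answer an aggregate query by lookup instead of rescanning `facts`."""
    return aggregates["all"] if level == "all" else aggregates[level][key]
```

For example, `total_for("country", "U.S.A")` reads a stored total rather than scanning the fact rows again.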

OLAP: Data Cube

OLAP uses data in multidimensional format (e.g., data cubes)
to simplify querying and improve response time.

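[Figure: data cube of sales with three dimensions: Date (1Qtr, 2Qtr, 3Qtr, 4Qtr), product (TV, VCR, PC), and Country (U.S.A, Canada, Mexico), with a "sum" cell along each dimension. Highlighted cell: overall sales of TVs in the US in the 3rd quarter.]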
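
A cube of this shape, including its "sum" cells, can be sketched as a dictionary keyed by (product, quarter, country). Only the dimension values come from the figure; the sales numbers and the `build_cube` helper are hypothetical.

```python
from collections import defaultdict
from itertools import product

ALL = "sum"  # marker for "aggregated over this dimension", as in the figure

def build_cube(rows):
    """Map every (product, quarter, country) cell, including 'sum' cells, to total sales."""
    cube = defaultdict(int)
    for prod, quarter, country, sales in rows:
        # Each fact contributes to the 8 cells obtained by keeping or
        # replacing each of its three coordinates with ALL.
        for cell in product((prod, ALL), (quarter, ALL), (country, ALL)):
            cube[cell] += sales
    return dict(cube)

# Hypothetical sales figures; only the dimension values come from the slide.
rows = [
    ("TV", "3Qtr", "U.S.A", 10),
    ("TV", "3Qtr", "U.S.A", 5),
    ("TV", "4Qtr", "U.S.A", 7),
    ("PC", "3Qtr", "Canada", 3),
]
cube = build_cube(rows)
tv_usa_q3 = cube[("TV", "3Qtr", "U.S.A")]
```

The cell `("TV", "3Qtr", "U.S.A")` corresponds to the highlighted cell in the figure, and `("sum", "sum", "sum")` is the grand total.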

OLAP: Data Cube Operations

Slicing: Selecting the dimensions of the cube to be viewed.


Example: View “Sales volume” as a function of “Product” by “Country” by “Quarter”


Dicing: Specifying the values along one or more
dimensions.


Example: View “Sales volume” for “Product=PC” by “Country” by “Quarter”
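
The two examples can be sketched over a small fact table. The sketch follows the definitions given here (many texts instead use "slice" for fixing one dimension value and "dice" for fixing values on several dimensions). The rows, `view`, and `dice` are hypothetical.

```python
from collections import defaultdict

DIMENSIONS = ("product", "country", "quarter")

# Hypothetical fact rows; the sales volumes are illustrative.
facts = [
    {"product": "PC", "country": "U.S.A", "quarter": "1Qtr", "sales": 20},
    {"product": "PC", "country": "Canada", "quarter": "1Qtr", "sales": 8},
    {"product": "TV", "country": "U.S.A", "quarter": "1Qtr", "sales": 12},
    {"product": "PC", "country": "U.S.A", "quarter": "2Qtr", "sales": 25},
]

def dice(rows, **fixed):
    """Keep only rows whose dimensions match the given values, e.g. product='PC'."""
    return [r for r in rows if all(r[d] == v for d, v in fixed.items())]

def view(rows, dims):
    """Total sales as a function of the chosen dimensions."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r["sales"]
    return dict(totals)

# Slicing example: sales volume by product, country, and quarter.
full_view = view(facts, DIMENSIONS)
# Dicing example: sales volume for product=PC by country and quarter.
pc_view = view(dice(facts, product="PC"), ("country", "quarter"))
```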



OLAP: Data Cube Operations

Drilling down: from higher level
aggregation to lower level aggregation or
detailed data (viewing by “state” after
viewing by “region”)



Rolling up: Summarize data by climbing
up hierarchy or by dimension reduction
(e.g., viewing by “region” instead of by
“state”)
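
Both operations move along a dimension hierarchy. The sketch below uses the state and region levels from the examples; the state-to-region mapping and the sales numbers are hypothetical.

```python
from collections import defaultdict

# Hypothetical hierarchy: each state belongs to one region.
STATE_TO_REGION = {"CA": "West", "WA": "West", "NY": "East"}

# Hypothetical sales by state (the detailed level).
sales_by_state = {"CA": 30, "WA": 10, "NY": 25}

def roll_up(by_state):
    """Climb the hierarchy: summarize state totals into region totals."""
    by_region = defaultdict(int)
    for state, sales in by_state.items():
        by_region[STATE_TO_REGION[state]] += sales
    return dict(by_region)

def drill_down(region, by_state):
    """Descend the hierarchy: show the state-level detail behind one region."""
    return {s: v for s, v in by_state.items() if STATE_TO_REGION[s] == region}

by_region = roll_up(sales_by_state)
west_detail = drill_down("West", sales_by_state)
```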


Cube Operations Illustrated


Rolling up

Drilling down

Actual Application

Query: “overall & detail production performance”

manufacturer: Com1

products: all products

date interval: 01-Jan-94 until 01-Jan-1999

source: USDA

[Figure labels: Com.1; Lot#1, Lot#2, Lot#3; Contract Number 1, Contract Number 2, Contract Number 3]

Data Mining

“Data Mining is the exploration and analysis,
by automatic or semi-automatic means,
of large or small quantities of data
in order to discover meaningful patterns,
trends and rules.”

Data Mining

[Figure: Data Mining in relation to Data Analysis, Database, Statistics, AI & ML, Data Warehouse, and OLAP]

Data Analysis


Classification



Regression



Clustering



Association



Sequence Analysis


Data Analysis (cont.)

Modeling: Y = f(X)

Input variables X1, X2, X3
(also called independent variables, attributes, or descriptors)

Output variables Y1, Y2, Y3
(also called dependent variables, classes, or targets)

Inputs and outputs may each be:

Numeric: 3, 4.5, 102, …

Categorical: hot, cold, high, low, …

Crisp: 0, 1, yes, no, …

Regression: the output is numeric

Classification: the output is categorical or crisp

The model f may be a linear model, a non-linear model, or a set of rules
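
The difference between the two tasks shows up in the type of the output. A minimal sketch, assuming a one-variable linear model for regression and a single threshold rule for classification; the training data, the threshold, and the function names are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def classify(temperature, threshold=20.0):
    """A one-rule classifier mapping a numeric input to a categorical output."""
    return "hot" if temperature >= threshold else "cold"

# Hypothetical training data lying exactly on y = 2x + 1.
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
predicted = slope * 4.0 + intercept   # numeric output: regression
label = classify(25.0)                # categorical output: classification
```

Here `fit_line` is an instance of a linear model and `classify` an instance of a (very small) set of rules.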

Data Analysis (cont.)

Clustering

[Figure: clustering of data points plotted on Age and Income axes]

Association

1, chips, coke, chocolate

2, gum, chips

3, chips, coke

4, …


Probability (chips, coke) ?

Probability (chips, gum) ?
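
One way to answer the two questions is to read Probability(A, B) as the fraction of transactions that contain both items, usually called support. The sketch below uses only the three transactions that are fully listed; the fourth is elided in the source and is left out. The function names are assumptions.

```python
# The three transactions listed above; transaction 4 is elided in the
# source and is deliberately left out rather than guessed.
transactions = [
    {"chips", "coke", "chocolate"},
    {"gum", "chips"},
    {"chips", "coke"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    items = set(itemset)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) = support(both) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

p_chips_coke = support({"chips", "coke"}, transactions)
p_chips_gum = support({"chips", "gum"}, transactions)
conf_coke_given_chips = confidence({"chips"}, {"coke"}, transactions)
```

On these three baskets chips and coke appear together in two of three, and chips and gum in one of three; the figures would change once the elided transactions are included.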

Sequence Analysis

…ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA…

[Figure labels: X(t-1), X(t), T]
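
If T in the figure is read as a transition matrix between consecutive symbols (an assumption, since the figure itself is not recoverable), a first-order Markov chain can be estimated by counting adjacent pairs. The sketch uses only the visible fragment of the sequence; `transition_matrix` is a hypothetical helper.

```python
from collections import Counter, defaultdict

# The visible part of the sequence shown above (the source elides both ends).
fragment = "ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA"

def transition_matrix(sequence):
    """Estimate P(X_t = b | X_{t-1} = a) from adjacent symbol pairs."""
    counts = defaultdict(Counter)
    for prev, cur in zip(sequence, sequence[1:]):
        counts[prev][cur] += 1
    return {
        prev: {cur: n / sum(row.values()) for cur, n in row.items()}
        for prev, row in counts.items()
    }

T = transition_matrix(fragment)
```

Each row of `T` is a probability distribution over the next symbol given the current one.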

Data Analysis (cont.)



Classification

Linear Discriminant Analysis

Naïve Bayes / Bayesian Network

OneR

Neural Networks

Decision Tree (ID3, C4.5, …)

K-Nearest Neighbors (IB)

Support Vector Machines (SVM)



Regression

Multiple Linear Regression

Principal Components Regression

Partial Least Squares

Neural Networks

Regression Tree (CART, MARS, …)

K-Nearest Neighbors (LWR)

Support Vector Machines (SVR)



Clustering

K-Means Clustering

Self Organizing Map

Bayesian Clustering

COBWEB



Association & Sequence Analysis

Apriori

Markov Chain

Hidden Markov Models


Challenges




Faster, more accurate and more scalable
techniques




Incremental, on-line and real-time
learning algorithms




Parallel and distributed data processing
techniques




Data mining is an exciting and challenging field with
the ability to solve many complex scientific and
business problems.

Opportunities



Data mining is a ‘top ten’ emerging technology




Data mining is finding increasing acceptance in
science and business areas which need to analyze
large amounts of data to discover trends and
patterns which they could not otherwise find.