Data & Text Mining

Nov 20, 2013 (3 years and 8 months ago)

Abhay Ahluwalia, Chris Bruck, Christopher Stanton, Stefanie Felitto, Mike Paulus


BUAD 466: Introduction to Business Intelligence

November 30, 2011

Data Mining Background

Definition: the process of analyzing data from different perspectives and summarizing it into useful information

Data mining software (e.g. XL Miner) allows users to analyze data from many different dimensions, categorize it, and summarize the relationships identified

The Basics of Data Mining

Analyzes relationships and patterns in stored transaction data based on open-ended user queries

Classes: Stored data is used to locate data in predetermined groups

Clusters: Data items are grouped according to logical relationships or consumer preferences

Associations: Data can be mined to identify associations

Sequential patterns: Data is mined to anticipate behavior patterns and trends
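
To make the idea of mining associations more concrete, here is a minimal Python sketch. It is not taken from any of the tools named in this presentation. The baskets are hypothetical data, and `pair_support` and `confidence` are illustrative helper names. The sketch counts how often two items appear in the same basket and how often a buyer of one item also buys another.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction data: each set is one shopping basket.
transactions = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def pair_support(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair one canonical (alphabetical) order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

def confidence(baskets, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

pairs = pair_support(transactions)
print(pairs[("beer", "diapers")])                   # baskets with both items
print(confidence(transactions, "diapers", "beer"))  # diaper buyers who also buy beer
```

A pair with a high count and high confidence is a candidate association. Real tools add thresholds and further measures on top of this basic counting.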

Text Mining Background

Definition: the discovery by computer of previously unknown knowledge in text, by automatically extracting information from different written resources

Goal: to extract new, never-before encountered information

Text mining can expand the ability of data mining to deal with textual materials
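
As a rough illustration of extracting information from written resources, the following Python sketch counts how many documents mention each term. The comments, the stop-word list, and the function names are all assumptions made for this example. Real text mining software goes well beyond simple counting.

```python
import re
from collections import Counter

# Hypothetical stop words and hypothetical free-text comments.
STOP_WORDS = {"the", "a", "was", "and", "but", "at", "is", "to", "of"}

documents = [
    "The front desk was slow, but the room was clean.",
    "Great room and a friendly front desk.",
    "The room was noisy.",
]

def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def document_frequency(docs):
    """Count the number of documents in which each term appears at least once."""
    counts = Counter()
    for doc in docs:
        counts.update(set(tokenize(doc)))
    return counts

df = document_frequency(documents)
print(df.most_common(3))  # the terms mentioned in the most documents
```

Terms that recur across many documents point to topics worth a closer look, which is the kind of signal that can then be fed into ordinary data mining.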


Data are Key to Business Value

DATA: Measures of variables in categories

Support Decision Making

Provide Basis for Forecasting

Important to:

Obtain data from new sources (text mining)

Integrate (mash) information from multiple sources


Software Example #1: VAIM (Value-Added Information Mash)

MINING: finding patterns in data (pattern-oriented, record-oriented searches)

MASHING: Integrating information mined from multiple resources

Useful in Hospitals and for Government Campaigns
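
The mashing step can be pictured as merging records from separate sources on a shared key. The Python sketch below is only an illustration of that idea and does not reflect how VAIM itself works. The customer IDs, field names, and the `mash` helper are hypothetical.

```python
# Hypothetical records mined from two separate sources, keyed by customer ID.
survey_scores = {"c1": 4, "c2": 2, "c3": 5}
return_visits = {"c1": 3, "c3": 1, "c4": 2}

def mash(*sources):
    """Merge (name, data) sources into one record per key.

    A key that is missing from a source gets None for that source's field.
    """
    keys = sorted(set().union(*(data.keys() for _, data in sources)))
    return {
        key: {name: data.get(key) for name, data in sources}
        for key in keys
    }

combined = mash(("survey_score", survey_scores), ("return_visits", return_visits))
for customer, record in combined.items():
    print(customer, record)
```

Keeping missing values as `None` rather than dropping the record makes gaps between sources visible, which matters when the sources were never designed to line up.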

Software Example #2: IBM SPSS


Assists in statistical analysis for predicting trends

Categorizes data, performs statistical analysis

Uses multiple regressions to suggest causality

Software Example #3: XL Miner


Add-In on Microsoft Excel products

Builds off of software that companies already possess

Assists in predictive forecasting based on observed data trends


Demonstration


Business Value Example #1: Grocery Store

Data mining using Oracle

Analyzed buying patterns

Findings led to changes in marketing

Increased revenues


Value Example #2: University of Rochester Cancer Center

Using KnowledgeSEEKER software

Studied the effect of anxiety about chemotherapy on nausea

Analysis helped improve treatment of patients and improved quality of life


Value Example #3: MGM Grand Hotel


Analyzed customer satisfaction and probability of return stay

Found that the front desk and room were most important

Focused the next 6 months on improving these areas

10% improvement in attrition

Increased guest returns and profitability


Business Applications

Pros:


Extracts new information and combines human linguistic capabilities with the speed and accuracy of a computer


Can answer the ‘Why?’


Competitive advantage

Cons:


Expensive


Requires Training


Dependent on structure of warehouses and repositories


Complications & Concerns


Invasion of Privacy


According to Lita van Wel and Lamber Royakkers in “Ethical issues in web data mining”, privacy is considered lost when information about an individual is obtained, used, or spread without that individual’s permission

More Complications


Data is made anonymous before it is gathered into profiles, so there are no personal profiles; these applications therefore de-individualize users by judging them just by their mouse clicks

De-individualization: the tendency to judge and treat people on the basis of group characteristics instead of their own individual characteristics

More Concerns


Companies can claim to collect the data for
one purpose and use it for another


The growing movement of selling personal data
as a service encourages website owners to
trade personal data obtained from their site


The companies that buy the data make it anonymous and assume ownership of the data that they release

http://www.youtube.com/watch?v=zdM6vzRHrG0

Even More Complications


Some web mining algorithms might use controversial
characteristics to categorize individuals, such as sex,
race, religion, or sexual orientation


This process could result in the refusal of service or a privilege
to an individual based on his race, religion, or sexual
orientation.


Application Recommendations & Conclusion


Sync data repositories (VAIM Software)


Training


Use Data Mining and Text Mining together


Group Jeopardy:

Data and Text Mining Background    Business Applications    Complications with Mining    From the Examples
100                                100                      100                          100
200                                200                      200                          200
300                                300                      300                          300

Data and Text Mining Background for 100:

True or False: Clusters refer to data items that are grouped according to logical relationships or consumer preferences?

True.

Data and Text Mining Background for 200:

What is the name of the Text Mining Software that allows users to analyze data from different dimensions, categorize it, and summarize the relationships it identified, all within a familiar Microsoft Office Program?

XL Miner

Data and Text Mining Background for 300:

Name either 2 Pros or 2 Cons of the Business Applications of Data Mining.

Pros: extracts new info, can answer the why, creates a competitive advantage

Cons: expensive, requires training, dependent on structure of warehouses and repositories

Business Applications for 100:

What does VAIM stand for?

Value-Added Information Mashing

Business Applications for 200:

What is the difference between Text Mining and Text Mashing?

MINING: finding patterns in data (pattern-oriented, record-oriented searches)

MASHING: Integrating information mined from multiple resources

Business Applications for 300:

What is the greatest benefit of Text Mining for Businesses?

Extracts new information and combines human linguistic capabilities with the speed and accuracy of a computer

Complications for 100:

True or False: Companies that buy the data and make it anonymous are not responsible for potential legal actions against them for using the data?

False, they are responsible and can have serious legal actions taken against them

Complications for 200:

What is the term used when the personal data of individuals is treated on the basis of group characteristics rather than individual characteristics?

De-individualization

Complications for 300:

Which two US Senators introduced the Commercial Privacy Bill of Rights?

John McCain (R-AZ)

John Kerry (D-MA)

From the Examples for 100:

When the grocery store analyzed men's buying trends, they found that men who purchased diapers also bought what other item?

Beer

From the Examples for 200:

What software did the University of Rochester Cancer Center use to analyze the effects of Chemotherapy treatments on nausea?

KnowledgeSEEKER

From the Examples for 300:

What did Text Mining identify as the two most important areas of the MGM Grand Hotel?

The Front Desk and the Room