Business Intelligence and Analytics: Overview and Examples

Nov 3, 2013 (3 years and 9 months ago)

Dr. Hsinchun Chen

Director, Artificial Intelligence Lab, University of Arizona

hchen@eller.arizona.edu

http://ai.arizona.edu



The Data Deluge (The Economist, March 2010); internet traffic of 667 exabytes by 2013 (Cisco); total amount of information in 2010: 1.2 zettabytes

(KB - MB - GB - TB - PB - EB - ZB - YB)
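To put the two figures above on the same scale, a minimal sketch of the unit ladder follows. It assumes decimal (SI) prefixes, where each step is a factor of 1000; the function name `convert` is hypothetical and not part of the original slides.

```python
# Unit ladder from the slide, assuming decimal (SI) prefixes:
# each unit is 1000 times the previous one.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def convert(value, from_unit, to_unit):
    """Convert a quantity between two units on the ladder."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * 1000.0 ** steps


if __name__ == "__main__":
    # 1.2 zettabytes expressed in exabytes
    print(convert(1.2, "ZB", "EB"))
    # 667 exabytes expressed in zettabytes
    print(convert(667, "EB", "ZB"))
```

Under this assumption, the 2010 information total of 1.2 ZB is roughly 1,200 EB, so the projected 667 EB of internet traffic is a little over half of that amount.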


BIG DATA


BIG COMPUTATION


BIG ANALYTICS


BIG (SOCIETAL) IMPACT



$3B BI revenue in 2009 (Gartner, 2006); $9.4B BI software M&A spending in 2010 and $14.1B by 2014 (Forrester)


IBM spent $14B in BI in five years; $9B BI revenue in 2010
(USA Today, November 2010); 24 acquisitions, 10,000 BI
software developers, 8,000 BI consultants, 200 BI
mathematicians


IBM acquired i2/COPLINK in 2011


BI & Analytics: The Field

BI & Analytics: Definition and Components


BI and Analytics refers to “(1) the technologies, systems, practices and applications that (2) analyze critical business data to (3) help an enterprise better understand its business and market.”



Core technologies: data warehousing, Extraction,
Transformation, and Load (ETL); Business Performance
Management (BPM), visual dashboards; enterprise text and
multimedia search; data and text mining, social network
analysis


BI 2.0 research: web analytics, web 2.0, social media analytics, opinion mining; in-memory and real-time BI; cloud computing, data/web services; Hadoop, MapReduce; stream and mobile data mining

BI Industry and Capabilities (Gartner Report, 2011)

Magic Quadrant for BI Platforms (13 Capabilities)


Integration (e.g., Microsoft, Oracle, SAP)

BI (shared) infrastructure
Metadata management
Development tools, collaboration

Information Delivery (e.g., SAP, Microsoft, IBM/Cognos)

Reporting, dashboards
Ad hoc query
Microsoft Office integration
Search-based BI (structured and unstructured)

Analysis (e.g., IBM/SPSS, SAS)

OLAP
Interactive visualization
Predictive modeling and data mining
Scorecards

Magic Quadrant for Business Intelligence Platforms


Hype Cycle for Business Intelligence, 2011


BI Hype Cycle (Gartner Report, 2011)


On the Rise

Collaborative decision making
Information semantic services
Search-based data discovery tools
Natural language question answering

At the Peak

Enterprise metadata repositories
BI SaaS
Visualization-based data discovery tools
Mobile BI
In-memory DBMS

Sliding into the Trough

Real-time decisioning
Analytics, content analytics, in-memory analytics, text analytics
Open-source BI tools
Interactive visualization


BI Hype Cycle (Cont’d)


Climbing the Slope

BI consulting and system integration
Business activity monitoring
Column-based DBMS
Dashboards, data quality tools
Predictive analytics
Excel as a BI front end

Entering the Plateau

BI platforms
Data-mining workbenches



Sample BI Applications (AI Lab)


Security informatics

Securing cyber space, cyber security, predicting Arab Spring
Information and system security, enterprise risk management

Market intelligence

Data/text/web mining, web 2.0, social media analytics
Big data (volume/variety/velocity/mobility), Hadoop, Cloud apps

Healthcare informatics

Healthcare IT integration and solutions, decision support
EHR data/text mining, patient empowerment and social media

(1) BI for Security: COPLINK

COPLINK Identity Resolution and Criminal Network Analysis

(2) BI for Market Intelligence (AZ BizIntel)


Mass media, social media contents


Text & social media analytics techniques


Finance/accounting/marketing models (Tetlock/Columbia, Antweiler/UBC, Das/Santa Clara)


NYU (Dhar), Arizona (Dhaliwal, Kelly, Jiang, Lusch, Yong), National Taiwan U (Li, Hong, Lu)


Bag of words, named entities, proper nouns, topics (1-, 2-, 3-grams)


Sentiment/valence, lexicons, machine learning, stakeholder
analysis, EFLS analysis


Time series models, spike detection, decaying function, trading
windows, targeted sentiment


Econometrics/regression models (R-squared, p-value), 10-fold validation (F, accuracy), simulated trading (cost, frequency, exit)
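Two of the text features listed above, bags of 1- to 3-grams and lexicon-based sentiment, can be sketched in a few lines. This is a minimal illustration, not the AZ BizIntel implementation: the tokenizer, the tiny `POSITIVE` and `NEGATIVE` word lists, and all function names are hypothetical assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return all n-grams (as tuples) in reading order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, max_n=3):
    """Count every 1- to max_n-gram occurring in the text."""
    tokens = tokenize(text)
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


# Hypothetical toy lexicon; a real system would use a curated one.
POSITIVE = {"gain", "beat", "strong", "upgrade"}
NEGATIVE = {"loss", "miss", "weak", "downgrade"}


def lexicon_sentiment(text):
    """Score in [-1, 1]: (positive - negative) / (positive + negative)."""
    tokens = tokenize(text)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


if __name__ == "__main__":
    headline = "Strong earnings beat estimates despite weak guidance"
    print(bag_of_ngrams(headline).most_common(3))
    print(lexicon_sentiment(headline))
```

The n-gram counts would typically feed a machine-learning classifier, while the lexicon score gives a cheap baseline that needs no training data.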




AZ BizIntel System Design (figure)

Data Collection

Predefined data sources: data sources for US public companies (SEC/EDGAR, NYSE.com, NASDAQ.com, Finance.Yahoo.com); company information database (ticker, CUSIP, CIK, PERMNO, company keywords, company name)
Dynamic data sources: blogs, news, search engines, WSJ, Twitter, basic information, Yahoo Finance forums, company websites, stock exchange, 10-K reports

Data Processing

Transformation/integration

Analysis

Topics & sentiments, time series/burst, risk model, SNA

Analytic Approaches

Finance/econ models and metrics, cross-media analysis, single-media analysis, predicting markets

Visualization

Static figures/dashboards, interactive applications, simulated trading
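The "time series / burst" analysis and the decaying function mentioned earlier can be illustrated with a small sketch. The slides do not specify the method, so the choices here are assumptions: an exponential half-life decay for aggregating daily sentiment and a z-score against a trailing window for spike detection. All names and parameter values are hypothetical.

```python
from statistics import mean, pstdev


def decayed_sentiment(scores, half_life=2.0):
    """Exponentially decayed average of daily scores (newest score last).

    A score that is `half_life` days old gets half the weight of today's.
    """
    last = len(scores) - 1
    weights = [0.5 ** ((last - i) / half_life) for i in range(len(scores))]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def detect_spikes(counts, window=5, threshold=3.0):
    """Indices whose count exceeds the trailing-window mean by more than
    `threshold` standard deviations."""
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma > 0 and (counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes


if __name__ == "__main__":
    # Hypothetical daily message counts for one ticker
    daily_mentions = [10, 12, 11, 9, 10, 50, 11, 10]
    print(detect_spikes(daily_mentions))
    print(decayed_sentiment([0.0, 1.0], half_life=1.0))
```

A detected spike could then open a trading window, with the decayed sentiment deciding the direction of a simulated trade; that linkage is a plausible reading of the design, not something the slides state.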

(3) BI for Healthcare: AZ Smart Health

AZ Smart Health Research

Healthcare Decision Support

Symptom-Disease-Treatment Extraction for Medical Knowledge Re-use
Scenario-based Association Rule Mining and Result Validation for Effective Healthcare
Outcome Assessment and Medication Compliance to Signify Quality of Care
Temporal Episodes and Disease Progression Modeling for Better Patient Condition Assessment
Patients-Like-You-and-Me EHR Search Interface to Accelerate Clinical Decision Making

Patient-centered Smart Health

Personalized Healthcare for Chronic and Family Diseases Management
Long Term Medication Effects to Improve New Drug Development
Public Health Modeling and Monitoring for Government Agencies
Patient Social Media to Empower Patients and Improve Self Care at Home

Healthcare Business Analytics

Cost Modeling and Containment
Improving Rate Calculation for the National Health Insurance
Competency and Performance Benchmarking
Quality-based Insurance Reimbursement
Workflow Planning and Coordination for Inter- and Intra-Hospital Process

ARM in Medicine: Symptoms, Diseases, and Treatments
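Assuming ARM here stands for association rule mining, as named in the research list, the two core measures (support and confidence) can be sketched as follows. The patient records used in the usage example are invented toy data, and the function names and thresholds are hypothetical; this is not the lab's actual pipeline.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base


def pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """List single-item rules (lhs, rhs, support, confidence) passing both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, transactions)
            c = confidence({lhs}, {rhs}, transactions)
            if s >= min_support and c >= min_confidence:
                rules.append((lhs, rhs, s, c))
    return rules


if __name__ == "__main__":
    # Invented toy records: one set of coded items per patient visit
    records = [
        {"symptom:breast lump", "diagnosis:breast cancer", "treatment:biopsy"},
        {"symptom:breast lump", "diagnosis:breast cancer", "treatment:mastectomy"},
        {"symptom:breast lump", "diagnosis:benign neoplasm"},
        {"symptom:bone pain", "diagnosis:breast cancer", "treatment:mastectomy"},
    ]
    for rule in pair_rules(records):
        print(rule)
```

Real clinical data would call for a frequent-itemset algorithm such as Apriori and for clinician review of the resulting rules, in line with the "result validation" step named in the research list.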

Patient Statistics: Breast Cancer

Patient Genders (chart): M 0; F 1092

Patient Age Groups (chart): 15 to 24: 6; 25 to 44: 318; 45 to 64: 618; > 65: 150

Frequent Cooccurred Diagnosis (chart): bar values as labeled: 335, 179, 169, 152, 146, 146, 125, 103, 99, 86; categories as labeled:
Secondary malignant neoplasm of bone and…
Malignant neoplasm of female breast, upper-…
Diabetes mellitus without mention of…
Secondary malignant neoplasm of lung
Secondary malignant neoplasm of liver
Malignant neoplasm of other specified sites of…
Essential hypertension, unspecified
Malignant neoplasm of female breast, upper-…
Secondary and unspecified malignant neoplasm…
Benign neoplasm of breast

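Consistency of Top Treatment Orders

Department 03: General Surgery; Department BD: Gastrointestinal surgery
Age group 4: 15 to 24; Age group 5: 25 to 44; Age group 6: 45 to 64; Age group 7: > 65
Cooccurred Diagnosis 196.3: Secondary and unspecified malignant neoplasm of lymph nodes; Cooccurred Diagnosis 198.5: Secondary malignant neoplasm of bone and bone marrow

Column groups: Physician (M1130, M1529, M1540, M1585); Department (03, BD); Age Group (4, 5, 6, 7); Cooccurred Diagnosis (196.3, 198.5). V = cell marked; - = cell empty.

# | Top 20 treatments from aggregated population | M1130 | M1529 | M1540 | M1585 | 03 | BD | 4 | 5 | 6 | 7 | 196.3 | 198.5
1 | Exemestane (Aromasin) (諾曼癌素) | V | V | - | - | V | V | - | - | V | - | - | -
2 | Her-2/neu FISH (Her-2/neu 螢光原位雜交法) | - | - | - | V | V | - | - | - | V | - | V | V
3 | Trastuzumab (Herceptin) (賀癌平) | V | V | V | V | V | V | - | V | V | V | V | V
4 | Anastrozole (Anazo) (安納柔) | - | - | V | - | V | - | - | - | - | V | - | V
5 | Zoledronic acid (Zometa) (卓古祂) | V | V | V | V | V | V | - | V | V | V | - | V
6 | Pegylated liposomal doxorubicin (Caelyx) (康利斯微脂利) | V | V | - | V | V | V | - | V | V | V | - | V
7 | Radical mastectomy-unilateral (乳癌根除術-單側) | V | V | V | V | V | V | - | V | V | V | - | -
8 | Tamoxifen citrate (得適) | - | - | - | V | V | V | - | V | V | V | - | -
9 | Docetaxel (Taxotere) (剋癌易) | V | V | V | V | V | V | - | V | V | V | - | V
10 | Cyclophosphamide (Endoxan-Asta) (癌得星) | V | V | V | V | V | V | V | V | V | V | - | V
11 | Vinorelbine (Navelbine) (溫諾平) | V | V | - | V | V | V | - | V | V | - | - | V
12 | Docetaxel (Taxotere) (剋癌易) | V | V | V | V | V | V | - | V | V | V | - | V
13 | Epirubicin HCl (Pharmorubicin RD) (泛艾黴素) | V | V | V | V | V | V | - | V | V | V | - | V
14 | Epirubicin (Pharmorubicin) ("速溶" 泛艾黴素) | V | V | V | V | V | V | - | V | V | V | - | -
15 | CA-153 tumor marker (CA-153 腫瘤標記) | V | V | V | - | V | V | - | V | V | V | - | V
16 | Epirubicin (Pharmorubicin) ("速溶" 泛艾黴素) | V | - | V | - | - | V | - | V | V | V | V | V
17 | Methotrexate sodium inj (Amethopterin) (滅殺除癌) | - | V | - | - | V | - | - | V | - | - | - | -
18 | Dissection of axillary lymphatics (腋窩淋巴腺清除術) | - | - | V | - | - | V | - | - | V | V | V | -
19 | Breast tumor biopsy (乳房腫瘤組織檢查切片術) | - | - | V | - | - | - | - | - | V | - | - | -
20 | Intravenous chemotherapy 4-8 hours (靜脈化學藥物注射 4-8 小時) | V | V | V | V | - | V | V | V | - | - | - | V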

Treatment Comparison Among Different Physicians

Rank | DOCTOR_NO=M1130 | DOCTOR_NO=M1529 | DOCTOR_NO=M1540
1 | Caelyx 20mg/10ml/vial (康利斯微脂利) | Methotrexate Inj 50mg/2ml (滅殺除癌) | Zometa Powder For Solution For Infusion 4mg/vial (卓古祂)
2 | Aromasin S.C. Tablets 25mg (諾曼癌素) | Gemzar 200mg/vial (健擇) | Anazo F.C. Tablets (安納柔)
3 | Navelbine 10mg/1ml/vial (溫諾平) | Zometa Powder For Solution For Infusion 4mg/vial (卓古祂) | Taxotere 20mg/0.5ml/vial (剋癌易)
4 | Intravenous chemotherapy <1 hour (靜脈化學藥物注射) | FORMOXOL 30mg/5ml/vial (伏摩素) | Taxotere 80mg/2ml/vial (剋癌易)
5 | Herceptin 440mg/20ml/vial (賀癌平) | Aromasin S.C. Tablets 25mg (諾曼癌素) | Herceptin 440mg/20ml/vial (賀癌平)
6 | Zometa Powder For Solution For Infusion 4mg/vial (卓古祂) | Herceptin 440mg/20ml/vial (賀癌平) | Radical mastectomy-unilateral (乳癌根除術-單側)
7 | FORMOXOL 30mg/5ml/vial (伏摩素) | Navelbine 10mg/1ml/vial (溫諾平) | Granocyte 100ug/vial (顆球諾得)
8 | CA-153 tumor marker (CA-153 腫瘤標記) | Endoxan-Asta Injection 200mg/vial (癌得星) | Sentinel lymphadenectomy (腋窩淋巴腺清除術)
9 | Abitrexate 50mg/2ml/vial (必除癌) | Caelyx 20mg/10ml/vial (康利斯微脂利) | CA-153 tumor marker (CA-153 腫瘤標記)
10 | Taxotere 80mg/2ml/vial (剋癌易) | Taxotere 20mg/0.5ml/vial (剋癌易) | Endoxan-Asta Injection 200mg/vial (癌得星)
11 | Taxotere 20mg/0.5ml/vial (剋癌易) | Taxotere 80mg/2ml/vial (剋癌易) | Pharmorubicin Rapid Dissolution 10mg ("速溶" 泛艾黴素)
12 | Endoxan-Asta Injection 200mg/vial (癌得星) | CA-153 tumor marker (CA-153 腫瘤標記) | Whole body bone scan (全身骨骼掃描)
13 | Pharmorubicin Rapid Dissolution 10mg ("速溶" 泛艾黴素) | Radical mastectomy-unilateral (乳癌根除術-單側) | Pharmorubicin 10mg/vial ("速溶" 泛艾黴素)
14 | Pharmorubicin RD 50mg/vial (泛艾黴素) | Intravenous chemotherapy 1-4 hours (靜脈化學藥物注射) | Simulation procedure (模擬定位攝影)
15 | Radical mastectomy-unilateral (乳癌根除術-單側) | Intravenous chemotherapy 4-8 hours (靜脈化學藥物注射) | Pharmorubicin RD 50mg/vial (泛艾黴素)
16 | Pharmorubicin 10mg/vial ("速溶" 泛艾黴素) | Rasitol Tablets 40mg (Furosemide) (來喜妥) | Breast tumor biopsy examination (乳房腫瘤組織檢查切片術)
17 | Gemzar 200mg/vial (健擇) | Emetrol Tablets 10mg (Domperidone) (愈吐寧) | Intravenous chemotherapy 4-8 hours (靜脈化學藥物注射)
18 | Intravenous chemotherapy 4-8 hours (靜脈化學藥物注射) | Pharmorubicin RD 50mg/vial (泛艾黴素) | Intravenous chemotherapy 1-4 hours (靜脈化學藥物注射)
19 | Intravenous chemotherapy 1-4 hours (靜脈化學藥物注射) | Pharmorubicin 10mg/vial ("速溶" 泛艾黴素) | Vascular exploration (血管探查)
20 | Neurotin Tablets 600mg (鎮頑癲) | Sodium chloride injection (氯化鈉注射液) | Fixed mold-large (固定模具之設計及製作-)
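One simple way to quantify how much the physicians' treatment lists agree is set overlap. The sketch below uses Jaccard similarity, which is an assumption chosen for illustration and not necessarily the measure used in the study. The lists hold the first five rows of the comparison above, shortened to product names; the helper names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(top_lists):
    """Jaccard similarity for every unordered pair of named lists."""
    names = sorted(top_lists)
    return {
        (x, y): jaccard(top_lists[x], top_lists[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


if __name__ == "__main__":
    # Ranks 1 to 5 per physician, shortened to product names
    top5 = {
        "M1130": ["Caelyx", "Aromasin", "Navelbine", "IV chemotherapy <1h", "Herceptin"],
        "M1529": ["Methotrexate", "Gemzar", "Zometa", "Formoxol", "Aromasin"],
        "M1540": ["Zometa", "Anazo", "Taxotere 20mg", "Taxotere 80mg", "Herceptin"],
    }
    for pair, score in pairwise_overlap(top5).items():
        print(pair, round(score, 3))
```

Jaccard ignores rank order; a rank-aware measure would be needed to capture whether the same treatment sits at position 1 for one physician and position 20 for another.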

Treatment Comparison Among Different Patient Age Groups


Anazo F.C. Tablets (安納柔) is a treatment for advanced breast cancer in postmenopausal women (advanced age).


Abitrexate (必除癌) falls under the FDA pregnancy risk categories as a drug shown to cause fetal risks and abnormalities. It is therefore less likely to be prescribed for patients in the younger age group 5 (i.e., ages 25 to 44).



Rank | Age group=5 | Age group=6 | Age group=7
1 | Caelyx 20mg/10ml/vial (康利斯微脂利) | Her-2/neu FISH (Her-2/neu 螢光原位雜交法) | Anazo F.C. Tablets (安納柔)
2 | Herceptin 440mg/20ml/vial (賀癌平) | Aromasin S.C. Tablets 25mg (諾曼癌素) | Abitrexate 50mg/2ml/vial (必除癌)
3 | Zometa Powder For Solution For Infusion 4mg/vial (卓古祂) | Herceptin 440mg/20ml/vial (賀癌平) | Radical mastectomy-unilateral (乳癌根除術-單側)
4 | Pharmorubicin 10mg/vial ("速溶" 泛艾黴素) | Zometa Powder For Solution For Infusion 4mg/vial (卓古祂) | Herceptin 440mg/20ml/vial (賀癌平)
5 | Taxotere 80mg/2ml/vial (剋癌易) | Navelbine 10mg/1ml/vial (溫諾平) | Sentinel lymphadenectomy (腋窩淋巴腺清除術)
6 | Taxotere 20mg/0.5ml/vial (剋癌易) | Caelyx 20mg/10ml/vial (康利斯微脂利) | Tadex 10mg/tab (得適)
7 | Navelbine 10mg/1ml/vial (溫諾平) | Radical mastectomy-unilateral (乳癌根除術-單側) | Zometa Powder For Solution For Infusion 4mg/vial (卓古祂)
8 | Pharmorubicin RD 50mg/vial (泛艾黴素) | Tadex 10mg/tab (得適) | Caelyx 20mg/10ml/vial (康利斯微脂利)
9 | Endoxan-Asta Injection 200mg/vial (癌得星) | Taxotere 80mg/2ml/vial (剋癌易) | Endoxan-Asta Injection 200mg/vial (癌得星)
10 | Tadex 10mg/tab (得適) | Endoxan-Asta Injection 200mg/vial (癌得星) | CA-153 tumor marker (CA-153 腫瘤標記)
11 | Xeloda Tablets 500mg (結瘤達) | Breast tumor biopsy (乳房腫瘤組織檢查切片術) | Pharmorubicin Rapid Dissolution 10mg ("速溶" 泛艾黴素)
12 | Radical mastectomy-unilateral (乳癌根除術-單側) | Taxotere 20mg/0.5ml/vial (剋癌易) | Partial mastectomy-unilateral (部份乳癌根除術-單側)
13 | Pharmorubicin Rapid Dissolution 10mg ("速溶" 泛艾黴素) | CA-153 tumor marker (CA-153 腫瘤標記) | Granocyte 100ug/vial (顆球諾得)
14 | CA-153 tumor marker (CA-153 腫瘤標記) | Pharmorubicin RD 50mg/vial (泛艾黴素) | Taxotere 20mg/0.5ml/vial (剋癌易)
15 | Intravenous chemotherapy 4-8 hours (靜脈化學藥物注射) | Pharmorubicin 10mg/vial ("速溶" 泛艾黴素) | Pharmorubicin RD 50mg/vial (泛艾黴素)
16 | Granocyte 100ug/vial (顆球諾得) | Abitrexate 50mg/2ml/vial (必除癌) | Taxotere 80mg/2ml/vial (剋癌易)
17 | Methotrexate Inj 50mg/2ml (滅殺除癌) | Sentinel lymphadenectomy (腋窩淋巴腺清除術) | Pharmorubicin 10mg/vial ("速溶" 泛艾黴素)
18 | Intravenous chemotherapy 1-4 hours (靜脈化學藥物注射) | Pharmorubicin Rapid Dissolution 10mg ("速溶" 泛艾黴素) | -
19 | Gemzar 200mg/vial (健擇) | Intravenous chemotherapy 1-4 hours (靜脈化學藥物注射) | -
20 | FORMOXOL 30mg/5ml/vial (伏摩素) | Compensator design and production (補償器之設計及製) | -



Cancer Community Mapping: Text Mining & Visualization for Documents and Patient Forums

Figure panels: a Brain Neoplasms article about toddlers; Meningeal Neoplasms and Brain Diseases subtopics; breast cancer patient forum messages; Red Blood Cell and Lymph Nodes subtopics

4(c): A Chinese SOM Map about Colon Cancer
4(d): A Tag Cloud for PLM Breast Cancer Patients

BI & Analytics Research Opportunities and Challenges


Opportunities: BIG DATA; BIG COMPUTATION; BIG ANALYTICS; BIG (SOCIETAL) IMPACTS (NAE Grand Challenges: security, healthcare)


Challenges: data deluge (TB/PB); data variety (numbers, text, multilingual, multimedia); data velocity (mobile, streaming); data organization & access (DBMS, Hadoop, IR, image, mobile); data analytics (statistical analysis, data/text/web mining)


Training the New “Data Scientists”: Core Knowledge


B-School (Management Information Systems): economics/finance/accounting/marketing, statistical analysis/modeling, organizational/behavioral

business knowledge; statistics


C-School (Computer Science): programming language, data structure & algorithm, database management system, artificial intelligence, networking, data mining, web computing & mining

computational techniques


I-School (Information/Library Science): information organization, information retrieval, information visualization, NLP, text mining, HCI

information processing
