
D3M: Domain-Driven Data Mining

An Overview of Domain-Driven Data Mining:
Toward Actionable Knowledge Discovery (AKD)

Longbing Cao


Faculty of Engineering and Information Technology

University of Technology, Sydney, Australia


D3M: Domain-Driven Data Mining
The Smart Lab: datamining.it.uts.edu.au
15 December 2008
Cao, L: D3M at DDDM2008 Joint with ICDM2008

Outline

Why Do We Need D3M
What Is D3M
The D3M Framework
D3M Theoretical Underpinnings
D3M Research Issues
D3M Applications
D3M References



Why Do We Need D3M

A common scenario in deploying data mining algorithms:

I found something interesting!
  “Many patterns are found.”
  “They satisfy the technical metric thresholds well.”

What do business people say?
  “So what?”
  “They are just commonsense.”
  “I don’t care about them.”
  “I don’t understand them.”
  “How can I use them?”
  “Am I wrong? What can I do better for my business mate?”



Why Do We Need D3M

Where does it go wrong?

The gaps:
  academic objectives vs. business goals
  technical outputs vs. business expectations

Macro-level methodological and fundamental issues
  Academic: technical interest; innovative algorithms and patterns
  Practitioner: social, environmental, and organizational factors and their impact; getting a problem solved properly

Micro-level technical and engineering issues
  System dynamics, system environment, and interaction within a system
  Business processes, organizational factors, and constraints
  Involvement of human and domain knowledge




An example: the problem with association mining

Existing association rule mining algorithms are specifically designed to find strong patterns with high predictive accuracy or correlation.

Frequent patterns are often regarded as commonsense knowledge, whereas business people are eager to discover new and hidden patterns in databases.

Many patterns are found.

How can business people take over these associations seamlessly and turn them into operational actions?
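A small experiment makes the problem concrete. The sketch below is a brute-force support/confidence/lift search over a hypothetical toy basket dataset. The items, the thresholds, and the `mine_rules` helper are illustrative assumptions, not an algorithm from this talk. The rule bread -> milk clears both technical thresholds with confidence 1.0, yet its lift is 1.0 because every basket already contains milk: a technically strong pattern that a business user would rightly call commonsense.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def mine_rules(transactions, min_support, min_confidence, max_size=3):
    """Brute-force association rules X -> Y from itemsets of size 2..max_size.

    Returns (antecedent, consequent, support, confidence, lift) tuples for the
    rules that pass both technical thresholds.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset, transactions)
            if s == 0 or s < min_support:
                continue
            # Try every non-empty split of the itemset into antecedent and consequent.
            for k in range(1, size):
                for ante in combinations(combo, k):
                    x = frozenset(ante)
                    y = itemset - x
                    conf = s / support(x, transactions)
                    if conf >= min_confidence:
                        lift = conf / support(y, transactions)
                        rules.append((x, y, s, conf, lift))
    return rules


# Hypothetical toy data: milk appears in every basket.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs", "butter"},
]

rules = mine_rules(baskets, min_support=0.4, min_confidence=0.6)
for x, y, s, conf, lift in rules:
    print(sorted(x), "->", sorted(y), f"sup={s:.2f} conf={conf:.2f} lift={lift:.2f}")
```

Every printed rule satisfies the technical thresholds by construction; whether any of them is worth acting on is a separate, business question.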



What Is D3M

Next-generation data mining methodologies, frameworks, algorithms, evaluation systems, tools, and decision support that:
  Cater for the business environment
  Satisfy business needs
  Deliver business-friendly, decision-making rules and actions of solid technical and business significance
  Can be understood and taken over by business people to make decisions

D3M aims to promote the paradigm shift from data-centered hidden pattern mining to domain-driven actionable knowledge discovery (AKD).




Involve and synthesize ubiquitous intelligence:
  human intelligence,
  domain intelligence,
  data intelligence,
  network intelligence,
  organizational and social intelligence, and
  meta-synthesis of the above ubiquitous intelligence



The D3M Framework

AKD-based problem-solving




Interestingness & actionability




Conflicts & tradeoff




A framework for AKD

Post-analysis-based AKD
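Post-analysis-based AKD can be read as a two-stage pipeline: mine patterns under technical interestingness first, then re-examine the survivors against business interestingness. The sketch below assumes that reading. The `Pattern` fields `benefit` and `cost` and the net-benefit rule are hypothetical stand-ins for whatever business measures a domain supplies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    name: str
    support: float      # technical: fraction of records the pattern covers
    confidence: float   # technical: reliability of the rule
    benefit: float      # business (hypothetical): expected saving if acted on
    cost: float         # business (hypothetical): cost of the intervention


def technical_filter(patterns, min_support, min_confidence):
    """Stage 1: keep patterns that meet the technical interestingness thresholds."""
    return [p for p in patterns
            if p.support >= min_support and p.confidence >= min_confidence]


def post_analysis(patterns, min_net_benefit):
    """Stage 2: keep and rank technically interesting patterns by net business benefit."""
    scored = [(p, p.benefit - p.cost) for p in patterns]
    kept = [(p, net) for p, net in scored if net >= min_net_benefit]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


candidates = [
    Pattern("A", support=0.30, confidence=0.90, benefit=100.0, cost=120.0),
    Pattern("B", support=0.10, confidence=0.75, benefit=500.0, cost=50.0),
    Pattern("C", support=0.02, confidence=0.95, benefit=900.0, cost=10.0),
    Pattern("D", support=0.20, confidence=0.80, benefit=300.0, cost=100.0),
]
stage1 = technical_filter(candidates, min_support=0.05, min_confidence=0.70)
actionable = post_analysis(stage1, min_net_benefit=0.0)
print([(p.name, net) for p, net in actionable])  # prints [('B', 450.0), ('D', 200.0)]
```

In this toy run, pattern C is discarded in stage 1 despite the largest net benefit. That illustrates a weakness of pure post-analysis: business value that the technical filter never lets through cannot be recovered later.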



D3M Theoretical Underpinnings

artificial intelligence and intelligent systems,
behavior informatics and analytics,
business modeling,
business process management,
cognitive sciences,
data integration,
human-machine interaction,
human-centered computing,
knowledge representation and management,
machine learning,
ontological engineering,
organizational and social computing,
project management methodology,
social network analysis,
statistics,
system simulation, and so on.



D3M Research Issues

Data Intelligence:
  deep knowledge in complex data structures; mining in-depth data patterns, and mining structured and informative knowledge in complex data

Domain Intelligence:
  domain and prior knowledge, business processes/logic/workflow, constraints, and business interestingness; their representation, modeling, and involvement in KDD

Network Intelligence:
  network-based data, knowledge, communities, and resources; information retrieval, text mining, web mining, semantic web, ontological engineering techniques, and web knowledge management

Human Intelligence:
  empirical and implicit knowledge, expert knowledge and thoughts, group/collective intelligence; human-machine interaction; representation and involvement of human intelligence

Social Intelligence:
  organizational/social factors, laws/policies/protocols, trust/utility/benefit-cost; collective intelligence, social network analysis, and social cognition interaction

Intelligence metasynthesis:
  synthesizing ubiquitous intelligence in KDD; metasynthetic interaction (m-interaction) as the working mechanism, and metasynthetic space (m-space) as an AKD-based problem-solving system




How to reach an interest tradeoff

Balance between technical and business interests

Suppose there are multiple metrics for each aspect
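One simple way to reach such a tradeoff, assumed here for illustration rather than taken from the talk, is to min-max normalize every metric, average the metrics within each aspect, and blend the two aspect scores with a weight `alpha` on the technical side. The metric names and values below are hypothetical, and every metric is assumed to be larger-is-better.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant metric maps every pattern to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def aspect_score(metrics):
    """Average the min-max-normalized metrics of one aspect, pattern by pattern.

    metrics: dict mapping metric name -> list of raw values (one per pattern),
             where a larger value is assumed to be better.
    """
    normalized = [min_max(values) for values in metrics.values()]
    return [sum(column) / len(column) for column in zip(*normalized)]


def tradeoff_scores(tech, biz, alpha):
    """Blend technical and business interestingness with weight alpha on tech."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    t = aspect_score(tech)
    b = aspect_score(biz)
    return [alpha * ti + (1 - alpha) * bi for ti, bi in zip(t, b)]


# Hypothetical metrics for three candidate patterns.
tech = {"confidence": [0.90, 0.60, 0.75], "lift": [3.0, 1.2, 2.1]}
biz = {"saving": [50.0, 400.0, 225.0]}

for alpha in (0.8, 0.2):
    scores = tradeoff_scores(tech, biz, alpha)
    best = max(range(len(scores)), key=scores.__getitem__)
    print(f"alpha={alpha}: scores={[round(s, 2) for s in scores]} best=pattern {best}")
```

Moving `alpha` from 0.8 to 0.2 switches the top-ranked pattern from the technically strongest to the most profitable one. When stakeholders cannot agree on a single weight, keeping only the Pareto-optimal (non-dominated) patterns is a common alternative.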






Actionable knowledge discovery through m-spaces

acquiring and representing unstructured, ill-structured, and uncertain domain/human knowledge
supporting the dynamic involvement of business experts and their knowledge/intelligence
acquiring and representing expert thinking, such as imaginary thinking and creative thinking, in group heuristic discussions during KDD modeling
acquiring and representing group/collective interaction behavior and impact emergence
building infrastructure that supports the involvement and synthesis of ubiquitous intelligence



D3M Applications

Real-world data mining

Our recent case studies:
  Capital markets
    actionable trading agents
    actionable trading strategies
  Social security
    activity mining
    combined mining



Actionable Trading Evidence for Brokerage Firms


Trading strategy/evidence



Actionable trading evidence







Domain factors




Business interest




Developing in-depth trading strategy



Activity mining for Australian Commonwealth Governmental Debt Prevention

Impact-targeted activity mining




Impact-targeted activity mining

Frequent impact-targeted activity sequences
Impact-contrasted activity sequences
Impact-reversed activity sequences
Impact-targeted combined association clusters
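The first two pattern types can be illustrated with a minimal sketch over labeled activity sequences. Here a pattern P is an ordered activity subsequence (gaps allowed) and T is the target impact (for example, a debt occurring). A pattern counts as frequent impact-targeted when Prob(P, T) reaches a minimum support, and as impact-contrasted when Prob(P | T) - Prob(P | not T) reaches a minimum contrast. These definitions, the activity codes, and the helper names are simplifying assumptions, not the exact measures of the original case study.

```python
from itertools import combinations


def contains(sequence, pattern):
    """True if the activities of `pattern` occur in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `x in iterator` consumes the iterator up to the first match, which enforces order.
    return all(activity in remaining for activity in pattern)


def impact_stats(cases, pattern):
    """Technical measures of the impact-targeted pattern `pattern -> T`.

    cases: list of (activity_sequence, impact) pairs; impact is True for the
           target impact T (e.g. a debt occurred) and False otherwise.
    """
    n = len(cases)
    n_t = sum(1 for _, impact in cases if impact)
    n_not_t = n - n_t
    hits_t = sum(1 for seq, impact in cases if impact and contains(seq, pattern))
    hits_not_t = sum(1 for seq, impact in cases if not impact and contains(seq, pattern))
    hits = hits_t + hits_not_t
    return {
        "support": hits_t / n,                                      # Prob(P, T)
        "confidence": hits_t / hits if hits else 0.0,               # Prob(T | P)
        "supp_in_t": hits_t / n_t if n_t else 0.0,                  # Prob(P | T)
        "supp_in_not_t": hits_not_t / n_not_t if n_not_t else 0.0,  # Prob(P | not T)
    }


def candidate_patterns(cases, max_len=2):
    """All ordered activity subsequences of length 1..max_len that occur in the data."""
    found = set()
    for seq, _ in cases:
        for k in range(1, max_len + 1):
            for idx in combinations(range(len(seq)), k):
                found.add(tuple(seq[i] for i in idx))
    return sorted(found)


def mine_impact_patterns(cases, min_support, min_contrast, max_len=2):
    """Return (frequent impact-targeted patterns, impact-contrasted patterns)."""
    frequent, contrasted = [], []
    for pattern in candidate_patterns(cases, max_len):
        stats = impact_stats(cases, pattern)
        if stats["support"] < min_support:
            continue
        frequent.append(pattern)
        # Contrast: how much more often P occurs among T cases than among non-T cases.
        if stats["supp_in_t"] - stats["supp_in_not_t"] >= min_contrast:
            contrasted.append(pattern)
    return frequent, contrasted


# Hypothetical activity codes; True marks cases that ended in a debt.
cases = [
    (["a", "b", "c"], True),
    (["a", "c"], True),
    (["b", "a", "c"], True),
    (["b", "c"], False),
    (["a", "b"], False),
    (["c", "b"], False),
]
frequent, contrasted = mine_impact_patterns(cases, min_support=0.3, min_contrast=0.5)
print("frequent:", frequent)
print("contrasted:", contrasted)
```

Exhaustive enumeration grows combinatorially with sequence length, hence the `max_len` cap. A real deployment would use a dedicated sequential pattern miner and would also have to deal with the imbalance between T and non-T cases.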




Data intelligence


Activity data


Itemset imbalance


Impact imbalance


Seasonal effect


Demographic data


Transactional data



Itemset/tuple selection/construction




Domain intelligence


Business process/event for activity selection


Domain knowledge


Feature selection


Sequence construction


Impact target


Positive impact


Negative impact


Multi-level impacts


Feature/attribute selection


Interestingness definition


New pattern structures




Organizational/social factors


Operational/intervention activities


Seasonal business requirement/interaction changes


Business cost (debt amount/duration)


Business benefit (saving/preventing debt amount or reducing debt duration)


Deliverable format




Impact-reversed pattern pair

Underlying pattern 1:

Derivative pattern 2:

Impact-targeted combined association clusters
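An impact-reversed pair links an underlying pattern P -> T1 to a derivative pattern P + Q -> T2 in which the extra activities Q flip the impact. The minimal sketch below assumes boolean impacts, takes T1 as the impact that P most often leads to, and restricts Q to a single activity appended after P. The activity codes (including the intervention `r`) and the helper names are hypothetical.

```python
def contains(sequence, pattern):
    """True if the activities of `pattern` occur in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(activity in remaining for activity in pattern)


def impact_confidence(cases, pattern, impact):
    """Return (Prob(impact | pattern), number of cases containing pattern)."""
    matched = [imp for seq, imp in cases if contains(seq, pattern)]
    if not matched:
        return 0.0, 0
    return sum(1 for imp in matched if imp == impact) / len(matched), len(matched)


def impact_reversed_pairs(cases, underlying, min_conf, min_count=1):
    """Find derivative patterns underlying + (q,) whose dominant impact is reversed.

    cases:      list of (activity_sequence, impact) pairs with boolean impacts.
    underlying: activity tuple P; its dominant impact T1 is read from the data.
    Returns (derivative_pattern, T2, confidence) tuples with T2 != T1.
    """
    conf_true, count = impact_confidence(cases, underlying, True)
    if count == 0:
        return []
    t1 = conf_true >= 0.5                  # dominant impact of P (ties count as True)
    if max(conf_true, 1.0 - conf_true) < min_conf:
        return []                          # P -> T1 is not confident enough to reverse
    t2 = not t1
    activities = sorted({act for seq, _ in cases for act in seq})
    pairs = []
    for q in activities:
        derivative = tuple(underlying) + (q,)
        conf, n = impact_confidence(cases, derivative, t2)
        if n >= min_count and conf >= min_conf:
            pairs.append((derivative, t2, conf))
    return pairs


# Hypothetical data: "a" usually precedes a debt (True), but "a" followed by
# the intervention "r" is followed by no debt (False).
cases = [
    (["a", "b"], True),
    (["a", "c"], True),
    (["a", "b", "c"], True),
    (["a", "r"], False),
    (["b", "a", "r"], False),
    (["a", "b"], True),
]
print(impact_reversed_pairs(cases, ("a",), min_conf=0.6, min_count=2))
```

Such pairs point toward candidate actions: in the toy data, cases where `r` follows `a` end without a debt, whereas `a` alone is usually followed by one. Whether the intervention causes the reversal is a question the pattern alone cannot answer.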





Conditional impact ratio (Cir)

Conditional Piatetsky-Shapiro's (P-S) ratio (Cps)
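The formulas for these two measures did not survive extraction. Assuming they are the conditional counterparts of lift and of Piatetsky-Shapiro's leverage, evaluated only over the cases that already contain the underlying pattern P, they would read Cir(Q -> T | P) = Prob(Q, T | P) / (Prob(Q | P) × Prob(T | P)) and Cps(Q -> T | P) = Prob(Q, T | P) - Prob(Q | P) × Prob(T | P). The sketch below computes both from hypothetical counts under that assumption; `conditional_measures` is an illustrative helper, not code from the original work.

```python
import math


def conditional_measures(n_p, n_pq, n_pt, n_pqt):
    """Return (Cir, Cps) for the derivative part Q and impact T, given pattern P.

    All counts are restricted to cases that contain the underlying pattern P:
      n_p:   cases containing P
      n_pq:  cases containing P and Q
      n_pt:  cases containing P that end with impact T
      n_pqt: cases containing P and Q that end with impact T
    """
    if n_p <= 0:
        raise ValueError("the underlying pattern P must occur at least once")
    p_q = n_pq / n_p          # Prob(Q | P)
    p_t = n_pt / n_p          # Prob(T | P)
    p_qt = n_pqt / n_p        # Prob(Q, T | P)
    expected = p_q * p_t      # Prob(Q, T | P) if Q and T were independent given P
    cir = p_qt / expected if expected > 0 else math.nan  # undefined when expected is 0
    cps = p_qt - expected
    return cir, cps


# Hypothetical counts: 100 cases contain P; 20 of them also contain Q;
# 50 of them end with impact T; 18 contain Q and end with T.
cir, cps = conditional_measures(n_p=100, n_pq=20, n_pt=50, n_pqt=18)
print(f"Cir = {cir:.2f}, Cps = {cps:.2f}")  # prints: Cir = 1.80, Cps = 0.08
```

Under this reading, Cir > 1 (equivalently Cps > 0) means that, among cases already containing P, Q and the impact T co-occur more often than they would if they were independent.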




Interestingness: tech & biz




The process




Impact-reversed sequential activity patterns




Demographic + transactional combined pattern
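A demographic + transactional combined pattern joins attributes from two data sources before asking how well they jointly predict the impact. The sketch below assumes customer records keyed by a shared id, a demographic condition X, a transactional condition Y, and a debt label. It compares the confidence of X -> debt, Y -> debt, and X + Y -> debt, and reports a hypothetical gain: the confidence of the combined rule divided by the better single-source confidence. The attribute names, data, and gain measure are illustrative assumptions.

```python
import math


def join_customers(demographic, transactional):
    """Join per-customer demographic and transactional attributes on the customer id."""
    return [{"id": cid, **demographic[cid], **transactional[cid]}
            for cid in sorted(demographic) if cid in transactional]


def rule_confidence(records, condition):
    """Return (Prob(debt | condition), number of matching records)."""
    matched = [r for r in records if condition(r)]
    if not matched:
        return 0.0, 0
    return sum(1 for r in matched if r["debt"]) / len(matched), len(matched)


def combined_pattern(records, demo_cond, trans_cond):
    """Compare demographic X, transactional Y, and combined X + Y rules targeting debt."""
    conf_x, _ = rule_confidence(records, demo_cond)
    conf_y, _ = rule_confidence(records, trans_cond)
    conf_xy, n_xy = rule_confidence(records, lambda r: demo_cond(r) and trans_cond(r))
    best_part = max(conf_x, conf_y)
    # Hypothetical gain: combined confidence relative to the better single-source rule.
    gain = conf_xy / best_part if best_part > 0 else math.nan
    return {"conf_x": conf_x, "conf_y": conf_y, "conf_xy": conf_xy,
            "count_xy": n_xy, "gain": gain}


demographic = {  # hypothetical customer attributes
    1: {"age_group": "young", "income": "low"},
    2: {"age_group": "young", "income": "low"},
    3: {"age_group": "young", "income": "high"},
    4: {"age_group": "old", "income": "low"},
    5: {"age_group": "old", "income": "high"},
    6: {"age_group": "young", "income": "low"},
}
transactional = {  # hypothetical transaction summary; the debt label is stored here for brevity
    1: {"payment": "cash", "debt": True},
    2: {"payment": "cash", "debt": True},
    3: {"payment": "cash", "debt": False},
    4: {"payment": "cash", "debt": False},
    5: {"payment": "card", "debt": False},
    6: {"payment": "card", "debt": False},
}


def is_young_low_income(r):
    return r["age_group"] == "young" and r["income"] == "low"


def pays_cash(r):
    return r["payment"] == "cash"


records = join_customers(demographic, transactional)
print(combined_pattern(records, is_young_low_income, pays_cash))
```

Here neither source alone is very predictive (confidence 2/3 and 1/2), while the combined rule holds for both matching customers, a gain of 1.5 under the assumed measure.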



D3M References

Books:
  Cao, L., Yu, P.S., Zhang, C., Zhao, Y. Domain Driven Data Mining. Springer, 2009.
  Cao, L., Yu, P.S., Zhang, C., Zhang, H. (eds.) Data Mining for Business Applications. Springer, 2008.

Workshops:
  Domain-driven data mining 2008, joint with ICDM2008.
  Domain-driven data mining 2007, joint with SIGKDD2007.

Special issues:
  Domain-driven data mining. IEEE Trans. Knowledge and Data Engineering, 2009.
  Domain-driven, actionable knowledge discovery. IEEE Intelligent Systems, Department, 22(4): 78-89, 2007.

Some relevant papers:
  Cao, L., Zhao, Y., Zhang, H., Luo, D., Zhang, C. Flexible Frameworks for Actionable Knowledge Discovery. Submitted to IEEE Trans. Knowledge and Data Engineering.
  Cao, L., Zhang, H., Zhao, Y., Zhang, C. Combined Mining: Discovering More Informative Knowledge in e-Government Services. Submitted to ACM TKDD, 2008.
  Cao, L., Dai, R., Zhou, M. Metasynthesis, M-Space and M-Interaction for Open Complex Giant Systems. Technical report, 2008.
  Cao, L., Ou, Y. Market Microstructure Patterns Powering Trading and Surveillance Agents. Journal of Universal Computer Science, 2008 (to appear).
  Cao, L., He, T. Developing Actionable Trading Agents. Knowledge and Information Systems: An International Journal, 2008.
  Cao, L. Developing Actionable Trading Strategies. In: Intelligent Agents in the Evolution of WEB and Applications, Springer, 2008.




  Cao, L., Zhao, Y., Zhang, C. Mining Impact-Targeted Activity Patterns in Imbalanced Data. IEEE Trans. Knowledge and Data Engineering, 20(8): 1053-1066, 2008.
  Cao, L., Yu, P., Zhang, C., Zhao, Y., Williams, G. DDDM2007: Domain Driven Data Mining. ACM SIGKDD Explorations Newsletter, 9(2): 84-86, 2007.
  Cao, L., Zhang, C. Knowledge Actionability: Satisfying Technical and Business Interestingness. International Journal of Business Intelligence and Data Mining, 2(4): 496-514, 2007.
  Cao, L., Zhang, C. The Evolution of KDD: Towards Domain-Driven Data Mining. International Journal of Pattern Recognition and Artificial Intelligence, 21(4): 677-692, 2007.
  Cao, L. Domain-Driven Actionable Knowledge Discovery. IEEE Intelligent Systems, 22(4): 78-89, 2007.
  Cao, L., Zhang, C. Domain-driven data mining: A practical methodology. International Journal of Data Warehousing and Mining (IJDWM), IGI Global, 2(4): 49-65, 2006.



Thank you!

Longbing CAO
Faculty of Engineering and IT
University of Technology, Sydney, Australia
Tel: 61-2-9514 4477
Fax: 61-2-9514 1807
email: lbcao@it.uts.edu.au
Homepage: www-staff.it.uts.edu.au/~lbcao/
The Smart Lab: datamining.it.uts.edu.au