Knowledge Management Challenges in Knowledge Discovery Systems

Mykola Pechenizkiy, Seppo Puuronen
Department of Computer Science
University of Jyväskylä
Finland

Alexey Tsymbal
Department of Computer Science
Trinity College Dublin
Ireland

TAKMA’05 Copenhagen, Denmark August 22-26, 2005


Knowledge Management Challenges in Knowledge Discovery Systems
by Pechenizkiy, Tsymbal, Puuronen

Outline

Introduction

KDD

Selection of DM strategy for a problem at hand

Meta-learning

Our goal

To propose a knowledge-driven approach to enhance the selection of DM strategies in KDSs.

Need for KM

What are the challenges

KM processes with respect to the problem of DM strategy selection

Further research

Discussion


Knowledge discovery as a process

Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R., Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, 1997.


CRISP-DM

http://www.crisp-dm.org/


KDD Process: “Vertical Solutions”

Business Understanding
Data Understanding
Data Preparation
Data Exploration
Data Mining
Evaluation & Interpretation
Deployment

Experience accumulation

Reinartz, T. 1999, Focusing Solutions for Data Mining. LNAI 1623, Berlin Heidelberg.



The Search for Scientific Methods and Meta-Learning

Adequate scientific methods make induction easier with a smaller number of examples.

The choice of methods needs to be based on a higher-level induction or, in the context of machine learning, on meta-learning.

“knowledge concerning the most appropriate method for a given goal can be obtained by induction on the database of history of science, a collection of problems of different methods, different goals and different degrees of success” [Laudan]

Meta-learning can produce rules concerning the use of the alternative strategies, methodological knowledge, or correct predictions concerning the best ranking of strategies for a new task.


Dynamic Selection of DM Methods

Dynamic selection of DM methods in KDSs has been under active study.

2 contexts of dynamic selection:

Multi-classifier systems that apply different ensemble techniques (Dietterich, 1997). Their general idea is usually to select one classifier on a dynamic basis, taking into account its local performance (e.g. generalisation accuracy) in the instance space.

Multistrategy learning (Michalski) applies a strategy selection approach which takes into account the classification problem-related characteristics (meta-data).
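
As an illustration of the first context, dynamic selection by local performance might be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any cited system: the toy classifiers, the validation set and the function names (`select_classifier`, `nearest_indices`) are hypothetical, and local accuracy is estimated over the k nearest validation instances.

```python
import math

def nearest_indices(x, points, k):
    """Indices of the k points closest to x (Euclidean distance)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(x, points[i]))
    return order[:k]

def select_classifier(x, classifiers, val_X, val_y, k=3):
    """Pick the classifier with the best accuracy on the k validation
    instances nearest to x (ties go to the earlier classifier)."""
    neighbours = nearest_indices(x, val_X, k)

    def local_accuracy(clf):
        hits = sum(1 for i in neighbours if clf(val_X[i]) == val_y[i])
        return hits / len(neighbours)

    return max(classifiers, key=local_accuracy)

# Two toy base classifiers, each reliable in a different region (assumed).
def threshold_on_first(x):
    return 1 if x[0] > 0.5 else 0

def threshold_on_second(x):
    return 1 if x[1] > 0.5 else 0

# Hypothetical validation set: the label follows feature 0 on the left
# part of the space (x[0] < 1) and feature 1 on the right part.
val_X = [(0.2, 0.9), (0.3, 0.8), (0.7, 0.1), (0.8, 0.2),
         (1.2, 0.9), (1.3, 0.8), (1.7, 0.1), (1.8, 0.2)]
val_y = [0, 0, 1, 1, 1, 1, 0, 0]

base = [threshold_on_first, threshold_on_second]
left_choice = select_classifier((0.25, 0.85), base, val_X, val_y)
right_choice = select_classifier((1.75, 0.15), base, val_X, val_y)
```

With this toy data the selection differs by region: a different base classifier is chosen for the left query point than for the right one, which is the behaviour the slide describes.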


Selection of the most appropriate DM technique

Motivation

No Free Lunch theorem;

many empirical studies show that one learning strategy can perform significantly better than another strategy on a group of problems that are characterised by some properties (Kiang, 2003).

Problem

Selection is usually not straightforward.

Some knowledge is required for making a decision about the selection of appropriate techniques and the construction of a DM strategy for a problem at hand.

We distinguish 2 levels of knowledge:

the knowledge extracted from data that represents the problem to be mined by means of applying a DM technique;

the higher-level knowledge (from the KDS perspective) required for managing techniques’ selection, combination and application => meta-knowledge.


Meta-learning

or “learning to learn”

the effort to automatically induce dependencies between learning tasks and learning strategies.

based on the assumptions that it is possible

to evaluate and compare learning strategies,

to measure the benefits of early learning on subsequent learning,

to use such evaluations to reason about learning strategies,

to select useful ones and disregard the useless or misleading strategies (Schmidhuber et al., 1996).



in Meta-learning …

in the context of classifier ensembles, where only the data itself is used to make decisions about method selection,

rather good practical results are shown in experiments, supported by theoretical studies as well;

in dynamic integration of DM strategies for a data set at hand:

a multistrategy approach based on the ideas of constructive induction and conceptual clustering (Michalski, 1997)

several studies on automatic classifier selection via meta-learning (Kalousis, 2002)

No practical success!


Meta-Learning

Diagram labels: Collection of data sets; Collection of techniques; Performance criteria; Meta-learning space; Meta-model; Knowledge repository; A new data set; Suggested technique; Evaluation.
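
The flow from a new data set to a suggested technique can be illustrated with a small sketch. It rests on assumptions that are not in the slides: the three meta-features, the 1-nearest-neighbour meta-model, the class and function names, and the two repository entries (made-up values) are all hypothetical.

```python
import math
from collections import Counter

def meta_features(X, y):
    """Characterise a data set by three simple measures (an assumed choice):
    log10 of the instance count, number of features, class entropy in bits."""
    n = len(y)
    counts = Counter(y)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return (math.log10(n), float(len(X[0])), entropy)

class NearestDatasetMetaModel:
    """1-nearest-neighbour meta-model over (unscaled) meta-feature vectors."""

    def __init__(self):
        self.repository = []  # (meta-feature vector, best technique name)

    def add_experience(self, features, best_technique):
        self.repository.append((features, best_technique))

    def suggest(self, features):
        # Return the technique that was best on the most similar past data set.
        _, technique = min(self.repository,
                           key=lambda entry: math.dist(entry[0], features))
        return technique

model = NearestDatasetMetaModel()
# Made-up past experience, for illustration only.
model.add_experience((2.0, 4.0, 1.0), "naive_bayes")
model.add_experience((4.0, 50.0, 2.3), "knn")

# A new data set: 120 instances, 4 features, two balanced classes.
X_new = [[0.1, 0.2, 0.3, 0.4]] * 120
y_new = ["a"] * 60 + ["b"] * 60
suggestion = model.suggest(meta_features(X_new, y_new))
```

In a fuller system the suggestion would then be evaluated against the performance criteria and the outcome fed back into the repository; that loop is omitted here.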


Problems with Meta-Learning for DM Strategy Selection

Representativeness of meta-data samples

Meta-learning space is large

Computationally expensive to produce meta-data samples

Curse of dimensionality

Many possibly irrelevant features with respect to the collected/produced meta-data

Complexity of statistical measures

Why do we need to spend time to characterize the dataset if we can use this time to try different DM approaches and select the best one?
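
The alternative raised in the last question, simply trying every approach and keeping the best, might look like the sketch below. The two toy techniques, the interleaved fold assignment and all names are assumptions made for illustration.

```python
import math
from collections import Counter

def train_majority(X, y):
    """Baseline: always predict the most frequent training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label

def train_1nn(X, y):
    """1-nearest-neighbour classifier using Euclidean distance."""
    def predict(x):
        i = min(range(len(X)), key=lambda j: math.dist(x, X[j]))
        return y[i]
    return predict

def k_fold_accuracy(train, X, y, k=3):
    """Mean accuracy over k folds; fold f holds out instances with i % k == f."""
    scores = []
    for fold in range(k):
        test_idx = [i for i in range(len(y)) if i % k == fold]
        train_idx = [i for i in range(len(y)) if i % k != fold]
        model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(1 for i in test_idx if model(X[i]) == y[i])
        scores.append(hits / len(test_idx))
    return sum(scores) / k

def select_by_trial(techniques, X, y, k=3):
    """Evaluate every technique and return the name of the best one."""
    return max(techniques,
               key=lambda name: k_fold_accuracy(techniques[name], X, y, k))

techniques = {"majority": train_majority, "1nn": train_1nn}
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = [0, 0, 0, 1, 1, 1]
best = select_by_trial(techniques, X, y)
```

The cost of this brute-force search grows with the number of techniques times the number of folds, which is the trade-off the question points at.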


Our goal and focus: KM perspective

to propose a knowledge-driven approach to enhance the dynamic integration of DM strategies in knowledge discovery systems;

focus on KM aimed at organising a systematic process of knowledge capture and refinement over time.

We consider the basic knowledge management processes of

knowledge creation and identification,

representation, collection and organization,

sharing and integration,

adaptation and application

with respect to the introduced concept of meta-knowledge.


Introducing KM to DM Strategy Selection

Generally, the problem of knowledge capture, storage, and dissemination is similar to data and information management in ISs, and therefore some executives prefer to view KM as a natural extension to IS functions (Alavi and Leidner, 1999).

Zack (1999): the most practical way to define KM is to show on the existing IT infrastructure the involvement of:

(1) knowledge repositories,

(2) best-practices and lessons-learned systems,

(3) expert networks [these are DM experts], and

(4) communities of practice [these are end-users].

Diagram labels: Knowledge Creation & Acquisition; Knowledge Organization & Storage; Knowledge Distribution & Integration; Knowledge Adaptation & Application; Knowledge Evaluation, Validation and Refinement.


Transformations of data and knowledge concepts

Diagram (adapted from Spiegler, 2000), labels: Reality; Entities; Attributes; Data; Information; Knowledge; Wisdom; Capture, Transmission, Representation, Recording, Storage, Archiving, Deletion; Data Processing; Information Processing; Knowledge Processing; Knowing that and what; Knowing how and why; Knowing when, where and what for.

Knowledge is “justified belief that increases an entity’s capacity for effective action” (Nonaka, 1994).

A long history of epistemological debates, and discussion of knowledge from different perspectives in Polanyi (1962).


Different types of knowing

Knowing        Analysis        Context
that and what  Conceptual      concepts, relationships, i.e. declarative knowledge
how            Functional      hypothesis, i.e. procedural knowledge
where          Spatial         data set characterization
when           Temporal        temporal context
why            Causal          higher-level abstraction
who            Organizational  integration, sharing
how much       Economical      benefits, risks, resources
what for       Strategic       business DM goals, domain knowledge



Knowledge distribution and knowledge integration

4 potential sources of knowledge that have to be integrated in the repository of a KDS:

(1) knowledge from an expert in data-mining, knowledge discovery, statistics and related fields;

(2) knowledge from a data-mining practitioner;

(3) knowledge from laboratory experiments on synthetic data sets; and, finally,

(4) knowledge from field experiments on real-world problems.

Besides this, research and business communities, and similar KDSs themselves, can organize different trusted networks, where participants are motivated to share their knowledge.


Knowledge Repository Lifecycle (1 of 2)

Once the repository is created it tends to grow, and at some point it naturally begins to collapse under its own weight, requiring major reorganization.

It needs to be continuously updated:

some content needs to be deleted (if misleading), deactivated or archived (if it is potentially useful);

if similar contributions are combined, generalized and restructured, the content may become less fragmented and redundant.

The process of filtering knowledge claims into accepted or suppressed is important:

when plenty of claims are produced automatically they need to be filtered automatically.
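
One way automatic filtering of claims could work is sketched below. The slides do not specify a filtering rule, so the success/failure counters, the thresholds and the names `KnowledgeClaim`, `filter_claim` and `filter_repository` are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class KnowledgeClaim:
    statement: str
    successes: int  # times following the claim helped (assumed bookkeeping)
    failures: int   # times following the claim misled

def filter_claim(claim, min_trials=5, accept_at=0.7, delete_below=0.3):
    """Return 'accepted', 'archived' or 'deleted' for one claim.
    The thresholds are illustrative assumptions, not values from the slides."""
    trials = claim.successes + claim.failures
    if trials < min_trials:
        return "archived"   # too little evidence, but potentially useful
    rate = claim.successes / trials
    if rate >= accept_at:
        return "accepted"
    if rate < delete_below:
        return "deleted"    # misleading
    return "archived"

def filter_repository(claims, **thresholds):
    """Sort all claims into the three outcome groups."""
    result = {"accepted": [], "archived": [], "deleted": []}
    for claim in claims:
        result[filter_claim(claim, **thresholds)].append(claim.statement)
    return result

claims = [
    KnowledgeClaim("technique A suits small noisy data sets", 9, 1),
    KnowledgeClaim("technique B suits small noisy data sets", 1, 9),
    KnowledgeClaim("technique C suits sparse data sets", 1, 1),
    KnowledgeClaim("technique D suits sparse data sets", 5, 5),
]
groups = filter_repository(claims)
```

Archived claims stay available for later re-testing, while deleted ones are dropped from the repository.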


Knowledge Repository Lifecycle (2 of 2)

“knowing when” and “knowing where” contexts:

when the environment changes, all of the general rules that do not specify their context could become invalid;

some knowledge should exist that would guide an organization to change the repository when the environment calls for it.

Some knowledge claims are naturally in constant competition with the other claims.

Disagreements within the knowledge repository need to be resolved by means of generalization of some parts and contextualization of the others.

In order to increase the quality and validity of knowledge, it needs to be continually tested, improved or removed.

Some basic principles of triggers can be introduced.
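
Resolving a disagreement by contextualization can be illustrated with a toy repository in which each claim carries an explicit context and the most specific applicable claim wins. The representation, the "most specific wins" rule and the example claims are assumptions for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class ContextualClaim:
    technique: str
    # Context conditions under which the claim holds; empty = general rule.
    context: dict = field(default_factory=dict)

def applies(claim, situation):
    """A claim applies when every one of its context conditions is met."""
    return all(situation.get(key) == value
               for key, value in claim.context.items())

def recommend(claims, situation):
    """Among the applicable claims, prefer the one with the most conditions."""
    candidates = [c for c in claims if applies(c, situation)]
    if not candidates:
        return None
    return max(candidates, key=lambda c: len(c.context)).technique

# Two competing claims (hypothetical): a general rule and a contextualized one.
claims = [
    ContextualClaim("naive_bayes"),
    ContextualClaim("knn", {"dimensionality": "low"}),
]
low_dim = recommend(claims, {"dimensionality": "low"})
high_dim = recommend(claims, {"dimensionality": "high"})
```

The general rule remains the fallback outside the stated context, so the two claims no longer contradict each other.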


Knowledge validity and knowledge quality

The contexts “knowing when” and “knowing where” can be discovered before they appear in a real situation.

Active learning

Zooming in and zooming out procedures

Search for a balance between generality, compactness, interpretability and understandability on the one hand, and sensitiveness to the context, exactness, precision and adequacy of (meta-)knowledge on the other.

Context conditions can be important for knowledge quality estimation.

The quality of knowledge can be estimated by its ability to help a KDS produce solutions faster and more effectively.

Knowledge claims have both a degree of utility and a degree of satisfaction.

To determine the relative quality of a validated knowledge claim, evaluation criteria should be defined:

complexity, usefulness, and predictive power are well formalised and easy to estimate;

understandability, reliability of source, and explanatory power are rather subjective and therefore inaccurate.


Limitations

The goal of KM here is to make more effective and efficient use of available DM techniques.

The most important issues in knowledge management:

(1) executive/strategic management,

(2) operational management:

the identification of available knowledge,

seeking ways to capture it in a KM process,

and analysing the ability to design a KM (sub)system including its tools and applications,

(3) costs, benefits, and risks management, and

(4) standards in the KM technology and communication.


Further Research

Diagram labels: Knowledge Creation & Acquisition; Knowledge Organization & Storage; Knowledge Distribution & Integration; Knowledge Adaptation & Application; Knowledge Evaluation, Validation and Refinement.

Implementation of the presented knowledge-driven framework for a KDS that contains a limited number of DM techniques of a certain type:

Feature extraction techniques and classification techniques

Evaluation of the framework in practice for real-world problems in a distributed environment


Thank You!

Contact Info:

Mykola Pechenizkiy

Department of Computer Science and Information Systems,

University of Jyväskylä, FINLAND

E-mail: mpechen@cs.jyu.fi

Tel.: +358 14 2602472 Fax: +358 14 260 3011

http://www.cs.jyu.fi/~mpechen


Feedback is very welcome:


Questions


Suggestions


Guidelines


Collaboration