Text Mining for Effective Research Management based on Semantic Web Technology

Nov 5, 2013

Krittawaya Thongkoo

Second Year Student, College of Arts, Media and Technology,
Chiang Mai University, Thailand


My Background


I completed a bachelor's degree in Computer Science and a master's
degree in Software Engineering.


After graduating, I have worked for 4 years as a lecturer in the Department
of Modern Management and Information Technology, College of Arts, Media and
Technology (CAMT), Chiang Mai University, Thailand.


I teach Web Programming (using PHP), Computer Programming (using Excel
and VBA), and Structural Analysis and Design.


I am studying for a Ph.D. in the Department of Knowledge Management while
working at CAMT.


Contents


Research background


Research Problem


Hypothesis


Problem definition


Literature Review


Theory


Purpose of the study


Ideas and Solutions


Educational/Applicable advantages


Research Methods


Population and Sample Data


Data Collection Plan


Research Plan


Progress Diagram

Research background



National research university policy of the Office of the Higher Education
Commission



To improve the quality of the educational system.


To produce graduates who meet the requirements of the market.


To develop Thailand into a regional center for education.


To develop Thai research universities.


Loan fund for education.


To solve the problem of unemployment among graduates.



CAMT research management



The CAMT research administration center manages the research management data (research
title, research team, methodology, and keywords), which researchers enter into the
database themselves.

Research Problem


Chiang Mai University will change its status from a teaching university to a research university.


Information about research and researchers is important for decision making and planning.


According to Thanapun Kulachan's research (Developing Ontology on Commitment of
Universities for Research Management), keywords in the database were grouped using OWL
technology, focusing only on clustering.


Diagram labels: Individual Uni., Strategic Consortia, Capability Building, Regional Hub,
Research Universities, World Class Universities, Global Level

QS World University Rankings 2010

http://www.topuniversities.com/university-rankings

Quacquarelli Symonds (QS) is a company specializing in education and study abroad.

Asian University Rankings 2010

Rank 2010   Rank 2009   School Name
28          30=         Mahidol University
44          35          Chulalongkorn University
79          81          Chiang Mai University
91          85          Thammasat University
101=        109         Prince of Songkla University
122         113         Khon Kaen University
126         108         Kasetsart University

http://www.topuniversities.com/university-rankings/asian-university-rankings/overall

Hypothesis



The Research Management System will automatically update the research
database.



The CAMT Research Management Center will work systematically.

Diagram: Research Paper -> Research Database (input information by researcher)

Diagram: Research File -> Research Management System -> Research Database (automatically extract information)

Problem definition


Definitions


“CAMT research management problem”



Problems with CAMT research management:


Upon completion of a research project, researchers input all relevant information into
the database of the research administration center. Errors, e.g. typos, can occur during
the input process.


Some researchers may not enter their project information into the CAMT database, yet
their publications may appear in other online databases such as SCOPUS.


The university's research information does not reflect the real situation.


Proposed Solutions:


A system capable of automatically extracting information related to the research work
of CAMT researchers.


A CAMT research database capable of automatically updating research information.





Literature Review


Developing Ontology on Commitment of Universities for Research Management
(Thanapun Kulachan, 2008)













The research ontology was designed from the analysis of collected information. It was found that the
relevant entities for the research ontology consist of: Researcher, Application, Subject, and
Methodology.





The Researcher class describes key researchers in a university.



The Application class describes applications implemented through researchers' research.



The Subject class describes the subjects or research areas of researchers.



The Methodology class describes the technology, techniques, or tools that researchers used in their research.
Further subclasses can be added depending on the research characteristics of each university.
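
The four classes above can be sketched as a small set of triples. This is a minimal illustration only, not the ontology from the cited work: the `camt:` prefix, the `Triple` type, the helper names, and the `TextMining` subclass are assumptions made for the example, and the output is Turtle-style text with prefix declarations omitted.

```python
from dataclasses import dataclass

# Hypothetical namespace prefix; the slides do not give one.
PREFIX = "camt:"

# The four top-level classes named on the slide.
ONTOLOGY_CLASSES = ("Researcher", "Application", "Subject", "Methodology")


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


def class_triples():
    """Declare each top-level class as an owl:Class."""
    return [Triple(PREFIX + name, "rdf:type", "owl:Class") for name in ONTOLOGY_CLASSES]


def add_subclass(triples, child, parent):
    """Return a new list with a university-specific subclass added under a known class."""
    if parent not in ONTOLOGY_CLASSES:
        raise ValueError(f"unknown parent class: {parent}")
    return triples + [
        Triple(PREFIX + child, "rdf:type", "owl:Class"),
        Triple(PREFIX + child, "rdfs:subClassOf", PREFIX + parent),
    ]


def to_turtle(triples):
    """Render triples as Turtle-style lines (prefix declarations omitted)."""
    return "\n".join(f"{t.subject} {t.predicate} {t.obj} ." for t in triples)


# Illustrative subclass; not taken from the source.
triples = add_subclass(class_triples(), "TextMining", "Methodology")
print(to_turtle(triples))
```

Adding a subclass never changes the four top-level declarations, which mirrors the idea that each university extends the same base classes.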






Searching with a research ontology gives more meaningful results than searching a database. As
mentioned before, when searching for research information, the system looks not only for the
keyword itself but also for the meaning of that word.
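
The difference between keyword search and ontology-based search can be sketched as follows. This is a toy illustration under stated assumptions: `RELATED_TERMS` stands in for relations that a real ontology would provide, and the records, project names, and function names are all hypothetical.

```python
# Hypothetical mini-ontology: each concept maps to terms with a related meaning.
RELATED_TERMS = {
    "text mining": {"text data mining", "knowledge discovery in text"},
    "ontology": {"owl", "semantic web"},
}

# Hypothetical research records: project title -> keywords.
RECORDS = {
    "Project A": {"text data mining", "clustering"},
    "Project B": {"owl", "research management"},
    "Project C": {"text mining"},
}


def keyword_search(query):
    """Return projects whose keywords contain the query term exactly."""
    q = query.lower()
    return sorted(title for title, kws in RECORDS.items() if q in kws)


def ontology_search(query):
    """Return projects matching the query term or any related term."""
    q = query.lower()
    expanded = {q} | RELATED_TERMS.get(q, set())
    return sorted(title for title, kws in RECORDS.items() if kws & expanded)


print(keyword_search("text mining"))
print(ontology_search("text mining"))
```

With this data the plain keyword search finds only Project C, while the expanded search also finds Project A through the related term.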



The result of this research is a framework for developing the research ontology on commitment of
universities that share the same properties.



However, the information-collection phase of this research took a long time because the information
comes from many sources and many people. Information from online databases (such as Scopus)
should therefore be pulled regularly and automatically in order to update the database portion of the
Research Knowledge Management System (RKMS).
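
One way to picture the automatic update is a merge step that adds externally fetched records only when they are not already stored. The sketch below covers only that merge; how records are fetched from an online database is not specified in the slides and is left out. The record format (a dict with a `title` key) and the function names are assumptions.

```python
def normalize_title(title):
    """Lowercase, drop punctuation, and collapse whitespace so near-identical titles match."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in title.lower())
    return " ".join(cleaned.split())


def merge_records(database, fetched):
    """Add fetched records that are not yet in the database; return the titles added."""
    added = []
    for record in fetched:
        key = normalize_title(record["title"])
        if key not in database:
            database[key] = record
            added.append(record["title"])
    return added


# Hypothetical existing entry and hypothetical fetched records.
database = {
    normalize_title("Ontology for Research Management"): {
        "title": "Ontology for Research Management"
    }
}
fetched = [
    {"title": "Ontology for  research management."},  # same work, typed differently
    {"title": "Fuzzy Clustering of Documents"},
]
added = merge_records(database, fetched)
print(added)
```

Matching on a normalized title also absorbs small input differences such as extra spaces or trailing punctuation, although real typos would need fuzzier matching.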




Another problem is in identifying keywords, which must be neither too broad nor too narrow.
Overall, the automated extraction of desirable keywords seems infeasible. It also seems infeasible for
a single person to do a good job of extracting keywords across disciplines.



Research information is dynamic. Many researchers change their research interests and topics
from time to time due to many factors, for example their funding support, their research resources,
and community needs. Research information should therefore be updated automatically using text
mining in future work.



Mining fuzzy frequent itemsets for hierarchical document clustering
(Chun-Ling Chen, 2010)



This paper presents an effective Fuzzy Frequent Itemset-Based Hierarchical Clustering approach,
which uses a fuzzy association rule mining algorithm to improve the clustering accuracy of the
Frequent Itemset-Based Hierarchical Clustering method.



First, the key terms are extracted from the document set, and each document is pre-processed into the
designated representation for the subsequent mining process.



Then, a fuzzy association rule mining algorithm for text is employed to discover a set of
highly-related fuzzy frequent itemsets, which contain key terms to be regarded as the labels of the
candidate clusters.



Finally, the documents are clustered into a hierarchical cluster tree by referring to these
candidate clusters.
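
The three steps can be illustrated with a deliberately simplified sketch. It is not the algorithm from the cited paper: it uses crisp term sets and plain support counts instead of fuzzy memberships, and it produces flat clusters rather than a hierarchical tree. The documents, threshold, and function names are assumptions.

```python
from itertools import combinations


def frequent_itemsets(docs, min_support, max_size=2):
    """Return term sets contained in at least a min_support fraction of documents."""
    n = len(docs)
    terms = sorted(set().union(*docs.values()))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(terms, size):
            itemset = frozenset(combo)
            support = sum(1 for d in docs.values() if itemset <= d) / n
            if support >= min_support:
                result[itemset] = support
    return result


def assign_clusters(docs, itemsets):
    """Assign each document to the largest, then best supported, itemset it contains."""
    clusters = {}
    for name, doc_terms in docs.items():
        candidates = [s for s in itemsets if s <= doc_terms]
        if not candidates:
            continue
        best = max(candidates, key=lambda s: (len(s), itemsets[s], sorted(s)))
        clusters.setdefault(best, []).append(name)
    return clusters


# Hypothetical documents, already reduced to their key terms.
docs = {
    "d1": {"ontology", "owl", "research"},
    "d2": {"ontology", "owl"},
    "d3": {"clustering", "fuzzy", "research"},
    "d4": {"clustering", "fuzzy"},
}
itemsets = frequent_itemsets(docs, min_support=0.5)
clusters = assign_clusters(docs, itemsets)
for label, members in clusters.items():
    print(sorted(label), members)
```

Each frequent itemset acts as a candidate cluster label, which is the part of the approach this sketch keeps. Replacing the subset test with fuzzy membership degrees would be the next step toward the fuzzy variant.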


Theory


Text Mining


Text Mining, also known as Intelligent Text Analysis, Text Data Mining or Knowledge
Discovery in Text (Feldman and Dagan, 1995), refers generally to the process of
extracting interesting and non-trivial information and knowledge from unstructured text
collections.
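
As a very small example of extracting information from unstructured text, the sketch below ranks candidate key terms by frequency. It is only one simple baseline, not a method proposed in the slides; the stopword list, the sample sentence, and the function name are assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "of", "and", "a", "for", "in", "to", "is", "on", "with"}


def key_terms(text, top_n=3):
    """Return the top_n most frequent non-stopword terms, ties broken alphabetically."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


sample = (
    "Ontology for research management: the ontology describes research, "
    "researchers and research methodology."
)
print(key_terms(sample))
```

Raw frequency tends to favor broad terms, which is one reason keyword selection is treated as an open problem in this proposal.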



Semantic Web Technology


The Semantic Web (R. Guha and Rob McCool, 2003) is an extension of the current Web
in which information is given well-defined meaning, enabling programs to understand it.



An ontology is a specification of a conceptualization (Kulachan, 2008). An ontology not
only makes information search more efficient but also lets users search by both
words and their semantics.




Social Network Analysis


Social Network Analysis is a method used to study the patterns of interaction among the
members of a network, using the communication and connectedness between persons,
groups, and networks to map and explain those connections.
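
For research management, one natural network is co-authorship. The sketch below builds such a network and computes degree centrality, the fraction of other members each person is directly connected to. The choice of co-authorship and of degree centrality, the author labels, and the function names are assumptions for illustration.

```python
from itertools import combinations


def coauthor_network(papers):
    """Build an undirected graph: author -> set of co-authors."""
    graph = {}
    for authors in papers:
        unique = sorted(set(authors))
        for name in unique:
            graph.setdefault(name, set())  # keep solo authors in the graph
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Share of the other nodes that each node is directly connected to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


# Hypothetical author lists, one per paper.
papers = [["A", "B", "C"], ["A", "D"], ["E"]]
graph = coauthor_network(papers)
centrality = degree_centrality(graph)
for author in sorted(centrality):
    print(author, centrality[author])
```

A high score marks a researcher who collaborates widely, which could help when forming new research groups.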







Purpose of the study


To study and apply Semantic Web Technology as a tool of Text Mining for effective
research management.



To study and apply Social Network Analysis for effective research management.



To construct a prototype of KMS for university research management with text
mining functions in order to automate the research data entry.

Ideas and Solutions

Diagram: Source (text, pdf, doc) -> Text Mining Process -> Result from Text Mining ->
Ontology Analysis Process (with Previous Ontology) -> New Ontology

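
The last step of the diagram, combining the previous ontology with newly mined terms, can be sketched as a merge that also reports what is new. The representation (class name mapped to a set of terms), the sample terms, and the function name are assumptions; the slides do not specify how the comparison is done.

```python
def compare_ontologies(previous, extracted):
    """Merge extracted terms into the previous ontology.

    Both arguments map a class name to a set of terms.
    Returns (merged ontology, terms that were not in the previous ontology).
    """
    merged = {cls: set(terms) for cls, terms in previous.items()}
    new_terms = {}
    for cls, terms in extracted.items():
        fresh = terms - merged.get(cls, set())
        if fresh:
            new_terms[cls] = fresh
        merged.setdefault(cls, set()).update(terms)
    return merged, new_terms


# Hypothetical previous ontology and hypothetical text mining result.
previous = {"Methodology": {"owl"}, "Subject": {"knowledge management"}}
extracted = {"Methodology": {"owl", "text mining"}, "Researcher": {"researcher x"}}

merged, new_terms = compare_ontologies(previous, extracted)
print(new_terms)
```

Reporting the new terms separately leaves room for a person to review them before they become part of the ontology.
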

Educational/Applicable advantages

(Novelty/academic contribution)


Expected output


A KMS capable of automatically extracting research information
belonging to CAMT researchers.


A KMS capable of automatically updating the extracted information.


A KMS with a Decision Support System (DSS) for research intelligence.



Relevance to beneficiaries


The efficiency of research management is enhanced.


Executives can foresee research trends.


The formation of new research groups to accommodate new research
directions is greatly facilitated by the DSS of the KMS.


Research Methods


Create prototype for text mining.

Collect test data for semantic web and text mining from CAMT.

Develop ontology maintenance facilities.

Develop Social Network Analysis.

Demonstration test.



Population and Sample Data


This research will use research information from the CAMT research management
center.

Data Collection Plan


Data and information in this research will be collected from many reliable sources,
such as:


Researchers


Internal faculty research databases


CAMT internal documents


National Research University documents and website


Commercial online databases: SCOPUS

Research Plan

Progress Diagram

Phase 1: Collect file system content

Phase 2: Automatically analyze Ontology from file system content

Phase 3: Compare Ontology to create new one & Develop SNA

- Find the best solutions to select real keywords.
- Apply mathematics to solve the problem, e.g. Fuzzy Frequent Itemset-Based Hierarchical
  Clustering.

Thank you for your attention.
