Data Centric Knowledge Management System Using Post-Clustering Technique


Available ONLINE
www.vsrdjournals.com





VSRD-IJCSIT, Vol. 2 (4), 2012, 1-5



____________________________

1,3 Associate Professor, 2 Professor, 1,2,3 Department of Information Technology, Sree Vidyanikethan Engineering College, Tirupathi, Andhra Pradesh, INDIA.
*Correspondence: srinu_asadi@yahoo.com

RESEARCH ARTICLE



Data Centric Knowledge Management System Using Post-Clustering Technique

1 Asadi Srinivasulu*, 2 Ch.D.V. Subba Rao and 3 M. Sreedevi

ABSTRACT

The purpose of the Data Centric Knowledge Management System (DCKMS) is to centralize knowledge generated by employees working within and across functional areas, and to organize that knowledge so that it can be easily accessed, searched, browsed and navigated. It is a one-stop shop for finding solutions to problems. It provides a facility for employees to register themselves as ‘experts’ as well as to search for other ‘experts’ in case of any problem or requirement in their project. The system design is modularized into various categories. The system has an enriched UI so that a novice user does not face operational difficulties. The system concentrates mainly on the various reports requested by users as well as by higher-level staff, with export-to-Excel options.

This paper addresses the expectations, organizational implications, and information processing requirements of the emerging knowledge management paradigm. A brief discussion of the enablement of the individual through the widespread availability of computer and communication facilities is followed by a description of the structural evolution of organizations and the architecture of a computer-based knowledge management system. The author discusses two trends that are driven by the treatment of information and knowledge as a commodity: increased concern for the management and exploitation of knowledge within organizations, and the creation of an organizational environment that facilitates the acquisition, sharing and application of knowledge.

Keywords : Data, Data-Centric, Data Mart, Data Portal, Data Warehouse, Enabled Individual, Information, Information-Centric, Information Management, Knowledge, Knowledge Management, Ontology, Organizational Structure, Clustering, Data Mining, Fuzzy C-Means Clustering Algorithm, K-Means Clustering Algorithm.

Asadi Srinivasulu et al / VSRD International Journal of CS & IT Vol. 2 (4), 2012

1. INTRODUCTION

The Data Centric Knowledge Management System is a web-based application that allows employees of a company to share their knowledge with others in the company. It also allows them to search for knowledge assets when in need. It provides a facility for employees to register themselves as ‘experts’ as well as to search for other ‘experts’ in case of any problem or requirement in their project. It is a one-stop shop for finding solutions to problems.

As information technology begins to permeate all aspects of life and the economy turns decidedly information-centric, wealth is increasingly defined in terms of information-related products and the availability of knowledge. Under these conditions employment, whether self-employment or organizational employment, is becoming singularly focused on the skills and capabilities of the individual. In other words, knowledge has become a commodity that has value far in excess of the manufactured products that represented the yardstick of wealth during the industrial age.

How this new form of human wealth should be effectively utilized and nurtured in commercial and government organizations has in recent times become a major preoccupation of management. Two parallel and related trends have emerged. The first trend is related to the management and exploitation of knowledge. The question being asked is: How can we capture and utilize the potentially available knowledge for the benefit of the organization? The phrase “…potentially available” is appropriate, because much of the knowledge is hidden in an overwhelming volume of computer-based data. What is not commonly understood is that the overwhelming nature of the stored data is due to current processing methods rather than volume.

These processing methods have to rely largely on manual work because only the human user can provide the necessary context for interpreting the computer-stored data into information and knowledge. If it were possible to capture information (i.e., data with relationships), rather than data, at the point of entry into the computer, then there would be sufficient context for computer software to process the information automatically into knowledge.

This is not just a desirable


2. RELATED WORK

The main purpose of the functional requirements within the requirements specification document is to define all the activities or operations that take place in the system. These are derived through interactions with the users of the system. Since the Requirements Specification is a comprehensive document and contains a lot of data, it has been broken down into different chapters in this report. The depiction of the design of the system in UML is presented in a separate chapter. The Data Dictionary is presented in the Appendix. A more detailed discussion is presented in the part of this report that covers the Analysis & Design of the system. The general functional requirements arrived at after interaction with the users are listed below.

• The Administrator can add a new employee, delete an existing employee, and view all existing users of the system.
• The Administrator can create and delete user logins for different employees.
• The Administrator can view different reports (My Submission report, Ratings reports, document status report, etc.).
• A K-User/K-Team Member/Reviewer can search for a document based on his criteria (author, technology, etc.).
• A K-User/K-Team Member/Reviewer can download a document.
• A K-User/K-Team Member/Reviewer can rate a document.
• A K-User/K-Team Member/Reviewer can submit a document.
• A K-User/K-Team Member/Reviewer can register as an expert.
• A K-User/K-Team Member/Reviewer can search for an expert.
• A K-Team Member/Reviewer can evaluate the above documents for initial screening.
• A K-Team Member can manage the reviewers list.
• A K-Team Member can assign a document to a particular reviewer.
• A Reviewer can view the list of documents forwarded to him.
• A Reviewer can publish or reject a document.
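The role-based requirements listed above can be pictured as a lookup from role to permitted actions. The following is only a minimal sketch under stated assumptions: the `Action` names, the role keys, and the `is_allowed` helper are hypothetical and do not come from the implemented system, which the paper describes only at the requirements level.

```python
from enum import Enum, auto


class Action(Enum):
    """Hypothetical action names derived from the requirement list."""
    MANAGE_EMPLOYEES = auto()
    MANAGE_LOGINS = auto()
    VIEW_REPORTS = auto()
    SEARCH_DOCUMENT = auto()
    DOWNLOAD_DOCUMENT = auto()
    RATE_DOCUMENT = auto()
    SUBMIT_DOCUMENT = auto()
    REGISTER_AS_EXPERT = auto()
    SEARCH_EXPERT = auto()
    SCREEN_DOCUMENT = auto()
    MANAGE_REVIEWERS = auto()
    ASSIGN_REVIEWER = auto()
    VIEW_ASSIGNED = auto()
    PUBLISH_OR_REJECT = auto()


# Actions shared by K-Users, K-Team Members and Reviewers.
COMMON = frozenset({
    Action.SEARCH_DOCUMENT, Action.DOWNLOAD_DOCUMENT, Action.RATE_DOCUMENT,
    Action.SUBMIT_DOCUMENT, Action.REGISTER_AS_EXPERT, Action.SEARCH_EXPERT,
})

# Role keys are assumptions made for this sketch.
PERMISSIONS = {
    "administrator": frozenset({
        Action.MANAGE_EMPLOYEES, Action.MANAGE_LOGINS, Action.VIEW_REPORTS,
    }),
    "k_user": COMMON,
    "k_team_member": COMMON | {
        Action.SCREEN_DOCUMENT, Action.MANAGE_REVIEWERS, Action.ASSIGN_REVIEWER,
    },
    "reviewer": COMMON | {
        Action.SCREEN_DOCUMENT, Action.VIEW_ASSIGNED, Action.PUBLISH_OR_REJECT,
    },
}


def is_allowed(role, action):
    """Return True if the given role may perform the action."""
    return action in PERMISSIONS.get(role, frozenset())
```

For example, `is_allowed("reviewer", Action.PUBLISH_OR_REJECT)` holds while the same check for `"k_user"` does not. Keeping the mapping in one table is one way to make the access rules easy to audit.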


Fig. 1 : Context Level Diagram

3. EXISTING ALGORITHM

In the existing system, the company maintains all knowledge-based documents in a separate system that is accessible to all employees through the LAN; employees can post new documents to it and access earlier documents. Searching for related documents based on author, technology, etc. is a time-consuming process. Managing the documents by category and restricting access to them based on user type becomes complicated. The existing system does not prevent unnecessary documents from being posted.

DRAWBACKS:

• Difficulty in maintaining security levels for the documents.
• Difficulty in browsing, navigating and searching for a required document.
• Difficulty in giving ratings for the documents.
• Information kept in this manner is subject to damage.
• Difficulty in preventing employees from updating the documents.
• Difficulty in generating different reports.

4. PROPOSED SYSTEM

The proposed system is fully computerized and addresses the drawbacks of the existing system. It allows different employees of the company to upload their knowledge documents into the system, where they are verified by next-level users to filter out unnecessary documents. It also allows employees to search for knowledge assets easily when in need. It provides a facility for employees to register themselves as ‘experts’ as well as to search for other ‘experts’ in case of any problem or requirement in their project. It also provides a facility for the evaluator to rate the documents posted by the employees.

ADVANTAGES:

• It provides a facility to share knowledge documents across the company.
• It allows employees to upload and download documents from their own systems.
• It makes browsing, navigating and searching for required documents easy.
• It provides a facility to prevent unnecessary documents from being posted.
• It provides a flexible way of generating different reports.
• With the new approach, information can be accessed from anywhere with a mouse click. This saves users a lot of time and gives them up-to-date information.
• The centralized database helps in avoiding conflicts.
• It provides a rich user interface (“look and feel”) so that users can access information with the least effort.
• It allows documents to be rated at different levels.
• It allows documents to be published or rejected.
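The upload, verification, rating and publish-or-reject flow described for the proposed system can be illustrated as a small state machine. This is a sketch under assumptions, not the system's actual design: the status names, the allowed transitions (including rejection at screening), and the 1 to 5 rating scale are all hypothetical.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical document states for the review workflow."""
    SUBMITTED = "submitted"
    SCREENED = "screened"    # passed initial screening by a K-Team Member
    ASSIGNED = "assigned"    # forwarded to a particular reviewer
    PUBLISHED = "published"
    REJECTED = "rejected"


# Assumed transitions; the paper does not spell these out.
TRANSITIONS = {
    Status.SUBMITTED: {Status.SCREENED, Status.REJECTED},
    Status.SCREENED: {Status.ASSIGNED},
    Status.ASSIGNED: {Status.PUBLISHED, Status.REJECTED},
    Status.PUBLISHED: set(),
    Status.REJECTED: set(),
}


class Document:
    def __init__(self, title, author):
        self.title = title
        self.author = author
        self.status = Status.SUBMITTED
        self.ratings = []

    def move_to(self, new_status):
        """Advance the document, refusing transitions that are not allowed."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(
                f"cannot go from {self.status.value} to {new_status.value}"
            )
        self.status = new_status

    def rate(self, score):
        """Record a rating on an assumed 1-5 scale."""
        if not 1 <= score <= 5:
            raise ValueError("score must be between 1 and 5")
        self.ratings.append(score)

    def average_rating(self):
        """Mean of all ratings, or None if the document is unrated."""
        if not self.ratings:
            return None
        return sum(self.ratings) / len(self.ratings)
```

A document created with `Document("Design notes", "employee-1")` starts as `SUBMITTED` and can only reach `PUBLISHED` by passing through screening and assignment, which mirrors the idea that next-level users filter documents before they become visible.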

4.1. K-MEANS ALGORITHM

Step 1) Take the first K feature vectors as the initial centers.
Step 2) Assign each sample vector to the cluster whose center is nearest (minimum-distance assignment principle).
Step 3) Compute the new average of each cluster as its new center.
Step 4) If any center has changed, go to Step 2; otherwise terminate.
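The four steps can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `kmeans`, the Euclidean distance, the `max_iter` safety cap, and the choice to leave an empty cluster's center unchanged are assumptions added here.

```python
import math


def kmeans(samples, k, max_iter=100):
    """Cluster `samples` (a list of equal-length numeric tuples) into k groups.

    Returns (centers, labels), where labels[i] is the cluster index of samples[i].
    """
    if not 1 <= k <= len(samples):
        raise ValueError("k must be between 1 and the number of samples")

    # Step 1: take the first K feature vectors as the initial centers.
    centers = [tuple(s) for s in samples[:k]]
    labels = []

    for _ in range(max_iter):  # max_iter is a safety cap (assumption)
        # Step 2: assign each sample to the nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(s, centers[j]))
            for s in samples
        ]

        # Step 3: recompute each center as the average of its members.
        new_centers = []
        for j in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[j])

        # Step 4: stop when no center has changed, otherwise repeat.
        if new_centers == centers:
            break
        centers = new_centers

    return centers, labels
```

Because Step 1 uses the first K vectors, the result depends on the order of the input, so shuffling the samples beforehand may give different clusters.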

4.2. K-MEANS

Fig. 5 : Applying Clustering Technique Similarity Weight and Filter Method

Fig. 6 : Results Of Clustering Showing Groups Divided Into Clusters

Fig. 7 : Initialization and Input

Fig. 8 : Final EMST Edges Path

Fig. 1 : Graph for K-Means

K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. It is a popular clustering method that uses prototypes (centroids) to represent clusters by minimizing within-cluster errors. The main idea is to define k centroids, one for each cluster.

These centroids should be placed carefully, because different initial locations lead to different results. The next step is to take each point belonging to a given data set and associate it with the nearest centroid. The centroid of each cluster is then recomputed as the mean of the points assigned to it. After we have these k new centroids, a new binding has to be done between the same data set points and the nearest new centroid, and the two steps are repeated until the centroids no longer move. Finally, this algorithm aims at minimizing an objective function.

The objective function:
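Assuming the standard squared-error criterion that k-means minimizes, the objective can be written as below. The symbols are assumptions of this sketch: $k$ is the number of clusters, $S_j$ is the set of points currently assigned to cluster $j$, $x_i$ is a data point, and $c_j$ is the centroid of cluster $j$.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in S_j} \left\lVert x_i - c_j \right\rVert^{2},
\qquad
c_j = \frac{1}{\lvert S_j \rvert} \sum_{x_i \in S_j} x_i
```

Each assignment step and each centroid update can only lower $J$ or leave it unchanged, which is why the iteration settles, although possibly at a local rather than a global minimum.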



We apply the above algorithm in our project by taking input attributes such as the number of assignments submitted, the number of tasks done successfully, and the number of face-to-face interactions among team members. Applying the algorithm divides the groups into k clusters. The groups in each cluster show similar behavior, which is why they are grouped into the same cluster. This makes it easier for the facilitator to give feedback, since feedback can now be given to an entire cluster instead of to each group individually.
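Once each group has a cluster label, per-cluster feedback amounts to collecting the groups in each cluster and looking at their average profile. The sketch below assumes the labels have already been produced by a k-means run; the attribute names, the function `summarize_clusters`, and the sample numbers are hypothetical and are not data from the paper.

```python
from collections import defaultdict

# Hypothetical names for the three input attributes mentioned in the text.
ATTRIBUTES = ("assignments_submitted", "tasks_completed", "face_to_face_meetings")


def summarize_clusters(group_names, vectors, labels):
    """Group names by cluster label and average each attribute per cluster."""
    members = defaultdict(list)
    for name, vec, lab in zip(group_names, vectors, labels):
        members[lab].append((name, vec))

    summary = {}
    for lab, items in members.items():
        count = len(items)
        profile = {
            attr: sum(vec[i] for _, vec in items) / count
            for i, attr in enumerate(ATTRIBUTES)
        }
        summary[lab] = {
            "groups": [name for name, _ in items],
            "profile": profile,
        }
    return summary


# Illustrative values only: four groups already labelled with two clusters.
names = ["G1", "G2", "G3", "G4"]
vectors = [(9, 8, 5), (2, 1, 0), (8, 9, 4), (3, 2, 1)]
labels = [0, 1, 0, 1]

for lab, info in sorted(summarize_clusters(names, vectors, labels).items()):
    print(lab, info["groups"], info["profile"])
```

The facilitator would then write one piece of feedback per entry in the summary rather than one per group.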


5. RESULTS

Fig. : This Screen Is Login Page for All Users and Administrator

Fig. : Administrator Can Find the Experts for Getting the Assistance

Fig. : Administrator Can Register As Experts

Fig. : This Screen Shows the K Team Actions


6. CONCLUSION

The new system, the Data Centric Knowledge Management System, has been implemented to cater to the needs of company employees in sharing different knowledge assets effectively with role-based access. The present system has been integrated with the already existing one. The database was put on a MySQL server and is connected through JDBC. The database is accessible through the intranet from any location. The system has been found to meet the requirements of the users and departments satisfactorily. The database system must provide for the safety of the information stored, despite system crashes or attempts at unauthorized access. If data are to be shared among several users, the system must avoid possible anomalous results.

A future enhancement is extendibility: the system provides all the basic features and allows them to be extended easily without disturbing the existing code. The application can be made an Internet application if desired, and it can be made suitable to work with any application just by changing the deployment files. Further features could also be added, such as giving Internet users access so that they can take part in this process.


