Effective Pattern Discovery For Text Mining

ticketdonkey (Artificial Intelligence and Robotics)

25 Nov 2013


Aim:

Text mining is the discovery of interesting knowledge in text documents.

Abstract:

Text mining of text data has gradually become a new line of investigation.

Text clustering can greatly simplify browsing large collections of documents by reorganizing them into a smaller number of manageable clusters.

In this project, text clustering is used in a document clustering system that clusters a set of documents based on a key term typed by the user.

First, the system preprocesses the set of documents and the user-given terms.
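A minimal sketch of this preprocessing step, assuming it consists of lower-casing, splitting into alphabetic tokens and removing stop words (the source does not list the exact steps, and stemming is omitted here). The stop-word list and the names `preprocess` and `preprocess_corpus` are hypothetical.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def preprocess(text: str) -> list[str]:
    """Lower-case the text, split it into alphabetic tokens and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def preprocess_corpus(documents: list[str], user_terms: str) -> tuple[list[list[str]], list[str]]:
    """Apply the same preprocessing to every document and to the user's key terms."""
    return [preprocess(d) for d in documents], preprocess(user_terms)


if __name__ == "__main__":
    docs, query = preprocess_corpus(
        ["The cat sat on the mat.", "Dogs and cats are pets."],
        "Cats and dogs",
    )
    print(docs)   # [['cat', 'sat', 'mat'], ['dogs', 'cats', 'pets']]
    print(query)  # ['cats', 'dogs']
```

Processing the user's terms with the same function as the documents keeps both in one vocabulary, so a key term can be matched against document tokens directly.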
We then use feature evaluation to reduce the dimensionality of the high-dimensional text vectors.
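The source does not name the feature-evaluation measure. The sketch below assumes simple document-frequency thresholding: a term is kept only if it appears in at least `min_df` documents, so rare terms drop out and the vector gets shorter. `min_df` and all function names are hypothetical.

```python
from collections import Counter


def document_frequency(docs: list[list[str]]) -> Counter:
    """Count how many documents contain each term (repeats inside a document count once)."""
    df: Counter = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def select_features(docs: list[list[str]], min_df: int = 2) -> list[str]:
    """Keep only the terms that occur in at least `min_df` documents (assumed criterion)."""
    df = document_frequency(docs)
    return sorted(term for term, count in df.items() if count >= min_df)


def project(docs: list[list[str]], features: list[str]) -> list[list[str]]:
    """Drop every token that is not a selected feature, shrinking each document vector."""
    keep = set(features)
    return [[t for t in tokens if t in keep] for tokens in docs]
```

Raising `min_df` trades recall for a smaller vocabulary; other evaluation measures (for example information gain) would plug into `select_features` in the same place.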

The system then computes the term frequencies and weights them using the inverse document frequency (IDF) method. The resulting document weights are then used for clustering.
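The weighting step can be sketched with the textbook TF-IDF formula w(t, d) = tf(t, d) * log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t. This is a minimal sketch; the exact tf and idf variants the system uses are not stated, and the function name is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} vector per document, weighting raw counts by IDF."""
    n_docs = len(docs)
    df: Counter = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: each document counted once
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term frequency in this document
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors
```

A term that occurs in every document gets weight log(1) = 0, so it contributes nothing when documents are compared for clustering.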
Feature clustering is a powerful method to reduce the dimensionality of
feature vectors for text classification.

This work presents an innovative and effective pattern discovery technique that includes the processes of pattern deploying and pattern evolving.

In this paper, we propose a fuzzy similarity-based self-constructing algorithm for feature clustering. The words in the feature vector of a document set are grouped into clusters based on a similarity test: words that are similar to each other are grouped into the same cluster. Each cluster is characterized by a membership function with a statistical mean and deviation. When all the words have been fed in, the desired number of clusters has been formed automatically. We then have one extracted feature for each cluster; the extracted feature corresponding to a cluster is a weighted combination of the words contained in that cluster. Experimental results show that applying our method to text clustering makes the clustering results more efficient, accurate, and stable than those of the existing algorithm.
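The self-constructing procedure can be sketched as follows. This is a simplified illustration, not the authors' exact algorithm: each word is assumed to arrive with a numeric pattern vector (for example, its distribution over a few document groups), membership is a product of per-dimension Gaussians, a word joins the cluster with the highest membership if that value reaches a threshold `rho` and otherwise starts a new cluster, and the deviation is taken as the population standard deviation plus a constant `sigma0`. Extracted features use hard weighting (weight 1 for every word in the cluster). `rho`, `sigma0` and all names are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class FuzzyCluster:
    """One cluster; running sums give its mean and deviation in each dimension."""
    total: list[float]
    total_sq: list[float]
    sigma0: float
    members: list[str] = field(default_factory=list)

    @property
    def size(self) -> int:
        return len(self.members)

    def mean(self) -> list[float]:
        return [s / self.size for s in self.total]

    def deviation(self) -> list[float]:
        # Population std. deviation plus sigma0, so a one-word cluster is never zero-width.
        return [math.sqrt(max(sq / self.size - m * m, 0.0)) + self.sigma0
                for sq, m in zip(self.total_sq, self.mean())]

    def membership(self, x: list[float]) -> float:
        """Gaussian membership: product over dimensions of exp(-((x - mean) / dev)^2)."""
        return math.prod(math.exp(-(((xq - mq) / dq) ** 2))
                         for xq, mq, dq in zip(x, self.mean(), self.deviation()))

    def add(self, word: str, x: list[float]) -> None:
        self.members.append(word)
        self.total = [s + xq for s, xq in zip(self.total, x)]
        self.total_sq = [s + xq * xq for s, xq in zip(self.total_sq, x)]


def self_constructing_clusters(word_patterns: dict[str, list[float]],
                               rho: float = 0.5, sigma0: float = 0.25) -> list[FuzzyCluster]:
    """Feed words in one at a time: join the best cluster if similar enough, else start one."""
    clusters: list[FuzzyCluster] = []
    for word, x in word_patterns.items():
        best = max(clusters, key=lambda c: c.membership(x), default=None)
        if best is not None and best.membership(x) >= rho:
            best.add(word, x)
        else:
            clusters.append(FuzzyCluster(list(x), [v * v for v in x], sigma0, [word]))
    return clusters


def extract_features(doc_weights: dict[str, float], clusters: list[FuzzyCluster]) -> list[float]:
    """One new feature per cluster: the sum of the document's weights for that cluster's words."""
    return [sum(doc_weights.get(w, 0.0) for w in c.members) for c in clusters]
```

The number of clusters is never fixed in advance; it follows from `rho` and `sigma0`, and a lower `rho` lets more words join existing clusters, giving fewer, broader features.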

Existing System:

1) Existing text clustering methods use frequent word sets to cluster the documents.

2) Many well-known clustering algorithms treat documents as a bag of words and ignore important relationships between words, such as synonymy.

3) Existing algorithms have a higher probability of grouping unrelated documents into the same cluster.
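For contrast, a frequent-word-set approach can be sketched in Apriori style: count single words and word pairs that occur together in at least `min_support` documents, and treat the documents covering a frequent set as a candidate cluster. The sketch stops at pairs and its names are hypothetical; the source does not specify which frequent-set algorithm existing systems use.

```python
from collections import Counter
from itertools import combinations


def frequent_word_sets(docs: list[set[str]], min_support: int) -> dict[frozenset, int]:
    """Single words and word pairs found together in at least `min_support` documents."""
    singles = Counter(w for doc in docs for w in doc)
    frequent = {frozenset([w]): n for w, n in singles.items() if n >= min_support}
    # Apriori property: a pair can only be frequent if both of its words are.
    frequent_words = sorted(w for w, n in singles.items() if n >= min_support)
    for a, b in combinations(frequent_words, 2):
        support = sum(1 for doc in docs if a in doc and b in doc)
        if support >= min_support:
            frequent[frozenset([a, b])] = support
    return frequent


def cover(docs: list[set[str]], word_set: frozenset) -> list[int]:
    """Indices of the documents containing every word of the set (a candidate cluster)."""
    return [i for i, doc in enumerate(docs) if word_set <= doc]
```

Words are matched literally, so `car` and `automobile` never support the same word set; this is the limitation described in point 2.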


Proposed System:

1) The proposed text clustering method uses frequent concepts to cluster the text documents.

2) The proposed technique uses two processes, pattern deploying and pattern evolving, to refine the discovered patterns in text documents.

3) The proposed algorithm uses the semantic relationships between words to create concepts.

4) Relationships between words, such as synonymy and hypernymy, are also identified; hypernymy is the most effective for text clustering.

5) Associating a meaningful label with each final cluster is essential. In addition, the high dimensionality of text documents must be reduced.

6) The clustering algorithm works with frequent concepts rather than the frequent items used in traditional text mining techniques.

7) FCDC (Frequent Concepts based Document Clustering) was found to be more accurate, scalable, and effective than existing text clustering algorithms.
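The concept-based idea can be sketched by mapping each word to a concept before counting support. The hypernym table here is a hypothetical toy lexicon; a real system would typically draw such relations from a lexical database such as WordNet. Each frequent concept yields a candidate cluster whose name doubles as its label. Overlapping clusters (a document covering several frequent concepts) are left unresolved in this sketch, and all names are hypothetical.

```python
from collections import Counter

# Hypothetical toy lexicon: synonyms and hyponyms point to a shared hypernym (the concept).
HYPERNYMS = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "film": "movie", "movie": "movie", "cinema": "movie",
}


def to_concepts(tokens: list[str]) -> set[str]:
    """Replace each word by its concept; words missing from the lexicon stay as they are."""
    return {HYPERNYMS.get(t, t) for t in tokens}


def frequent_concepts(docs: list[list[str]], min_support: int) -> dict[str, list[int]]:
    """Concepts found in at least `min_support` documents, each with the documents it covers."""
    concept_docs = [to_concepts(d) for d in docs]
    support = Counter(c for concepts in concept_docs for c in concepts)
    return {c: [i for i, cs in enumerate(concept_docs) if c in cs]
            for c, n in support.items() if n >= min_support}
```

For example, with the documents [["car", "engine"], ["automobile", "price"], ["truck", "engine"]] and `min_support=2`, none of `car`, `automobile` or `truck` is frequent on its own, but the concept `vehicle` is, covering documents 0, 1 and 2.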


Modules:

1) Registration
2) User
3) Authentication

SYSTEM REQUIREMENTS:

Hardware Requirements

System     : Pentium IV 2.4 GHz
Hard disk  : 40 GB
Monitor    : 15-inch VGA colour
Mouse      : Logitech
RAM        : 256 MB
Keyboard   : 110-key enhanced

Software Requirements

Operating System     : Windows
Programming language : C#.NET
Web Technology       : ASP
Front-End            : ASP.NET
Back-End             : SQL Server