Decentralized Probabilistic Text Clustering

tealacking, Artificial Intelligence and Robotics

8 Nov 2013

ABSTRACT


Text clustering is an established technique for improving quality in
information retrieval, for both centralized and distributed environments.
However, traditional text clustering algorithms fail to scale on highly
distributed environments, such as peer-to-peer networks. Our algorithm for
peer-to-peer clustering achieves high scalability by using a probabilistic
approach for assigning documents to clusters. It enables a peer to compare
each of its documents only with very few selected clusters, without
significant loss of clustering quality. The algorithm offers probabilistic
guarantees for the correctness of each document assignment to a cluster.
Extensive experimental evaluation with up to 1 million peers and 1 million
documents demonstrates the scalability and effectiveness of the algorithm.
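
The idea of comparing a document with only a few selected clusters can be
sketched as follows. This is a minimal illustration under stated assumptions,
not the algorithm evaluated above: it assumes each cluster is indexed under its
most frequent terms, and the helpers `build_index` and `assign`, the `top_k`
parameter, and the in-memory dictionary standing in for a distributed index are
all hypothetical. It also provides no probabilistic guarantee.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(clusters, top_k=2):
    """Index each cluster under its top_k most frequent terms (assumption)."""
    index = defaultdict(set)
    for name, centroid in clusters.items():
        for term, _ in Counter(centroid).most_common(top_k):
            index[term].add(name)
    return index


def assign(doc_text, clusters, index, top_k=2):
    """Compare the document only with clusters sharing one of its top terms."""
    doc = Counter(doc_text.lower().split())
    candidates = set()
    for term, _ in doc.most_common(top_k):
        candidates |= index.get(term, set())
    if not candidates:
        return None, 0
    best = max(candidates, key=lambda name: cosine(doc, clusters[name]))
    return best, len(candidates)


# Toy cluster centroids as term-frequency vectors (made-up data).
clusters = {
    "sports": Counter({"football": 5, "match": 4, "goal": 2}),
    "finance": Counter({"market": 6, "stock": 5, "bank": 1}),
    "cooking": Counter({"recipe": 4, "oven": 3, "salt": 1}),
}
index = build_index(clusters)
cluster, compared = assign("stock market stock market prices fall", clusters, index)
print(cluster, compared)
```

In this toy run the document is compared with one of the three clusters
instead of all of them, which is the saving the abstract describes.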

ARCHITECTURE






SYSTEM ANALYSIS


EXISTING SYSTEM:


Text clustering is widely employed for automatically structuring large
document collections and enabling cluster-based information browsing, which
alleviates the problem of information overflow.

In previous work, search keywords and the information related to a user's
search are not clearly displayed during the search process.



Problems with the existing system:

1. Processes operate only for registered candidates.
2. Data display is poor.
3. The process is not secure.


PROPOSED SYSTEM:

It enables a peer to compare each of its documents only with very few
selected clusters, without significant loss of clustering quality. The
algorithm offers probabilistic guarantees for the correctness of each
document assignment to a cluster.

In this process, a keyword search displays results ordered by the highest
rankings given by registered users. Output is ranked on the basis of user and
publisher ratings, which keeps the process secure, and users can also rate
the data they view in their search results. Both cluster indexing and
document assignments are repeated periodically to compensate for churn and to
maintain an up-to-date clustering solution.
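
The periodic repetition of cluster indexing and document assignment can be
pictured as a maintenance round like the one below. This is only a sketch
under assumptions: the source does not give the period or the data model, so
the `refresh` function, the `peers` and `assignments` dictionaries, and the
simple term-overlap measure are hypothetical stand-ins.

```python
from collections import Counter


def overlap(doc, centroid):
    """Count shared term occurrences between a document and a centroid."""
    return sum(min(count, centroid[term]) for term, count in doc.items())


def refresh(peers, assignments):
    """One maintenance round: re-index clusters, then re-assign documents.

    peers maps a peer id to its documents {doc_id: text} and lists only the
    peers that are online now; assignments maps doc_id to a cluster name.
    """
    online_docs = {
        doc_id: Counter(text.lower().split())
        for docs in peers.values()
        for doc_id, text in docs.items()
    }
    # Re-index: rebuild each centroid from documents that are still reachable.
    centroids = {}
    for doc_id, cluster in assignments.items():
        if doc_id in online_docs:
            centroids.setdefault(cluster, Counter()).update(online_docs[doc_id])
    # Re-assign: documents of departed peers drop out, new documents are placed.
    new_assignments = {}
    for doc_id, doc in online_docs.items():
        if centroids:
            new_assignments[doc_id] = max(
                centroids, key=lambda name: overlap(doc, centroids[name])
            )
    return new_assignments


# The peer holding d3 has left; peer p4 has joined with a new document d4.
peers = {
    "p1": {"d1": "football match goal"},
    "p2": {"d2": "stock market bank"},
    "p4": {"d4": "stock market crash"},
}
assignments = {"d1": "sports", "d2": "finance", "d3": "finance"}
updated = refresh(peers, assignments)
print(updated)
```

Running such a round on a timer would let the clustering follow peers that
join or leave, at the cost of repeating the indexing work each period.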

Examples of document clustering include web document clustering for search
users. Websites whose main purpose is to let visitors vote on content and
images are called rating sites. Ratings are implemented separately for users
and publishers.

Main Modules:

1. ADMIN MODULE
2. PUBLISHER MODULE
3. TEXT CLUSTERING
4. SEARCH MODULE




1. ADMIN MODULE:

In this module, the admin logs in with a specified admin name and password.
The admin accepts the data uploaded by publishers, can check the browsing
history of publishers and users, and grants access to users and publishers
who register.



2. PUBLISHER MODULE:

This module lets a publisher log in to the registered website and edit
password details. The publisher uses it to upload popular details and images.
The admin checks and verifies all uploaded details and images before
accepting them. Once accepted, the uploads become visible to all users and
publishers, which keeps the search process secure.
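
The upload and approval flow described for the admin and publisher modules
can be modelled roughly as below. The class names, the `status` values and the
method names are hypothetical, chosen only to illustrate that an upload stays
hidden until the admin accepts it.

```python
from dataclasses import dataclass, field


@dataclass
class Upload:
    publisher: str
    title: str
    status: str = "pending"  # hypothetical states: pending, accepted, rejected


@dataclass
class Site:
    uploads: list = field(default_factory=list)

    def submit(self, publisher, title):
        """A publisher uploads an item; it starts out pending."""
        upload = Upload(publisher, title)
        self.uploads.append(upload)
        return upload

    def review(self, upload, accept):
        """The admin verifies an upload and accepts or rejects it."""
        upload.status = "accepted" if accept else "rejected"

    def visible(self):
        """Titles that users and publishers are allowed to see."""
        return [u.title for u in self.uploads if u.status == "accepted"]


site = Site()
first = site.submit("publisher_a", "City marathon photos")
second = site.submit("publisher_b", "Unverified rumour")
site.review(first, accept=True)
site.review(second, accept=False)
print(site.visible())
```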







3. TEXT CLUSTERING:

Text clustering is used for automatic document organization, topic
extraction, and fast information retrieval or filtering. It is closely
related to data clustering.

Document clustering is generally considered to be a centralized process.
Examples of document clustering include web document clustering for search
users. Websites whose main purpose is to let visitors vote on content and
images are called rating sites.


4. SEARCH MODULE:

Publishers create and upload popular information on any topic of interest.
To view the updated details, any publisher or user enters a search key, and
the items with the highest votes are displayed first as links. Clicking any
of the links opens the uploaded data and accepted images. Any registered user
or publisher can download the required data and images. Users who like the
content and images can also vote on them from the search results.
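
The search behaviour described here, matching a search key and listing the
most voted items first, can be sketched as follows. The item fields (`title`,
`votes`, `accepted`) and the substring match are assumptions made for
illustration; the source does not specify how matching or vote storage works.

```python
def search(items, key):
    """Return titles of accepted items matching the key, highest votes first."""
    key = key.lower()
    matches = [
        item for item in items
        if item["accepted"] and key in item["title"].lower()
    ]
    ranked = sorted(matches, key=lambda item: item["votes"], reverse=True)
    return [item["title"] for item in ranked]


# Made-up items; only accepted uploads may appear in results.
items = [
    {"title": "Football news", "votes": 3, "accepted": True},
    {"title": "Football history", "votes": 9, "accepted": True},
    {"title": "Football rumours", "votes": 20, "accepted": False},
    {"title": "Cooking tips", "votes": 7, "accepted": True},
]
results = search(items, "football")
print(results)
```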






5. SYSTEM SPECIFICATION


Hardware Requirements:

System       : Pentium IV 2.4 GHz
Hard Disk    : 40 GB
Floppy Drive : 1.44 MB
Monitor      : 14" Colour Monitor
Mouse        : Optical Mouse
RAM          : 512 MB
Keyboard     : 101 Keyboard


Software Requirements:

Operating system : Windows XP and IIS
Coding Language  : ASP.Net with C# SP1
Data Base        : SQL Server 2005