Abstract - Technofist

stemswedish | Artificial Intelligence and Robotics

15 Oct 2013



SCALABLE LEARNING OF COLLECTIVE BEHAVIOUR



Abstract:


The study of collective behavior seeks to understand how individuals behave in a social networking environment. Oceans of data generated by social media such as Facebook, Twitter, Flickr, and YouTube present opportunities and challenges for studying collective behavior on a large scale. In this work, we aim to learn to predict collective behavior in social media. In particular, given information about some individuals, how can we infer the behavior of unobserved individuals in the same network? A social-dimension-based approach has been shown to be effective in addressing the heterogeneity of connections present in social media. However, networks in social media are normally of colossal size, involving hundreds of thousands of actors. The scale of these networks calls for scalable learning of models for collective behavior prediction. To address the scalability issue, we propose an edge-centric clustering scheme to extract sparse social dimensions. With sparse social dimensions, the proposed approach can efficiently handle networks of millions of actors while achieving prediction performance comparable to that of other, non-scalable methods.


Algorithm:

Algorithm for Learning of Collective Behavior

Input: network data, labels of some nodes, number of social dimensions;
Output: labels of unlabeled nodes.

1. Convert the network into an edge-centric view.
2. Perform edge clustering.
3. Construct social dimensions based on the edge partition: a node belongs to a community as long as any of its neighboring edges is in that community.
4. Apply regularization to social dimensions.
5. Construct a classifier based on the social dimensions of labeled nodes.
6. Use the classifier to predict the labels of unlabeled nodes based on their social dimensions.
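Steps 1 to 3 can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: each edge is represented only by its two endpoints, edges are grouped with a basic k-means using cosine similarity, and the farthest-first seeding, the function names, and the toy graph are hypothetical choices made for illustration. Step 4 (regularization) and the classifier are not shown here.

```python
import math
from collections import defaultdict


def _cosine(edge, centroid, norm):
    """Cosine similarity between an edge vector (ones at its two endpoints,
    so norm sqrt(2)) and a sparse centroid {node: weight}."""
    if norm == 0.0:
        return 0.0
    return sum(centroid.get(n, 0.0) for n in edge) / (math.sqrt(2) * norm)


def edge_kmeans(edges, k, iters=20):
    """Steps 1-2: edge-centric view plus k-means over edges.
    Assumes 1 <= k <= len(edges). Returns a cluster id per edge."""
    feats = [frozenset(e) for e in edges]  # edge-centric view

    # Hypothetical seeding: start from edge 0, then repeatedly add the edge
    # sharing the fewest endpoints with the seeds chosen so far.
    seeds = [0]
    while len(seeds) < k:
        seeds.append(min(
            (i for i in range(len(feats)) if i not in seeds),
            key=lambda i: max(len(feats[i] & feats[s]) for s in seeds),
        ))
    centroids = [{n: 1.0 for n in feats[s]} for s in seeds]

    assign = None
    for _ in range(iters):
        norms = [math.sqrt(sum(w * w for w in c.values())) for c in centroids]
        new_assign = [
            max(range(k), key=lambda j: _cosine(f, centroids[j], norms[j]))
            for f in feats
        ]
        if new_assign == assign:
            break  # converged: no edge changed cluster
        assign = new_assign

        # Recompute each centroid as the mean of its member edge vectors.
        sums = [defaultdict(float) for _ in range(k)]
        counts = [0] * k
        for f, j in zip(feats, assign):
            counts[j] += 1
            for n in f:
                sums[j][n] += 1.0
        centroids = [
            {n: w / counts[j] for n, w in sums[j].items()} if counts[j] else centroids[j]
            for j in range(k)
        ]
    return assign


def social_dimensions(edges, assign):
    """Step 3: a node joins every community that contains one of its edges."""
    dims = defaultdict(set)
    for (u, v), j in zip(edges, assign):
        dims[u].add(j)
        dims[v].add(j)
    return dict(dims)


if __name__ == "__main__":
    # Two triangles joined by the bridge edge (3, 4).
    edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
    dims = social_dimensions(edges, edge_kmeans(edges, k=2))
    print(dims)  # node 3 is in both communities; every other node is in one
```

On this toy graph each triangle's edges form one cluster (the bridge edge joins the second triangle's cluster), so node 3 ends up with two social dimensions while every other node has one. A production system would use a more scalable k-means variant and sparse matrix storage.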



Literature Survey:

1. Leveraging User-specified Metadata to Personalize Image Search


Social media sites such as Flickr and del.icio.us allow users to upload content and annotate it with descriptive labels known as tags, join special-interest groups, and so on. We believe user-generated metadata expresses users' tastes and interests and can be used to personalize information for an individual user. Specifically, we describe a machine learning method that analyzes a corpus of tagged content to find hidden topics. We then use these learned topics to select content that matches a user's interests. We empirically validated this approach on the social photo-sharing site Flickr, which allows users to annotate images with freely chosen tags and to search for images labeled with a certain tag. We use metadata associated with images tagged with an ambiguous query term to identify topics corresponding to different senses of the term, and then personalize the results of image search by displaying to the user only those images that are of interest to her.
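As a rough illustration of the final selection step, the sketch below assumes the hidden topics have already been learned, so each image carries a topic mixture and the user has a topic-interest profile; images are then ranked by how well the two agree. The topic names, weights, function name, and the dot-product score are hypothetical; the surveyed work learns its topics with a probabilistic model whose details are not reproduced here.

```python
def personalize(images, user_profile, top_n=None):
    """Rank images returned for an ambiguous query by the agreement between
    each image's topic mixture and the user's interest profile."""

    def score(topics):
        # Dot product of two sparse topic distributions.
        return sum(w * user_profile.get(t, 0.0) for t, w in topics.items())

    ranked = sorted(images, key=lambda name: score(images[name]), reverse=True)
    return ranked if top_n is None else ranked[:top_n]


if __name__ == "__main__":
    # Hypothetical results for the ambiguous query "tiger".
    images = {
        "img_cat": {"wildlife": 0.9, "sport": 0.1},
        "img_golf": {"wildlife": 0.05, "sport": 0.95},
        "img_zoo": {"wildlife": 0.7, "sport": 0.3},
    }
    nature_lover = {"wildlife": 0.8, "sport": 0.2}
    print(personalize(images, nature_lover, top_n=2))  # ['img_cat', 'img_zoo']
```

Showing only the top-ranked images corresponds to "displaying only those images that are of interest"; a threshold on the score would be an equally plausible choice.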


2. Automatic Identification of User Interest For Personalized Search


One hundred users, one hundred needs. As more and more topics are discussed on the web while our vocabulary remains relatively stable, it is increasingly difficult to let the search engine know what we want. Coping with ambiguous queries has long been an important part of Information Retrieval research, but it remains a challenging task. Personalized search has recently received significant attention in the web search community as a way to address this challenge, based on the premise that a user's general preference may help the search engine disambiguate the true intention of a query. However, studies have shown that users are reluctant to provide any explicit input on their personal preferences. In this paper, we study how a search engine can learn a user's preference automatically from her past click history and how it can use that preference to personalize search results. Our experiments show that users' preferences can be learned accurately even from a small click history, and that personalized search based on user preference yields significant improvements over the best existing ranking mechanism in the literature.
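A simplified sketch of the two parts described above: estimating a preference distribution over topics from a small click history, then re-ranking results with it. Assumptions: clicked pages are already tagged with a topic, each result already has per-topic relevance scores, and the Laplace smoothing, the linear mix, and the function names are illustrative choices; the surveyed paper's own estimator is not reproduced here.

```python
from collections import Counter


def learn_preference(clicked_topics, topics, alpha=1.0):
    """Estimate P(topic | user) from the topics of pages the user clicked.
    Additive smoothing (alpha) keeps some mass on unseen topics, which
    matters when the click history is small."""
    counts = Counter(clicked_topics)
    total = len(clicked_topics) + alpha * len(topics)
    return {t: (counts[t] + alpha) / total for t in topics}


def personalized_rank(results, preference):
    """Order results by a preference-weighted mix of per-topic relevance."""

    def score(topic_scores):
        return sum(preference.get(t, 0.0) * s for t, s in topic_scores.items())

    return sorted(results, key=lambda doc: score(results[doc]), reverse=True)


if __name__ == "__main__":
    topics = ["computers", "travel"]
    history = ["computers", "computers", "computers", "travel"]
    pref = learn_preference(history, topics)  # computers: 2/3, travel: 1/3
    results = {  # hypothetical per-topic scores for the query "java"
        "java_island": {"computers": 0.1, "travel": 0.9},
        "java_language": {"computers": 0.9, "travel": 0.1},
    }
    print(personalized_rank(results, pref))  # ['java_language', 'java_island']
```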


3. A Community-Based Approach to Personalizing Web Search


Over the past few years, Web search engines have become the dominant tool for accessing information online. However, even today's most successful search engines struggle to provide high-quality search results: approximately 50 percent of Web search sessions fail to find any relevant results for the searcher.

The earliest Web search engines adopted an information-retrieval view of search, using sophisticated term-based matching techniques to identify relevant documents from repeated occurrences of salient query terms. Although such techniques proved useful for identifying a set of potentially relevant results, they offered little insight into how such results could be usefully ranked.

How, then, should documents be ranked and ordered? Some researchers [1, 2] solved this problem when they realized that ranking could be greatly improved by evaluating the importance or authoritativeness of a particular document. By analyzing the links into and out of a document, it became possible to evaluate its relative importance within the wider Web. For example, Google's famous PageRank metric assigns a high PageRank score to a document if it is itself linked to by many other documents with high PageRank scores, and it iteratively evaluates the PageRank score of every document in its index for use during results ranking.
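The iterative evaluation described above can be sketched with the textbook power-iteration form of PageRank. The damping factor of 0.85, the even redistribution of rank held by pages without out-links, and the toy link graph are standard illustrative assumptions, not details taken from this document or from Google's production system.

```python
def pagerank(links, damping=0.85, tol=1e-12, max_iter=200):
    """Power iteration: a page's score is a damped sum of the scores of the
    pages linking to it, each divided by that page's number of out-links."""
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u, outs in links.items():
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # A links to B and C, B links to C, C links back to A.
    scores = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
    for page, s in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(page, round(s, 3))
```

In this toy graph C ranks highest because it receives links from both other pages, and A ranks above B because its only in-link comes from the highly ranked C, which passes on its whole score.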

Other researchers began exploring alternative ranking options. One notable alternative, implemented in the Direct Hit search engine, argued that search results should be ranked by their popularity among […]

Existing System:

Because existing approaches to extracting social dimensions scale poorly, it is imperative to address the scalability issue. Connections in social media are not homogeneous: people can connect to their family, colleagues, college classmates, or buddies met online. Some relations are helpful in determining a targeted behavior, while others are not. This relation-type information, however, is often not readily available in social media. A direct application of collective inference or label propagation would treat the connections in a social network as if they were homogeneous.
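To make the homogeneity problem concrete, here is a minimal sketch of a simple label-propagation update, similar in spirit to the weighted-vote relational neighbor classifier and used here only as an illustrative stand-in: every unlabeled node repeatedly takes the average score of its neighbors, and each edge counts the same whatever relationship lies behind it. The function name, node names, and labels are hypothetical.

```python
def label_propagation(adj, labels, iters=50):
    """adj: node -> list of neighbours; labels: node -> 0/1 for labeled nodes.
    Labeled nodes stay fixed; every other node's score becomes the mean of
    its neighbours' scores. All edges carry equal weight, i.e. connections
    are treated as homogeneous."""
    score = {v: float(labels[v]) if v in labels else 0.5 for v in adj}
    for _ in range(iters):
        new = dict(score)
        for v, nbrs in adj.items():
            if v not in labels and nbrs:
                new[v] = sum(score[u] for u in nbrs) / len(nbrs)
        score = new
    return score


if __name__ == "__main__":
    # "x" has two family ties and one colleague tie. Suppose only the colleague
    # relation predicts the behaviour; the update cannot tell the ties apart.
    adj = {"x": ["fam1", "fam2", "col1"], "fam1": ["x"], "fam2": ["x"], "col1": ["x"]}
    labels = {"fam1": 0, "fam2": 0, "col1": 1}
    print(round(label_propagation(adj, labels)["x"], 3))  # 0.333
```

Node x is pulled toward label 0 simply because it has more family ties, which is the limitation that extracting social dimensions is meant to address.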


Disadvantages:



Existing approaches to extracting social dimensions do not scale to large networks.

The heterogeneity of connections limits the effectiveness of methods that treat all connections alike.





Proposed System:

A recent framework based on social dimensions has been shown to be effective in addressing this heterogeneity. The framework suggests a novel way of network classification: first, capture the latent affiliations of actors by extracting social dimensions based on network connectivity; next, apply extant data mining techniques to classification based on the extracted dimensions.

In the initial study, modularity maximization was employed to extract social dimensions. The superiority of this framework over other representative relational learning methods has been verified with social media data. The original framework, however, cannot scale to networks of colossal size because the extracted social dimensions are rather dense. In social media, a network of millions of actors is very common. With such a huge number of actors, the extracted dense social dimensions cannot even be held in memory, causing a serious computational problem.
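A back-of-the-envelope calculation shows why density matters. The figures used below (one million actors, 1,000 social dimensions, 8-byte floating-point values, 4-byte indices, and an average of five affiliations per actor in the sparse case) are illustrative assumptions, not measurements reported in this work.

```python
def dense_bytes(n_actors, n_dims, value_bytes=8):
    """Memory for a dense actor-by-dimension matrix of floats."""
    return n_actors * n_dims * value_bytes


def sparse_bytes(n_actors, avg_nonzeros, value_bytes=8, index_bytes=4):
    """Approximate CSR memory: one value and one column index per non-zero,
    plus n_actors + 1 row pointers."""
    nnz = n_actors * avg_nonzeros
    return nnz * (value_bytes + index_bytes) + (n_actors + 1) * index_bytes


if __name__ == "__main__":
    dense = dense_bytes(1_000_000, 1_000)
    sparse = sparse_bytes(1_000_000, 5)
    print(f"dense:  {dense / 1e9:.3f} GB")   # dense:  8.000 GB
    print(f"sparse: {sparse / 1e9:.3f} GB")  # sparse: 0.064 GB
```

The dense matrix grows with the number of dimensions, whereas the sparse one grows only with the number of actual affiliations, which is why sparsity removes the memory bottleneck.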

Sparsifying social dimensions can be effective in eliminating the scalability bottleneck. In this work, we propose an effective edge-centric approach to extract sparse social dimensions. We prove that with our proposed approach, the sparsity of social dimensions is guaranteed.
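One simple way to see why edge clustering yields sparse social dimensions follows directly from step 3 of the algorithm: a node can only join communities that contain one of its own edges. The bound below is a straightforward consequence of that rule, offered as a sketch rather than as the formal proof referred to above. The symbols are introduced here for illustration: $S$ is the $n \times k$ social-dimension matrix, $d_i$ the degree of node $i$, $C(i)$ the set of communities containing at least one edge incident to $i$, and $m$ the number of edges.

```latex
\operatorname{nnz}(S) = \sum_{i=1}^{n} |C(i)|
  \le \sum_{i=1}^{n} \min(d_i, k)
  \le \sum_{i=1}^{n} d_i = 2m,
\qquad
\operatorname{density}(S) = \frac{\operatorname{nnz}(S)}{nk} \le \frac{2m}{nk}.
```

For a sparse network, where $m$ is a small multiple of $n$, the number of non-zero entries therefore stays proportional to the number of edges however large $k$ becomes.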


Advantages:


A key advantage of our model is that it easily scales to handle networks with millions of actors, where the earlier models fail. This scalable approach offers a viable solution for effective learning of online collective behavior on a large scale.


Modules:

1. Social dimension extraction:

The latent social dimensions are extracted based on network topology to capture the potential affiliations of actors. These extracted social dimensions represent how each actor is involved in diverse affiliations, and they can be treated as features of actors for subsequent discriminative learning. Since the network is converted into features, typical classifiers such as support vector machines and logistic regression can be employed. Social dimensions extracted by soft clustering, such as modularity maximization and probabilistic methods, are dense.
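Once each actor's affiliations are known, turning them into classifier features is mechanical. The sketch below stores each actor's social dimensions as a sparse {dimension: value} row and, as an assumed normalization choice, scales each row to unit length so that actors with many affiliations do not dominate. The function names and toy affiliations are hypothetical.

```python
import math


def to_feature_rows(affiliations, normalize=True):
    """affiliations: actor -> set of community ids.
    Returns actor -> sparse row {community id: weight}."""
    rows = {}
    for actor, dims in affiliations.items():
        if not dims:
            rows[actor] = {}
            continue
        # Unit L2 norm: every non-zero entry equals 1 / sqrt(#affiliations).
        value = 1.0 / math.sqrt(len(dims)) if normalize else 1.0
        rows[actor] = {d: value for d in sorted(dims)}
    return rows


def to_dense(row, k):
    """Expand a sparse row into a length-k list, e.g. for a dense classifier."""
    return [row.get(j, 0.0) for j in range(k)]


if __name__ == "__main__":
    affiliations = {"alice": {0}, "bob": {0, 1}, "carol": {1}}
    rows = to_feature_rows(affiliations)
    for actor, row in rows.items():
        print(actor, to_dense(row, 2))
```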


2. Discriminative learning:

The discriminative learning procedure determines which social dimensions correlate with the targeted behavior and then assigns them proper weights.
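The weighting step can be illustrated with L2-regularized logistic regression, one of the classifiers named in the social dimension extraction module, trained here by plain batch gradient descent on sparse social-dimension rows. The learning rate, regularization strength, epoch count, function names, and toy data are assumptions. A positive learned weight marks a dimension that correlates with the target behavior; a negative weight marks one that correlates with its absence.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(rows, labels, k, lr=0.5, epochs=300, l2=0.01):
    """rows: list of sparse {dimension: value}; labels: list of 0/1.
    Returns one weight per social dimension plus a bias."""
    w, b, n = [0.0] * k, 0.0, len(rows)
    for _ in range(epochs):
        grad_w = [l2 * wj for wj in w]  # gradient of (l2 / 2) * ||w||^2
        grad_b = 0.0
        for row, y in zip(rows, labels):
            err = sigmoid(b + sum(w[j] * x for j, x in row.items())) - y
            for j, x in row.items():
                grad_w[j] += err * x / n
            grad_b += err / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(row, w, b):
    """Label an unlabeled actor from its social dimensions (step 6)."""
    return 1 if b + sum(w[j] * x for j, x in row.items()) > 0 else 0


if __name__ == "__main__":
    rows = [{0: 1.0}, {0: 1.0}, {1: 1.0}, {1: 1.0}, {0: 0.7071, 2: 0.7071}]
    labels = [1, 1, 0, 0, 1]
    w, b = train_logistic(rows, labels, k=3)
    print([round(x, 2) for x in w], round(b, 2))
```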

A key observation is that actors of the same affiliation tend to connect with each other. For instance, it is reasonable to expect people of the same department to interact with each other more frequently. Hence, to infer actors' latent affiliations, we need to find groups of people who interact with each other more frequently than at random.
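"More frequently than at random" is exactly what modularity measures; modularity is the quantity optimized by the modularity-maximization extraction mentioned in the proposed system. It is the fraction of edges that fall inside communities minus the fraction expected if edges were placed at random while keeping every node's degree. The sketch below scores a given partition of an undirected, unweighted graph; the function name, toy graph, and community labels are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Q = sum over communities c of  L_c / m - (D_c / 2m)^2, where L_c counts
    edges inside c and D_c is the total degree of c's members."""
    m = len(edges)
    degree = defaultdict(int)
    inside = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    total_degree = defaultdict(int)
    for node, d in degree.items():
        total_degree[community[node]] += d
    return sum(inside[c] / m - (total_degree[c] / (2 * m)) ** 2 for c in total_degree)


if __name__ == "__main__":
    # Two triangles joined by the single edge (3, 4).
    edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
    split = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
    merged = {v: "A" for v in split}
    print(round(modularity(edges, split), 4))   # 0.3571 (= 5/14)
    print(round(modularity(edges, merged), 4))  # 0.0
```

Splitting the graph into its two triangles scores well above zero, while lumping everyone into one group scores zero, matching the intuition that a good affiliation is denser than chance.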

3. Chart Generation for Group/Month:

Two previously reported data sets are used to examine our proposed model for collective behavior learning. The first data set is acquired from user interests and the second from user behavior; specifically, we study whether or not a user visits a group of interest. The module then generates a chart of user visits to each group per month.
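The aggregation behind such a chart can be sketched as counting visit records per (group, month) and drawing one text bar per cell. The record format (user, group, ISO date string), the function names, and the sample visits are hypothetical; the described system targets ASP.NET with C#, so this Python version only illustrates the counting logic.

```python
from collections import Counter


def visits_per_group_month(visits):
    """visits: iterable of (user, group, 'YYYY-MM-DD') records.
    Returns a Counter keyed by (group, 'YYYY-MM')."""
    return Counter((group, date[:7]) for _user, group, date in visits)


def text_chart(counts, bar="#"):
    """One line per (group, month), sorted, with one bar mark per visit."""
    return [
        f"{group:<12} {month}  {bar * n} ({n})"
        for (group, month), n in sorted(counts.items())
    ]


if __name__ == "__main__":
    visits = [
        ("u1", "hiking", "2013-09-02"),
        ("u2", "hiking", "2013-09-15"),
        ("u1", "photography", "2013-09-20"),
        ("u3", "hiking", "2013-10-01"),
    ]
    for line in text_chart(visits_per_group_month(visits)):
        print(line)
```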


4. Chart Generation for User/Group:

This module uses the same two data sets as the Group/Month chart module, recording whether or not a user visits a group of interest, and generates a chart of visits broken down by user and group.
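For the user/group view, a sketch under the same hypothetical record format (user, group, date) builds a user-by-group table of visit counts, which a charting component could then plot as grouped bars. The function name and sample data are illustrative assumptions.

```python
from collections import Counter


def user_group_table(visits):
    """visits: iterable of (user, group, date) records.
    Returns (users, groups, table) where table[i][j] is the number of
    visits by users[i] to groups[j]."""
    counts = Counter((user, group) for user, group, _date in visits)
    users = sorted({u for u, _ in counts})
    groups = sorted({g for _, g in counts})
    # Counter returns 0 for (user, group) pairs that never occur.
    table = [[counts[(u, g)] for g in groups] for u in users]
    return users, groups, table


if __name__ == "__main__":
    visits = [
        ("u1", "hiking", "2013-09-02"),
        ("u1", "hiking", "2013-09-09"),
        ("u1", "photography", "2013-09-20"),
        ("u2", "hiking", "2013-10-01"),
    ]
    users, groups, table = user_group_table(visits)
    print("user  " + "  ".join(groups))
    for u, row in zip(users, table):
        print(u.ljust(6) + "  ".join(str(n) for n in row))
```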



System Requirements:

Hardware Requirements:

Processor : Intel Dual Core.
Hard Disk : 60 GB.
Floppy Drive : 1.44 MB.
Monitor : LCD Colour.
Mouse : Optical Mouse.
RAM : 512 MB.


Software Requirements:

Operating System : Windows XP.
Coding Language : ASP.NET with C#.
Database : SQL Server 2005.