Abstract - Technofist





The study of collective behavior seeks to understand how individuals behave in a social networking
environment. Oceans of data generated by social media such as Facebook, Twitter, Flickr, and
YouTube present opportunities and challenges for studying collective behavior on a large scale. In
this work, we aim to learn to predict collective behavior in social media. In particular, given
information about some individuals, how can we infer the behavior of unobserved individuals in
the same network? A social-dimension-based approach has been shown to be effective in addressing
the heterogeneity of connections present in social media. However, the networks in social
media are normally of colossal size, involving hundreds of thousands of actors. The scale of
these networks calls for scalable learning of models for collective behavior prediction. To address
the scalability issue, we propose an edge-centric clustering scheme to extract sparse social
dimensions. With sparse social dimensions, the proposed approach can efficiently handle
networks of millions of actors while demonstrating prediction performance comparable to other,
non-scalable methods.



1. Algorithm for Learning of Collective Behavior

Input: network data, labels of some nodes, number of social dimensions.

Output: labels of unlabeled nodes.

1. Convert network into edge-centric view.

2. Perform edge clustering.

3. Construct social dimensions based on the edge partition. A node belongs to one community as
long as any of its neighboring edges is in that community.

4. Apply regularization to social dimensions.

5. Construct classifier based on social dimensions of labeled nodes.

6. Use the classifier to predict labels of unlabeled ones based on their social dimensions.
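
Steps 1 to 3 of the algorithm can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: each edge is treated as an instance whose only features are its two end nodes, edges are grouped with a simple k-means-style loop using cosine similarity, and the function names `cluster_edges` and `social_dimensions`, the deterministic centroid seeding, and the toy graph are all hypothetical. Steps 4 to 6 (regularization and classification) are omitted.

```python
import math
from collections import defaultdict

def cluster_edges(edges, k, iterations=10):
    """Steps 1-2: view each edge (u, v) as a sparse vector with ones at u and v,
    then group the edges into k clusters by cosine similarity to a centroid."""
    # Assumption: seed centroids with k edges spread evenly over the edge list.
    centroids = [{u: 1.0, v: 1.0}
                 for u, v in (edges[i * len(edges) // k] for i in range(k))]
    labels = [0] * len(edges)
    for _ in range(iterations):
        norms = [math.sqrt(sum(w * w for w in c.values())) for c in centroids]
        # Assignment: each edge goes to the most similar centroid.
        # The edge's own norm is the same for every edge, so it is left out.
        for i, (u, v) in enumerate(edges):
            labels[i] = max(
                range(k),
                key=lambda j: (centroids[j].get(u, 0.0)
                               + centroids[j].get(v, 0.0)) / norms[j])
        # Update: each centroid becomes the mean of its member edge vectors.
        sums = [defaultdict(float) for _ in range(k)]
        counts = [0] * k
        for (u, v), j in zip(edges, labels):
            sums[j][u] += 1.0
            sums[j][v] += 1.0
            counts[j] += 1
        for j in range(k):
            if counts[j]:  # keep the old centroid if a cluster is empty
                centroids[j] = {n: w / counts[j] for n, w in sums[j].items()}
    return labels

def social_dimensions(edges, edge_labels):
    """Step 3: a node belongs to a community if any incident edge is in it."""
    dims = defaultdict(set)
    for (u, v), c in zip(edges, edge_labels):
        dims[u].add(c)
        dims[v].add(c)
    return dict(dims)

# Toy graph: two triangles that share node 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
labels = cluster_edges(edges, k=2)
dims = social_dimensions(edges, labels)
```

On this toy graph the two triangles fall into separate edge clusters, and node 2, which touches edges of both, is the only node assigned to two social dimensions. Every other node keeps a single entry, which is the sparsity the approach relies on.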

Literature Survey:

1. Leveraging User-specified Metadata to Personalize Image Search

Social media sites, such as Flickr and del.icio.us, allow users to upload content and annotate
it with descriptive labels known as tags, join special-interest groups, etc. We believe
user-generated metadata expresses a user’s tastes and interests and can be used to personalize
information to an individual user. Specifically, we describe a machine learning method that
analyzes a corpus of tagged content to find hidden topics. We then use these learned topics to
select content that matches a user’s interests. We empirically validated this approach on the
social photo-sharing site Flickr, which allows users to annotate images with freely chosen tags
and to search for images labeled with a certain tag. We use metadata associated with images
tagged with an ambiguous query term to identify topics corresponding to different senses of the
term, and then personalize results of image search by displaying to the user only those images
that are of interest to her.

2. Automatic Identification of User Interest For Personalized Search

One hundred users, one hundred needs. As more and more topics are being discussed on the web
and our vocabulary remains relatively stable, it is increasingly difficult to let the search
engine know what we want. Coping with ambiguous queries has long been an important part of
research in Information Retrieval, but it still remains a challenging task. Personalized search
has recently received significant attention in the web search community as a way to address this
challenge, based on the premise that a user’s general preference may help the search engine
disambiguate the true intention of a query. However, studies have shown that users are reluctant
to provide any explicit input on their personal preference. In this paper, we study how a search
engine can learn a user’s preference automatically based on her past click history and how it
can use the user preference to personalize search results. Our experiments show that users’
preferences can be learned accurately even from small click-history data and that personalized
search based on user preference yields significant improvements over the best existing ranking
mechanism in the literature.


3. A Community-Based Approach to Personalizing Web Search

Over the past few years, Web search engines have become the dominant tool for accessing
information online. However, even today’s most successful search engines struggle to provide
high-quality search results: approximately 50 percent of Web search sessions fail to find any
relevant results for the searcher.

The earliest Web search engines adopted an information-retrieval view of search, using
sophisticated term-based matching techniques to identify relevant documents from repeated
occurrences of salient query terms. Although such techniques proved useful for identifying a set
of potentially relevant results, they offered little insight into how such results could be
usefully ranked.

How then should documents be ranked and ordered? Some researchers [1, 2] solved this problem when
they realized that ranking could be greatly improved by evaluating the importance or
authoritativeness of a particular document. By analyzing the links in and out of a document, it
became possible to evaluate its relative importance within the wider Web. For example, Google’s
famous PageRank metric assigns a high page-rank score to a document if it is itself linked to by
many other documents with a high score, and it iteratively evaluates the page-rank scores for
every document in its index for use during results ranking.
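
The iterative scoring idea described above can be illustrated with a simplified power-iteration sketch. It is a textbook-style approximation, not Google's production algorithm: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the three-page example graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform score
    for _ in range(iterations):
        # Every page receives a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page passes its score on in equal shares to its out-links.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Assumption: a page with no out-links spreads its score evenly.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank

# Hypothetical three-page web: "a" and "b" link to "c", and "c" links back to "a".
example_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(example_links)
```

In this example page "c" ends up with the highest score because two pages link to it, "a" comes second because it is linked to by the high-scoring "c", and "b", which nothing links to, keeps only the baseline score.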

Other researchers began exploring alternative ranking strategies. One notable alternative,
implemented in the Direct Hit search engine, argued that search results should be ranked by
their popularity.

Existing System:

As existing approaches to extracting social dimensions suffer from poor scalability, it is
imperative to address the scalability issue. Connections in social media are not homogeneous.
People can connect to their family, colleagues, college classmates, or buddies met online. Some
relations are helpful in determining a targeted behavior while others are not. This relation-type
information, however, is often not readily available in social media. A direct application of
collective inference or label propagation would treat connections in a social network as if they
were homogeneous.

Existing social dimension extraction does not scale well in the presence of this heterogeneity.

This heterogeneity of connections limits the effectiveness of such direct approaches.

Proposed System:

A recent framework based on social dimensions is shown to be effective in addressing this
heterogeneity. The framework suggests a novel way of network classification: first, capture the
latent affiliations of actors by extracting social dimensions based on network connectivity, and
next, apply extant data mining techniques to classification based on the extracted dimensions.
In the initial study, modularity maximization was employed to extract social dimensions. The
superiority of this framework over other representative relational learning methods has been
verified with social media data. The original framework, however, is not scalable to handle
networks of colossal sizes because the extracted social dimensions are rather dense. In social
media, a network of millions of actors is very common. With a huge number of actors, extracted
dense social dimensions cannot even be held in memory, causing a serious computational problem.
Sparsifying social dimensions can be effective in eliminating the scalability bottleneck. In
this work, we propose an effective edge-centric approach to extract sparse social dimensions.
We prove that with our proposed approach, sparsity of social dimensions is guaranteed.
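
A rough back-of-the-envelope calculation shows why density matters for memory. Every figure below is an assumption chosen for illustration and does not come from the study: one million actors, one thousand social dimensions, 8-byte values, 4-byte column indices, and about ten non-zero entries per actor in the sparse case.

```python
def dense_bytes(num_actors, num_dimensions, bytes_per_value=8):
    """Memory needed to store every entry of an actors-by-dimensions matrix."""
    return num_actors * num_dimensions * bytes_per_value

def sparse_bytes(num_nonzeros, bytes_per_value=8, bytes_per_index=4):
    """Memory for non-zero entries only: one value plus one column index each.
    The small row-pointer array of a compressed row layout is ignored here."""
    return num_nonzeros * (bytes_per_value + bytes_per_index)

actors = 1_000_000          # assumed network size
dimensions = 1_000          # assumed number of social dimensions
nonzeros_per_actor = 10     # assumed sparsity level

dense = dense_bytes(actors, dimensions)                 # 8,000,000,000 bytes
sparse = sparse_bytes(actors * nonzeros_per_actor)      # 120,000,000 bytes
```

Under these assumptions the dense representation needs about 8 GB, while the sparse one needs about 120 MB, which is small enough to hold in memory on a modest machine.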


An incomparable advantage of our model is that it easily scales to handle networks with millions
of actors while the earlier models fail. This scalable approach offers a viable solution to
learning of online collective behavior on a large scale.


Social dimension extraction:

The latent social dimensions are extracted based on network topology to capture the potential
affiliations of actors. These extracted social dimensions represent how each actor is involved in
diverse affiliations. These social dimensions can be treated as features of actors for subsequent
discriminative learning. Since a network is converted into features, typical classifiers such as
support vector machine and logistic regression can be employed. Social dimensions extracted
according to soft clustering, such as modularity maximization and probabilistic methods, are
dense.
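
To illustrate how social dimensions can serve as features for a standard classifier, here is a minimal logistic regression sketch trained by stochastic gradient ascent on the log-likelihood. It is an illustration under assumptions, not the project's classifier: each actor is represented by the set of dimension indices it belongs to, and the names `train_logistic` and `predict`, the learning rate, the epoch count, and the four training actors are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, labels, num_dims, lr=0.5, epochs=200):
    """features: one set of active dimension indices per labeled actor.
    labels: 1 if the actor shows the targeted behavior, else 0."""
    weights = [0.0] * num_dims
    bias = 0.0
    for _ in range(epochs):
        for active, y in zip(features, labels):
            p = sigmoid(bias + sum(weights[d] for d in active))
            err = y - p  # gradient of the log-likelihood for this actor
            bias += lr * err
            for d in active:  # only the active dimensions are updated
                weights[d] += lr * err
    return weights, bias

def predict(active, weights, bias):
    return 1 if sigmoid(bias + sum(weights[d] for d in active)) >= 0.5 else 0

# Hypothetical labeled actors: dimension 0 goes with the behavior, dimension 1 does not.
train_features = [{0}, {0}, {1}, {1}]
train_labels = [1, 1, 0, 0]
weights, bias = train_logistic(train_features, train_labels, num_dims=2)
```

After training, dimension 0 carries a positive weight and dimension 1 a negative one, so an unlabeled actor who belongs only to dimension 0 is predicted to show the behavior. Because each update touches only an actor's active dimensions, sparse social dimensions keep training cheap.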

Discriminative learning:

The discriminative learning procedure will determine which social dimension correlates with the
targeted behavior and then assign proper weights. A key observation is that actors of the same
affiliation tend to connect with each other. For instance, it is reasonable to expect people of
the same department to interact with each other more frequently. Hence, to infer actors’ latent
affiliations, we need to find a group of people who interact with each other more frequently
than at random.
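
One common way to quantify "more frequently than at random" is modularity: the fraction of edges that fall inside groups minus the fraction expected if edges were placed at random while keeping each node's degree. The sketch below computes it for a given grouping. It assumes a simple undirected graph without self-loops, and the function name `modularity` and the toy graph are hypothetical.

```python
def modularity(edges, community):
    """edges: list of (u, v) pairs; community: dict mapping node -> group id."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Observed fraction of edges with both ends in the same group.
    inside = sum(1 for u, v in edges if community[u] == community[v]) / m
    # Expected fraction under random placement that preserves node degrees.
    group_degree = {}
    for node, d in degree.items():
        c = community[node]
        group_degree[c] = group_degree.get(c, 0) + d
    expected = sum((d / (2 * m)) ** 2 for d in group_degree.values())
    return inside - expected

# Toy graph: two triangles joined by the single edge (2, 3).
toy_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_groups = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_group = {node: 0 for node in range(6)}

q_two = modularity(toy_edges, two_groups)
q_one = modularity(toy_edges, one_group)
```

Splitting the toy graph into its two triangles gives a positive score (6/7 - 1/2), while lumping every node into one group gives zero, since a single group is no denser than chance.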

Chart Generation for Group/Month:

Two previously reported data sets are used to examine our proposed model for collective behavior
learning. The first data set is acquired from user interest, the second from user behavior; we
study whether or not a user visits a group of interest. The module then generates a chart based
on the groups that users visit in each month.

Chart Generation for User/Group:

Two previously reported data sets are used to examine our proposed model for collective behavior
learning. The first data set is acquired from user interest, the second from user behavior; we
study whether or not a user visits a group of interest. The module then generates a chart based
on the groups that each user visits in the month.
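
The counts behind both charts can be sketched as simple aggregations over a visit log. The log format, the sample records, and the function names below are hypothetical and only illustrate the grouping. Rendering the chart itself is left to whatever charting component the application uses.

```python
from collections import Counter

# Hypothetical visit log: (user, group, month). Illustrative data only.
visits = [
    ("alice", "photography", "2013-01"),
    ("bob", "photography", "2013-01"),
    ("alice", "travel", "2013-02"),
    ("alice", "photography", "2013-02"),
]

def visits_per_group_month(log):
    """Counts for the Group/Month chart: visits to each group in each month."""
    return Counter((group, month) for _, group, month in log)

def visits_per_user_group(log):
    """Counts for the User/Group chart: visits by each user to each group."""
    return Counter((user, group) for user, group, _ in log)

group_month = visits_per_group_month(visits)
user_group = visits_per_user_group(visits)
```

Each counter maps a key pair to a visit count, for example two visits to "photography" in "2013-01", and can be passed directly to a bar or line chart as its data series.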

System Requirements:

Hardware Requirements:

Processor        : Intel Dual Core.

Hard Disk        : 60 GB.

Floppy Drive     : 1.44 MB.

Monitor          : LCD Colour.

Mouse            : Optical Mouse.

RAM              : 512 MB.

Software Requirements:

Operating System : Windows XP.

Coding Language  : ASP.Net with C#.

Database         : SQL Server 2005.