Towards A Community of Machine Learners



PG Day Presentation

Zhili Wu


Supervisor: Dr. Chunhung Li

Co-supervisor: Prof. Jiming Liu

1/Oct/2004 ~ 10/Jan/2005

Towards A Community of Machine Learners

Through Learning Online Communities of Practice

Outline:


1. Introduction



Motivation & Objective



Background & Related Topics



Tentative Proposals


2. Experimental Study



BBS Data Study



MATLAB Programming Contest Platform Study


3. Future Work

4. Q & A


Motivation:



1. Social (interaction) networks grow from cyberspace

   Remarkably fast
   With large-scale participation and data
   e.g. email, blogs, instant messaging (IM), and the WWW

2. Machine learning algorithms

   Current trend: highly optimized and tuned
   Perform well in many classification and clustering scenarios
   e.g. kernel machines, ICA, LDA, ……






[Diagram: AI and Human; NN, GA, Agent, ……; web mining, …… any more?]

Motivation:



1. Social (interaction) networks grow from cyberspace

   Are there more underlying dynamics?
   How can they be improved so that querying and communicating become more convenient?

2. Machine learning algorithms

   Can they show collective power rather than mere (over-)fitting?
   Can they help with (1), and benefit from their effort on (1)?








Objective:


Social activities enabled by network (Internet) technology are powerful, massive, learnable, and increasingly virtual.

How can artificial learners, such as machine learners, progress by drawing inspiration from the social setting of online human learning?


More Specific Domain



1. Understand social interactions, with an emphasis on online social interactions, through statistical and machine learning on data collected from online communities (of practice).

2. Add social factors learned from studying community data into machine learners, in the hope that they can form a community of machine learners.



More background:



1. Machine learners

   Instances of one machine learning (ML) algorithm, or of a set of ML algorithms

2. Typical ML algorithms do the following:

   Classification: predict categories
   Clustering: find groups/clusters
   Regression: predict continuous outputs
   Ranking: give an order, recommend the best match
   Feature selection: find the most relevant descriptors
   What other kinds of human learning can they computerize?


More background:



1. Community of Practice (COP) [E Wenger 1991]

   Social learning that occurs when people who have a common interest in some subject or problem collaborate to share ideas, find solutions, and build innovations.

   e.g. an apprenticeship, where an employee learns on the job

2. COP today is more general and virtual, but controversial

   COP going online [Kimble 2001]
   (Old) COP disappears [Patricia 2004]
   Online community vs. online COP
   e.g. orkut? Blog? Wikipedia? Forum?



More background: How is COP studied?


A. Theories of COP derived from WWW studies:

1. "Power-Law Distribution of the World Wide Web" (Huberman, L. A. Adamic), Science, 287: 2115 (2000).

2. "Strong Regularities in World Wide Web Surfing" (Huberman, P. Pirolli, J. Pitkow and R. M. Lukose), Science, (1998).

3. "Evolutionary Dynamics of the World Wide Web" (Huberman, L. A. Adamic), Nature 401, 131 (1999).

4. "Modeling the Internet's large-scale topology", Barabási, PNAS 99, 13382-13386 (2002).

B. Specific COP issues:

1. "Finding Communities in Linear Time: a Physics Approach" (with Fang Wu), Eur. Phys. Journal B38, 331-338 (2004).

2. "Email as Spectroscopy: Automated Discovery of Community Structure within Organizations" (with J. Tyler and D. Wilkinson), in Communities and Technologies (2003).

3. "How To Search a Social Network", Lada A. Adamic and Eytan Adar.

4. "Identifying communities of practice through ontology network analysis", IEEE Intell. Syst. (2003).



More Related Study:




a. Ensemble learning

   A collection of learners whose predictions are combined by weighted averaging or voting
   e.g. bagging, boosting, biting

b. Distributed learning

   Mainly about partitioning data

c. Biologically inspired learning

   Neural networks, etc., which are not socially inspired learning
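As a minimal sketch of item (a), the snippet below combines the labels predicted by several learners through weighted voting. The function name `weighted_vote`, the toy labels, and the weights are illustrative assumptions and are not taken from the study.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine class labels predicted by several learners.

    predictions: list of labels, one per (hypothetical) learner.
    weights: list of non-negative floats, one per learner.
    Returns the label with the largest total weight.
    """
    if len(predictions) != len(weights):
        raise ValueError("one weight per prediction is required")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Ties are broken in favour of the label that appeared first.
    return max(totals, key=totals.get)


# Three toy learners vote on the board a conversation belongs to.
winner = weighted_vote(["Job", "Graduate", "Graduate"], [0.5, 0.3, 0.3])
print(winner)
```

Bagging corresponds to equal weights, while boosting-style schemes assign each learner a weight that depends on its training error; the sketch leaves the choice of weights open.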



Tentative Proposals:



1. Find a specific online COP

2. Study the general properties of online COP

3. Improve machine learning performance on online COP

   Operate on multiple views of data
   Enrich inter-learner communication
   Role distribution of learners

4. Verify the collective power of learner combination

5. ……


Ongoing Studies:


A. Chinese Bulletin Board Systems (BBS) study

1. Serves as the starting scope

2. Study the general properties

3. Improve machine learning performance on them

   Operate on multiple views of data
   Enrich inter-learner communication
   Role distribution of learners

B. A case study of the MATLAB Programming Contest platform

1. Verify the collective power of learner combination

2. Observe the dynamics



Chinese Bulletin Board Systems Study





Focus on university BBS at this stage

Relatively topic-focused

Large data throughput; compatible platforms across many BBS

Accessible thanks to personal experience

Less likely to have been studied by others, with much room for improvement



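Sun Yat-sen University (中山大学) BBS: an example thread/conversation

Job board: http://bbs.zsu.edu.cn/bbstdoc?board=Job

First author's post:

[Forward/Recommend] [Repost] [Delete post] [Edit post] [Reply] [Read same topic]
Sender: redrain (猫咪 Aegean), Board: Job
Title: Guangzhou Daily Group 2005 recruitment…
Posting site: 逸仙时空 Yat-sen Channel (Fri Dec 10 18:26:41 2004), local post
Content:
http://www.dayoo.com/corp/job/
Signature:
--
The cold mist slowly thins and the grass turns a deep green;
the zither's notes drift on, with a cat following along.
Welcome to the Fantasy board, where redrain's latest novel is being serialized……
Source (the user IP / domain): 逸仙时空 Yat-sen Channel bbs.zsu.edu.cn [FROM: bbs.nju.edu.cn]

First replier's post:

[Forward/Recommend] [Repost] [Delete post] [Edit post] [Reply] [Read same topic]
Sender: ssky (`~\/~~\oo/~~\/~`), Board: Job
Title (with "Re:" added): Re: Guangzhou Daily Group 2005 recruitment…
Posting site: 逸仙时空 Yat-sen Channel (Fri Dec 10 18:30:51 2004), relayed post
Content:
Most of them require experience, nothing really suitable, ft
Previous post citation:
[In the post by redrain (猫咪 Aegean):]
: http://www.dayoo.com/corp/job/
Signature:
--
The ornate zither, for no reason, has fifty strings; each string, each fret recalls a year of youth.
Zhuangzi, in his dawn dream, was lost as a butterfly; Emperor Wang entrusted his spring heart to the cuckoo.
On the vast sea under a bright moon, pearls shed tears; at Lantian in the warm sun, jade gives off mist.
This feeling might have become a memory to cherish, but even then it was already lost in a daze.
Source: 逸仙时空 Yat-sen Channel bbs.zsu.edu.cn [FROM: ssky@zsu ]

Source: 逸仙时空 Yat-sen Channel bbs.zsu.edu.cn [FROM: 192.168.48.35]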



Data Collection & Processing:

Each thread (conversation) has many posts, each with:

   Content & Title
   Author & Ref (the author being replied to/referenced)

Each conversation, as a document, is word-segmented -> a document-of-word (frequency) matrix M

All user post-reply relations, accumulated over all conversations -> an author-author matrix R

a) 611 conversations, 7/Dec/2004, 729 authors, JOB board
   M: 611 x 7892; R: 729 x 729

b) 656 conversations, 9/Oct~16/Nov, GRADUATE board
   M: 656 x 5644; R: 536 x 536
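The processing steps above can be sketched as follows. This is a toy illustration under stated assumptions: posts are assumed to be already word-segmented (a real pipeline would need a Chinese word segmenter), and the dictionary keys `author`, `ref`, and `words`, as well as the function name `build_matrices`, are hypothetical.

```python
from collections import Counter


def build_matrices(conversations):
    """Build a document-of-word matrix M and a post-reply matrix R.

    conversations: list of conversations; each is a list of posts, and
    each post is a dict with keys "author", "ref" (the author replied
    to, or None) and "words" (tokens, assumed already segmented).
    """
    posts = [p for conv in conversations for p in conv]
    vocab = sorted({w for p in posts for w in p["words"]})
    authors = sorted({p["author"] for p in posts})
    word_idx = {w: j for j, w in enumerate(vocab)}
    author_idx = {a: i for i, a in enumerate(authors)}

    # M[d][j]: frequency of word j in conversation d.
    M = [[0] * len(vocab) for _ in conversations]
    # R[i][k]: number of times author i replied to author k.
    R = [[0] * len(authors) for _ in authors]

    for d, conv in enumerate(conversations):
        counts = Counter(w for p in conv for w in p["words"])
        for w, c in counts.items():
            M[d][word_idx[w]] = c
        for p in conv:
            # Replies to unknown authors (or no reply at all) are skipped.
            if p["ref"] in author_idx:
                R[author_idx[p["author"]]][author_idx[p["ref"]]] += 1
    return M, R, vocab, authors


toy = [
    [
        {"author": "redrain", "ref": None, "words": ["recruitment", "2005"]},
        {"author": "ssky", "ref": "redrain", "words": ["experience", "recruitment"]},
    ],
    [
        {"author": "ssky", "ref": None, "words": ["exam"]},
    ],
]
M, R, vocab, authors = build_matrices(toy)
print(len(M), "x", len(vocab))   # conversations x distinct words
print(len(R), "x", len(R[0]))    # authors x authors
```

On the real boards the matrices are large and sparse, so a sparse representation would be a more practical choice than nested lists.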


Graduate School Examination Board

High-frequency words

Stop words: high-frequency but unimportant words

Like stop words, there should be stop authors

A modified ranking approach: construct a matrix P based on both the author-author in-degree relation matrix and the matrices related to document content

$$P = \begin{pmatrix} (1) & (2) \\ (3) & (4) \end{pmatrix}$$

(1) author-author post-reply matrix
(2) author-document association matrix
(3) document-author association matrix
(4) document-document cosine similarity matrix

Record the presence of authors in each conversation, aiming to connect part (1) with part (4)

Time information could perhaps be used to describe the conversation-conversation dependence/reference.
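One way to read the four parts listed on this slide is as the blocks of a single square matrix over authors and documents. The sketch below assembles such a matrix from toy inputs. The block layout, the function names, and the use of binary presence values in the association blocks are assumptions; the slides do not specify how the ranking is then computed on P.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assemble_block_matrix(R, A, M):
    """Stack the four parts into P = [[R, A], [A^T, S]].

    R: author-author post-reply counts (n_authors x n_authors).
    A: author-document association (n_authors x n_docs); 1 if the
       author is present in the conversation, else 0.
    M: document-of-word frequency matrix (n_docs x n_words), used only
       to derive S, the document-document cosine similarity.
    """
    n_docs = len(M)
    S = [[cosine(M[i], M[j]) for j in range(n_docs)] for i in range(n_docs)]
    A_t = [list(col) for col in zip(*A)]  # document-author association
    top = [list(R[i]) + list(A[i]) for i in range(len(R))]
    bottom = [A_t[d] + S[d] for d in range(n_docs)]
    return top + bottom


# Toy data: 2 authors, 2 conversations, 4 distinct words.
R = [[0, 0], [1, 0]]
A = [[1, 0], [1, 1]]
M = [[1, 0, 1, 2], [0, 1, 0, 0]]
P = assemble_block_matrix(R, A, M)
for row in P:
    print(row)
```

A ranking could then be derived from P, for example from column sums or from a principal eigenvector; either choice would be an addition beyond what the slide states.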


Top 10 authors among 729, by in-degree and out-degree
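A ranking of this kind can be read directly off a post-reply matrix. The sketch below assumes the convention that entry (i, j) counts replies from author i to author j, so that column sums are in-degrees and row sums are out-degrees; this convention, the function name `top_authors`, and the toy data are illustrative assumptions.

```python
def top_authors(R, authors, k=10):
    """Rank authors by in-degree and out-degree of a post-reply matrix.

    R[i][j] counts replies from author i to author j, so row sums give
    the out-degree (replies written) and column sums give the
    in-degree (replies received).
    """
    out_deg = [sum(row) for row in R]
    in_deg = [sum(col) for col in zip(*R)]
    # sorted() is stable, so ties keep the original author order.
    by_in = sorted(zip(authors, in_deg), key=lambda t: -t[1])[:k]
    by_out = sorted(zip(authors, out_deg), key=lambda t: -t[1])[:k]
    return by_in, by_out


names = ["a", "b", "c"]
replies = [
    [0, 2, 1],
    [0, 0, 3],
    [1, 0, 0],
]
by_in, by_out = top_authors(replies, names, k=2)
print(by_in)
print(by_out)
```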

Typical approach on the document-of-word matrix:

Job + Graduate document-of-word matrix, 1267 x 10693

Setting                             Kernel                  Evaluation   Accuracy
All words                           Linear                  10-fold CV   91.0813%
All words                           Normalized linear       10-fold CV   91.1602%
All words                           Polynomial (degree 2)   10-fold CV   52.0126%
All words                           RBF                     10-fold CV   55.8011%
After removing 1552 stop words      Linear                  10-fold CV   90.6867%
After removing 1552 stop words      Normalized linear       10-fold CV   91.6338%
Document-of-author frequency only   Linear                  10-fold CV   78.5320%

[Diagram labels: document-of-term frequency matrix; document-author association matrix]


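Pad the document-author association matrix onto the document-of-term frequency matrix

All figures are 10-fold CV accuracy.

Setting                          Kernel                  Words only   Padded with author matrix
All words                        Linear                  91.0813%     92.7388%
All words                        Normalized linear       91.1602%     -
All words                        Polynomial (degree 2)   52.0126%     -
All words                        RBF                     55.8011%     -
After removing 1552 stop words   Linear                  90.6867%     -
After removing 1552 stop words   Normalized linear       91.6338%     94.8698%

[Diagram labels: document-author association matrix; document-of-term frequency matrix]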
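The padding step on this slide amounts to placing the two views of each document side by side before computing a kernel. The sketch below shows that concatenation together with a normalized linear kernel, read here as the linear kernel divided by the norms of both vectors (cosine similarity). That reading, the function names, and the toy matrices are assumptions; the classifier and the cross-validation loop are not shown.

```python
import math


def pad_views(M, D):
    """Concatenate, row by row, the term-frequency matrix M with the
    document-author association matrix D (same number of documents)."""
    if len(M) != len(D):
        raise ValueError("both views must describe the same documents")
    return [list(m_row) + list(d_row) for m_row, d_row in zip(M, D)]


def normalized_linear_kernel(X):
    """K[i][j] = <x_i, x_j> / (||x_i|| * ||x_j||); 0.0 for zero vectors."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in X]
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] and norms[j]:
                dot = sum(a * b for a, b in zip(X[i], X[j]))
                K[i][j] = dot / (norms[i] * norms[j])
    return K


# Toy data: 2 documents, 4 words, 2 authors.
M = [[1, 0, 1, 2], [0, 1, 0, 0]]   # document-of-term frequencies
D = [[1, 1], [0, 1]]               # document-author association
X = pad_views(M, D)
K = normalized_linear_kernel(X)
print(K)
```

In the toy data the two documents share no words, so their similarity under the padded view comes entirely from the shared author.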



Observations:

A. Chinese Bulletin Board Systems (BBS) study

1. As the starting scope

   Data are retrievable and manageable

2. Study the general properties

   Zipf's law is observed, but it is not enough; a better ranking scheme is possible

3. Improve machine learning performance on them

   Operate on multiple views of data: combining words with author information improves classification
   Enrich inter-learner communication
   Role distribution of learners


MATLAB Programming Contest case study:

Background:

   A biannual event on an online platform
   Entries are evaluated in real time
   Participants are allowed to:
      build their own solvers
      modify (tweak) others' submissions

Furniture contest:

   1270 passing entries
   One week in duration





[Three figures from http://www.mathworks.com/contest/furniture/analysis.html]

Performance gradually improved; submissions form about six clans

MATLAB Programming Contest case study:

1. The six inferred clans have evolutionary meaning

2. To what extent does an entry declare its reference to a previous entry?

   We build a connection matrix based on all reference information:

   Assume a directional reference implies a bidirectional relation with value 1; otherwise the value is Inf.
   Calculate the shortest distance between each pair of submissions.
   Approximate a similarity matrix by applying a kernel operation to the distance matrix.
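The three steps above can be sketched as follows. The Floyd-Warshall algorithm is used for the shortest distances and a Gaussian kernel for the final step; both are assumptions, since the slides name neither the shortest-path method nor the kernel. The function names and the toy reference list are hypothetical.

```python
import math

INF = math.inf


def shortest_distances(n, references):
    """All-pairs shortest path lengths (Floyd-Warshall).

    references: iterable of (entry, referenced_entry) index pairs. Each
    directed reference becomes an undirected edge of length 1; pairs
    without a reference start at Inf.
    """
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for a, b in references:
        if a != b:
            dist[a][b] = dist[b][a] = 1
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def similarity_from_distance(dist, sigma=1.0):
    """Gaussian kernel exp(-d^2 / (2 sigma^2)); an Inf distance maps to 0."""
    return [[math.exp(-(d * d) / (2 * sigma ** 2)) for d in row] for row in dist]


# Toy data: entry 1 references entry 0, entry 2 references entry 1,
# and entry 3 references nothing.
dist = shortest_distances(4, [(1, 0), (2, 1)])
sim = similarity_from_distance(dist)
print(dist[0][2], sim[0][2], sim[0][3])
```

With 1270 entries the cubic loop is slow in pure Python; because every edge has length 1, a breadth-first search from each entry would give the same distances at lower cost.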









[Figure: the number of functions defined and used in each entry]

Among the 150 functions that appeared, some quickly faded out, some arrived later, and some survived

With frequencies taken into account, some functions are persistently popular, while others decay more quickly
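Counts like these can be obtained by scanning each entry's source for function definitions and tracking when each name first and last appears. The sketch below is one possible way to do that; the regular expression, the function names, and the toy entries are assumptions, and it counts definitions only, not call sites.

```python
import re

# Matches MATLAB definitions such as "function y = f(x)",
# "function [a, b] = g(c)" and "function h".
FUNCTION_DEF = re.compile(
    r"^\s*function\s+(?:(?:\[[^\]]*\]|\w+)\s*=\s*)?(\w+)", re.MULTILINE
)


def functions_defined(source):
    """Names of the functions defined in one entry's source text."""
    return FUNCTION_DEF.findall(source)


def lifespans(entries):
    """Map each function name to (first, last) index of the entries defining it.

    entries: list of source strings ordered by submission time.
    """
    spans = {}
    for t, source in enumerate(entries):
        for name in set(functions_defined(source)):
            first, _ = spans.get(name, (t, t))
            spans[name] = (first, t)
    return spans


entries = [
    "function out = solver(a)\nout = helper(a);\nfunction r = helper(a)\nr = a;\n",
    "function out = solver(a)\nout = tweak(a);\nfunction r = tweak(a)\nr = a + 1;\n",
]
spans = lifespans(entries)
print(spans)
```

A function whose last index is far below the final entry has faded out, while one whose first index is late arrived through a later tweak.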

Observations:

B. A case study of the MATLAB Programming Contest platform

1. Verify the collective power of learner combination

   a. Apparent collective efforts of participants
   b. Entries are composed of multiple sub-functions

2. Observe the dynamics

   a. Evolution patterns of entries: six big shifts
   b. Some functions play active roles, some quickly decay




Summary


1. Introduction



Motivation & Objective



Background & Related Topics



Tentative Proposals


2. Experimental Study



BBS Data Study




MATLAB Programming Contest Platform Study



Future work


1. Formulate and specify the objective better

   a. Is it too vague, too large?
   b. Is it feasible, useful?

2. Build models/frameworks/methodologies

3. Apply to workable scenarios






Q & A


And


Your Comments & Suggestions

