Oct 19, 2013

Distributed, Competition-based Feature Extraction: Application to Personal Information Retrieval

Travis Bauer

Sandia National Laboratories

(Research discussed today was done at Indiana University)

Learning Task Contexts: Calvin


Learn what characterizes a user’s task contexts

Unobtrusive Observing

Keyword Extraction

Index based on Context



Cognitive Systems Relevance

Current Techniques


Application to Cognitive Studies

LSI

Learning term meanings

Eigenfaces

Static Corpora

Comprehensive Statistics

WordSieve

Neural Network-like processing

Stream of data

Local learning

Competitive Learning


Good Discriminator of Context
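
The slides name the ingredients (a stream of data, local learning, competition) but not the update rule itself. As a rough illustration only, the sketch below shows one way a fixed-size layer of word nodes could compete for slots while documents arrive one at a time. It is not the WordSieve algorithm: the class name `CompetitiveLayer` and the parameters `size`, `gain`, and `decay` are hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass, field


@dataclass
class CompetitiveLayer:
    """Fixed-size pool of word nodes that compete for limited slots.

    Hypothetical illustration; parameter names and values are assumptions.
    """
    size: int = 5        # number of nodes available in the layer
    gain: float = 1.0    # excitement added each time a word is seen
    decay: float = 0.5   # multiplicative decay applied after each document
    excitement: dict = field(default_factory=dict)

    def observe(self, document_words):
        """Update the layer from one document, using only local state."""
        for word in document_words:
            if word in self.excitement:
                # The word already owns a node: reinforce it.
                self.excitement[word] += self.gain
            elif len(self.excitement) < self.size:
                # A free node exists: the word claims it.
                self.excitement[word] = self.gain
            else:
                # Competition: the newcomer displaces the weakest node,
                # but only if that node has decayed below the entry level.
                weakest = min(self.excitement, key=self.excitement.get)
                if self.excitement[weakest] < self.gain:
                    del self.excitement[weakest]
                    self.excitement[word] = self.gain
        # Every node decays, so words that stop occurring fade out.
        for word in self.excitement:
            self.excitement[word] *= self.decay

    def top_words(self, k):
        """Return the k most excited words, ties broken alphabetically."""
        ranked = sorted(self.excitement.items(), key=lambda kv: (-kv[1], kv[0]))
        return [word for word, _ in ranked[:k]]
```

Feeding the layer a sequence of documents with `observe` and then calling `top_words` returns the words that have recurred across recent documents. No corpus-wide statistics are kept, which is the contrast with the static-corpus techniques listed above.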

WordSieve Concept

User

Browsing

Attributes

Term

Activation

Priming

WordSieve 1

Words Absent in Document Sequences

Words Occurring in Document Sequences

Words Currently Occurring Frequently

Doc Stream

User Profile

Context Profile

WordSieve 2

Words Reflecting Context

Words Currently Occurring Frequently

Doc Stream

User Profile

Context Profile

Web Browsing Data Set


Sixteen Users


Four Topics, 10 minutes Each


Political Life Al Gore


Political Life George Bush


Traditional Indonesian Cooking


Traditional Thai Cooking

Categorized Document Set

Automatically Generated Queries

Browsing Results

Usenet Data Set

Three sets of 5 newsgroups


alt.atheism

talk.religion.misc

soc.religion.christian

rec.sport.baseball

rec.sport.hockey


comp.os.ms-windows.misc

comp.sys.ibm.pc.hardware

comp.sys.mac.hardware

rec.autos

rec.motorcycles


talk.politics.guns

talk.politics.misc

sci.electronics

sci.med

sci.space

Categorized Document Set

Automatically Generated Queries

Usenet Results

Contributions


It is possible to extract context-differentiating terms from document streams using unsupervised competitive learning.

A "bird's eye view" of the data is not necessary in the described situations, given an ordering of the documents.

Performance is comparable to LSI and better than Log-Entropy and TFIDF.
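
For readers unfamiliar with the two baseline weightings named above, here is a minimal sketch of their textbook forms. The function names and the tiny two-document corpus are illustrative assumptions, and the slides do not say which exact variants were used in the experiments. Note that both schemes need statistics over the whole collection (document frequency, global term frequency).

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(docs)
    # df: number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / doc_freq[t]) for t in tf})
    return weights


def log_entropy_weights(docs):
    """Return one {term: weight} dict per document, using log-entropy weighting.

    local weight  = log(1 + tf)
    global weight = 1 + sum_j p_j * log(p_j) / log(N), where p_j = tf_j / gf
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    # gf: total occurrences of each term across the whole collection.
    global_freq = Counter()
    for tf in counts:
        global_freq.update(tf)
    global_weight = {}
    for term, gf in global_freq.items():
        entropy_sum = 0.0
        for tf in counts:
            if tf[term]:
                p = tf[term] / gf
                entropy_sum += p * math.log(p)
        # With a single document log(N) is 0, so fall back to a weight of 1.
        global_weight[term] = (1.0 + entropy_sum / math.log(n_docs)) if n_docs > 1 else 1.0
    return [{t: math.log(1 + tf[t]) * global_weight[t] for t in tf} for tf in counts]


# Made-up sample documents, for illustration only.
sample_docs = [["gore", "election", "vote"], ["bush", "election", "vote"]]
```

In this sample, a term that occurs equally in every document (such as `election`) receives weight zero under both schemes, while a term confined to one document (such as `gore`) keeps a positive weight.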

Potential Next Steps


WordSieve


Automate Parameter Optimization


Co-occurrence of terms


Other Domains


Multi-dimensional data stream


Machine Vision


Other Issues


LSI

Support


This work was conducted under the advisement of David Leake at Indiana University.

It was sponsored in part by the GAANN fellowship.

The original version of the personal information agent was designed and written with partial support from NASA under award No. NCC 2-1035.