clicking here. - the Babson College Faculty Web Server


Oct 19, 2013

KEYWORD-BASED FILTERING:
Content-based filtering that uses keyword counts from documents as
representations of items.


Advantages


Mature technology


Works as well as more sophisticated content-filtering technologies
in high-quality document domains.


Disadvantages


Only works in document domains


Cannot capture subjective notions of the quality or style of the
documents being filtered.


Applications


Filtering high-quality news wires and document databases.


Web search engines
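
The keyword-count representation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the tokenizer, the use of cosine similarity, the `threshold` value, and the function names are all hypothetical choices, not something the slides specify.

```python
import math
import re
from collections import Counter


def keyword_counts(text):
    """Represent a document as a bag of lowercase keyword counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two keyword-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_documents(profile_text, documents, threshold=0.2):
    """Keep documents whose keyword counts resemble the user's profile.

    The threshold is an arbitrary illustrative value.
    """
    profile = keyword_counts(profile_text)
    return [doc for doc in documents
            if cosine_similarity(profile, keyword_counts(doc)) >= threshold]


# Hypothetical documents and interest profile.
documents = [
    "Stock markets rallied as interest rates fell",
    "The team won the football match",
]
selected = filter_documents("interest rates and stock markets", documents)
```

With these made-up inputs only the first document shares keywords with the profile, so only it passes the filter. Nothing in the counts reflects whether a document is well written, which is the limitation listed under Disadvantages.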





NEURAL NETWORKS:
Highly sophisticated content-based filtering
technology that can use any arbitrary attribute information about items
being filtered.


Advantages


Very powerful technology (can work with many kinds of items
having attribute information).


Given sufficient training examples, can learn almost any concept.


Disadvantages


Require long training.


“Black boxes”: no way to determine what exactly they have
learned.


Not scalable (works only for small samples).


Cannot capture subjective notions.


Applications


Filters any information stream where the items are tagged with
attributes (documents, credit records, etc.) or contain keywords.






ACTIVE COLLABORATIVE FILTERING:
“Manual” collaborative
filtering technique where users explicitly identify other users in the
community whose opinions they are interested in.




Advantages


Works well for small communities where users know each other
and their areas of expertise.


Combines elements of feature-based filtering with opinion-based
filtering.


Disadvantages


Not feasible in large communities of users.


The burden of identifying the appropriate members of the
community and constructing the appropriate query rests on the
user.


Applications


Information and document sharing in small workgroup
environments.






AUTOMATED COLLABORATIVE FILTERING (ACF):
An automated version of “word of mouth,” where the technology uses the
opinions of a large community to filter items for each person.




Advantages


Incorporates subjective notions of quality into the filtering
process.


Very effective for domains where items cannot be easily
analyzed by computer or that are highly subjective.


Disadvantages


No knowledge about the “kinds of items” being filtered: can lead
to incorrect results.


Technology cannot utilize additional information about the items
even when it may be available and relevant.


Applications


Highly subjective domains (music, travel…).


Domains that are not amenable to machine analysis (e.g., video).


Domains where the perceived quality of items fluctuates very
widely (e.g., Web sites).






FEATURE-GUIDED AUTOMATED COLLABORATIVE FILTERING (FGACF):
Technology that utilizes features of items to
partition items and more effectively apply the ACF algorithm.




Advantages


Utilizes available feature information to partition the item space
to apply ACF effectively. Combines strengths of simple content-based
filtering with those of collaborative filtering while
addressing the limitations of standard ACF.


Disadvantages


Feature information used must be relevant to partitioning the
item space.


Applications


“Broad” subjective domains (Web sites, books, restaurants)
where additional feature information is available.


Any domain standard ACF applies to.


Rule-based Technology



Rule 1:

If the visitor is under 40, not married, and has an income
greater than $100,000, show a Mercedes ad.

Rule 2:

If the visitor is under 40, married, and has an income not
greater than $100,000, show a Plymouth ad.


What will happen if the visitor is under 40, not
married, and has an income not greater than $100,000?
Maybe show a VW ad, but the rule must be
explicitly given!
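
The two rules and the uncovered case can be written out directly. The sketch below is only illustrative: the function name `choose_ad` and the choice to return `None` when no rule fires are assumptions, not part of the slides.

```python
def choose_ad(age, married, income):
    """Return the ad selected by the explicit rules, or None if no rule matches."""
    # Rule 1: under 40, not married, income greater than $100,000.
    if age < 40 and not married and income > 100_000:
        return "Mercedes"
    # Rule 2: under 40, married, income not greater than $100,000.
    if age < 40 and married and income <= 100_000:
        return "Plymouth"
    # No rule covers this visitor; a rule-based system does not generalize.
    return None


print(choose_ad(30, False, 150_000))  # Mercedes
print(choose_ad(30, True, 60_000))    # Plymouth
print(choose_ad(30, False, 60_000))   # None
```

The last call is the case raised in the question: until someone adds a third rule (for example, a VW ad), the system has nothing to show.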



There are algorithms, such as ID3, which will
generate a set of business rules based on a list of
example cases. These rules can then be examined
to verify their validity. Neural networks can
perform the same type of classification but are
“black boxes”: the business rules are not explicit.
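
To make the idea of generating rules from example cases concrete, here is a minimal ID3-style sketch. It picks, at each step, the attribute with the highest information gain and returns a nested dictionary that can be read as explicit rules. The example cases, the attribute names (`married`, `high_income`) and the fourth case in particular are hypothetical; they are not data from the slides.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(cases, attribute, target):
    """Reduction in entropy of `target` obtained by splitting on `attribute`."""
    base = entropy([c[target] for c in cases])
    remainder = 0.0
    for value in {c[attribute] for c in cases}:
        subset = [c[target] for c in cases if c[attribute] == value]
        remainder += len(subset) / len(cases) * entropy(subset)
    return base - remainder


def id3(cases, attributes, target):
    """Build a decision tree as nested dicts: {attribute: {value: subtree_or_label}}."""
    labels = [c[target] for c in cases]
    if len(set(labels)) == 1:          # all cases agree: leaf
        return labels[0]
    if not attributes:                 # nothing left to split on: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(cases, a, target))
    rest = [a for a in attributes if a != best]
    return {best: {value: id3([c for c in cases if c[best] == value], rest, target)
                   for value in sorted({c[best] for c in cases})}}


# Hypothetical example cases for visitors under 40.
cases = [
    {"married": False, "high_income": True,  "ad": "Mercedes"},
    {"married": True,  "high_income": False, "ad": "Plymouth"},
    {"married": False, "high_income": False, "ad": "VW"},
    {"married": True,  "high_income": True,  "ad": "Mercedes"},
]
tree = id3(cases, ["married", "high_income"], "ad")
print(tree)
```

For these made-up cases the tree splits on `high_income` first and on `married` second, which reads as three rules that a person can inspect. That inspectability is the contrast with the neural-network "black box" drawn above.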

Collaborative Filtering Algorithm



How do we use rating to make predictions?

How do we predict Ken’s rating for product 6?




We use the correlation coefficients. In this notation, $R_{KL}$
is the correlation between Ken and Lee.
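
The slides take the correlation coefficients as given. One common way to obtain such a value, assumed here rather than taken from the slides, is the Pearson correlation computed over the products both users have rated. The ratings below are hypothetical, since the original ratings table is not available.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the products rated by both users.

    Each argument maps product id -> rating. Returns 0.0 when the
    correlation is undefined (fewer than two shared products, or one
    user gave the same rating to all of them).
    """
    common = sorted(ratings_a.keys() & ratings_b.keys())
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[p] for p in common) / len(common)
    mean_b = sum(ratings_b[p] for p in common) / len(common)
    cov = sum((ratings_a[p] - mean_a) * (ratings_b[p] - mean_b) for p in common)
    var_a = sum((ratings_a[p] - mean_a) ** 2 for p in common)
    var_b = sum((ratings_b[p] - mean_b) ** 2 for p in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


# Hypothetical ratings for two users on products 1 to 3.
user_a = {1: 1, 2: 2, 3: 3}
user_b = {1: 3, 2: 2, 3: 1}
r_ab = pearson(user_a, user_b)
```

A value near +1 means the two users tend to agree, a value near -1 means they tend to disagree, and a value near 0 means one user's ratings say little about the other's.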

But how?





Did the user like or dislike the product?


How close is the user’s rating to his/her average?


E.g., Lee’s average is 3 and Lee gave Product 6 a 2,
so use -1. Write $L_6 - L_{avg} = 2 - 3 = -1$.


Weighted average from Ken’s average:

$$
\begin{aligned}
K_6 &= K_{avg} + (L_6 - L_{avg})R_{KL} + (M_6 - M_{avg})R_{KM} + (N_6 - N_{avg})R_{KN} \\
    &= 3 + (2 - 3)(-0.8) + (5 - 3)(0.33) + (3 - 2.6)(0) \\
    &= 3 + 0.8 + 0.66 \\
    &= 4.46
\end{aligned}
$$
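
The same calculation can be checked in code. The sketch below uses the numbers from the worked example; the function name `predict_rating` and the tuple layout are hypothetical.

```python
def predict_rating(target_avg, neighbours):
    """Predict a rating as the target user's average plus each neighbour's
    deviation from their own average, weighted by the correlation with
    the target user.

    neighbours: list of (rating, neighbour_average, correlation) tuples.
    """
    return target_avg + sum((rating - avg) * corr
                            for rating, avg, corr in neighbours)


# Ken's average is 3; Lee, M and N supply (rating, average, correlation).
ken_6 = predict_rating(3.0, [
    (2, 3.0, -0.8),   # Lee: rated below average, negatively correlated with Ken
    (5, 3.0, 0.33),   # M:   rated above average, positively correlated
    (3, 2.6, 0.0),    # N:   uncorrelated, contributes nothing
])
print(round(ken_6, 2))  # 4.46
```

Lee disliked Product 6, but because Lee and Ken are negatively correlated this pushes Ken's prediction up. As in the formula above, the weighted deviations are simply summed; many ACF formulations additionally divide by the sum of the absolute correlations so that the prediction stays on the rating scale.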

Neural Network Algorithm



A diagram of a single-layer neural network.

$x_i$ is the signal level at input $i$ (attribute $i$).

$w_i$ is the weight associated with input $i$.

$w_i(t)$ is the weight associated with input $i$ at time $t$.

$\theta$ is a threshold level.

$y = 1$ if $\sum_i w_i x_i > \theta$








Neural nets “learn” by adjusting the values of the
weights. Initially the values of the weights are set
to small random values. Training (learning)
involves the readjustment of the input weights to
develop the correct response to the training set.
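
A minimal sketch of this kind of training is shown below. It assumes the standard perceptron update, $w_i(t+1) = w_i(t) + \eta\,(d - y)\,x_i$, where $d$ is the desired output and $\eta$ a learning rate; the slides do not state the exact rule they use. The fixed threshold, the learning rate, the 0/1 encoding of the ads, and the attribute encoding are all hypothetical.

```python
import random


def output(weights, inputs, theta):
    """Threshold unit: 1 if the weighted sum of inputs exceeds theta, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation > theta else 0


def train(examples, n_inputs, theta=0.5, rate=0.1, max_epochs=100, seed=0):
    """Adjust weights until every training example is classified correctly
    (or max_epochs is reached). examples: list of (inputs, desired_output)."""
    rng = random.Random(seed)
    # Start from small random weights.
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
    for _ in range(max_epochs):
        errors = 0
        for inputs, desired in examples:
            error = desired - output(weights, inputs, theta)
            if error:
                errors += 1
                # Move each weight in the direction that reduces the error.
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
        if errors == 0:
            break
    return weights


# Hypothetical encoding: inputs are (under_40, married, high_income);
# desired output 1 stands for a Plymouth ad, 0 for a Mercedes ad.
examples = [
    ((1, 0, 1), 0),
    ((1, 1, 0), 1),
]
weights = train(examples, n_inputs=3)
```

After training, `output(weights, inputs, 0.5)` reproduces the desired answer for both examples, but the learned weights are just numbers: unlike a rule set, they do not state why an ad was chosen.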







Calculate output y:

$$\sum_i w_i x_i = (0.1)(1) + (2.0)(1) + (-1.7)(0) = 2.1$$

and $2.1 > \theta$, so $y(4) = 1$. Therefore the network also recommends
Plymouth correctly and we can stop.


15 Dimensions of Data Quality

(in no particular order of importance)


First 5:


Believability (believable)


Accuracy (data are certified error-free, accurate,
correct, flawless, reliable, errors can be easily
identified, the integrity of the data, precise)


Timeliness (age of data)


Accessibility (accessible, retrievable, speed of
access, available, up-to-date)


Value-added (data give you a competitive edge,
data add value to your operations)




Second 5:


Relevancy (applicable, relevant, interesting, usable)


Objectivity (unbiased, objective)


Concise (well-presented, concise, compactly
represented, well-organized, aesthetically pleasing,
form of presentation, well-formatted, format of the
data)


Appropriate amount of data (the amount of data)


Representational consistency (data are continuously
presented in same format, consistently represented,
consistently formatted, data are compatible with
previous data)




Last 5:


Ease of understanding (easily understood, clear,
readable)


Interpretability (interpretable)


Completeness (breadth, depth, and scope of
information contained in the data)


Reputation (reputation of the data source, reputation
of the data)


Access security (data cannot be accessed by
competitors, data are of a proprietary nature, access
to data can be restricted, secure)