RapStar's Solution to Data Mining Hackathon on Best Buy Mobile Site

desertcockatoo, Data Management

Nov 20, 2013

Kingsfield, Dragon

Beat Benchmark


Naive Bayes


We want to know the probability $P(i \mid c)$ that a user clicks sku $i$ under context $c$.

We use the query $q$ as the context first.


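So we have:

$$P(i \mid c) \propto P(i) \times \prod_{w_k \in q} P(w_k \mid i)$$

where the product runs over the words $w_k$ of the query $q$.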





Select the 5 items with the highest predicted probability as the prediction.
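The scoring and top-5 selection described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the team's actual code: the function names `train` and `predict`, whitespace tokenization, and the add-one (Laplace) smoothing controlled by `alpha` are all choices made here for the example.

```python
import math
from collections import Counter, defaultdict

def train(clicks):
    """Build count tables from (query, sku) click pairs."""
    sku_counts = Counter()               # sku -> number of clicks
    word_counts = defaultdict(Counter)   # sku -> word -> count
    vocab = set()
    for query, sku in clicks:
        sku_counts[sku] += 1
        for word in query.lower().split():
            word_counts[sku][word] += 1
            vocab.add(word)
    return sku_counts, word_counts, vocab

def predict(query, sku_counts, word_counts, vocab, top_n=5, alpha=1.0):
    """Rank skus by log P(i) + sum_k log P(w_k | i) and keep the top_n.

    alpha is an assumed smoothing constant so unseen words do not
    produce a zero probability.
    """
    total = sum(sku_counts.values())
    words = query.lower().split()
    scores = {}
    for sku, n in sku_counts.items():
        denom = sum(word_counts[sku].values()) + alpha * len(vocab)
        score = math.log(n / total)      # prior P(i)
        for word in words:               # likelihood terms P(w_k | i)
            score += math.log((word_counts[sku][word] + alpha) / denom)
        scores[sku] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Working in log space avoids underflow when a query has many words; the ranking is the same as with the raw product.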

Use Time information


Time is a good feature in data mining.



Divide the data into 12 time periods based on the click_time field.

Use the frequency in the time period that click_time belongs to as the “prior” instead of the global frequency.
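A minimal sketch of a per-period prior, assuming the 12 periods are equal-width buckets between a start and end timestamp (the slides do not say how the periods were cut, so this bucketing is an assumption). The names `period_index` and `period_priors` are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime

N_PERIODS = 12  # number of time periods, as stated in the slides

def period_index(t, start, end, n_periods=N_PERIODS):
    """Map a timestamp to one of n_periods equal-width buckets (assumed scheme)."""
    span = (end - start).total_seconds()
    if span <= 0:
        return 0
    frac = (t - start).total_seconds() / span
    # Clamp so that t == end falls into the last bucket.
    return min(max(int(frac * n_periods), 0), n_periods - 1)

def period_priors(clicks, start, end):
    """clicks: iterable of (click_time, sku). Returns period -> {sku: frequency}."""
    counts = defaultdict(Counter)
    for click_time, sku in clicks:
        counts[period_index(click_time, start, end)][sku] += 1
    return {
        p: {sku: n / sum(c.values()) for sku, n in c.items()}
        for p, c in counts.items()
    }
```

At prediction time, the prior for a sku would be looked up in the period that the test click_time falls into, rather than taken from the global click frequency.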


Smooth data


Unigram to Bigram


Likelihood of Naive Bayes:

$$P(q \mid i) = \prod_{w_k \in q} P(w_k \mid i)$$





Here $w_k$ is a word.


Use bigrams instead of unigrams (words).


Example query: “xbox call of duty”

Rerank (sort the words): “call duty of xbox”

Bigrams: [“call duty”, “call of”, “call xbox”, …, “of xbox”]


Once we have the bigram training data, the rest is the same as for unigrams.
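The bigram construction in the example can be sketched as below, assuming that "rerank" means sorting the query words alphabetically and that a bigram is any pair of words from the sorted list (both read off the example, not stated explicitly in the slides). The function name `query_bigrams` is hypothetical.

```python
from itertools import combinations

def query_bigrams(query):
    """Sort the query words, then pair every word with each later word."""
    words = sorted(query.lower().split())
    return [f"{a} {b}" for a, b in combinations(words, 2)]

print(query_bigrams("xbox call of duty"))
# ['call duty', 'call of', 'call xbox', 'duty of', 'duty xbox', 'of xbox']
```

Sorting first makes the bigrams independent of word order, so “call of duty xbox” and “xbox call of duty” produce the same features.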


Blending unigram and bigram:

$$P = w_1 \times P_{\text{unigram}} + w_2 \times P_{\text{bigram}}$$

Here $w_1$ and $w_2$ are blending weights.
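A small sketch of the weighted blend, assuming each model outputs a dictionary mapping sku to score. The default weights of 0.5 are placeholders for illustration only; the slides do not give the values that were used.

```python
def blend(p_unigram, p_bigram, w1=0.5, w2=0.5):
    """Combine two sku -> score dictionaries as w1 * unigram + w2 * bigram."""
    skus = set(p_unigram) | set(p_bigram)
    return {
        sku: w1 * p_unigram.get(sku, 0.0) + w2 * p_bigram.get(sku, 0.0)
        for sku in skus
    }

def top_n(scores, n=5):
    """Return the n skus with the highest blended score."""
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

In practice the weights would be tuned on held-out data; a sku missing from one model simply contributes zero from that model.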


Data Processing


The most important part: Query Correction


Lemmatization


Split words and numbers


Query correction (in the small version)
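As one illustration of the preprocessing steps above, splitting words and numbers might look like the following. The regular expression and the function name `split_words_and_numbers` are assumptions for this sketch; it also lowercases the query and drops punctuation, which the slides do not specify.

```python
import re

def split_words_and_numbers(query):
    """Separate runs of letters from runs of digits, e.g. 'xbox360' -> 'xbox 360'."""
    tokens = re.findall(r"[a-z]+|\d+", query.lower())
    return " ".join(tokens)

print(split_words_and_numbers("Xbox360 Kinect"))  # xbox 360 kinect
```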


A lot of things could help improve this further:


“x box”, “x men”


New algorithm for query correction


Rank predictions that the user has already clicked lower.
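The last idea could be sketched as a simple post-processing step. This is a guess at one possible reading, assuming the user's earlier clicks are available as a set; the function name `demote_clicked` is hypothetical.

```python
def demote_clicked(ranked_skus, already_clicked, top_n=5):
    """Move skus the user has already clicked behind the ones not yet clicked."""
    fresh = [sku for sku in ranked_skus if sku not in already_clicked]
    seen = [sku for sku in ranked_skus if sku in already_clicked]
    return (fresh + seen)[:top_n]

print(demote_clicked(["A", "B", "C"], {"A"}))  # ['B', 'C', 'A']
```

The relative order within each group is preserved, so the model's ranking still decides ties.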


Conclusion


Data preprocessing and feature engineering are the most important things.