RapStar's Solution to Data Mining Hackathon on Best Buy Mobile Site

desertcockatoo (Data Management)

20 Nov 2013

Kingsfield, Dragon

Beat Benchmark


Naive Bayes


We want to know the probability $P(i \mid c)$ that a user clicks SKU $i$ under context $c$.

We use the query as the context first.

So we have:

$$P(i \mid c) \propto P(i) \times \prod_{w_k \in c} P(w_k \mid i)$$





Select the 5 items with the highest predicted probability as the prediction.
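
The scoring and top-5 selection described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `train` and `predict`, the `(query, sku)` input layout, whitespace tokenization, and the add-`alpha` smoothing are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def train(clicks):
    """Count SKU and word frequencies from (query, sku) pairs (assumed layout)."""
    sku_count = Counter()             # how often each SKU was clicked
    word_count = defaultdict(Counter) # word frequencies per SKU
    vocab = set()
    for query, sku in clicks:
        sku_count[sku] += 1
        for w in query.lower().split():
            word_count[sku][w] += 1
            vocab.add(w)
    return sku_count, word_count, vocab

def predict(query, model, k=5, alpha=1.0):
    """Return the k SKUs with the highest log P(i) + sum_k log P(w_k | i)."""
    sku_count, word_count, vocab = model
    total = sum(sku_count.values())
    words = query.lower().split()
    scores = {}
    for sku, n in sku_count.items():
        # Additive smoothing (an assumption); the +1 reserves mass for unseen words.
        denom = sum(word_count[sku].values()) + alpha * (len(vocab) + 1)
        score = math.log(n / total)   # log prior P(i)
        for w in words:
            score += math.log((word_count[sku][w] + alpha) / denom)
        scores[sku] = score
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Working in log space avoids underflow when a query has many words; the ranking is the same as for the product form.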

Use Time information


Time is a good feature in data mining.



Divide the data into 12 time periods based on the click_time field.

Use the frequency within the time period that click_time belongs to as the “prior”, instead of the global frequency.


Smooth data
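
A time-period prior with smoothing might look like the sketch below. The slides do not say how the 12 periods are cut or which smoothing is used, so equal-width buckets between `t_min` and `t_max`, numeric timestamps, and add-`alpha` smoothing are assumptions; `period_of` and `period_priors` are hypothetical names.

```python
from collections import Counter, defaultdict

N_PERIODS = 12  # number of time periods stated in the slides

def period_of(t, t_min, t_max, n_periods=N_PERIODS):
    """Map a numeric timestamp to a bucket index in 0..n_periods-1."""
    if t_max <= t_min:
        return 0
    idx = int((t - t_min) / (t_max - t_min) * n_periods)
    return min(max(idx, 0), n_periods - 1)  # clamp the edges

def period_priors(clicks, t_min, t_max, alpha=1.0):
    """Build prior(sku, t) from (click_time, sku) pairs."""
    counts = defaultdict(Counter)  # per-period SKU click counts
    skus = set()
    for t, sku in clicks:
        counts[period_of(t, t_min, t_max)][sku] += 1
        skus.add(sku)
    n_skus = max(len(skus), 1)

    def prior(sku, t):
        c = counts[period_of(t, t_min, t_max)]
        # Smoothed frequency of the SKU within the period of t.
        return (c[sku] + alpha) / (sum(c.values()) + alpha * n_skus)

    return prior
```

The returned `prior(sku, t)` would replace the global frequency $P(i)$ in the Naive Bayes score; smoothing keeps SKUs that were never clicked in a given period from getting a zero prior.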


Unigram to Bigram


Likelihood of Naive Bayes:

$$P(c \mid i) = \prod_{w_k \in c} P(w_k \mid i)$$





Here $w_k$ is a word.


Use bigrams instead of unigrams (single words).


Use the query “xbox call of duty”

Rerank: “call duty of xbox”

Bigram: [“call duty”, “call of”, “call xbox”, …, “of xbox”]
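
The example above is consistent with sorting the query words alphabetically and then taking every pair of words in sorted order. Under that assumption (the slide elides part of the list, so this reading is inferred from the entries shown), the bigram features can be sketched as follows; `query_bigrams` is a hypothetical name.

```python
from itertools import combinations

def query_bigrams(query):
    """Sort the query words, then emit every ordered pair as a 'bigram'."""
    words = sorted(query.lower().split())  # "rerank" step, assumed alphabetical
    return [" ".join(pair) for pair in combinations(words, 2)]
```

Because the words are sorted first, "xbox call of duty" and "call of duty xbox" produce the same features, which makes the model insensitive to word order in the query.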


Once we have the bigram training data, the rest is the same as for unigrams.


Blending unigram and bigram:

$$P = w_1 \times P_{\mathrm{unigram}} + w_2 \times P_{\mathrm{bigram}}$$
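
A weighted blend of the two models could be sketched like this. The slides do not give the weights, so the defaults `w1=0.5` and `w2=0.5` are placeholders, and the dictionary inputs (SKU to score) and the name `blend` are assumptions.

```python
def blend(p_unigram, p_bigram, w1=0.5, w2=0.5, k=5):
    """Combine per-SKU scores from both models and return the top k SKUs."""
    skus = set(p_unigram) | set(p_bigram)
    blended = {
        s: w1 * p_unigram.get(s, 0.0) + w2 * p_bigram.get(s, 0.0)
        for s in skus
    }
    # Sort by descending blended score; ties are broken by SKU for determinism.
    return sorted(blended, key=lambda s: (-blended[s], s))[:k]
```

In practice the weights would be tuned on held-out data; a SKU missing from one model simply contributes zero from that model.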


Data Processing


The most important part: Query Correction


Lemmatization


Split words and numbers


Query correction (in the small version)
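
The slides do not describe how the correction works, so the following is only one plausible sketch: it splits letters from digits with a regular expression and then snaps unknown tokens to the closest known word using the standard library's `difflib.get_close_matches`. The vocabulary, the `cutoff` value, and the function names are assumptions, and lemmatization is left out.

```python
import re
from difflib import get_close_matches

def split_words_and_numbers(query):
    """Lowercase and separate letter runs from digit runs: 'xbox360' -> ['xbox', '360']."""
    return re.findall(r"[a-z]+|[0-9]+", query.lower())

def correct_query(query, vocabulary, cutoff=0.8):
    """Replace each unknown word with its closest vocabulary entry, if any."""
    known = set(vocabulary)
    candidates = sorted(known)  # stable order for reproducible matches
    corrected = []
    for token in split_words_and_numbers(query):
        if token in known or token.isdigit():
            corrected.append(token)
            continue
        match = get_close_matches(token, candidates, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else token)
    return " ".join(corrected)
```

For example, with a vocabulary built from product names, `correct_query("Xbox360 cal of dutty", ["xbox", "call", "of", "duty"])` maps the misspelled tokens onto known words. Note that the regular expression drops punctuation and any non-ASCII characters.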


Many things can help to improve the result:


“x box”, “x men”


New algorithm for query correction


Rank predictions that the user has already clicked lower.
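
One possible reading of the last idea is a post-processing step that moves SKUs the user has already clicked behind the ones not yet clicked. The sketch below assumes that reading; `demote_clicked` and its arguments are hypothetical.

```python
def demote_clicked(ranked_skus, already_clicked, k=5):
    """Keep the model's order, but place already-clicked SKUs after the rest."""
    fresh = [s for s in ranked_skus if s not in already_clicked]
    seen = [s for s in ranked_skus if s in already_clicked]
    return (fresh + seen)[:k]
```

Passing a longer candidate list than `k` lets lower-ranked, not-yet-clicked SKUs move into the final top 5.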


Conclusion


Data preprocessing and feature engineering are the most important things.