Challenges in Commerce Search


18 Nov 2013




Hugh E. Williams

Vice President, Experience, Search, and Platforms

@hughewilliams, hugh@hughwilliams.com


eBay Today

50+ petabytes of data in our Hadoop and Teradata clusters

2+ billion page views each day

75+ billion database calls each day

250 million queries per day









The opportunity ahead is huge

Commerce: $10 trillion

Online Commerce: $1 trillion

Source: Economist Intelligence Unit, Morgan Stanley

Note: Market sizes as of 2012, Compounded Annual Growth Rates from 2012 to 2015


Today’s Search

Turnaround contributor

Series of improvements

Ten-year-old technology

[Chart: Conversion up 13% from 2010 to 2012. Contributors: Better Search, Simple Flows, Better Images, Merch’ing, Other]

Improving Search from 2009 to 2012


User experience changes


Imagery


Reorganization


Optimization


Major page refresh


Speed


Search science


Query understanding and rewriting


Understanding user intent


Behavioral measurement


Substantial ranking improvements (particularly to Fixed Price ranking)


And all on a 10+ year old platform named Voyager

Query Understanding and Rewriting


Our search engine was literal

We’re on a journey to make it more intuitive

Idea: Mine our query-session data, look for patterns, and use these to map words in user queries to synonyms and structured data

[Diagram: User Query → Query Rewrite → Search Query → Search → eBay Results]

PATTERNS: QUERY REWRITES …


How do buyers purchase the pilzlampe?

It turns out, they do one of a few things:

Type pilzlampe, and purchase

Type pilzlampe, … , pilz lampe, and purchase

Type pilzlampe, … , pilzlampen, and purchase

Type pilz lampen, … , pilzlampe, and purchase




How do buyers purchase the pilzlampe?

From our data mining:

We automatically discover that pilz lampe and pilzlampe are the same

We also discover that pilz and pilze are the same, and lampe and lampen are the same

From these patterns, we rewrite the user’s query pilzlampe as:

pilzlampe OR “pilz lampe” OR “pilz lampen” OR pilzlampen OR “pilze lampe” OR pilzelampe OR “pilze lampen” OR pilzelampen
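
The expansion above can be reproduced mechanically once the equivalences are known. The sketch below is illustrative only: it assumes the compound has already been split into its parts (pilz, lampe), and the function name `rewrite_query` and the `synonyms` mapping are hypothetical. It combines every variant of each part, in both the joined and the quoted-phrase form.

```python
from itertools import product


def rewrite_query(parts, synonyms):
    """Expand a compound query into an OR of its joined and phrase variants.

    `parts` are the components of the compound (assumed to be known already);
    `synonyms` maps a part to the variants mined as equivalent to it.
    """
    # Each part may appear as itself or as any of its mined variants.
    options = [[p] + [s for s in synonyms.get(p, []) if s != p] for p in parts]
    terms = []
    for combo in product(*options):
        terms.append("".join(combo))                # joined form, e.g. pilzlampe
        terms.append('"' + " ".join(combo) + '"')   # phrase form, e.g. "pilz lampe"
    return " OR ".join(terms)


rewritten = rewrite_query(["pilz", "lampe"], {"pilz": ["pilze"], "lampe": ["lampen"]})
```

With two parts and two variants each, this yields the same eight alternatives as the slide, though in a different order.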


Are Query Rewrites easy?



Nothing is easy at scale



Incorrect strong signals:



CMU is not Central Michigan University

Mariners is not the same as Marines

Context matters

Correcting Seattle Marines to Seattle Mariners is (generally) right

Denver Nuggets is not Denver in the Jewelry & Watches category
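
One way to picture "context matters" is a rewrite rule that only fires when another token in the query supports it. This is a minimal sketch under assumed names (`apply_context_rewrites`, and a rule format with `requires` and `replacement` keys); it is not a description of how eBay implements rewrites.

```python
def apply_context_rewrites(query, rules):
    """Rewrite a token only when its required context token is also in the query.

    `rules` maps a token to a list of {"requires": ..., "replacement": ...}
    dicts. The rule format is hypothetical.
    """
    tokens = query.lower().split()
    rewritten = []
    for tok in tokens:
        replacement = tok
        for rule in rules.get(tok, []):
            if rule["requires"] in tokens:  # context check
                replacement = rule["replacement"]
                break
        rewritten.append(replacement)
    return " ".join(rewritten)


# Hypothetical rule: "marines" becomes "mariners" only next to "seattle".
example_rules = {"marines": [{"requires": "seattle", "replacement": "mariners"}]}
with_context = apply_context_rewrites("Seattle Marines", example_rules)
without_context = apply_context_rewrites("Marines uniform", example_rules)
```

The same shape could take a category as context instead of a token, which is the situation the Denver Nuggets example points at.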


An even bigger opportunity

Next Gen Search

Cassini: Reengineering eBay Search

Top-to-Bottom View

How hard is it to ship a new search engine?


Voyager is used for much more than the obvious. It’s multi-tenant:

“Default Search” search (already migrated to Cassini in the US)

Completed, null and low (already migrated to Cassini worldwide)


Description search


Deterministic sorts


Query rewrite


Merchandizing


The Feed


Selling (for example, allowing sellers to create listings from similar items)


Category browsing


Motors and other verticals


Many fast “item lookup” scenarios for other teams


Many scenarios we don’t even know about…




What else is hard about eBay search?


eBay has over 400 million items listed in multiple languages


Our collection of items changes fast


You can find just about anything on eBay. We have to optimize for every type of item


Not everybody follows the same listing practices, or uses the same keywords or units


Examples include:


Units of measure: centimeter versus cm, gigabytes versus gb

Colors: Blue versus Aqua, Rojo is the same as Red

Synonyms: laptop and notebook, mobile phone and cell phone

Abbreviations: SGA means Stadium Giveaway

Spelling errors

Our goal is to help both buyers and sellers find items even when they use different ways of expressing the same things
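
As a rough illustration of matching different ways of expressing the same thing, the sketch below maps variant words and two-word phrases to one canonical form before comparison. The table contents come from the examples above, but which side counts as canonical, the table name `CANONICAL`, and the function name `normalize_listing_text` are assumptions made for this example.

```python
# Variant -> canonical form. The direction of each mapping is an assumption.
CANONICAL = {
    "centimeter": "cm",
    "gigabytes": "gb",
    "rojo": "red",
    "notebook": "laptop",
    "mobile phone": "cell phone",
    "sga": "stadium giveaway",
}


def normalize_listing_text(text, canonical=CANONICAL, max_len=2):
    """Replace known variants with canonical forms, longest phrase first."""
    tokens = text.lower().split()
    out, i = [], 0
    while i < len(tokens):
        for n in range(max_len, 0, -1):
            if i + n > len(tokens):
                continue  # not enough tokens left for a phrase of this length
            phrase = " ".join(tokens[i:i + n])
            if phrase in canonical:
                out.append(canonical[phrase])
                i += n
                break
        else:
            out.append(tokens[i])  # no variant matched; keep the token as typed
            i += 1
    return " ".join(out)
```

Applying the same normalization to both the buyer's query and the seller's listing text would let "Rojo notebook" and "red laptop" meet in the middle. Spelling errors and near-synonyms such as Blue versus Aqua need more than a lookup table and are not covered here.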



Technology Deep dive: Infrastructure



What’s hard at eBay?

Multi-tenant system

Document additions and deletions

Document modifications

Index updates

Result caching

Data center automation




Technology Deep dive: Ranking



What’s hard at eBay?


Mix of items: good ’til canceled multi quantity vs. single quantity


Gaps in catalog data


A very different problem: different ranking signals to Web search


The deterministic sort:



Recall versus precision

Consistency with best match


Spam


Result blending

But What Comes Next?

[Figure: multiscreen users. Recoverable labels, in order: 21%, of eBay, multiscreen, users, of GMV share]

Q&A?