Modern search box with suggestions for most popular CMSs and all other (non-CMS-based) sites. SaaS.

4 Dec 2013

Modern web technologies become more complex day by day. Site owners outsource more and more features to web SaaS providers:

www.disqus.com: Communities
http://getsatisfaction.com: Forum
http://www.janrain.com: Social login
http://www.liveperson.com: Chat
http://mailchimp.com/: Email newsletters

Typical site

What other common site features cause difficulties for site owners? Which of them can be outsourced and used as SaaS?


We interviewed a lot of friends and met people running online communities, ecommerce sites, and blogs.

Their answers vary significantly, from recommendation systems to spam control, but the most common answer is site search.

Site owners want to improve the site search experience!

What's next to move to SaaS?

Facebook: fast and relevant suggestions while typing (apps, pages, people with photos).

OSCommerce-based site: no suggestions, poor results… You probably don't use this kind of old search box anymore.

What needs to be improved in the search text input box?

Compare Facebook and any OS Commerce site.

Now imagine: you can add a modern search box with suggestions to your site simply with a line of JavaScript.

The main idea

A modern search box searches across different fields while the user is typing.

Available fields: categories, product titles, articles, etc. The results are automatically split by field.

The user can:

- get more results from any field,
- or click on a found product to go directly to the corresponding site page.

The goal is to avoid leading the user to the standard CMS search results page and to provide highly relevant results while typing.

This will dramatically improve search box performance, user experience, and ultimately sales and other online business metrics.
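The search-as-you-type idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the sample records, the field names, and the `suggest` function are hypothetical, and a plain in-memory word-prefix match stands in for a real full-text index.

```python
from collections import defaultdict

# Hypothetical sample records; the field names follow the slide
# (categories, product titles, articles).
DOCUMENTS = [
    {"field": "category", "text": "RC planes", "url": "/rc-planes"},
    {"field": "product", "text": "F16 RC plane", "url": "/products/f16"},
    {"field": "product", "text": "Plane glue", "url": "/products/glue"},
    {"field": "article", "text": "How to fly a plane", "url": "/articles/fly"},
    {"field": "product", "text": "Toy car", "url": "/products/car"},
]


def suggest(query, documents, per_field=2):
    """Return matches for a partially typed query, grouped by field."""
    needle = query.strip().lower()
    if not needle:
        return {}
    grouped = defaultdict(list)
    for doc in documents:
        # Word-prefix match: some word in the text starts with the typed text.
        words = doc["text"].lower().split()
        if any(word.startswith(needle) for word in words):
            # Keep only a few suggestions per field; the rest would be
            # available through a "more results" link.
            if len(grouped[doc["field"]]) < per_field:
                grouped[doc["field"]].append(doc)
    return dict(grouped)


results = suggest("pla", DOCUMENTS)
for field, docs in results.items():
    print(field, [doc["url"] for doc in docs])
```

Each suggestion carries its own URL, so a click can go straight to the matching page instead of a search results page.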

The main idea: details

Let's take a look at forum engines: it is difficult to find relevant results.

Imagine you can implement a modern full-text search box with suggestions that searches by

- Topic title
- Topic text
- Comments

and provides results sorted by comment count. No additional software needs to be installed.

A lot of sites need this

Even simple full text blog search can be done better with the modern search box

A lot of sites need this

What’s inside the technology

From the site owner's point of view:

The site owner enters the site URL into the index using the administrator interface of the SaaS search box:

www.my-planes-shop.com

The site owner gets a line of JS code that needs to be included in the myshop.com page with the search box:

<script>var searchElementID = "srch_box"….</script>

www.my-planes-shop.com

After we get the site URL, we download and parse its main page.

This is OpenCart

Almost all CMSs have fingerprints, such as class names, IDs, and script names, that identify them unambiguously (OpenCart, Joomla, WordPress, etc.).
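A minimal sketch of fingerprint-based CMS detection could look like the following. The marker strings are illustrative assumptions based on paths these systems commonly emit, not a verified or complete list, and `detect_cms` is a hypothetical helper name.

```python
# Illustrative fingerprints only (assumed, lowercase); a production list
# would be longer and checked against each CMS version.
CMS_FINGERPRINTS = {
    "OpenCart": ["catalog/view/theme", "index.php?route="],
    "Joomla": ['content="joomla', "/media/system/js/"],
    "WordPress": ["wp-content/", "wp-includes/"],
}


def detect_cms(html):
    """Return the CMS with the most matching fingerprints, or None."""
    page = html.lower()
    best_name, best_hits = None, 0
    for name, markers in CMS_FINGERPRINTS.items():
        hits = sum(1 for marker in markers if marker in page)
        if hits > best_hits:
            best_name, best_hits = name, hits
    return best_name


sample = '<link href="catalog/view/theme/default/stylesheet/stylesheet.css" rel="stylesheet">'
detected = detect_cms(sample)
print(detected)
```

Counting matches rather than stopping at the first one makes the guess a little more robust when a page happens to contain a single marker from another system.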

Fields detected:

- Product name
- Product description
- Product image

After we have detected the CMS, we detect the fields available on the pages.

For example, we know that in OpenCart the product title is marked with the first H1 tag, and the product image is marked with <img id="product_img">.

Even if we can't do it automatically, the site owner can "teach" our system how to parse fields, or add special marks to the HTML code, like HTML5 Microdata, semantic markup, etc.
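The OpenCart rule described above (first H1 is the title, `<img id="product_img">` is the image) can be sketched with the standard library HTML parser. The class and function names are hypothetical, and a real page would need more defensive handling.

```python
from html.parser import HTMLParser


class ProductFieldParser(HTMLParser):
    """Collects the text of the first <h1> and the src of <img id="product_img">."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.image = None
        self._in_first_h1 = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "h1" and self.title is None and not self._in_first_h1:
            self._in_first_h1 = True
        elif tag == "img" and attributes.get("id") == "product_img":
            self.image = attributes.get("src")

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_first_h1:
            self._in_first_h1 = False
            self.title = "".join(self._title_parts).strip()

    def handle_data(self, data):
        # Text is collected only while inside the first <h1>.
        if self._in_first_h1:
            self._title_parts.append(data)


def extract_fields(html):
    parser = ProductFieldParser()
    parser.feed(html)
    return {"title": parser.title, "image": parser.image}


page = '<h1>F16 RC plane</h1><img id="product_img" src="/img/f16.jpg"><h1>Other</h1>'
fields = extract_fields(page)
print(fields)
```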

1. Get the main page
2. Detect CMS
3. Detect fields (title, comments, text, image..)

Behind the scenes:

We crawl all the pages of the site, extract the fields, and save them.

10963 pages contain:

- Product name
- Product description
- Product image

We produce a full-text index and rank the results.

Word "plane" found:

- 913 times in Product name
- 2301 times in Product desc.

The customer's search box is ready and provides results.

Found "plane" in products

4. Crawl the site and extract the fields
5. Make index
6. Ready to serve requests
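The indexing step can be illustrated with a tiny per-field inverted index. This is a sketch, not the Lucene index the project uses: the sample pages, the field names `name` and `description`, and the weight given to name matches are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to {field: Counter({url: occurrences})}."""
    index = defaultdict(lambda: defaultdict(Counter))
    for page in pages:
        for field in ("name", "description"):
            for word in tokenize(page.get(field, "")):
                index[word][field][page["url"]] += 1
    return index


def field_totals(index, word):
    """Total occurrences of a word in each field."""
    return {field: sum(counts.values())
            for field, counts in index.get(word, {}).items()}


def search(index, word, name_weight=3):
    """Rank URLs by occurrences, counting product-name hits more heavily."""
    scores = Counter()
    for field, counts in index.get(word, {}).items():
        weight = name_weight if field == "name" else 1
        for url, occurrences in counts.items():
            scores[url] += weight * occurrences
    return [url for url, _ in scores.most_common()]


# Hypothetical crawled pages with already extracted fields.
pages = [
    {"url": "/p/f16", "name": "F16 RC plane", "description": "A fast plane. This plane flies."},
    {"url": "/p/glue", "name": "Glue", "description": "Glue for plane models"},
    {"url": "/p/car", "name": "Toy car", "description": "A small car"},
]
index = build_index(pages)
print(field_totals(index, "plane"))
print(search(index, "plane"))
```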

Behind the scenes:

The main technical trick is to extract fields from CMS (and other) web pages.

It can be done by parsing HTML for marks like DOM element classes and IDs.

<div class="article">In mathematics, a plane is a flat…</div>

If you can't find any marks, you parse by HTML element number in the DOM: "to get the article text, find <div> №5 in the <body> section."

This works nicely, but if the CMS changes its HTML markup, everything breaks and needs to be fixed.

But HTML5 brings us semantic markup and Microdata, and Google claims they will use it in their search algorithms, so in the future the situation for extracting structured data from web pages will get better and better.
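The two extraction strategies described above, matching by class first and falling back to the element's position, might be sketched as follows. `DivTextParser` and `extract_article` are hypothetical names, and the default class `article` and position 5 simply mirror the examples on the slide.

```python
from html.parser import HTMLParser


class DivTextParser(HTMLParser):
    """Captures the text of the first <div> with a given class,
    or of the N-th <div> in document order."""

    def __init__(self, css_class=None, position=None):
        super().__init__()
        self.css_class = css_class
        self.position = position
        self.text = None
        self._div_count = 0
        self._depth = 0  # nesting depth inside the captured div; 0 = not capturing
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        self._div_count += 1
        if self._depth:
            self._depth += 1  # nested div inside the captured one
            return
        if self.text is not None:
            return  # already captured a div
        classes = (dict(attrs).get("class") or "").split()
        by_class = self.css_class is not None and self.css_class in classes
        by_position = self.position is not None and self._div_count == self.position
        if by_class or by_position:
            self._depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.text = " ".join("".join(self._parts).split())

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)


def extract_article(html, css_class="article", fallback_position=5):
    """Try the class mark first, then fall back to the positional rule."""
    for kwargs in ({"css_class": css_class}, {"position": fallback_position}):
        parser = DivTextParser(**kwargs)
        parser.feed(html)
        if parser.text is not None:
            return parser.text
    return None


marked = '<body><div class="menu">Menu</div><div class="article">Article about planes</div></body>'
print(extract_article(marked))
```

The positional fallback is exactly the fragile part the slide warns about: any template change that adds or removes a `<div>` shifts the count.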

Main trick

Indicate site URL pattern to index
For example, you want to search only the pages under www.mysite.com/products.

Indicate fields that will be presented in search results
Search only in product title and description. Results in titles will be presented first.

Manually customized ranking
Under search term "Plane" provide "F16 RC Plane sale 50% off" page.

Rank by parameters
Forum topics with the higher comments count ranked higher

More features of the SaaS search box
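How the first three options could combine is sketched below: a URL pattern limits what is searched, the order of fields decides which matches come first, and a pinned page overrides the ranking for a chosen term. The `SETTINGS` structure, the sample pages, and the glob-style pattern syntax are assumptions for illustration only.

```python
from fnmatch import fnmatchcase

# Hypothetical per-site settings mirroring the options on the slide.
SETTINGS = {
    "url_pattern": "www.mysite.com/products/*",
    "search_fields": ["title", "description"],  # earlier fields rank first
    "pinned": {"plane": "www.mysite.com/products/f16-sale"},
}

# Hypothetical indexed pages.
PAGES = [
    {"url": "www.mysite.com/products/glue", "title": "Glue", "description": "Glue for plane models"},
    {"url": "www.mysite.com/products/f16", "title": "F16 RC plane", "description": "Fast"},
    {"url": "www.mysite.com/blog/planes", "title": "Plane news", "description": ""},
    {"url": "www.mysite.com/products/f16-sale", "title": "F16 RC Plane sale 50% off", "description": ""},
]


def search(query, pages, settings):
    term = query.lower()
    matches = []
    for page in pages:
        if not fnmatchcase(page["url"], settings["url_pattern"]):
            continue  # outside the configured URL pattern
        for priority, field in enumerate(settings["search_fields"]):
            if term in page.get(field, "").lower():
                matches.append((priority, page["url"]))
                break
    # sorted() is stable, so pages matching in the same field keep input order.
    ranked = [url for _, url in sorted(matches, key=lambda match: match[0])]
    pinned = settings["pinned"].get(term)
    if pinned:
        # Manually customized ranking: the pinned page always comes first.
        ranked = [pinned] + [url for url in ranked if url != pinned]
    return ranked


print(search("Plane", PAGES, SETTINGS))
```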

Construct search box suggestion view
Search results will be presented by product image, product rating and title.

User behavior rank system
Products that attract more clicks while searching are shown higher in search results

Default search results
Most popular products are shown right after user clicks in search box (without any letter entered)

Provide synonyms
Plane = airplane = jet. Search results will be the same.

Reports and statistics
The most searched word is "Plane", top clicked product is "F16 RC plane"

More features of the SaaS search box
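Synonyms, click-based ranking, and default results fit together naturally, as the following sketch suggests. The synonym group comes from the slide; the click counts, the product list, and the function names are made up for the example.

```python
from collections import Counter

# Synonym group from the slide; the click log values are hypothetical.
SYNONYMS = [{"plane", "airplane", "jet"}]
clicks = Counter({"/products/f16": 12, "/products/glider": 30, "/products/car": 7})

PRODUCTS = {
    "/products/f16": "F16 RC plane",
    "/products/glider": "Glider airplane kit",
    "/products/car": "Toy car",
}


def expand(term):
    """Return the term together with all of its configured synonyms."""
    term = term.lower()
    for group in SYNONYMS:
        if term in group:
            return group
    return {term}


def search(term):
    """Match any synonym in the title and order results by click count."""
    if not term.strip():
        # Default results: most clicked products, shown before anything is typed.
        return [url for url, _ in clicks.most_common()]
    words = expand(term.strip())
    hits = [url for url, title in PRODUCTS.items()
            if words & set(title.lower().split())]
    return sorted(hits, key=lambda url: -clicks[url])


print(search("jet"))
print(search(""))
```

Because every synonym expands to the same group, "plane", "airplane" and "jet" return identical results, which is the behavior the slide asks for.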

Google Custom Search

Now: Enterprise Google Site Search

Pluses

+ It's Google
+ Nice spell-check, synonyms
+ Nice search landing page customization
+ Search and rank by fields (attributes)

Minuses

- Very simple suggestions
- $5 per 1000 queries
- If you want to customize, it becomes much more complex and programmer-oriented
- No CMS support by default

http://www.google.com/cse/

Google Commerce Search

Pluses

+ Has most of the functionality described in this presentation

Minuses

- Only for Ecommerce
- Complex integration
- Data provided manually or via API
- Large enterprise-oriented (can be purchased via Google sales force)

http://www.google.com/commercesearch/

Celebros

http://www.celebros.com/

+ Suggestions
+ Complete search solution (not only search box, but also search landing page)

- Only Ecommerce
- Only enterprise
- Hosted SaaS is just an option. Complete software integration is preferable.
- A lot of products for e-Commerce (no focus on search)
- Looks like manual data upload or DB API integration is required

Site search competitors

Sli-systems

http://www.sli-systems.com

Pluses

+ Nice suggestion box

Minuses

- No focus on search (a lot of SEO and other products)
- Integration via salesman
- Enterprise-oriented
- Integration process not described (looks like manual data upload, or DB, API integration is required)

GetWebSearch

http://getwebsitesearch.com

Pluses

+ Nice looking site
+ Easy integration
+ Easy demo on your site
+ Clear pricing model ($59 monthly basic, $499 enterprise)

Minuses

- No search box suggestions at all
- Very poor search options

And some more

Site search competitors

https://www.indexdepot.com/en/
- Looks nice, but its suggestions are too simple

http://www.websolr.com/
http://www.solrhq.com/
- both focus on hosted Solr

http://www.isys-search.com: technology
http://www.searchfit.com: for eCommerce
http://www.cxense.com: technology

Some more

Easy integration
No APIs or DB connectors, no cron jobs to update the index, no Lucene maintenance, etc. Your site alone is enough.

CMS support
We know how to parse your CMS, so there is next to nothing to tune.

Complete focus on search box suggestions for web
Staying focused on our niche, we can implement useful features faster and better than competitors with wide product lines.

Not only e-Commerce
Competitors focus on e-Commerce now. But there is much potential in forums, helpdesks, CMSs, knowledge bases, etc.

Clear SMB-oriented pricing model
Sales through the site with clear and understandable prices. Avoiding enterprise-style sales.

Mobile ready
Search box for mobile versions of the site is ready out of the box.

Why we are better

At the scale of an SMB site (tens of thousands of pages) searching is relatively simple, and Google can't use all the power of its ranking algorithms and scalability.

We use Apache's stack of search technologies:

Apache Lucene: for full-text search, ranking, snippets, highlighting.
Apache Nutch: fast crawler.
Apache Solr: used for suggestions, cache, replication, sharding.
Apache Tika or jSoup: to parse HTML.

As you see, this is a highly technological project that uses a lot of open source, but in fact there are no extremely difficult algorithms or low-level system coding.

Is it extremely complex in development? The answer is NO

Does anybody need "SaaS Search Box" if Solr can be used directly?

Integration
Usually it takes about 2-3 weeks to research, implement, test and deploy Solr technology.

Maintenance
You need to think about constant index updates, incremental updates from the DB, and so on.

Customize for your business
Any customization, like adding new synonyms, brings you to the command line and .conf files.

Business features
Solr doesn't provide you with the reports and tools your business needs, like popular searches, promoting products for these searches, and targeting.

Simple Solr integration can cost you about $3,000 (programmer's time), and maintenance can cost about $300 monthly (server maintenance + index updates management).

Shared web hosting
A lot of SMBs are still on shared web hosting plans; they simply don't have root access to the server to install Solr.

CMSs
A lot of SMBs and enterprises use a CMS and do not develop any custom features.

Search box is a paid solution

But definitely, we can do it for under $5 per 1000 queries (Google's price)

The pricing is completely clear

Pay as users click on the search results

- Up to 100 clicks per day - FREE (for sites under 1000 pages, crawled once every two days)
- 100-1000 clicks per day - $9.95
- 1000-5000 clicks per day - $19.95
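The click-based tiers above amount to a simple lookup, sketched here. The slide does not say how a boundary value (exactly 100 or 1000 clicks) is billed, what the billing period is, or what happens above 5000 clicks per day, so this sketch assumes inclusive upper bounds, returns the bare listed price, and returns `None` beyond the last tier. The page-count condition on the free plan is not modeled.

```python
# Tier boundaries and prices come from the slide; treating each upper bound
# as inclusive is an assumption made for this sketch.
TIERS = [
    (100, 0.00),
    (1000, 9.95),
    (5000, 19.95),
]


def price_for(clicks_per_day):
    """Return the listed plan price for a daily click volume,
    or None when the volume is above the listed tiers."""
    if clicks_per_day < 0:
        raise ValueError("clicks_per_day must be non-negative")
    for upper_bound, price in TIERS:
        if clicks_per_day <= upper_bound:
            return price
    return None


for volume in (50, 500, 3000, 10000):
    print(volume, price_for(volume))
```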



Alternative business model: the simple search box is free, and advanced features are paid. Paid features can be:

- Customize search fields
- Customize results view
- Customize URL patterns to search

Alternative business model 2: the search box is a free solution, and an advertisement block is shown in the suggestion area





Business

Contact CMS-powered site owners

- Via CMS communities
- Directly via email
- Via official CMS site advertisement
- Via official CMS plug-ins and extensions

Provide a "Search Powered By …" link in the search box

Publications and advertisement on webmasters' portals - communities, blogs, etc.

Standard internet marketing - PPC advertisement, etc.


Marketing

Andrey Uglev

I'm a project manager with a strong technical background (programmer). 31 years old.

Having been involved in Internet business for 7 years, I'm always trying to be on the cutting edge of technology, platforms, UI and business trends.

My interest now is in open source big data solutions (mostly the Hadoop stack) and the Apache Lucene stack.

NOW:

Mail.ru Group (MAIL.RUGRP LSE)
Project manager at Social Networks BU (42,000,000 daily unique users total)
http://mail.ru/

IN THE PAST:

Veni Vidi
Own startup, Travelers SN. Sold in 2011.
http://venividi.ru/

Octoline
Startup. SaaS IP PBX.
http://www.octoline.ru/

Drivers Ed
Own startup. Online driver's ed courses (USA). Sold to:
http://www.driverseddirect.com/

2 more engineers are highly interested in joining if the project gets investment.

Team

To try the prototype please visit:

http://188.40.134.20:9999/war/



What's implemented:

- Crawling product pages from OpenCart
- Extracting Product Title, Description and Image
- Saving results to Lucene
- Search box with suggestions by Title and Description. The image is shown in the search box too.

Prototype

Contacts

Thanks for your interest!

Andrey Uglev

Email: andrew.uglev@gmail.com

Skype: cher8080

Tel.: +7 903 9765110