Cloak and Dagger

Nov 18, 2013

In a nutshell…


Cloaking


Cloaking in search engines


Search engines’ response to cloaking


Lifetime of cloaked search results


Cloaked pages in search results




Ubiquity of advertising on the Internet.


Search, by and large, enjoys primacy.


Search Engine Optimisation (SEO)


Doctoring of search results.


For benign ends, such as simplifying page content, optimizing load times, etc.


For malicious purposes, such as manipulating page-ranking algorithms.


Cloaking


Conceals the true nature of a Web site


Keyword Stuffing


Associating benign content with keywords


Attracting traffic to scam pages


Protecting the Web servers from being exposed


Not scamming those who arrive at the site via different keywords.


Types of Cloaking


Repeat Cloaking


User-Agent Cloaking


Referrer Cloaking (sometimes also called “Click-through Cloaking”)


IP Cloaking


DAGGER

Dagger encompasses five different functions:



Collection of
search terms


Querying the search results generated by search engines


Crawling
search results


Detecting cloaking


Repeating the above four processes to study variance in measurements



Collection of Search Terms

Two different kinds of cloaked search terms are targeted:


TYPE 1: Search terms which contain popular words.


Aimed at gathering high volumes of undifferentiated traffic.


TYPE 2: Search terms which reflect highly targeted traffic


Here the cloaked content matches the search terms.



TYPE 1: Use popular trending search terms


Google Hot Searches - search-engine-based data collection methods


Alexa terms - client-based data collection methods


Twitter terms - clue us in on social-networking trends


Cloaked pages are entirely unrelated to the trending search terms.


TYPE 2: a set of terms catering to a specific domain


Content of the cloaked pages actually matches the search terms.
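
A hypothetical sketch of this collection step follows; the three trending-feed helpers are placeholders rather than real APIs (Google Hot Searches, Alexa, and Twitter each need their own scraping or API code), and the pharmaceutical term file is an illustrative TYPE 2 source.

```python
# Placeholder feed helpers; real implementations would poll the actual sources.

def fetch_google_hot_searches() -> list[str]:
    return []   # placeholder: search-engine-based trend feed

def fetch_alexa_terms() -> list[str]:
    return []   # placeholder: client-based (toolbar-derived) terms

def fetch_twitter_trends() -> list[str]:
    return []   # placeholder: social-networking trending topics

def collect_type1_terms() -> set[str]:
    """TYPE 1: popular, constantly changing terms for undifferentiated traffic."""
    terms: set[str] = set()
    for feed in (fetch_google_hot_searches, fetch_alexa_terms, fetch_twitter_trends):
        terms.update(feed())
    return terms

def collect_type2_terms(path: str = "pharma_terms.txt") -> set[str]:
    """TYPE 2: a static, domain-specific list (e.g. pharmaceutical product terms)."""
    with open(path, encoding="utf-8") as fh:
        return {line.strip() for line in fh if line.strip()}
```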

Querying Search Results


Terms collected in the previous step are fed to the search engines

Study the prevalence of cloaking across engines


Examine their response to cloaking.


Top 100 search results and accompanying metadata are compiled into a list.



“Known good” domain entries are eliminated in order to avoid false positives during data processing.


Similar entries are grouped together, with an appropriate ‘count’.
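
The compilation step might look roughly like the sketch below, assuming a generic search(term, n) helper that yields (url, title, snippet) tuples; the whitelist file name and the domain-level grouping key are illustrative choices, not necessarily Dagger's.

```python
from collections import Counter
from urllib.parse import urlparse

def load_known_good(path: str = "known_good_domains.txt") -> set[str]:
    # Whitelist of reputable domains, dropped to avoid obvious false positives.
    with open(path, encoding="utf-8") as fh:
        return {line.strip().lower() for line in fh if line.strip()}

def compile_results(terms, search, known_good):
    """For each term, keep the top-100 results, drop whitelisted domains,
    and group the remaining entries by domain with a count."""
    counts = Counter()
    representative_url = {}
    for term in terms:
        for url, title, snippet in search(term, 100):
            domain = urlparse(url).netloc.lower()
            if domain in known_good:
                continue
            counts[domain] += 1
            representative_url.setdefault(domain, url)
    return counts, representative_url
```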



Crawling Search Results


Crawl the URLs.


Process the fetched pages


Detect cloaking in parallel


Helps minimize any possible time-of-day effects.


Multiple crawls, from three perspectives:


Normal search user


Googlebot Web crawler


A user who does not click through the search result


The last perspective detects pure user-agent cloaking, without any checks on the referrer.
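
A minimal sketch of fetching one URL under these three perspectives, assuming the requests library and illustrative user-agent strings (Dagger's actual crawler is not specified here):

```python
import requests
from urllib.parse import quote_plus

BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def fetch_three_views(url: str, search_term: str) -> dict[str, str]:
    views = {
        # 1. Normal search user: browser UA, referrer set to a results page.
        "user": {"User-Agent": BROWSER_UA,
                 "Referer": f"https://www.google.com/search?q={quote_plus(search_term)}"},
        # 2. Search engine crawler: Googlebot UA, no referrer.
        "crawler": {"User-Agent": GOOGLEBOT_UA},
        # 3. Visitor who did not click through: browser UA, no referrer.
        "no_click": {"User-Agent": BROWSER_UA},
    }
    pages = {}
    for name, headers in views.items():
        resp = requests.get(url, headers=headers, timeout=30)
        pages[name] = resp.text
    return pages
```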


35% of cloaked search results for a single measurement perform pure user-agent cloaking.


Pages that employ both user-agent and referrer cloaking are nearly always malicious.


IP Cloaking - half of current cloaked search results do in fact employ IP cloaking via reverse DNS lookups.
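
The reverse-DNS check a cloaker might use to recognize crawler IPs could look like the sketch below; the forward-lookup confirmation and the crawler domain suffixes are common practice but are assumptions here, not details from the paper.

```python
import socket

CRAWLER_DOMAINS = (".googlebot.com", ".google.com", ".search.msn.com")

def is_search_engine_ip(ip: str) -> bool:
    try:
        host, _, _ = socket.gethostbyaddr(ip)           # reverse DNS lookup
    except socket.herror:
        return False
    if not host.endswith(CRAWLER_DOMAINS):
        return False
    try:
        forward_ips = socket.gethostbyname_ex(host)[2]  # forward confirmation
    except socket.gaierror:
        return False
    return ip in forward_ips                            # guards against spoofed PTR records
```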


Detecting Cloaking


Process the crawled data using multiple iterative passes


Various transformations and analyses are applied


This helps compile the information needed to detect cloaking.


Each pass uses a comparison-based approach:


Apply the same transformations to the views of the same URL, as seen from the user and the crawler

Directly compare the result of the transformation using a scoring function


Thresholding - detect pages that are actively cloaking and annotate them.


Used for later analysis.
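
As an illustration of one such pass, the sketch below compares a bag-of-words view of the user and crawler copies of a page and flags large differences; the transformation and the threshold value are assumptions, not Dagger's actual scoring function.

```python
import re

def to_word_set(html: str) -> set[str]:
    text = re.sub(r"<[^>]+>", " ", html)         # crude tag stripping
    return set(re.findall(r"[a-z]{3,}", text.lower()))

def cloaking_score(user_view: str, crawler_view: str) -> float:
    a, b = to_word_set(user_view), to_word_set(crawler_view)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)         # 0 = identical views, 1 = disjoint

def is_cloaked(user_view: str, crawler_view: str, threshold: float = 0.7) -> bool:
    # Thresholding step: annotate pages whose views differ too much.
    return cloaking_score(user_view, crawler_view) > threshold
```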


Temporal Re-measurement


To study the lifetime of cloaked pages.


Temporal component in Dagger.


Fetch search results from search engines


Crawl and process URLs at later points in time.


Measure the rate at which search engines respond to cloaking


Measure the duration for which pages remain cloaked
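
A rough sketch of such a re-measurement loop, with an assumed four-hour interval and hypothetical search and crawl_and_detect helpers:

```python
import time

def remeasure(terms, search, crawl_and_detect, rounds: int = 6, interval_s: int = 4 * 3600):
    """Re-fetch results and re-crawl the same URLs at later points in time to
    see when cloaked results drop out of the rankings and when pages stop cloaking."""
    history = []
    for r in range(rounds):
        snapshot = {}
        for term in terms:
            results = search(term, 100)                  # fresh top-100 results
            snapshot[term] = crawl_and_detect(results)   # which URLs still cloak?
        history.append(snapshot)
        if r < rounds - 1:
            time.sleep(interval_s)                       # wait before the next round
    return history
```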



Cloaking Over Time



In trending searches, the terms constantly change.
change.


Cloakers target many more search terms and a broad demographic of potential victims


Pharmaceutical search terms are static


Represent product searches in a very specific domain.


Cloakers have much more time to perform SEO to raise the rank of their cloaked pages.


This results in more cloaked pages in the top results.




Sources of Search Terms


Blackhat SEO - artificially boosts the rankings of cloaked pages.


Search engines detect cloaking either directly (by analyzing pages) or indirectly (by updating the ranking algorithm).




Augmenting popular search terms with suggestions.


Enables targeting the same semantic topic as popular search terms.


Cloaking in search results is highly influenced by the search terms.


Search Engine Response


Search engines try to identify and thwart cloaking.


Cloaked pages do regularly appear in search results.


Many are removed or suppressed by the search engines within hours to a day.


Cloaked search results rapidly begin to fall out of the top 100 within the first day, with a more gradual drop thereafter.



Cloaking Duration


Cloakers manage their pages similarly, independent of the search engine.


Pages are cloaked for long durations: over 80% remain cloaked past seven days.


Cloakers will want to maximize the time during which they might reap the benefits of cloaking by attracting customers to scam sites, or victims to malware sites.


Difficult to recycle a cloaked page for reuse at a later time.


Cloaked Content


Redirection of users through a chain of advertising networks


About half of the time, a cloaked search result leads to some form of abuse.


Long-term SEO campaigns constantly change the search terms they are targeting and the hosts they are using.



Domain Infrastructure


Key resources to effectively deploy cloaking in a scam:


Access to Web sites


Access to domains


For TYPE 1 terms, the majority of cloaked search results are in .com.


For TYPE 2 terms, cloakers use the “reputation” of pages to boost their ranking in search results



Search Engine Optimization


Since a major motivation for cloaking is to attract user traffic, we can extrapolate SEO performance based on the search result positions the cloaked pages occupy.


Cloaking TYPE 1 terms targets popular terms that are very dynamic, with limited time and heavy competition for performing SEO on those search terms.


Cloaking TYPE 2 terms is a highly focused task on a static set of terms.


Provides much longer time frames for performing SEO on cloaked pages for those terms.


Conclusion


Cloaking has become a standard tool in the scammer’s toolbox


Cloaking adds significant complexity to differentiating legitimate Web content from fraudulent pages.


The majority of cloaked search results remain high in the rankings for 12 hours


The pages themselves can persist far longer.


Search engine providers will need to further reduce the lifetime of cloaked results to demonetize the underlying scam activity.