WHEN IS SOCIAL MEDIA MINING GOOD ENOUGH?

Oct 24, 2013

1

WHEN IS SOCIAL MEDIA MINING GOOD ENOUGH?

OR


HELP! I THINK I MIGHT BE A SCIENTIST.


Nick Buckley

Social Media Director GfK NOP

2

1. What are we talking about?

3

Definition* of social media monitoring:

“Social Media Monitoring (SMM) means the identification, observation, and analysis of user-generated social media content for the purpose of market research.”

What exactly are we talking about?


What they say

Public

Communities

Video sites

Review sites

Professional & Consumer

Blogs/Microblogs

Forums

Client sites

News sites

* http://www.social-media-monitoring.org

4

What was that 2.0 thing again?

Before the rise of the internet

Web 2.0

The “era of shout marketing” is over*:

* Marshall, 2012

Eh?

5

Web Mining, Social Media Monitoring or Social Media Mining?

I like “Mining”. User generated content in social media lays down a rich seam of activity, opinion, thought and information… mess, echoes and ‘whimsy’.

For some time marketing and PR professionals have been monitoring Social Media to capture headline ‘buzz’ in real time, and to detect sudden changes requiring a response.

But collecting and counting this content is only the beginning of a process which can add value via many techniques… including integration with other sources such as market research data.

6

Rapid supply-side evolution. What has driven it?

For the original PR and Marketing Users…

Boring outputs - flat lining “buzz share”

Commoditisation [seeming] of the core process by technology newcomers

Differentiation by interface… the “Dashboard” - to emphasise use-cases

Making user self-service easier - for all kinds of reasons

Increasingly sophisticated users… looking for outputs suggestive of insights

The ‘social CRM’ branch


http://blog.glennz.com/evolution/

7

2. What happens when Market Researchers get hold of it?

8

Sony brand damage was driven by PlayStation breach (2011)

[Charts: Sony buzz this year; Sony sentiment this year; Sony buzz in April; Sony sentiment in April; PlayStation buzz; PlayStation sentiment]

9

Market Researchers believe that SMM can also give clients a window on other dimensions of online conversations

SMM provides insights into:

Category Dynamics

Consumer needs

Problems and issues consumers discuss

Product usage discussions

New product entries

Corporate

Corporate mentions related to reputation

Crises

Social issues

Brand

Brand/sub-brand mentions, brand “buzz”

Number of positive vs. negative sentiments for each brand

Brand content analysis, what’s being said about brand

Advertising noticed most and related discussion

Source of mentions (specific sites) and the most influential sites

Competition

All the above for competition

© 2012 GfK NOP

10

Inevitably they think about comparison with surveys…

Strengths

Very immediate

Unconditioned by participant awareness of a research process

Often more emotive than considered survey responses

Spontaneously generated content - unconstrained by research frame

Offers insight into active social media users

Potentially global

You can ‘ask a new question’ without having to issue a new questionnaire*

Low cost - under certain circumstances

Weaknesses

Not necessarily representative of the general population

Difficult to weight back to general population, as demographic data is sparse

Automated sentiment analysis only as good as the algorithms [and these vary greatly]

Automated harvesting can capture a lot of ‘noise’ for certain words or brands

No guarantee of sufficient data

Costs rise when we use supplementary analysis to overcome some of these issues

* within certain technical limitations

11

Different client needs indicate different SMM approaches


For example - Precision Extraction vs ‘Trawl & Filter’

Crude mention & mood tracking

Quantitative - Brand tracking and integration with traditional research

Indicative Qual - e.g. using trends and volumes to guide focus of analysis

Exploratory Qual - more complex collection. Manually manageable volumes and ‘tuning’

Higher data volumes from simple search terms

Lower data volumes from targeted & compound search terms

More post processing, applied to data by MR agency - to reduce noise and refine sentiment attribution

Accept raw data output from application

12

3. Too Abstract?

13

The raw material - Results from search terms

SMM applications extract results from wholesale supplies of data, conducting searches defined by “search terms”.

These can be anything from a simple and distinctive brand or product name, to a complex expression configured to capture discussions about a category or concept.

A search term combines words or phrases via logical instructions such as AND, OR, NOT. They may also employ functions such as WITHIN to detect words in a certain proximity to each other. Finally - just as in mathematical equations - brackets can dictate the sequence in which the instructions are applied, e.g.

“word1” AND ( “word2” OR “word3” )
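To make the logic of a search term concrete, here is a minimal Python sketch of how AND, OR, NOT and a WITHIN proximity function might be evaluated against a single post. It is an illustration only: the function names and the token-based matching are assumptions, not the query syntax or behaviour of any particular SMM application.

```python
import re

def tokens(text):
    """Split a post into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def has(word):
    """Predicate: the word appears somewhere in the post."""
    return lambda toks: word.lower() in toks

def AND(*preds):
    return lambda toks: all(p(toks) for p in preds)

def OR(*preds):
    return lambda toks: any(p(toks) for p in preds)

def NOT(pred):
    return lambda toks: not pred(toks)

def WITHIN(a, b, distance):
    """Predicate: words a and b occur no more than `distance` tokens apart."""
    def pred(toks):
        pos_a = [i for i, t in enumerate(toks) if t == a.lower()]
        pos_b = [i for i, t in enumerate(toks) if t == b.lower()]
        return any(abs(i - j) <= distance for i in pos_a for j in pos_b)
    return pred

def matches(term, text):
    """Apply a composed search term to the text of one post."""
    return term(tokens(text))

# The slide's example: "word1" AND ( "word2" OR "word3" )
term = AND(has("word1"), OR(has("word2"), has("word3")))
```

Nesting the function calls plays the role of the brackets: the inner OR is evaluated before the outer AND.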

14

A typical SMM application offers a dashboard view of data returned by these search terms - and the facility to export the underlying data

15

Analyses

Whatever the Search Terms define - here is what can be measured about the results returned… in combination or in isolation

Volume - “how much is it talked about, and how is this changing over time”

Channels - “where on the web is it being talked about… twitter, blogs, forums, comments?”

Location - “where in the world is it being talked about?”

Themes - “what other words and phrases are most regularly associated with it?”

People - “who is talking about it?” That may be by influence - according to various proprietary indices - or by demographics [to be used with caution]

Sentiment: Across all of these variables is superimposed automatically generated “Sentiment” analysis - positive, negative or neutral language associated with the subject of the posts…

Verbatims - drill-down to individual posts, in their own words - “what do people actually say?”
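Several of these measures are simple counts over the exported data. The sketch below shows how Volume, Channels and Themes might be tallied in Python once posts have been exported. The record layout (`date`, `channel`, `text`), the sample posts, the brand name and the stopword list are all invented for illustration; real exports will differ by supplier.

```python
from collections import Counter

# Hypothetical exported posts; field names and contents are assumptions.
posts = [
    {"date": "day-1", "channel": "twitter", "text": "brandx network is down again"},
    {"date": "day-1", "channel": "forum", "text": "brandx breach worries me"},
    {"date": "day-2", "channel": "twitter", "text": "still no brandx network"},
]

# Tiny illustrative stopword list, not a complete one.
STOPWORDS = {"is", "the", "a", "me", "no", "again", "still"}

def volume_by_day(posts):
    """Volume: how much is it talked about, per day."""
    return Counter(p["date"] for p in posts)

def volume_by_channel(posts):
    """Channels: where on the web is it being talked about."""
    return Counter(p["channel"] for p in posts)

def themes(posts, subject, top=3):
    """Themes: words most often appearing alongside the subject."""
    words = Counter()
    for p in posts:
        for w in p["text"].lower().split():
            if w != subject and w not in STOPWORDS:
                words[w] += 1
    return words.most_common(top)
```

For example, `themes(posts, "brandx", top=1)` picks out "network" as the word most often seen next to the brand in this toy sample.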

16

Examples of outcomes from SMM studies

FINDING: Focus on the right social media channels at the right time. A manufacturer used a video from a high profile pop star to drive a major campaign. Predictably, when aired, the video generated a ‘spike’ of twitter activity. BUT - looking back down the timeline showed there had also been a burst of activity on forums, and some blogs, from fans of the artist when the video was being shot.

FINDING: Differentiate ‘trade press’ buzz from real engagement. A manufacturer used a novel approach, through Facebook, to support advice and collaboration between users of its product. This appeared to have some success in stimulating social media conversations about the product. However - deeper scrutiny revealed that this traffic was almost exclusively blogging by sector and marketing industry press, attracted by the novel approach, with further blog, forum and link-tweeting activity amongst sector insiders and social media enthusiasts.


17

Examples of outcomes from SMM studies (2)

FINDING: Consumers don’t always talk about the product features that you highlight. Analysis of conversations about a newly launched electronics product revealed that the functional features most discussed [particularly those with largely positive sentiment attached] were not those which the manufacturer had chosen to highlight. Subsequent marketing was able to adjust to take account of these ‘more loved’ features.

FINDING: ‘The world’ can sometimes throw up more interesting stories about you than you could hope to generate for yourself… but not always with the connotations you would like. An automotive manufacturer which had enjoyed modest online buzz as a result of its own sponsorship activities experienced a ‘spike’ in online mentions which was 10 times the size - as a result of a much repeated witty comment. A high profile celebrity had appeared on TV news being interviewed from the driver’s seat of one of their vehicles. The comment - linking the celebrity to a negative ‘folk image’ of the vehicle - spread rapidly across a range of social media channels.

The moral is that spontaneous, and genuinely social, media can currently still outperform marketers.




18

BUT!

19

There are many forces* which erode this nice model…

Accuracy?


Reach?

Relevance?

Reach image from titletrack.com

20

Accuracy

Is the searched-for phrase even in the returned “snippet”?

Is it ‘content’ - or is it

Navigation?

Ticker or title content?

Ad Content?

Various species of spam [overlaps with ‘Relevance’]?

Is meta-data about the poster

Present?

Reliable?

Understanding this, apart from making your own manual checks, is about understanding your third party suppliers’ processes and content and, often, that of their ‘wholesale data suppliers’ - each of which may differ from the others.
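Two of the checks above can be partly automated before any manual review. The following Python sketch audits a batch of returned results for whether the searched-for phrase is actually in the snippet and whether poster meta-data is present. The field names (`snippet`, `author`, `location`) are assumptions; it says nothing about whether the meta-data is reliable, which still needs human judgement.

```python
def phrase_in_snippet(phrase, snippet):
    """Is the searched-for phrase literally present in the returned snippet?"""
    return phrase.lower() in snippet.lower()

def audit(results, phrase, required_meta=("author", "location")):
    """Share of results passing each basic accuracy check.

    `results` is a list of dicts; the keys used here are hypothetical.
    """
    n = len(results)
    if n == 0:
        return {"phrase_present": 0.0, "meta_present": 0.0}
    phrase_ok = sum(phrase_in_snippet(phrase, r.get("snippet", "")) for r in results)
    meta_ok = sum(all(r.get(k) for k in required_meta) for r in results)
    return {"phrase_present": phrase_ok / n, "meta_present": meta_ok / n}
```

A low `phrase_present` share is a prompt to ask the supplier how snippets are cut from the underlying page.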

21

Reach

[T]here are known knowns; there are things we know that we know.

There are known unknowns; that is to say there are things that we now know we don't know.

But there are also unknown unknowns - there are things we do not know we don't know.

Donald Rumsfeld

Are these results from scrutiny of the entire [English speaking] social web?
No

Are they results from a very large, sometimes stated, number of social sources?
Yes

Could this range be skewed relative to the subject under scrutiny?
Yes

Where it’s Twitter data - is it from the whole of Twitter?
Maybe

Is historical data always on the same basis as current data, or data gathered since the search was defined?
Not always

Do we always have a good idea of what the ‘Reach’ is?
No

22

Relevance

Even when the application has collected exactly what we asked for, and it is legitimate content, with some nice useful data about the poster… it might not be relevant

“Cats are great company.”

“#EMT Bolt one cool cat!”

“Also, the Cat is a great resort”

“I love my aunt Cat!”

“I think Cat Stark is worse than any Lanister.”

“I think this hurricane was a scam cooked up by the fat cats in Big Grocer.”
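One common way to tackle this, sketched below in Python, is to require a context word alongside the ambiguous term. The context list is an invented example for a study about cats as pets, not a recommended vocabulary, and the approach trades recall for precision: relevant posts that happen to use none of the context words will be dropped.

```python
import re

# Hypothetical context words for a study about cats as pets.
PET_CONTEXT = {"company", "pet", "kitten", "purr", "vet", "fur"}

def is_relevant(post, context=PET_CONTEXT):
    """Keep a post only if it mentions cat(s) AND at least one context word."""
    words = set(re.findall(r"[a-z]+", post.lower()))
    mentions_cat = "cat" in words or "cats" in words
    return mentions_cat and bool(words & context)
```

Applied to the examples above, only “Cats are great company.” survives; the aunt, the resort and the fat cats are filtered out.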



23

Challenges include

However , commencing too early public smoking facts will just overstress your pet ; quite a fresh pet will not learn everything from services. Just after he has ended up perched for some a few moments, supply him with the particular take care of, plus for instance in advance of, make sure you compliment the pup. When dog house teaching your dog, continue to keep the dog house in the vicinity of the spot where you as well as the canine are usually conversing.

24



And I haven’t mentioned automated Sentiment Analysis yet!

Irony - really?

Slang/Dialect/Register

Multiple meanings - “50 strong”

Adjacent subjects - “My beautiful FIAT next to a BMW”
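The last two problems are easy to reproduce with a deliberately naive scorer. The Python sketch below counts words from a small invented lexicon and assigns the post's overall label to every brand it mentions. It is a toy written to show the failure modes, not a description of how any commercial tool works.

```python
# Toy sentiment lexicon; the word lists are illustrative assumptions.
POSITIVE = {"beautiful", "great", "love", "strong"}
NEGATIVE = {"worse", "awful", "hate"}

def _words(text):
    return [w.strip(".,!?\"'").lower() for w in text.split()]

def post_sentiment(text):
    """Label a post by counting positive and negative lexicon words."""
    words = _words(text)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def brand_sentiment(text, brands):
    """Naively attach the whole post's label to every brand mentioned."""
    words = _words(text)
    label = post_sentiment(text)
    return {b: label for b in brands if b.lower() in words}
```

Here `brand_sentiment("My beautiful FIAT next to a BMW", ["FIAT", "BMW"])` credits both brands with positive sentiment, and a phrase like “a crowd 50 strong” scores as positive although “strong” is only a count.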

25

4. And what is Good, and what is not Good?

26

To Recap


SMM tools make it very easy to “Super Google” certain Brands, people, objects and even categories or concepts - quickly generating convincing-looking tables and charts.

But underneath there’s a complex story about accuracy, reach and relevance… which becomes apparent on scrutiny of drilled-down text samples - and can only fully be understood by getting inside the provider’s systems and sources.

It doesn’t mean they are misleading users - it just means that they started out somewhere else.

The conclusion is that you have to carefully consider use cases, or build your own better mouse trap, or wait for proprietary solutions to get better at certain things

Sentiment analysis is part of this story - but doesn’t define it.

27

Natural Language Processing [NLP] to the rescue?

Definition

“Specifically, it is the process of a computer extracting meaningful information from natural language input and/or producing natural language output”*

Most SMM applications claim some level of NLP.

Whilst this may be legitimately contrasted with simple vocabulary, combination and probabilistic methods, it can end up meaning little. It may only mean that some rules of language have been ‘attended to’ in what is still essentially a pattern-matching exercise

* Warschauer, M., & Healey, D. (1998). Computers and language learning: An overview

28

But clearly sophisticated NLP would make a big difference

Improved Accuracy - including filtering out of unstructured spam

More tools available to achieve/check Relevance

Much-improved Sentiment Analysis

Some commercial tools have become available in the last 12 months which offer an assessment of their confidence in their own NLP analysis - dividing snippets into those with Low, Medium and High confidence.

Significantly, ‘High’ is a minority of the output.
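Where a tool exposes a numeric confidence score per snippet, the split into bands can be checked directly. The Python sketch below assumes scores between 0 and 1 and uses invented cut-offs of 0.5 and 0.8; actual tools define their own bands and may not publish the thresholds.

```python
from collections import Counter

def confidence_band(score, low=0.5, high=0.8):
    """Map a 0-1 confidence score to a band; thresholds are assumptions."""
    if score >= high:
        return "High"
    if score >= low:
        return "Medium"
    return "Low"

def band_shares(scores, low=0.5, high=0.8):
    """Proportion of snippets falling into each confidence band."""
    total = len(scores)
    if total == 0:
        return {}
    bands = Counter(confidence_band(s, low, high) for s in scores)
    return {b: bands[b] / total for b in ("Low", "Medium", "High")}
```

Reporting the band shares next to any sentiment chart makes it visible how much of the output the tool itself regards as well classified.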


29

Barking up the wrong Tree?

The recap assumes that the Market Researcher’s instinct is correct… to make the fuzzy working of the social web itself… the collection mechanisms and enterprises, and the analytical engines… into a familiar data collection process, somehow isomorphic with surveys.

But “what is good” is, as many of the ancient philosophers would tell us, about function and purpose.

I think we’ve now learned enough, and experienced enough un-straightforwardness, and contemplated enough need for manual evaluation or augmentation - dispelling the notion that this is a self-evident labour saving device along the way… to stop and ask, “what was it we were trying to do?”

31

What are we trying to do?


Use the social web as a proxy for the population?

Understand how the social web is responding - for the benefit of those solely interested in this sub-set of the population as a channel or marketplace?

Access particular niches which are more concentrated online than off?

Detect significant events?

Measure shifts and changes?

Make rough comparisons?

Discover new insights, themes and connections?

32

How useful is extracted Social Media content?

Mechanically extracted content is inevitably imperfect as regards:

relevance

comprehensiveness relative to ‘total web’

accuracy of classification, sentiment etc

representativeness of general population

In general web mining is therefore useful for:

relative measures

measuring and detecting change or discontinuity

iterative discovery of related concepts and drivers

comparing channels

matching to events and schedules

It’s important to know when this matters, and how much. It is vital to work honestly with the constraints and exploit the strengths… and, of course, integration with other sources of data.
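Detecting change or discontinuity works on relative measures, so it tolerates imperfect reach as long as the collection basis stays constant. The Python sketch below flags days whose mention count stands well above a trailing baseline. The 7-day window, the threshold of 3 standard deviations and the floor of 1.0 on the spread are assumptions chosen for illustration, not tuned values.

```python
from statistics import mean, stdev

def spikes(daily_counts, window=7, threshold=3.0):
    """Indices of days whose count exceeds the trailing mean by more than
    `threshold` standard deviations of the preceding `window` days."""
    flagged = []
    for i in range(window, len(daily_counts)):
        baseline = daily_counts[i - window:i]
        mu = mean(baseline)
        # Floor the spread so a perfectly flat baseline cannot divide by zero.
        sigma = max(stdev(baseline), 1.0)
        if (daily_counts[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged
```

A flagged day is only a prompt to drill down into the verbatims and match the spike to an event or schedule.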

33

Different client needs indicate different SMM approaches


For example - Precision Extraction vs ‘Trawl & Filter’

Crude mention & mood tracking

Quantitative - Brand tracking and integration with traditional research

Indicative Qual - e.g. using trends and volumes to guide focus of analysis

Exploratory Qual - more complex collection. Manually manageable volumes and ‘tuning’

Higher data volumes from simple search terms

Lower data volumes from targeted & compound search terms

More post processing, applied to data by MR agency - to reduce noise and refine sentiment attribution

Accept raw data output from application

Not radical enough!

Too much like hard work

Sensible

34

Rather than wait for NLP utopia…

Settle for:

1. SMM as a powerful and novel Qual exploration tool

2. Do big number crunching on brands but take a “hyena” approach.

Accept all* occurrences of a brand or product name in posts as an indication of significance… even the spam and the adverts and the competitions

Similarly look for pure correlations between words/phrases and other words/phrases

Or between trends in these numbers and classes of offline events - such as sales, complaints and other behaviours… with a view to predicting, explaining or causing such events in the future.

* Except for the most obvious duplication errors such as over-indexing
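A minimal Python sketch of the “hyena” approach might look like this: drop only exact duplicates, count everything else, and correlate the resulting series with an offline series such as weekly sales. The record fields (`url`, `text`) are assumptions, and a correlation found this way is only a starting point for explanation, not evidence of cause.

```python
from math import sqrt

def dedupe(posts):
    """Remove only exact repeats of the same text at the same URL."""
    seen = set()
    unique = []
    for p in posts:
        key = (p["url"], p["text"])  # hypothetical field names
        if key not in seen:
            seen.add(key)
            unique.append(p)
    return unique

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)
```

In use, `xs` would be weekly mention counts after `dedupe` and `ys` the matching weekly sales or complaint figures.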


35

5. Some Conclusions

36

I am not a scientist

OK - I’m a scientist amongst researchers, and possibly amongst programmers

But amongst scientists - and text analysis specialists - I’m a mere researcher.

Because I couldn’t use these tools “as is” with confidence I had to start delving…

… and delving is time consuming in a commercial environment.

Our technology suppliers have become more like partners… increasingly transparent as they’ve understood, but not challenged, what we tried to do. The software and services will now adapt to us - whether they should or not.

PR monitors, real time trackers and ‘social CRM’ folks will carry on using the tools the same way they always have… and may even benefit from changes my industry has now initiated.

37

But

How will commercial SMM applications and services with the best accuracy, reach and relevance capabilities be recognised, validated and promoted?

Is the ‘bit in the middle’ just a holy grail until such time as the NLP part of the reckoning makes a step change - driven by all its other exploitations, such as ordinary language driven IT interfaces?

If you’re a researcher and you want to use this stuff tomorrow… what must be done?

Fortunately - there’s enough to learn by “super-googling”, browsing and crude trend tracking to keep us going… and learning… for some time to come.


38

39



Dr Nick Buckley

Social Media Director

GfK NOP

M: 07958 516967 T: @grimbold

E: nick.buckley@gfk.com


[from August 2012. E: nick@soshall.net]