Text/Content Analytics 2011


Text/Content Analytics 2011: User Perspectives on Solutions and Providers

Seth Grimes

An Alta Plana research study
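Sponsored by AlchemyAPI, Attensity, Basis Technology, Language Computer Corporation, Lexalytics, Medallia, SAS, Sybase, and Verint

Published September 9, 2011 under the Creative Commons Attribution 3.0 License.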

Text/Content Analytics 2011: User Perspectives


Table of Contents

Executive Summary
Market Size and Growth
Growth Drivers
The 2011 Market
The Study
Key Study Findings
About the Study and the Report
Text and Content Analytics Basics
From Patterns…
… To Structure
Beyond Text
Metadata
A Focus on Applications
Applications and Markets
Application modes
Business Domains
Business Functions
Technology Domains
Solution Providers
Demand-Side Perspectives
Study Context
About the Survey
Market Size and the Larger BI Market
The Data Mining Community
Demand-Side Study 2011: Findings
Q1: Length of Experience
Q2: Application Areas
Q3: Information Sources
Q4: Return on Investment
Q5: Mindshare
Q6: Spending
Q8: Satisfaction
Q9: Overall Experience
Q10: Providers
Q11: Provider Selection
Q13: Promoter?
Q14: Information Types
Q15: Important Properties and Capabilities
Q16: Languages
Q17: BI Software Use
Q18: Guidance
Q19: Comments
Additional Analysis
Interpretive Limitations and Judgments
About the Study
Solution Profile: AlchemyAPI
Solution Profile: Attensity
Solution Profile: Basis Technology
Solution Profile: Language Computer Corp.
Solution Profile: Lexalytics
Solution Profile: Medallia
Solution Profile: SAS
Solution Profile: Sybase
Solution Profile: Verint Systems Inc.



Executive Summary

Text and content analytics have become a source of competitive advantage, enabling businesses, government agencies, and researchers to extract unprecedented value from “unstructured” data. Uptake is strong: software, solutions, and services are delivering significant business value to users in a spectrum of industries, yet the potential of the market remains unreached.

These points and more are brought out in Alta Plana’s market study, Text/Content Analytics 2011: User Perspectives on Solutions and Providers.


Market Size and Growth

Tools and solutions now cover the gamut of business, research, and governmental needs. User adoption continues to grow at a very rapid pace, an estimated 25% in 2010, creating an $835 million market for software tools, business solutions, and vendor-supplied support and services. These tools and solutions generate business value several times that figure, extrapolating from revenue generated by applications and solutions (for instance, social-media analysis, e-discovery, and search), information products created by mining content, professional services, and research.

The addressable market for text/content analytics is much larger. The technologies are a subset of a larger business intelligence, analytics, and performance management software market, which is dominated by solutions that analyze numerical data that originates in enterprise operational systems. Gartner estimated that larger market at $10.5 billion globally in 2010. Yet, given now-broad awareness of the business value that resides in “unstructured” social, online, and enterprise sources, text/content analytics’ share of the much larger market will surely grow steeply in coming years. Overall, expect annual text/content-analytics growth averaging up to 25% for the next several years.
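To make the arithmetic behind these figures concrete, the following minimal sketch computes the text/content analytics share of the larger market in 2010 and compounds the $835 million figure forward. It assumes a constant 25% annual rate, which is only the upper bound quoted above; the function names are illustrative, and the projected values are not forecasts made by this report.

```python
from __future__ import annotations

# Figures quoted in the report, in millions of US dollars.
TEXT_ANALYTICS_MARKET_2010 = 835.0
BI_MARKET_2010 = 10_500.0  # Gartner estimate cited in the report
GROWTH_RATE = 0.25         # assumption: the "up to 25%" upper bound, held constant


def share(part: float, whole: float) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


def project_market(base: float, rate: float, years: int) -> list[float]:
    """Compound a base-year market size forward at a constant annual rate.

    The result has years + 1 entries; entry 0 is the base year itself.
    """
    return [base * (1.0 + rate) ** n for n in range(years + 1)]


if __name__ == "__main__":
    pct = share(TEXT_ANALYTICS_MARKET_2010, BI_MARKET_2010)
    print(f"2010 share of the larger market: {pct:.1f}%")
    sizes = project_market(TEXT_ANALYTICS_MARKET_2010, GROWTH_RATE, 3)
    for offset, size in enumerate(sizes):
        print(f"{2010 + offset}: ${size:,.0f} million")
```

Under these assumptions, text/content analytics accounts for roughly 8% of the larger market in 2010, and three years of growth at the full 25% rate would nearly double the base-year figure.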

Growth Drivers

A number of factors contribute to sustained growth, foremost the growth of social platforms, which have become essential life tools for individuals and an important business marketing, communication, research, and commerce channel.


Social

Keeping up with Social is a must for every consumer-facing organization, and automated monitoring, measurement, and engagement is the only way to deal with Social’s variety, volume, and velocity. Leading solutions rely on natural-language processing, provided by text/content analytics, to identify and extract facts and sentiment. Expect even lower-end tools to embrace NLP by 2013.

Publishing, advertising, and information services

Second, text/content analytics is central to competitive online publishing and advertising and to effective information access (essentially, next-generation search). These are two sides of a single coin. As applied by content producers and publishers, technologies discover and associate appropriate descriptive and semantic labels with content. The aims are to optimize search findability, to allow content to be stored and retrieved at a fine-grained level (documents as databases), and to enhance the content consumer’s experience interacting with content. As applied by search, content aggregation, online advertising, and information-service providers, the technology fuels situationally appropriate results that respond to the information/service seeker’s context and intent.




Question-answering and information access

Question-answering systems such as IBM Watson and Wolfram Alpha are examples of next-generation, analytics-enabled information-access engines, which will play a key role in online commerce, customer support, health-service delivery, and other applications starting by early 2013. Similarly, Semantic Web information resources should finally enter the mainstream by 2014. They will very frequently rely on analytics to semanticize and structure content and support on-the-fly information integration.

Rich media

Last, content analytics makes sense of rich media. The technology finds and exploits patterns (what’s in a given piece of content and how that content changes over time) in speech and sound, images, and video. There are important content-analytics applications today for contact centers, security, general information access, and even consumer electronics: witness face detection and tracking in consumer-grade cameras and camcorders. Arguably, we could include analyses of social and enterprise networks, mined from e-mail, messaging, online, and social content, under the content-analytics umbrella.

The 2011 Market

As in prior years, no single solution provider dominates the market. Players range from the largest enterprise software vendors to a stream of new entrants, both commercializing research technologies and bringing solutions to new markets. In between, established enterprise content management (ECM), BI and analytics, search, software tools, and business-solution providers (the sponsors of this study among them) continue to innovate and deliver business value.

The Study

Alta Plana’s 2011 text/content analytics market study combines a survey-based, quantitative and qualitative examination of usage, perceptions, and plans with observations derived from numerous conversations with solution providers and users. It seeks to answer the question, “What do current and prospective text/content-analytics users really think of the technology, solutions, and solution providers?”

Responses will help providers craft products and services that better serve users. Findings will guide users seeking to maximize benefit for their own organizations.

Alta Plana received 224 valid survey responses between June 6 and July 9, 2011. This document reports findings and, when appropriate, contrasts them with comparable numbers from Alta Plana’s spring-2009 text-analytics market study.[1]

[1] “Text Analytics 2009: User Perspectives on Solutions and Providers”: http://altaplana.com/TA2009

Key Study Findings

The following are key 2011 study findings:

• The big news is not news at all: Social is by far the most popular source fueling text/content analytics initiatives. Four of the top 5 information categories are social/online (as opposed to in-enterprise) sources:

  o blogs and other social media (62%)
  o news articles (41%)
  o online forums (35%)
  o reviews (30%)

  as well as direct customer feedback in the form of:

  o customer/market surveys (35%)
  o e-mail and correspondence (29%)

  for an average of 4.5 sources per respondent.



• All three top capabilities that users look for in a solution, each garnering over 50% response, relate to getting the most information out of sources:

  o Broad information extraction capabilities (63%)
  o Ability to use specialized dictionaries, taxonomies, ontologies, or extraction rules (57%)
  o Deep sentiment/emotion/opinion extraction (57%)

  Low cost dropped from 51% of 2009 responses to 38% in 2011.

• Top business applications of text/content analytics for respondents are the following:

  o Brand / product / reputation management (39% of respondents)
  o Voice of the Customer / Customer Experience Management (39%)
  o Search, Information Access, or Question Answering (36%)
  o Competitive intelligence (33%)

• Seventy percent of users are Satisfied or Completely Satisfied with text/content analytics and 24% are Neutral, with only 7% Disappointed or Very Disappointed. Dissatisfaction is greatest, at 25%, with ease of use, with only 36% satisfied. Only 42% are satisfied with availability of professional services/support.

• Only 49% of users are likely to recommend their most important provider. 28% would recommend against their most important provider.
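Two pieces of arithmetic sit behind findings like these. Multi-select questions produce percentages that sum to more than 100%, and that sum, divided by 100, is the average number of selections per respondent. Recommendation figures can be netted against each other in the style of a Net Promoter Score. The sketch below illustrates both; the sample answers are hypothetical, and treating the 49% and 28% figures as promoters and detractors is an assumption made here for illustration, not a calculation the study reports.

```python
from collections import Counter


def tally_multiselect(responses):
    """Summarize a multi-select survey question.

    responses: one collection of chosen options per respondent.
    Returns (percent of respondents choosing each option,
             average number of options chosen per respondent).
    """
    n = len(responses)
    counts = Counter(option for chosen in responses for option in set(chosen))
    percentages = {option: 100.0 * count / n for option, count in counts.items()}
    average = sum(len(set(chosen)) for chosen in responses) / n
    return percentages, average


def net_recommendation(pct_recommend, pct_recommend_against):
    """Percent who would recommend minus percent who would recommend against."""
    return pct_recommend - pct_recommend_against


# Hypothetical answers from four respondents (not survey data).
sample = [
    {"blogs/social media", "news articles", "reviews"},
    {"blogs/social media", "customer surveys"},
    {"news articles", "e-mail", "online forums", "blogs/social media"},
    {"customer surveys"},
]

if __name__ == "__main__":
    percentages, average = tally_multiselect(sample)
    for option, pct in sorted(percentages.items(), key=lambda item: -item[1]):
        print(f"{option}: {pct:.0f}%")
    print(f"average sources per respondent: {average}")
    # Using the two recommendation figures quoted in the findings.
    print(f"net recommendation: {net_recommendation(49, 28)} points")
```

Read this way, the six source percentages listed in the findings sum to 232%, so those six categories account for about 2.3 of the 4.5 sources the average respondent reported.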

About the Study and the Report

Seth Grimes, an industry analyst and consultant who is a recognized authority on the application of text analytics, designed and conducted the study “Text/Content Analytics 2011: User Perspectives on Solutions and Providers” and wrote this report.

The author is grateful for the support of the nine study sponsors: Verint, Sybase, SAS, Medallia, Lexalytics, Language Computer Corporation, Basis Technology, Attensity, and AlchemyAPI. Their sponsorships allowed him to conduct an editorially independent study that should promote understanding of the text/content analytics market and of user-indicated implementation and operations best practices.

The solution profiles that follow the report’s editorial matter were provided by the sponsors and included with only minor editing to regularize their layout. Otherwise, the author is solely responsible for the editorial content of this report, which was not reviewed by the sponsors prior to publication.



Text and Content Analytics Basics

The term text analytics describes software and transformational processes that uncover business value in “unstructured” text via the application of statistical, linguistic, machine learning, and data analysis and visualization techniques. The aim is to improve automated text processing, whether for search, classification, data and opinion extraction, business intelligence, or other purposes.

Rough synonyms include text mining, text ETL, and semantic analysis. Terminology choices are typically rooted in history and competitive positioning. Text mining is an extension of data mining, and text ETL an extension of the BI world’s extract-transform-load concept. Semantic analysis seems most often used by Semantic Web aficionados, who sometimes use the broader term Semantic Web technologies, which also covers standards such as RDF as well as triple stores, query systems, and the like.

These text technologies all perform some form of natural language processing (NLP).

Content analytics can and should be seen as an extension of capabilities to also cover images, audio and speech, video, and composites, the gamut of information types not generated or held in data fields. (Some organizations use the content analytics label for text analytics on online, social, and enterprise content, typically published information. These organizations most often have a strong focus on enterprise content management (ECM) systems.)

From Patterns…

Text, images, speech and other audio, and video are all directly understandable by humans (although not universally: any given human language, whether English, Japanese, or Swahili, is spoken by a minority of people, and not everyone recognizes a Beethoven symphony or Nelson Mandela in a photo). Understanding relies on three capabilities:

1) Ability to recognize small- and large-scale patterns.
2) Ability to grasp context and, from context, to infer meaning.
3) Ability to create and apply models.

Descriptive statistics provides an NLP starting point: The most frequently used words and terms give an indication of the topics a message or document is about. We can create categories and classify text (a form of modeling) based on notions of statistical similarity.
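As a rough illustration of this statistical starting point, the sketch below counts terms after dropping a few common words and then assigns a message to whichever labeled example it most resembles, using cosine similarity between term-count vectors. The stopword list, the category labels, and the example texts are all hypothetical, and cosine similarity is just one common notion of statistical similarity, chosen here for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "with", "was", "for", "on", "my"}


def term_counts(text):
    """Lowercase the text, split it into word tokens, and count non-stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(token for token in tokens if token not in STOPWORDS)


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, examples):
    """Return the label whose example text is most similar to the input."""
    counts = term_counts(text)
    return max(
        examples,
        key=lambda label: cosine_similarity(counts, term_counts(examples[label])),
    )


# Hypothetical labeled examples, one short text per category.
EXAMPLES = {
    "billing": "invoice charge refund payment billing charge",
    "shipping": "delivery package late shipping courier delivery",
}

if __name__ == "__main__":
    message = "My package arrived late and the courier was rude"
    print(term_counts(message).most_common(3))
    print(classify(message, EXAMPLES))
```

Real systems weight terms (for instance by how rare they are across a collection) and learn categories from many examples, but the principle of comparing word-frequency profiles is the same.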

Next steps take advantage of the linguistic structure of text, detectable by machines as patterns. We have word form (“morphology”) and arrangement (grammar and syntax) as well as higher-level narrative and discourse. Usage may be correct (as judged by editors, grammarians, and linguists) or not, whether the language is spoken, formally written, or texted or tweeted: The most robust technologies deal with text in the wild. We apply assets such as lexicons of “named entities”; part-of-speech resolution that can help identify subject, object, relationship, and attributes; and “word nets” that associate words to help in disambiguation, determination of the contextual sense of terms that may have different meanings in different contexts.
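Two of these assets can be sketched in a few lines. The example below looks tokens up in a small lexicon of named entities and picks the sense of an ambiguous word by counting how many of each sense's associated words appear in the surrounding text, a simplified overlap heuristic in the spirit of the Lesk algorithm. The lexicon entries and the sense inventory are hypothetical stand-ins for real resources, which are far larger.

```python
import re

# Hypothetical lexicon of named entities (often called a gazetteer).
ENTITY_LEXICON = {
    "ibm": "ORGANIZATION",
    "gartner": "ORGANIZATION",
    "beethoven": "PERSON",
}

# Hypothetical sense inventory: each sense of an ambiguous word is
# described by a set of words that tend to appear near it.
SENSES = {
    "bank": {
        "financial institution": {"money", "loan", "account", "deposit", "credit"},
        "river edge": {"river", "water", "shore", "fishing", "boat"},
    }
}


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tag_entities(text):
    """Return (token, entity type) pairs for tokens found in the lexicon."""
    return [(tok, ENTITY_LEXICON[tok]) for tok in tokenize(text) if tok in ENTITY_LEXICON]


def disambiguate(word, text):
    """Choose the sense whose associated words overlap most with the context.

    Ties, including the case of no overlap at all, resolve to the sense
    listed first in the inventory.
    """
    context = set(tokenize(text)) - {word}
    senses = SENSES[word]
    return max(senses, key=lambda sense: len(senses[sense] & context))


if __name__ == "__main__":
    print(tag_entities("Gartner and IBM published estimates"))
    print(disambiguate("bank", "She opened an account at the bank to deposit money"))
    print(disambiguate("bank", "We went fishing from the river bank"))
```

Exact-match lookup and word overlap are brittle on text in the wild; they are shown only to make the roles of a lexicon and of word associations concrete.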

Yet, in the words of artificial-intelligence pioneer Edward A. Feigenbaum,

“Reading from text in general is a hard problem, because it involves all of common sense knowledge. But reading from text in structured domains, I don’t think is as hard.”

So some techniques (also) apply knowledge representations such as ontologies to the analysis task. All techniques, however, aim to generate machine-processable structure.



… To Structure

NLP outputs, as part of a text-analytics system, are typically expressed in the form of document annotations, that is, in-line or external tags that identify and describe features of interest. Outputs may be mapped into machine-manageable data structures, whether relational database records or XML, JSON, RDF, or another format.
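A minimal sketch of external (stand-off) annotations may help here. Each annotation records the character span, a label, and the matched text, and the set of annotations is then serialized to JSON alongside the source text. The two regular-expression patterns and the field names are assumptions made for this illustration, not the output format of any particular product.

```python
import json
import re
from dataclasses import asdict, dataclass


@dataclass
class Annotation:
    start: int   # offset of the first character of the span
    end: int     # offset one past the last character of the span
    label: str   # feature type, e.g. DATE or MONEY
    text: str    # the annotated text itself


MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
# Hypothetical patterns, kept deliberately simple.
DATE_PATTERN = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b")
MONEY_PATTERN = re.compile(r"\$\d+(?:\.\d+)? (?:million|billion)")


def annotate(document):
    """Return stand-off annotations for dates and money amounts, in text order."""
    annotations = []
    for label, pattern in (("DATE", DATE_PATTERN), ("MONEY", MONEY_PATTERN)):
        for match in pattern.finditer(document):
            annotations.append(Annotation(match.start(), match.end(), label, match.group()))
    return sorted(annotations, key=lambda a: a.start)


def to_json(document, annotations):
    """Serialize the text and its annotations as one JSON document."""
    payload = {"text": document, "annotations": [asdict(a) for a in annotations]}
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    doc = "Published September 9, 2011: an $835 million market."
    print(to_json(doc, annotate(doc)))
```

Because the annotations refer to the text by offset rather than being embedded in it, the same structure could equally be written out as XML elements, RDF statements, or rows in a relational table.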


Text-extracted data represented in the Semantic Web’s Resource Description Framework (RDF) may form part of a Linked Data system. Text-derived information stored in a relational database may become part of a business intelligence system that jointly analyzes, for instance, DBMS-captured customer transactions and free-text responses to customer-satisfaction surveys. And text-extracted features such as entities, topics, dates, and measurement units may form the basis of advanced semantic search systems.
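The relational case can be sketched with an in-memory SQLite database. One table holds transaction records, a second holds features that a text-analytics step is assumed to have already extracted from survey comments, and a join analyzes the two together. The table names, column names, and rows are hypothetical; the extraction step itself is outside the scope of this sketch.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with transactions and text-derived features."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE transactions (customer_id INTEGER, amount REAL);
        CREATE TABLE survey_features (customer_id INTEGER, sentiment TEXT, topic TEXT);
        """
    )
    # Hypothetical DBMS-captured customer transactions.
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?)",
        [(1, 120.0), (1, 80.0), (2, 40.0), (3, 300.0)],
    )
    # Hypothetical features extracted from free-text survey responses,
    # one row per customer.
    conn.executemany(
        "INSERT INTO survey_features VALUES (?, ?, ?)",
        [(1, "negative", "delivery"), (2, "positive", "price"), (3, "negative", "support")],
    )
    return conn


def spend_by_sentiment(conn):
    """Total transaction amount grouped by the sentiment of each customer's comment."""
    rows = conn.execute(
        """
        SELECT f.sentiment, SUM(t.amount)
        FROM survey_features AS f
        JOIN transactions AS t ON t.customer_id = f.customer_id
        GROUP BY f.sentiment
        ORDER BY f.sentiment
        """
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(spend_by_sentiment(build_demo_db()))
```

Once text-derived features sit in ordinary tables keyed the same way as operational data, standard BI tooling can treat them like any other dimension.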

Beyond Text

Beyond-text technologies for information extraction from images, audio, video, and composite media exist but do not match NLP’s sense-making capabilities. Likely most developed is speech-analysis technology that supports indexing and search using phonemes and is capable of detecting emotion in speech via analysis of indicators such as pace, volume, and intonation, with contact-center and other applications that include intelligence. Intelligence, along with consumer and social search, motivates work on image analysis, as do marketing and competitive-intelligence related studies of online and social brand mentions and use. Video analytics extends both speech and image analysis, with an added temporal aspect, for security applications and also potential business uses such as study of customer in-store behavior.

For beyond-text media, as for text, metadata is of critical importance.

Metadata

Metadata describes data properties that may include the provenance, structure, content, and use of data points, datasets, documents, and document collections. Content-linked metadata typically includes author, production and modification dates, title, topic(s), keywords, format, language, encoding (e.g., character set), rights, and so on. The metadata label extends to specialized annotations such as part-of-speech and data type.

Metadata may be created as part of content production or publication (for instance, the save date captured by a word processor, a geotag associated with a social update, camera information stored in an image file). It may be appended (for instance via social tagging), or extracted from content via text/content analysis. Whether stored internally within a data object (for instance via RDFa, FOAF, or other microformats embedded in a Web page) or managed externally, in a database or search index, metadata is fuel for a range of applications.
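One way to picture the three origins of metadata described above is a record that stores each item together with how it came to exist. The sketch below is an illustrative data structure only; the class names, field names, and sample values are hypothetical and do not correspond to any metadata standard.

```python
from dataclasses import dataclass, field
from enum import Enum


class Origin(Enum):
    PRODUCED = "created during content production or publication"
    APPENDED = "appended afterward, for instance via social tagging"
    EXTRACTED = "extracted from content via text/content analysis"


@dataclass
class MetadataItem:
    name: str
    value: str
    origin: Origin


@dataclass
class ContentRecord:
    content: str
    metadata: list = field(default_factory=list)

    def add(self, name, value, origin):
        """Attach one metadata item, recording where it came from."""
        self.metadata.append(MetadataItem(name, value, origin))

    def by_origin(self, origin):
        """Return {name: value} for the items that share the given origin."""
        return {item.name: item.value for item in self.metadata if item.origin is origin}


if __name__ == "__main__":
    # Hypothetical document and values.
    record = ContentRecord("Quarterly update on delivery delays ...")
    record.add("author", "J. Doe", Origin.PRODUCED)
    record.add("language", "en", Origin.PRODUCED)
    record.add("tag", "logistics", Origin.APPENDED)
    record.add("topic", "delivery", Origin.EXTRACTED)
    for origin in Origin:
        print(origin.name, record.by_origin(origin))
```

Keeping the origin alongside each item lets an application decide, for example, how much weight to give analytically extracted topics relative to publisher-supplied keywords.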

A Focus on Applications

We will not devote further space in this report to discussion of text- and content-analysis technology. If you do want to learn more about text-analytics history and technology, do continue with the technology sections of Alta Plana’s 2009 study report, “Text Analytics 2009: User Perspectives on Solutions and Providers,” available online at http://altaplana.com/TextAnalyticsPerspectives2009.pdf.

As a bridge to survey-derived reporting of user perceptions of the text and content analytics market, solutions, and providers, we will look next at applications.




Applications and Markets

Business users naturally focus on business benefits, whether of analytics or of any other technology or investment. Who are those users?

Text and content analytics solutions have a place a) in any business domain, b) for any business function, and c) within any technology stack that would benefit from automated text/content handling, that is, wherever text/content volume, velocity, and variety, and business urgency, are sufficient to justify costs. Consider a very telling quotation, however:

Philip Russom of the Data Warehousing Institute wrote in a 2007 report, “BI Search and Text Analytics: New Additions to the BI Technology Stack,”[2] “Organizations embracing text analytics all report having an epiphany moment when they suddenly knew more than before.”

In the analytics world, we see now that it is not enough to know more. You need to understand how to use knowledge gained, the processes and outcomes necessary to turn insights into ROI. Text and content analytics elements (information sources, insights sought, processes, and ROI measures) will vary by industry and application.

In this report section, by way of lead-in to survey findings (applications, information sources, and ROI measures are the subject of survey questions 3, 4, and 5), we look at text/content analytics adaptation for applications in several industries and for a variety of business functions.

Application modes

Applications are diverse but may be classified in several (overlapping) groups. Our categorization is an update of 2009’s, with social and online additions in particular:

• Media, knowledgebase, and publishing systems (the author includes search engines here) use text and content analytics to generate metadata and enrich and index metadata and content in order to support content distribution and retrieval. Semantic Web applications would fit in this category, as would emerging information-access engines.

• Content management systems (and, again, related search tools) use text analytics to enhance the findability of content for business processes that include compliance, e-discovery, and claims processing.

• Line-of-business and supporting systems for functions such as compliance and risk, customer experience management (CEM), customer support and service, marketing and market research, human resources and recruiting… and newer tasks that include social monitoring, measurement, and engagement.

• Investigative and research systems for functions such as fraud, intelligence and law enforcement, competitive intelligence, and science.

Where are these applications used?


[2] http://www.teradata.com/assets/0/206/308/96d9065a-0240-44f1-b93c-17e08ae6eacc.pdf

Business Domains

Consider a sampling of industry domains where text and content analytics are frequently applied:

• In intelligence and counter-terrorism, and in law enforcement, there is broad content variety, spanning languages, formats (text, audio, images, and video), and sources (news, field reports, communications intercepts, government records, social postings), and, at times, great urgency.



• In life sciences, for instance for pharmaceutical drug discovery, source materials have been more uniform (scientific literature, clinical reports) and there is no need for real-time response, yet information volumes are huge and complex and the potential payoff (years and millions of dollars shaved off lead-generation and clinical-trials processes) justifies very significant investments in text mining.

• For financial services and insurance, effective credit, risk, fraud, and legal and regulatory-compliance decision-making involves creation of predictive models via analysis of large volumes of transactional records and often incorporates information mined from text sources such as financial and news reports, e-mail and corporate correspondence, and insurance and warranty claims. Automated methods are essential.

• Market researchers rely on text analytics to hear and understand market voices. Focus groups are (on their way) out: they are costly, slow, and often unreliable. Surveys still have great value (beyond soliciting opinions, they can serve as an engagement tool), but neither they nor focus groups help researchers hear unprompted views, the attitudes that consumers express to their peers but not in more formal research settings. Why text analytics? Social is hot, yet human analysis, whether of surveys or of social postings, can be inconsistent and doesn’t scale. Add in text analytics and you have next-generation market research.

• As content delivery and consumption shift to digital, search and information-dissemination tools that exploit metadata (publisher-produced, analytically generated, or socially tagged) are essential survival tools for media and publishing organizations. Content analytics creates better targeted, richer content and a much friendlier and more powerful experience for content consumers.

• Online and social have fomented an advertising revolution. Targeting is the word, whether based on behaviors (modeled via tracking and clickstream analysis) or on analytically computed matching. Matches may draw from user profiles, context (geography, accessing application, device or machine being used) and inferred intent (for instance from search terms), and the semantic signatures of the content where ads are to be delivered.

• Text analytics provides essential capabilities in support of legal-domain e-discovery mandates. Organizations must “produce” materials relevant to lawsuits, a task that would often be impossible without automated text processing, given huge volumes of electronically stored information generated in the course of business.

  Intellectual property is another legal-domain application. The task is to identify names, terminology, properties, and functions salient to an IP search that seeks to identify, for instance, prior art and possible patent infringement.

Business Functions

Many business tasks are independent of industry. Every organization of any significant size has in-house customer support, marketing, product development, and similar functions (even while definitions of customer, marketing, and product do still, of course, vary by industry). Let’s examine the role text and content analytics play for the following:

• Customer experience management (CEM) is a signal text/content analytics success story. The aim is to transform customer relationship management (CRM), which captures transactions and interactions, into a set of tools and practices that cover the engagement span from customer acquisition to customer service and support, first and foremost by listening and responding to the voice of the customer across channels.

  In plain(er) English, CEM marries text- and speech-sourced information (from e-mail, online forums, surveys, contact-center conversations, and other touchpoints, and also from employee input) with transactional and profile information. The hope is to improve customer satisfaction and operate more efficiently and profitably.

  Simplistic, reductive indicators such as the Net Promoter Score can only point at issues and challenges. They can neither explain them nor suggest actions or remedies, insights that are accessible (at enterprise scale) only via text and content analytics.



• Marketers translate market-research and competitive-intelligence findings into marketing campaigns and advertising and, in cooperation with product developers, into higher quality, more satisfying products and services. It’s all about listening. Steve Rappaport, in his book Listen First!, says we should “Change the research paradigm. Social media listening research should bring about an era of real-time data that anticipates change and can be used to visualize and create a rewarding business future,” as well as “rethink marketing, advertising, and media.” His prescriptions about listening apply across channels and touchpoints, as they do for CEM. The difference here, for research-related functions, is that we are looking at an aggregate rather than an individualized picture, seeking to hear the voice of the market, again aided by text and content analytics. Our aim is to deliver targeted, compelling advertising via more effective marketing and, of course, superior products and services that better meet customer needs.



• Competitive intelligence, in particular, involves mining customer voices, at both individual and aggregate levels, and also business information, for instance about sales, personnel, alliances, and market conditions that indicate opportunities and threats. The ability to extract domain- and sector-focused information from online and social sources, and to integrate information from disparate sources in order to derive coherent signals, is essential, and it is delivered by analytically rooted technologies.



• Business intelligence (BI) was first defined, in the late 1950s, in terms of extraction and reuse of knowledge drawn from textual sources.[3] BI took off in a different direction, however, starting in the late 1960s, centering on analysis of numerical data captured in computerized corporate operational and transactional systems. Back to the (1950s) Future: Number crunchers of all stripes recognize the business value of information in text sources. They are seeking, with the help of both major and niche BI and data warehousing vendors, to bring text-sourced information into enterprise BI initiatives. Call this integrated analytics, also incorporating geospatial and machine-generated Big Data to bring businesses a step closer to the sought-after (although mythical) 360° view of the customer (and the market and one’s own business).

Technology Domains

Last, for context, let’s briefly consider technology domains where text and content analytics come into play, semantics and the Semantic Web, and then look at emerging text analytics applications.

Semantic Computing

First, to redefine terms: text/content analytics involves the acquisition, processing, analysis, and presentation of enterprise, online, and social information derived from text and rich-media sources. The technology is one route to semantics, to generating machine-usable identification of information objects attached to databases, tables, fields, and rows; to corpora, documents, and document content; and to media files, e-mail and text messages.

[3] Seth Grimes, “BI at 50 Turns Back to the Future,” InformationWeek, November 21, 2008: http://www.informationweek.com/news/software/bi/211900005

Text/content analytics provides a descriptive route to semantics, making sense of information in the wild, as generated by humans (and machines) online, on social platforms, and in everyday business and personal communications, whether written, spoken, or captured in rich media. The alternative route to semantics is prescriptive, generated or captured in the course of content generation, whether via database export or a plug-in to an authoring application.

The semantics market includes technologies for the creation, management, and use of artifacts such as taxonomies, ontologies, thesauruses, gazetteers, semantic networks, controlled vocabularies, and metadata. These artifacts may be generated manually by subject-matter experts. They may be generated automatically by text analytics. And in many situations, a hybrid system involving manual curation of automatically generated artifacts may be in order.

Semantics applications include digital content management, publishing, research, and librarianship across a broad set of industrial and government applications. The semantics market includes semantic search, whether open-domain, vertical (applied to a particular information domain), or horizontal (applied in a particular business function). It also includes classification and information integration.

Classification, Search, and Integration

Semantic computing finds its primary application in classification, search, and integration. Classification determines what a data item or object represents, including how it may be used, in relation to other data items and objects in a data space. This is, admittedly, an abstract and not particularly practical definition. Information integration and search are where semantics finds its most compelling applications.


Semantic search is, in essence, “search made smarter, search that seeks to boost accuracy by taming ambiguity via an understanding of context.”[4] Several approaches fit under the semantic search umbrella. They include related searches, search-results enrichment, concept searches, faceted search, and more. The common thread is better matching searcher intent (inferred from search context including past searches and the searcher’s profile) to searched-for information content. Semantic search, fueled by text and content analytics, is behind many emerging search-based applications such as e-discovery, faceted navigation for online commerce, and search-driven business intelligence.

And it is captured semantics, in the form of data identifiers and descriptions, that enables dynamic, adaptive information integration, where join paths are discovered based on business and application needs, not hard-wired as in until-recent computing generations.
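
As a rough illustration of discovering join paths from captured semantics rather than hard-wiring them, the following Python sketch tags each column with a semantic identifier and proposes a join wherever two tables share one. The table names, column names, and identifiers such as customer:id are hypothetical, invented for this example; a production system would presumably draw them from curated metadata or an ontology.

```python
from itertools import combinations

# Hypothetical tables: each column name maps to a semantic identifier.
# None of these names come from the report; they are illustrative only.
TABLES = {
    "crm_accounts": {"acct_id": "customer:id", "region": "geo:region"},
    "support_tickets": {"cust_ref": "customer:id", "opened": "time:date"},
    "web_reviews": {"author": "customer:id", "product": "product:sku"},
    "catalog": {"sku": "product:sku", "list_price": "money:price"},
}


def discover_join_paths(tables):
    """Return (table_a, column_a, table_b, column_b) tuples for every pair
    of columns in different tables that share a semantic identifier."""
    paths = []
    for (name_a, cols_a), (name_b, cols_b) in combinations(tables.items(), 2):
        for col_a, sem_a in cols_a.items():
            for col_b, sem_b in cols_b.items():
                if sem_a == sem_b:
                    paths.append((name_a, col_a, name_b, col_b))
    return paths


if __name__ == "__main__":
    for path in discover_join_paths(TABLES):
        print("%s.%s <-> %s.%s" % path)
```

The column names differ across tables (acct_id, cust_ref, author), yet the shared identifier is enough to propose the join, which is the point of the paragraph above.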

The Semantic Web

The Semantic Web is, at its root, an information-integration and sharing application, a set of standards and protocols designed to facilitate creation and use of a “Web of data.” Eventually, the Semantic Web market will include tools and services that execute knowledge-reliant business transactions over distributed, semantically infused data spaces. We are years from that market.

The bulk of Semantic Web focused expenditures are for government funded research projects at universities and similar institutions. Outside research contexts, business implementations do not extend significantly beyond a) the use of microformats and RDFa (Resource Description Framework in Attributes) to allow Web-published structured data to be indexed by search engines to facilitate information access and b) the use of RDF triples as a convenient format for structuring facts for storage in DBMSes supporting graph-database schemas to facilitate integration and query of data from disparate sources.

[4] Seth Grimes, “Breakthrough Analysis: Two + Nine Types of Semantic Search,” InformationWeek, January 21, 2010: http://www.informationweek.com/news/software/bi/222400100

At a certain point, however, perhaps in 2-4 years, the Semantic Web will reach a tipping point where its business value, and the revenues generated by technology and solutions sales, licensing, and support, will explode.
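
To make the RDF-triple usage mentioned above more concrete, here is a minimal sketch in plain Python, with no RDF library assumed. Facts are held as (subject, predicate, object) tuples and queried by pattern. All identifiers (the ex: names) are invented for illustration and belong to no real vocabulary.

```python
# Facts as (subject, predicate, object) triples. Every identifier here is
# hypothetical; a real deployment would use URIs from shared vocabularies.
TRIPLES = [
    ("ex:AcmeCorp", "ex:acquired", "ex:WidgetCo"),
    ("ex:AcmeCorp", "ex:headquarteredIn", "ex:Boston"),
    ("ex:WidgetCo", "ex:makes", "ex:Widgets"),
    ("ex:ReviewSite", "ex:mentions", "ex:WidgetCo"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


if __name__ == "__main__":
    # Everything known about one entity, regardless of which source said it.
    print(match(TRIPLES, subject="ex:AcmeCorp"))
    # Every fact that points at WidgetCo.
    print(match(TRIPLES, obj="ex:WidgetCo"))
```

Because every fact has the same three-part shape, facts from disparate sources can be pooled into one list and queried uniformly, which is the convenience the paragraph on RDF triples refers to.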

Value Today

At this time, text/content analytics delivers business value that is greater by far than the value delivered by related semantic and Semantic Web technologies. This is because the vast majority of subject information (text, images, audio, and video, a.k.a. content) is in unstructured form, just a string of bytes (and terms, in the case of text) so far as software systems (Web browsers and office productivity tools, content management systems, search engines) are concerned.

To make content tractable for business ends, for operational or analytical purposes or in order to monetize content as a product, one must first create structure. To maximize content usability, for most social and for many enterprise sources, generated structure will take into account semantic information extracted from source materials. That is, structure shouldn’t be arbitrary, a matter of sticking information into a set of round pigeonholes for square-peg content.

This process of the discovery, extraction, and use of semantic information in content is the domain of text/content analytics solutions.

Solution Providers

The aggregate characteristics of the text and content analytics solution-provider spectrum are little changed since 2009, although there has been significant turnover in players. We still have, as reported in 2009, “a significant cadre of young pure-play software vendors, software giants that have built or acquired text technologies, robust open-source projects, and a constant stream of start-ups, many of which focus on market niches or specialized capabilities such as sentiment analysis.”


The big change is in delivery mode. The market now favors as-a-service analytics, whether in the form of online applications, cloud provisioned, or provided via Web application programming interfaces (APIs). This shift makes sense.

• The most in-demand new information sources are online, social, and on-cloud.
• Use of as-a-service, cloud, and via-API applications means low up-front investment, faster time to use, and pay-as-you-go pricing without IT involvement.
• Certain providers offer as-a-service access to both historical and current data at attractive costs given the buy-once, sell-many-times economies they enjoy.
• Modern applications are designed to draw data via APIs, facilitating application-inclusion of plug-in text and content analytics capabilities.

There is every expectation that the solution-provider market will continue to evolve to keep pace with user needs and broad-market business and technical trends.



Demand-Side Perspectives

Alta Plana designed a 2011 survey, “Text/Content Analytics demand-side perspectives: users, prospects, and the market,” to collect raw material for an exploration of key text-analytics market-shaping questions:

• What do customers, prospects, and users think of the technology, solutions, and vendors?
• What works, and what needs work?
• How can solution providers better serve the market?
• Will your companies expand their use of text analytics in the coming year? Will spending on text/content analytics grow, decrease, or remain the same?

It is clear that current and prospective text/content-analytics users wish to learn how others are using the technology, and solution providers of course need demand-side data to improve their products, services, and market positioning, to boost sales and better satisfy customers. The Alta Plana study therefore has two goals:

• To raise market awareness and educate current and prospective users.
• To collect information of value to solution providers, both study sponsors and non-sponsors.

Survey findings, as presented and analyzed in this study report, provide a form of measure of the state of the market, a form of benchmark. They are designed to be of use to everyone who is interested in the commercial text/content-analytics market.

Study Context

The author previously explored market questions in a number of papers and articles. These included white papers created for the Text Analytics Summit in 2005, “The Developing Text Mining Market,”[5] and 2007, “What's Next for Text.”[6]

A systematic look at the demand side provides a good complement to provider-side views and to vendor- and analyst-published case studies, including the author’s own. This understanding motivated the 2009 study, “Text Analytics 2009: User Perspectives on Solutions and Providers,” available for free download.[7]

That research was preceded by Alta Plana’s 2008 study report, “Voice of the Customer: Text Analytics for the Responsive Enterprise,”[8] published by BeyeNETWORK.com, a first systematic survey of demand-side perspectives, albeit focused on a particular set of business problems. VoC analysis is frequently applied to enhance customer support and satisfaction initiatives, in support of marketing, product and service quality, brand and reputation management, and other enterprise feedback initiatives.

About the Survey

There were 224 responses to the 2011 survey, which ran from June 6 to July 9, 2011. (Contrast with 116 responses to the 2009 survey, which ran from April 13 to May 10, 2009.)





[5] http://altaplana.com/TheDevelopingTextMiningMarket.pdf
[6] http://altaplana.com/WhatsNextForText.pdf
[7] http://altaplana.com/TA2009
[8] http://altaplana.com/BIN-VOCTextAnalyticsReport.pdf



Survey invitations

The author solicited responses via

• E-mail to the TextAnalytics, SentimentAI, Corpora, Lotico, BioNLP, Information-Knowledge-Content-Management, and ContentStrategy lists and the author’s personal list.
• Invitations published in electronic newsletters: InformationWeek, BeyeNETWORK, CMSWire, KDnuggets, AnalyticBridge, and Text Analytics Summit.
• Notices posted to LinkedIn forums and Facebook groups and on Twitter.
• Messages sent by sponsors to their communities.

Survey introduction

The survey started with a definition and brief description as follows:

Text Analytics / Content Analytics is the use of computer software or services to automate
• annotation and information extraction from text: entities, concepts, topics, facts, and attitudes,
• analysis of annotated/extracted information,
• document processing: retrieval, categorization, and classification, and
• derivation of business insight from textual sources.

This is a survey of demand-side perceptions of text technologies, solutions, and providers. Please respond only if you are a user, prospect, integrator, or consultant. There are 21 questions. The survey should take you 5-10 minutes to complete.

For this survey, text mining, text data mining, content analytics, and text analytics are all synonymous.

I'll be preparing a free report with my findings. Thanks for participating!

Seth Grimes (grimes@altaplana.com, +1 301-270-0795)

The introduction ended with the text:

Privacy statement: This survey records your IP address, which we will use only in an effort to detect bogus responses. It is your choice whether to provide your name, company, and contact information. That information will not be shared with sponsors without your permission, and if shared with sponsors, it will not be linked to your survey responses.

Survey response

There is little question that the survey results overweight current text-analytics users, who account for 73% of respondents who answered Q1, “How long have you been using Text Analytics?” (n=224), and 78% of respondents who replied to Q7, “Are you currently using text/content analytics?” (n=206), relative to the broad set of potential business, government, and academic users. (The difference in percentage is likely due to a higher rate of survey abandonment among non-users. The figures contrast with 63% and 61% in the 2009 survey.) So call this a Pac Man question, one whose response indicates very significant survey selection bias:




Market Size and the Larger BI Market

We can infer overweighting by comparing market-size figures. The author estimates an $835 million 2010 global market for text/content-analytics software and vendor supplied support and services. As the author described in the May 12, 2011 InformationWeek article “Text-Analytics Demand Approaches $1 Billion,”[9]

“My $835 million market-size estimate covers software licenses, service subscriptions, and vendor-provided technical support and professional services. Despite strong growth, it remains a small fraction of Gartner's $10.5 billion 2010 valuation of the broader BI, analytics, and performance-management software market.”[10]

By contrast, the 2009 text-analytics market report cited the author’s figure of $350 million for the global, 2008 text analytics market. (That figure did not account for search-based applications, which were included in the 2010 market-size estimate.) The 2009 report also cited a 2008 BI-market estimate from research firm IDC: “The business intelligence tools software market grew 6.4% in 2008 to reach $7.5 billion.”[11]
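
The comparison can be made concrete with a little arithmetic. The Python sketch below uses only the figures quoted above to express text analytics as a share of the broader BI market in dollar terms, for contrast with the 78% of survey respondents who report current use. It is a rough gauge under stated assumptions: the Gartner and IDC figures define the BI market differently, the 2008 text-analytics figure excludes search-based applications, and a share of spending is not the same thing as a share of users.

```python
def share_percent(part_millions, whole_millions):
    """Express one market-size figure as a percentage of another."""
    return 100.0 * part_millions / whole_millions


# Figures in millions of US dollars, as quoted in the text.
text_2010, bi_2010 = 835, 10_500   # author's estimate vs. Gartner valuation
text_2008, bi_2008 = 350, 7_500    # author's estimate vs. IDC estimate

share_2010 = share_percent(text_2010, bi_2010)
share_2008 = share_percent(text_2008, bi_2008)

# Share of survey respondents who said they currently use the technology (Q7).
survey_user_rate = 78.0

if __name__ == "__main__":
    print(f"2010 dollar share of BI market: {share_2010:.1f}%")
    print(f"2008 dollar share of BI market: {share_2008:.1f}%")
    print(f"Survey respondents currently using: {survey_user_rate:.0f}%")
```

A single-digit share of BI spending set against a user rate near 80% among respondents is consistent with the selection bias described above, although the two quantities measure different things.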


The Data Mining Community

Another contrasting data point is that 65% of respondents to a July 2011 KDnuggets poll[12] (n=121) report using text analytics on projects in the preceding year. Results were tallied nine days into the poll, before it was closed, so final numbers may differ from those reported here. The figure in a similar, March 2009 poll was 55% currently using text analytics/text mining.





[9] http://www.informationweek.com/news/software/bi/229500096
[10] http://www.gartner.com/it/page.jsp?id=1642714
[11] http://www.idc.com/getdoc.jsp?containerId=217443
[12] http://www.kdnuggets.com/2011/07/poll-text-analytics-use.html

[Chart: Are you currently using text/content analytics? (n=206) Yes: 78.2%; No: 21.8%]


KDnuggets: How much did you use text analytics / text mining in the
past 12 months?


KDnuggets reaches data miners, a technically sophisticated audience who are among the most likely of any market segment to have embraced text analytics. The rate of text-analytics adoption by data miners surely exceeds the rate of adoption by any other user sector.

As an aside, 49% of KDnuggets respondents stated that in comparison to the last 12 months, in the next 12 they would use text analytics more, whether on additional projects or more intensively on a steady project workload. 43% stated their use would remain about the same and only 8% anticipated less use.

KDnuggets poll responses (n=121):

Response                            Share of respondents
Did not use                         34.7%
Used on < 10% of my projects        19.0%
Used on 10-25% of projects          14.9%
Used on 26-50% of my projects        9.9%
Used on over 50% of my projects     21.5%


Demand-Side Study 2011: Response

The subsections that follow tabulate and chart survey responses, which are presented without unnecessary elaboration.


Q1: Length of Experience

As in 2009, the 2011 survey opened with a basic question, “How long have you been using Text/Content Analytics?”

We see that 2011 responses skew to longer experience than measured in 2009. Survey results were not based on a scientifically designed or measured population sample, however, neither in 2011 nor in 2009. Given how out of proportion survey-measured experience is to that of the broad business population (the addressable market for text/content analytics likely extends far beyond the current user base), the most plausible conclusion one can draw from Q1 responses is that 2011 survey outreach failed to bring in the proportion of new and prospective users reached in 2009. Nonetheless, Q1 responses will prove illuminating in analyses of subsequent survey questions, in studying how attitudes vary by length of text/content analytics experience.

How long have you been using Text/Content Analytics?

Response                               2009 (n=107)   2011 (n=224)
not using, no definite plans to use    16%            6%
currently evaluating                   22%            21%
less than 6 months                     8%             3%
6 months to less than one year         5%             5%
one year to less than two years        7%             12%
two years to less than four years      18%            20%
four years or more                     25%            33%


Q2: Application Areas


The 219 respondents in 2011 chose a total of 748 primary applications, an average of 3.4 primary applications per respondent. While there is some category overlap, it is notable that respondents are applying text analytics toward multiple business needs.


[Chart: What are your primary applications where text comes into play? 2011 (n=219) compared with 2009 (n=103). Categories shown: Law enforcement; Military/national security/intelligence; Content management or publishing; Insurance, risk management, or fraud; Other; Financial services/capital markets; Online commerce including shopping, price intelligence, reviews; E-discovery; Life sciences or clinical medicine; Product/service design, quality assurance, or warranty claims; Customer service/CRM; Competitive intelligence; Research (not listed); Search, information access, or Question Answering; Voice of the Customer / Customer Experience Management; Brand/product/reputation management.]


Q3: Information Sources


[Chart: What textual information are you analyzing or do you plan to analyze? 2011 (n=215) compared with 2009 (n=100). Categories shown: warranty claims/documentation; video or animated images; insurance claims or underwriting notes; photographs or other graphical images; patent/IP filings; point-of-service notes or transcripts; medical records; crime, legal, or judicial reports or evidentiary materials; speech or other audio; field/intelligence reports; employee surveys; text messages/SMS/chat; Web-site feedback; contact-center notes or transcripts; scientific or technical literature; e-mail and correspondence; review sites or forums; customer/market surveys; on-line forums; news articles; blogs and other social media.]


The 215 respondents in 2011 chose a total of 962 textual-information sources, an average of 4.5 sources per respondent. The big news is not news at all: Social sources are by far the most popular and 4 of the top 5 categories are social/online (as opposed to in-enterprise) sources. Despite social’s status, however, it is a source for barely more than 6 out of 10 respondents.




Q4: Return on Investment

Question 4 asked, “How do you measure ROI, Return on Investment? Have you achieved positive ROI yet?” There were 164 respondents. Results are charted from highest to lowest values of the sum of “currently measure” and “plan to measure”:

Out of 164 respondents, 37.8% (62) report that they have achieved positive ROI according to some measure. Those 62 respondents reported achieving ROI according to a total of 182 measures, that is, 2.94 ROI-achieved measures for each respondent who achieved positive ROI.

Out of 164 respondents, 50 are measuring ROI but have not yet achieved positive ROI according to any measure.

The 112 respondents who are measuring ROI (whether achieved or not) track a total of 385 measures among them, 3.44 measures per respondent.
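
The ratios quoted above can be re-derived from the reported counts. The following is a minimal Python check that uses only numbers stated in the text; the variable names are ours, chosen for readability.

```python
# Counts as reported for Question 4.
respondents = 164            # everyone who answered Q4
achieved = 62                # achieved positive ROI on at least one measure
measuring_not_achieved = 50  # measuring ROI, none achieved yet
achieved_measures = 182      # measures on which ROI was reported achieved
tracked_measures = 385       # all measures tracked, achieved or not

# Respondents measuring ROI at all (achieved or not).
measuring = achieved + measuring_not_achieved

pct_achieved = 100 * achieved / respondents
achieved_per_achiever = achieved_measures / achieved
tracked_per_measurer = tracked_measures / measuring

if __name__ == "__main__":
    print(f"Measuring ROI: {measuring} respondents")
    print(f"Achieved positive ROI: {pct_achieved:.1f}% of respondents")
    print(f"ROI-achieved measures per achiever: {achieved_per_achiever:.2f}")
    print(f"Measures tracked per measuring respondent: {tracked_per_measurer:.2f}")
```

The check confirms that the 62 and 50 respondents together make up the 112 who measure ROI, and that the remaining 52 of the 164 respondents are not yet measuring it.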

The following are several of the Other responses given:

• Better customer insight, market intelligence, and competitive intelligence.
• Content findability.
• Creation of scientific knowledge.
• Higher employee engagement and better L&D outcomes.
• Improvement in existing processes, turnover time.

How do you measure ROI, Return on Investment? (percent of respondents, n=164)

Measure                                                  Achieved   Not achieved   Plan to measure
higher satisfaction ratings                              19%        18%            28%
increased sales to existing customers                    13%        18%            29%
ability to create new information products               11%        13%            27%
improved new-customer acquisition                        9%         15%            25%
higher customer retention/lower churn                    10%        12%            23%
higher search ranking, Web traffic, or ad response       10%        12%            22%
reduction in required staff/higher staff productivity    9%         9%             23%
fewer issues reported and/or service complaints          9%         6%             23%
lower average cost of sales, new & existing customers    5%         7%             23%
faster processing of claims/requests/casework            10%        6%             19%
more accurate processing of claims/requests/casework     6%         7%             20%




• Incremental sales lift.
• Lowered cost of fraud, more accurate predictive analytics.
• Number of action executives can take, estimated dollar savings from risk correction/avoidance.
• Patient outcomes.
• Providing better data to scholars.
• Reduction of Claim Cost.
• Stronger understanding of subconscious emotional zones.
• We don’t know how to measure it properly.

Q5: Mindshare

A word cloud, generated at Wordle.net, seemed a good way to present responses to the query, “Please enter the names of companies that you know provide text/content analytics functionality, separated by commas. List up to the first 8 that come to mind.” There were 129 responses, many offering several companies. A bit of data cleansing was done, to regularize names and remove inappropriate responses.

Contrast with the 2009 word cloud (deliberately rendered smaller than the 2011 cloud, without an attempt to create sizing consistent between the two clouds) based on 48 response records, as follows:

Note that IBM acquired SPSS in mid-2009.





Q6: Spending

Question 6 asked about 2010 spending and 2011 expected spending.

How much did your organization spend in 2010, and how much do you expect to spend in 2011, on text/content analytics software/service solutions?

Spending level                   2010 spent (n=176)   2011 expected (n=165)
$1 million or above              6%                   7%
$500,000 to under $1 million     2%                   3%
$200,000 to $499,999             4%                   6%
$100,000 to $199,999             7%                   7%
$50,000 to $99,000               9%                   7%
under $50,000                    23%                  30%
use open source                  15%                  19%

Questions asked of only current text/content-analytics users

Questions 8 through 13 were posed exclusively to current text/content analytics users, that is, to the 78.2% of the 206 respondents who answered yes to Q7, “Are you currently using text/content analytics?”

Q8: Satisfaction

Question 8 asked, “Please rate your overall experience - your satisfaction - with text analytics.” It offered five categories, listed here with response counts:

• Overall experience/satisfaction (n=117, of whom 3 No experience/No opinion).
• Ability to solve business problems (n=114, 12 NE/NO).
• Solution/technology ease of use (n=112, 5 NE/NO).
• Solution/technology performance (n=114, 4 NE/NO).
• Availability of professional services/support (n=112, 13 NE/NO).

Responses, which across categories are somewhat anomalous, are as shown:

Overall, 70% of current-user respondents who had an opinion reported themselves Satisfied/Completely Satisfied even while the breakout-category counts totaled 59%, 36%, 47%, and 42% Satisfied/Completely Satisfied. We can surmise that the numbers who voiced “No experience/No opinion” for the breakout categories tended to have a favorable overall experience.

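Please rate your overall experience - your satisfaction - with text/content analytics

Category                                         Completely satisfied   Satisfied   Neutral   Disappointed   Very disappointed
Overall experience/satisfaction                  12%                    58%         24%       4%             3%
Ability to solve business problems               17%                    42%         31%       7%             3%
Solution/technology ease of use                  9%                     27%         38%       21%            4%
Solution/technology performance                  12%                    35%         36%       13%            4%
Availability of professional services/support    11%                    31%         36%       17%            4%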



Q9: Overall Experience

Question 9 asked, “Please describe your overall experience - your satisfaction - with text analytics.” The following are 49 from among the 63 responses, categorized, lightly edited for spelling and grammar, and with the names of three products masked:

Happy

It works.

Excellent.

Absolutely essential.

Very satisfied, most goals exceeded, big jump in effectiveness and customer satisfaction.

Pretty happy given we are in a highly technical different to monitor/track niche.

Saving a lot of time for our journalists.

We have found having an application with the capabilities to clean and normalize the text and quantitative data, process it to a form to analyze, and run text mining and categorization on an ad hoc or production basis has greatly enhanced my team's capabilities and productivity.

We found great value from using a Speech Analytics solution to retain customers and improve the overall customer experience through root-cause analysis.

I have been working with text analytics for academic and scientific purposes and I
am quite satisfied with results achieved.

I work with nurse and social science researchers. They think that a chat with 20 people is research. I tend to analyze hundreds or thousands of free-text comments.
I use software to overcome the biases inherent in manual analysis.

[Chart: Experience/satisfaction sentiment polarity (Positive, Neutral, Negative) for each rating category: Overall experience/satisfaction; Ability to solve business problems; Solution/technology ease of use; Solution/technology performance; Availability of professional services/support.]

It Takes Work

Very
powerful tool but requires the organization's ability to take action on the
insights.

Valuable tool; my clients are content to underutilize it, so what is available more
than meets our needs.

Since we use open source, the ROI is basically how much time you put into the solution and how many problems it solves. We have been successful so far.

Very Satisfied but extremely labor intensive

We provide this as a tool to our clients in our application for publishing press
releases. It works fine but could be
better but that is up to us to implement it fully.

Once you spend man hours to set up the tool, it is extremely consistent on doing what you tell it to do. I know improvements are coming but I'd like more AI from text analytics tools than what is currently offered.

Do
-
It
-
Yourself is challenging but not impossible. Very cheap to operate.

Fairly satisfied - problem is I am sole researcher and data/text clean-up takes too much time given other demands.

I've been a user and vendor of text analytics (in fact, in my early <...> days, we helped coin the phrase “text analytics”). Vendors generally overpromise and have difficulty delivering. Both vendors and customers underestimate the amount of resources required to get it right. So, still hard to use for mainstream purposes.

Reservations and complications

Steep learning curve.

I am currently satisfied, but I believe we (as analysts) are just beginning to fully
unlock the full potential of text analytics.

On one hand, I'm amazed and thrilled that this stuff
exists at all. But on the other
hand, I haven't seen anything that does just what I want it to do.

It's opened up opportunities to analyze unstructured data but not at the same level
as structured data.

Works well at highest level of analysis (e.g. sentiment) but not as well in
auto-coding for custom (i.e. project) studies.

Tools are good, but lack transparency, ability to explain how conclusions are
reached.

There is still a lot of work required to optimize this technology since it can currently
provide concepts but does not capture context and it's a lot of slow painful work to
get the software to recognize context in which something is mentioned and
accuracy is still not a lot.

Unmet needs

Very promising technology but some difficulties to

- Implement smoothly text mining component into existing information system.
- Cope with various languages, formats, volumes, etc. of data.
- Measure and demonstrate tangible results in terms of improved information
  extraction quality.
- Assess ROI (reducing processing time / saving resources for core tasks e.g.
  analysis).

Powerful but overly difficult, impenetrable - technology vs. solutions.

An emerging and enabling technology in our business with broad applicability.
Satisfied in our applications with accuracy and precision but hitherto disappointed
with export capability to other applications.

Still a volatile market for applications beyond VOC/sentiment analysis. Vendors are
eager to please but sometimes overstate the capabilities. However, I still have
limited experience in solving real business problems with these tools (I am a
consultant).

I think this field is in its infancy. Lots of issues with data quality. Sentiment
analytics often flawed. Hard to scale or automate.

The handful of companies and solutions I came across do not seem to marry or
integrate structured and unstructured text easily... Algorithms are not quite
available as a function or way to improve accuracy.

I feel there is so much more work to be done both on the analysis side and also on
the business implementation side. While I work heavily in this area, I won't be
more satisfied until I see better end-to-end integration and until I see more
effective and systematic use of insights.

I do everything myself. The lack of good lexical resources and taxonomies is a real
problem that drives up the cost (in manpower) of providing a solution. And the
complexity of the infrastructure required vs. the apparent simplicity of the
problem (in managers' minds) makes it very difficult to adjust expectations.

We use <...> and we have to write our own routines to find the text and content
that we are interested in. There are plenty of functions that help us with our goals
but obviously there is still much that we need to do to higher
recall and accuracy.

<...> is the only tool which is both open source and professionally useful. However
in spite of 20 years of development, it still has a very poor user interface as well as
API interface which hinder productivity and acceptance at a beginner's level.

Skepticism

Jury is still out.

It's still evolving, accuracy of results something to watch for in iterations.

Still learning.

Very early days!

Promising but still very difficult to see quick results. Everything seems to take ages
and it's been a painful learning curve.

Hard to trust the automated results when you've been used to achieving 100% with
manual human analysis.

Still too new.

Field as a whole is underperforming what is possible.

Though the concept is very appealing, it is still in its native stages, and a lot more
possibilities are left to be explored. IBM Watson is a good step ahead in that
direction.

Very poor, almost useless.

Looking ahead

On the whole, very satisfied with the range of solutions available and their ease of
use. Very much looking forward to watching the technology progress