

Oct 21, 2013


Anybody can Contribute Knowledge using KnowBuddy

Paul Haley

Automata, Inc.

(412) 741

We describe how a group of distributed collaborators have used web-based software to economically curate machine understanding of a chapter of a textbook deeply enough that the resulting artificial intelligence can answer more difficult questions than previously demonstrated in systems such as IBM’s Watson, Apple’s Siri, and Stanford Research Institute’s Aura.

We discuss how the approach promises to impact the web and transform education, healthcare, and enterprise intelligence.

Google excels at “recall”

Responding to simple questions is not a problem.

Does Google “know” what a lipid bilayer is? Could Google answer “What is the structure of an integral protein?”

Recalling information is not the same as understanding or knowledge.

Google doesn’t have “knowledge”, per se; Google “recalls”.

Will blood cells in a hypertonic environment burst?

A question from a college biology textbook.

Google finds Yahoo! Answers responses to this common question from biology students, but even those findings are indirect and, in some cases, misleading.

KnowBuddy answers “no”, correctly.

Why can’t search engines do better?

Statistical analysis of even vast amounts of text
produces only superficial correlations between
anything more complex than simple words.

Statistical analysis requires exponentially more
data and processing power for each increase in
the depth of what it might discover.

Statistical analysis produces information which
may be ranked for relevance but which becomes
knowledge only when used in reasoning.
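The exponential-data claim above can be made concrete with a back-of-the-envelope sketch: the space of distinct word sequences grows as V^n with sequence length n, so each additional level of depth multiplies the text needed to observe patterns reliably. The vocabulary size below is an assumed, typical order of magnitude, not a figure from this deck.

```python
# Back-of-the-envelope sketch: the number of distinct n-grams grows as V**n,
# so the data needed to observe deeper patterns grows exponentially with n.
V = 50_000  # assumed vocabulary size (illustrative, not from the deck)

def distinct_ngrams(n: int) -> int:
    # Upper bound on distinct word sequences of length n over a V-word vocabulary
    return V ** n

for n in range(1, 5):
    print(f"{n}-grams: up to {distinct_ngrams(n):.3e} distinct sequences")
```

Even at n = 3 the space already exceeds 10^14 sequences, far more than any corpus can populate densely.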

IBM on Watson vs. search engines

The bottom line is that the Jeopardy! Challenge poses a different kind of problem than what is solved by web search. It demands that the computer analyze the question to figure out exactly what is being asked, deeply analyze the available content to extract precise answers, and quickly compute a reliable confidence in light of whatever supporting or refuting information it finds.

IBM believes that an effective and general solution to this challenge can help drive the broader impact of automatic question answering in science and the enterprise.

IBM on QA using logic


Logic-based AI approaches to QA try to logically prove that an answer is correct from a logical encoding of the question and all the domain knowledge required to answer it.

Such approaches are stymied by two problems: the prohibitive time and manual effort required to acquire massive volumes of knowledge and formally encode it as logical formulas accessible to computer algorithms, and the difficulty of understanding natural language questions well enough to exploit such formal encodings, if available.

Techniques for dealing with huge amounts of natural language text, such as Information Retrieval, suffer from nearly the opposite problem in that they can always find documents or passages containing some keywords in common with the query but lack the precision, depth, and understanding necessary to deliver correct answers with accurate confidences.

The Knowledge Acquisition Bottleneck

IBM is talking about the typically prohibitive cost of
knowledge acquisition (KA)

KA cost has come down only for superficial &
approximate knowledge using statistical NLP

IBM implies the logic for which KA is expensive is beyond the ability of statistical NLP to discern.

IBM lacks such knowledge while admitting it is needed to overcome the limitations of statistical NLP.

Why not QA using logic?

What if it was “easy” to acquire massive volumes of knowledge formally encoded as logic?

What if it was “easy” to understand natural language questions well enough to exploit such formal encodings?

What does KnowBuddy do?

KnowBuddy allows English text to be precisely
understood as formal logic almost as quickly
as such text can be authored in the first place.

KnowBuddy makes it easy to understand natural language sentences, including questions, well enough to exploit formal logic.

How does KnowBuddy work?

KnowBuddy helps people curate English
documents into bodies of knowledge.

KnowBuddy translates knowledge from
English into the formal logic to be exploited.

KnowBuddy answers questions by exploiting
the formal logic using reasoning technology.
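As a minimal sketch of the three steps above, assuming a toy “Every A is a B” sentence pattern (not KnowBuddy’s actual grammar or logic representation), curated English can be translated into implication rules and a simple reasoner can then answer subsumption questions:

```python
# Hypothetical miniature of the pipeline: curated English -> formal rules ->
# reasoning. The sentence pattern and reasoner are illustrative only.
import re

def translate(sentence: str) -> tuple[str, str]:
    # "Every phospholipid is a lipid." -> ("phospholipid", "lipid")
    m = re.match(r"Every (.+) is an? (.+)\.", sentence)
    if not m:
        raise ValueError("unsupported sentence form")
    return m.group(1), m.group(2)

def entails(rules: list[tuple[str, str]], a: str, b: str) -> bool:
    # Does the rule base prove "every a is a b"? (reachability over rules)
    frontier, seen = [a], {a}
    while frontier:
        x = frontier.pop()
        if x == b:
            return True
        for lhs, rhs in rules:
            if lhs == x and rhs not in seen:
                seen.add(rhs)
                frontier.append(rhs)
    return False

rules = [translate(s) for s in [
    "Every phospholipid is a lipid.",
    "Every lipid is a hydrophobic molecule.",
]]
print(entails(rules, "phospholipid", "hydrophobic molecule"))  # True
```

The point is the division of labor: people make the translation step precise, and an off-the-shelf reasoner does the rest.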

How do people curate knowledge?

KnowBuddy helps people clarify what sentences mean with an easy-to-use, graphical interface.

The grammatical structure and logical semantics of sentences become increasingly precise as people click on or drag & drop words, phrases, clauses, etc.

People produce high quality logic for moderately complex sentences in 4 minutes (on their 1st …).

Its real-time, wiki-like platform facilitates collaborative curation of grammatical, logical, and semantic precision across all the sentences in the knowledge base.

How much knowledge? How fast?

Distributed people contribute concurrently.

Several people tend to improve each sentence.

Users are more competent in English than in logic.

Fully encoding thousands of sentences covering a chapter of a college science textbook took a few minutes per sentence, on average.

High quality formal logic is produced an order of magnitude more quickly than with prior approaches.

How is 10x KA productivity possible?

Essentially, KnowBuddy focuses on productivity rather than on automating understanding, which is far beyond the state of the art and error prone.

KnowBuddy assists people with discerning the logical semantics of English sentences.

It uses statistical techniques to improve productivity but makes no assumptions or guesses.

Why KnowBuddy doesn’t guess…

The “parse” ranked first by machine learning techniques is
usually not
the one with the right syntax & semantics

In the example below, the 13th-ranked interpretation is the correct one.
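The combinatorics behind that ranking problem can be sketched quickly: even before word-sense and semantic choices, the number of possible binary parse trees for a k-word sentence is the Catalan number C(k−1), which grows so fast that a ranker’s first guess is rarely the fully correct syntax and semantics. This is an illustrative calculation, not a description of KnowBuddy’s parser.

```python
from math import comb

def parse_trees(k: int) -> int:
    # Number of distinct binary bracketings (parse trees) of k words:
    # the Catalan number C(k-1) = binom(2(k-1), k-1) / k
    n = k - 1
    return comb(2 * n, n) // (n + 1)

for k in (5, 10, 20):
    print(f"{k} words: {parse_trees(k):,} possible binary parses")
```

A 20-word sentence already has over a billion bracketings, which is why human disambiguation clicks beat rank-1 guessing.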


What does improving knowledge acquisition
productivity by an order of magnitude imply?

What does increasing the number of people who
can perform KA by another 10x imply?

10x as much knowledge per contributor

10x as many contributors

<$ per paid contributor

$0 per wiki contributor

Implications of >100x KA

Legions of world-wide web collaborators curating scientific, medical, and legal wikis into broader, deeper formal knowledge systems than previously conceivable (e.g., to IBM).

revolutionary in education, science, healthcare, …

Enterprise performance management systems in more direct and
reliable compliance with enterprise requirements and regulations.

perhaps $1,000,000,000,000 in value within 10 years

Intelligent expert systems bringing orders of magnitude more knowledge to bear.

revolutionary in legal, engineering, healthcare, …

Precise answers to
increasingly deep questions.

revolutionary in education,
the web,
machine, …

A Textbook Example

If a Paramecium swims from a hypotonic environment to an isotonic environment, will its contractile vacuole become more active?

As IBM admits, such questions are far beyond the
foreseeable capabilities of statistical NLP to answer reliably.

KnowBuddy can even explain its answer to this question.

The answer is “no”.
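To illustrate why this becomes tractable once the biology is encoded as logic, here is a hypothetical miniature of such curated knowledge; the rule names and representation are illustrative, not KnowBuddy’s. A contractile vacuole expels excess water, so its activity tracks net water influx, which in turn depends on the tonicity of the environment:

```python
# Hypothetical curated rules (illustrative only): environment tonicity
# determines net water influx, and vacuole activity tracks that influx.
water_influx = {
    "hypotonic": "high",   # water enters the cell
    "isotonic": "low",
    "hypertonic": "none",  # water leaves the cell
}

def vacuole_more_active(before: str, after: str) -> bool:
    # The vacuole becomes MORE active only if net water influx increases.
    order = {"none": 0, "low": 1, "high": 2}
    return order[water_influx[after]] > order[water_influx[before]]

# Hypotonic -> isotonic: influx drops, so the vacuole becomes LESS active.
print(vacuole_more_active("hypotonic", "isotonic"))  # False
```

With the causal chain stated explicitly, the “no” follows by a two-step deduction; no amount of keyword matching over text retrieves that chain.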

Why is it easy for KnowBuddy but impractical for Watson?

Cognitive Skills for QA

Educators use the Bloom scale to organize how they teach and how they measure understanding and learning.
The base of the Bloom scale distinguishes recalling
information from understanding knowledge

search engines & statistical NLP are hardly cognitive

The middle of the Bloom scale distinguishes using
knowledge to solve problems from understanding.

skills at and above this level are beyond Watson et al.

Where does KnowBuddy fit on Bloom?

The Paramecium question requires cognitive skills ranging from the 3rd to the 5th level.


In the biology KA experiment discussed here, subject matter experts rated it as Bloom level #4.

KnowBuddy can answer some level #5 questions but is
currently most competent between levels #3 and #4.

The blood cell question was judged level #3.

A level #5 question near current competence:

Would an animal cell lacking oligosaccharides on the external surface of its membrane have a reduced ability to transport ions against an electrochemical gradient?

How do statistical approaches rate?

Search engines fail before level #2

Watson demonstrates level #2 capability

Watson has no innate level #3 capability

(applying knowledge)

Does this mean an AI is near?

First, measuring cognitive capabilities of systems should not be taken to imply that
they are sentient.

Second, performing well at the higher levels of the Bloom scale is less about
having knowledge than using it well. Although Watson is impressive and
KnowBuddy can make it more so, becoming even more intelligent requires further
advances in problem solving methodologies and reasoning technology.

Third, spoken language input for question asking (in addition to spoken answers, such as demonstrated by Watson) requires more forms of inference than have yet been well integrated, including deductive (aka logical), inductive (i.e., statistical), and abductive (e.g., hypothetical) inference.

Then, perhaps by 2020, after millions of sentences formalized by thousands of people are combined with something like Watson combined with something like …, it may seem there is AI in the ether.

If not AI, what fruit is near at hand?

Electronic textbooks or web services that measurably improve
educational outcomes and standardized or advanced placement
test performance.

Information retrieval in which results are more precisely relevant
than foreseeable using statistical techniques (e.g., answers).

Legal, governance, risk, compliance, and operational systems that rigorously apply formal logic contained in contracts, legislation, regulation, policy and doctrine.

Decision support systems of greater breadth, depth, reliability and
overall competence, such as in financial or healthcare applications.

Reduced application life cycle costs for so-called “business logic”.

Case 1: Educational Impact

Inquire: An Intelligent Textbook

“Best Video” Award, AAAI 2012

New Scientist, August 7, 2012

Earlier this year, the team recruited 72 first-year students from De Anza College in Cupertino, California, to put the system to the test. Students were given either the full Inquire system, the Inquire system with the query function switched off, or a paper copy of Campbell Biology. They were then asked to spend 60 minutes reading a section of the book, 90 minutes on homework problems, and to take a 20-minute-long quiz.

Those who used the full Inquire system scored a grade better on the quiz, on average, than the other groups. “When we did our assessment, we didn’t see any Ds or Fs, which we did see in the control groups,” says Debbie Frazier, a high school biology teacher who works on the project. “Our students could use Inquire as a tool and ask it questions that they might be embarrassed to ask a teacher in person because it makes them feel stupid.”

But Vulcan needed a solution to the following problem:

While such results are promising, perhaps it’s a little soon to crown Inquire the future of textbooks. For starters, after two years of work the system is still only half-finished. The team plan to encode the rest of the 1400-page Campbell Biology by the end of 2013, but they expect a team of 18 biologists will be needed to do so. This raises concerns about whether the project could be expanded to cover other areas of science, let alone other subjects.

Which was the genesis of the experiment discussed here.

By reducing the time and skill required to encode knowledge by at least an order of magnitude, the business case for knowledge-based textbooks such as Campbell Biology is now compelling.