Anybody can Contribute Knowledge using KnowBuddy

Paul Haley
Automata, Inc.
paul@haleyAI.com
(412) 741-6420

We describe how a group of distributed collaborators used web-based software to economically curate machine understanding of a chapter of a textbook deeply enough that the resulting artificial intelligence can answer more difficult questions than previously demonstrated in systems such as IBM’s Watson, Apple’s Siri, and Stanford Research Institute’s Aura. We discuss how the approach promises to impact the web and to transform education, healthcare, and enterprise intelligence.

Google excels at “recall”

- Presenting information in response to simple questions is not a problem.
- Does Google “know” what a lipid bilayer is?
- Could Google teach? For example: what is the structure of an integral protein?
- Presenting information is not the same as understanding or answering questions.
- Google doesn’t have “knowledge”, per se.
- Google “recalls” information.

Will blood cells in a hypertonic environment burst?

- A question from a college biology textbook.
- Google finds Yahoo! Answers responses to this common question from biology students.
- But even those findings are indirect and, in some cases, misleading.
- KnowBuddy answers “no”, correctly.
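
To illustrate how a formal encoding can answer this question directly where search only retrieves pages, here is a toy sketch that hardcodes the relevant osmosis knowledge as rules and derives the “no”. The rule table and function names are invented for illustration; the deck does not show KnowBuddy’s actual logical representation.

```python
# Toy formal encoding of tonicity knowledge (hypothetical predicates,
# not KnowBuddy's actual logic): water moves toward the higher solute
# concentration, so a cell in a hypertonic environment loses water and
# shrivels; only a hypotonic environment drives enough water in to
# burst an animal cell.

RULES = {
    # environment -> net water movement relative to the cell
    "hypertonic": "out",   # water leaves the cell
    "isotonic":   "none",  # no net movement
    "hypotonic":  "in",    # water enters the cell
}

def cell_bursts(environment: str) -> bool:
    """An animal cell bursts (lyses) only if water flows into it."""
    return RULES[environment] == "in"

# The textbook question: will blood cells in a hypertonic
# environment burst?  The encoded knowledge yields "no":
answer = "yes" if cell_bursts("hypertonic") else "no"
print(answer)  # -> no
```

The point of the sketch is that once the knowledge is formal, the answer follows by reasoning rather than by matching keywords against documents.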

Why can’t search engines do better?

- Statistical analysis of even vast amounts of text produces only superficial correlations between anything more complex than simple words.
- Statistical analysis requires exponentially more data and processing power for each increase in the depth of what it might discover.
- Statistical analysis produces information, which may be ranked for relevance but which becomes knowledge only when used in reasoning.

IBM on Watson vs. search engines

- The bottom line is that the Jeopardy! Challenge poses a different kind of problem than what is solved by web search.
- It demands that the computer deeply analyze the question to figure out exactly what is being asked, deeply analyze the available content to extract precise answers, and quickly compute a reliable confidence in light of whatever supporting or refuting information it finds.
- IBM believes that an effective and general solution to this challenge can help drive the broader impact of automatic question answering in science and the enterprise.

IBM on QA using logic or NLP

- Classic knowledge-based AI approaches to QA try to logically prove an answer is correct from a logical encoding of the question and all the domain knowledge required to answer it. Such approaches are stymied by two problems:
  - the prohibitive time and manual effort required to acquire massive volumes of knowledge and formally encode it as logical formulas accessible to computer algorithms, and
  - the difficulty of understanding natural language questions well enough to exploit such formal encodings, if available.
- Techniques for dealing with huge amounts of natural language text, such as Information Retrieval, suffer from nearly the opposite problem: they can always find documents or passages containing some keywords in common with the query but lack the precision, depth, and understanding necessary to deliver correct answers with accurate confidences.


The Knowledge Acquisition Bottleneck

- IBM is talking about the typically prohibitive cost of knowledge acquisition (KA).
- KA cost has come down only for superficial & approximate knowledge using statistical NLP.
- IBM implies the logic for which KA is expensive is beyond the ability of statistical NLP to discern.
- IBM lacks such knowledge while admitting it is needed to overcome the limitations of statistical NLP.

Why not QA using logic and NLP?

- What if it were “cheap” to acquire massive volumes of knowledge formally encoded as logical formulas?
- What if it were “easy” to understand natural language questions well enough to exploit such formal encodings?



What does KnowBuddy do?

- KnowBuddy allows English text to be precisely understood as formal logic almost as quickly as such text can be authored in the first place.
- KnowBuddy makes it easy to understand natural language sentences, including questions, well enough to exploit formal logic.

How does KnowBuddy work?

- KnowBuddy helps people curate English documents into bodies of knowledge.
- KnowBuddy translates knowledge from English into the formal logic to be exploited.
- KnowBuddy answers questions by exploiting the formal logic using reasoning technology.

How do people curate knowledge?

- KnowBuddy helps people clarify what sentences mean with an easy-to-use, graphical interface.
- The grammatical structure and logical semantics of sentences become increasingly precise as people click on or drag & drop words, phrases, clauses, etc.
- People produce high-quality logic for moderately complex sentences in 1-4 minutes (on their 1st day).
- Its real-time, wiki-like platform facilitates collaborative curation of grammatical, logical, and semantic precision across all the sentences in the knowledge base.

How much knowledge? How fast?

- Distributed people contribute concurrently.
- Several people tend to improve each sentence.
- Typical users are more competent in English than in logic.
- Fully encoding thousands of sentences covering a chapter of a college science textbook took a few minutes per sentence, on average.
- High-quality formal logic is produced an order of magnitude more quickly than with prior approaches.

How is 10x KA productivity possible?

- Essentially, KnowBuddy focuses on productivity rather than on automating understanding, which is far beyond the state of the art and error prone.
- KnowBuddy assists with discerning the logical semantics of English sentences.
- KnowBuddy leverages statistical techniques to improve productivity but makes no assumptions or guesses.


Why KnowBuddy doesn’t guess…

- The “parse” ranked first by machine learning techniques is usually not the one with the right syntax & semantics.
- Below, the 13th-ranked interpretation is the correct one.
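
The curator-in-the-loop design this slide motivates can be sketched as a selection loop over ranked interpretations: the system enumerates statistically ranked candidates but never commits to one without human confirmation. The parser stand-in, the candidate readings, and their ranks below are all hypothetical.

```python
# Sketch of curator-in-the-loop disambiguation (hypothetical data):
# a statistical parser returns interpretations ranked by score, but
# the system never guesses -- a person confirms the choice.

def rank_interpretations(sentence: str) -> list:
    # Stand-in for a statistical parser: returns candidate logical
    # forms, best-scoring first.  Here the correct reading is ranked
    # 13th, as in the slide's example.
    wrong = ["wrong_reading_%d" % i for i in range(1, 13)]
    return wrong + ["correct_reading"]

def curate(sentence: str, choose) -> str:
    """Present ranked candidates; `choose` stands in for the
    human curator's pick via the graphical interface."""
    candidates = rank_interpretations(sentence)
    picked = choose(candidates)
    assert picked in candidates  # curator selects, never invents
    return picked

# A curator who recognizes the right reading picks rank 13:
chosen = curate("example sentence", lambda cands: cands[12])
print(chosen)  # -> correct_reading
```

The statistics speed the human up (the right reading is usually near the top of the list), but the commitment is always the curator’s, which is why the resulting logic carries no statistical guesses.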

>100x KA

- What does improving knowledge acquisition productivity by an order of magnitude imply?
- What does increasing the number of people who can perform KA by another 10x imply?
  - 10x as much knowledge per contributor
  - 10x as many contributors
  - <$ per paid contributor
  - $0 per wiki contributor
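
The multiplication behind the slide can be made explicit in a short calculation. The per-sentence minutes and “thousands of sentences” figure come from the earlier slide; the exact sentence count chosen below is illustrative.

```python
# Back-of-envelope for the ">100x" claim (illustrative numbers
# except where the deck states them).

productivity_gain = 10   # 10x as much knowledge per contributor
contributor_gain = 10    # 10x as many people able to perform KA

total_gain = productivity_gain * contributor_gain
print(total_gain)        # -> 100; ">100x" once near-free wiki
                         #    contributors are counted as well

# A chapter of "thousands of sentences" at "a few minutes per
# sentence" (deck's figures; 2000 and 3 are assumed values):
sentences = 2000
minutes_per_sentence = 3
person_hours = sentences * minutes_per_sentence / 60
print(person_hours)      # -> 100.0 person-hours per chapter
```

On these assumptions a chapter is a matter of person-days spread across a crowd, not the biologist-years the Inquire experience (discussed later in the deck) implied.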

Implications of >100x KA

- Legions of world-wide-web collaborators curating scientific, medical, and legal wikis into broader, deeper formal knowledge systems than previously conceivable (e.g., to IBM).
  - revolutionary in education, science, healthcare, …
- Enterprise performance management systems in more direct and reliable compliance with enterprise requirements and regulations.
  - perhaps $1,000,000,000,000 of value within 10 years
- Artificially intelligent expert systems bringing orders of magnitude more knowledge to bear.
  - revolutionary in legal, engineering, healthcare, …
- Precise answers to increasingly deep questions.
  - revolutionary in education, the web, man-machine, …


A Textbook Example

- If a Paramecium swims from a hypotonic environment to an isotonic environment, will its contractile vacuole become more active?
- As IBM admits, such questions are far beyond the foreseeable capabilities of statistical NLP to answer reliably.
- KnowBuddy can even explain its answer to this question.
  - The answer is “no”.
- Why is it easy for KnowBuddy but impractical for Watson?
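
A toy sketch of answering with an explanation, in the spirit of the slide’s claim that the answer can be justified, not just produced. The rule table, the influx ordering, and the function names are hypothetical illustrations, not KnowBuddy’s encoding.

```python
# Toy reasoner that both answers and explains (hypothetical rules):
# the contractile vacuole expels the water that osmosis drives into
# the cell, so its activity tracks net water influx.

WATER_INFLUX = {"hypotonic": "high", "isotonic": "none", "hypertonic": "negative"}

def vacuole_more_active(src: str, dst: str):
    """Does the contractile vacuole become MORE active when the cell
    moves from `src` to `dst`?  Returns (answer, explanation)."""
    order = {"negative": 0, "none": 1, "high": 2}
    before, after = WATER_INFLUX[src], WATER_INFLUX[dst]
    more = order[after] > order[before]
    explanation = [
        "in a %s environment, net water influx is %s" % (src, before),
        "in a %s environment, net water influx is %s" % (dst, after),
        "contractile vacuole activity rises and falls with water influx",
        "therefore the vacuole becomes %s active" % ("more" if more else "no more"),
    ]
    return more, explanation

answer, explanation = vacuole_more_active("hypotonic", "isotonic")
print("yes" if answer else "no")  # -> no
for step in explanation:
    print(" -", step)
```

The explanation is just the chain of rule applications, which is why a logic-based system can justify its answer while a statistical ranker cannot.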

Cognitive Skills for QA

- Educators use the Bloom scale to organize how they teach and how they measure understanding and intelligence.
- The base of the Bloom scale distinguishes recalling information from understanding knowledge.
  - search engines & statistical NLP are hardly cognitive
- The middle of the Bloom scale distinguishes using knowledge to solve problems from understanding it.
  - skills at and above this level are beyond Watson et al.


Where does KnowBuddy fit on Bloom?

- The Paramecium question requires cognitive skills ranging from the 3rd to the 5th level.
  - In the biology KA experiment discussed here, subject matter experts rated it as Bloom level #4.
- KnowBuddy can answer some level #5 questions but is currently most competent between levels #3 and #4.
  - The blood cell question was judged level #3.
  - A level #5 question near current competence: would an animal cell lacking oligosaccharides on the external surface of its membrane have a reduced ability to transport ions against an electrochemical gradient?

How do statistical approaches rate?

- Search engines fail before level #2 (understanding).
- Watson demonstrates level #2 capability.
- Watson has no innate level #3 capability (applying knowledge).

Does this mean an AI is near?

- First, measuring the cognitive capabilities of systems should not be taken to imply that they are sentient.
- Second, performing well at the higher levels of the Bloom scale is less about having knowledge than about using it well. Although Watson is impressive and KnowBuddy can make it more so, becoming even more intelligent requires further advances in problem-solving methodologies and reasoning technology.
- Third, spoken-language input for question asking (in addition to spoken answers, such as demonstrated by Watson) requires more forms of inference than have yet been well integrated, including deductive (a.k.a. logical), inductive (i.e., statistical), and abductive (e.g., hypothetical) inference.
- Then, perhaps by 2020, after millions of sentences formalized by thousands of people are combined with something like Watson and something like Cyc, it may seem there is AI in the ether.

If not AI, what fruit is near at hand?

- Electronic textbooks or web services that measurably improve educational outcomes and standardized or advanced placement test performance.
- Information retrieval in which results are more precisely relevant than foreseeable using statistical techniques (e.g., answers).
- Legal, governance, risk, compliance, and operational systems that rigorously apply formal logic contained in contracts, legislation, regulation, policy, and doctrine.
- Decision support systems of greater breadth, depth, reliability, and overall competence, such as in financial or healthcare applications.
- Reduced application life-cycle costs for so-called “business logic”.

Case 1: Educational Impact

- Inquire: An Intelligent Textbook (“Best Video” Award, AAAI 2012)
- New Scientist, August 7, 2012:

  Earlier this year, the team recruited 72 first-year students from De Anza College in Cupertino, California, to put the system to the test. Students were given either the full Inquire system, the Inquire system with the query function switched off, or a paper copy of Campbell Biology. They were then asked to spend 60 minutes reading a section of the book, 90 minutes on homework problems, and to take a 20-minute-long quiz.

  Students who used the full Inquire system scored a grade better on the quiz, on average, than the other groups. “When we did our assessment, we didn’t see any Ds or Fs, which we did see in the control groups,” says Debbie Frazier, a high school biology teacher who works on the project. “Our students could use Inquire as a tool and ask it questions that they might be embarrassed to ask a teacher in person because it makes them feel stupid.”

- But Vulcan needed a solution to the following problem:

  While such results are promising, perhaps it’s a little soon to crown Inquire the future of textbooks. For starters, after two years of work the system is still only half-finished. The team plan to encode the rest of the 1400-page Campbell Biology by the end of 2013, but they expect a team of 18 biologists will be needed to do so. This raises concerns about whether the project could be expanded to cover other areas of science, let alone other subjects.

- Which was the genesis of the experiment discussed here:
  - reducing the time and skill required to encode knowledge by at least an order of magnitude
  - the business case for knowledge-based textbooks such as Campbell Biology is now compelling