Making Watson Fast

18 Nov 2013

Daniel Brown

HON111


Watson needs to be fast to play Jeopardy! successfully


All computations have to be done in a few seconds


Initial application speed: 1-2 hours processing time per question


Unstructured Information Management Architecture (UIMA): framework for NLP applications; facilitates parallel processing


UIMA-AS: Asynchronous Scaleout


UIMA chosen at start for these reasons; other optimization work only began after 2 years (after QA accuracy/confidence improved)

UIMA implementation of DeepQA


Type System


Common Analysis Structure (CAS)


Annotator


CAS multiplier (CM): creates new “children” CASes


Flow Controller



CASes can be spread across multiple systems (processed in parallel) for efficiency
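
The fan-out described above can be sketched in a few lines. This is a minimal illustration in Python, not UIMA's actual Java API: `split_into_children`, `annotate`, the dictionary standing in for a CAS, and the placeholder score are all hypothetical, and a thread pool stands in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_children(parent_cas):
    """Hypothetical CAS multiplier: emit one child CAS per candidate answer."""
    return [
        {"question": parent_cas["question"], "candidate": candidate}
        for candidate in parent_cas["candidates"]
    ]

def annotate(child_cas):
    """Hypothetical annotator: attach a placeholder score to a child CAS."""
    child_cas["score"] = len(child_cas["candidate"])  # stand-in for real analysis
    return child_cas

def process_in_parallel(parent_cas, workers=4):
    """Fan the children out to a pool of workers and collect them in order."""
    children = split_into_children(parent_cas)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(annotate, children))

parent = {
    "question": "This city is the capital of France",
    "candidates": ["Paris", "Lyon", "Marseille"],
}
results = process_in_parallel(parent)
```

Because each child CAS is independent of its siblings, the same pattern would extend from threads to processes on other machines, which is the property the slide attributes to CASes.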

Scaling out


Two systems:


Development (+question processing)


Meant to analyze many questions accurately


Production (+speed)


Meant to answer one question quickly


Scaling out: UIMA-AS


(UIMA-AS: Asynchronous Scaleout)


Manages multithreading and the communication between processes necessary for parallel processing


Feasibility test: simulated production system with 110 processes, 110 8-core machines


Goal: less than 3 seconds; actual: more than 3 seconds


Two sources of latency: CAS serialization, network communication


Optimizing CAS serialization resulted in runtime of <1s
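
To see why the serialization format matters for latency, the sketch below encodes the same annotations in a verbose text form and in a compact binary form. It is illustrative only: the annotation layout (begin, end, type id) and both encodings are assumptions, not the formats UIMA-AS used.

```python
import json
import struct

# Hypothetical annotations: (begin offset, end offset, type id).
annotations = [(i * 5, i * 5 + 4, i % 7) for i in range(1000)]

def to_text(annos):
    """Verbose encoding: one JSON object per annotation."""
    return json.dumps(
        [{"begin": b, "end": e, "type": t} for b, e, t in annos]
    ).encode("utf-8")

def to_binary(annos):
    """Compact encoding: a count, then three 32-bit ints per annotation."""
    parts = [struct.pack("<I", len(annos))]
    parts.extend(struct.pack("<iii", b, e, t) for b, e, t in annos)
    return b"".join(parts)

def from_binary(data):
    """Decode the compact encoding back into a list of tuples."""
    (count,) = struct.unpack_from("<I", data, 0)
    return [struct.unpack_from("<iii", data, 4 + 12 * i) for i in range(count)]

text_bytes = to_text(annotations)
binary_bytes = to_binary(annotations)
```

Fewer bytes per CAS means less time spent both encoding and on the network, which are the two latency sources named above.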

Scaling out: Deployment


400 processes, 72 machines


How to find time bottlenecks in such a system?


Monitoring tool


Integrated timing measurements (in flow controller component)
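
One way to picture timing measurements inside a flow controller is the sketch below. `TimedFlowController` and the two toy steps are hypothetical names, and the real monitoring tool is not described on the slide; the point is only that the component routing each CAS is a natural place to record per-step time.

```python
import time
from collections import defaultdict

class TimedFlowController:
    """Hypothetical flow controller that times each step it routes a CAS through."""

    def __init__(self, steps):
        self.steps = steps                # list of (name, callable) pairs
        self.totals = defaultdict(float)  # step name -> accumulated seconds
        self.calls = defaultdict(int)     # step name -> number of invocations

    def process(self, cas):
        for name, step in self.steps:
            start = time.perf_counter()
            cas = step(cas)
            self.totals[name] += time.perf_counter() - start
            self.calls[name] += 1
        return cas

    def bottleneck(self):
        """Return the step name with the largest accumulated time."""
        return max(self.totals, key=self.totals.get)

def fast_step(cas):
    return cas + ["parsed"]

def slow_step(cas):
    time.sleep(0.02)  # simulate an expensive search step
    return cas + ["searched"]

controller = TimedFlowController([("parse", fast_step), ("search", slow_step)])
for _ in range(3):
    output = controller.process([])
```

Calling `controller.bottleneck()` after a batch of questions points at the step worth optimizing first.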

RAM Optimizations


Wanted to avoid disk read/write time delays, so all (production system) data was put into RAM


Some optimizations:


Reference size reduction


Java object size reduction


Java object overhead


String size


Special hash tables


Java garbage collection with large heap sizes


*Full GC between games
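
The object-overhead idea carries across languages: storing many small records as individual objects costs a header and a reference per record, while packing the same fields into primitive arrays does not. The sketch below shows the pattern in Python as a stand-in for the Java case. The field names are hypothetical, the exact byte counts depend on the interpreter, and the slide does not say which specific techniques Watson used.

```python
import sys
from array import array

N = 10_000

# One object per record: every tuple carries its own header and references.
records = [(i, i * 2) for i in range(N)]

# Parallel primitive arrays: one contiguous buffer per field.
doc_ids = array("i", (i for i in range(N)))
offsets = array("i", (i * 2 for i in range(N)))

def lookup(index):
    """Rebuild a record on demand from the packed arrays."""
    return doc_ids[index], offsets[index]

# Rough memory estimate for each layout (container plus per-record objects).
object_bytes = sys.getsizeof(records) + sum(sys.getsizeof(r) for r in records)
packed_bytes = sys.getsizeof(doc_ids) + sys.getsizeof(offsets)
```

The trade-off is convenience: the packed layout saves memory but gives up per-record objects, so access goes through an index-based helper such as `lookup`.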

Indri Search Optimizations


Indri search: used to find most relevant 1-2 sentences from Watson database


Using single processor, primary search takes too long (i.e. 100s)


Supporting evidence search even longer


Solution?


Divide corpus (body of information to search) into chunks, then assign each search daemon a chunk


(specifically, 50GB corpus of 6.8 million documents, 79 chunks of 100,000 documents each, 79 Indri search daemons with 8 CPU cores each; end result, 32 passage queries could be run at once)
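
The chunk-and-merge idea can be sketched with a toy corpus. This does not use Indri: `search_chunk` is a hypothetical stand-in for a search daemon that scores by simple term overlap, and threads stand in for the separate daemons.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def chunk_corpus(documents, chunk_size):
    """Split the corpus into fixed-size chunks, one per search daemon."""
    return [documents[i:i + chunk_size] for i in range(0, len(documents), chunk_size)]

def search_chunk(chunk, query_terms, k):
    """Hypothetical search daemon: score by term overlap, return the local top k."""
    scored = []
    for doc_id, text in chunk:
        score = len(set(text.lower().split()) & query_terms)
        if score:
            scored.append((score, doc_id))
    return heapq.nlargest(k, scored)

def search(chunks, query, k=2):
    """Query every chunk in parallel, then merge the partial results."""
    terms = set(query.lower().split())
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partials = pool.map(lambda chunk: search_chunk(chunk, terms, k), chunks)
        merged = [hit for part in partials for hit in part]
    return heapq.nlargest(k, merged)

corpus = [
    (0, "Paris is the capital of France"),
    (1, "Berlin is the capital of Germany"),
    (2, "The Seine flows through Paris"),
    (3, "Mount Everest is the tallest mountain"),
    (4, "France borders Germany and Spain"),
]
chunks = chunk_corpus(corpus, chunk_size=2)
top_hits = search(chunks, "capital of France")
```

Each daemon only returns its local top `k`, so the merge step handles a small, fixed amount of data per chunk regardless of corpus size.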

Preprocessing and Custom Content Services


Watson must first analyze the passage texts before being able to use them


Deep NLP analysis: semantic/structural parsing, etc.


Since Watson had to be self-contained, this analysis could be done before run time (preprocessed)


Used Hadoop (distributed storage and data-processing software)


50 machines, 16GB/8 cores each

Preprocessing and Custom Content Services


Retrieving the preprocessed data?


Preprocessed data much larger than unprocessed corpus (~300GB total)


Built custom content server


Allocated data to 14 machines, ~20GB each


Documents were then accessed from these servers
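
A minimal sketch of how preprocessed documents might be spread across and fetched from a set of content servers follows. The server count comes from the slide; the hash-based routing, the `ContentCluster` class, and the in-memory dictionaries are assumptions made for illustration, not the design of Watson's content server.

```python
import zlib

NUM_SERVERS = 14  # from the slide; the routing scheme below is an assumption

def server_for(doc_id, num_servers=NUM_SERVERS):
    """Map a document id to a server index using a stable hash."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_servers

class ContentCluster:
    """Hypothetical in-memory stand-in for the custom content servers."""

    def __init__(self, num_servers=NUM_SERVERS):
        self.servers = [dict() for _ in range(num_servers)]

    def put(self, doc_id, analysis):
        self.servers[server_for(doc_id, len(self.servers))][doc_id] = analysis

    def get(self, doc_id):
        return self.servers[server_for(doc_id, len(self.servers))].get(doc_id)

cluster = ContentCluster()
for n in range(1000):
    cluster.put(f"doc-{n}", {"parse": f"preprocessed analysis of doc-{n}"})

fetched = cluster.get("doc-42")
```

Because the hash is stable, any client can work out which server holds a document without a central lookup, and each server only needs to keep its own share of the data in RAM.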

End result


Parallel processing combined with a number of other performance optimizations resulted in a final average latency of less than 3 seconds.


No one “silver bullet” solution