
Software and Hardware Requirements for Next-Generation Data Analytics

John Feo

Center for Adaptive Supercomputing Software

Pacific Northwest National Laboratory

October 2010

Graphs are everywhere in science

Astrophysics
Problem: outlier detection.
Challenges: massive datasets, temporal variations.
Graph problems: clustering, matching.

Bioinformatics
Problem: identifying drug target proteins.
Challenges: data heterogeneity, quality.
Graph problems: centrality, clustering.

Social Informatics
Problem: discover emergent communities, model spread of information.
Challenges: new analytics routines, uncertainty in data.
Graph problems: clustering, shortest paths, flows.

… and in commerce

Sample queries:
Allegiance switching: identify entities that switch communities.
Community structure: identify the genesis and dissipation of communities.
Phase change: identify significant changes in the network structure.
Thought leaders: identify influential individuals that drive events.

Graph features:
Topology: the interaction graph is low-diameter and has no good separators.
Irregularity: communities are not uniform in size.
Overlap: individuals are members of one or more communities.


Facebook has more than 300 million active users: 1000x growth in 3 years!



Small-world and scale-free

Low diameter (small-world):
work explodes
difficult to partition/load-balance
high % of nodes are visited quickly
“Six degrees of separation”

Scale-free (power-law):
difficult to partition/load-balance
work concentrates in a few nodes
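The concentration of work in a few nodes can be illustrated with a toy generator (my sketch, not from the talk): plain-Python preferential attachment, which yields a scale-free degree distribution in which a handful of hubs hold a disproportionate share of the edge endpoints.

```python
import random

def preferential_attachment(n, m=2, seed=42):
    """Grow a scale-free graph: each new vertex attaches to up to m
    existing vertices chosen roughly proportionally to degree."""
    rng = random.Random(seed)
    targets = list(range(m))     # seed vertices
    repeated = []                # each vertex appears once per incident edge
    degree = [0] * n
    for v in range(m, n):
        for t in targets:
            degree[v] += 1
            degree[t] += 1
            repeated.extend([v, t])
        # Sampling from 'repeated' realizes degree-proportional choice.
        targets = list({rng.choice(repeated) for _ in range(m)})
    return degree

deg = sorted(preferential_attachment(10_000), reverse=True)
top1pct = sum(deg[:100]) / sum(deg)
print(f"max degree {deg[0]}, top 1% of vertices hold {top1pct:.0%} of endpoints")
```

Any static partition of such a graph puts a hub's entire neighborhood on one partition's critical path, which is why load balancing is hard.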

[Figure: ratio of edges cut (0.25, 0.50, 1.00) vs. number of partitions, for block and k-way partitioning of an RMAT graph with a million vertices]

Grids, Erdős–Rényi, and Scale-Free Graphs

[Figure: communication traces from execution of ½-approx weighted matching on the USA road map, an Erdős–Rényi graph, and a scale-free graph; data distributed using Metis]

Challenges

Problem size
Ton of bytes, not ton of flops

Little data locality
Have only parallelism to tolerate latencies
Low computation-to-communication ratio

Single-word access
Threads limited by loads and stores
Synchronization points are simple elements: node, edge, record

Work tends to be dynamic and imbalanced
Let any processor execute any thread

System requirements

Global shared memory
No simple data partitions
Local storage for thread-private data
Network support for single-word accesses
Transfer multiple words when locality exists

Multithreaded processors
Hide latency with parallelism
Single-cycle context switching
Multiple outstanding loads and stores per thread

Full-and-empty bits
Efficient synchronization
Wait in memory

Message-driven operations
Dynamic work queues
Hardware support for thread migration
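Commodity hardware lacks full-and-empty bits, but their semantics can be emulated in software. A minimal sketch (my hypothetical `FEWord` class, not an XMT API; the XMT exposes these as memory operations along the lines of readfe/writeef) of a tagged word where a read on an empty word waits until a write fills it:

```python
import threading

class FEWord:
    """Software emulation of one memory word with a full/empty tag bit."""
    def __init__(self):
        self._cv = threading.Condition()
        self._full = False
        self._val = None

    def writeef(self, v):
        """Wait until the word is empty, write, and set it full."""
        with self._cv:
            while self._full:
                self._cv.wait()
            self._val, self._full = v, True
            self._cv.notify_all()

    def readfe(self):
        """Wait until the word is full, read, and set it empty."""
        with self._cv:
            while not self._full:
                self._cv.wait()
            self._full = False
            self._cv.notify_all()
            return self._val

w = FEWord()
out = []
reader = threading.Thread(target=lambda: out.append(w.readfe()))
reader.start()       # the reader "waits in memory" until a producer writes
w.writeef(17)
reader.join()
print(out)           # [17]
```

The hardware version makes this per-word wait essentially free, which is what "wait in memory" and "synchronization points are simple elements" mean in practice.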


Cray XMT

Center for Adaptive Supercomputing Software

Driving development of next-generation multithreaded architectures and methods for irregular problems.

[Diagram: DATA from scientific simulations, sensor networks, the Internet, and databases feeds data analytics, knowledge discovery, and trend analysis for science, policy, and commerce]

Sponsored by DOD

Partners

Analytic methods and applications:

Community thought leaders: blog analysis, community activities (Facebook: 300M users)

Connect-the-dots: national security; people, places, & actions
[Diagram: connect-the-dots example linking Bus, Hayashi, Zaire, Train, Anthrax, Money, Endo]

Semantic Web: anomaly detection, security

N-x contingency analysis: SmartGrid

Research focus areas

Spanning architecture, runtime system, languages, methods, and applications:

Chapel for hybrid systems
Next-generation multithreaded architectures
Communication software for hybrid systems
Performance analysis and tools
Compiler and runtime system
MapReduce, clustering
Computer security, bioinformatics

Application areas: SmartGrid, sensor networks, mesh generation, N-x contingency analysis, semantic databases, Bayesian networks, social networks

Methods for data analytics

Paths: shortest path, betweenness, min/max flow
Structures: spanning trees, connected components, graph isomorphism
Groups: matching/coloring, partitioning, equivalence
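As one concrete instance of the "structures" methods, a serial connected-components sketch over an adjacency list (plain Python for illustration only; the XMT versions of these kernels are parallel):

```python
from collections import deque

def connected_components(adj):
    """Label every vertex with a component id using BFS."""
    comp, next_id = {}, 0
    for src in adj:
        if src in comp:
            continue
        comp[src] = next_id
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp[v] = next_id
                    q.append(v)
        next_id += 1
    return comp

# Toy undirected graph as an adjacency list.
adj = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3], 5: []}
labels = connected_components(adj)
print(labels)   # {0,1,2}, {3,4}, and {5} get three distinct labels
```

Note how the traversal touches single words (vertex labels, edge endpoints) with no locality, the access pattern the system-requirements slide is designed around.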

Influential factors

Degree distribution: normal or scale-free
Planar or non-planar
Static or dynamic
Weighted or unweighted; weight distribution
Typed or untyped edges


Load imbalance
Non-planar
Concurrent inserts and deletions
Difficult to partition

Systems for large-scale analytics

Cray XMT: the graph resides in XMT memory
RDBMS runs on a cluster (Netezza TwinFin)

[Figure: dynamic Bayesian network over the sensor variables vap, wspd_va, tbsky31, sky ir temp, precip-tbrg, percent_opaque, radar7, radar13, radar19, replicated per time step, with dependencies added across time steps (not shown)]

Dynamic Bayesian Network Model for Atmospheric Sensor Network Validation

Convert the dynamic Bayesian network to a junction tree for inferencing.

Each node in the junction tree is a clique or supernode containing several nodes from the original Bayesian network.

Junction-tree-based “evidence propagation” is an efficient method of propagating the effect of any variable’s state to every other variable in the BN.

DBN to Junction Tree Conversion

[Figure: the same sensor variables across three time steps, grouped into junction-tree cliques]

Model as variables: Temp, Cloudy, Rain
Model as cliques: {Temp, Cloudy}, {Cloudy, Rain}

Evidence propagation is highly irregular:

Compute per node is unbalanced
Degree per node is irregular
Data moves up and down

Loop parallelism intra-node
Task parallelism inter-node (recursion, futures)
Data flow scheduling
Data synchronization
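The inter-node task parallelism with futures can be sketched as follows: a toy upward (collect) pass over a hypothetical junction tree, with clique names and scalar potentials invented for illustration, where independent subtrees run concurrently and each clique waits only on its children's messages.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy junction tree: clique -> children, plus a scalar
# "potential" per clique standing in for a real clique table.
children = {"root": ["a", "b"], "a": ["a1", "a2"], "b": [], "a1": [], "a2": []}
potential = {"root": 2.0, "a": 0.5, "b": 3.0, "a1": 4.0, "a2": 1.0}

def collect(clique, pool):
    """Upward pass: launch child subtrees as futures, then combine this
    clique's potential with the children's messages (data-flow style:
    each clique waits only on its real dependencies)."""
    futs = [pool.submit(collect, child, pool) for child in children[clique]]
    msg = potential[clique]
    for f in futs:
        msg *= f.result()
    return msg

with ThreadPoolExecutor(max_workers=8) as pool:
    print(collect("root", pool))   # 2.0 * (0.5*4.0*1.0) * 3.0 = 12.0
```

On real junction trees the cliques differ wildly in size, which is exactly the per-node compute imbalance noted above; futures let the scheduler keep processors busy anyway.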



SMALL SYSTEMS HAVE 100S OF MILLIONS OF NODES

Atmospheric Sensor Network Validation Framework

Semantic analysis

Understanding the relationships among data:
Data-intensive science
National security
Commerce

Data and relationships are best expressed as triples and graphs:
<John owns Dog>

PNNL, SNL, Cray

Patient | Blue bumps | Pink rash | High fever
John    | Yes        | –         | Yes
Alice   | –          | Yes       | –
Mary    | –          | –         | Yes

The same data as a graph: John, Alice, and Mary link by “has symptom” edges to Blue bumps, Pink rash, and High fever.

Mayo Clinic’s patient database has 650K columns

XMT’s potential for semantic analysis

Machine                    | Programming model | Performance (inferences/sec)                 | Author
X86, 32 nodes, 128 cores   | MPI               | ~600K inf/sec                                | Weaver and Hendler (ISWC 2009)
X86, 64 nodes, 256 cores   | Hadoop            | ~550K–800K                                   | Urbani et al. (ESWC 2010)
256 Threadstorm processors | C++               | ~2.2M with read time; ~13M without read time |

RDFS closure

Inferring new relationships and attributes; rule based.

Original diagram from Urbani et al., “Scalable Distributed Reasoning using MapReduce,” ISWC 2009.

JOB 0: Transitive Closure
JOB 3: Delete Duplicates

<John studied under Jim Browne> + <Jim Browne teaches at UT Austin> → <John attended UT Austin>

865 million triples
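The two jobs named in the diagram can be mimicked in a few lines (a gross simplification, not Urbani et al.'s MapReduce implementation): a fixpoint transitive closure whose result set doubles as duplicate elimination, plus the slide's single chaining rule written as a join over triples.

```python
def transitive_closure(edges):
    """Fixpoint closure (cf. JOB 0); membership in the set 'closed' is
    the duplicate elimination of JOB 3."""
    closed = set(edges)
    while True:
        new = {(a, d) for (a, b) in closed for (c, d) in closed if b == c}
        if new <= closed:
            return closed
        closed |= new

# The slide's chaining example, written as a one-rule join over triples.
facts = {("John", "studied under", "Jim Browne"),
         ("Jim Browne", "teaches at", "UT Austin")}
inferred = {(s, "attended", o2)
            for (s, p, o) in facts if p == "studied under"
            for (s2, p2, o2) in facts if p2 == "teaches at" and o == s2}
print(inferred)   # {('John', 'attended', 'UT Austin')}
```

At 865 million triples the join and the duplicate check become the dominant costs, which is why the per-word synchronization and flat shared memory of the XMT pay off here.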

Summary

The new HPC is irregular and sparse

Bad news:
we need new architectures

Good news:
there are commercial and consumer applications

Shared memory is necessary, but not sufficient

Need processors that can fill the memory system with requests

Need memory systems that support millions of simultaneous requests

Need fine-grain hardware synchronization in memory