Disconnected Diagrams, Multi-grid, Nvidia


QCDNA 2008 - Sept 5, 2008, Rich Brower (Boston U.)


Disconnected Diagrams, Multi-grid, Nvidia & all that†

Richard Brower (Boston University)
James Brannick (Penn)
Ron Babich (BU)
Kipton Barros (BU)
Mike Clark (BU)
George Fleming (Yale)
James Osborn (Argonne)
Claudio Rebbi (BU)

QCDNA 2008, Regensburg, Sept 5, 2008

† WARNING: Much here is a FUTURE plan, NOT proven results, but .....



What is QCD?

Acronym   Definition
QCD       Qualified Charitable Distribution (IRS)
QCD       Quality, Cost, Delivery
QCD       Quantum Chromodynamics
QCD       QuarkCopyDesk (file extension)
QCD       Quasi-Cyclic Dyadic
QCD       Quick Change Directory
QCD       Quick Claim Deed (real estate)
QCD       Quintessential CD (PC media player)
QCD       Quit Claim Deed (real estate)
QCD       Quality Control Department



What do these mean?

QCD From Wikipedia, the free encyclopedia

Quintessential Player, formerly known as Quintessential CD
Quality, Cost, Delivery, a three-letter acronym used in lean manufacturing
Quad City DJ's, Southern rap group
Quick Control Dial, a control on many DSLR cameras, like the Canon EOS 40D
Quote-Comma-Delimited known also as Comma-separated values
Quantum chromodynamics, the theory describing the Strong Interaction





Outline

Physics: (How strange is the proton?)
Algorithms: (Multi-grid to the rescue?)
Hardware: (GPU propagator farm?)


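Physics: Disconnected Diagrams

Connected vs. Disconnected

Want matrix element:

[Figure: connected diagram (operator insertion X on a u,d quark line) and disconnected diagram (operator insertion X on a u,d,s quark loop) between nucleon states N; time labels t = 0, t, t']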


How strange† is the proton? Who cares?

Violation of Standard Model:
Dark Matter (Neutralino scattering):
NuTeV anomaly:

Nucleon Physics (include u/d + s quarks):
iso-scalar form factors, nucleon structure function, spin crisis for proton, matrix element etc.

† see Lattice 2008: http://conferences.jlab.org/lattice2008/parallel-bytopic-struct.html
S. Collins, G. Bali, A. Schäfer, “Hunting for the strangeness ... nucleon”
Takumi Doi et al., “Strangeness and glue in the nucleon from lattice QCD”
Ron Babich et al., “Strange quark content of the nucleon”



Direct detection of dark matter

In SUSY, the neutralino scatters from a nucleon via Higgs exchange:

The strange scalar matrix element is a major uncertainty:

Uncertainty in $f_{Ts}$ gives up to a factor of 4 uncertainty in the cross-section!

Bottino et al., hep-ph/0111229; Ellis et al., hep-ph/0502001


Nuclear Experiment

Pate et al., arXiv:0805.2889 [hep-ex]
J. Liu et al., arXiv:0706.0226 [nucl-ex]
(see also Young et al., nucl-ex/0605010)

Parity-violating electron scattering (SAMPLE, HAPPEx, PVA4, G0)
PVES + BNL E734 (νp scattering)






Algorithm

Monte Carlo update (long autocorrelation times)

Global heat bath, aka “Stochastic Estimator” (zero autocorrelations):

Find $\phi = D^{-1} \eta$ for $\eta$ Gaussian, Gauge, or $Z_2$ noise (zero autocorrelations!)

with $\langle \eta^\dagger_y \eta_x \rangle = \delta_{yx}$

$A_{xy}$
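As a hedged illustration of the stochastic estimator described on this slide, the sketch below estimates $\mathrm{Tr}\, D^{-1}$ with $Z_2$ noise. The operator here is a toy 1D periodic Laplacian with a mass term, not the Dirac operator, and the names `z2_trace_estimate`, `n_noise` and `mass_sq` are assumptions introduced only for this example.

```python
import numpy as np

def z2_trace_estimate(D, n_noise, rng):
    """Estimate Tr(D^{-1}) from n_noise independent Z2 noise vectors."""
    n = D.shape[0]
    samples = np.empty(n_noise)
    for k in range(n_noise):
        eta = rng.choice([-1.0, 1.0], size=n)  # Z2 noise: <eta_y eta_x> = delta_yx
        phi = np.linalg.solve(D, eta)          # phi = D^{-1} eta
        samples[k] = eta @ phi                 # eta^T D^{-1} eta is unbiased for the trace
    return samples.mean(), samples.std(ddof=1) / np.sqrt(n_noise)

# Toy stand-in for a lattice operator (an assumption, not the Dirac operator):
# 1D periodic Laplacian plus a mass term.
n, mass_sq = 32, 0.5
eye = np.eye(n)
D = (2.0 + mass_sq) * eye - np.roll(eye, 1, axis=1) - np.roll(eye, -1, axis=1)

rng = np.random.default_rng(0)
estimate, error = z2_trace_estimate(D, n_noise=400, rng=rng)
exact = np.trace(np.linalg.inv(D))
print(f"stochastic: {estimate:.3f} +/- {error:.3f}   exact: {exact:.3f}")
```

Each noise vector is drawn independently, which is why successive samples carry no autocorrelation; the price is the variance from the off-diagonal elements of $D^{-1}$.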




Improving Stochastic Estimate

Variance reduction:

Dilution vs hopping parameter† (Short distance)

Multi-grid vs “deflation”/truncation† (Long distance)

Curing volume divergence

Trace versus Gauge fluctuations

Better and more sources (all to all?).

Full multi-grid O(N log N) Trace?

† S. Collins, G. Bali, A. Schäfer, “Hunting for the strangeness ... nucleon”



Trace estimation

Two sources of error: gauge noise and error in trace. In this calculation, we largely eliminate the second source by calculating a “nearly exact” trace on four time-slices.

864 sources (x12 for color/spin). A given source is nonzero on 4 sites on each of 4 time-slices.

Minimal spatial separation between sites is [value not recovered]. Small residual contamination is gauge-variant and averages to zero.

Equivalent to using a single stochastic source with “extreme dilution.”

$4 \times 6^3 = 864$
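To make the link between dilution and a "nearly exact" trace concrete, here is a minimal sketch under stated assumptions. It uses the standard result that a single $Z_2$ estimate of $\mathrm{Tr}\,A$ for real symmetric $A$ has variance $2\sum_{i \neq j} A_{ij}^2$, restricted to pairs of sites that share a diluted source. The toy operator, the interlaced partition (`site % stride`) and the function name are illustrative choices, not the source layout used in the talk.

```python
import numpy as np

def z2_variance_with_dilution(A, stride):
    """Variance of a one-shot Z2 estimate of Tr(A), for real symmetric A, when the
    noise is diluted into `stride` interlaced subsets (site i is in subset i % stride)."""
    n = A.shape[0]
    idx = np.arange(n)
    same_subset = (idx[:, None] - idx[None, :]) % stride == 0
    off_diag = same_subset & ~np.eye(n, dtype=bool)
    # Only off-diagonal elements linking sites within one diluted source contribute.
    return 2.0 * np.sum(A[off_diag] ** 2)

# Toy operator (an assumption): 1D periodic Laplacian plus mass term.
n, mass_sq = 32, 0.5
eye = np.eye(n)
D = (2.0 + mass_sq) * eye - np.roll(eye, 1, axis=1) - np.roll(eye, -1, axis=1)
A = np.linalg.inv(D)

strides = [1, 2, 4, 8, n]
variances = [z2_variance_with_dilution(A, s) for s in strides]
for s, v in zip(strides, variances):
    print(f"{s:2d} diluted sources: variance {v:.3e}")
```

With one source per site the variance vanishes and the trace is exact. Sources that are nonzero on a few well-separated sites sit between the two extremes: the residual contamination comes only from propagator elements connecting those distant sites.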


Preliminary Methods

Configurations were provided by the LHPC “Spectrum Collaboration”

anisotropic lattice with [lattice parameters not recovered]

2 dynamical flavors, Wilson fermion and gauge actions

863 configurations

64 (x12) inversions per configuration at the light quark mass, for the nucleon correlators

864 (x12) inversions per configuration at the strange mass, for the trace


Strange scalar form factor



Ratio approach

Conventionally, one extracts the (e.g. zero-momentum) form factor from the large-$t$ behavior of the ratio

[equation not recovered]

(or from a similar expression integrated over time).

Instead, we fit the numerator directly, since this allows us

to avoid contamination from backward-propagating states, which are problematic due to the short temporal extent of our lattice ([value not recovered]).

to explicitly take into account the contribution of (forward-propagating) excited states.

In the following, we always treat the system symmetrically with [condition not recovered]


Direct fit

First, we perform a fit to the nucleon two-point function, of the form

[equation not recovered]

The coefficients and masses are very well-determined, since we are required to calculate correlators from all initial times (a total of 863 x 64 = 55,232).

Next, we perform a fit to the three-point function,

[equation not recovered]

Here $j_1$ and $j_2$ are the form factors for the proton and its first excited state, and $j_{12}$ is a transition matrix element between them. In practice, we expect $j_2$ and $j_{12}$ to absorb the contribution of still higher states, and trust only $j_1$ to be reliable.
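The fit forms themselves did not survive on this slide. As a hedged illustration only, a common two-state parametrization consistent with the description above is sketched here. The amplitudes $c_i$, masses $m_i$, the source at time 0, the operator at time $t'$ and the sink at time $t$ are all assumptions, and the normalization used in the talk may differ.

```latex
\begin{align*}
C_2(t) &= c_1 e^{-m_1 t} + c_2 e^{-m_2 t} \\
C_3(t, t') &= j_1 c_1 e^{-m_1 t} + j_2 c_2 e^{-m_2 t}
  + j_{12} \sqrt{c_1 c_2}
    \left[ e^{-m_1 t'} e^{-m_2 (t - t')} + e^{-m_2 t'} e^{-m_1 (t - t')} \right]
\end{align*}
```

In a form like this, $C_3 / C_2 \to j_1$ once both $t'$ and $t - t'$ are large enough to suppress the excited state, which is the limit the ratio approach relies on; fitting $C_3$ directly keeps the $j_2$ and $j_{12}$ terms explicit instead.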


Strange scalar form factor


For the renormalization-invariant quantity $f_{Ts}$, we estimate

[equation and numerical estimate not recovered]

where we have inserted the physical nucleon mass. The second error is the
uncertainty in relating this mass to the lattice scale, the first error is
statistical, and no other systematics are included.
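
For orientation, $f_{Ts}$ is conventionally defined in the dark-matter literature as the strange-quark contribution to the nucleon mass. The expression below is that standard definition, given here as an assumption about what the slide displayed; the slide's own equation and value were not recovered.

```latex
f_{Ts} = \frac{m_s \, \langle N | \bar{s} s | N \rangle}{M_N}
```

The product $m_s \bar{s} s$ is renormalization-group invariant in the continuum, and the denominator $M_N$ is where either the physical or the calculated nucleon mass is inserted.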


Note that the matrix element in the numerator was calculated for a world with a 400 MeV pion. If we work consistently in such a world by inserting our calculated nucleon mass, the scale dependence drops out, and we find

[numerical estimate not recovered]


Momentum dependence of $G_S(q^2)$

PRELIMINARY


Strange axial form factor

PRELIMINARY


Results have not been renormalized.


Calculated value is distinct from zero at the 3-σ level.



Error = $O(L^{3/2})$ as $L^3 \to \infty$

For exact trace in a connected correlator,

[Figure: correlator diagram with operator insertion X; time labels t = 0, t, t']


Most Important New Trick: Multi-grid Variance Reduction

The signal and variance of the first term are down by 1 to 2 orders of magnitude because $D_c \sim D$

The coarse-level trace for $D_c^{-1}$ is as cheap to calculate as the level-down operator inverse.

This can of course be done recursively, giving (I think) an O(N log N) trace calculation to fixed tolerance.
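The equation this slide refers to was not recovered. A minimal two-level sketch of the idea, under stated assumptions, is the split $\mathrm{Tr}\,D^{-1} = \mathrm{Tr}[D^{-1} - P D_c^{-1} R] + \mathrm{Tr}[D_c^{-1} R P]$, where the second trace lives on the coarse space. The toy operator, the piecewise-constant prolongator `P`, the Galerkin coarse operator and all variable names below are illustrative assumptions, not the multi-grid construction used by the authors.

```python
import numpy as np

n, block, mass_sq = 64, 4, 0.01

# Toy fine operator (an assumption): 1D periodic Laplacian plus a small mass term.
eye = np.eye(n)
D = (2.0 + mass_sq) * eye - np.roll(eye, 1, axis=1) - np.roll(eye, -1, axis=1)

# Prolongator P: piecewise-constant aggregation of `block` neighbouring sites.
P = np.kron(np.eye(n // block), np.ones((block, 1)))
R = P.T              # restriction
D_c = R @ D @ P      # Galerkin coarse operator

D_inv = np.linalg.inv(D)
D_c_inv = np.linalg.inv(D_c)

# Split: Tr D^{-1} = Tr[D^{-1} - P D_c^{-1} R] + Tr[D_c^{-1} R P]
remainder = D_inv - P @ D_c_inv @ R
coarse_trace = np.trace(D_c_inv @ R @ P)  # computed exactly on the small coarse space

def z2_variance(A):
    """Variance of a single Z2 estimate eta^T A eta for real symmetric A."""
    return 2.0 * (np.sum(A ** 2) - np.sum(np.diag(A) ** 2))

# Stochastic estimate of the remainder only; the coarse piece is added exactly.
rng = np.random.default_rng(1)
eta = rng.choice([-1.0, 1.0], size=(n, 200))
remainder_estimate = np.mean(np.sum(eta * (remainder @ eta), axis=0))
trace_estimate = remainder_estimate + coarse_trace

exact_trace = np.trace(D_inv)
var_full, var_split = z2_variance(D_inv), z2_variance(remainder)
print(f"exact {exact_trace:.2f}  estimate {trace_estimate:.2f}")
print(f"Z2 variance: full {var_full:.1f}  after coarse subtraction {var_split:.1f}")
```

In this toy the coarse space captures the near-zero mode exactly, so the noisy part of the estimate is the remainder alone. Applying the same split to $D_c^{-1}$ on a still coarser grid is the recursion the slide alludes to.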



HARDWARE

Graphics hardware is well suited to highly parallel numerical tasks.

Hardware vendors provide development tools to support high performance computing.

NVIDIA's CUDA offers direct access to graphics hardware through a programming language similar to C.

Dirac-Wilson operator runs at an effective 68 Gigaflops on the Tesla C870 GPU.

The recently released GTX 280 GPU runs at 92 Gigaflops, and we expect improvement pending code optimization.

(Now 98 Gigaflops; hope to get O(150) Gigaflops)



Nvidia GPU architecture



Two Generations: Consumer vs HPC GPUs

Consumer cards ⇒ High Performance (HPC) GPUs

I. 8800 GTX ⇒ Tesla C870 (16 multi-processors with 8 cores each)

II. GTX 280 ⇒ Tesla C1060 (30 multi-processors with 8 cores each)



C870 code using 60% of the memory bandwidth.



http://www.scala-lang.org/



Future software Plans

Need to find out why we are only saturating 60% of memory bandwidth

Further reduce memory traffic:

8 real numbers per SU(3) matrix (2/3 of 12 used now)

share spinors in $4^3$ blocks (5/9 of that used now)

Generalize to clover Wilson & Domain Wall operator (slightly better flops/mem ratio).

DMA between GPUs on Quad system and network for cluster

Start to design SciDAC API for many-core technologies.
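
The 12-real storage mentioned in the list above can be illustrated with a short sketch: only the first two rows of an SU(3) matrix are stored, and the third is rebuilt as the complex conjugate of their cross product. This is a generic illustration in NumPy, not the talk's CUDA code; the function names are hypothetical, and the further step to 8 reals is not shown.

```python
import numpy as np

def random_su3(rng):
    """Random SU(3) matrix: QR gives a unitary matrix, then the determinant phase is removed."""
    z = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    q, _ = np.linalg.qr(z)
    return q / np.linalg.det(q) ** (1.0 / 3.0)

def compress_12(u):
    """Keep only the first two rows: 6 complex = 12 real numbers."""
    return u[:2].copy()

def reconstruct_12(rows):
    """Rebuild the third row as the complex conjugate of the cross product of the first two."""
    a, b = rows
    c = np.conj(np.array([
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]))
    return np.vstack([a, b, c])

rng = np.random.default_rng(2)
u = random_su3(rng)
u_rebuilt = reconstruct_12(compress_12(u))
print("max reconstruction error:", np.max(np.abs(u_rebuilt - u)))
```

The trade is a handful of extra multiply-adds per link in exchange for loading 12 instead of 18 real numbers, which favors a kernel limited by memory bandwidth.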




Tesla 10-Series: What’s the Big Deal?



Consumer Chip GTX 280 ⇒ Tesla C1060




1U Quad S1070 System $8K



CUDA 2.0 (Compute Unified Device Architecture)

Can compile CUDA code into highly efficient SSE-based multi-threaded C code



Need a GPU Dirac Propagator Farm

The Clark-Kennedy RHMC Paradox: (The faster you go, the harder it is to keep up)

Analysis is now the “Ἀχιλλεύς (Achilles) heel”

Solution: Dedicated Analysis farm.

GPU can deliver O(10) to O(100) gain in flops/$

Two quad Tesla ⇒ 1 Sustained Teraflop!

Two quad Tesla @ 25K? ≡ One BG/L rack @ 2,000K


Commercial Break:

BOSTON POST DOC IN SEPT 2009

PetaAPPS/SciDAC fellow

(QCDNA in Boston Fall 2009?)