Streaming Algorithms In Graphics Hardware

Dec 2, 2013

Streaming Algorithms In Graphics Hardware
Suresh Venkatasubramanian
AT&T Labs - Research
Streaming Algorithms in Graphics Hardware, p. 1/22

Two Converging Trends In Computation...
The accelerated development of graphics accelerator cards (GPUs)
  Current graphics accelerators are cheap and ubiquitous.
  They are developing faster than CPUs (roughly 1.7 times faster per year).
The increasing need for streaming computations
  Original motivation from dealing with large datasets.
  Also interesting from the perspective of multimedia computations, image processing, visualization, and other areas.

Graphics Cards Can Compute!
A graphics card takes a stream of objects (points, lines, triangles) and renders them on a screen.
Each pixel in the screen can be viewed as a small processing unit.
[Figure: Graphics Card, with per-pixel operations labeled glBlend and z-test]

Large Set Of Diverse Applications
Occlusion Culling in scenes
Shading on objects
View-dependent Simplification of Shapes
Geometric Optimization
Motion Planning and Collision Detection
Image processing (wavelet analysis)
Physical Simulations
Scientific Computations (matrix multiplication)
Data analysis (especially spatial data)

The Graphics Pipeline: A Closer Look

Suresh Writes A Program
#include <GL/gl.h>
...
glLightfv(..);            // Set lighting
glOrtho(..);              // Set viewpoint
// Now draw objects
glColor3f(1, 0, 0);
glBegin(GL_TRIANGLES);
glVertex3f(x1, y1, z1);
...
glEnd();
gcc triangle.cc -lGL

Processing Objects in the GPU: Step 1
[Figure: The Fixed-Function Pipeline. Labels: CPU, GPU, vertices, color, lighting, viewpoint, viewpoint transforms, lighting and color calculations, rasterization, fragments.]

Processing fragments in the GPU: Step 2
[Figure: The Fixed-Function Pipeline. Labels: fragments, texture memory, α-test, stencil test, depth test, ???, blending, frame buffer, GPU, display.]

So where’s the computation?
Stencil test
  if (buffer.stencil == K) continue
  else drop fragment.
Depth test
  if (frag.depth < buffer.depth) continue
  else drop fragment.
Blending operations
  buffer.color = buffer.color op fragment.color
  General arithmetic and boolean functions for blending.
  General comparison functions.
  Convolution and histogramming operators.

Programmable Pipelines
[Figure: the pipeline stages (viewpoint transforms, lighting and color calculations, rasterization, fragments), with a vertex program and a fragment program.]
Vertex program executes on each vertex.
Fragment program executes on each fragment.

Capabilities
Large instruction set: general-purpose arithmetic and scientific calculations on scalars and vectors.
Programs can be large: hundreds of instructions can be executed in a single pass.
Texture buffers allow more general-purpose memory access.
Some limited pointer indirection for array lookups.
No looping in fragment programs; some looping permitted in vertex programs.

Haven’t We Seen This Before?
Standard streaming model of computation
[Figure: a stream algorithm with a small memory, reading an input stream and writing an output stream; sample values shown are 4, 3, 5, 1 and 9, 1, 16, 25.]
What's different?
Limited memory (really a constant vs. polylog n).
Pipelining restriction: all items have to be treated the same way.
Multi-pass potential: standard streaming models assume exactly one pass (with a few exceptions).

Maybe We Have Seen This Before
Systolic Arrays [Kung + Leiserson 1978]
Special case (1-D) of systolic arrays
Have more memory access
Early graphics card design was in the framework of systolic computation!

Graphics Card: Streaming Pipelined Architecture
Objects are presented to the card one by one.
Once processed, an object is passed to the next phase and does not return.
Spatial parallelism: each pixel processes a different stream.
There is limited local memory: each object essentially carries its own state with it.
Pipelining: each object is processed in the same way.
Significant advantages accrue from exploiting data parallelism and the pipeline model.

Examples

An Example: Voronoi Diagrams [HCKLM99]
Render right-angled cones for each point.
Set depth-test to LESS, so only the closest points to the viewpoint are rendered.
Also get diameter for free, using GREATER.

Bounding Box [AKMV03]
Each point in the primal is dualized to a plane.
Frame buffer viewed as dual plane: each pixel represents a direction.
Upper and lower envelopes in the dual give extreme points (a la convex hulls).
Superimposing different duals (using the Gauss map), a simple fragment program computes the bounding box.

Quantile Computation
We want to compute the $k$-th highest element of a sequence.
  Depth ordering in scenes.
  Natural streaming primitive (selection and sorting).
  Relates to various geometric optimization problems.
Easy in stream model:
  [MP80]: Computing in $p$ passes requires $\Omega(n^{1/p})$ memory.
  [MRL98]: $\epsilon$-approximation to rank in ONE pass with $O(\frac{1}{\epsilon} \log^2 \epsilon n)$ memory.
  [GK01]: $O(\frac{1}{\epsilon} \log \epsilon n)$ memory.
None of these algorithms are pipelined.

One- and two-sided tests [GKMV03]
With hardware, we have [...] memory [...] passes for general streaming algorithm.
Depth test provides the one-sided test: Is fragment.depth $< a$?
Lemma. Computing the $k$-th highest element of a sequence requires [...] passes with a one-sided test.
Suppose we had a two-sided test: Is $a <$ fragment.depth $< b$?
Lemma. With a two-sided depth test, the $k$-th highest element can be computed in [...] passes (randomized).

Where do we find a two-sided test?
Shadow test in pipeline (only in nVidia chips) [C02].
  Used to render shadows on objects.
  Functionally, provides a (texture) buffer for the second side of the test.
α-test is used to simulate the second side.
This can also be done using fragment programs.
Other areas where a two-sided test is useful [GKMV03]:
  Sweeping an arrangement of shapes.
  Used to compute boolean combinations of objects.

How Do We Write Programs
Cg (from nVidia): C-like system calls are compiled into vertex and fragment programs.
  Can compile for different targets (OpenGL/DirectX).
  Can incorporate limits on programs on different cards.
HLSL: Microsoft High Level Shader Language
GL2.0: OpenGL standard for higher-level programming constructs.
General Purpose Stream Programming
  High-level stream programming constructs built over shader languages (BROOK).

Pipelined Streaming: Conclusions
These architectures are ever more prevalent.
Graphics chips are a good platform for general-purpose computing.
  Numerous applications; demonstrable performance gain.
What computational model do these architectures fit into?
  Strictly weaker than general streaming; probably stronger than circuits.
  Results from systolic computation useful?
New ideas are needed for proving upper/lower bounds because of the multipass nature of computations.