Memory Management and Parallelization

Dec 14, 2013

Paul Arthur Navrátil

The University of Texas at Austin

Overview


Uniprocessor Coherent Ray Tracing


Pharr et al., 1997


Parallel Ray Tracing Summary


Chalmers et al., 2002


Demand-Driven Ray Tracing


Wald et al., 2001


Hybrid Scheduling


Reinhard et al., 1999

Background: Reyes
[Cook et al. 87]


Inspirations


Texture cache, CATs


Programmable shader


Single primitive type


Dicing


Memory effects of scan-line architecture


Pharr: System


Use both texture and geometry ‘cache’


Lazy loading, LRU replacement


One internal primitive: triangles


Optimize ray intersection calculation


Known space requirements to represent


Tessellation of other primitives increases space reqs


Procedurally generated geometry



Pharr: Geometry Cache


Geometry grids



regular grid of voxels


Few thousand triangles per voxel


Acceleration grid of a few hundred triangles for ray intersection calculation


All geometry of a voxel stored in a contiguous block of memory, independent of geometry in other voxels

spatial locality in scene tied to spatial locality in memory


Different voxel sizes cause memory fragmentation


Adaptive voxel sizes?

Voxel size bounded by cache size for hardware impl?
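A minimal sketch of how a geometry cache with lazy loading and LRU replacement might look, assuming the memory budget is counted in triangles and that a caller-supplied `load_voxel` function (hypothetical, not from the paper) reads or tessellates one voxel's geometry on demand:

```python
from collections import OrderedDict


class GeometryCache:
    """LRU cache of per-voxel geometry, filled lazily on first access."""

    def __init__(self, capacity_triangles, load_voxel):
        self.capacity = capacity_triangles  # assumed budget, in triangles
        self.load_voxel = load_voxel        # hypothetical loader: voxel_id -> list
        self.voxels = OrderedDict()         # voxel_id -> triangles, oldest first
        self.used = 0                       # triangles currently resident
        self.misses = 0                     # number of lazy loads performed

    def get(self, voxel_id):
        if voxel_id in self.voxels:
            self.voxels.move_to_end(voxel_id)  # mark as most recently used
            return self.voxels[voxel_id]
        self.misses += 1
        triangles = self.load_voxel(voxel_id)  # lazy load on demand
        # Evict least recently used voxels until the new block fits.
        while self.voxels and self.used + len(triangles) > self.capacity:
            _, evicted = self.voxels.popitem(last=False)
            self.used -= len(evicted)
        self.voxels[voxel_id] = triangles
        self.used += len(triangles)
        return triangles
```

Because each voxel's triangles live in one block, evicting a voxel frees one contiguous region, which is where unequal voxel sizes would start to fragment memory.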

Pharr: Ray Grouping


Scheduling grid: queue all rays inside voxel


Dependencies in ray tree prevent perfect scheduling


Store all information needed for computation with ray

each ray can be independently calculated (parallelism!)


Exploits coherence from beams of rays and from disparate rays that move through the same space


Superior to: fixed-order traversal of ray tree; ray clustering
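The ray grouping idea can be sketched as a map from voxel coordinates to a queue of self-contained rays. This is an illustration only: the `Ray` fields, `voxel_of`, and `SchedulingGrid` are hypothetical names, and a ray is binned by its origin here, whereas a real tracer would bin it by the voxel it is currently passing through.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Ray:
    origin: tuple     # (x, y, z)
    direction: tuple  # (dx, dy, dz)
    pixel: tuple      # image sample this ray contributes to
    weight: float     # contribution to that sample, carried with the ray


def voxel_of(point, voxel_size):
    """Integer grid coordinates of the voxel containing a point."""
    return tuple(int(c // voxel_size) for c in point)


class SchedulingGrid:
    """Queues rays per voxel so a voxel's rays can be processed together."""

    def __init__(self, voxel_size):
        self.voxel_size = voxel_size
        self.queues = defaultdict(list)

    def enqueue(self, ray):
        self.queues[voxel_of(ray.origin, self.voxel_size)].append(ray)

    def drain(self, voxel):
        """Remove and return every ray queued in one voxel."""
        return self.queues.pop(voxel, [])
```

Since each ray carries its own pixel and weight, the rays returned by `drain` can be processed in any order, or on different processors.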

Pharr: Radiance Calculation


Outgoing radiance is emitted radiance plus weighted average of incoming radiances


f_r is the bidirectional reflectance distribution function (BRDF)


At intersection, weights calculated for each spawned secondary ray


Final weight is product of BRDF values of all surfaces on path from point on ray to the image plane
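As a small worked illustration of the weight bookkeeping, assuming each surface hit contributes a single scalar BRDF value (real systems also fold in cosine and sampling terms, omitted here), and using hypothetical function names:

```python
def spawn_weight(parent_weight, brdf_value):
    """Weight of a secondary ray: parent's weight times the BRDF value at the hit."""
    return parent_weight * brdf_value


def pixel_contribution(path_brdf_values, emitted_radiance):
    """Radiance a path delivers to its pixel.

    path_brdf_values lists the (assumed scalar) BRDF value at each surface
    between the image plane and the point where emitted radiance is found.
    """
    weight = 1.0  # the camera ray starts with full weight
    for f_r in path_brdf_values:
        weight = spawn_weight(weight, f_r)
    return weight * emitted_radiance
```

For example, `pixel_contribution([0.5, 0.25], 8.0)` gives 1.0: the path weight is 0.5 * 0.25 = 0.125, applied to an emitted radiance of 8.0. Because the weight travels with the ray, the result can be added to the image without revisiting the ray tree.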

Pharr: Voxel Scheduling


Naïve


iterate across voxels


Better


weight voxels by cost and benefit


Cost: how expensive to process the rays in the voxel?


More geometry in voxel means higher cost


More voxel geometry not in memory means higher cost


Benefit: how much progress to completion from voxel?


Many rays in voxel yield more benefit


Large weights on rays yield more benefit
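One way to turn the cost and benefit heuristics into a scheduler is sketched below. The specific formulas, the `LOAD_PENALTY` constant, and the `VoxelState` fields are assumptions made for illustration, not the functions used by Pharr et al.

```python
from dataclasses import dataclass

LOAD_PENALTY = 10.0  # assumed relative cost of a non-resident triangle


@dataclass
class VoxelState:
    name: str
    ray_weights: list         # weight of each ray queued in the voxel
    triangles: int            # amount of geometry in the voxel
    resident_fraction: float  # share of that geometry already in memory, 0..1


def cost(v):
    """More geometry, and more geometry not in memory, cost more."""
    missing = v.triangles * (1.0 - v.resident_fraction)
    return 1.0 + v.triangles + LOAD_PENALTY * missing


def benefit(v):
    """More rays, and larger ray weights, give more benefit."""
    return sum(v.ray_weights)


def next_voxel(voxels):
    """Pick the voxel with queued rays and the best benefit/cost ratio."""
    candidates = [v for v in voxels if v.ray_weights]
    if not candidates:
        return None
    return max(candidates, key=lambda v: benefit(v) / cost(v))
```

The naive scheduler corresponds to ignoring both functions and visiting voxels in a fixed order.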

Pharr: System Summary

Pharr: Lazy Loading Results

Pharr: Reordering Results

Pharr: Scheduling Results

Pharr: Discussion


Parallelization


Ray independence, load-balanced geometry, and lazy geometry loading help


Will cache results hold in distributed model?


Modern architecture


Testing on 190 MHz MIPS R10000 w/ 1 GB RAM


Can modern architectures hold scenes in memory (no secondary storage usage)?


Hardware Acceleration


Use memory/cache/GPU rather than disk/memory/CPU


Chalmers: Parallel Ray Tracing


Demand Driven


Scene divided into subregions, or tasks


Processors given tasks statically or by a master


Balance with task balancing or adaptive regions

[Fig 3.4]


Data Parallel


Object data distributed across processors


Distribute objects according to spatial locality; a hierarchical spatial subdivision; or randomly [Fig 3.7]


Hybrid Scheduling


Run demand-driven and data-parallel tasks on same processors


DD ray traversal/DP ray-object intersect [Scherson and Caspary 88]


DD intersection/DP ray generation [Jevans 89]


Ray coherence [Reinhard and Jansen 99]
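To illustrate why handing out tasks on demand balances load better than a fixed assignment, the toy simulation below compares the two for a list of task costs. The function names and the round-robin static policy are assumptions for the sketch; no communication cost is modelled.

```python
import heapq


def static_makespan(task_costs, workers):
    """Tasks assigned up front, round-robin; returns the finish time."""
    loads = [0.0] * workers
    for i, c in enumerate(task_costs):
        loads[i % workers] += c
    return max(loads)


def demand_driven_makespan(task_costs, workers):
    """A master hands the next task to whichever worker goes idle first."""
    idle_at = [0.0] * workers  # time at which each worker becomes idle
    heapq.heapify(idle_at)
    for c in task_costs:
        t = heapq.heappop(idle_at)      # earliest idle worker requests a task
        heapq.heappush(idle_at, t + c)  # and is busy until t + c
    return max(idle_at)
```

With costs `[8, 1, 1, 1, 8, 1, 1, 1]` on two workers, the static assignment finishes at time 18 (one worker receives both expensive tasks), while the demand-driven assignment finishes at time 11.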

Wald: Demand-Driven Ray Tracing
[Wald et al. 01]


Exploit cache and space coherence with modern processors (Dual Pentium III 800 MHz, 256 MB)


Use SIMD instruction set to achieve data-parallelism (e.g., barycentric coordinate test)
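The barycentric coordinate test is a natural fit for SIMD because the same arithmetic is applied to every ray in a packet. The sketch below uses the Möller-Trumbore formulation as a stand-in (it is not necessarily the exact test used by Wald et al.), and a plain Python loop in which each iteration plays the role of one SIMD lane; all names are hypothetical.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def intersect_packet(origins, directions, v0, v1, v2, eps=1e-9):
    """Hit distance per ray against triangle (v0, v1, v2), or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)    # triangle edges, shared by all lanes
    hits = []
    for o, d in zip(origins, directions):  # one iteration per "lane"
        p = cross(d, e2)
        det = dot(e1, p)
        if abs(det) < eps:                 # ray parallel to the triangle plane
            hits.append(None)
            continue
        s = sub(o, v0)
        q = cross(s, e1)
        u = dot(s, p) / det                # first barycentric coordinate
        v = dot(d, q) / det                # second barycentric coordinate
        t = dot(e2, q) / det               # distance along the ray
        inside = u >= 0 and v >= 0 and u + v <= 1 and t > eps
        hits.append(t if inside else None)
    return hits
```

In a real SIMD implementation the per-lane branches would be replaced by masks so that all rays of the packet execute the same instruction stream.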


Wald: Performance
[Wald et al. 01]


Reinhard: Hybrid Scheduling

[Reinhard et al. 99]


Data-parallel approach with demand-driven subtasks to load balance


Data-parallel tasks preferred, DD subtasks requested from master when no DP tasks are available
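The scheduling rule can be sketched as a worker loop that always prefers its local data-parallel queue and asks the master for a demand-driven subtask only when that queue is empty. The `Master` and `Worker` classes and the task labels are hypothetical, and message passing is reduced to a direct method call.

```python
from collections import deque


class Master:
    """Holds the pool of demand-driven (DD) subtasks."""

    def __init__(self, dd_tasks):
        self.dd_tasks = deque(dd_tasks)

    def request(self):
        """Return the next DD subtask, or None when the pool is empty."""
        return self.dd_tasks.popleft() if self.dd_tasks else None


class Worker:
    """Prefers data-parallel (DP) tasks that arrive for its own data."""

    def __init__(self, master):
        self.master = master
        self.dp_queue = deque()  # DP tasks, e.g. rays entering this worker's region
        self.log = []            # (kind, task) pairs, in execution order

    def step(self):
        """Run one task; return False if there was nothing to do."""
        if self.dp_queue:
            task = ("DP", self.dp_queue.popleft())
        else:
            dd = self.master.request()  # idle for DP work: ask the master
            if dd is None:
                return False
            task = ("DD", dd)
        self.log.append(task)
        return True
```

The demand-driven subtasks act as filler work, so processors whose data is rarely hit by rays still stay busy.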



Reinhard: Performance
[Reinhard et al. 99]