Memory Management and Parallelization




Memory Management and Parallelization

Paul Arthur Navrátil

The University of Texas at Austin


Uniprocessor Coherent Ray Tracing

Pharr et al., 1997

Parallel Ray Tracing Summary

Chalmers, et al. 2002

Demand-Driven Ray Tracing

Wald, et al. 2001

Hybrid Scheduling

Reinhard, et al. 1999

Background: Reyes
[Cook et al. 87]


Texture cache, coherent access textures (CATs)

Programmable shader

Single primitive type


Memory effects of scanline architecture

Pharr: System

Use both texture and geometry ‘cache’

Lazy loading, LRU replacement

One internal primitive type: triangles

Optimize the ray intersection calculation

Known, fixed space requirements to represent geometry

Tessellation of other primitives increases space requirements

Procedurally generated geometry

Pharr: Geometry Cache

Geometry grids

Scene divided into a regular grid of voxels

A few thousand triangles per voxel

Acceleration grid of a few hundred triangles for the ray
intersection calculation

All geometry of voxel stored in contiguous block of
memory, independent of geometry in other voxels

Spatial locality in the scene tied to spatial locality in memory

Different voxel sizes cause memory fragmentation

Adaptive voxel sizes?

Voxel size bounded by cache size for hardware impl?
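The lazy loading with LRU replacement described above can be sketched as a small cache keyed by voxel. A minimal Python sketch, where the loader callable and the capacity-in-voxels budget are hypothetical stand-ins for Pharr's disk-backed geometry:

```python
from collections import OrderedDict

# Sketch of lazy geometry loading with LRU replacement.
# `loader` and `max_voxels` are hypothetical stand-ins, not Pharr's API.

class GeometryCache:
    def __init__(self, max_voxels, loader):
        self.max_voxels = max_voxels      # capacity in resident voxels
        self.loader = loader              # callable: voxel_id -> geometry block
        self.resident = OrderedDict()     # voxel_id -> geometry, in LRU order

    def get(self, voxel_id):
        if voxel_id in self.resident:
            self.resident.move_to_end(voxel_id)   # mark most-recently-used
            return self.resident[voxel_id]
        if len(self.resident) >= self.max_voxels:
            self.resident.popitem(last=False)     # evict least-recently-used
        geom = self.loader(voxel_id)              # lazy load on first touch
        self.resident[voxel_id] = geom
        return geom
```

An OrderedDict keeps insertion order, so moving hits to the end and popping from the front gives LRU behavior directly; each voxel's geometry block stays contiguous and independent of the others, as on the slide.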

Pharr: Ray Grouping

Scheduling grid


Queue all rays inside voxel

Dependencies in ray tree prevent perfect scheduling

Store all information needed for computation with ray

each ray can be independently calculated (parallelism!)

Exploits coherence from a beam of rays: disparate rays
that move through the same space

Superior to fixed-order traversal of the ray tree and to
ray clustering
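A minimal sketch of the per-voxel ray queues, assuming a hypothetical process_voxel helper that intersects a batch of rays against one voxel's geometry and returns the advanced rays:

```python
from collections import defaultdict

# Sketch of per-voxel ray queues: each ray record carries everything
# needed to resume it, so rays queued at the same voxel can be processed
# together (and independently, i.e. in parallel). `process_voxel` is a
# hypothetical helper, not Pharr's actual interface.

def trace_batched(initial_rays, process_voxel):
    queues = defaultdict(list)                 # voxel id -> pending rays
    for ray in initial_rays:
        queues[ray["voxel"]].append(ray)
    finished = []
    while queues:
        # Take the fullest queue first: amortizes the cost of loading
        # that voxel's geometry over the most ray intersections.
        voxel, rays = max(queues.items(), key=lambda kv: len(kv[1]))
        del queues[voxel]
        for out in process_voxel(voxel, rays):
            if out.get("done"):
                finished.append(out)           # ray terminated or left scene
            else:
                queues[out["voxel"]].append(out)
    return finished
```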

Pharr: Radiance Calculation

Outgoing radiance is emitted radiance plus a weighted
average of incoming radiances:

L_o(x, \omega_o) = L_e(x, \omega_o) + \int_\Omega f_r(x, \omega_i, \omega_o) \, L_i(x, \omega_i) \cos\theta_i \, d\omega_i

where f_r is the bidirectional reflectance distribution function (BRDF)

At intersection, weights calculated for each spawned
secondary ray

Final weight is the product of the BRDF values of all surfaces
on the path from the ray's point back to the image plane
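The weighting scheme can be illustrated with two toy helpers; the per-bounce weights are illustrative numbers, not a real BRDF evaluation:

```python
def path_weight(brdf_weights):
    # Final weight of a ray: the product of the BRDF weights picked up
    # at every surface between its origin and the image plane.
    w = 1.0
    for b in brdf_weights:
        w *= b
    return w

def outgoing_radiance(emitted, samples):
    # samples: (weight, incoming_radiance) pairs, one per spawned
    # secondary ray, as computed at the intersection point.
    return emitted + sum(w * li for w, li in samples)
```

A secondary ray that reflected off surfaces with weights 0.8 and 0.5 contributes its radiance to the image scaled by 0.4, which is why low-weight rays yield little benefit in the scheduler below.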

Pharr: Voxel Scheduling


iterate across voxels


weight voxels by cost and benefit

How expensive is it to process the rays in the voxel?

More geometry in a voxel means higher cost

Voxel geometry not yet resident in memory adds further cost

How much progress toward completion does processing the voxel make?

More rays in a voxel yields more benefit

Larger weights on rays yield more benefit
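One way the cost and benefit bullets could combine into a scheduling priority; the coefficients (in particular the 2x penalty for non-resident geometry) are illustrative assumptions, not Pharr's actual model:

```python
def voxel_priority(voxel):
    # Cost: more triangles cost more; non-resident geometry must be
    # loaded from disk first, so treat it as (illustratively) 2x as costly.
    cost = voxel["num_triangles"] * (1.0 if voxel["resident"] else 2.0)
    # Benefit: many queued rays, and rays carrying large weights,
    # contribute more toward finishing the image.
    benefit = voxel["num_rays"] + voxel["total_ray_weight"]
    return benefit / max(cost, 1e-9)

def next_voxel(voxels):
    # Process the voxel with the best benefit-to-cost ratio next.
    return max(voxels, key=voxel_priority)
```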

Pharr: System Summary

Pharr: Lazy Loading Results

Pharr: Reordering Results

Pharr: Scheduling Results

Pharr: Discussion


Ray independence, load balanced geometry, lazy
geometry loading helps

Will cache results hold in distributed model?

Modern architecture

Testing on 190 MHz MIPS R10000 w/ 1 GB RAM

Can modern architectures hold entire scenes in memory

(no secondary storage usage)?

Hardware Acceleration

Use memory/cache/GPU rather than disk/memory/CPU

Chalmers: Parallel Ray Tracing

Demand Driven

Scene divided into subregions, or tasks

Processors given tasks statically or by a master

Balance with task balancing or adaptive regions

[Fig 3.4]
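The demand-driven master/worker pattern above can be sketched with a shared task queue; the tile contents and worker count here are illustrative:

```python
from queue import Queue, Empty
from threading import Thread

# Sketch of demand-driven scheduling: a master fills a queue with scene
# subregions (tasks) and idle workers pull the next one, so load balances
# itself. Tiles and worker count are illustrative placeholders.

def render_demand_driven(tiles, render_tile, num_workers=4):
    tasks, results = Queue(), Queue()
    for t in tiles:
        tasks.put(t)

    def worker():
        while True:
            try:
                t = tasks.get_nowait()    # idle worker pulls next task
            except Empty:
                return                    # no work left: worker exits
            results.put((t, render_tile(t)))

    threads = [Thread(target=worker) for _ in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return dict(results.queue)
```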

Data Parallel

Object data distributed across processors

Distribute objects according to spatial locality; a hierarchical
spatial subdivision; or randomly
[Fig 3.7]

Hybrid Scheduling

Run demand-driven and data-parallel tasks on the same processors

DD ray traversal / DP ray-object intersection
[Scherson and Caspary 88]

DD intersection/DP ray generation
[Jevans 89]

Ray coherence
[Reinhard and Jansen 99]

Wald: Demand-Driven Ray Tracing

[Wald et al. 01]

Exploit cache and space coherence with modern
processors (Dual Pentium III 800 MHz, 256 MB)

Use SIMD instructions to achieve data
parallelism (e.g., the barycentric coordinate triangle test)
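For reference, a scalar barycentric ray-triangle test in the Möller-Trumbore form. Wald's SSE version precomputes per-triangle data and evaluates four rays at once, turning each arithmetic step below into one 4-wide SIMD instruction, but the logic is analogous:

```python
# Scalar sketch of a barycentric ray-triangle intersection test
# (Moller-Trumbore form, not Wald's precomputed SSE variant).

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

def sub(a, b):
    return (a[0]-b[0], a[1]-b[1], a[2]-b[2])

def intersect(orig, direc, v0, v1, v2, eps=1e-9):
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direc, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None                      # ray parallel to triangle plane
    inv = 1.0 / det
    s = sub(orig, v0)
    u = dot(s, p) * inv                  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direc, q) * inv              # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv                 # hit distance along the ray
    return t if t > eps else None
```

The branches at u and v are what SIMD implementations replace with masks, so all four rays in a packet take the same instruction stream.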

Wald: Performance
[Wald et al. 01]


Reinhard: Hybrid Scheduling

[Reinhard et al. 99]

Data-parallel approach with demand-driven
subtasks to load balance

Data-parallel tasks preferred; DD subtasks requested
from the master when no DP tasks are available
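The preference order can be sketched from a single worker's viewpoint; the master here is reduced to a shared list standing in for real message passing:

```python
# Sketch of Reinhard-style hybrid scheduling on one worker: local
# data-parallel (DP) tasks are preferred, and only when none remain does
# the worker take a demand-driven (DD) subtask from the master. The
# shared-list "master" is an illustrative stand-in.

def run_worker(local_dp_tasks, master_dd_tasks, log):
    while local_dp_tasks or master_dd_tasks:
        if local_dp_tasks:
            log.append(("DP", local_dp_tasks.pop(0)))   # local data stays hot
        else:
            log.append(("DD", master_dd_tasks.pop(0)))  # fill idle time
```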


Reinhard: Performance
[Reinhard et al. 99]