3D Benchmarking - Understanding Frame Rate Scores







1. The Impact Of The Platform

(Processor, Motherboard/Chipset, System Memory, Graphics Bus Type and Software + 3D Card Driver)










Article Info



3D Benchmarking - Understanding Frame Rate Scores

Created: July 4, 2000

By: Thomas Pabst

Category: Graphics Cards

Summary: Understanding 3D cards is not quite as simple as many people suggest. The first requirement for a sensible 3D card evaluation is a good understanding of 3D benchmarks. This article about 3D benchmarking is meant to point out the bottlenecks of current 3D gaming solutions.


The motherboard with its chipset, the memory, the processor and the PCI or AGP slot can be seen as a single unit when you want to benchmark a graphics card. For simplicity I will call those components 'the platform' from now on.












The platform is responsible for providing the 3D scene with all its players, objects and light sources for each frame, and it calculates the 'game AI' as well as any special kind of motion. The geometry calculations, today called 'transform and lighting', also have to be done by 'the platform', either entirely (for cards without T&L) or in part (for cards with T&L). Once a frame is calculated, the vertices and textures need to be sent to the 3D card, obviously through the bus, which is PCI or AGP 1x, 2x or 4x. The faster 'the platform' is, the more frame data it can send to the 3D card. If 'the platform' is not fast enough, it stalls the 3D card and thus lowers the frame rate.
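A little arithmetic puts the bus limit into numbers. The sketch below divides each bus's commonly quoted theoretical peak bandwidth by a target frame rate. The result is an upper bound on how much vertex and texture data 'the platform' could push to the card per frame. Real transfers fall short of these peaks, and the 60 fps target is an arbitrary example.

```python
# Commonly quoted theoretical peak bandwidths, in MB/s (1 MB = 10**6 bytes).
# Real transfers fall short of these peaks.
BUS_PEAK_MB_PER_S = {
    "PCI": 133,      # 32-bit bus at 33 MHz
    "AGP 1x": 266,   # 32-bit bus at 66 MHz
    "AGP 2x": 533,   # two transfers per clock
    "AGP 4x": 1066,  # four transfers per clock
}


def max_mb_per_frame(bus: str, target_fps: float) -> float:
    """Upper bound on frame data (vertices + textures) the bus can carry per frame, in MB."""
    return BUS_PEAK_MB_PER_S[bus] / target_fps


for bus in BUS_PEAK_MB_PER_S:
    print(f"{bus:7s} {max_mb_per_frame(bus, 60):6.2f} MB per frame at 60 fps")
```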


What is important to note is that 'the platform' doesn't care at all about the screen resolution of the 3D game. For 'the platform' it makes no difference whether Quake 3 runs at 320x240 or at 1920x1440. The reason is simple: 'the platform' sends VERTICES over to the 3D chip, and the relative coordinates of those vertices don't change with different resolutions.


The system also doesn't care much about the color depth. OK, it does care about it to some extent in terms of memory and bus bandwidth (especially if 32-bit textures are used), but this is negligible in most of the 3D benchmarks that are used right now.


We can conclude that 3D benchmarks will hardly show any performance change across different resolutions and color depths if 'the platform' is the bottleneck.




This graph shows you what a 3D benchmark looks like if 'the platform' is the limiting factor. The frame rate won't decrease at higher resolutions or higher color depth, because the 3D chip is permanently waiting for 3D data from 'the platform'. In this case a faster 3D chip won't get you any higher frame rates.


Unreal Tournament has a rather inefficient engine when it comes to using 'the platform'. Therefore a fast 3D chip will always wait for 'the platform':






You can see that there's hardly any change in frame rate across the resolutions, particularly in the case of the Celeron 600 system. What you can see, however, is that a faster CPU translates directly into higher frame rates. For Unreal Tournament you should rather go for a fast CPU than for the fastest graphics card.





The situation is similar if you use a processor that is not fast enough to evaluate different 3D cards. If 'the platform' is the bottleneck, you will get identical frame rates with completely different 3D cards. In this case many 'reviewers' have claimed "the different cards perform almost the same!", simply because they were using a slow platform. Always make sure that 3D card evaluations use a platform that is at least as fast as your own! Otherwise the results won't help you at all!
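The effect can be sketched as a two-stage pipeline whose throughput is set by its slower stage. This is a deliberately simplified model with three assumptions. First, 'the platform' delivers a fixed number of frames per second regardless of resolution. Second, the card needs `width * height * overdraw` pixel operations per frame. Third, every platform speed and fill rate, and the overdraw factor of 3, is a hypothetical example value rather than a measurement.

```python
def frame_rate(platform_fps: float, fill_rate: float, width: int, height: int,
               overdraw: float = 3.0) -> float:
    """Simplified two-stage model: the slower stage sets the frame rate.

    platform_fps: frames/s the platform can prepare (independent of resolution)
    fill_rate:    pixels/s the 3D card can render
    overdraw:     hypothetical average number of times each screen pixel is drawn
    """
    card_fps = fill_rate / (width * height * overdraw)
    return min(platform_fps, card_fps)


# Hypothetical example values, not measurements.
cards = {"card A": 300e6, "card B": 800e6}
platforms = {"slow platform": 40.0, "fast platform": 150.0}

for platform_name, platform_fps in platforms.items():
    for card_name, fill in cards.items():
        fps = frame_rate(platform_fps, fill, 1600, 1200)
        print(f"{platform_name}, {card_name}: {fps:6.1f} fps at 1600x1200")
```

On the slow platform both cards score exactly 40 fps, which is the 'cards perform almost the same' trap. Only the fast platform exposes the difference between them.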


2. The Impact Of The Fill Rate


After taking care of 'the platform', the 3D card is the only thing left. The 'fill rate' describes the number of pixels that a 3D solution can render in a given amount of time. We all know that a frame consists of a certain number of little dots, called 'pixels'. Each screen resolution requires a certain number of pixels. The common resolution 640x480 is made of 307,200 pixels, while a high resolution such as 1600x1200 requires 1,920,000 pixels. The 3D chip has to 'render' each pixel of a frame before the frame can be displayed. The 'frame rate' is defined as the number of frames that can be displayed in a certain amount of time. It's easy to see that it requires a lot more rendering performance to supply a certain frame rate at a high resolution than at a low resolution. This is why 3D cards typically score high frame rates at 640x480 and lower frame rates at 1600x1200. After all, the 3D chip has to render more than 6 times as many pixels for each frame at 1600x1200 as at 640x480.
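The pixel counts above, and the 'more than 6 times' ratio, follow directly from multiplying width by height. The sketch below is minimal. The intermediate resolutions in the list are just common examples.

```python
RESOLUTIONS = [(640, 480), (800, 600), (1024, 768), (1280, 1024), (1600, 1200)]


def pixels_per_frame(width: int, height: int) -> int:
    """Pixels the 3D chip has to render for one full-screen frame."""
    return width * height


base = pixels_per_frame(640, 480)
for width, height in RESOLUTIONS:
    pixels = pixels_per_frame(width, height)
    print(f"{width}x{height}: {pixels:>9,} pixels ({pixels / base:.2f}x 640x480)")
```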





Nowadays 3D chips have several rendering pipelines that can operate in parallel. Such a pipeline is usually able to render one pixel per clock cycle. Thus the maximum pixel fill rate is the 3D chip clock times the number of rendering pipelines, times the number of chips in case more than one 3D chip is used on a 3D card. A typical example is NVIDIA's new GeForce2 GTS chip, which is clocked at 200 MHz and comes with 4 rendering pipelines: 4 pixels x 200 million/s = 800 million pixels/s. 3dfx's Voodoo5 5500 is clocked at 166 MHz, each chip has two rendering units and the card comes with two chips: 2 pixels x 166 million/s x 2 = 664 million pixels/s (the 667 million pixels/s figure often given for this card corresponds to a clock of about 166.7 MHz).
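The fill rate formula translates directly into code. The sketch below returns only the theoretical peak. It says nothing about whether the platform or the memory can keep the pipelines busy.

```python
def peak_fill_rate(clock_mhz: float, pipelines_per_chip: int, chips: int = 1) -> float:
    """Theoretical peak fill rate in Mpixels/s, assuming one pixel per pipeline per clock."""
    return clock_mhz * pipelines_per_chip * chips


print(peak_fill_rate(200, 4))            # GeForce2 GTS: 800
print(peak_fill_rate(166, 2, chips=2))   # Voodoo5 5500: 664
```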


Now, without taking triangle size, T&L and hidden surface removal into consideration, one can still say that if the fill rate remains constant, the frame rate will go down as the resolution goes up. Ideally, you find the highest frame rate at the lowest resolution and see it coming down continuously as the resolution increases.
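With the fill rate held constant, the frame rate ceiling is simply the fill rate divided by the pixels per frame. The sketch below uses the GeForce2 GTS's 800 Mpixels/s. The `overdraw` parameter is an assumption added here to stand in for the hidden surfaces the paragraph sets aside. With the default of 1.0, the results are upper bounds that real games will not reach.

```python
def fill_limited_fps(fill_rate_mpixels: float, width: int, height: int,
                     overdraw: float = 1.0) -> float:
    """Frame rate ceiling set by fill rate alone.

    overdraw = average number of times each screen pixel is drawn per frame.
    The default of 1.0 ignores hidden surfaces, so the result is an upper bound.
    """
    return fill_rate_mpixels * 1e6 / (width * height * overdraw)


for width, height in [(640, 480), (1024, 768), (1600, 1200)]:
    print(f"{width}x{height}: {fill_limited_fps(800, width, height):7.1f} fps max")
```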



2. The Impact Of The Fill Rate, Continued


In most real-world applications this isn't the case. Most of the time you will see almost identical frame rates at the lowest resolutions, until the slope finally begins. This is due to the limitation of 'the platform' as discussed above. At low resolutions the 3D chip is stalled, because it is able to process data faster than the platform can deliver it. This effect diminishes as the resolution increases, which is one reason why the slope usually starts slowly.






The next thing you might have noticed in the schematic fill rate chart above is that I kept the frame rate scores at 32-bit color at the same level as at 16-bit color. This might appear strange to you, because you would never see this behavior in real-world applications. In fact, from the 3D chip's point of view, rendering a frame in 32-bit color is pretty much the same as rendering a frame in 16-bit color. As long as the rendering engine is able to handle 32-bit wide data, something that is not the case with 3dfx's Voodoo3 chip, for example, the pixels can be rendered in exactly the same amount of time. Thus, as long as an application is limited by pure fill rate alone, the frame rates at 32-bit color should be the same as at 16-bit color. Don't ever forget that!


3. The Huge Impact Of The Memory Bandwidth


In the past, and that means until fairly recently, the bandwidth of the local graphics memory was not much of an issue. Hardly any 3D chip before NVIDIA's GeForce256 was ever really limited by its memory. When GeForce256 was released in October 1999, it came with SDR memory clocked at 166 MHz. The release of the famous 'GeForceDDR' cards, which were nothing but the same chip with faster memory, showed how badly a fast 3D chip can be stalled by slow memory. Things have become even worse with NVIDIA's latest high-end chip, GeForce2 GTS. 3dfx's latest Voodoo5 5500 card suffers from the same problem even a bit more.
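Peak memory bandwidth is bus width times clock times transfers per clock. Dividing it by the fill rate shows how few bytes each pixel can afford. The 166 MHz SDR clock of the GeForce256 comes from the text. The 128-bit bus width of both cards and the 166 MHz DDR memory of the GeForce2 GTS are not stated above, so treat them as assumed inputs.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, clock_mhz: float,
                          transfers_per_clock: int) -> float:
    """Peak local-memory bandwidth in GB/s (1 GB = 10**9 bytes).

    transfers_per_clock is 1 for SDR memory and 2 for DDR memory.
    """
    return bus_width_bits / 8 * clock_mhz * 1e6 * transfers_per_clock / 1e9


# Assumed configurations: 128-bit bus for both cards; 166 MHz DDR for the GTS.
geforce256_sdr = memory_bandwidth_gb_s(128, 166, 1)  # about 2.66 GB/s
geforce2_gts = memory_bandwidth_gb_s(128, 166, 2)    # about 5.31 GB/s

# Bytes of memory traffic the GTS could spend per pixel at its 800 Mpixels/s peak.
gts_bytes_per_pixel = geforce2_gts * 1e9 / 800e6     # about 6.6 bytes
print(f"{geforce256_sdr:.2f} GB/s, {geforce2_gts:.2f} GB/s, {gts_bytes_per_pixel:.1f} B/pixel")
```

Assuming a 32-bit Z-buffer, a single 32-bit pixel needs 12 bytes just to read and write its Z value and write its color. That is already more than this budget, before any texture is fetched.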






I am showing you this diagram once again to point out how much pressure the local memory of a modern 3D card is really under. Each red arrow steals a bit more of the available memory bandwidth.

First of all, the local memory hosts the frame buffer, which consists of a front and a back buffer and, in the case of triple buffering, even a third one. Each of those buffers is exactly as big as the screen resolution times the color depth. The rendering unit needs to access the frame buffer several times for each pixel.

The Z-buffer is also as big as the screen resolution times the Z-buffer depth, and it gets accessed like crazy. You get an idea of how heavily the Z-buffer taxes memory bandwidth when you realize that Intel added the 'display cache' option to the integrated 3D graphics of the i810, which is only supposed to host the i810's Z-buffer. This 'display cache' (an external Z-buffer) improves the 3D performance of the i810 considerably, because the Z-buffer is the most accessed part of graphics memory.

Then there is the texture buffer, which holds compressed or uncompressed textures that the rendering unit can access faster than if it had to fetch them from main system memory through the AGP. Again, textures need to be read several times for each pixel, depending on the filtering option and the number of textures applied per pixel.

I am not quite sure how much impact a T&L unit has on memory bandwidth, but you can be sure that it takes at least a small part of it as well.

Last but not least, there is the RAMDAC, which needs to read the front frame buffer to display it on the screen. The higher the resolution and the higher the refresh rate, the more data the RAMDAC has to read from the frame buffer every second. You might think that this is not an issue anymore today, but you are sadly mistaken!

A 3D card that is already limited by its memory bandwidth, such as a GeForce2 GTS card, reacts extremely sensitively to high refresh rates. I measured an impact of over 15% at 1600x1200 in 32-bit color when I switched between 60 and 85 Hz refresh rate. At lower resolutions it is still an issue.
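The RAMDAC's share can be estimated as width x height x bytes per pixel x refresh rate. The sketch below does this for 1600x1200 in 32-bit color. The GeForce2 GTS peak of about 5.3 GB/s assumes a 128-bit bus with 166 MHz DDR memory. The sketch counts only raw pixel data. It does not model any efficiency loss from interleaving display reads with rendering traffic, so it is not meant to reproduce the measured drop on its own.

```python
GTS_PEAK_MB_S = 5312.0  # assumed: 128-bit bus, 166 MHz DDR memory


def scanout_mb_s(width: int, height: int, color_bits: int, refresh_hz: int) -> float:
    """Raw front-buffer data the RAMDAC reads per second, in MB/s (10**6 bytes)."""
    return width * height * color_bits / 8 * refresh_hz / 1e6


for hz in (60, 85):
    mb_s = scanout_mb_s(1600, 1200, 32, hz)
    share = 100 * mb_s / GTS_PEAK_MB_S
    print(f"{hz} Hz: {mb_s:.1f} MB/s, {share:.1f}% of peak memory bandwidth")
```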






3. The Huge Impact Of The Memory Bandwidth, Continued


After you've seen how often the local memory needs to be accessed for each pixel, you can imagine why the impact of memory bandwidth on frame rates increases as screen resolution and color depth go up.





At low resolutions and 16-bit color, the memory bandwidth doesn't usually limit the chip. However, even at only 16-bit color depth, the memory bandwidth already has a hefty impact on the frame rate at high resolutions, regardless of how high the theoretical fill rate of the 3D chip may be. Things get a lot worse at 32-bit color depth. You will see the frame rate almost halve wherever the memory bandwidth was already the bottleneck at 16-bit color, because at 32-bit color the amount of data that needs to be transferred between the 3D chip and the local memory almost exactly doubles. This is why the frame rates at 32-bit color are always lower than at 16-bit color, unless there is excess memory bandwidth at 16-bit color.
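A rough per-pixel traffic model shows why the 32-bit penalty depends on whether memory is already the bottleneck. The access pattern below is a hypothetical simplification chosen for illustration: one Z read, one Z write and one color write per pixel, with Z depth matching color depth. It ignores texture reads, blending, overdraw and RAMDAC traffic, all of which would push the bandwidth limit lower. The GeForce2 GTS bandwidth of about 5.3 GB/s assumes a 128-bit bus with 166 MHz DDR memory.

```python
GTS_BANDWIDTH = 5.312e9   # bytes/s; assumed 128-bit bus with 166 MHz DDR memory
GTS_FILL_RATE = 800e6     # pixels/s, peak from the fill rate section


def bytes_per_pixel(color_bits: int, z_bits: int) -> float:
    """Hypothetical traffic per pixel: one Z read, one Z write, one color write.

    Texture reads, blending, overdraw and RAMDAC traffic are ignored.
    """
    return (2 * z_bits + color_bits) / 8


def sustained_pixel_rate(color_bits: int, z_bits: int) -> float:
    """The lower of the fill rate and the rate the memory can feed."""
    bandwidth_limit = GTS_BANDWIDTH / bytes_per_pixel(color_bits, z_bits)
    return min(GTS_FILL_RATE, bandwidth_limit)


rate16 = sustained_pixel_rate(16, 16)  # 6 bytes/pixel: memory could feed ~885 M, fill rate (800 M) rules
rate32 = sustained_pixel_rate(32, 32)  # 12 bytes/pixel: memory feeds only ~443 M
print(f"16-bit: {rate16 / 1e6:.0f} Mpixels/s, 32-bit: {rate32 / 1e6:.0f} Mpixels/s")
```

This access pattern leaves a little excess bandwidth at 16-bit color. As a result, the 32-bit rate falls by about 45% rather than by a full half, matching the caveat about excess memory bandwidth at 16-bit color.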





This important issue with memory bandwidth has to be kept in mind when reading the fill rates claimed for a chip. People who overclock a GeForce2 GTS chip to 250 MHz, for example, are telling you complete crap if they claim a fill rate of 1 Gpixel/s. For this fill rate, GeForce2 GTS would require its memory to run at 600 MHz.


Summary





The chart above pretty much shows what an average 3D game benchmark chart looks like. At low resolutions the platform limits the frame rate, keeping the line flat in this area. Then, at 16-bit color, the fill rate limitation comes in, and at higher resolutions the frame rate takes another hit from the memory bandwidth limitation. The scores at 32-bit color are lower than the 16-bit color scores. At low resolutions the difference is only small, while at higher resolutions the frame rates at 32-bit color are only half of the scores at 16-bit color. The 3D chip is never quite able to deliver its theoretical maximum fill rate. At low resolutions it is limited by the platform performance, waiting for the CPU to deliver the 3D data, and at high resolutions the memory bandwidth limitation makes high fill rates impossible.





The Future ..?


Future 3D chips need much faster memory interfaces if we want high frame rates at high resolutions or full scene antialiasing. A chip that can render 2 Gpixel/s will be stalled permanently if it doesn't get a memory bandwidth of at least 12 GB/s. The alternatives are solutions that decrease the requirements for memory bandwidth, such as ATI's 'HyperZ' technology in the upcoming 'Radeon' chip. Besides that, faster platforms with faster processors, faster memory (please no RDRAM!) and faster AGP will help a lot too. However, more memory bandwidth is the most important requirement for future 3D solutions.
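The 12 GB/s figure for a 2 Gpixel/s chip works out to 6 bytes of memory traffic per pixel. The sketch below turns a target fill rate and such a per-pixel budget into a required bandwidth. It then converts that bandwidth into the memory clock a given bus would need. The 128-bit and 256-bit DDR buses are hypothetical configurations, not descriptions of any announced product.

```python
def required_bandwidth_gb_s(fill_rate_gpixels: float, bytes_per_pixel: float) -> float:
    """Memory bandwidth needed to keep the pipelines fed, in GB/s."""
    return fill_rate_gpixels * bytes_per_pixel


def required_memory_clock_mhz(bandwidth_gb_s: float, bus_width_bits: int,
                              transfers_per_clock: int) -> float:
    """Memory clock needed to supply a bandwidth on a given bus (2 transfers/clock = DDR)."""
    bytes_per_clock = bus_width_bits / 8 * transfers_per_clock
    return bandwidth_gb_s * 1e9 / bytes_per_clock / 1e6


need = required_bandwidth_gb_s(2.0, 6.0)          # 12 GB/s, as in the text
print(required_memory_clock_mhz(need, 128, 2))    # hypothetical 128-bit DDR bus: 375.0
print(required_memory_clock_mhz(need, 256, 2))    # hypothetical 256-bit DDR bus: 187.5
```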