An Overview of the NVIDIA UNIX Graphics Driver

XDevConf, February 8, 2006

Andy Ritger, NVIDIA Corporation

Copyright © NVIDIA Corporation 2004

Contents

Unified Driver Architecture

Driver Components

Features

Direct-Rendering Client Interaction with X

Rendering and Scanout Interaction

Video Memory

ABI Compatibility and API Compatibility

Direct-Rendering OpenGL+Damage/Composite

Unified Driver Architecture

Majority of code base for NVIDIA Graphics Drivers
leveraged on all the operating systems NVIDIA
supports:

Windows

Mac OS X

Linux

Solaris

FreeBSD

Everything OS-specific or window-system-specific
abstracted behind OS interface layers

One driver supports all GPUs

Driver Components

kernel module (nvidia.ko)

X driver (nvidia_drv.so)

OpenGL library (libGL.so)

GLX driver (libglx.so)

OpenGL core library (libGLcore.so)

Driver Components (cont.)

[Diagram: X server with nvidia_drv, libglx.so and libGLcore, and OpenGL app with libGL.so and libGLcore.so, in user space; nvidia.ko in kernel space; GPU. Connections labeled: X protocol, shared memory, command buffers.]

Additional Utilities

nvidia-installer (only needed on Linux)

nvidia-settings

nvidia-xconfig

Features

Hardware-accelerated direct and indirect OpenGL

Features

TwinView

2 display devices scanning from same X screen

One root window: spanning comes "for free"

What is DPI? Non-rectangular layouts?

Features (cont.)

Multiple X screens on one GPU

Not as efficient as TwinView for spanning

Solves DPI and non-rectangular layout problems of TwinView

Can advertise different capabilities on each X screen

Features (cont.)

Support for OpenGL with Xinerama

OpenGL direct/indirect rendering can span X screens (even
across GPUs)

Important for CAVEs and Powerwalls

Oil & Gas

Features (cont.)

Configurability

NV-CONTROL X extension: dynamically query/modify driver
attributes

nvidia-settings is sample NV-CONTROL client

Features (cont.)

Quad-Buffered Stereo

OpenGL application renders Left/Right eyes

Driver toggles between eyes on every VBlank

Important for many workstation users, CAVEs

Above: CAVE immersive 3D environment; stereo images projected on walls and floor; stereo images must be in sync across all projectors.

Right: MRI Brain Visualization

CAVE images courtesy Brown University ~ http://graphics.cs.brown.edu/research/cave/home.html

Features (cont.)

RGB/CI Workstation Overlays

16-bit RGB overlay

8-bit CI overlay

Rendering in overlay does not damage content in main
plane

Useful for user interface in overlay, complex rendering in
main plane

Useful for legacy applications that require different depths

Used by workstation applications such as Maya

Features (cont.)

FrameLock

Lock together scanout of displays across a cluster

OpenGL SwapBuffers Locked together

Important for CAVEs and powerwalls

ORNL visualization expert Jamison Daniel uses the EVEREST powerwall to display data from a large-scale climate simulation.

Image courtesy of ORNL

Features (cont.)

SDI

Serial Digital Interface: video format used in digital
broadcast industry

GPU sends data to SDI in 8, 10, or 12 bits per component

Features (cont.)

SLI

Multiple GPUs drive one X screen

Alternate Frame Rendering (AFR)

Split Frame Rendering (SFR)

SLI AntiAliasing (SLIAA)

[Diagram: two GPUs, CPU, and chipset]

Direct-Rendering Client Interaction with X

Motivation for Direct Rendering:

Avoid IPC overhead

Avoid moving large quantities of data between client and
server (e.g., OpenGL textures)

Avoid making GLX protocol requests for every OpenGL API
call (e.g., glVertex3f() millions of times per frame)

When OpenGL application is on same system as X server,
performance benefit to bypass GLX protocol

Direct-Rendering Client Interaction with X (cont.)

Hardware-acceleration vs direct-rendering:

Hardware-acceleration: using GPU to perform some or all of
the OpenGL rendering pipeline

Direct rendering: OpenGL library bypasses GLX protocol and
renders directly to the hardware

Server-side must coordinate with OpenGL client library
for:

Data propagation

Synchronization

Direct-Rendering Client Interaction with X (cont.)

What data needs propagating?

Drawable's geometry

Drawable's cliplist

Other Drawable attributes:

SwapInterval

AntiAliasing

SyncToVBlank

etc...

Direct-Rendering Client Interaction with X (cont.)

Control flow:

NVIDIA X driver pushes current drawable state into a
shared memory segment

OpenGL direct-rendering runs asynchronously to X server

When OpenGL performs an operation that must be up-to-date
with respect to the window system, it checks that it has
current drawable data

If stale, OpenGL retrieves current data from shared
memory and updates internal state
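
The control flow above can be sketched as a serial-number check. This is a minimal illustration, not NVIDIA's implementation: the `serial` field, the class names, and the use of plain Python objects in place of a real shared memory segment are all assumptions. It also omits the locking that the synchronization slides call for.

```python
from dataclasses import dataclass, field


@dataclass
class SharedDrawableState:
    """Stand-in for the shared memory segment written by the X driver."""
    serial: int = 0                      # hypothetical counter, bumped on every update
    geometry: tuple = (0, 0, 0, 0)       # x, y, width, height
    cliplist: list = field(default_factory=list)

    def push(self, geometry, cliplist):
        # X driver side: publish new drawable state, then advance the serial.
        self.geometry = geometry
        self.cliplist = list(cliplist)
        self.serial += 1


class GLClient:
    """Stand-in for a direct-rendering OpenGL client."""

    def __init__(self, shared):
        self.shared = shared
        self.seen_serial = -1            # serial of the last state this client copied
        self.geometry = None
        self.cliplist = None
        self.refreshes = 0

    def validate_drawable(self):
        # Called before any operation that must agree with the window system.
        if self.seen_serial != self.shared.serial:
            self.geometry = self.shared.geometry
            self.cliplist = list(self.shared.cliplist)
            self.seen_serial = self.shared.serial
            self.refreshes += 1

    def swap_buffers(self):
        self.validate_drawable()
        return self.geometry, self.cliplist


shared = SharedDrawableState()
shared.push((0, 0, 640, 480), [(0, 0, 640, 480)])
client = GLClient(shared)
client.swap_buffers()                       # first use: state is stale, refresh
client.swap_buffers()                       # nothing changed: no refresh
shared.push((100, 50, 640, 480), [(100, 50, 640, 480)])   # window moved
geometry, cliplist = client.swap_buffers()  # stale again: refresh
```

The client pays only for an integer comparison on the common path and copies drawable data only after the X driver has published a change.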

Direct-Rendering Client Interaction with X (cont.)

Synchronization needed to ensure integrity of
drawable data in shared memory

Synchronization also needed to ensure correct
ordering of GPU commands issued by each driver (X,
each instance of OpenGL)

Direct-Rendering Client Interaction with X (cont.)

Traditional GPUs: 1 command buffer:

Shared by all driver components

Synchronization needed to protect shared buffer

NVIDIA GPUs: multiple command buffers:

One command buffer for each OpenGL client, one for X
driver

Hardware context switches between command buffers

No need to negotiate shared command buffer

Instead, need to manage sequencing of GPU commands

Direct-Rendering Client Interaction with X (cont.)

Why is sequencing important?

Consider moving an animating OpenGL window that
is clipped

Operations performed:

OpenGL SwapBuffers: blit back->front per cliprect

X driver: blit from old position to new position per cliprect

Must make sure all outstanding OpenGL rendering is
complete and has reached the framebuffer before X's
blit commands are processed by the GPU
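
A toy model of the ordering requirement, assuming a semaphore-style primitive that one command buffer can signal and another can wait on. The `wait`/`signal` commands, the single shared counter, and the round-robin scheduler are hypothetical simplifications for illustration, not a description of the actual hardware interface.

```python
from collections import deque


class Gpu:
    """Toy GPU that round-robins between per-client command buffers."""

    def __init__(self):
        self.channels = {}      # client name -> deque of (op, arg) commands
        self.semaphore = 0      # hypothetical counter the GPU can write and wait on
        self.executed = []      # order in which work actually ran

    def channel(self, name):
        return self.channels.setdefault(name, deque())

    def run(self):
        while any(self.channels.values()):
            progressed = False
            for name, queue in self.channels.items():
                if not queue:
                    continue
                op, arg = queue[0]
                if op == "wait" and self.semaphore < arg:
                    continue            # this buffer stalls; switch to another one
                queue.popleft()
                progressed = True
                if op == "signal":
                    self.semaphore = arg
                elif op == "work":
                    self.executed.append((name, arg))
            if not progressed:
                raise RuntimeError("deadlock: every command buffer is waiting")


gpu = Gpu()
x = gpu.channel("X")
gl = gpu.channel("OpenGL")

# X driver wants to move the window, but must not copy stale pixels.
x.append(("wait", 1))
x.append(("work", "blit window from old position to new position"))

# OpenGL client still has rendering and a swap in flight.
gl.append(("work", "render frame"))
gl.append(("work", "SwapBuffers blit back->front"))
gl.append(("signal", 1))

gpu.run()
```

Even though X's buffer is scheduled first, its blit runs only after both OpenGL commands have executed, which is the ordering the moving-window case requires.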

Direct-Rendering Client Interaction with X (cont.)

Inter-command-buffer synchronization: driver-specific
problem to solve

Important concept whenever one client's rendering is
read by another client:

Direct-rendering OpenGL clients rendering to redirected
windows

X rendering to pixmaps that are used as OpenGL textures
(GLX_EXT_texture_from_pixmap)

Interactions Between Rendering and
Scanout

Flipping vs Blitting

Blit: memcpy

Flip: change what portion of video memory is
scanned out

Flipping is faster, easier to sync to VBlank
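
The difference can be illustrated with a toy display model. The `Display` class and its fields are hypothetical; real scanout is programmed through hardware registers, not Python attributes.

```python
class Display:
    """Toy display: video memory holds two surfaces; one is scanned out."""

    def __init__(self, width, height):
        self.surfaces = {
            "front": bytearray(width * height),
            "back": bytearray(width * height),
        }
        self.scanout = "front"          # which surface the display engine reads
        self.bytes_copied = 0

    def blit_swap(self):
        # Blit: copy every pixel of the back buffer into the scanned-out surface.
        src = self.surfaces["back"]
        self.surfaces[self.scanout][:] = src
        self.bytes_copied += len(src)

    def flip_swap(self):
        # Flip: retarget scanout to the other surface; no pixels move.
        self.scanout = "back" if self.scanout == "front" else "front"

    def visible(self):
        return bytes(self.surfaces[self.scanout])


blit = Display(4, 2)
blit.surfaces["back"][:] = b"\x01" * 8      # application renders a frame
blit.blit_swap()

flip = Display(4, 2)
flip.surfaces["back"][:] = b"\x01" * 8      # same frame
flip.flip_swap()
```

Both displays end up showing the same frame, but the blit cost grows with the surface size while the flip is a single pointer change.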

Interactions Between Rendering and
Scanout (cont.)

To flip while an OpenGL application is in a window:

Create a second copy of the desktop

Only the content within the OpenGL window is different

Flip between copies of the desktop

Requires keeping the desktop in sync between the two
copies

Interactions Between Rendering and
Scanout (cont.)

Quad-Buffered Stereo:

Flip between Left/Right eyes every VBlank

Swaps can be done with either blit or flip

Interactions Between Rendering and
Scanout (cont.)

Ideally, rendering and scanout would be orthogonal

In practice, they are not:

OpenGL needs to control when and where to flip

SyncToVBlank

Video Memory allocation/configuration may depend on
whether surface will be scanned out

Filtering for AA through scanout

SLI (SFR, AFR, SLIAA)

Frame delivery for video:

Time-sensitive

Driver needs precise control of frame display

Best accomplished with flipping

Video Memory

Most modern NVIDIA GPUs are packaged with large
quantities of video memory

However:

Not all video memory is CPU mappable; SBIOSes limit how
much can be mapped to the CPU

Some GPUs support rendering to system memory over PCI-E bus

CPU mappable

GPU rendering is slower than to native vidmem

Layout of video memory may not be linear

Organization of bits within video memory optimized for
rendering and texturing

Acquiring a linear CPU mapping may require sacrifices

Video Memory (cont.)

Many attributes to the video memory

Selecting the optimal placement of data in the correct
memory space is non-trivial

Placement heuristics perform best when driver has
knowledge of how that data is going to be used
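
A sketch of what a usage-driven placement heuristic might look like. Everything here is hypothetical: the `Usage` hints, the memory-space names, and the rules themselves are invented for illustration and only reflect the trade-offs listed on the previous slide (limited CPU-mappable video memory, slower GPU rendering to system memory, non-linear layouts).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Hypothetical usage hints a caller might pass with an allocation."""
    gpu_renders: bool = False     # GPU draws into or textures from the surface
    cpu_accesses: bool = False    # CPU needs a direct mapping of the surface
    scanned_out: bool = False     # display engine reads the surface


def place(usage, mappable_vidmem_free):
    """Return (memory_space, layout) for a surface; the rules are illustrative only."""
    if usage.scanned_out or (usage.gpu_renders and not usage.cpu_accesses):
        # GPU-only or displayed data: keep it in video memory, in the GPU's preferred layout.
        return ("vidmem", "gpu-optimized")
    if usage.cpu_accesses and usage.gpu_renders:
        # Both sides touch it: prefer CPU-mappable vidmem in a linear layout if any is
        # left, otherwise fall back to system memory and accept slower GPU rendering.
        if mappable_vidmem_free:
            return ("vidmem-mappable", "linear")
        return ("sysmem", "linear")
    # CPU-only data does not need to occupy video memory at all.
    return ("sysmem", "linear")


texture = place(Usage(gpu_renders=True), mappable_vidmem_free=True)
shared_pixmap = place(Usage(gpu_renders=True, cpu_accesses=True), mappable_vidmem_free=False)
```

Without the hints, the allocator would have to guess, and a wrong guess costs either scarce mappable video memory or rendering speed.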

ABI and API Compatibility

NVIDIA provides one X driver binary used in all X
servers since XFree86 4.0

This is accomplished through:

ABI compatibility

Dynamic loading of symbols

We understand that ABI compatibility needs to be
broken, and we can work with that

ABI and API Compatibility (cont.)

However, here are a few suggestions:

Breaking ABI compatibility painful for anyone distributing a
driver separately from the X server tree (will that be more
common with the Modular X tree?)

To minimize pain, break ABI infrequently and only when
absolutely necessary

Add new entry points and deprecate old entry points, rather
than change old entry points, to give opportunity to phase
in driver support

ABI and API Compatibility (cont.)

More Suggestions:

Update ABI version number appropriately

ABI version queryable at install time and run time

Minimize number of incompatible ABI versions: minimize
number of driver versions to distribute

If there are several ABI breakages pending, get them all out
of the way at once

If ABI is going to be broken anyway, update APIs when
appropriate (Xv, Glyph management)
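
One way to make these suggestions concrete, assuming a major/minor versioning scheme: adding entry points bumps the minor number, and an incompatible change bumps the major number. The packing format and the function names below are assumptions for illustration, not a statement of the X server's actual rules.

```python
def pack_abi(major, minor):
    """Pack a version as major in the high 16 bits, minor in the low 16 bits."""
    return (major << 16) | minor


def abi_major(version):
    return (version >> 16) & 0xFFFF


def abi_minor(version):
    return version & 0xFFFF


def driver_can_load(server_abi, driver_abi):
    # A major bump means an incompatible break; a minor bump only adds entry points,
    # so a driver built against an older minor still loads on a newer server.
    return (abi_major(server_abi) == abi_major(driver_abi)
            and abi_minor(server_abi) >= abi_minor(driver_abi))


old_driver = pack_abi(1, 0)
newer_server = pack_abi(1, 2)        # added entry points, kept the old ones
broken_server = pack_abi(2, 0)       # changed or removed entry points
```

Under this scheme a driver can check `driver_can_load` at install time and at run time, and batching several pending breaks into one major bump keeps the number of driver builds small.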

OpenGL + Damage/Composite

Direct-rendering GL+Damage/Composite:

Clients aware that drawable has been redirected

Clients notify X when drawable is damaged

Clients and X drivers to handle synchronization:

Do not use direct-rendering content as source for
compositing operation until direct-rendering content has
reached framebuffer (tricky if direct-rendering client and
composite manager's rendering are in separate GPU
command buffers)

Or, do not notify X server of direct-rendered damage until
rendering has reached framebuffer; but this increases latency

The Synchronization Problem to be discussed later
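
The second option, holding back damage notification until rendering has landed, can be sketched as a queue keyed by a completion counter. The `DamageQueue` class, the fence values, and the `gpu_reached` callback are hypothetical names for illustration; the slides do not describe how the driver actually tracks completion.

```python
class DamageQueue:
    """Holds damage reports until the GPU has finished the rendering they describe."""

    def __init__(self, notify):
        self.notify = notify        # callback standing in for the damage report to X
        self.pending = []           # (fence value, damaged rect), in submission order
        self.submitted = 0          # fence value of the most recent submission

    def submit_frame(self, rect):
        # Client queued rendering; tag it with the next fence value.
        self.submitted += 1
        self.pending.append((self.submitted, rect))
        return self.submitted

    def gpu_reached(self, completed):
        # GPU reports progress: release damage for every frame that has landed.
        while self.pending and self.pending[0][0] <= completed:
            _, rect = self.pending.pop(0)
            self.notify(rect)


reported = []
queue = DamageQueue(reported.append)
queue.submit_frame((0, 0, 640, 480))
queue.submit_frame((0, 0, 640, 480))
before = list(reported)     # nothing reported while rendering is still in flight
queue.gpu_reached(1)
after_first = list(reported)
queue.gpu_reached(2)
```

The composite manager never sees damage for pixels that are not yet in the framebuffer, at the cost of the extra latency noted above.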

OpenGL + Damage/Composite (cont.)

Compositing overhead will be substantial for direct-rendering
clients, especially for applications with a
high framerate

Important that users can disable compositing when
they want:

Full OpenGL performance

Features that may not be possible with Composite:

Workstation Overlays

Quad-Buffered Stereo

OpenGL + Damage/Composite (cont.)

All the building blocks are here for OpenGL
implementors to support direct-rendering OpenGL
with Damage and Composite

Demo of NVIDIA direct-rendering OpenGL with
Damage and Composite; will be available in nvr85-series
drivers

Conclusion

NVIDIA Driver has many features important to our
users

Overview of direct-rendering client/X driver
interaction

Data Propagation

Synchronization

Rendering and Scanout Interaction

Video Memory

ABI and API Compatibility

Direct-rendering OpenGL + Damage/Composite

Questions?

http://developer.nvidia.com/object/xdevconf_2006_presentations.html