Overcomplicating OpenGL Rendering

boringtarp | Software and s/w Development

Dec 13, 2013


wxpython, opengl, numpy, pypy
Constructing buffers for OpenGL while keeping the UI responsive and displaying the intermediate results.
The problem
Solutions I wanted to try
Dependencies (versions used)
Results and Conclusion
The problem
This is intended to be the first of many small steps towards an efficient way of drawing practically unlimited amounts of geometry with smooth 2D navigation (panning and zooming), progressive loading from disk and some support for variable level of detail. To begin with, I want to tackle the problem of constructing buffers in parallel with normal program execution.
Drawing on the screen with OpenGL often means preparing all kinds of buffers to pass to OpenGL calls. These are usually vertices, colours, normals, texture coordinates, indices or edge flags. Sometimes these buffers are huge and take a long time to prepare, while the OpenGL calls themselves take an insignificant amount of time in comparison. The situations I want to deal with here are those where the buffers can't be constructed in advance and the program must stay fully usable while they are being constructed. Partial drawings can also be useful to the user, so it should be possible to display a buffer while it is still being filled.
In short, I want to explore some of the possibilities for efficiently filling buffers in parallel with normal program execution, with minimal impact on the program's responsiveness. For the sake of simplicity, I'll just fill one buffer with vertices.
Solutions I wanted to try
We are talking about a desktop application written in wxPython here. Threading comes to mind at once. However, it will not always be a good enough solution because of Python's GIL (global interpreter lock), which lets only one thread at a time execute Python bytecode.
The next step is multiple processes. The GIL is not an issue there, as each process has its own, but interprocess communication becomes the problem. Because these buffers can be rather large, I would like to avoid copying them from one process to another all the time. That leaves really just one technique to try: shared memory.
It would be advantageous to be able to use NumPy arrays for these buffers.
The additional processes don't have to be started with the same Python interpreter, so I also want to see whether using PyPy for them is practical.

Dependencies (versions used)

CPython 2.7.2
PyPy 1.8
PyOpenGL 3.0.2a4
NumPy 1.6.1
glutton revision 37
It should be said that glutton is something I'm just starting to write, and it's far from any kind of proper release. The code from this experiment is included in the glutton repository as one of the examples. The intention is that the tools needed to reach the goal stated at the beginning of this article will ultimately be included in glutton. Also, the code in this article has only been tested on Windows so far.
We can rely on glutton for a canvas to draw on, with initialization, navigation and automatic refreshing handled for us:
from glutton import canvas
from glutton.mixins.navigate2d import Navigate2D

class TestCanvas(canvas.Canvas, Navigate2D):
    def __init__(self, parent):
        canvas.Canvas.__init__(self, parent, enforce_fps=60)

    def init_context(self):
        pass  # one-off OpenGL setup goes here

    def render_scene(self):
        pass  # per-frame drawing; filled in below
The constructor is overridden to specify the enforce_fps argument. This is needed because the initialization and running of the canvas are delegated to the glutton test runner, which won't change the defaults. init_context is executed once, after the OpenGL context is created, and render_scene is called to, well, render the scene for every frame.
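glutton's internals aren't shown in this article. The following is therefore only a hypothetical stand-in (HookedCanvas and CountingCanvas are made-up names, not glutton classes) that mimics the calling contract described above:

- init_context runs once, before the first frame.
- render_scene runs on every frame.
- enforce_fps fixes the per-frame time budget.

It lets the hooks be exercised without an OpenGL context.

```python
# Hypothetical stand-in, NOT glutton's real implementation: it only mimics
# the contract "init_context once, then render_scene once per frame".
class HookedCanvas:
    def __init__(self, enforce_fps=60):
        self.enforce_fps = enforce_fps
        self._context_ready = False

    def frame_interval(self):
        # Time budget per frame in seconds, e.g. 1/60 s at 60 fps.
        return 1.0 / self.enforce_fps

    def draw_frame(self):
        # Run the one-off setup hook before the very first frame only.
        if not self._context_ready:
            self.init_context()
            self._context_ready = True
        self.render_scene()

    def init_context(self):
        pass

    def render_scene(self):
        pass


class CountingCanvas(HookedCanvas):
    """Subclass that just counts how often each hook is called."""

    def __init__(self):
        HookedCanvas.__init__(self, enforce_fps=60)
        self.inits = 0
        self.frames = 0

    def init_context(self):
        self.inits += 1

    def render_scene(self):
        self.frames += 1


demo = CountingCanvas()
for _ in range(3):
    demo.draw_frame()
print(demo.inits, demo.frames)  # 1 3
```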
To set up our buffer, and a counter that keeps track of how much of it is filled, we use the following code:
from ctypes import c_float, c_int32
import multiprocessing as mp
import numpy as np

NUMBER_OF_POINTS = 1000000  # the size used for the results below

# inside the canvas class (e.g. in __init__):
self.raw_points = mp.RawArray(c_float, 2 * NUMBER_OF_POINTS)  # x, y per point
self.points = np.frombuffer(self.raw_points, np.float32)  # a view, not a copy
self.points_count = mp.Value(c_int32)  # how many points are ready to draw
self.points_count.value = 0

RawArray is used because the buffer doesn't need a lock. The rendering code in the main process/thread reads the buffer only up to the point it has been filled, as given by the points_count counter. The filling code increases this counter only after it has finished filling the next chunk of the buffer. The counter itself does use a lock, because its updates are not guaranteed to be atomic across processes. Since we are drawing points in 2D, we need two floats for every point. NumPy's frombuffer function can then create a NumPy array that views the shared memory array without copying it.
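The following is a minimal, self-contained Python 3 sketch of the setup above, using a tiny buffer so the values can be checked. It shows two things:

- Writes through the RawArray appear in the NumPy view, and vice versa. This is what lets the drawing code and the filling code share one buffer without copies.
- The counter update uses get_lock(). The multiprocessing documentation points out that an operation like `+=` on a Value's `.value` is not atomic.

```python
import multiprocessing as mp
from ctypes import c_float, c_int32

import numpy as np

NUMBER_OF_POINTS = 4  # deliberately tiny, just for the demonstration

# Unsynchronised shared block (2 floats per point) plus a locked counter.
raw_points = mp.RawArray(c_float, 2 * NUMBER_OF_POINTS)
points = np.frombuffer(raw_points, np.float32)  # view onto the same memory
points_count = mp.Value(c_int32, 0)

# A write through the ctypes array is visible through the NumPy view...
raw_points[0] = 1.5
# ...and a write through the NumPy view is visible through the ctypes array.
points[1] = 2.5

# A read-modify-write of the counter must hold its lock to be atomic.
with points_count.get_lock():
    points_count.value += 1

print(points[:2], raw_points[1], points_count.value)
```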
Additionally, the NumPy array can be reshaped so that each row holds one (x, y) point. This makes it easy to process all the points at once with vectorized NumPy operations such as matrix multiplication:
self.points.shape = NUMBER_OF_POINTS, 2
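To illustrate what the reshape buys, here is a small Python 3 sketch. It applies one hypothetical transformation (a uniform scale by 0.5 followed by a translation by (1, 2)) to every point in a single vectorized expression, then writes the result back into shared memory. It uses reshape() instead of assigning to .shape; for a contiguous buffer like this one, reshape() returns a view of the same memory.

```python
import multiprocessing as mp
from ctypes import c_float

import numpy as np

NUMBER_OF_POINTS = 3

# Shared buffer initialised with the three triangle corners (x0, y0, x1, ...).
raw_points = mp.RawArray(c_float, [0.0, 0.0, 10.0, 0.0, 5.0, 10.0])
# One (x, y) row per point; still a view onto the shared memory.
points = np.frombuffer(raw_points, np.float32).reshape(NUMBER_OF_POINTS, 2)

# Hypothetical transform: scale by 0.5, then translate by (1, 2).
scale = np.array([[0.5, 0.0],
                  [0.0, 0.5]], dtype=np.float32)
offset = np.array([1.0, 2.0], dtype=np.float32)

# Transform every row at once and write the result back in place.
points[:] = points.dot(scale.T) + offset

print(points.tolist())   # [[1.0, 2.0], [6.0, 2.0], [3.5, 7.0]]
print(list(raw_points))  # [1.0, 2.0, 6.0, 2.0, 3.5, 7.0]
```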
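To render a frame, we just need to point OpenGL at our array and draw the part of it that has been filled so far:
from OpenGL.GL import *

def render_scene(self):
    glEnableClientState(GL_VERTEX_ARRAY)  # could also be done once in init_context
    glVertexPointer(2, GL_FLOAT, 0, self.points)
    glDrawArrays(GL_POINTS, 0, self.points_count.value)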
Now we need something to draw. I've decided on the Sierpinski triangle, built with the Chaos Game method. The method is simple, and it can produce an arbitrarily large number of points.
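Written as a formula, each step of the Chaos Game moves the current point halfway towards a randomly chosen corner of the triangle. The vertices and starting point below are the ones used in the code that follows:

```latex
\mathbf{v}_0 = (0, 0), \quad \mathbf{v}_1 = (10, 0), \quad \mathbf{v}_2 = (5, 10), \qquad \mathbf{p}_0 = (5, 5)

\mathbf{p}_{n+1} = \tfrac{1}{2}\left(\mathbf{p}_n + \mathbf{v}_{k_n}\right), \qquad k_n \in \{0, 1, 2\}\ \text{chosen uniformly at random}
```

Each new point is the midpoint of the previous point and a vertex, and the starting point lies inside the triangle. Every generated point therefore stays inside the triangle too. The buffer receives p_1 through p_N, where N is the requested size.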
def construct_sierpinski_triangle(points, size, count):
    import time
    from random import choice
    vertices = ((0.0, 0.0), (10.0, 0.0), (5.0, 10.0))
    x, y = 5.0, 5.0
    start = time.clock()
    for i in xrange(0, 2 * size, 2):
        # jump halfway towards a randomly chosen corner of the triangle
        vx, vy = choice(vertices)
        points[i] = x = (x + vx) / 2.0
        points[i + 1] = y = (y + vy) / 2.0
        if i % 1000 == 0:
            count.value = i / 2  # publish progress every 500 points
    count.value = size  # also publish the final, partial chunk
    print "time taken:", time.clock() - start
I'm updating the counter only once every 500 points, so that its lock doesn't become a bottleneck. The code checks for multiples of 1000 because the loop index advances by 2 per point. I've also included a very basic measurement of the time taken to fill the buffer.
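The effect of batching can be seen in a Python 3 variant of the fill loop. fill_triangle and publish_every are hypothetical names for this sketch. The variant differs from the function above in three ways:

- It counts completed points directly, instead of halving the loop index.
- It publishes any leftover points after the loop.
- It returns how many times the counter was written.

For the 1,000,000 points used in this article, publishing every 500 points means 2,000 counter updates instead of 1,000,000.

```python
import multiprocessing as mp
from ctypes import c_float, c_int32
from random import choice


def fill_triangle(points, size, count, publish_every=500):
    """Chaos Game fill that publishes progress in batches (sketch)."""
    vertices = ((0.0, 0.0), (10.0, 0.0), (5.0, 10.0))
    x, y = 5.0, 5.0
    publishes = 0
    for n in range(size):
        vx, vy = choice(vertices)
        x = (x + vx) / 2.0
        y = (y + vy) / 2.0
        points[2 * n] = x
        points[2 * n + 1] = y
        if (n + 1) % publish_every == 0:
            count.value = n + 1  # points 0..n are now complete
            publishes += 1
    if count.value != size:
        count.value = size  # publish the final, partial batch
        publishes += 1
    return publishes


raw = mp.RawArray(c_float, 2 * 1234)
count = mp.Value(c_int32, 0)
print(fill_triangle(raw, 1234, count), count.value)  # 3 1234
```

Setting publish_every=1 would restore the one-update-per-point behaviour that the batching is meant to avoid.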
If a NumPy array is wanted here, the same frombuffer function can be used as before to create one on top of the shared memory. Unfortunately, at the time of writing, this function is not yet implemented in the PyPy port of NumPy, so I gave up on using it for this experiment.

The construction of the Sierpinski triangle is started with the following code:
import threading
Worker = mp.Process if MULTIPROCESSING else threading.Thread  # same interface
self.worker = Worker(
    target=construct_sierpinski_triangle,
    args=(self.raw_points, NUMBER_OF_POINTS, self.points_count))
self.worker.start()
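The worker only needs a target function and shared ctypes objects. threading.Thread and multiprocessing.Process accept the same target/args keywords, so either can be chosen at runtime.

Below is a self-contained Python 3 sketch. fill_squares is a hypothetical worker that just writes n*n, and the use_processes flag stands in for MULTIPROCESSING.

```python
import multiprocessing as mp
import threading
from ctypes import c_float, c_int32


def fill_squares(points, size, count):
    # Module-level function, so it can also be pickled for mp.Process on
    # platforms that start a fresh interpreter for the child (e.g. Windows).
    for n in range(size):
        points[n] = float(n * n)
    count.value = size  # publish once everything is written


def start_worker(use_processes, size):
    """Start fill_squares in a process or a thread; same call either way."""
    raw = mp.RawArray(c_float, size)
    count = mp.Value(c_int32, 0)
    Worker = mp.Process if use_processes else threading.Thread
    worker = Worker(target=fill_squares, args=(raw, size, count))
    worker.start()
    return worker, raw, count
```

Passing use_processes=True runs the same function in a child process. On Windows, the child is started by re-importing the main module, so that call has to sit under an `if __name__ == "__main__":` guard.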
It is convenient to write code where threads and processes can be used interchangeably, because multiple threads are significantly simpler to debug than multiple processes. The resulting fractal looks like this:
[Figure: the Sierpinski triangle produced by the Chaos Game]
To make multiprocessing start the new process with PyPy, I used the multiprocessing.set_executable() function to specify the location of the PyPy interpreter.
If any PyPy modules are to be imported by a process started this way, the path to PyPy's lib_pypy directory must be added to sys.path explicitly.
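A Python 3 sketch of both steps follows. PYPY_HOME and the pypy.exe file name are hypothetical placeholders for wherever PyPy is installed. lib_pypy is the directory in PyPy's distribution that holds its pure-Python modules.

One likely reason the path has to be added by hand: multiprocessing sends the parent's sys.path to a spawned child and installs it there. A PyPy child therefore starts out with CPython's paths. In Python 3.4 and later, set_executable() also works on Unix when the 'spawn' start method is used.

```python
import multiprocessing as mp
import os
import sys

# Hypothetical install location; adjust to wherever PyPy actually lives.
PYPY_HOME = r"C:\pypy-1.8"


def use_pypy_for_children(pypy_home=PYPY_HOME):
    """Point multiprocessing at the PyPy binary, if it exists (parent side)."""
    exe = os.path.join(pypy_home, "pypy.exe")  # hypothetical file name
    if not os.path.isfile(exe):
        return False  # keep the default interpreter (sys.executable)
    mp.set_executable(exe)
    return True


def add_lib_pypy(pypy_home=PYPY_HOME, path=None):
    """Call in the child before importing PyPy-only modules."""
    path = sys.path if path is None else path
    lib_pypy = os.path.join(pypy_home, "lib_pypy")
    if lib_pypy not in path:
        path.insert(0, lib_pypy)
    return path
```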
According to the Python 2.7 documentation, set_executable() is Windows-only. On other operating systems, using the Listener and Client classes from multiprocessing.connection might be an option.
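multiprocessing.connection provides Listener and Client for passing messages between processes that were started independently, for example a PyPy process launched by hand. Unlike the shared RawArray, every message is pickled and copied.

The Python 3 sketch below keeps both ends in one process, with a thread playing the listening side, so it can run anywhere. AUTHKEY, serve_one and the message contents are made up for illustration.

```python
import threading
from multiprocessing.connection import Client, Listener

AUTHKEY = b"hypothetical-secret"  # shared secret, made up for this sketch


def serve_one(listener, received):
    # Accept a single connection and answer one status request.
    with listener.accept() as conn:
        received.append(conn.recv())
        conn.send({"points_ready": 500})


# Port 0 asks the OS for any free port; listener.address holds the real one.
listener = Listener(("localhost", 0), authkey=AUTHKEY)
received = []
server = threading.Thread(target=serve_one, args=(listener, received))
server.start()

with Client(listener.address, authkey=AUTHKEY) as conn:
    conn.send("status")
    reply = conn.recv()

server.join()
listener.close()
print(received, reply)
```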

Results and Conclusion
The results were a bit surprising. Filling the buffer with 1,000,000 points takes around 3.5 seconds on my computer with every method tried. The significant difference is that navigation during those seconds is choppy when using threads but smooth when using multiprocessing. The interprocess communication overhead seems to be the main limiting factor, and for a simple algorithm like this PyPy doesn't get a chance to shine.
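For scale, the reported timing translates into the following rates. This is simple arithmetic on the figures above, not an additional measurement:

```latex
\frac{1\,000\,000\ \text{points}}{3.5\ \text{s}} \approx 2.9 \times 10^{5}\ \text{points/s}, \qquad \frac{3.5\ \text{s}}{1\,000\,000\ \text{points}} = 3.5\ \mu\text{s per point}

\frac{1\,000\,000\ \text{points}}{500\ \text{points per update}} = 2000\ \text{counter updates} \approx 570\ \text{updates/s over}\ 3.5\ \text{s}
```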
The GIL seemed to hinder only the navigation, not the construction of the fractal. It would be interesting to experiment with thread priorities here, but unfortunately Python's threading module doesn't support them. The GIL would probably pose more of an obstacle when filling more than one buffer. Even with one buffer, though, the smoother navigation justifies using multiple processes.