RS/GIS Python programming with Open Source libraries

adventurescoldΛογισμικό & κατασκευή λογ/κού

7 Νοε 2013 (πριν από 4 χρόνια και 1 μήνα)

246 εμφανίσεις

RS/GIS Python programming with Open Source libraries
Libraries
Libraries are collections of reusable bits of code. That’s the idea behind DLLs (dynamic link
libraries) on Windows machines. These files just contain functions that other Windows
applications can access and use. We can extend this same idea to Python. Libraries in Python
are usually called modules. They can contain a variety of things, including functions, constants,
and classes. We’ll talk a little more about modules later, but for now we’ll just introduce some
of the modules that we’ll be using for this part of the workshop.
My favorite Open Source RS/GIS libraries are GDAL (Geospatial Data Abstraction Library) and
the OGR Simple Features Library. Both were originally written by Frank Warmerdam but other
people help maintain them now. GDAL is used by several commercial software programs,
including ArcGIS (if you don’t believe me, search for gdal*.dll on your C: drive and you’ll find
gdal DLLs in your ArcGIS\bin directory). They are actually C++ libraries, but bindings exist for
several different languages, including Python. This means that someone has gone to the trouble
to create a Python module that can make use of the C++ libraries. These libraries also come with
several useful command line utilities that won’t be covered in this workshop (but maybe I should
do a short brown bag on them sometime, because they’re really useful). We will use GDAL for
raster data access and OGR for vector data.
We’ll also be using the Numeric module for Python. Numerical Python allows for sophisticated
array manipulation in Python (think IDL, for those of you who’ve had Mike’s class).
Installation
The main website for both packages is at
http://www.gdal.org
. There are a couple of ways to get
them for Windows, all of which are available from the Downloads link on the above site (the real
address is
http://trac.osgeo.org/gdal/wiki/DownloadingGdalBinaries
). Personally, I like
FWTools because you get all kinds of goodies with it. However, I haven’t been able to get
PythonWin to work with this, so this is not a good option if you want to use PythonWin. If you
do want to use FWTools, you can write your Python code in any text editor but you must run it
from the FWTools shell. I’m partial to the free Crimson Editor available from
http://www.crimsoneditor.com/
.
If you do use Crimson Editor, you can configure it so that it runs Python code from inside of it,
but you still do not have access to a debugger like you have in PythonWin. To set up Crimson
Editor to run Python code using the FWTools install, follow these directions after installing both
FWTools and Crimson Editor:
1. Create a file called runpython.bat in your FWTools directory (C:\Program
Files\FWTools1.3.4 on my system). Put this code in the file, changing the directory
names as needed:
call "C:\Program Files\FWTools1.3.4\setfw.bat"
python %*
2. Choose Conf. User Tools… from the Tools menu in Crimson Editor.
3. To create a tool that doesn’t request the user to enter parameters:
a. Highlight one of the Empty tools.
b. Put whatever you want in the Menu Text section (I used “Run with FWTools
Python”).
c. Put “C:\Program Files\FWTools1.3.4\runpython.bat” in the Command section (or
whatever your path to runpython.bat is).
d. Put “$(FileName)” in the Argument section.
e. You can set the Initial Dir and Hot Key values if you want, but I didn’t bother.
f. Check the Capture output box.
4. To create a tool that requests parameters from the user:
a. Highlight one of the Empty tools.
b. Put whatever you want in the Menu Text section (I used “Run with FWTools
Python and Arguments”).
c. Put “C:\Program Files\FWTools1.3.4\runpython.bat” in the Command section (or
whatever your path to runpython.bat is).
d. Put “$(FileName) $(UserInput)” in the Argument section.
e. You can set the Initial Dir and Hot Key values if you want, but I didn’t bother.
f. Check the Capture output box.
5. Hit OK.
6. Now there should be options to run Python scripts in the Tools menu in Crimson Editor.
If you want to use PythonWin, you need to follow these directions. I am not sure if this breaks
things for ArcGIS, however (Python doesn’t work correctly with ArcGIS on my PC, and I’m not
sure why).
1. The first thing to do is make sure you have Python 2.4 (FYI, that is the version installed
with ArcGIS 9.2).
2. Install the Numeric module for Python 2.4, which is available from
http://sourceforge.net/project/showfiles.php?group_id=1369
. Make sure you get Old
Numeric for Python 2.4.
3. Grab Python_GDAL-1.4.2.win32-py2.4.exe and gdalwin32exe142.zip from
http://download.osgeo.org/gdal/win32/
.
4. Install the Python GDAL module (Python_GDAL-1.4.2.win32-py2.4.exe).
5. Extract the contents of the bin directory from gdalwin32exe142.zip into your Python site-
packages directory (on my system that was C:\Python24\Lib\site-packages). The files
should just be in the site-packages directory, not in a bin subdirectory.
The OGR libraries are a subset of GDAL, so if you install GDAL you also have OGR.
Documentation
The basic Python documentation can be found at
http://www.python.org/doc/
. Documentation
for GDAL is available at
http://www.gdal.org/
(see the links for API Reference Documentation,
GDAL API Tutorial, etc. under the Developer Oriented Documentation heading). Similar
documents for OGR are available under the Resources heading at
http://www.gdal.org/ogr/
. The
Numeric module is described at
http://numpy.sourceforge.net/numdoc/HTML/numdoc.htm
and
the numdoc.pdf file available in the workshop directory.
Sometimes the best way to get documentation for GDAL and OGR is to just go look at the
Python files themselves. I do this a lot – just open the gdal.py or ogr.py file in your FWTools
pymod or Python site-packages directory (see the Installation section) and look through it. Just
don’t make any changes to it unless you really know what you’re doing! This is helpful because
much of the documentation on the website is written for C++ and is sometimes slightly off for
the Python version.
Objects
Python can be used as an object oriented language. This allows you to do a ton of interesting
things that are way beyond the scope of this workshop. However, since the code we wrote
yesterday and what we’ll be writing today is object-based, it helps to understand what objects
are, at least a little bit. This is not by any means a complete description of objected oriented
programming (OOP) – it barely even scratches the surface of what an object is and ignores
everything about how to actually program that way.
Think of objects as really complicated variables. A programmer can create new classes, which
are basically new definitions for these complicated variables. In other words, a class is
essentially an user-defined variable type, albeit a rather complicated one. An object is a variable
of some type of class. Just like you can have a variable of type integer, you can have a variable
of type Geoprocessor (like you did yesterday). Your Geoprocessor variable is called an object.
Objects can contain properties and methods (both of these are called members). Properties are
pieces of information about the object, and methods are functions and procedures that act on the
object in some way. For example, say we define a class called polygon. There might be a
property called area. This is just a bit of info that the user can access. This particular example
would be read-only (it wouldn’t make sense to allow the user to change the area of the polygon
without actually changing its geometry) but many properties are read/write. There also might be
a method in our polygon class called buffer() that would allow the user to buffer the polygon by
some distance. This is an example of a method that acts on the object in some way – it buffers it.
The user could have a bunch of different instantiations of the polygon class, with each holding a
different polygon.
Another OOP concept that we will run into today is the idea of inheritance. Basically this means
that there can be classes that derive some of their properties and methods from another class. For
example, OGR has a class called Geometry includes lots of subclasses, if you will. Some of
these other classes are Point and Polygon. This means that if you have a Point object, it is also a
Geometry object. There are many advantages to this design but they are beyond the scope of this
workshop. Just be aware that the concept exists so that you don’t get too confused.
In python, the members of objects are accessed through dot notation, meaning the property or
function is preceded by the object name and a dot. For example, if our polygon variable was
called poly1, then we could access the area of the polygon using poly1.area and we could buffer
the same polygon by 5 units using poly1.buffer(5). We could also have another polygon variable
called poly2, and then poly1.area and poly2.area would return different values (unless, of course,
they were both set equal to the same geometry!).
Although we won’t be doing any actual object-oriented programming today (for example, we
won’t be defining our own classes), we’ll be using objects in our code.
Importing Python modules
Before we can use a Python module we need to import it. We can import an entire module and
then qualify anything inside it (functions or constants) with dot notation and the module name,
just like with objects. This is an example of importing the entire gdal and gdalconst modules and
then using the Open function from gdal along with the GA_Update constant from gdalconst:
import gdal, gdalconst
dataset = gdal.Open(filename, gdalconst.GA_Update)
We can also import the entire module so that we don’t have to qualify members with the module
name, like this:
from gdal import *
from gdalconst import *
dataset = Open(filename, GA_Update)
Or we can import a subset of a module like this:
from gdal import Open
from gdalconst import GA_Update
dataset = Open(filename, GA_Update)
I think the first example is the best way to do things in most cases because that way we are
forced to make sure we’re using the correct module (for example, say we had imported another
module that also had a method called Open() – how would Python know which Open() we meant
to use unless we told it?).
However, sometimes it’s nice to mix import methods. In this example, we force ourselves to
qualify any methods in the gdal module, but we don’t have to qualify constants from the
gdalconst module (they all start with GA_ anyway, so are probably unique).
import gdal
from gdalconst import *
dataset = gdal.Open(filename, GA_Update)
We only need to qualify module members with the module name if the member is not dependent
on a preexisting object. The Open() method that we’ve seen here fits that description. However,
the method to get a band from the opened image is dependent on a previously opened dataset.
Because the dataset already exists and was created with the gdal.Open() call, Python already
knows that the dataset is part of the gdal module. So to get a band from the dataset, we would
have to qualify the GetRasterBand() function with the dataset we wanted it to operate on rather
than the gdal module name:
band = dataset.GetRasterBand(1)
Reading vector data with OGR
Now that we know some of the background, let’s learn about using the libraries. We’ll start with
reading the points in a shapefile. The first thing we need to do in order to open a vector dataset
is to get the appropriate ogr Driver object. The driver is simply an object that knows how to
interact with a certain file type. We’ll use the shapefile driver.
driver = ogr.GetDriverByName('ESRI Shapefile')
We can call our new Driver object variable whatever we want, but it is good practice to give it a
name that describes what it is, such as driver. It is possible to open an OGR data source without
getting the driver first, but it’s probably good to get into the habit of using the driver, especially
since it’s required for things like creating data sources.
There are a bunch of other file format drivers available which are described at
http://www.gdal.org/ogr/ogr_formats.html
. Because the names in this list do not always match
the name to use when fetching the driver in Python, we can get a list of available drivers with
this little Python program:
import ogr
cnt = ogr.GetDriverCount()
for i in range(cnt):
driver = ogr.GetDriver(i)
print driver.GetName()
The range function is built into Python. It creates a list of numbers from 0 to cnt-1. There are
actually a lot of other variations of its use and we’ll see some of them later.
Once we have the driver we can open the shapefile. The second argument is optional but it
doesn’t hurt to include it, especially since then we don’t have to remember what the default
behavior is. In this case the default behavior is to open the file for read-only. Pass a 1 in order to
edit the file.
datasource = driver.Open('d:/data/brownbags/python/data/sites.shp', 0)
if datasource is None:
print 'Could not open file'
sys.exit(1)
It’s always a good idea to check and make sure that the file was actually opened. If the file was
opened successfully, Open() will return a DataSource object, otherwise it will return None. Note
that we have to import the sys module before we can use the exit() method. It is convention to
return a 1 if the program is exiting because of an error. If the program exits normally, the
convention is to pass a 0. This is not just Python convention – it is widespread, and although I
don’t know about Windows, I do know of UNIX programs that rely on this standard.
Don’t get hung up on slash problems in filenames! I like to use forward slashes, but if that’s too
UNIX-like and you insist on using the Windows backslashes, use two of them, like
'd:\\data\\brownbags\\python\\data\\sites.shp'. This is because backslashes are used as escape
characters so that non-printing characters can be inserted into strings (for example, '\n' is a
newline character). Using two of them basically escapes the escape character so that Python
knows to treat it as a regular character. FYI, this is not unique to Python, either.
Now that we have our DataSource object, we need to get the data layer. Most of us are probably
not used to thinking of vector data this way, but think of it sort of like a raster image that can
have multiple bands. Some types of OGR data sources can have multiple layers, even though
shapefiles can’t. The GetLayer() function takes an index of the layer we want returned, with the
first layer indexed as 0. Because we are using a shapefile, there is only one layer and so the
index is 0. The GetLayer() function returns an object of type Layer.
layer = datasource.GetLayer(0)
Now that we have our layer, we can loop through its features with syntax like this:
feature = layer.GetNextFeature()
while feature:
# do something here
feature = layer.GetNextFeature()
The GetNextFeature() function returns the next Feature object in the Layer, starting with the first
one. If there are no more features then it returns None, so this loop executes until the last
GetNextFeature() call returns None and the while loop exits. We can reset it so the
GetNextFeature() function will return the first feature in the layer again with the ResetReading()
method.
For example, we could count the number of features in the layer (if for some reason we didn’t
want to use the GetFeatureCount() function!) like this:
cnt = 0
feature = layer.GetNextFeature()
while feature:
cnt = cnt + 1
feature = layer.GetNextFeature()
print cnt
We can get the geometry for a Feature object with the GetGeometryRef() method. This returns a
Geometry object, although it could be one of several different subtypes of Geometry (Point,
Polygon, etc). Point geometries allow us to get their coordinates with the GetX() and GetY()
functions.
geometry = feature.GetGeometryRef()
x = geometry.GetX()
y = geometry.GetY()
We can get the attributes for a Feature object with GetField(), GetFieldAsString(),
GetFieldAsInteger(), and GetFieldAsDouble(). All of these return the value of the field, but they
can be used to force the returned value to a specific type. For example, if we had numeric ID
values but wanted them returned in a string format so that we could print them out easily, we
might use GetFieldAsString(). To simply return an attribute value in its native type, we would
use something like this:
id = feature.GetField('id')
We need to make sure that we close a data source when we’re done with it. This is done by
calling the Destroy() method on the DataSource object:
datasource.Destroy()
It also doesn’t hurt to do this when done with a Feature object (for memory management
purposes). The above example to count features, in its entirety, would look like this:
# import modules
import ogr, sys

# open the data source
driver = ogr.GetDriverByName('ESRI Shapefile')
datasource = driver.Open('d:/data/brownbags/python/data/sites.shp', 0)
if datasource is None:
print 'Could not open file'
sys.exit(1)

# get the data layer
layer = datasource.GetLayer(0)

# loop through the features and count them
cnt = 0
feature = layer.GetNextFeature()
while feature:
cnt = cnt + 1
feature.Destroy()
feature = layer.GetNextFeature()
print cnt

# close the data source
datasource.Destroy()
Exercise 1: Reading coordinates and attributes from a shapefile
Use what you’ve learned to open the sites.shp point shapefile, loop through all of the points and
print out the id, x and y coordinates, and cover type for each point.
Hint: You can print more than one item on a line by separating them with commas, like this:
print id, x, y, cover
Writing data with OGR
Next we’ll learn how to add a new point to an existing shapefile. Before we can write the point
to a shapefile, we need to create a new point feature to hold the point geometry. The first step is
creating this geometry. This example creates a new object called point that is of type Geometry.
Moreover, we specify that it is the Point subtype of Geometry. The Geometry is empty until we
add coordinates to it on the second line.
point = ogr.Geometry(ogr.wkbPoint)
point.SetPoint(0,10,10)
Notice that we don’t depend on any existing objects to create a new Point Geometry on the first
line. We can have a point that isn’t associated with anything else. The first argument to the
SetPoint method is the index of the point that we’re setting X and Y for. Since our geometry is a
single point, this will always be 0. It might be different if we were using multipoints. The
second argument is the X coordinate, and the third one is Y. We can also add a fourth Z
parameter if we need to (it defaults to 0). This example would add a new point at the coordinates
(10, 10).
Now that we have our point, we need to create a Feature object for it to go in. In order to create
a new feature, ogr needs to know what type of feature it is and what attributes it will have. We
provide this information with a FeatureDefn object. We could create this object ourselves, but
because we plan on adding our feature to an existing shapefile (which can only have one type of
feature), we might as well just use the FeatureDefn that describes the features in the existing file.
featureDefn = layer.GetLayerDefn()
feature = ogr.Feature(featureDefn)
The first line gets the FeatureDefn object that applies to the entire layer (there is also a
GetFeatureDefn() function that works on a feature – in this case they would both return the same
thing). The second line creates a new Feature object using this FeatureDefn.
Now that we have a new Feature object, we can add stuff to it. The most important part of the
feature is the geometry, so let’s add our Geometry object:
feature.SetGeometry(point)
We also need attributes. This would set the id field to a value of 50:
feature.SetField('id', 50)
Once all geometries and attributes are added to the feature, it needs to be added to the layer. We
use the CreateFeature() method on the Layer object in order to do this:
layer.CreateFeature(feature)
Exercise 2: Writing a new point to a shapefile
Add a new point with an id value of 50 and cover type of water to the sites shapefile at the
coordinates (472000, 4646000). This point is in the middle of Bear Lake so it’s easy to use
ArcMap to check if it worked.
Hint: Remember some of the things we learned in the section about reading shapefiles,
especially about how to open the shapefile for updating instead of read-only. You will access the
data layer the same way you did when reading data.
Reading raster data with GDAL
It is not hard to simply read raster data using GDAL. The first difference between using GDAL
and OGR is that we need to import two modules in order to work with raster. The gdal module
does the hard work, but the gdalconst module includes the constants that are required for some
things. Because the constants are named with the GA_ prefix they are probably unique so we
can import that module so that we don’t have to qualify the constants with the module name.
import gdal
from gdalconst import *
Just like with vector, we need to get a driver before we open a file. The gdal module does not
automatically load all of its drivers, however, so we also need to register a driver before trying to
use it. Basically, this just forces the module to make the driver useable. The supported image
formats are described at
http://www.gdal.org/formats_list.html
and there is an associated driver
for each one. To register a driver, we get the Driver object by using the GetDriverByName()
function and then calling the Register() function on the driver. The GetDriverByName() method
requires a driver name, which is available from the website listed above. Use the short driver
name (code) from the table on the webpage. The short name for Erdas Imagine files is HFA (for
Hierarchical File Architecture).
driver = gdal.GetDriverByName('HFA')
driver.Register()
We can also register all drivers at once with the AllRegister() method, but we still need to grab a
specific driver in order to create a new raster data set.
gdal.AllRegister()
Once the driver is registered we can open a raster image using the Open() method. This requires
a filename and a flag telling gdal how to open the file. This value is either GA_ReadOnly or
GA_Update, depending on if we need to edit the file. The Open() method returns a Dataset
object if the file was successfully opened and None if not.
dataset = gdal.Open('d:/data/brownbags/python/data/aster.img', GA_ReadOnly)
if dataset is None:
print 'Could not open file'
sys.exit(1)
Once an image is opened, we can get its dimensions like this:
cols = dataset.RasterXSize
rows = dataset.RasterYSize
bands = dataset.RasterCount
Notice that the various dimensions are properties of the Dataset object. We can access them
directly rather than using a method to get them (notice they don’t use parentheses – that’s a good
sign that something is a property instead of a method).
The GDAL library uses something called a GeoTransform to define the georeferencing of an
image. In Python, this is implemented as a list of values that we can access with the
GetGeoTransform() method. From the GDAL documentation:
adfGeoTransform[0] /* top left x */
adfGeoTransform[1] /* w-e pixel resolution */
adfGeoTransform[2] /* rotation, 0 if image is "north up" */
adfGeoTransform[3] /* top left y */
adfGeoTransform[4] /* rotation, 0 if image is "north up" */
adfGeoTransform[5] /* n-s pixel resolution */
Python lists are indexed so that the first item in the list has index 0, so the first item in the
GeoTransform list is the longitude of the top left corner of the image. We can get the top left
coordinates using the first and fourth entries (indices 0 and 3), and we can get the pixel size from
the second and sixth entries (these are often the same number). The rotation parameters are more
complicated and not used as often, so we will ignore them in this workshop.
One important thing to understand is that the coordinates provided with the GeoTransform are
for the upper left corner of the upper left pixel. This is different from Erdas Imagine, which
provides the coordinates for the center of the upper left pixel.
Using the above information, we could get the origin
and pixel sizes like this:
geotransform = dataset.GetGeoTransform()
originX = geotransform[0]
originY = geotransform[3]
pixelWidth = geotransform[1]
pixelHeight = geotransform[5]
We can use this information and some simple algebra
to find the row and column offsets that match up with
a set of coordinates x, y (Figure 1).
xOffset = int((x – originX) / pixelWidth)
yOffset = int((y – originY) / pixelHeight)
Now that we have these offsets we might want to
know the pixel value at that location. First we have to
get the band(s). This is done by passing the index of
the desired band to the GetRasterBand() method on
the Dataset object. So we can get the first band in the image like this (note that the indices for
this method start with 1):
band = dataset.GetRasterBand(1)
Now we have a Band object but we still need to get the actual data out of it. There are a few
ways to do this, but the simplest way is to use the ReadAsArray() method. We’ll start with just
reading the one pixel we’re interested in.
data = band.ReadAsArray(xOffset, yOffset, 1, 1)
The first two parameters are the offsets to start reading at, and the last two parameters are the
numbers of pixels to read in the X and Y directions. Since we only want one pixel, we used
values of 1. The return value is actually a two-dimensional Numeric array, although in this
pixelHeight
pixelWidth

(originX, originY)
x

y

Figure 1. Looking at the figure, we see:

(x – originX) / pixelWidth ~= 3.25
(y – originY) / pixelHeight ~= 2.5

Take the int values and we get (3,2)
, which
is correct. Remember that the upper left
pixel has an offset of (0,0).

example the array only contains one value. Once again, the indices for the array start at 0, so in
order to get the one and only value in the array we request the first item in either direction:
value = data[0,0]
Set a dataset to None in order to close it once done with it. The Dataset class is designed to close
the image once this happens.
dataset = None
Exercise 3: Reading pixel values from an image
Read and print the pixel values for all three bands from aster.img at the coordinates (472000,
4646000). Your results should be something like this:
band 1: 53
band 2: 24
band 3: 15
Reading large amounts of raster data efficiently with GDAL
We saw in the last section how to read a single pixel. That’s great if all we need is the
occasional pixel value, but it’s not very efficient if we want to do some sort of processing on the
entire image. Reading and writing is expensive and reading one pixel at a time is about as
inefficient as we could get. For example, I tested this with a 100x100 pixel image on the PC in
my office. It took less than 0.01 seconds to read the entire image in at once, but it took it 0.29
seconds to read the same image in pixel by pixel. I then tried running the same script on
aster.img, but got tired of waiting for it to run and killed it (it took less than 4.5 seconds to read it
in all at once, but was taking several minutes to do it pixel by pixel).
To read an entire image in at once we use the same ReadAsArray() function that we used before,
but we pass different parameters:
data = band.ReadAsArray(0, 0, cols, rows)
This time our offsets are both 0 so that the read operation will start at the origin, and we set the
number of pixels to read in the X direction equal to the number of columns, and the number of
pixels to read in the Y direction equal to the number of rows. This will get us the entire image,
and it’s really simple to do. At this point, we could get the data for a specific offset like this:
value = data[yOffset, xOffset]
Notice that the order of the X and Y offsets are reversed! In a two-dimensional Numeric array,
the row offset (Y) is provided first. This is because it uses the mathematical matrix notation of
[row, column].
So reading the entire image at once is fairly efficient, and certainly easy, but if we have a large
image we might not be able to fit the entire thing into RAM. Or even if the image will fit, we
might not have enough left to do any sort of manipulation (I’ve run into this particular problem
more than once.). How do we read data in as fast as possible while conserving resources such as
RAM? Well, some of you may have noticed a “blocksize” value in the Image Information
window in Imagine. Now we finally have a use for that information. Block size has to do with
the way the data values are stored on disk. The most efficient way to access data is by blocks.
In many cases (like non-tiled GeoTIFFs) this is simply a row. However, the default block size in
Imagine is 64x64 pixels, which is a little harder to deal with, but it still doesn’t have to be that
bad.
Let’s look at the case of reading in data one row at a time, from an image such as a GeoTIFF:
for i in range(rows):
data = band.ReadAsArray(0, i, cols, 1)
# do something with the data here, before reading the next row
Each iteration of this loop sets i to a new value, which is then used for the Y offset for reading
the data. The X offset is always 0, meaning we start at the beginning of the row. In the X
direction, we always read in the number of columns in the image, which is equal to the number
of pixels in the row. We only read in one pixel in the Y direction, which means we only get one
row. This is probably the most efficient way to deal with untiled GeoTIFFs but will not be very
fast for a standard Imagine image.
Now let’s look at reading a tiled image, like an Imagine image.
This method is not the best, but it’s simpler than the one we’ll
look at next. We’ll use a block size of 5 for the example so it
corresponds to Figure 2. We’re still going to read in whole rows,
but we’ll read in an entire block in the Y direction. Using Figure
2 as an example, we’ll be reading in all of the data contained in
group A (Aa, Ab, and Ac) at once. Then we’ll read in all of the
data for group B, and lastly the data for group C. The code would
look like this:
for i in range(0, rows, 5):
if i + 5 < rows:
size = 5
else:
size = rows - i
data = band.ReadAsArray(0, i, cols, size)
# do something with the data here, before reading the next set of blocks
This example has some new syntax for the range() function. This creates an array of values from
0 to rows, but incrementing by our blocksize of 5. So if rows = 13, this call to range() would
return an array of [0, 5, 10]. The next value in the sequence would be 15, but that is larger than
13 so it isn’t included. The first thing we do inside the main loop is to check and see if we really
can read in 5 rows. If we can’t (when i=10, because 10+5=15 > 13), then we set our chunk size
to the number of rows we really can read (3 when i=10, for group C). This is necessary because
if we ask ReadAsArray() to read in rows that don’t exist in the image it will throw an error.
In terms of our real image, with a block size of 64, the code would look like this:
Figure 2. An image with 13
rows, 11 columns, and a blocksize
of 5.

a b c
A
B
C
for i in range(0, rows, 64):
if i + 64 < rows:
size = 64
else:
size = rows - i
data = band.ReadAsArray(0, i, cols, size)
# do something with the data here, before reading the next set of blocks
The best way to write the code, however, in case we ever need to change the block size, would
be more like this:
blockSize = 64
for i in range(0, rows, blockSize):
if i + blockSize < rows:
size = blockSize
else:
size = rows - i
data = band.ReadAsArray(0, i, cols, size)
# do something with the data here, before reading the next set of blocks
This method is a reasonable compromise that will work with tiled (as long as the block size is
64) or untiled images. The fastest way to read the data is to read in just one block at a time
instead of a group of blocks, however. We did this when we read in just one row at a time for a
GeoTiff. It is a little more complicated with an Imagine image, though. Referring back to
Figure 2, this would mean reading in block Aa, Ab, Ac, Ba, Bb, etc., all the way up to Cc. Each
one would be read in individually. This is how it’s done:
blockSize = 64
for i in range(0, rows, blockSize):
if i + blockSize < rows:
numRows = blockSize
else:
numRows = rows – i
for j in range(0, cols, blockSize):
if j + blockSize < cols:
numCols = blockSize
else:
numCols = cols – j
data = band.ReadAsArray(j, i, numCols, numRows)
# do something with the data here, before reading the next block
Out of curiosity, I did Exercise 4 (which computes an NDVI for aster.img) each of these three
different ways, and here are the timed results when running on my PC:
Entire image elapsed time: 31.5940001011 seconds
Row of blocks elapsed time: 30.9220001698 seconds
Single block elapsed time: 29.2650001049 seconds
Manipulating raster data with Numerical Python
If we’re going to go to the trouble to read in large amounts of data from a raster image, that
means we probably want to do something with it. Say we want to compute an NDVI on
aster.img. The formula for this is (NIR-RED)/(NIR+RED), where NIR is band 3 and RED is
band 2. We’ll assume that we have read the data from band 3 into an array called data3, and the
data from band 2 into data2. We could then compute the NDVI like this:
ndvi = (data3 – data2) / (data3 + data2)
There is a problem with this, however. The image uses 0 as the NODATA value, so for parts of
the image where there is no data, we get a zero division error (data3 + data2 = 0).
This is a good time to learn about the Numeric greater() and choose() methods. Take a look at
this example code:
>>> values = Numeric.array([100,500]) 1
>>> print values 2
[100 500] 3
>>> input = Numeric.array([0,3,5,0,2]) 4
>>> print input 5
[0 3 5 0 2] 6
>>> mask = Numeric.greater(input, 0) 7
>>> print mask 8
[0 1 1 0 1] 9
>>> output = Numeric.choose(mask, values) 10
>>> print output 11
[100 500 500 100 500]12
Lines 1 and 4 show how to create a one-dimensional Numeric array from scratch. Line 7 uses
the greater() function to create a mask that is filled with 0’s and 1’s. The greater() function
checks to see if the first argument is greater than the second. If so, the output is 1. Otherwise,
the output is 0. Because one of the arguments (input) is an array, the output is also an array.
Each value of the output is the function’s results when run on the corresponding value in the
input array.
Line 10 uses the mask created on line 7 to figure out what to do in various cases. The choose()
method uses the values of the first argument as indices to the second argument. Since our mask
array is only made up of 0 and 1, then the second argument to choose() only needs two values in
it (which it does – 100 and 500). Everywhere mask = 0, the first item in the values list (which
has index 0) is put into the output array. Everywhere mask = 1, the second item in the values list
(which has index 1) is put into the output array. This can be extended to more indices than just 0
and 1.
So now let’s apply this to the NDVI problem:
mask = Numeric.greater(data3 + data2, 0)
ndvi = Numeric.choose(mask, (-99, (data3 - data2) / (data3 + data2)))
The first line creates a mask image that is filled with 1 where data3 + data2 > 0, and 0
everywhere else. The second line uses this mask to set the output value to -99 everywhere the
mask is 0, but does the math on the remaining pixels. This still doesn’t work, however, for
reasons that I don’t understand. Through trial and error, I’ve discovered that the data have to be
cast to floating point or we still get the zero division error. This is easy to fix when reading in
the data, although it takes up more memory so there is no reason to do it if it’s not needed. We
can use the astype function to convert the values of an array to whatever type we need.
data2 = band2.ReadAsArray(0, 0, cols, rows).astype(Numeric.Float16)
data3 = band3.ReadAsArray(0, 0, cols, rows).astype(Numeric.Float16)

mask = Numeric.greater(data3 + data2, 0)
ndvi = Numeric.choose(mask, (-99, (data3 - data2) / (data3 + data2)))
Actually, we want our NDVI output to be floating point, anyway, so this is convenient. If we
had kept using integers then all of our output would have been truncated to integers, so we’d
have an entire image full of zeros!
Writing raster data with GDAL
It doesn’t make sense to compute an NDVI without writing it out to a file so we can do
something with it later. The first thing we need to do is create an empty image, which means we
need the appropriate driver. If we registered a specific driver in order to open our image instead
of using the AllRegister() method, we can just reuse the same driver (assuming we want to write
the image to the same file format). If we want to write our data to a different file type we can
fetch the driver we want the same way we did before, using the GetDriverByName() method.
Lastly, if we used the AllRegister() method and want to write to the same file type that we read
from, then we can grab the driver from the Dataset object that we opened for reading:
driver = inDataset.GetDriver()
Now that we have the driver, we can use it to create a new Dataset object with the same
dimensions as the input image:
outDataset = driver.Create(filename, cols, rows, 1, GDT_Float32)
The Create() method requires us to pass in the filename for the new image, the numbers of
columns, rows and bands, and a constant specifying the data type. Assuming that the cols and
rows variables contain the numbers of columns and rows in the input image, this example will
create an empty 1-band image with the same numbers of columns and rows. The pixel values
will be 32-bit floating point, which is sufficient for NDVI values. The Create() method allocates
the space on disk immediately. This means that if we use the Create() method but never add data
to the file, the file will still be just as large as it would have been if we had added our data.
We probably want to georeference the new image, so we can just copy the georeferencing and
projection information from the input file:
outDataset.SetGeoTransform(inDataset.GetGeoTransform())
outDataset.SetProjection(inDataset.GetProjection())
We also need to get a handle to the output band before we can put data in it:
outBand = outDataset.GetRasterBand(1)
Now we can write our NDVI array into the band, using the WriteArray() method:
outBand.WriteArray(ndvi, 0, 0)
The first parameter is the array of data to write into the band, the second is the X offset to start
writing at, and the third is the Y offset to start writing at. If we read in the entire image at once
so that our array contains the NDVI values for the entire image, then this is all we have to do to
write the data as shown here. However, if we read in 64 rows at a time, in order to take some
advantage of block size, we would have to change the offsets in the WriteArray() method to
match the current offsets being used for reading the input. Something like this:
blockSize = 64
for i in range(0, rows, blockSize):
if i + blockSize < rows:
size = blockSize
else:
size = rows - i
data = band.ReadAsArray(0, i, cols, size)
# do calculations here to create outData array
outBand.WriteArray(outData, 0, i)
Notice that we’re using the same Y offset (i) for both reading and writing the data.
We need to remember to close the output dataset when done writing to it by setting it equal to
None.
Exercise 4: Creating an NDVI image
Create an NDVI image from the aster.img file. There is probably plenty of RAM on these
machines so that you can read in the bands all at once, but you can try to break it up if you want
a challenge. It’ll run faster that way, too!
To look at your NDVI image in ArcMap you’ll have to tell ArcMap to calculate the statistics
while ignoring the value you used for NODATA (-99 in our examples). To do this, use the Data
Management Tools | Raster | Calculate Statistics tool in ArcToolbox. Choose your NDVI image
for the Input Raster Dataset, leave the numbers of rows and columns to skip at 1, put your
NODATA value in the Ignore Values field and hit the plus button so it shows up in the list at the
bottom of the dialog. Then hit OK. Now you should be able to display your NDVI image
correctly.
Passing arguments to Python scripts
In most cases a script is most useful if you can pass arguments into it instead of hardcoding them
into the script. An example of this would be passing input and output filenames into a script.
This is done with the argv list that is part of the sys module. When a script is invoked this list is
automatically filled with the passed values. The first value in the list is always the name of the
script, so there is always at least one value in this list. The first thing we might want to do if
users are passing in parameters is check to see if they provided the correct number of arguments:
import sys
if len(sys.argv) != 3:
print 'Usage:\npython ndvi.py <infile> <outfile>'
sys.exit(0)
The len() function just returns the number of items in the list. If the user didn’t pass in two
arguments (remember, the script name is always present, so there should be 3 items in the
sys.argv list), then this bit of code will print a message showing the syntax necessary to invoke
the script and then it will exit. Notice the '\n' in the middle of the string. This will print a
newline.
Now we can get the passed parameters and store their contents in variables. While we’re at it,
we might want to check that the user passed in strings since no other data type will work as a
filename.
inFilename = sys.argv[1]
outFilename = sys.argv[2]
if not (isinstance(inFilename, str) and isinstance(outFilename, str)):
print 'Usage:\npython ndvi.py <infile> <outfile>'
sys.exit(0)
The isinstance() function checks to see if a variable is of a certain type. Here we’re checking
that the filenames are of type string (str). This example just prints the syntax message again, but
we might want to print something more explicit that informs the user that filenames are usually
strings.
We might also want to make sure the output file doesn’t exist so that it isn’t accidentally written
over. We could also check that the input file exists, or we could just wait and see if the file can
be opened later in the script. If it doesn’t exist, it obviously can’t be opened. Here is how we
would check that the input does exist and the output does not already exist:
import os.path
if not os.path.exists(inFilename):
print inFilename + ' does not exist'
sys.exit(0)
if os.path.exists(outFilename):
if raw_input('\n' + outFilename + ' exists. Overwrite (y/n)? ') != 'y’:
sys.exit(0)
This code snippet contains several things we haven’t seen before. The first is the os.path.exists()
function. This returns True if the file exists and False otherwise. We need to import the os.path
module in order to use it.
The second new concept in this example is the built-in function raw_input(). This takes a string
that is displayed to the user and it returns whatever the user inputs before hitting the Enter key.
Here we’re checking to see if the user entered a 'y' to overwrite the file, and if not the script exits.
Exercise 5: Pass filenames to the NDVI script
Alter your script from Exercise 4 so that you pass in the input and output filenames instead of
hardcoding them into the script.
Creating a simple module
What if we want to use our NDVI code elsewhere? All we need to do is create a function and
put it in a module that we can import into other scripts. It’s easy to convert our script to a
function by putting it inside of a def wrapper that includes the parameters to be passed and
removing references to sys.argv. The sys.argv list is no longer needed because the parameters
are passed in through the function call.
def ndvi(inFilename, outFilename):
# put all processing code here, but remove references to sys.argv
If the function needs to return a value use the return statement. Here is a simple function to add
two numbers that demonstrates this concept:
def add(n1, n2):
return n1 + n2
Keep in mind that we still need to import any needed modules (such as gdal) at the beginning of
the module. We can put as many functions in a module as we want. Once we have created our
own module, we can import it into other scripts and use its functions.
import my_mod
my_mod.ndvi('d:/data/images/image1.img', 'd:/data/images/output/image1.img')
If the module is in the same directory that we are running Python from it will be found.
Otherwise we need to put it in Python’s search path (see the Python documentation for more
information about this). The name of the file becomes the module name, so in this example the
filename of my new module is my_mod.py, and this file contains a function called ndvi that
takes an input and output filenames as its parameters.
Batching
Batching a process is really easy. There is the obvious way of creating a batch file that calls a
python script multiple times, passing different parameters each time. This is just a text file with
a .bat extension (on Windows, anyway). The batch file would look something like this:
python ndvi.py image1.img output/image1.img
python ndvi.py image2.img output/image2.img
python ndvi.py image3.img output/image3.img
.
.
.
There is an easier way to do it, though, that doesn’t require using a batch file. We can use the
Python glob module instead.
import glob, os.path, os, my_mod
filenames = glob.glob('d:/data/images/*.img')
for inFilename in filenames:
outFilename = 'd:/data/images/output/' + os.path.basename(inFilename)
# put code to compute NDVI here
# or
# os.system('python ndvi.py ' + inFilename + ' ' + outFilename)
# or
# my_mod.ndvi(inFilename, outFilename)
The glob function returns a list of filenames that match the pattern provided. In this example, it
will return all *.img files in the d:/data/images directory. We could get a list of all files in the
directory, no matter the type, by passing in 'd:/data/images/*'.
Once we get a list of files, we loop through them using a for loop. The first thing we do inside
the loop is create an output filename. The glob() function includes the path as part of the
filenames, so we extract out just the filename using the os.path.basename() function and we add
our own output path onto the beginning of that base filename. This way we keep the same
filename, but save it in a different directory.
After we get the output filename we compute the NDVI. Notice there are three different ways of
doing that shown here. The first is to simply include all of the processing code inside the for
loop. Personally, I would rather have the code that did the hard work elsewhere so that I could
reuse it in something else if I needed to. The second example, using os.system, simply makes a
system call and runs the ndvi.py script essentially the same way that the above batch file does –
with the advantage that no batch file is required. The third way shown here assumes that the
NDVI code has been turned into a function called ndvi and put in the module called my_mod.
This is the slickest way to do things.
Exercise 6: Create a module and use it in a batch process
Create a module containing a function to print file size (we’re giving up on the NDVI example
for now in the interest of processing time). Then create a script that calls this function on every
file in the workshop data directory.
Hint: You can get the size of a file, in bytes, like this:
import os, stat
bytes = os.stat(filename)[stat.ST_SIZE]
Divide this by 1,048,576 to get the size in megabytes. The easiest way to get a floating point
answer, as opposed to an integer, is to use 1048576.0 instead of 1048576 when doing the
division.
Appendix 1: Summary of OGR methods and constants
This is only part of what is available through OGR. I have included the constants and methods
that we used in the workshop along with a few others. See the OGR website (
www.gdal.org/ogr
)
and the ogr.py file in your FWTools pymod directory for a full listing.
Constants
Geometry types
wkbUnknown wkbPoint wkbLineString
wkbPolygon wkbMultiPoint wkbMultiLineString
wkbMultiPolygon wkbGeometryCollection wkbNone
wkbLinearRing
Field types
OFTInteger OFTIntegerList OFTReal
OFTRealList OFTString OFTStringList
OFTWideString OFTWideStringList OFTBinary
OFTDate OFTTime OFTDateTime
Stand-alone methods
Feature(feature_defn): Returns a new Feature object created using the FeatureDefn object
passed in. The FeatureDefn object defines the attributes fields and geometry type.
Geometry(type): Returns a new Geometry object of the type specified by the type constant (see
above for possible values). There are multiple other ways to create Geometry objects using this
constructor, but this is the only one covered in the workshop.
GetDriverByName(name): Returns a Driver object. Use this code to get a list of available
driver names:
import ogr
cnt = ogr.GetDriverCount()
for i in range(cnt):
driver = ogr.GetDriver(i)
print driver.GetName()
Open(filename, update): Opens the file specified by filename and returns a DataSource object.
The update parameter defaults to 0 for no updates – pass 1 in order to allow editing.
Driver methods
Open(filename, update): Opens the file specified by filename and returns a DataSource object.
The update parameter defaults to 0 for no updates – pass 1 in order to allow editing. This
requires that the file being opened is of the type that can be opened by the given Driver object.
DataSource methods
Destroy(): Closes the DataSource.
GetLayer(layer_index): Returns the layer (as a Layer object) at the given index in the
DataSource. For shapefiles, the index will always be 0.
Layer methods
CreateFeature(feature): Adds the feature Feature object to the layer.
GetExtent(): Returns the spatial extent of the layer as [minx, maxx, miny, maxy].
GetFeatureCount(): Returns the number of features in the layer.
GetLayerDefn(): Returns the FeatureDefn object that defines the attribute fields and geometry
type for the layer.
GetNextFeature(): Gets the next feature in the layer (starting with the first one) and returns it
as a Feature object. If there are no remaining features, returns None.
ResetReading(): Sets the reader back to the start of the set of features.
SetAttributeFilter(where_clause): Sets a SQL WHERE clause that will limit which features
are returned with GetNextFeature(), based on attribute values. For example, “population >
100000” will limit GetNextFeature() so it only returns features that have a population attribute
greater than 100000.
SetSpatialFilter(geom): Sets the geom Geometry object as a spatial filter on this layer. If a
spatial filter is set then GetNextFeature() will only return features that intersect the Geometry
being used as the filter.
Feature methods
GetField(field_index), GetField(field_name): Returns the value of the attribute field specified
with the passed index or name.
GetFieldAsString(field_index), GetFieldAsString(field_name): Returns the value of the
attribute field specified with the passed index or name, but the value is coerced into a string.
GetFieldAsInteger(field_index), GetFieldAsInteger(field_name): Returns the value of the
attribute field specified with the passed index or name, but the value is coerced into an integer if
possible. Will return 0 if the value cannot be converted to an integer.
GetFieldAsDouble(field_index), GetFieldAsDouble(field_name): Returns the value of the
attribute field specified with the passed index or name, but the value is coerced into a double if
possible. Will return 0 if the value cannot be converted to an integer.
GetGeometryRef(): Returns a Geometry object that is the actual geometry (point, line,
polygon, etc.) associated with the feature.
SetField(field_index, value), SetField(field_name, value): Sets the attribute specified with
either index or name to the passed value.
SetGeometry(geom): Associates the geom Geometry object with the Feature.
Geometry methods
Buffer(distance): Returns a Polygon Geometry that is this Geometry buffered by distance.
Centroid(): Returns a Point Geometry that contains the centroid of this Polygon Geometry.
Contains(other_geom): Returns True if other_geom contains the Geometry.
Difference(other_geom): Returns the difference between the Geometry and other_geom.
Distance(other_geom): Returns the distance between the Geometry and other_geom.
Equal(other_geom): Returns True if the Geometry and other_geom are equal.
GetArea(): Returns the area of the Geometry.
GetX(), GetY(): Return the x and y coordinates of a geometry. For Points only, or you can pass
in an index to get vertices for a LinearRing.
Intersect(other_geom): Returns True if the Geometry and other_geom intersect.
Intersection(other_geom): Returns the intersection between the Geometry and other_geom.
SetPoint(index, x, y, z): Sets the x and y coordinates for a geometry. The index is always 0 for
a Point. The z parameter will default to 0 if not provided.
Union(other_geom): Returns the union between the Geometry and other_geom.

Appendix 2: Summary of GDAL methods and constants
This is only part of what is available through GDAL. I have included the constants and methods
that we used in the workshop along with a few others. See the GDAL website (
www.gdal.org
)
and the gdal.py and gdalconst.py files in your FWTools pymod directory for a full listing.
Constants
Data types
GDT_Unknown GDT_Byte GDT_UInt16
GDT_Int16 GDT_UInt32 GDT_Int32
GDT_Float32 GDT_Float64 GDT_CInt16
GDT_CInt32 GDT_CFloat32 GDT_CFloat64
GDT_TypeCount
Access types
GA_ReadOnly GA_Update
Resample types
GRA_NearestNeighbour GRA_Bilinear GRA_Cubic
GRA_CubicSpline
Stand-alone methods
AllRegister() : Registers all drivers.
GetDriverByName(name): Returns a Driver object. Get a list of names from
www.gdal.org/formats_list.html
(Code column).
GetDriverList(): Returns a list of available drivers (as Driver objects, not just their names).
Open(filename, access): Opens the file specified by filename and returns a Dataset object. The
access parameter defaults to GA_ReadOnly, but GA_Update can be used to edit the file.
CreateAndReprojectImage(src_ds, dst_filename, src_wkt , dst_wkt, dst_driver,
create_options, eResampleAlg, warp_memory, maxerror): Reprojects an image. The src_ds
is a Dataset object. A new image will be created with the given filename using the Driver object
provided (so for example, provide the HFA driver to create an Erdas Imagine file) and the list of
creation options provided (see formats and their options at
www.gdal.org/formats_list.html
).
The default resample algorithm is GRA_NearestNeighbour. Source and destination projections
are defined with Well Known Text (see
http://www.gdal.org/ogr/osr_tutorial.html
for more
information). Warp_memory and maxerror can safely be left out if you’re not sure what to do
about them.
Driver methods
Create(filename, xsize, ysize, bands, datatype, options): Creates and returns a new Dataset
with the given filename and x & y sizes. The number of bands defaults to 1, datatype defaults to
GDT_Byte, and options is a list of strings specifying creation options (see
www.gdal.org/formats_list.html
).
Delete(filename): Deletes a data set.
Register(): Registers the Driver object.
Dataset properties
RasterXSize: Number of columns.
RasterYSize: Number of rows.
RasterCount: Number of bands.
Dataset methods
GetRasterBand(i): Returns a Band object that is the band at index i in the Dataset.
GetGeoTransform(): Returns a list of GeoTransform information for the Dataset. These are
the values in the list:
adfGeoTransform[0] /* top left x */
adfGeoTransform[1] /* w-e pixel resolution */
adfGeoTransform[2] /* rotation, 0 if image is "north up" */
adfGeoTransform[3] /* top left y */
adfGeoTransform[4] /* rotation, 0 if image is "north up" */
adfGeoTransform[5] /* n-s pixel resolution */
GetProjection(): Gets the WKT projection string for this Dataset.
SetProjection(projection): Associates a projection with the Dataset. The projection parameter
is a string in either WKT or PROJ.4 format.
SetGeoTransform(transform): Takes a list such as that returned by GetGeoTransform() and
associates it with the Dataset.
Band properties
DataType: Data type constant from above.
XSize: Number of columns.
YSize Number of rows.
Band methods
ComputeRasterMinMax(): Returns a list with the minimum and maximum values for the
Band.
Fill(value): Fills all pixels in the Band with a constant value.
GetNoDataValue(): Gets the NODATA value for the Band (I have noticed that this doesn’t
work for Imagine Images).
GetMinimum(): Returns the minimum value in the Band.
GetMaximum(): Returns the maximum value in the Band.
GetStatistics(): Returns a list with minimum, maximum, mean, and standard deviation values
for the Band.
ReadAsArray(xoff, yoff, win_xsize, win_ysize, buf_xsize, buf_ysize, buf_obj): Reads data
from the Band object into a Numeric array. The xoff and yoff parameters define what x and y
offsets to start reading at and both default to 0 (the upper left corner, usually). The win_xsize
and win_ysize parameters define how many columns and rows to read, respectively. You can
safely ignore the other parameters unless you know what you’re doing.
WriteArray(array, xoff, yoff): Writes a Numeric array to a Band object. The xoff and yoff
parameters define where in the band to start writing the data; both default to 0.

Appendix 3: Summary of Numeric methods and constants
This is only a small sampling of what is available in the Numeric module. See
http://numpy.sourceforge.net/numdoc/HTML/numdoc.htm
(HTML) or
http://numpy.sourceforge.net/numdoc/numdoc.pdf
(115-page PDF) for much more
documentation.
Typecodes
Typecodes can be used with the astype() function. The number of bytes each takes up is
machine dependent – it is possible that data declared as one type will really take up more space.
See
http://numpy.sourceforge.net/numdoc/HTML/numdoc.htm#pgfId-35852
for more
information.
Int0, Int8, Int16, Int32, Int
Float0, Float16, Float32, Float8, Float, Float64
Complex0, Complex16, Complex32, Complex8, Complex, Complex64
Creating arrays from scratch
a = Numeric.array([2, 3.5, 5])
print a
[ 2. 3.5 5. ]

b = Numeric.array([2, 3.5, 5], Numeric.Int)
print b
[2 3 5]

c = Numeric.array([[2, 3.5, 5], [6, 10, 11.3], [1, 9, 4], [5.2, 7.1, 3.8]])
print c
[[ 2. 3.5 5. ]
[ 6. 10. 11.3]
[ 1. 9. 4. ]
[ 5.2 7.1 3.8]]
The first example creates a one-dimensional array. The Numeric module determines which data
type best fits the data, and in this case it would choose floating point (because of the 3.5 value).
We can also pass in a typecode after the data that specifies the data type we would like to use. In
the second example, the 3.5 will get truncated to 3. The third example shows how to create a
floating point two-dimensional array with three columns and four rows.
Accessing array values
Individual array elements can be accessed with either [index] (for one-dimensional arrays) or
[row, column] (for two-dimensional arrays) notation. Using the examples from above:
print a[1]
3.5
print b[1]
3
print c[1,2]
11.3
It is also possible to access more than one element at a time using the : operator to specify
starting and ending indices (the ending one is not inclusive). Using this operator alone is the
same as specifying all indices.
print a[:] # all indices
[ 2. 3.5 5. ]
print a[0:3] # indices 0-3 (same as all in this case)
[ 2. 3.5 5. ]
print a[1:3] # indices 1-3
[ 3.5 5. ]

print c[:,:] # all rows, all columns
[[ 2. 3.5 5. ]
[ 6. 10. 11.3]
[ 1. 9. 4. ]
[ 5.2 7.1 3.8]]
print c[2,:] # row 2, all columns
[ 1. 9. 4.]
print c[:,1] # all rows, column 1
[ 3.5 10. 9. 7.1]
print c[2:4, 1:3] # rows 2-4, columns 1-3
[[ 9. 4. ]
[ 7.1 3.8]]
Array functions
This is not even close to a comprehensive list – in fact, it ignores most of the available functions.
Please see the documentation listed above if what you’re looking for is not listed here.
choose(a, (b
0
, b
1
, …, b
n
))
The a argument is an array of integers from 0 to n. For each value v in a, the output will be b
v
.
Using the arrays from above:
print Numeric.choose(Numeric.greater(c,5), (0, c))
[[ 0. 0. 0. ]
[ 6. 10. 11.3]
[ 0. 9. 0. ]
[ 5.2 7.1 0. ]]
Where c > 5 then the choose() function chooses the value with index 1 (c) from its second
argument. Everywhere that c <= 5, the value with index 0 is used (0).
where(condition, x, y)
Returns an array that is the same size as the condition array, but if condition is true for an entry,
the output has value x, otherwise it has value y. Using the arrays from above:
print Numeric.where(c > 5, 1, 0)
[[0 0 0]
[1 1 1]
[0 1 0]
[1 1 0]]
The output has a 1 everywhere c > 5 and 0 everywhere else.
clip(m, m_min, m_max)
Returns an array that is the same shape as m, but all values in m that are less that m_min are set
to m_min, and all values greater than m_max are set to m_max. All other values of m are left as
they were. Again, using the arrays from above:
print Numeric.clip(c, 2, 10)
[[ 2. 3.5 5. ]
[ 6. 10. 10. ]
[ 2. 9. 4. ]
[ 5.2 7.1 3.8]]
All values in the c array that are less than 2 are set to 2, and all values greater than 10 are set to
10.
Self-explanatory
add (+) subtract (-) multiply (*) divide (/)
remainder (%) power (**) arccos arccosh
arcsin arcsinh arctan arctanh
cos cosh exp log
log10 sin sinh sqrt
tan tanh maximum minimum
conjugate equal (==) not_equal (!=) greater (>)
greater_equal (>=) less (<) less_equal (<=) logical_and (and)
logical_or (or) logical_xor logical_not (not)