Environmental Analysis through integration of
Geographical Information and Machine Vision systems
by
Paul D. Kelly, M.Eng.
A thesis presented on application for the degree of
DOCTOR OF PHILOSOPHY
Faculty of Engineering
The Queen’s University of Belfast
School of Electrical & Electronic Engineering,
May 2004.
Chapter 2
Camera/Video Image
Acquisition and Initial Processing
Section 1.5.1 gave a review of the techniques available for image acquisition and initial processing. In this chapter the techniques chosen, and their integration in and application to the overall system, will be described.
Most of the analysis techniques described in the rest of this thesis are independent of the means of image acquisition. However, to achieve an exact correspondence between pixels in a digitised camera image and the 3-D world, accurate alignment of the camera image and understanding of the optical properties of the camera are very important. These factors must be corrected for before further analysis can take place. The subject of this chapter is the identification, characterisation and, where necessary, correction of these factors.
The discussion will include both factors that affect the position of objects in the image frame (e.g. lens distortion) and factors that affect the appearance (surface features) of the objects themselves (e.g. colour and illumination response of the camera).
2.1 Camera Selection
Section 1.5.1 discussed the requirements of change detection systems in terms of cameras and other image acquisition systems, and reviewed the solutions available. Although selection of the most appropriate camera to be used in the experimental implementation of the system is not vital (as the techniques will be applicable to any camera), it has nonetheless been given attention. This resulted in the following list of requirements:
• Inexpensive
• Mobile
• Capable of capturing a continuous sequence of images
• Images must be easily transferred to a computer for processing
• Able to use a reliable short-and long-term storage medium
A digital video (DV) camera has already been used in related vision-based environmental monitoring research (McMenemy et al., 2003; Zatari et al., 2003) and techniques and algorithms have been developed for its use in this application. Because of this, such a camera was the obvious choice to investigate for suitability. It was found to meet the requirements and, since selection of an appropriate camera is not vital anyway, no further detailed investigation of alternative cameras was carried out.
This type of camera is a commercially available consumer product and operates on the principle of a charge-coupled device (CCD) array. Light incident on the cells in the array is converted to an electric charge and the camera processes and converts this information into a continuous stream of images (forming a video sequence), which are stored in the DV format on magnetic tape.
2.2 Image Capture
The images from the DV camera must be transferred to a computer-readable format, i.e. captured, for processing. It is possible to do this digitally (without any loss of quality) by using the IEEE 1394 data bus, commonly known as a Firewire connection. The Firewire protocol allows computer control of the tape transport controls (play, fast forward etc.) on the camera: this provides much potential for automation of image capture (and hence automated analysis).
DV cameras are marketed as a consumer product and the capture software commonly supplied with them (e.g. Ulead MediaStudio) runs on Microsoft Windows and is operated via a graphical user interface (GUI). The GUI can be quite labour-intensive to use for large numbers of operations (e.g. capturing many image frames) and is not flexible with regard to automation or scripting. Experience during this work has also shown such software to be unreliable and to crash regularly. Because of these problems an alternative to the software supplied with the camera was investigated.
A Linux solution was investigated, based on the assumption that the customary low cost and high reliability associated with Linux-based solutions would apply here. The outcome of this was that a reliable modular system for handling Firewire video was assembled, as described in the rest of this section.
The Linux operating system supports access to Firewire devices after insertion of the ieee1394 kernel modules and installation of the relevant libraries. With the prerequisites installed on the Linux system, camera control is accomplished by the use of three commands:
• dvcont controls play, stop, pause, fast forward and rewind, etc., and can pause the tape at a specific time-coded frame, print the frame the tape is currently at, etc.
• dvgrab captures video frames in DV format from the video sequence currently playing or paused in the camera.
• playdv decodes files saved in the raw DV format and displays them on screen or (more usefully) saves them to disk in a conventional image format (see below).
The Linux solution for DV capture employs a typical Unix modular approach and is not a monolithic program. It is thus less susceptible to errors in one part of the system. However, the primary advantage of control by separate utilities is that the whole system is scriptable and customisable using normal Unix tools such as the standard Bourne shell. A sample shell script for saving the current frame from the video to a bitmap file is given in Listing 2.1.
Listing 2.1: DV Frame Capture Script
#!/bin/sh
# capture
#
# Captures one frame from DV camera and saves it to
# bmp file with the camera time as the filename
# PK March 2003
#
base=`echo \`dvcont timecode\` | sed 's/:/-/g'`
dvgrab --frames 1 --format raw $base
dvfile=$base.dv
ppmfile=$base.ppm
bmpfile=$base.bmp
playdv -n 1 -d 1 --dump-frames=$ppmfile $dvfile
ppmtobmp $ppmfile > $bmpfile
rm $dvfile
rm $ppmfile
Experience of using the Linux solution during this work confirmed the advantages mentioned above. The system did not crash even when capturing long video sequences; the Windows software frequently crashed while performing this operation.
Storage format and compression The raw format in which images captured from the camera are received is the DV format. Unfortunately it includes inherent compression of the image, which results in some loss of quality. Thus the DV format is not particularly suited to this application and ideally would not be used; however, because of the constraints imposed by creating a low-cost solution it will be used here in spite of the compression. To preserve the maximum amount of detail, and so enable identification of objects in the image, it is important to convert the image to a lossless format as early in the processing cycle as possible. Windows bitmap (BMP) or Tagged Image File Format (TIFF) are suitable for this.
DV uses intraframe Discrete Cosine Transform (DCT) coding to compress the video. The compression loss will not be quantified here, but it is known from personal observation that the grid-based pixel grouping scheme sometimes manifests itself as blocky artefacts in the resulting images, particularly when sudden movement is involved.
The DV format, together with the PAL television standard, specifies the image size, aspect ratio and pixel size, colour, sound etc. Individual details of these features, and calibrating and compensating for them, will be discussed in the rest of this chapter.
2.3 Pixel Aspect Correction
An important point to be considered at this stage is the image size and the aspect ratio of the pixels in the digital images captured from the camera. Computer displays use square pixels, and for accurate display representation it is important that displayed images have the same proportions (i.e. a pixel width to height ratio of 1:1). Many image processing algorithms also assume square pixels; this simplifies measurement of distances within the image frame and eases the transformation between discrete integer (row, column) pixel co-ordinates and continuous (u, v) cartesian co-ordinates (The MathWorks, Inc., 1997, p. 1-18).
For the camera used, the images conform to the PAL DV standard (International Telecommunication Union, 1995), known as "Rec. 601". PAL DV is based on the analogue PAL standard, in which an image frame consists of 625 discrete 'lines', each comprising a time-variant analogue signal. As only 576 of the 625 lines are visible, a digital PAL image will always have 576 rows. However, the number of columns (i.e. the number of pixels that each line is divided into) and the width of these columns/pixels are determined by the horizontal line sampling frequency. For PAL DV, Rec. 601 specifies this so that a line contains 720 pixels, each with an aspect ratio of 59:54.
Thus if a DV image is displayed on a square-pixel medium without correction it will be distorted. This is shown in Figure 2.1 (a), where on close inspection it can be seen that the squares in the test pattern appear slightly narrower than they are high. Each small square in the patterns in Figure 2.1 represents 20 units (where the height of a pixel is equal to 1 unit).
Pixel correction using GRASS GIS Although for many image processing operations it is desirable to have square pixels, it is feasible to process the images in their native rectangular format, at the expense of some extra complexity as described above. Suitable software for this processing is a Geographical Information System such as GRASS (Neteler and Mitasova, 2002). When using a GIS for image processing, an unprojected ('XY') co-ordinate system is used, with each raster cell representing one image pixel.
Assigning a value of 1 unit to the height of a pixel implies a width of 59/54, or 1.09259259. The width of an image in 'square pixels' (i.e. with a width of 1 unit) is (59/54) × 720 = 786 2/3 pixels. The image can now be displayed with its correct geometric proportions, as shown in Figure 2.1 (b). Note however that, as the image is not a whole number of square pixels wide, it would need to be cropped if converting permanently to this format.
Cropping off 1/3 of a pixel width from either side would reduce the image to a whole 786 pixels; however, it is worth considering that the standard image aspect ratio for PAL and other video image standards is 4:3 (International Telecommunication Union, 1995). For the standard size of 576 rows, this corresponds to 768 columns (576 × (4/3)). So at a width of 786 2/3, the DV image has been over-sampled and has an aspect ratio of 4.0972:3. The reason for this over-sampling is to allow for compatibility between different equipment, and only the central 4:3 portion of the image is required to contain valid image data. In practice, with the cameras used, the whole width of the scanline always appeared to contain valid image data.
Several methods are possible for converting the image into 4:3 768 × 576 format. In the GRASS GIS the horizontal resolution can be changed from 1.09259259 units to 1 unit and the edges of the active region changed so that only the central 768 units are selected, with 9 1/3 'over-sampled' pixels being discarded on either side. GRASS will then automatically re-sample the image to suit the new pixel/cell layout, using nearest neighbour re-sampling (see Section 2.4.3 for a discussion of different re-sampling methods).
Image Co-ordinate Space (Pixel edges shown as straight lines) At this stage it is helpful to define the co-ordinate space used when describing a standard 4:3 aspect, 768 × 576 square-pixel image. This is described as
0.5 ≤ u < 768.5
0.5 ≤ v < 576.5
where u and v are continuous variables, and the pixel arrangement is defined as
1 ≤ column ≤ 768
1 ≤ row ≤ 576
where row and column are discrete integer variables. The origin is in the top left-hand corner and the row and v values increase downwards. Each pixel is one unit high and one unit wide, so the pixel numbered (row, column) = (1, 2) has the co-ordinates of its centre point at (u, v) = (2.0, 1.0), and the pixel is bounded by the lines u = 1.5, u = 2.5, v = 0.5
(a) 720 × 576 image displayed uncorrected using square pixels
(b) 720 × 576 image displayed in correct proportions (i.e. 786 2/3 units wide) using rectangular pixels
(c) 768 × 576 4:3 aspect, square-pixel image
Figure 2.1: Correction of images to account for rectangular pixels
and v = 1.5, and so on for the other pixels (The MathWorks, Inc., 1997, p. 1-18). This arrangement is illustrated in Figure 2.2.
Figure 2.2: Image co-ordinate space (lines represent pixel edges)
As a sidenote to the discussion on representing rectangular-pixel images in GIS, it is convenient to be able to specify the horizontal and vertical resolution, i.e. pixel size, and the extents of the region of co-ordinate space that the image should cover, in a generic format prior to importing the image into the GIS. If the image is stored in TIFF format, this can be achieved by creating a 'TIFF world' file (TheFreeDictionary.com, 2004) to accompany the image. The TIFF world file is an ASCII text file with the same name as the TIFF file, with the suffix '.tfw' instead of '.tif'. It contains six lines, each of which holds one of the co-efficients of a rotation and translation matrix as described below.
The transformation from discrete image (row, column) co-ordinates into GIS (x, y) cartesian co-ordinates can be described by

\begin{bmatrix} x \\ y \end{bmatrix} =
\begin{bmatrix} A & C \\ B & D \end{bmatrix}
\begin{bmatrix} row \\ col \end{bmatrix} +
\begin{bmatrix} E \\ F \end{bmatrix} \qquad (2.1)
If there is no rotation, only a scaling and offset, then B and C will be 0. A is equivalent to the horizontal resolution (i.e. pixel size, 1.09259259) and D to the vertical resolution. To be consistent with the standard image row ordering (i.e. increasing downward from the top as described above), the vertical resolution is given as −1 (i.e. y = −v) and within the GIS the y co-ordinates will be negative; however, the absolute value is equivalent to the v co-ordinate or (when rounded) the row number.
Using the defined co-ordinate system, the image will have its top and bottom edges parallel to the lines y = −0.5 and y = −576.5 respectively, and its left and right-hand edges parallel to x = 0.5 and x = 768.5 respectively. However, as the image is being
imported in its original uncropped format, it will actually be 9 1/3 units wider than this in each direction, and so the edges of the uncorrected image will actually be parallel to x = −8 5/6 and x = 777 5/6. Then if the region boundaries are reset to the original values of 0.5 and 768.5, the central 4:3 part of the image can easily be extracted.
The translation co-ordinates for the TIFF world file refer to the centre of the pixels, so to derive the values for the translation co-efficients E and F, half a pixel must be added to the left-hand and top edges. This gives

E = -8\tfrac{5}{6} + \tfrac{59}{54} \times \tfrac{1}{2} = -8\tfrac{31}{108}

F = -0.5 + (-1) \times \tfrac{1}{2} = -1

The contents of the TIFF world file to accompany any DV image are therefore (to 15 significant figures, i.e. 'double' precision in the C programming language):
1.09259259259259
0.0
0.0
-1.0
-8.28703703703704
-1.0
If the image is imported into GRASS using the r.in.tiff command and the TIFF world file is present with the same basename as the TIFF file but a suffix of '.tfw', it will be read and used to correctly position the image. The same format can be read by many other GIS and image processing packages.
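As an illustration of how simple the file is to generate, the following MATLAB fragment writes such a world file using the co-efficients derived above (the filename frame.tfw is purely illustrative and in practice must match the basename of the accompanying TIFF file):

% Write a TIFF world file for an uncropped 720x576 DV frame so that a GIS
% places it in the co-ordinate space defined above (coefficients A B C D E F).
coeffs = [59/54; 0.0; 0.0; -1.0; -(8 + 5/6) + (59/54)/2; -1.0];
fid = fopen('frame.tfw', 'w');    % illustrative name; must match frame.tif
fprintf(fid, '%.14f\n', coeffs);
fclose(fid);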
Pixel correction using MATLAB An alternative approach to converting the image to standard 4:3 format is to fill a 768 × 576 grid of square pixels by back-projecting from the 720 × 576 (rectangular pixels) image. In this case the cropping is done to the rectangular pixels, and instead of 9 1/3 square pixels, the number of rectangular pixels discarded on either side is 9 1/3 × 54/59 = 8 32/59. The image is then in effect stretched out to fill the 768-pixel width. The pixel selection algorithm is

row_{rect} = row_{sq} \qquad (2.2)

col_{rect} = 8\tfrac{32}{59} + \left( col_{sq} \times \tfrac{54}{59} \right) \qquad (2.3)

and linear interpolation is done between the two rectangular pixels either side of the calculated col_{rect} value to give the correct pixel value for the square pixel. This can
be done using a MATLAB script (a listing is given in Appendix A.1). The image thus corrected to the standard 4:3 aspect ratio with square pixels is shown in Figure 2.1 (c).
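The full script is the one listed in Appendix A.1; a minimal sketch of the core of the operation, with an illustrative filename and assuming the distorted frame has been read into a 576 × 720 × 3 array, might be:

% Back-project a 768x576 square-pixel image from a 720x576 rectangular-pixel
% DV frame using equations 2.2 and 2.3 with linear interpolation.
dvimg = im2double(imread('frame.bmp'));      % illustrative filename
sqimg = zeros(576, 768, 3);
for col_sq = 1:768
    col_rect = 8 + 32/59 + col_sq*54/59;     % equation 2.3 (rows unchanged, eq. 2.2)
    c0 = floor(col_rect);                    % rectangular pixel to the left of the point
    w  = col_rect - c0;                      % linear interpolation weight
    sqimg(:,col_sq,:) = (1 - w)*dvimg(:,c0,:) + w*dvimg(:,c0+1,:);
end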
Re-sampling and resolution There will be no significant resolution loss during this re-sampling operation, as the DV cameras used contain fewer sensors on the internal CCD array than pixels in the output image, so re-sampling and interpolation have already taken place inside the camera (see Section 2.5.3). Were this not the case, consideration could be given to first converting the image to a device-independent colour space that has optimum potential for preserving colour gradients when interpolating, e.g. CIE L*a*b* (Sangwine and Horne, 1998).
It is clear that if a more expensive camera with high resolution were used, re-sampling and interpolation could become significant problems for subsequent image processing. However, the aim of this research is to investigate ways in which the subsequent processing can be performed despite the problems inherent in using a low-cost camera.
The assumption of square pixels will be made in a number of different algorithms used in the overall system.
2.4 Distortion Correction
The method used for calculating the distortion parameters for the camera is based on that of Helferty et al. (2001) and involves modelling and compensating for radial lens barrel distortion. Firstly, the mathematical basis for the correction algorithm will be defined, and then a step-by-step description of the experiments and measurements used to derive the distortion parameters will be given.
2.4.1 Pixel correction algorithm
Let (u', v') be the co-ordinates of the centre of a pixel in the distorted image. (u_c', v_c') is the centre of distortion, which will be determined experimentally. The position of each pixel can be expressed in polar co-ordinates (r', θ') relative to the distortion centre:

r' = \sqrt{(u' - u_c')^2 + (v' - v_c')^2} \qquad (2.4)

\theta' = \tan^{-1}\frac{v' - v_c'}{u' - u_c'} \qquad (2.5)

In the corrected image (with distortion compensated for), the polar co-ordinates (r, θ) are given relative to the corrected centre (u_c, v_c). This is normally the same as the distortion centre. A simplifying assumption is made that the distortion is strictly radial. Therefore the corrected polar angle will remain unchanged, i.e. θ = θ'. A polynomial approximation to the radial distortion will be used to determine r for a given r', i.e.

r = \sum_{n=0}^{N} a_n (r')^n \qquad (2.6)

The co-efficients a_0, ..., a_N must be determined experimentally.
Once r has been determined, the corrected location (u, v) of the original pixel can be calculated as

u = u_c + r\cos\theta \qquad (2.7)

v = v_c + r\sin\theta \qquad (2.8)
2.4.2 Experimental determination of distortion parameters
Helferty et al. (2001) presents a mathematical description of a method for determining the distortion parameters (i.e. the distortion centre (u_c', v_c') and correction polynomial co-efficients a_0, ..., a_N) and a high-level overview of one possible implementation of this (in Visual C++). Further details of this implementation were unavailable, so as part of this work the equations from the paper were implemented as MATLAB functions. Additional MATLAB scripts were created to fully automate the process of calculating the parameters, given a calibration image.
Description of Algorithm
The main principle of the technique is to capture an image of a regularly-spaced array of dots and find the co-ordinates, within the image frame, of all the dots (whose positions will be distorted). The degree to which the rows and columns of dots are not parallel can then be used to measure and quantify the lens distortion. The steps involved are:
1. Capture a test pattern image and perform feature extraction to find the dot centroid co-ordinates
2. Calculate the centre of distortion
3. Calculate optimum parameters to correct the distortion so that the rows and columns of dots are all parallel to each other
In the rest of this section, the terms row and column are used to refer to the arrangement of the large dots in the test pattern. Each of these dots is composed of a number of pixels. The detailed implementation of the steps will now be described.
Capture of test pattern
The distortion will change with the camera zoom setting (i.e. focal length), as this changes the lens characteristics (owing to a different part of the lens being used to focus light onto the CCD, depending on the width of the field of view). However, the zoom level on the cameras used is continuously variable and cannot be set with measurable repeatability to any intermediate setting. The image will therefore be captured at the maximum and minimum zoom settings and the distortion characterised at these.
Helferty et al. (2001) use a special test rig, but it was found adequate to use a laboratory set-up consisting of a metal sheet with a regularly-spaced array of holes punched in it. These were back-lit to accentuate the pattern, with the light being diffused through a sheet of white card to improve the uniformity of illumination. A photograph of this set-up is shown in Figure 2.3.
Figure 2.3: Distortion calibration equipment
Although the camera was mounted on a tripod, its position was not rigidly fixed with respect to the calibration target (i.e. the metal sheet) and it was found necessary to take care to ensure that the two were positioned parallel to each other. If they were not parallel then additional trapezoidal distortion of the rows of dots would occur. Making use of floor markings and the tripod spirit level, together with trial and error using the camera view-finder, made it possible to achieve the necessary configuration.
Illumination The test pattern image as captured is shown in Figure 2.4 (a). It is intended to convert this to a binary image (where each pixel either forms part of a dot or of the background) using a simple thresholding technique. However, the greyscale value of the dots varies across the image, darkening particularly in the corners. Setting
the threshold at a low enough level to extract all the darkened corner dots was found to be unsatisfactory with regard to extracting the dots in the brighter areas near the centre of the image. Here, so many pixels were classified as part of dots that the dots almost merged together, with very few pixels being classified as background.
A method that has been used with some success in lighting compensation for outdoor scenes (see the more detailed description in Section 6.3.2) was used to 'flatten' the illumination and make it more even across the image. This was done by:
1. Smoothing by convolution with a large kernel, to produce an approximation to the image brightness with detail removed (Figure 2.4 (b))
2. Division of the original image by the smoothed image to recover uniform brightness (Figure 2.4 (c))
(a) Original image  (b) Converted to greyscale and smoothed with circular filter  (c) Result of dividing (a) ÷ (b)
Figure 2.4: Illumination Flattening Example
The process is illustrated for one of the calibration images in Figure 2.4. A variation on this method was also used by Helferty et al. (2001), where the smoothed image (b) was subtracted from the original rather than divided into it. It is not clear whether or how this would produce better results; however, the division has proved to be very satisfactory.
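The scripts actually used are listed in Appendix A.2; a minimal sketch of the flattening step, using current Image Processing Toolbox functions with an illustrative filename, kernel size and threshold, might be:

img      = im2double(rgb2gray(imread('calib.bmp')));  % illustrative filename
kernel   = fspecial('disk', 50);                      % large circular averaging kernel
smoothed = imfilter(img, kernel, 'replicate');        % approximation to image brightness
flat     = img ./ max(smoothed, eps);                 % divide out the slow brightness variation
bw       = flat > 1.05;   % threshold chosen by inspection: back-lit dots are brighter than background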
Much the same effect could be achieved by calibrating the camera using a uniform luminance source, e.g. an integrating sphere (McMenemy et al., 2003), to compensate for the vignetting and flat-field effects in the camera lens. These effects cause pixels in the outer regions of the image to appear darker (McMenemy, 2002) and will be the primary reason for the effect seen here.
The 'illumination-flattened' image may now be converted to greyscale (if necessary) and thresholded to give a binary image where all the pixels that form part of the dots in the pattern have a value of 1. The '1-valued' pixels should be grouped into 4-connected regions (representing dots in the original image).
The centroid of each dot is found by averaging the u and v co-ordinates of all the pixels that form it. Near the edge of the image there are some incomplete dots (this is due to the continuous nature of the test pattern used). The centroid of one of these incomplete dots (as calculated by averaging the visible pixels) would not be the true centroid (i.e. that which would be found if all the pixels forming the dot were visible). These incorrect centroid co-ordinates would result in errors in the calculation of the distortion parameters if propagated further. Instead, a rule was added to discard dots that contained fewer than 0.8 or more than 1.2 times the median number of pixels per dot. These values were found by trial and error (over a range of test pattern images) to discard all incomplete dots while retaining most complete ones. The median was used rather than the mean, as it was found that the latter was sometimes distorted to an artificially large value by dots in the outermost rows merging together to form a single very large dot.
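Using current Image Processing Toolbox functions (rather than the original scripts in Appendix A.2), the grouping, centroid calculation and size-based filtering could be sketched as follows, assuming the binary image bw from the previous step:

cc        = bwconncomp(bw, 4);                  % 4-connected dot regions
stats     = regionprops(cc, 'Centroid', 'Area');
areas     = [stats.Area];
keep      = areas > 0.8*median(areas) & areas < 1.2*median(areas);
centroids = cat(1, stats(keep).Centroid);       % one [u v] row per retained dot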
At this stage the dot centroids can be plotted overlaid on the original image, to give a visual check that they have been detected correctly. This is illustrated in Figure 2.5.
Figure 2.5: Calculated dot centroids (X) in distortion grid pattern
Preparation of test data
The distorted locations of all the dots are now known but, particularly if the distortion is severe, it is not trivial to determine which of the original undistorted rows and columns they belong to. For example, if there is a lot of curvature then the outermost dots in the first row may have a larger v co-ordinate than the centre dots in the second row. The following algorithm was developed as part of this work to reliably and automatically perform this row/column sorting:
1. Assign the first centroid encountered to row 1, column 1.
2. Searching first in one direction along the row and then the other, find the horizontally closest centroid that has not yet been assigned a (row, column) value and is within a certain perpendicular vertical distance. Increment the column number by 1 and assign a (row, column) value pair to this centroid.
Note that within each row, the centroids will not be sorted in any particular order, but this could be done if required.
Also note that the co-ordinates of every centroid are checked at each iteration through this stage. This ensures that no point will be missed, and as there is never a particularly large number of centroids (typically around 1000) the performance penalty is not significant.
The magnitude of the vertical distance threshold is the determinant of whether centroids are in the same row (a value of 4 pixels was used with success for all the test images encountered).
3. If no more centroids can be found that satisfy the criteria for the current row, increment the row number by 1 and reset the column to 1. Assign these (row, column) values to the next centroid (which will be in a new row), and repeat step 2 for the other centroids in this new row.
4. Repeat steps 2–3 until all the centroids have been assigned (row, column) values.
5. Repeat steps 2–4, searching along columns rather than rows, to create the second matrix.
The distortion in the vertical direction (by analysing the deviation of the rows from parallel) and in the horizontal direction (by analysing the deviation of the columns from parallel) will be calculated separately.
Calculation of centre of distortion
The centre of distortion (COD) will be found by analysing the curvature of the distorted rows of dots. These form arcs centred on the COD. A quadratic equation is fitted to each row of dot centroids and its curvature calculated. A quadratic equation has only one turning point and is thus appropriate for modelling the arc formed by a row of dots. The maximum absolute curvature, and the point at which it occurs, is found for each line in turn until the point at which this value changes sign is found. The COD will lie between the closest two lines of opposite sign.
The process is repeated for each column to give two more lines. The COD is now known to lie within the region bounded by these four lines. The maximum absolute curvature is known for each line; bilinear interpolation (see Section 2.4.3) is used to find the (u, v) co-ordinates where the maximum curvature would be zero. This point corresponds approximately to the centre of distortion. All lines that pass through this point will remain straight and undistorted, hence the curvature of zero.
Figure 2.6 illustrates this step. The binary (thresholded) calibration image is shown in the background. The centroids of all the detected dots are plotted using blue crosses. The fitted quadratics are shown in red (rows) and green (columns) and the COD found is marked by a red circle. It can be seen that no further quadratics were determined or plotted after it was established that the curvature had changed sign, as this provided enough information to locate the COD.
Figure 2.6: Calculation of centre of distortion from curvature of quadratics fitted to rows and columns of dots
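A simplified sketch of the row-direction part of this step is given below. It assumes the sorted centroids are held in a cell array rows, one n-by-2 matrix of [u v] co-ordinates per row of dots, and it uses the quadratic coefficient of each fit as a signed proxy for its curvature; the full implementation, which also uses the columns and a bilinear interpolation between the four bounding lines, is in Appendix A.2.

curv = zeros(numel(rows), 1);
for i = 1:numel(rows)
    p = polyfit(rows{i}(:,1), rows{i}(:,2), 2);  % fit v as a quadratic in u
    curv(i) = 2*p(1);                            % second derivative of the fitted curve
end
k = find(diff(sign(curv)), 1);                   % adjacent rows with opposite curvature sign
% interpolate the v co-ordinate at which the curvature would pass through zero
v0 = interp1(curv(k:k+1), [mean(rows{k}(:,2)); mean(rows{k+1}(:,2))], 0);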
Calculation of radial distortion parameters
Helferty et al. (2001) describe mathematically an algorithm for determining the optimal parameters a_0, ..., a_N by numerical optimisation, to correct the rows of dots so that they are straight and parallel. The algorithm involves many operations along all the dots in each row, or all the columns in the image; it was found that these can be conveniently represented using matrix notation and element-wise operations in MATLAB. Appendix A.2 contains full listings of the scripts used, and the method is also described non-mathematically below.
To allow element-wise operation on a matrix of dot centroid co-ordinates requires all the rows and columns to have the same number of dots. To achieve this, the shortest row or column will be chosen as the common length and extra dots in any longer ones discarded. The main problem with this is that at the edges of the image there may be short rows or columns caused by not all of the dots being visible. These need to be removed so that the others are not curtailed to this length. This was accomplished by implementing a rule whereby rows and columns that contain fewer than 0.9 times the number of dots in the longest row or column are discarded. Again, the value of 0.9 was chosen after trial and error to achieve a situation where the majority of the dot centroids were included, with only the incomplete rows being discarded.
The parameters were calculated using constrained non-linear optimisation in MATLAB (the fmincon command). The following steps were performed at each iteration:
• Using an initial approximation to the distortion correction polynomial a_0, ..., a_N and the distortion centre (u_c', v_c') already calculated, calculate the corrected location of all the dot centroids using equations 2.4 to 2.8.
• Derive a best-fit straight line for each corrected row of dots, forcing the gradient to be the same for every row.
• Calculate the error (i.e. the degree to which the rows of dots are not parallel) as the mean of the magnitudes of the deviations of all the dots in each row from the corresponding best-fit line (a sketch of such an error function is given after this list).
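The actual scripts are listed in Appendix A.2; a sketch of an error function of this form, with assumed (not original) variable names, is shown below. fmincon would be called to minimise it over the polynomial co-efficients, holding the centroid matrices and the distortion centre fixed.

function err = row_parallel_error(a, U, V, ucd, vcd)
% a        trial correction polynomial a_0..a_N, lowest order first (row vector)
% U, V     distorted centroid co-ordinates, one row of dots per matrix row
% ucd, vcd centre of distortion
Rd = sqrt((U - ucd).^2 + (V - vcd).^2);     % equation 2.4, element-wise
Th = atan2(V - vcd, U - ucd);               % equation 2.5
R  = polyval(fliplr(a), Rd);                % equation 2.6
Uc = ucd + R.*cos(Th);                      % equation 2.7
Vc = vcd + R.*sin(Th);                      % equation 2.8
% fit v = m*u + c_i to each row with a single common gradient m (least squares)
[nrows, ndots] = size(Uc);
A = [Uc(:), repmat(eye(nrows), ndots, 1)];  % unknowns: [m; c_1; ...; c_nrows]
x = A \ Vc(:);
err = mean(abs(Vc(:) - A*x));               % mean deviation of dots from the fitted lines
end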
For further details of the algorithm used, see the MATLAB scripts in Appendix A.2 or equations 3 to 5 in Helferty et al. (2001).
The optimisation will converge when all the rows are parallel. Then, using the matrix of dot centroids that was ordered by ascending columns, and transposing the u and v axes, the process is repeated to find the correction polynomial that makes all the columns parallel.
In theory the two results should be the same, but in practice there was always found to be a small difference. In the second pass, when correcting the columns, the error at the final iteration was consistently found to be smaller than when correcting the rows. A possible reason for this is that the physical dimensions are smaller in the vertical direction, and there is less distortion apparent simply because the distance from the COD is shorter.
The average of the vertical and horizontal correction polynomials was taken as the final correction polynomial.
Listings of MATLAB scripts for performing all the procedures in this section are
given in Appendix A.2.
2.4.3 Practical application of distortion correction
Equations 2.7 and 2.8 in Section 2.4.1 can be applied to any pixel in the original distorted image to find its correct undistorted location. However, the resulting array of pixels will no longer form a regularly-spaced grid (Figure 2.7) and re-sampling of the pixel values (i.e. representing the same image information in a differently sized or shaped regular grid of pixels) is necessary to create an undistorted image.
Figure 2.7: Forward mapping of pixels for distortion correction: original distorted locations (blue), corrected locations (red). N.B. This plot is for the Canon MV500i camera, which exhibited less severe distortion than the Panasonic NV-DS series cameras used for most of this work.
Methods of creating an undistorted image considered here include:
1. Forward Mapping For each (row, column) pixel position in the original distorted image, calculate the corresponding (u, v) point in a new image (using the distortion model, Section 2.4.1) and map the original pixel value to the pixel in the new image that has its centre nearest the calculated point. If two or more distorted pixels map to the same new pixel it will be overwritten with the most recent value, and there may be gaps in the new image that no pixels map on to (Figure 2.8 (b)).
2. Reverse Mapping For each (row, column) pixel position in the new image, calculate the corresponding (u, v) point in the distorted image (using the distortion model in reverse) and map the nearest neighbour pixel value to this point (i.e. the value of the pixel whose centre is closest to the point) into the (row, column) location in the new image (Figure 2.8 (c)).
The first method is primarily of academic interest, as the undistorted image it produces is not continuous, i.e. it has 'null' pixels with no value. The undistorted images produced by the second method are of more practical use. While retaining the property that there are no pixel values in the undistorted image that did not occur in the original image, nevertheless the pixel centroids that these values are attached to may no longer accurately correspond with their positions in the original scene.
This is because simple re-sampling techniques, such as nearest neighbour interpolation, involve the use of rounding to determine re-sampled pixel locations, which can introduce small errors in the location of pixel values in the re-sampled image. This can result in, among other effects, narrow lines that cross the image raster grid at a shallow angle becoming disjointed (see Figure 2.8 (c)). These types of effects may cause errors when segmentation and edge detection are being used to precisely determine the positions of objects in a scene.
A more sophisticated interpolation method is bi-linear interpolation, in which the value used is a weighted average of the four pixels surrounding the (u, v) point in the distorted image, rather than just the value of the nearest pixel the point falls within. The value will thus be positioned accurately with regard to the pixel centroid (Figure 2.8 (d)).
The bi-linear interpolation algorithm was taken from that used in the GRASS s.sample program (McCauley, 1993) and uses three stages of interpolation, as follows.
Let (u, v) be the co-ordinates of the point in the original image that is to be resampled (these will have already been calculated using the distortion model). (row, col) is the pixel whose centroid falls to the immediate upper left of this point, i.e.
row = floor(v)
col = floor(u)
u_off and v_off are the horizontal and vertical offsets respectively of the point from the centroid of this upper-left pixel, i.e.
u_off = u − col
v_off = v − row
row and col are integer values and are used as indices into the image matrix to extract the pixel values φ(row, col). u_off and v_off are continuous values and are used as multiplying factors to proportion the pixel values correctly in the weighted average.
The first two stages of interpolation are between the two pixels above and the two below the point, to calculate two intermediate interpolated values as follows:

p_1 = u_{off}\,\phi(row, col+1) + (1 - u_{off})\,\phi(row, col) \qquad (2.9)

p_2 = u_{off}\,\phi(row+1, col+1) + (1 - u_{off})\,\phi(row+1, col) \qquad (2.10)

p1 and p2 are the interpolated values directly above and below the point respectively; the third stage is to interpolate between them to give a final value

p = (1 - v_{off})\,p_1 + v_{off}\,p_2 \qquad (2.11)
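For a single image band these three equations translate directly into MATLAB; the fragment below samples one illustrative point of an arbitrary test matrix (the variable names are not those of the Appendix A.2 scripts):

img = magic(8);                 % arbitrary test image; rows indexed by v, columns by u
u = 3.4;  v = 5.7;              % illustrative non-integer sample point inside the image
row = floor(v);   col = floor(u);
uoff = u - col;   voff = v - row;
p1 = uoff*img(row,   col+1) + (1 - uoff)*img(row,   col);    % equation 2.9
p2 = uoff*img(row+1, col+1) + (1 - uoff)*img(row+1, col);    % equation 2.10
p  = (1 - voff)*p1 + voff*p2;                                % equation 2.11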
2.4.4 Comparison of results
Visual Interpretation of Differences
Figure 2.8 contains images of approximately the same region (a field boundary with a hedge containing small trees) in both (a) the original distorted image and (b)–(d) with the application of the three distortion correction techniques discussed (Method 1, Method 2 and Method 2 with bi-linear interpolation, respectively).
Image (a) shows a continuous dark line at a shallow angle along the bottom of the main hedge. In the corrected images (b) and (c), which use nearest neighbour re-sampling, the line appears disjointed, which indicates that the pixel values (i.e. the dark colour) are not all in the correct positions relative to each other. In corrected image (d) the line no longer appears disjointed, although all the features are still in their correct undistorted positions.
However, a slightly reduced contrast and increased smoothing are also somewhat apparent in the interpolated image. It cannot be estimated at present how much of a drawback this may be. Very fine detail will be blurred and may be made less obvious; however, with the nearest neighbour approach, very fine detail less than a pixel in width has the potential to be overwritten and disappear, and this will not happen with bi-linear interpolation.
Quantitative Results relevant to Edge Detection
Edge detection based on colour segmentation was carried out on images corrected using both nearest neighbour re-sampling and bi-linear interpolation. A difference image of the two edge images is shown in Figure 2.9, where edges that are present in the nearest neighbour image but not in the bi-linear image are shown in black, and edges that are present
(a) Original Distorted Image  (b) Forward Mapping with Nearest Neighbour Re-sampling  (c) Reverse Mapping with Nearest Neighbour Re-sampling  (d) Reverse Mapping with Bi-linear Interpolation
Figure 2.8: Simple Comparison of Re-sampling Methods
Figure 2.9: Edges only in bi-linear image (white), only in nearest neighbour image (black) and unchanged or no edge (grey)
in the bi-linear image but not in the nearest neighbour image are shown in white. All other parts of the image, including unchanged edges and areas with no edges, are shown as grey.
On close inspection it can be seen that in many regions the edge has moved by one or two pixels in the bi-linear image. As the bi-linear image represents the object edges in their correct locations, this can be taken to be a truer approximation to the correct location of the edge.
However, there are also some regions where there is only a black edge, indicating that that edge has not been detected in the bi-linear image. It is suggested that this is a result of the smoothing, contrast-blurring effect of the interpolation, as mentioned earlier. It may, however, be possible to overcome this problem, and this raises issues regarding the approach used by edge detection algorithms (a Sobel detector is used in this example).
If an edge actually occurs in the middle of a pixel, it will be spread out over three pixels and be harder to detect than if it occurs at the boundary between two pixels, when it will only be spread over two pixels. It is possible that, in correcting the pixel locations, the bi-linear interpolation algorithm may 'spread out' an edge that was contained within two pixels under the nearest neighbour re-sampling.
Increasing, e.g. doubling, the resolution of the image before re-sampling would enable these edges to be restored to a position closer to a pixel boundary and enable easier detection. However, if the image were to be down-sampled to its original resolution, care would have to be taken to choose an appropriate re-sampling algorithm, and data loss may still take place.
2.4.5 Application of distortion correction—conclusions
Using bi-linear interpolation as opposed to nearest neighbour re-sampling for distortion correction removes the occurrence of anomalous pixel artefacts such as disjointed lines and restores features to their correct relative positions. It also has a smoothing effect, which can reduce edge contrast in some regions, and results in mixing pixel values to produce colour values that were not present in the original camera image. However, as discussed earlier (Re-sampling and resolution, Section 2.3), this argument is not particularly relevant, as re-sampling and interpolation have already taken place inside the camera.
Overall, it is a slight improvement on the nearest neighbour algorithm, but this depends on the application; in particular, for extremely accurate colour measurement using knowledge of the camera spectral response, bi-linear interpolation may not be suitable.
2.4.6 Look-up table implementation of reverse mapping and bi-linear
interpolation
Finding the distortion correction parameters involves many calculations. However, these only need to be performed once and the results may be stored in a look-up table (LUT) for later use when images are actually being corrected. This avoids the need to perform a large number of calculations at the correction stage, making the procedure much faster.
This procedure may be followed to generate the LUT:
1. For each (row, column) pixel location in a 576 × 768 matrix (i.e. the size of the corrected image), calculate the radial distance r from the centre of distortion to the pixel centre and the angle θ that this radius makes with the horizontal.
2. Solve the distortion correction polynomial (equation 2.6) at each point to find the equivalent distorted radial distance r'.
3. Use equations 2.7 and 2.8 (with distorted parameter values; note θ' = θ) to find the (u', v') point co-ordinates in the distorted image that correspond to the centre of each corrected pixel.
4. Convert the square pixel co-ordinates to rectangular pixel DV image co-ordinates (i.e. the format the distorted image is obtained in) using the relationships given in equations 2.2 and 2.3 (replacing row by v and column by u).
5. Dimension the LUT as a 576 × 768 × 12 matrix.
6. Using the bi-linear interpolation equations (2.9, 2.10 and 2.11), populate the 12 values for each pixel with four groups of 3: (i) row number (integer); (ii) column number (integer); (iii) weighting (floating point number). The four sets of values correspond to the four pixels surrounding the interpolation point. If the point is near the edge or off the edge of the image, one or more can be given a weighting of 0.
Thus, for each pixel in the corrected image, the actual correction function needs only to look up the four sets of row, column and weighting values that correspond to its row and column (the "look-up" values). The input distorted image is then sampled at these four locations. The pixel values thus obtained are multiplied by the weighting factors and then summed to arrive at the corrected interpolated pixel value.
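A sketch of this look-up operation for a single image band is given below; the variable names (lut for the 576 × 768 × 12 table and dvimg for the distorted input band, both assumed to exist already) and the ordering of the 12 stored values follow the description in step 6 above.

outimg = zeros(576, 768);
for r = 1:576
    for c = 1:768
        vals = squeeze(lut(r, c, :));       % four groups of [row col weight]
        for k = 0:3
            rr = vals(3*k + 1);  cc = vals(3*k + 2);  w = vals(3*k + 3);
            if w > 0
                outimg(r, c) = outimg(r, c) + w*dvimg(rr, cc);
            end
        end
    end
end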
MATLAB scripts for performing all these operations are provided in Appendix A.2
and A.3.
2.5 Pixel Values, Illuminance and Colour
With the lens distortion and pixel aspect having been corrected, at this stage the meaning of the information (pixel values) associated with each pixel must be considered. Each pixel contains a numerical representation of the illuminance sensed by the corresponding region of the CCD array (see Section 2.5.3 for a discussion of image resolution). This consists of a 24-bit number, i.e. three 8-bit numbers, each ranging in value from 0 to 255.
The three separate illuminance values for each pixel are arrived at by filtering the light reaching the CCD cells through one of three filters, which conventionally correspond to the red, green and blue components of the light (see Section 2.5.2 for a discussion of the colour representation used by the camera).
2.5.1 Illuminance calibration
The absolute values obtained from the filtered CCD cells are scaled internally to the camera so that peak white corresponds to values of (255, 255, 255). This scaling factor is an internal characteristic of the camera and is fixed. (This is different from the arrangement for an analogue camera, where the scaling would be a characteristic of the framegrabber or video capture device.) The scaling is linear until the intensity reaches a value of 250; above that, the relationship between incident light and absolute pixel value is non-linear (McMenemy, 2002). The sensitivity of the camera to light may be varied (e.g. for viewing very brightly or dimly illuminated scenes) by restricting the amount of light reaching the CCD, either by reducing the aperture (f-number) or increasing the shutter speed.
Calibration may be carried out (using a light meter) to relate the pixel values to actual measured light levels for a given aperture and shutter speed (McMenemy et al., 2003). This will be important if, during change detection, it is required to use the camera images for absolute measurements for comparison with other data. Also, data extracted from the camera images may in future need to be used as a reference base-line for further measurements using different instruments (e.g. spectrophotometers).
For accurate measurements the response of the camera must also be measured and compensated for under these conditions:
1. Dark Image Calibration (when no light enters the camera)
2. Uniform Intensity (flat field) Image Calibration (using an integrating sphere)
McMenemy et al. (2003) describe the calibration experiments in detail.
2.5.2 Colour measurement
A 3-dimensional colour space is a fairly arbitrary and information-poor method of describing the continuous frequency response of reflected radiation; however, it is based on the human eye and is in common industrial use. The RGB values obtained from the image are absolute values and contain information on illuminance as well as colour (as described above). To separate the colour information from the luminance it is necessary to normalise the measured values. This process, together with further use of the various colour spaces to identify objects, has been discussed in Section 1.5.5; it only remains here to introduce the colour space for the DV camera used.
As the video camera conforms to the PAL DV standard, it will use the RGB colour space defined by the European Broadcasting Union (EBU) (Sproson, 1983). The definition of this colour space allows the 3-parameter colour values to be easily transformed to other colour spaces (by matrix multiplication) if comparison with other measured colours is required (see Section 1.5.5).
Figure 2.10 indicates the general relationship between the RGB components of the video signal and the spectral reflectance characteristics these represent. This is determined by coloured filters in front of the CCD cells. The response curves are given relative to the standard white illuminant for the PAL TV system (D65) (Sproson, 1983) and the 'Chromaticity Co-ordinate' scale on the y-axis reflects this.
Figure 2.10: European Broadcasting Union Standard RGB Colour Filters
2.5.3 Image resolution
It is important to note that there is no direct relationship between pixels in the image and cells in the CCD array. If the image resolution were truly to relate to the smallest discernible object in the image, then there would be 720 × 576 = 414720 RGB cells in the CCD array. In fact, as each pixel has three colour values, which can only be physically measured by having three CCD cells each with a different filter in front of them, there should be 414720 × 3 = 1244160, i.e. 1.2 million, CCD cells for the resolution of the DV image to be equal to the true physical resolution of the image.
Most commercially available DV cameras currently have many fewer cells than this optimum value, normally just a few hundred thousand (although over the past year commercial cameras with the theoretically 'correct' number of CCD cells have started to become available as affordable consumer products). Accurate cell counts are not widely published and are hard to obtain for commercial video cameras, but the figure is approximately 800,000 for the Panasonic NV-DS38B. For optimum performance the light should be split between three arrays of CCD cells, with a red, green or blue filter in front of each CCD. In practice one CCD array is used, and the various cells are filtered to respond to either red, green or blue light in a fairly complex pattern (McMenemy, 2002). The values are then interpolated to fill the 720 × 576 image grid.
No quantitative experiments have been carried out during this work to measure the effect of this sub-optimal number of CCD cells, but the general quality and blurred features in most DV images support the theory.
2.6 Photogrammetric Parameters
Detailed interior orientation parameters are not available for a consumer-grade camera; in this work it is being used beyond its designed purpose. The manufacturer specifies the minimum focal length as 2.9 mm and the sensor size as 1/4-inch, which corresponds to the industry-standard charge-coupled device (CCD) array dimensions of 3.2 mm × 2.4 mm.
In an attempt to validate these measurements, the CCD dimensions were taken as correct and the focal length calculated using a weak perspective model. This involves capturing images of targets of known size, positioned at known distances from the camera (McMenemy, 2002). It was found that the calculated focal length (3.45 mm) did indeed differ from that provided by the manufacturer. With no further details about the camera interior parameters available, the new calculated value was assumed to be correct and was used with success in later measurements.
However, it is possible that the CCD dimensions do not conform to the industry standard (no information is available in this regard) and that the adjusted focal length compensates for this. It is believed that as long as the ratio of the two parameters is correct, later measurements will be unaffected. McMenemy (2002) discusses this in more detail.
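As an illustration of the weak perspective relation involved, the MATLAB fragment below estimates focal length from an image of a target of known width at a known distance; all the numbers are hypothetical (not the measurements actually made), and the rectangular-pixel and over-sampling issues of Section 2.3 are ignored.

sensor_width_mm = 3.2;     % assumed industry-standard 1/4-inch CCD width
image_width_px  = 720;     % DV scanline width
target_width_m  = 0.50;    % physical width of the target (hypothetical)
target_width_px = 180;     % width of the target measured in the image (hypothetical)
Z_m             = 1.50;    % camera-to-target distance (hypothetical)
x_mm = target_width_px/image_width_px * sensor_width_mm;   % size of the target's image on the sensor
f_mm = Z_m*1000 * x_mm / (target_width_m*1000);             % pinhole relation f = Z*x/X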
2.7 Old Photographs
Old photographs will be transferred to digital format by use of a scanner. The interior orientation parameters of the camera used will normally be unknown, and techniques from the field of repeat photography (Strausz, 2001) will be useful here for identifying landmarks and measuring parallax errors in order to align the images correctly with respect to the world co-ordinate reference system.
Possible sources of old photographs have been considered and investigated. The Cambridge University aerial photograph collection (Cambridge University, 2004) is extensive, and there are oblique aerial photographs of scenes in all areas of Ireland taken between 1951–5 and 1963–73. Loan copies of some of this collection are available at the Monuments and Buildings Record, Hill St., Belfast.
Cambridge University (2004) states that approximately half the collection are oblique photographs and half vertical. Normally vertical photographs are considered the most useful, but by appropriate use of camera modelling it should be possible to make use of the oblique images through perspective transformation.
Other types of images that might be useful are most likely to be held by other government departments, e.g. photographs taken as general references of landscape, afforestation etc. Several extensive collections of scenes taken by amateur photographers also exist (Anderson, 1991).
One such collection of amateur photographs is that taken by W A Greene in the Lecale area of south Co. Down; some of these are published in a book by Magee (1991). Figure 2.11 shows a sample scanned image from the St John's Point test region together with two recently acquired digital video images from the same area (more data from this area will be used in later chapters). This particular old photograph is of relatively good quality. The row of cottages and the layout of the road and sea defences can be identified in both the modern and old images, and there is obviously scope for using ground control points to help align the old photograph with the new calibrated images. This is a subject for future work.
2.8 Conclusions
Initial processing techniques relevant to the use of a DV camera for calibrated photogrammetric measurement were presented. Testing was carried out to select a reliable solution for image capture. The detailed pixel-level specification of the Digital Video format was researched to ensure image scenes were being viewed on-screen and processed in their correct geometric proportions. The image geometry was found to be quite different from that assumed in previous work (Kelly and Dodds, 2002), confirming the importance of this step.
Figure 2.11: Sample old photograph at Killough, near the St. John's Point test area, and two DV images (22 January 2003) from the same area
Appropriate pre-processing methods for correcting the non-square pixel aspect ratio and scan-line oversampling (which are inherent in DV images) were developed and used. The advantages of geographical information systems compared to conventional image processing systems in processing DV images were presented and discussed.
An existing method for measurement of camera lens distortion was adapted and improved for use with the DV image system presented here. It was implemented using image processing functions in MATLAB. This increased the accessibility and re-usability of the method for others, rather than having the algorithm incorporated in a standalone piece of software. Use was made of MATLAB matrix processing functionality to further automate and improve the efficiency of the method.
The practical application of distortion correction in creating an undistorted image was researched. Various re-sampling algorithms were compared and the conclusion drawn that pixel reverse mapping using bi-linear interpolation was the most appropriate method. The chosen method was implemented as a look-up table for fast application to multiple images.
The meaning of image pixel values with respect to illuminance, colour and image resolution was researched and relevant conclusions drawn in these three areas: these are important limiting factors in relation to the analysis and measurement uses to which the DV camera system described here can be put.
The relevant work on camera interior orientation and the integration of 'old' photographs taken with an unknown camera was referenced.
This chapter has presented and discussed all the initial processing steps necessary to use acquired image data in an environmental analysis system. After application of the initial processing techniques described in this chapter, all of the images will have been corrected to conform to the pinhole camera model, and any distortion caused by the camera used will have been removed. Chapter 3 will analogously discuss the pre-processing steps necessary to use various sources of geographical data in the same system.