Information from Images of Transparent Objects





Sam Hasinoff

Department of Computer Science, University of Toronto

Toronto, Ontario, Canada

M5S 1A4

1 Introduction

Many traditional computer vision techniques are couched in a series of restrictive assumptions:
the objects of interest are opaque, distinctively textured, and approximately Lambertian; their
motion is well approximated by affine transformations; the cameras are well calibrated; viewing
conditions are clear. These assumptions are typically met by using synthetic data or by operating
in carefully crafted industrial and laboratory settings. We would like to see these assumptions
relaxed and computer vision systems deployed in increasingly real-world applications.

For this to happen, computer vision techniques must be made more robust and extended to
broader classes of objects. This paper focuses on just one of these assumptions, namely that the
objects of interest are opaque. In particular, we are interested in performing reconstruction of
three-dimensional shape from images of scenes containing transparent objects.

Previous work dealing with transparent objects is somewhat disparate and preliminary.
Nevertheless, we will try to integrate existing research into a coherent whole. We are motivated
to see how information can be extracted from images of scenes containing transparent objects.

2 Perception of Transparency

Transparency arises in everyday life from a number of different physical phenomena. These
include soft shadows, dark filters in sunglasses, silk curtains, city smog, and sufficiently thin
smoke. When we refer to transparency in a perceptual context, we usually mean that the
transparent medium itself is at least partially visible. This disqualifies from our analysis the air
on a perfectly clear day or a well-polished window without reflections.

2.1 Loose constraints

The perception of transparency is only loosely constrained by the laws of optics. In fact, figural
unity is perhaps just as important a factor in the perception of transparency as the relationship
between the brightnesses of different image regions [5,18]. If there is an abrupt change of shape
at the border between media, the perception of transparency can break down, even if
transparency is actually present (Figure 1b). There are other circumstances where transparency
actually exists but is not perceived. For example, a square filter placed over a plain background
will normally be perceived as a painted patch, presumably because this is the cognitively simpler
explanation (Figure 1c).

Figure 1. The regular perception of transparency is illustrated in (a). Most subjects will report seeing a small
transparent square above a large dark square. However, the perception of transparency can be disrupted, as
shown in (b), by a sudden change of shape. Transparency will not be perceived in (c) either. Even if the
smaller square is in fact transparent, the scene can be explained more simply without transparency.

Nakayama also demonstrated experimentally that transparency is fundamentally achromatic in
nature [21]. Combinations of colour which are unlikely to arise in real-world scenes still give the
perception of transparency.

Previous theories of visual perception have described cognitive models in which more
complicated images can be described economically using primitive images and combination
rules. In particular, Adelson and Pentland formulated an explicit cost model for performing such
a decomposition, as a concrete illustration of this idea [1]. Using this formulation, the most
plausible interpretations of a scene are the cheapest interpretations (in terms of shape, lighting,
and reflectance) that are consistent with the image. It seems at first glance that transparency
would fit nicely into this cost-based framework; however, Adelson and Anandan argue that this is
not in fact the case [2]. According to them, transparency is essentially pre-physical and heuristic
in nature, and not part of a full intrinsic image analysis.

2.2 The importance of X-junctions

Metelli was the first to analyze constraints on the perception of transparency with layers of
transparent two-dimensional shapes [20]. In his model, each layer can attenuate the luminance
beneath it by a factor $a_n$, $0 < a_n \le 1$, and emit its own luminance $e_n \ge 0$. Thus, the
luminance at layer $n$ is given by the relation $I_n = a_n I_{n-1} + e_n$.
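The layered model above can be sketched in a few lines. This is only an illustration: the function name `composite_layers` and the convention of writing each layer as an (attenuation, emission) pair are assumptions made here, with each layer mapping the luminance beneath it to attenuation times that luminance plus its own emission.

```python
def composite_layers(background, layers):
    """Stack Metelli-style layers over a background luminance.

    background: luminance of the bottom surface.
    layers: list of (attenuation, emission) pairs, bottom layer first,
            with 0 < attenuation <= 1 and emission >= 0.
    Returns the luminance seen after each layer, starting with the background.
    """
    luminances = [background]
    for attenuation, emission in layers:
        if not (0.0 < attenuation <= 1.0) or emission < 0.0:
            raise ValueError("need 0 < attenuation <= 1 and emission >= 0")
        # Each layer dims what lies beneath it and adds its own light.
        luminances.append(attenuation * luminances[-1] + emission)
    return luminances
```

A layer with attenuation 1 and emission 0 is invisible in this model, which matches the earlier remark that a perfectly clear medium falls outside the perceptual notion of transparency.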



The constraints imposed by this model have been recently examined at so-called X-junctions,
which are places in an image where four different regions are created from the intersection of
two lines. X-junctions have been shown to be especially important in establishing local
constraints for scene interpretation. These local constraints propagate quickly to constrain the
interpretation of the entire scene.

It has been proposed that the human visual system employs heuristics based on X-junctions to
categorize different types of transparency [2]. X-junctions can be classified into three groups
based on the ordinal relationship between the brightnesses of the regions in the horizontal and
vertical directions. The non-reversing (Figure 2a) and single-reversing (Figure 2b) cases are both
readily perceived as transparent, but in the non-reversing case the depth ordering is ambiguous.
On the other hand, the double-reversing case (Figure 2c) is not perceived as transparent.
Satisfyingly enough, a mathematical analysis of Metelli’s rule at X-junctions leads to constraints
which justify heuristics based on the degree of reversingness.
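The three-way classification can be made concrete with a small sketch. It assumes the four regions around the junction are given as luminances in a 2x2 arrangement (top-left, top-right, bottom-left, bottom-right); the function name and the treatment of ties (a zero contrast only matches another zero contrast) are choices made here for illustration, not part of the cited analysis.

```python
def _sign(value):
    """Return -1, 0 or +1 according to the sign of value."""
    return (value > 0) - (value < 0)


def classify_x_junction(top_left, top_right, bottom_left, bottom_right):
    """Classify an X-junction by how many directions reverse contrast sign."""
    # Horizontal direction: does the left-versus-right contrast keep its
    # sign when moving from the top row to the bottom row?
    horizontal_preserved = (
        _sign(top_left - top_right) == _sign(bottom_left - bottom_right)
    )
    # Vertical direction: does the top-versus-bottom contrast keep its
    # sign when moving from the left column to the right column?
    vertical_preserved = (
        _sign(top_left - bottom_left) == _sign(top_right - bottom_right)
    )
    reversals = (not horizontal_preserved) + (not vertical_preserved)
    return ("non-reversing", "single-reversing", "double-reversing")[reversals]
```

Under the heuristic described above, only the first two labels would be taken as evidence of transparency.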

Figure 2. We classify X-junctions into three different groups based on whether sign is preserved in the
horizontal and vertical directions. Transparency is perceived in both the non-reversing (a) and
single-reversing (b) cases, but the non-reversing case has two plausible interpretations for the depth
ordering. The double-reversing case (c) is not perceived as transparent.

Note that transparency has also been shown to be eagerly perceived by (untrained) observers
even in a number of physically impossible situations, as when the ordinal relationships required
at X-junctions are violated [5]. However, this effect may be partially due to demand
characteristics in the experimental design.

2.3 The perception of transparency in 3D

Experiments have shown that it is difficult to make judgments about 3D transparent
shapes. One technique suggested for improving the visualization of 3D transparent shapes
involves overlaying strokes at points on the surface of the shape, where the orientation and size
of the strokes are chosen to correspond to the direction and magnitude of principal curvature
[18]. This texturing is shown to improve performance in judging relative distances for a medical
visualization task. Interestingly, these results also suggest that reconstruction of a transparent 3D
shape might not have to be as accurate as that of a textured opaque object, so long as its contour
and other salient features are preserved.

3 Computerized Tomography

A large field of research involving the imaging of transparent objects is computerized
tomography (CT), which can be defined as the reconstruction of sectional slices of an object
from image measurements taken from different orientations [17]. The applications of CT are
far-reaching and diverse, including medical imaging, airport security, and non-destructive testing in
industry.

In standard computer vision, we only consider images formed using visible light, and most
models of transparency assume a discrete number of object layers with associated uniform
transparencies. The volumetric models which incorporate transparency are an uncommon and
notable exception. CT systems, by contrast, involve images taken using high frequency
electromagnetic radiation capable of penetrating objects opaque to the naked eye. In this way,
CT image intensities (suitably transformed to account for attenuation) can be interpreted as
proportional to masses along the imaging rays between the source of the radiation and the
detector.

The tomographic imaging process can be described mathematically by a parallel projection
known as the Radon transform (Figure 3). The mass density $f(x, y)$ over a 2D slice of the object
is projected in many different viewing directions, $\theta$, giving rise to 1D images parameterized by
$d$, as follows:

$$P(\theta, d) = \int\!\!\int f(x, y)\, \delta(x \cos\theta + y \sin\theta - d)\, dx\, dy,$$

where $\delta(\cdot)$ is the Dirac delta function.

Figure 3. The geometry of the Radon transform.
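As a rough illustration of this parallel projection, the sketch below projects a set of point masses onto a 1D detector for a single viewing direction. It assumes the usual convention that a point at (x, y) lands at signed offset x cos(theta) + y sin(theta) on the detector; the function name, the point-mass representation and the binning scheme are all choices made here for illustration.

```python
import math


def radon_projection(points, theta, num_bins, d_min, d_max):
    """Project point masses onto a 1D detector for viewing direction theta.

    points: iterable of (x, y, mass) triples.
    The detector covers offsets in [d_min, d_max), split into num_bins bins.
    Returns a list of accumulated masses, one per detector bin.
    """
    bins = [0.0] * num_bins
    bin_width = (d_max - d_min) / num_bins
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    for x, y, mass in points:
        # Signed offset of the ray through (x, y) for this direction.
        offset = x * cos_t + y * sin_t
        index = math.floor((offset - d_min) / bin_width)
        if 0 <= index < num_bins:
            bins[index] += mass  # every mass on the same ray adds up
    return bins
```

Repeating the call over many values of theta yields the set of 1D images from which a reconstruction would start.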

To give a concrete example, X-ray CT has found wide use in medical applications due to its
penetrating power through human tissue and its high contrast. Using tomographic techniques, the
internal structure of the human body can be reliably imaged in three dimensions for diagnostic
purposes.

There are, however, a few limitations to the technique. Good reconstructions require a great deal
of data (meaning many rays from many directions), so a large detector typically needs to be spun
completely about the patient for full coverage. Quite aside from concerns of efficiency, human
exposure to X-rays should be limited for health reasons. Moreover, because the reconstruction is
typically very sensitive to noise, the patient is instructed to remain immobile throughout the
procedure. This may be especially difficult for those injured patients for whom the CT scan is
most valuable. Finally, any objects embedded in the body that are opaque to X-rays (for
example, lead shrapnel) can cause significant shadowing artefacts in the reconstruction.

3.1 Filtered backprojection

The Fourier slice theorem gives a simple relationship (in the Fourier domain) between the object
and its projections. Specifically, the Fourier transform of a 1D projection can be shown to be
equivalent to a slice of the 2D Fourier transform of the original object in a direction
perpendicular to the direction of projection [17].
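The theorem can be written out explicitly. The notation here is an assumption chosen for illustration: f(x, y) denotes the object slice, P(θ, d) its parallel projection at angle θ and detector offset d = x cos θ + y sin θ, and F(u, v) the 2D Fourier transform of f.

```latex
\begin{align*}
S_\theta(\omega)
  &= \int_{-\infty}^{\infty} P(\theta, d)\, e^{-2\pi i \omega d}\, \mathrm{d}d \\
  &= \int\!\!\int f(x, y)\, e^{-2\pi i \omega (x \cos\theta + y \sin\theta)}\, \mathrm{d}x\, \mathrm{d}y \\
  &= F(\omega \cos\theta,\ \omega \sin\theta).
\end{align*}
```

The second line follows by substituting the definition of the projection and integrating out the delta function. The slice therefore runs through the origin of the Fourier domain along the direction (cos θ, sin θ), which is perpendicular to the rays being summed.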

In the case of continuous images and unlimited views, the Fourier slice theorem can be applied
directly to obtain a perfect reconstruction. Each projection can be backprojected onto the 2D
Fourier domain by means of the Fourier slice theorem, and the original object can then be
recovered by simply taking the inverse 2D Fourier transform.

In real applications, the results are less ideal. This is partly because only a discrete number of
samples is ever available, but also because backprojection tends to be very sensitive to noise. The
sensitivity to noise is due to the fact that the Radon transform is a smoothing transformation, so
taking its inverse will have the effect of amplifying noise. To partially remedy this problem, the
CT images are usually filtered (basically using a high-pass filter) before undertaking
backprojection. These two steps, filtering and backprojection, are the essence of the filtered
backprojection (FBP) algorithm which has dominated CT reconstruction algorithms for the past
thirty years. In practice, FBP produces very high quality results, but many views (hundreds) are
required and the method is still rather sensitive to noise.

FBP has even been extended in an ad hoc way to visible light images of (opaque) objects [12].
This technique mistreats occlusion, but for mostly convex Lambertian surfaces, it provides a
simple method for obtaining high-resolution 3D reconstructions.

Wavelets have also been used to extend FBP for CT reconstruction. The idea is to apply the
wavelet transform at the level of the 1D projections, which in turn induces a multiscale
decomposition of the 2D object. For a fixed level of detail, this method is equivalent to FBP, but
has the advantage of obtaining multiresolution information with little additional cost. More
importantly, the wavelet method also provides a better framework for coping with noise. Bhatia
and Karl suggest an efficient approach to estimating the maximum a posteriori reconstruction
using wavelets, in contrast to other more computationally intensive regularization
approaches to doing this [6].

3.2 Algebraic methods

An alternative method for CT involves reformulating the problem in an algebraic framework. If
we consider the object as being composed of a grid of unknown mass densities, then each ray
for which we record an image intensity will impose a different algebraic constraint.
Reconstruction is then reduced to the conceptually simple task of finding the object which best
fits the projection data. This best fit will typically be found through some kind of iterative
optimization technique, perhaps with additional regularization constraints to reduce
non-smoothness artefacts. While algebraic methods are slow and lack the accuracy of FBP, they are
the only viable alternative for handling very noisy or sparse data. Algebraic methods have also
been proposed for handling cases like curved rays which are difficult to model using insights
from Fourier theory.

The first method developed using the algebraic framework was the algebraic reconstruction
technique (ART) [17]. Starting from some initial guess, this method applies each of the
individual constraints in turn. The difference between the measured and the computed sum along
a given ray is then used to update the solution by a straightforward reprojection.
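A minimal sketch of one ART pass is given below, assuming each ray constraint is stored as a row of weights over the density grid together with its measured ray sum. The update shown is the standard row-projection (Kaczmarz) form; the function name `art_sweep` and the `relaxation` parameter are illustrative choices, not taken from [17].

```python
def art_sweep(weights, measurements, densities, relaxation=1.0):
    """Apply every ray constraint once, in order, and return new densities.

    weights: list of rows; weights[i][j] is the contribution of grid cell j
             to ray i.
    measurements: measured sum along each ray.
    densities: current estimate of the cell densities.
    """
    current = list(densities)
    for row, measured in zip(weights, measurements):
        norm_sq = sum(w * w for w in row)
        if norm_sq == 0.0:
            continue  # this ray misses the grid entirely
        computed = sum(w * v for w, v in zip(row, current))
        # Spread the ray's error back over the cells it passes through.
        step = relaxation * (measured - computed) / norm_sq
        current = [v + step * w for v, w in zip(current, row)]
    return current
```

Calling the function repeatedly, starting from an all-zero grid, gives the iterative reconstruction. Setting `relaxation` below 1 damps each correction.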

Unfortunately, the results obtained using basic ART are rather poor. The reconstruction is
plagued with salt-and-pepper noise and the convergence is unacceptably slow. Results can be
improved somewhat by introducing weighting coefficients to do bilinear interpolation of the
mass density pixels, and adding a relaxation parameter (as in simulated annealing) at the further
expense of convergence. Improvements have also been demonstrated by ordering the constraints
so that the angle between successive rays is large, and by modifying the correction terms using
some heuristic to emphasize the central portions of the object (for example, adding a longitudinal
Hamming window).

Mild variations on ART, including the simultaneous iterative reconstruction technique (SIRT)
and the simultaneous algebraic reconstruction technique (SART), differ only in how corrections
from the various constraints are bundled and applied [3,16]. In SIRT, all constraints are
considered before the solution is updated with the average of their corrections. SART can be
understood as a middle ground between ART and SIRT. It involves applying a bundle of
constraints from one view at a time.
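For comparison, a SIRT-style update might look like the following sketch, which computes the correction suggested by every ray against the same current estimate and then applies their average in one step. The function name `sirt_step`, the row-of-weights representation and the plain (unweighted) averaging are assumptions made for illustration; published variants differ in how the corrections are normalized.

```python
def sirt_step(weights, measurements, densities, relaxation=1.0):
    """Apply one simultaneous update built from the average ray correction.

    weights[i][j] is the contribution of grid cell j to ray i, and
    measurements[i] is the measured sum along ray i.
    """
    totals = [0.0] * len(densities)
    rays_used = 0
    for row, measured in zip(weights, measurements):
        norm_sq = sum(w * w for w in row)
        if norm_sq == 0.0:
            continue  # this ray misses the grid entirely
        # All residuals are measured against the same, unmodified estimate.
        residual = measured - sum(w * v for w, v in zip(row, densities))
        for j, w in enumerate(row):
            totals[j] += residual * w / norm_sq
        rays_used += 1
    if rays_used == 0:
        return list(densities)
    return [v + relaxation * t / rays_used for v, t in zip(densities, totals)]
```

A SART-style scheme would sit between the two: the same averaging, but restricted to the rays of a single view before moving on to the next view.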

3.3 Statistical methods

Another group of iterative techniques built around the same algebraic framework are more
statistical in nature. The expectation maximization (EM) algorithm is one example of such a
technique. Statistical methods seek explicitly the maximum likelihood (ML) reconstruction,
that is, the one which most closely matches the data; however, properties expected in the original
object, such as smoothness, may be lost. Bayesian methods have also been proposed, in which
prior knowledge can be incorporated into the reconstruction [13].

To improve the quality of reconstruction, penalty functions are often introduced to discourage
local irregularity. This regularization, however, comes at the cost of losing resolution in the final
image. The overall cost function is typically designed to be quadratic, to permit the application
of gradient methods. Gradient ascent and conjugate gradient methods have been suggested, as
well as other finite grid methods similar to Gauss-Seidel iteration [24].

4 Volumetric Reconstruction with Transparency

The voxel colouring algorithm of Seitz and Dyer was the first method for performing volumetric
reconstruction from images (of opaque objects) that properly accounted for visibility [25].
Kutulakos and Seitz later extended this method to obtain the space carving algorithm [19]. Both
algorithms operate on the principle that cameras must agree on the colour of an (opaque) voxel,
though only if that voxel is visible from all of those cameras. In this framework, voxels are
labelled as either completely transparent or completely opaque. Note that even for opaque
objects, a perfect voxel reconstruction would require transparency (partial occupancy)
information to be computed for the voxels at the boundaries of the objects.

Several recent volumetric reconstruction techniques have attempted explicitly to recover
transparencies along with voxel colouring information. In this model, observed pixel intensity is
a weighted combination of voxel colours along the ray, where the weights are a function of the
voxel transparencies. None of these methods, however, attempts to recover the scene illumination
or estimate surface reflectance. As a result, complicated lighting phenomena such as specular
highlights and inter-reflections are ignored completely. In other words, the reconstructed shape is
in some sense derived from shading.

4.1 Volumetric backprojection

The authors of [9] report one such reconstruction method, which they call volumetric backprojection.
The method is a straightforward extension of SART, made efficient with hardware acceleration
and a parallel architecture for performing backprojection on a slice-by-slice basis. If each image
is available with both black and white backgrounds, a simple matting technique (see Section 6.1)
can be applied to extract the cumulative effect of transparency on each pixel. Then, correction
terms for both voxel colours as well as voxel transparencies can be backprojected onto the
reconstruction volume.

Results are demonstrated successfully for synthetic data, where the exact lighting conditions are
made available to the reconstruction algorithm. While this represents an interesting proof of
concept, it is unclear that this technique would work well on real reconstruction tasks, where
noise and unknown lighting conditions may be serious issues.

4.2 Roxel algorithm

Another volumetric reconstruction technique incorporating transparency is the Roxel algorithm
proposed by De Bonet and Viola [10]. In this approach, responsibilities (weights) are assigned
along rays to describe the relative effects of the voxels in determining observed pixel colour.
Responsibilities are thus due to the cumulative effect of transparency in attenuating the colours
of the voxels. While the relationship between responsibilities and transparencies is non-linear, a
closed form is available for converting directly between the two representations.
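One plausible form of this conversion is sketched below. It assumes the usual front-to-back compositing model, in which a voxel's responsibility is its opacity multiplied by the transparency of every voxel in front of it; this may differ in detail from the closed form given in [10], and the function names are invented for illustration.

```python
def responsibilities_from_opacities(opacities):
    """Convert per-voxel opacities (front to back) into ray weights."""
    weights = []
    transmitted = 1.0  # fraction of the ray not yet absorbed
    for opacity in opacities:
        weights.append(transmitted * opacity)
        transmitted *= 1.0 - opacity
    return weights


def opacities_from_responsibilities(weights):
    """Invert the mapping: recover opacities from ray weights."""
    opacities = []
    remaining = 1.0  # responsibility not yet claimed by nearer voxels
    for weight in weights:
        # A voxel behind a fully opaque one is unconstrained; use 0 here.
        opacities.append(weight / remaining if remaining > 0.0 else 0.0)
        remaining -= weight
    return opacities
```

Under this model the responsibilities along a ray sum to at most 1, with equality when some voxel on the ray is fully opaque.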

Initially the entire volume is set to be empty and completely transparent. Each iteration of the
method comprises the following sequence of three steps. First, colours are estimated using a
generalization of backprojection, in which voxel colours are weighted by their responsibilities.
Then, responsibilities are estimated based on the disagreement between the estimated pixel
colour and the real image data. To distribute responsibility among the voxels in each ray, a
softmax style assignment (as in reinforcement learning) is made. Voxels that agree better with
the data are given a greater share of the responsibility, and the overall distribution is controlled
by a single parameter expressing belief in the noisiness of the data. Finally, both the
responsibilities and the related transparencies are renormalized to be globally consistent over the
entire set of view estimates.
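The softmax-style step can be illustrated generically. The sketch below is not the exact rule from [10]: it simply assumes each voxel on a ray has a non-negative disagreement score and that a single hypothetical parameter, here called `noise_scale`, controls how sharply responsibility concentrates on the best-matching voxels.

```python
import math


def softmax_responsibilities(disagreements, noise_scale):
    """Share responsibility among the voxels on one ray.

    disagreements: per-voxel mismatch with the observed pixel colour.
    noise_scale: larger values express more belief in noisy data and
                 flatten the distribution.
    """
    scores = [-d / noise_scale for d in disagreements]
    top = max(scores)
    # Subtracting the maximum keeps the exponentials numerically safe.
    exponentials = [math.exp(s - top) for s in scores]
    total = sum(exponentials)
    return [e / total for e in exponentials]
```

With a small `noise_scale` nearly all responsibility goes to the voxel with the lowest disagreement, while a large value spreads it almost evenly along the ray.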

Results are demonstrated for a variety of synthetic and real data, and the reconstructions are of
fairly good quality, especially for opaque objects. Another interesting aspect of the Roxel
algorithm is that uncertainty in the reconstruction is also represented by transparency. Thus, an
opaque but uncertain voxel will be rendered as semi-transparent, corresponding to its expected

One serious criticism of the Roxel algorithm is that the method is too heuristic. Its image
formation model for opaque surfaces has also been criticized, because visibility is assessed only
by assuming that the voxels in front are transparent [7]. This could lead to unusual artefacts for
more complicated geometries. Furthermore, no proof of convergence is given, although in
practice the convergence behaviour does seem stable.

5 Mixed Pixels

Some standard vision algorithms, such as estimating depth from stereo or computing optical
flow, have recently been extended to handle transparency to some degree. However, most of
these extensions were not motivated by the desire to better handle general transparent objects.
Rather, their typical goal was to improve behaviour at troublesome mixed pixels, which are
pixels whose colour is due to a combination of different objects. Mixed pixels arise at object
boundaries, where object detail is small relative to pixel size, and of course, wherever
transparency is a factor.

Properly coping with mixed pixels involves managing multiple explanations per pixel, so these
algorithms tend to be more complicated than their predecessors. Although these methods may
advertise themselves as supporting transparency, the degree of transparency modeled is usually
rather limited, and their performance with more general transparent objects is poor.

5.1 Stereo algorithms

Szeliski and Golland present a stereo algorithm which operates over a discretized scene space, but
also adds an explicit representation for partially transparent regions [27]. Like other volumetric
stereo methods, evidence for competing correspondences is considered. This method differs in
allowing the detection of mixed pixels at occlusion boundaries in order to obtain sub-pixel
accuracy. Estimates for colour and transparency are refined using a global optimization method
which penalizes non-smoothness and encourages transparencies to be either 0 or 1.


Another stereo algorithm which accounts for transparency was proposed by Baker, Szeliski, and
Anandan [4]. Their method models the scene as a collection of approximately planar layers, or
sprites. These sprites are initially estimated without considering transparency, then this solution
is refined iteratively. First, residual depths perpendicular to the planes are estimated, then
per-pixel transparencies are estimated as well. The images are re-synthesized using a generative
model that incorporates transparency, and error with respect to the input images is minimized
using gradient descent.

Both methods achieve impressive gains over previous results, and this improvement may be
directly attributed to better handling of mixed pixels. However, in neither case are results
presented for scenes containing real transparent objects.

5.2 Motion analysis

Irani, Rousso, and Peleg present a high-level motion analysis technique, geared at extracting a
small number of dominant motions, where the objects may be occluded or transparent [15].

The technique proceeds recursively. First, the dominant affine motion is estimated using
integration. This produces an integrated image that is sharp for pixels involved in the dominant
motion and blurred for the others. The sharp pixels are then segmented and removed from the
original sequence, so that the algorithm can estimate the next most dominant motion.

The integration process is robust to occlusion, and the algorithm even handles certain types of
transparent motions. The success of the algorithm in the presence of transparency depends on
different levels of contrast to allow the identification of one dominant transparent motion. It also
depends on the assumption that the dominant transparent motion can be approximately removed
from the sequence by simply considering the difference between the original frames and the
integrated image. Good results are presented for a picture frame partially reflecting objects
behind the camera.

The skin and bones method suggested by Ju, Black, and Jepson also handles transparency in its
estimation of optical flow [16]. Robust statistical techniques and mixture models are used to
estimate multiple layers of motion within fixed patches of the scene, and regularization
constraints ensure that smoothness between nearby and related patches is maintained.

Transparency is handled in that multiple depth measurements are allowed to exist for each pixel,
and the regularization step will automatically segment the data into different surfaces. The skin
and bones method can be viewed as a hybrid between dense optical flow methods and global
parametric models which extract only the dominant motions.

5.3 Volumetric reconstruction

Accounting for mixed pixels is also important when performing volumetric reconstruction from
images. The space carving technique, as previously described, assesses visibility by determining
whether a given voxel projects to roughly the same colour in a set of images [19]. However, if a
given voxel is located at a depth discontinuity with different backgrounds from different views,
then the resulting (mixed) pixel projections may appear inconsistent and lead the algorithm to
label the voxel (incorrectly) as empty.

The approximate space carving method [18] attempted to remedy this problem (and other
sources of noise) by broadening the notion of colour consistency between image pixels.
Kutulakos describes the shuffle transform, a looser matching criterion where arbitrary reordering
of pixels within a local neighbourhood is permitted. Other approximate matching criteria, such
as those based on rank order within a local neighbourhood, can also improve performance in this
respect.

Space carving has lately been extended using probabilistic methods to permit the representation
of partial or expected voxel occupancy [7]. The basic space carving algorithm was modified to
incorporate stochastic sampling, and the algorithm is shown to generate fair samples from a
distribution which accurately models the probabilistic dependencies between voxel visibilities.
This approach provides a principled manner of dealing with uncertainty in the reconstruction and
handles mixed pixels with good results.

6 Matting Techniques

Matting is another important application for transparency. The process of matting involves
separating foreground objects from the background, typically so that they can then be
composited over a new background. In matting, transparency is modeled on a per-pixel basis,
rather than in scene space. This model nevertheless allows good results to be obtained at mixed
boundary pixels and permits the foreground objects themselves to be transparent.

The end result of the matting process is an image (a matte) that is transparent in regions of the
foreground object, opaque in the background regions, and semi-transparent at object boundaries
and wherever the foreground object is semi-transparent. This is completely analogous to the
alpha channel used in computer graphics.

6.1 Blue screen matting and extensions

Historically, the driving force behind matting technology has been the film industry, which
employs these techniques to produce special effects. Vlahos pioneered the well-known
blue-screen matting technique over forty years ago, and refinements of this technique are still in use
today. The basic idea is to film the foreground object (usually an actor) in front of a bright screen
of known colour (usually blue). If the object is relatively unsaturated, perhaps near-grey or
flesh-coloured, a simple equation involving the background and observed foreground colours allows
the transparency to be calculated at every pixel [26]. Human intervention is usually needed to
identify a good matte when using commercial matting machines.

Smith and Blinn demonstrate mathematically that the general matting problem is unsolvable, but
justify traditional approaches to matting by showing how certain constraints reduce the space of
solutions [26].

They also suggest a novel approach to matting that they call triangulation. This method requires
the object to be filmed against two or more backgrounds which must differ at every pixel. The
solution to the matting problem then becomes overdetermined, so that fitting can be performed in
the least-squares sense. The main difficulty with the method is its lack of applicability when
either the object or the camera is moving. If the object is static and the camera is
computer-controlled, then in theory, two separate passes can be made; however, maintaining proper
calibration remains an issue.
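For a single pixel and two backgrounds, the least-squares fit has a short closed form. The sketch below assumes the standard compositing equation, in which the observed colour is the premultiplied foreground colour plus (1 - alpha) times the background colour, with alpha denoting foreground coverage; the function name and the per-pixel tuple representation are illustrative choices.

```python
def triangulate_matte(observed_1, observed_2, background_1, background_2):
    """Recover alpha and premultiplied foreground colour at one pixel.

    Each argument is a sequence of colour channels (for example R, G, B).
    The two background colours must differ at this pixel.
    """
    diff_b = [p - q for p, q in zip(background_1, background_2)]
    diff_c = [p - q for p, q in zip(observed_1, observed_2)]
    denom = sum(d * d for d in diff_b)
    if denom == 0.0:
        raise ValueError("backgrounds must differ at this pixel")
    # Least-squares estimate of (1 - alpha) over all colour channels:
    # observed_1 - observed_2 = (1 - alpha) * (background_1 - background_2).
    transmission = sum(c * b for c, b in zip(diff_c, diff_b)) / denom
    alpha = 1.0 - transmission
    # Subtract the background's contribution to leave the foreground term.
    foreground = [c - transmission * b
                  for c, b in zip(observed_1, background_1)]
    return alpha, foreground
```

The division by the squared background difference shows why the backgrounds must differ at every pixel: where they coincide, the two observations carry no information about alpha.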

6.2 Environment matting

The environment matting framework proposed by Zongker et al. extends these ideas even further
in order to capture the refractive and reflective properties of foreground objects [28]. The
extraordinary results obtained using this method (for static objects) are practically
indistinguishable from real photographs. While additional research has been done in capturing
environment mattes from moving objects in front of a single known background, the simplifying
assumptions required to make this possible cause the accuracy to suffer significantly [8].

The basic environment matting method operates by photographing the foreground object with a
series of structured images behind it and to the sides of it. Then, for each pixel, a non-linear
optimization problem is solved, which attempts to find the coverage of the foreground object and
the axis-aligned rectangle in the background which best reproduces the pixel colour over all of
the input images.

The backgrounds are chosen to be a hierarchical set of horizontal and vertical striped patterns,
corresponding to one-dimensional Gray codes. This simplifies the calculation of average value
within axis-aligned rectangles in the background, and reduces the dimensionality of the
optimization to a more manageable three.
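To make the stripe patterns concrete, the sketch below generates one-dimensional binary-reflected Gray code stripes, coarsest pattern first. Whether [28] uses exactly this ordering and polarity is an assumption; the function names are invented for illustration.

```python
def gray_code(index):
    """Return the binary-reflected Gray code of a non-negative integer."""
    return index ^ (index >> 1)


def stripe_patterns(num_bits):
    """Build num_bits stripe patterns across 2**num_bits columns.

    Pattern k holds, for every column, bit k of that column's Gray code,
    starting from the most significant bit (the coarsest stripes).
    """
    width = 1 << num_bits
    return [
        [(gray_code(column) >> bit) & 1 for column in range(width)]
        for bit in reversed(range(num_bits))
    ]
```

Stacking the patterns gives each column a unique code, and neighbouring columns differ in exactly one pattern, which is the defining property of a Gray code. Vertical stripes follow by applying the same patterns to rows.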

In further work, additional accuracy was obtained by using swept Gaussian stripes as background
images [8]. By considering stripes with different orientations, the environment matte could be
recovered as a set of oriented elliptical Gaussians instead of axis-aligned rectangles. This allows
much greater realism for objects made from anisotropic materials, such as brushed steel.

7 Vision in Bad Weather

In computer graphics, sophisticated physically-based models have been suggested to animate and
render atmospheric phenomena such as clouds, fog, and smoke. One proposed method for
visualizing smoke was based on radiosity style techniques to approximate the effects of multiple
scattering [11]. Computer vision techniques, on the other hand, often assume that viewing
conditions are clear and are rarely explicit in considering the effects of atmospheric phenomena
on image formation.

Nayar and Narasimhan have proposed that the effects of bad weather should be considered when
designing vision systems [23,22]. Moreover, they describe how bad weather conditions might
even be turned to an advantage, if we take the view that the atmosphere acts to modulate
information about the scene. They describe algorithms that estimate additional information about
the scene such as relative depth, given an atmospheric model.

This extraction of additional information is made possible by making simplifying assumptions
about the atmospheric model. Namely, these techniques work best under uniformly hazy
conditions, and do not cope well with more sophisticated scattering phenomena. For example,
the situation where the sun selectively breaks through certain areas of low-lying clouds would be
modeled quite poorly.

7.1 Depths from attenuation

Using multiple images of the same scene taken under different weather conditions, relative
depths of point light sources can be estimated by comparing the degree of attenuation in the
images [23]. If the images are taken at night, illumination from the environment is negligible and
a simple scattering model suffices. The optical thicknesses of the different weather conditions
can then be easily computed in order to estimate the relative depths.
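A sketch of the underlying arithmetic follows. It assumes a simple attenuation model in which the irradiance received from a point source of strength g at depth d is g exp(-beta d) / d^2, with beta the scattering coefficient of the weather condition; this model and the function name are assumptions made for illustration. Under this model the ratio of a source's two observations depends only on the difference in scattering coefficients times its depth, so the unknown source strength and the inverse-square term cancel.

```python
import math


def relative_depth_from_attenuation(a_weather_1, a_weather_2,
                                    b_weather_1, b_weather_2):
    """Return depth(a) / depth(b) for two point light sources a and b.

    Each argument is the observed irradiance of one source under one of
    two weather conditions with different scattering coefficients.
    """
    # log(E1 / E2) = (beta_2 - beta_1) * depth for each source.
    log_ratio_a = math.log(a_weather_1 / a_weather_2)
    log_ratio_b = math.log(b_weather_1 / b_weather_2)
    if log_ratio_b == 0.0:
        raise ValueError("the two weather conditions must differ")
    return log_ratio_a / log_ratio_b
```

Only relative depths are recovered: the common factor, the difference between the two scattering coefficients, remains unknown unless one absolute depth is supplied.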

7.2 Depths from airlight

Another technique involves estimating relative depths from a single image of a scene that is
in dense grey haze [23]. In this situation (all too common in many urban areas), the
dominant source of image irradiance is a phenomenon known as airlight. Airlight causes the
hazy atmosphere to behave as a source of light, through the scattering effect of its constituent
particles, so that brightness will increase as a function of path length. By measuring the image
intensity at the horizon, which will correspond to an infinite path length, the airlight formula can
be used to extract relative depths from different image intensities.
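
As a rough illustration, the sketch below assumes the commonly used airlight formula
E(d) = E_horizon (1 - exp(-beta d)), in which brightness saturates at the horizon value as the
path length grows. This particular form, the function names, and the numeric values are
assumptions for illustration and are not taken from the text above. Inverting the formula gives
beta d, the depth up to the unknown scattering coefficient, from which relative depths follow.

```python
import math


def airlight_brightness(depth, beta, horizon):
    """Brightness due to airlight alone under the ASSUMED saturating model."""
    return horizon * (1.0 - math.exp(-beta * depth))


def scaled_depth_from_airlight(brightness, horizon):
    """Invert the airlight formula: returns beta * d (depth up to an unknown scale)."""
    if not 0.0 <= brightness < horizon:
        raise ValueError("brightness must lie in [0, horizon)")
    return -math.log(1.0 - brightness / horizon)


# Hypothetical measurements: horizon intensity plus three scene points.
horizon = 200.0
true_depths = [50.0, 100.0, 300.0]
observed = [airlight_brightness(d, 0.004, horizon) for d in true_depths]

scaled = [scaled_depth_from_airlight(e, horizon) for e in observed]
rel_airlight = [s / scaled[0] for s in scaled]
print(rel_airlight)  # depths relative to the first point
```

The horizon measurement matters because it fixes the saturation level; without it, brightness
alone cannot be converted into a path length.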

7.3 Dichromatic model of bad weather

Narasimhan and Nayar also derived a dichromatic model for atmospheric scattering, showing
that the colour of a point under bad weather conditions is a linear combination of its clear
weather colour and its airlight colour [22]. The direction of the airlight colour is constant over
the entire scene, so this direction can be computed robustly by intersecting the dichromatic
planes corresponding to the different colours which appear in the image. This also allows the
relative magnitudes of airlight colours to be determined for all the points in the scene.
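
The plane intersection can be sketched for the simplest case of two scene colours, each
observed under two weather conditions. Each pair of observations spans a dichromatic plane
through the origin of colour space, and because every such plane contains the airlight
direction, the cross product of two plane normals recovers it. The helper names and colour
values below are hypothetical, and a robust version would combine many planes (for example by
least squares) rather than just two.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def normalize(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("degenerate configuration: zero-length vector")
    return tuple(x / norm for x in v)


def mix(p, clear_dir, q, airlight_dir):
    """Dichromatic model: observed colour = p * clear weather colour + q * airlight colour."""
    return tuple(p * c + q * a for c, a in zip(clear_dir, airlight_dir))


def airlight_direction(point_a, point_b):
    """Intersect two dichromatic planes, each given by a pair of observed colours."""
    normal_a = cross(*point_a)
    normal_b = cross(*point_b)
    direction = normalize(cross(normal_a, normal_b))
    # Colours are non-negative, so choose the sign with a positive component sum.
    if sum(direction) < 0:
        direction = tuple(-x for x in direction)
    return direction


# Hypothetical scene: a red point and a green point in slightly bluish haze.
true_airlight = normalize((0.9, 0.9, 1.0))
red, green = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
red_obs = (mix(0.8, red, 0.2, true_airlight), mix(0.3, red, 0.7, true_airlight))
green_obs = (mix(0.6, green, 0.3, true_airlight), mix(0.2, green, 0.9, true_airlight))

estimated = airlight_direction(red_obs, green_obs)
print(estimated)
```

The two scene points must have different clear weather colours; otherwise their dichromatic
planes coincide and the intersection is undefined.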

Then, given multiple images of the scene taken under different weather conditions, relative
depths can be estimated even more reliably by using only the airlight components of the image
colours. Moreover, true colours can also be determined for the entire scene given the clear
weather colour of just a single point in the image.

8 Discussion

Transparency complicates things. Algorithms that work reliably on opaque objects fail
embarrassingly in the presence of transparency. The algorithms that do attempt to handle
transparency often make serious compromises, accepting simplified models of transparency but
still suffering from added complexity.

What worsens things from an analytical point of view is that the amorphous concept labelled
transparency is often really multiple concepts rolled together. Transparency has variously been
used to represent partial coverage of a pixel by a foreground object, the degree to which light
from the background can pass through an object, uncertainty about the contents of a pixel, and
different combinations of these. Much of the research described could have benefited from more
clarity in defining precisely which of these features were intended to be modeled.

The perception of transparency is by all accounts heuristic and not very well constrained. This
has the implication that three-dimensional reconstructions of transparent objects might not need
to be so accurate in order to convince a casual observer. It also suggests additional experiments
for determining the nature of these heuristics. One might investigate the process by which the
rules governing the perception of transparency are learned by studying its early development in
animals and the young.

While transparency is a fertile area for research in computer vision, the accurate modeling of
transparency is by no means an end goal. Research on transparency is being broadened to even
richer models of light transport, including such phenomena as refraction, reflection, and
scattering. The environment matting technique and various work on vision in bad weather are
early examples of this trend. The modeling and recovery of physical scene attributes continues to
improve, and so the gap between the representations used in computer vision and computer
graphics will continue to narrow over the next decade.


[1] E. Adelson and A. Pentland. The perception of shading and reflectance. In D. Knill and W.
Perception as Bayesian Inference. Cambridge University Press, New York, pp.
423, 1996.

[2] E. Adelson and P. Anandan. Ordinal characteristics of transparency. In
90 Workshop on Qualitative Vision, pp. 77-81, 1990.

[3] A. Andersen and A. Kak. Simultaneous Algebraic Reconstruction Technique (SART): A
superior implementation of the ART algorithm. Ultrasonic Imaging, 6:81-94, 1984.

[4] S. Baker, R. Szeliski, and P. Anandan. A layered approach to stereo reconstruction. In
Computer Vision and Pattern Recognition, pp. 434-441, 1998.

[5] J. Beck and R. Ivry. On the role of figural organization in perceptual transparency.
Perception and Psychophysics, 44(6), pp. 585-594, 1988.

[6] M. Bhatia, W. Karl, and A. Willsky. A wavelet-based method for multiscale tomographic
IEEE Trans. on Medical Imaging, 15(1), pp. 92-101, 1996.

[7] R. Bhotika, D. Fleet, and K. Kutulakos. A probabilistic theory of occupancy and emptiness.
In submission, 2001.

[8] Y. Chuang, et al. Environment matting extensions: Towards higher accuracy and real
capture. In
, pp. 121-130, 2000.

[9] F. Dachille, K. Mueller, and A. Kaufman. Volumetric backprojection. In Proc. Volume
Visualization and Graphics Symposium, pp. 109-117, 2000.

[10] J. DeBonet and P. Viola. Roxels: Responsibility weighted 3D volume reconstruction. In
Proc. Int. Conf. on Computer Vision, pp. 418-425, 1999.

[11] R. Fedkiw, J. Stam, and H. Jensen. Visual simulation of smoke. In
pp. 23-30, 2001.

[12] D. Gering and W. Wells III. Object modeling using tomography and photography. In
IEEE Workshop on Multi-View Modeling and Analysis of Visual Scenes, pp. 11-18, 1999.

[13] K. Hanson and G. Wecksung. Bayesian approach to limited-angle reconstruction in
computed tomography. Journal of the Optical Society of America, 73(11), pp. 1501-1509, 1983.

[14] V. Interrante, H. Fuchs, and S. Pizer. Conveying the 3D shape of smoothly curving
transparent surfaces via texture. IEEE Trans. on Visualization and Computer Graphics, 3(2), pp.
117, 1997.

[15] M. Irani, B. Rousso, and S. Peleg. Computing occluding and transparent motions.
International Journal of Computer Vision, 12(1), pp. 5-15, 1994.

[16] S. Ju, M. Black, and A. Jepson. Skin and bones: Multi-layer, locally affine, optical flow and
regularization with transparency. In Proc. IEEE Conf. on Computer Vision and Pattern
Recognition, pp. 307-314, 1996.

[17] A. Kak and M. Slaney. Principles of Computerized Tomographic Imaging. IEEE Press, New

[18] K. Kutulakos. Approximate N-View Stereo. In Proc. European Conference on Computer
Vision, pp. 67-83, 2000.

[19] K. Kutulakos and S. Seitz. A theory of shape by space carving. International Journal of
Computer Vision, 38(3), pp. 197-216, 2000.

[20] F. Metelli. The perception of transparency. Scientific American, 230(4), pp. 90-98, 1974.

[21] K. Nakayama, S. Shimojo, and V. Ramachandran. Transparency: relation to depth,
subjective contours, luminance, and neon color spreading.
, 19, pp. 497-513, 1990.

[22] S. Narasimhan and S. Nayar. Chromatic framework for vision in bad weather. In
IEEE Conf. on Computer Vision and Pattern Recognition, pp. 598-605, 2000.

[23] S. Nayar and S. Narasimhan. Vision in bad weather. In Proc. Int. Conf. on Computer
Vision, pp. 820-827, 1999.

[24] K. Sauer and C. Bouman. A local update strategy for iterative reconstruction from
IEEE Trans. on Signal Processing, 41(2), pp. 534-548, 1993.

[25] S. Seitz and C. Dyer. Photorealistic scene reconstruction by voxel coloring.
International Journal of Computer Vision, 35(2), pp. 151-173, 1999.

[26] A. Smith and J. Blinn. Blue screen matting. In
, pp. 259-268, 1996.

[27] R. Szeliski and P. Golland. Stereo matching with transparency and matting. In Proc. Int.
Conf. on Computer Vision, pp. 517-524, 1998.

[28] D. Zongker, D. Werner, B. Curless, D. Salesin. Environment matting and compositing. In
, pp. 205-214, 1999.