Building Vision Based Interaction Systems

BTech Project 1st Stage Report

Nekhil Agrawal
Roll No: 04005007

Under the guidance of
Prof. Sharat Chandran

Department of Computer Science and Engineering
Indian Institute of Technology, Powai, Mumbai
Table of Contents

Abstract
1 Introduction
2 Previous Work
3 Concepts Involved
4 Three Dimensional Virtual Battlefield to Train Soldiers
  4.1 Aim and Basic Setup
  4.2 Terminologies Used
  4.3 Assumptions and Validations
  4.4 Procedure for Development of the Project
5 Explanation of Working Through an Example
6 Further Additions
7 Conclusion
Acknowledgement

I would like to thank Prof. Sharat Chandran for devoting his time and effort to provide me with vital directions for working in this field. This project would have been difficult without his continued support.

Nekhil Agrawal
Abstract

The main aim of this project is to build a virtual model and enable multiple users to interact with it. The development of the project can be divided into four major portions:

a) When the user presses the trigger of the camera mounted on his gun, camera calibration techniques are used to figure out where the user is located.
b) The corresponding location in the model is calculated.
c) The point of hit is calculated.
d) The accuracy of the user's shot is reported.
1 Introduction

Currently, a huge amount of money is spent to develop training grounds and to buy the ammunition required to train soldiers in an army. Even so, a trainee on these grounds does not get a real-life battle experience, and if the training is arranged to resemble a real battle, the chances of the trainee getting injured increase owing to his inexperience. The trainee also does not learn how to apply teamwork during a battle just by training on these grounds. There is no proper feedback to the trainee about how much he missed his target by, whether this miss was closer to the target than the last one, and whether he is progressing well. Consider also the additional cost required to replace stationary targets with moving ones: in real battles the targets will not sit still waiting for the trainee to aim properly and hit them. Proper calculation and feedback about the hit, including a time frame (for example, "you should have aimed a little further forward five seconds earlier"), is almost impossible in these situations.
Imagine instead a scenario where a trainee goes to a simple training room and picks up a head mounted display and a simple gun with just a camera mounted on it. Upon wearing the head mounted display he visualizes himself in an actual battlefield where enemies are moving and trying to hit him. He then tries to hit the enemies and finally gets feedback on how well he performed: how many targets he was able to shoot, how many he missed, and by how much distance. He can visualize his progress by comparing his current performance with saved data from his previous training sessions. He can even learn how many times he would have been shot dead by the enemies had this been a real battle. An extension to this would be to use the setup for multiple users, so that they can train together and learn the ethics of teamwork and cooperation. This is what we intend to do.
Our past work in this field was mainly aimed at two dimensional simulations and is discussed in Section 2. The first step towards three dimensional simulations deals with the following problem: given either a three dimensional model or a two dimensional projection of it, if the user hits a point on the projection from a given position with a given rotation matrix with respect to the world coordinate system, we want to find the exact point on the three dimensional model where the hit occurs. The same setup should also work for multiple users.
The challenges in this project lie not only in understanding concepts related to projections and image processing, but also in visualizing the problem in three dimensions. Many difficulties were faced in the development of the algorithm. Unlike a purely theoretical project, the main challenge lies ahead in the implementation of the algorithm. Understanding the limitations of the camera used, working under practical constraints and testing our work is what makes this project both interesting and challenging.
The target application scenario uses concepts of image processing and projective geometry to determine the locations of soldiers and their directions of shot. Each determined location is then mapped to a virtual model where the point of hit is calculated, and thereby the accuracy of the shot. Section 3 introduces these concepts. Section 4 gives details about the setup, terminologies, assumptions and the whole algorithm. Section 5 explains it using an example. Sections 6 and 7 then conclude the report.
2 Previous Work

The past work done on this project includes the development of a two dimensional shooting range simulator, which enables users to train using the projection of a two dimensional model. A virtual environment was created and rendered as a first person view similar to the environment shown in the figure below.

Figure 1: An example of projected two dimensional models for the shooting range simulator.

The algorithm used can be summarized as:

1) Compute the homography of the projector-camera system.
2) Calculate the point of hit as the camera centre.
3) Using the homography, find the corresponding point of hit in the image.
4) Report the accuracy of the shot by calculating the difference between the point of hit and the target point.
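The homography step above can be sketched with the standard direct linear transform (DLT); this is a minimal numpy sketch under my own naming, not the report's implementation:

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src by the DLT method.

    src, dst: (N, 2) arrays of corresponding points, N >= 4.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography (up to scale) is the null vector of A:
    # the last right-singular vector of the stacked system.
    _, _, vt = np.linalg.svd(np.asarray(A))
    return vt[-1].reshape(3, 3)

def apply_homography(H, pt):
    """Map a 2D point through H using homogeneous coordinates."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]
```

With the homography estimated once from known screen/image correspondences, the point of hit (the camera centre in the image) can be mapped into model coordinates by `apply_homography`.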
3 Concepts Involved

The two major concepts involved in the project are:

a) Given a three dimensional model, its two dimensional projection and a point on the projection, find the corresponding point on the model.
b) Given an image of a calibration pattern, calculate the extrinsic parameters of the camera.

Let us deal with the first concept by considering a simple example and solving it.
Problem Statement 1: Given a cube in three dimensional space, its two dimensional projection, and a point (x, y) on the projection which is the projection of a point (X, Y, Z) on the cube, find the coordinates X, Y and Z. The cube is rotated by an angle α about the x-axis, β about the y-axis and ω about the z-axis.
Solution:

Assumptions:

1) Without any loss of generality, assume the side of the cube to be 2 units.
2) The centre of the cube is the origin of the coordinate system.
3) If the original position of the cube is as shown in Figure 2, then the screen is placed along the positive z-axis at a distance of more than 1 unit from the origin.
4) Without any loss of generality, we can assume that all three angles are less than 90 degrees, since after a rotation of 90 degrees the cube is in a position similar to the one at 0 degrees. Hence at any point of time the three visible faces are the ones coming from the planes z = 1, y = 1 and x = −1.
5) The projection is an orthographic projection.
Procedure:

Suppose the cube was originally positioned with all angles equal to 0 degrees, as shown in the figure below.
Figure 2: Figure showing initial position of the cube.
Consider a point (x, y, z) on the cube. Now if the cube is rotated by an angle α around the x-axis, the X coordinate will remain the same, while the new Y and Z coordinates are given by:

Y' = y cos α − z sin α
Z' = y sin α + z cos α

Taking the resulting point as (X, Y, Z), we can in the same way find the new X', Y', Z' for the rotation about the y-axis, and then for the rotation about the z-axis.
From a first person view, only three faces of the cube are visible for any angles of rotation around the three axes. Hence the two dimensional projection will contain the projections of these three faces only. If we project the lines of intersection of these faces, the screen is divided into three sections, each representing points on a particular face of the cube. The figure below shows this division of the screen.

Figure 3: Figure showing the projection of the three edges AB, BC and BD, which are the intersections of the three visible faces of the cube, dividing the screen into three sections.
With these three sections, given a point (x, y) on the screen we can tell in which region the point lies, and hence on which face of the cube its pre-image is. Let us number the faces from 1 to 6. Suppose the above point (x, y) is the projection of a point on face 1. Consider the cube in its original position and take a general point (a, b, 1) on face 1. Find the final coordinates of this point after rotation around the three axes using the above formulas. The final coordinates (X, Y, Z) depend on the two variables a and b. The projection of this point on the screen is simply (X, Y), which is known to be (x, y). This gives two equations that can be solved for the two parameters, and hence we can calculate the coordinates (X, Y, Z) of the point in the world coordinate system.
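The procedure above can be sketched for the face z = 1 (a numpy sketch under stated assumptions: the rotation order x, then y, then z is my assumption, and the function names are illustrative, not from the report):

```python
import numpy as np

def rotation(alpha, beta, omega):
    """Rotation about x by alpha, then y by beta, then z by omega."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cw, sw = np.cos(omega), np.sin(omega)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cw, -sw, 0], [sw, cw, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def unproject_face_z1(x, y, alpha, beta, omega):
    """Recover the world point (X, Y, Z) on the rotated cube, given that
    (x, y) is the orthographic projection of some point (a, b, 1) lying
    on the face z = 1 of the unrotated cube."""
    R = rotation(alpha, beta, omega)
    # Projection: (x, y) = R[:2, :2] @ (a, b) + R[:2, 2], a 2x2 linear solve.
    a, b = np.linalg.solve(R[:2, :2], np.array([x, y]) - R[:2, 2])
    return R @ np.array([a, b, 1.0])
```

The same two-equation solve applies to the other two visible faces after fixing the appropriate coordinate to ±1.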
Problem Statement 2: Given an image of a calibration pattern, estimate the intrinsic and extrinsic parameters of the camera.

Solution:

The main idea here is to write the projection equations linking a set of known three dimensional points and their projections, and then solve these equations for the camera parameters. To get this set of three dimensional points we need an image of a calibration pattern, i.e. an image containing points whose three dimensional positions in space are known and which can be easily located in the image. The figure below shows an image of a calibration pattern.
Figure 4: Figure showing an image with 4 calibration points.

Note that it is not necessary to calibrate with dedicated calibration points; we can calibrate using the corners of the room or any other features in the image that can be easily located and corresponded with real points in space. The figure below shows another type of calibration pattern.
Figure 5: Figure showing another type of calibration pattern. Here the edges of the two black squares are used for calibration.

In the above image, if we know the positions of the edges of the two black squares, then we can use simple image processing techniques to locate these edges in the image and hence solve the projection equations using them.
The intrinsic parameters of the camera include the focal length f, the pixel width s_x, the pixel height s_y, and (u, v), the pixel coordinates of the image centre in the image pixel coordinate system. The extrinsic parameters of the camera are the rotation matrix R of the axes of the camera coordinate system with respect to the world coordinate system, and the translation vector T of the camera centre.
Terminologies Used:

World Coordinate System: A coordinate system defined on real space; the location of any real point can be defined in terms of this coordinate system, with coordinates known as world coordinates, denoted (X_w, Y_w, Z_w).

Camera Coordinate System: A coordinate system with origin at the camera location and positive z-axis along the camera's optical axis. Any point in the camera coordinate system is represented as (X_c, Y_c, Z_c).

Image Pixel Coordinate System: A two dimensional coordinate system with origin at the top left corner of the image plane. Any point in the image pixel coordinate system is represented as (x_im, y_im).
We can write the relation between a point in the image pixel coordinate system and the corresponding point in the world coordinate system as:

x_im − u = −(f / s_x) (r_11 X_w + r_12 Y_w + r_13 Z_w + T_x) / (r_31 X_w + r_32 Y_w + r_33 Z_w + T_z)    (2.1)

y_im − v = −(f / s_y) (r_21 X_w + r_22 Y_w + r_23 Z_w + T_y) / (r_31 X_w + r_32 Y_w + r_33 Z_w + T_z)    (2.2)

where r_ij are the entries of the rotation matrix R and (T_x, T_y, T_z) are the components of the translation vector T.

These equations suggest that, given a sufficient number of corresponding points, i.e. points whose (X_w, Y_w, Z_w) and (x_im, y_im) are both known, we can solve for the camera parameters.
Now the algorithm to find the camera parameters can be divided into two phases:

a) Assuming that the image centre is the origin of the image reference frame, solve for the remaining camera parameters.
b) Find the coordinates of the image centre.

For the first phase, assuming u and v to be 0, we can divide equation 2.1 by equation 2.2 and rewrite them as a single equation:

x_im (r_21 X_w + r_22 Y_w + r_23 Z_w + T_y) = α y_im (r_11 X_w + r_12 Y_w + r_13 Z_w + T_x)    (2.3)

Considering α = s_y / s_x (the aspect ratio), equation (2.3) can be seen as a linear equation in 8 unknowns, namely

v_1 = r_21, v_2 = r_22, v_3 = r_23, v_4 = T_y, v_5 = α r_11, v_6 = α r_12, v_7 = α r_13, v_8 = α T_x.

Hence writing the equation for N corresponding points leads to the homogeneous system of N linear equations

A v = 0

where A is the N × 8 matrix whose rows are built from the coordinates of the corresponding points. Solving these equations we get v̄, the solution vector determined up to an unknown scale factor γ:

v̄ = γ (r_21, r_22, r_23, T_y, α r_11, α r_12, α r_13, α T_x)

Now all we need to do is find the unknown scale factor γ and thereby calculate the parameters. Since r_21² + r_22² + r_23² = 1, from the first three components of v̄ we obtain

|γ| = √(v̄_1² + v̄_2² + v̄_3²)

Similarly, since r_11² + r_12² + r_13² = 1 and α > 0, from the fifth, sixth and seventh components we have

α |γ| = √(v̄_5² + v̄_6² + v̄_7²)

Using these equations we can get the value of |γ| as well as the aspect ratio α.

Till now we have the first two rows of the rotation matrix R and the first two components of the translation vector, determined up to an unknown common sign. The third row of the rotation matrix can be found as the cross product of the first and second rows. Now, using the orthogonality constraints of the rotation matrix, we can find out this common sign.

We now just need to determine the third component T_z of the translation vector and f_x = f / s_x. From equation 2.1 (with u = 0) we can write

x_im (r_31 X_w + r_32 Y_w + r_33 Z_w + T_z) = −f_x (r_11 X_w + r_12 Y_w + r_13 Z_w + T_x)

Writing this for N different points gives a system of N linear equations in the two unknowns, which can be represented as

A' (T_z, f_x)ᵀ = b

and solved (in the least squares sense) to get the leftover two unknowns.
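The first phase can be sketched in numpy as follows. This is a sketch under the assumptions of noise-free correspondences and u = v = 0; the T_z / f_x step and the sign determination are omitted for brevity, and the function name is illustrative rather than from the report:

```python
import numpy as np

def calibrate_first_phase(world, image):
    """Recover the first two rows of R (up to a common sign), Tx, Ty and
    the aspect ratio alpha from N >= 8 correspondences.
    world: (N, 3) world points; image: (N, 2) image points."""
    X, Y, Z = world.T
    x, y = image.T
    # Rows of the homogeneous system A v = 0 in the unknowns
    # (r21, r22, r23, Ty, a*r11, a*r12, a*r13, a*Tx).
    A = np.stack([x * X, x * Y, x * Z, x, -y * X, -y * Y, -y * Z, -y], axis=1)
    v = np.linalg.svd(A)[2][-1]          # null vector of A
    gamma = np.linalg.norm(v[:3])        # |gamma| from the unit-norm row (r21, r22, r23)
    alpha = np.linalg.norm(v[4:7]) / gamma
    r2, Ty = v[:3] / gamma, v[3] / gamma
    r1, Tx = v[4:7] / (alpha * gamma), v[7] / (alpha * gamma)
    r3 = np.cross(r1, r2)                # third row from orthogonality
    return np.stack([r1, r2, r3]), Tx, Ty, alpha
```

Note that the recovered rows r1 and r2 still carry the common sign ambiguity described above.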
For the second part of the problem, i.e. estimating the image centre, we need the definition of a vanishing point and the orthocentre theorem.

Vanishing Point: Because of perspective projection, the projections of parallel lines in three dimensional space appear to meet at a point p in the image. This point p is called the vanishing point.

Orthocentre Theorem: The orthocentre of the triangle formed by three vanishing points of an image (coming from three mutually orthogonal directions) is the image centre.

Hence, for the second part, we can calculate three such vanishing points of the image and use the orthocentre theorem to find the image centre.
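Given three vanishing points, the orthocentre itself is a small linear solve; a minimal sketch (function name illustrative):

```python
import numpy as np

def orthocenter(p1, p2, p3):
    """Orthocentre of the triangle p1 p2 p3 (2D points).

    The altitude through p1 is perpendicular to p3 - p2, and the altitude
    through p2 is perpendicular to p3 - p1; the orthocentre H satisfies
    (p3 - p2) . H = (p3 - p2) . p1 and (p3 - p1) . H = (p3 - p1) . p2,
    two linear equations solved below."""
    p1, p2, p3 = (np.asarray(p, float) for p in (p1, p2, p3))
    A = np.array([p3 - p2, p3 - p1])
    b = np.array([(p3 - p2) @ p1, (p3 - p1) @ p2])
    return np.linalg.solve(A, b)
```

For a right triangle the orthocentre coincides with the right-angle vertex, which gives a quick sanity check.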
4 Three Dimensional Virtual Battlefield to Train Soldiers

4.1 Aim and Basic Setup

This project aims to solve the basic algorithm, which can then be extended to a full battlefield simulation. A virtual model is created in the Open Graphics Library (OpenGL). This model is projected by a simple projector onto a flat screen in the room. The screen has black dots along its edges, which are used for calibration purposes. The figure below shows a snapshot of the screen from the position of the projector, containing the calibration points.

Figure 6: Image showing the projector screen for the scenario where there are two poles Q and R in the virtual model. The encircled dot is one of the black dots used for calibration.

The user holds a camera. Images taken by this camera are our only source of information about the location of the user and his direction of shot. The details of how everything is calculated from these images are given in Section 4.4.
4.2 Terminologies Used

Virtual Space / Virtual World: The space in which the model of the battlefield is defined. The model can be defined in OpenGL or in any other framework; for this project we assume it is defined in OpenGL.

Virtual Coordinate System: The coordinate system in OpenGL with respect to which the whole battlefield is defined.

Point of Projection: The point in the virtual space from which we take the snapshot of the model that is displayed by the projector.

Real Space / Real World: The space in which the trainee is actually standing. The snapshots taken by the camera are of things that exist in this world.

Real Coordinate System: The coordinate system defined in real space; the location of every point in the space is measured with respect to this coordinate system.
Projector Screen: The screen onto which the projector projects a snapshot of the virtual space. The trainee does his training by watching this snapshot only.

Trainee: The soldier who is here to train.

Trainee's Camera: The camera mounted on the gun used by the trainee. The optical axis of this camera gives the direction along which the bullet will move when the trigger is pressed.

Trainee's Real World Position: A point defined in the real world coordinate system giving the position where the trainee, or more precisely the camera on the trainee's gun, is located.

Trainee's Virtual World Position: The point in the virtual world coordinate system corresponding to the trainee's real world position. In other words, a trainee standing at a particular position in the real world will see himself, on his head mounted display, standing at this corresponding virtual world point.

Calibration Points: Points in the real world with known positions, which are used for camera calibration and hence for locating the current position of the trainee.

Scaling Factor (k): The factor relating distances in the real world to distances in the virtual world. For example, if the trainee moves by a distance of 5 meters in the real world, then the trainee in the virtual world moves by a distance of 5k meters.
4.3 Assumptions and Validations

For the current stage we are working with a projector rather than a head mounted display. This work can later be extended to a head mounted display.

The trainee visualizes the three dimensional virtual model on the basis of the projected image and then takes his shot.

This assumption can be validated by considering the following scenario: the virtual model consists of two poles Q and R, and the projector projects them as Q1 and R1 on the screen. Two trainees are using the setup. The figure below shows the scenario when trainee 1 is asked to hit point Q and he presses the trigger of his gun in the shown position.
Figure 7: Figure showing a scenario in the real world while two trainees are doing their training.

In this case the snapshot taken by trainee 1's camera will look like:

Figure 8: Figure showing a snapshot of the image taken by trainee 1's camera. The centre of the camera coincides with the projection of pole Q.

Now suppose the trainee did not visualize the three dimensional model; then we would have reported the point of hit as Q, which is the inverse projection of Q1. But this is not the case, as we can clearly see from Figure 7 that the trainee has hit point R, and hence R should be reported as the point of hit, not Q as Figure 8 would suggest. This also needs to be extended to work either with a head mounted display, where the user need not visualize the three dimensional model, or with a fast projector where each user can view the model from his own viewpoint; in both cases he need not visualize the model, and hence this assumption holds.
The projector screen is far away from the projector.

This assumption can be verified by considering the same scenario, where the virtual world has two poles Q and R projected as Q1 and R1 on the screen, and a situation like the one in the figure below occurs when the user shoots.

Figure 9: Figure showing a snapshot of the whole setup when the trainee presses the trigger and the line of hit does not intersect the projector screen.

In this case the trainee is asked to shoot at point Q. He visualizes the model and hits the correct point. But as the line joining trainee 1 and Q does not pass through the projector screen, we might not capture any calibration point and hence would be unable to perform further calculations. To avoid this situation we need this assumption, as we have calibration points only on the projector screen and we need them for our calculations. Also, in the final product the whole room will be dedicated to training, and hence we can have calibration patterns on all the walls, so the condition behind this assumption will already be satisfied.
The position of the user in real space coincides with that of the camera on his gun.

This assumption can clearly be made, as we are currently not interested in the user's body movements; we just want to calculate the point of hit.
The projector is fixed and projects the snapshot from one position only.

This assumption is taken because a freely moving projector would unnecessarily complicate the current stage, which is not at all needed for the future stages. Also, we project from only one position because projecting from different positions makes sense only if we project from the trainee's virtual world position, and doing that would create problems when multiple users share the setup.
The scaling factor (k) is 1.

It is safe to assume that the scaling factor is 1 for now, because if it is not 1 the user has to visualize the scaling as well while taking a shot, which would make things even more complicated for him.
4.4 Procedure for Development of the Project

The whole development procedure can be divided into two parts:

1) Development of the virtual space: This part mainly includes the development of the battlefield in OpenGL or in any other framework. It can further include an additional time frame to enable the battlefield to change with time: in a real battle things won't be stationary, and hence to provide realistic training we should have moving targets.

2) Interaction with the virtual model: The soldier will be training in the real world, and hence we must develop methods that relate the snapshots taken by the camera to the movements of the user in the virtual world. This part can be further divided into four subsections.
a) Camera calibration: This includes determining the camera parameters from the snapshots taken by the camera. As discussed in the second problem of Section 3, this can be further divided into two subsections: first, identifying the calibration points by image processing techniques, and second, estimating the camera rotation matrix and translation vector using these calibration points with known locations in space.
b) Finding the point and direction in the virtual world corresponding to a point and direction in the real world: By camera calibration we can determine the location of the trainee and the direction of shot with respect to the real world coordinate system. We know the point and direction in the virtual world corresponding to the location of the projector in the real world (namely the point in the virtual world from which we call gluLookAt). Now suppose the z-axis of the camera (the optical axis, i.e. the direction in which the trainee shot) makes an angle α with the z-axis of the projector (the direction perpendicular to the projector plane, passing through the projector); then in the virtual world the direction of hit will also make an angle α with the direction from which the projection is taken. Also, the distances between the trainee's location and the projector's location along the x, y and z axes are multiplied by the scaling factor k to give the corresponding distances in the virtual world, and hence the corresponding point can be calculated. The figure below explains this concept.
Figure 10: Figure showing the correspondence between the real and virtual worlds.

The left hand side of the figure shows the scenario in the real world and the right hand side shows the scenario in the virtual world. We know that P and P' are corresponding points. Using camera calibration we have already calculated the location of the trainee, and from that we can calculate X, Y, Z and the angle α. Note that Y is not shown in the figure, as the figure is two dimensional. With this information and the known value of k (the scaling factor) we can calculate kX, kY, kZ and α for the virtual world; with these values we can calculate the location of T' in the virtual world and the direction of shot.
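Under the aligned-axes reading of step b) (my assumption: the real and virtual axes are parallel, so angles are preserved and only distances are scaled by k), the correspondence reduces to a scaled offset. A hypothetical helper, with illustrative names:

```python
import numpy as np

def real_to_virtual(trainee_pos, shot_dir, proj_pos_real, proj_pos_virtual, k):
    """Map the trainee's real-world position and shot direction into the
    virtual world, given the projector's position in both worlds and the
    scaling factor k. Assumes the real and virtual axes are aligned."""
    offset = np.asarray(trainee_pos, float) - np.asarray(proj_pos_real, float)
    virtual_pos = np.asarray(proj_pos_virtual, float) + k * offset
    # Uniform scaling preserves angles, so the shot direction is unchanged.
    return virtual_pos, np.asarray(shot_dir, float)
```

With k = 1 (as assumed in Section 4.3) the mapping is a pure translation by the projector offset.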
c) Computing the point of hit: Once we know the direction of shot, the first point where the shot ray intersects the virtual model gives the point of hit.

d) Computing the accuracy of the shot: Knowing the point of hit and the target, we can easily calculate the accuracy of the shot. Things become more complicated when we involve a time frame and give more detailed feedback for moving targets, such as "if you had shot in this direction 5 seconds later, there would have been a hit."
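Step c) is a standard ray casting problem. One common way to solve it, sketched here as an assumption since the report does not fix a method, is the Möller–Trumbore ray/triangle test applied to every triangle of the model, keeping the smallest positive hit distance:

```python
import numpy as np

def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore ray/triangle intersection. Returns the distance t
    along the ray to the hit point, or None if there is no hit."""
    origin = np.asarray(origin, float)
    direction = np.asarray(direction, float)
    v0, v1, v2 = (np.asarray(v, float) for v in (v0, v1, v2))
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = e1 @ p
    if abs(det) < eps:              # ray parallel to the triangle plane
        return None
    t_vec = origin - v0
    u = (t_vec @ p) / det           # first barycentric coordinate
    if u < 0 or u > 1:
        return None
    q = np.cross(t_vec, e1)
    v = (direction @ q) / det       # second barycentric coordinate
    if v < 0 or u + v > 1:
        return None
    t = (e2 @ q) / det
    return t if t > eps else None
```

The point of hit is then `origin + t_min * direction` over all triangles with a valid t.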
The flow chart below summarizes the procedure.

Figure 11: Figure showing the flow chart of the development procedure.

- Development Procedure
  - Development of Virtual Space
    - Development of Battlefield Model in OpenGL
  - Interaction with Virtual Model
    - Camera Calibration
      - Image Processing Techniques to Locate Calibrating Points
      - Computation of the Camera Parameters
    - Setting the Correspondence
    - Computing the Point of Hit
    - Computing the Accuracy of the Shot
5 Explanation of Working Through an Example

Let us consider an example to show the complete flow of the project. The virtual model consists of two poles Q and R, projected by the projector as Q' and R'. The figure below shows the scenario when trainee 1 presses his trigger.

Figure 12: Figure showing an example for explaining the working of the algorithm.

The flow chart below shows the complete flow of the working of this example.
1) Trainee 1 presses the trigger.
2) A snapshot is taken by the camera.
3) The image is processed to locate the calibration points.
4) The camera parameters are calculated by the methods of camera calibration.
5) The location of the trainee and the direction of shot with respect to the projector are determined.
6) The corresponding point and direction in the virtual space are computed.
7) The point of hit is computed.
8) The user is given feedback on the accuracy of his shot.
6 Further Additions

1) Replacing the projector with a head mounted system: This will obviously be a better setup, as the trainee then need not visualize anything; it is the final aim of our project.

2) Adding body movements: The trainee can be asked to wear a multi-sensor suit so that his body movements can be tracked. Many additional features can then be added, some of which include:

a) The user can see his own movements in the virtual model, and hence more realistic training can be done.
b) Features where the trainee must protect himself from the bullets of the enemies while at the same time killing them.
c) Features where the trainee must infiltrate an enemy base without letting the enemy notice.
7 Conclusion

Building virtual interaction systems is not a new concept, and many projects on the subject have already been done. But this project will help the Indian army not only to provide their soldiers with a training environment close to a real battle, but also to save the huge amount of money required for training.