Building Vision Based Interaction Systems
BTech Project 1st Stage Report
Nekhil Agrawal
Roll No: 04005007
Under the guidance of
Prof. Sharat Chandran
Department of Computer Science and Engineering
Indian Institute of Technology, Powai
Mumbai
Table of Contents
ABSTRACT
1 INTRODUCTION
2 CONCEPTS INVOLVED
3 THREE DIMENSIONAL VIRTUAL BATTLEFIELD TO TRAIN SOLDIERS
3.1 Aim
3.2 Terminologies Used
3.3 Assumptions and Validations
3.4 Procedure for development of the project
3.4 Explanation of working through an example
3.5 Further additions that can be done
4 CONCLUSION
Abstract
The main aim of this project is to build a virtual model and to enable multiple users to interact with it. The development of the project can be divided into four major parts. First, when the user presses the trigger of the camera mounted on his gun, we use camera calibration techniques to determine where the user is located. Second, the corresponding location in the model is calculated. Third, the point of hit is calculated. Finally, the accuracy of the user's shot is reported. Section three of the report elaborates on these concepts.
1 Introduction
A huge amount of resources and ammunition is consumed by the Indian army in training soldiers for actual battle. Even so, the soldiers do not get the experience of a real battle, and they risk being injured during training. This project aims to build a three dimensional virtual scene in which multiple users can train interactively. A virtual model of the battlefield can be displayed to the users with fast display projectors or a head mounted display. A user holding a camera mounted gun can control the movements of a soldier on the battlefield and train by shooting at a specified target. We also aim to report the accuracy of each shot by calculating the distance between the point of hit and the target.
Past work on this project includes the development of a two dimensional shooting range simulator, which lets the user shoot at a projection of a two dimensional model. The point of hit can be calculated either from visual feedback or by calculating the camera centre. A homography then gives the point on the virtual model corresponding to this point on the screen, which is used to calculate the accuracy of shooting. Figure 1 shows an example of a two dimensional model which can be projected.
Figure 1: A projected two dimensional model for shooting range simulator
2 Concepts Involved
The two major concepts involved in the project are:
a) Given a three dimensional model, its two dimensional projection and a point on the projection, how to find the corresponding point on the model.
b) Given an image of a calibration pattern, how to calculate the extrinsic parameters of the camera.
Let us deal with the first concept by considering a simple example and solving it.
Problem Statement 1: Given a cube in three dimensional space, a screen and a point (x, y) on the screen. If this point is the projection of a point (X, Y, Z) on the cube, find the coordinates X, Y and Z.
Solution:
Assumptions:
1) The cube is rotated by an angle α about the x-axis, β about the y-axis and Ω about the z-axis.
2) Without loss of generality, the side of the cube is assumed to be 2 units.
3) The screen is placed on the positive z-axis at a distance of more than 1 unit.
4) The centre of the cube is the origin of the coordinate system.
5) Without loss of generality, all three angles are assumed to be less than 90 degrees, since after a rotation of 90 degrees the cube is in a position similar to the one at 0 degrees. Hence at any time the three visible faces are the ones that started on the planes z = 1, y = 1 and x = -1.
6) The projection is orthographic.
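Procedure:
Suppose the cube was originally positioned with all angles equal to 0 degrees, as shown in the figure below.
Consider a point (x, y, z) on the cube. If the cube is rotated by an angle α about the x-axis, the X coordinate remains the same, while the new Y and Z coordinates are given by:
\[ Y = \sqrt{y^2 + z^2}\,\cos(\theta + \alpha) = y\cos\alpha - z\sin\alpha \]
\[ Z = \sqrt{y^2 + z^2}\,\sin(\theta + \alpha) = y\sin\alpha + z\cos\alpha \]
where θ is the angle that (y, z) makes with the y-axis, i.e. \(\tan\theta = z / y\).
Taking these as (X, Y, Z), we can find the new X', Y', Z' after the rotation about the y-axis, and similarly for the z-axis.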
Now consider the four corner points shared by the three visible faces. Since we know the angles of rotation and the initial positions of these four points, we can compute their rotated positions and hence their projections on the screen, which would look something like:
This projection divides the screen into three regions. As the point (x, y) on the screen is given, we can tell which region it lies in, and hence which face of the cube it is a projection of. Knowing the face, let us assume that it was initially the plane (X, Y, 1). Take a general point (a, b, 1) on this plane and find its final coordinates after rotation about the three axes using the formulas above. The final coordinates (X, Y, Z) depend on the two variables a and b. The projection of this point on the screen is simply (X, Y), which is known to be (x, y). This gives two equations for the two parameters, and hence we can calculate the coordinates (X, Y, Z) of the point in the world coordinate system.
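The procedure above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the project's implementation: the function names are hypothetical, the rotations are assumed to be applied in the order x, then y, then z, and instead of first identifying the screen region the sketch tries every face of the cube and keeps the hit nearest to the screen (largest Z).

```python
import math

def rotation_matrix(alpha, beta, omega):
    """Rotation about x by alpha, then y by beta, then z by omega (radians)."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    co, so = math.cos(omega), math.sin(omega)
    rx = [[1, 0, 0], [0, ca, -sa], [0, sa, ca]]
    ry = [[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]]
    rz = [[co, -so, 0], [so, co, 0], [0, 0, 1]]

    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]

    return matmul(rz, matmul(ry, rx))

def rotate_point(r, p):
    """Apply the 3x3 matrix r to the point p."""
    return tuple(sum(r[i][k] * p[k] for k in range(3)) for i in range(3))

def cube_point_from_screen(x, y, r, eps=1e-9):
    """Return the rotated cube point (X, Y, Z) whose orthographic
    projection is (x, y), or None if (x, y) misses the cube."""
    best = None
    for axis in range(3):            # coordinate that is fixed on the face
        for sign in (1.0, -1.0):     # face at +1 or -1
            i, j = [k for k in range(3) if k != axis]
            # Two equations in the two free face parameters (a, b):
            #   r[0][i]*a + r[0][j]*b + sign*r[0][axis] = x
            #   r[1][i]*a + r[1][j]*b + sign*r[1][axis] = y
            a11, a12, a21, a22 = r[0][i], r[0][j], r[1][i], r[1][j]
            b1 = x - sign * r[0][axis]
            b2 = y - sign * r[1][axis]
            det = a11 * a22 - a12 * a21
            if abs(det) < eps:
                continue             # face is seen edge-on
            a = (b1 * a22 - a12 * b2) / det
            b = (a11 * b2 - a21 * b1) / det
            if abs(a) > 1 + eps or abs(b) > 1 + eps:
                continue             # solution lies outside the face
            p = [0.0, 0.0, 0.0]
            p[axis], p[i], p[j] = sign, a, b
            world = rotate_point(r, p)
            if best is None or world[2] > best[2]:
                best = world         # keep the point nearest to the screen
    return best
```

Trying all six faces avoids the region test at the cost of a few extra 2x2 solves; with the region known in advance, only one solve would be needed.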
Problem Statement 2: Given an image of a calibration pattern, how to estimate the intrinsic and extrinsic parameters of the camera.
Solution:
The main idea is to write the projection equations linking a set of known three dimensional points and their projections, and then solve these equations for the camera parameters. To get this set of three dimensional points we need an image of a calibration pattern, that is, an image containing points whose three dimensional positions in space are known and which can be easily located in the image. The figure below shows an image of a calibration pattern.
Figure showing an image with 4 calibration points
Note that it is not necessary to use dedicated calibration points. We can also calibrate using the corners of the room or any other feature which can be easily located in the image and matched with real points in space. The figure below shows another type of image with a calibration pattern.
Figure showing another type of calibration pattern
In the above image, if we know the positions of the edges of the two black squares, we can use simple image processing techniques to locate these edges in the image and hence solve the projection equations using them.
The intrinsic parameters of the camera are the focal length f, the pixel width s_x, the pixel height s_y, and u and v, the coordinates of the camera centre in the image pixel coordinate system.
The extrinsic parameters of the camera are the rotation matrix (R) of the axes of the camera coordinate system with respect to the world coordinate system, and the translation vector (T) of the camera centre.
Terminologies Used:
World Coordinate System: A coordinate system defined in real space. The location of any real point can be expressed in this coordinate system, with world coordinates denoted as (X_w, Y_w, Z_w).
Camera Coordinate System: A coordinate system with its origin at the camera location and its positive z-axis along the optical axis of the camera. Any point in the camera coordinate system is represented as (X_c, Y_c, Z_c).
Image Pixel Coordinate System: A two dimensional coordinate system with its origin at the top left hand corner of the image plane. Any point in the image pixel coordinate system is represented as (x_im, y_im).
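We can write the relation between a point in the image pixel coordinate system and the corresponding point in the world coordinate system as:
\[ x_{im} - u = -f_x \, \frac{r_{11} X_w + r_{12} Y_w + r_{13} Z_w + T_x}{r_{31} X_w + r_{32} Y_w + r_{33} Z_w + T_z} \tag{2.1} \]
\[ y_{im} - v = -f_y \, \frac{r_{21} X_w + r_{22} Y_w + r_{23} Z_w + T_y}{r_{31} X_w + r_{32} Y_w + r_{33} Z_w + T_z} \tag{2.2} \]
Where r_ij are the entries of R, (T_x, T_y, T_z) are the components of T, f_x = f / s_x and f_y = f / s_y.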
These equations suggest that, given a sufficient number of corresponding points, i.e. points whose (X_w, Y_w, Z_w) and (x_im, y_im) are both known, we can solve for the camera parameters.
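To make the forward direction of this relation concrete, the sketch below maps a world point to pixel coordinates. It is an illustration under assumptions that are not fixed by the report: the function name `project_point` is hypothetical, camera coordinates are taken as P_c = R P_w + T, and the common pinhole form x_im = u - f_x X_c / Z_c, y_im = v - f_y Y_c / Z_c is used.

```python
def project_point(world_pt, R, T, fx, fy, u, v):
    """Map a world point (Xw, Yw, Zw) to pixel coordinates (x_im, y_im).

    R is a 3x3 rotation matrix (list of rows), T a 3-vector,
    fx = f / s_x and fy = f / s_y, and (u, v) is the image centre.
    """
    # Camera coordinates: P_c = R * P_w + T (assumed convention)
    xc, yc, zc = (
        sum(R[i][k] * world_pt[k] for k in range(3)) + T[i] for i in range(3)
    )
    if zc <= 0:
        raise ValueError("point is not in front of the camera")
    # Perspective division followed by the shift to the image centre
    return (u - fx * xc / zc, v - fy * yc / zc)
```

Calibration is the inverse task: given many pairs of world points and their pixel coordinates, recover R, T, fx, fy, u and v.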
The algorithm to find the camera parameters can be divided into two phases:
a) Assuming that the image centre is the origin of the image reference frame, and then solving for the remaining camera parameters.
b) Finding the coordinates of the image centre.
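Now for the first part:
Assuming u and v to be 0, and writing (x, y) for (x_im, y_im), we can combine the equations 2.1 and 2.2 as
\[ x f_y (r_{21} X_w + r_{22} Y_w + r_{23} Z_w + T_y) = y f_x (r_{11} X_w + r_{12} Y_w + r_{13} Z_w + T_x) \tag{2.3} \]
Considering \(\alpha = f_x / f_y\), the equation (2.3) can be seen as a linear equation in 8 unknowns, namely
\[ v = (v_1, \dots, v_8) = (r_{21},\ r_{22},\ r_{23},\ T_y,\ \alpha r_{11},\ \alpha r_{12},\ \alpha r_{13},\ \alpha T_x) \]
Hence writing the equation for N corresponding points leads to the homogeneous system of N linear equations given by
\[ A v = 0 \]
where the i-th row of A is \((x_i X_i,\ x_i Y_i,\ x_i Z_i,\ x_i,\ -y_i X_i,\ -y_i Y_i,\ -y_i Z_i,\ -y_i)\).
Solving these equations we get a solution \(\bar{v}\), which equals v up to an unknown scale factor γ. Now all we need to do is to find this scale factor, and hence the parameters, using the equation:
\[ \bar{v} = \gamma\,(r_{21},\ r_{22},\ r_{23},\ T_y,\ \alpha r_{11},\ \alpha r_{12},\ \alpha r_{13},\ \alpha T_x) \]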
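Since \(r_{21}^2 + r_{22}^2 + r_{23}^2 = 1\), from the first three components of \(\bar{v}\) we obtain
\[ \sqrt{\bar{v}_1^2 + \bar{v}_2^2 + \bar{v}_3^2} = \sqrt{\gamma^2 (r_{21}^2 + r_{22}^2 + r_{23}^2)} = |\gamma| \]
Similarly, since \(r_{11}^2 + r_{12}^2 + r_{13}^2 = 1\) and α > 0, from the fifth, sixth and seventh components we have
\[ \sqrt{\bar{v}_5^2 + \bar{v}_6^2 + \bar{v}_7^2} = \sqrt{\gamma^2 \alpha^2 (r_{11}^2 + r_{12}^2 + r_{13}^2)} = \alpha |\gamma| \]
Using these equations we can get the value of |γ| as well as the aspect ratio α.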
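The first phase up to this point can be sketched with NumPy as below. This is an illustration under assumptions rather than the method actually used in the project: the function name is hypothetical, image coordinates are taken relative to the image centre, the unknown vector is ordered as (r21, r22, r23, Ty, α r11, α r12, α r13, α Tx), and the homogeneous system is solved with a singular value decomposition, which is one common choice that the text does not name.

```python
import numpy as np

def estimate_scale_and_aspect(world_pts, image_pts):
    """Solve A v = 0 for the scaled parameter vector and recover
    |gamma| and the aspect ratio alpha.

    world_pts: (N, 3) array of (Xw, Yw, Zw), N >= 7, not all coplanar.
    image_pts: (N, 2) array of (x, y) measured from the image centre.
    """
    W = np.asarray(world_pts, dtype=float)
    I = np.asarray(image_pts, dtype=float)
    X, Y, Z = W[:, 0], W[:, 1], W[:, 2]
    x, y = I[:, 0], I[:, 1]
    # One row per correspondence, matching the assumed ordering of v
    A = np.column_stack([x * X, x * Y, x * Z, x,
                         -y * X, -y * Y, -y * Z, -y])
    # The right singular vector of the smallest singular value spans
    # the null space of A (in the least squares sense for noisy data)
    _, _, vt = np.linalg.svd(A)
    v_bar = vt[-1]
    gamma_abs = np.linalg.norm(v_bar[0:3])          # |gamma|
    alpha = np.linalg.norm(v_bar[4:7]) / gamma_abs  # aspect ratio
    return v_bar, gamma_abs, alpha
```

Dividing the first four components of `v_bar` by `gamma_abs`, and the last four by `alpha * gamma_abs`, then gives the first two rows of R and the first two components of T up to the common sign.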
Till now we have the first two rows of the rotation matrix R and the first two components of the translation vector, determined up to an unknown common sign. The third row of the rotation matrix can be found as the cross product of the first and second rows. Using the orthogonality constraints of the rotation matrix we can then find this common sign.
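A small sketch of the cross product step follows; the function names are hypothetical. It also illustrates a property that follows from the algebra: flipping the common sign of the first two rows leaves their cross product unchanged, so the third row can be computed before the sign is resolved.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(p * q for p, q in zip(a, b))

def third_row(r1, r2):
    """Third row of a rotation matrix from its first two rows."""
    return cross(r1, r2)

def is_orthonormal(r1, r2, tol=1e-6):
    """Check that two estimated rows are unit length and perpendicular."""
    return (abs(dot(r1, r1) - 1.0) < tol
            and abs(dot(r2, r2) - 1.0) < tol
            and abs(dot(r1, r2)) < tol)
```

With noisy data the estimated rows are usually not exactly orthonormal, so a check such as `is_orthonormal` is a reasonable guard before the rows are used further.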
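Now we just need to determine the third component of the translation vector, T_z, and f_x.
From equation 2.1 we can write:
\[ x\,(r_{31} X_w + r_{32} Y_w + r_{33} Z_w + T_z) = -f_x\,(r_{11} X_w + r_{12} Y_w + r_{13} Z_w + T_x) \]
Writing this for N corresponding points gives a system of N linear equations, which can be represented as:
\[ A \begin{pmatrix} T_z \\ f_x \end{pmatrix} = b \]
where the i-th row of A is \((x_i,\ r_{11} X_i + r_{12} Y_i + r_{13} Z_i + T_x)\) and \(b_i = -x_i\,(r_{31} X_i + r_{32} Y_i + r_{33} Z_i)\). Solving this system, for example in the least squares sense, we can get the two remaining unknowns.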
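The last step of the first phase can be sketched as below. The function name is hypothetical, x is assumed to be measured from the image centre, and the overdetermined system is solved through the 2x2 normal equations, which is one simple way to obtain the least squares solution and not necessarily the one used in the project.

```python
def solve_tz_fx(world_pts, x_coords, r1, r3, tx):
    """Least squares estimate of (T_z, f_x).

    Each correspondence contributes one equation
        x * T_z + (r1 . P + T_x) * f_x = -x * (r3 . P)
    where P = (Xw, Yw, Zw) and r1, r3 are the first and third rows of R.
    """
    s11 = s12 = s22 = t1 = t2 = 0.0
    for (X, Y, Z), x in zip(world_pts, x_coords):
        a1 = x
        a2 = r1[0] * X + r1[1] * Y + r1[2] * Z + tx
        b = -x * (r3[0] * X + r3[1] * Y + r3[2] * Z)
        # Accumulate the normal equations (A^T A) z = A^T b
        s11 += a1 * a1
        s12 += a1 * a2
        s22 += a2 * a2
        t1 += a1 * b
        t2 += a2 * b
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("degenerate configuration: points need varying depth")
    tz = (t1 * s22 - s12 * t2) / det
    fx = (s11 * t2 - s12 * t1) / det
    return tz, fx
```

The system becomes degenerate when all points lie at the same depth in front of the camera, which is why the calibration points should be spread in depth.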
Now for the second part of the problem, i.e. estimating the image centre, we need to find three vanishing points in the image, corresponding to three mutually orthogonal sets of parallel lines. The orthocentre of the triangle formed by these three vanishing points is the image centre.
Vanishing Point: Because of perspective projection, the projections of parallel lines in three dimensional space appear to meet at a point p in the image. This point p is called the vanishing point.
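Given the three vanishing points, the orthocentre can be computed by intersecting two altitudes of the triangle. The sketch below is illustrative; the function name is hypothetical and the vanishing points are assumed to be already detected as pixel coordinates.

```python
def orthocenter(a, b, c):
    """Orthocentre of the triangle with 2D vertices a, b, c.

    The altitude through a is perpendicular to (b - c) and the altitude
    through b is perpendicular to (a - c); their intersection H solves
        (b - c) . H = (b - c) . a
        (a - c) . H = (a - c) . b
    """
    d1 = (b[0] - c[0], b[1] - c[1])
    d2 = (a[0] - c[0], a[1] - c[1])
    rhs1 = d1[0] * a[0] + d1[1] * a[1]
    rhs2 = d2[0] * b[0] + d2[1] * b[1]
    det = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(det) < 1e-12:
        raise ValueError("vanishing points are collinear")
    hx = (rhs1 * d2[1] - d1[1] * rhs2) / det
    hy = (d1[0] * rhs2 - d2[0] * rhs1) / det
    return (hx, hy)
```

For example, for the triangle (0, 0), (4, 0), (1, 3) the altitudes meet at (1, 1).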
3 Three Dimensional Virtual Battlefield to Train Soldiers
3.1 Aim
The main aim of this project is to facilitate the training of soldiers in the army. Currently a huge amount of money is spent on developing training grounds for soldiers and on the ammunition required during their training. Even so, a trainee training on these grounds does not get the experience of a real battle, and if the training is made to resemble a real battle, the chances of the trainee getting injured are higher owing to his inexperience. The trainee does not learn how to apply team work during a battle just by training on these grounds. There is no proper feedback to the trainee on how far he missed his target, whether this miss was closer to the target than the last one, and whether he is progressing well or not. Imagine the additional cost required to change stationary targets to moving ones, since in real battles the targets will not sit still waiting for you to aim properly and hit them. Proper calculation and feedback on the hit, including timing (for example, that he should have aimed a little further forward 5 seconds earlier), is almost impossible in these situations.
Imagine a scenario where a trainee goes to a simple training room and picks up a head mounted display and a simple gun with just a camera mounted on it. Upon wearing the head mounted display he finds himself in an actual battlefield where enemies are moving and trying to hit him. He then moves and tries to hit the enemy, and finally gets feedback on how well he performed: how his performance compares with his last training session, how many targets he was able to shoot, how many he missed and by how much, and, had this been a real battle, how many times he would have been shot by the enemy. Going further, imagine soldiers of the same battalion going to a simple room, each wearing a head mounted display and picking up a camera mounted gun, and seeing themselves on a battlefield, fighting the enemy using a team strategy.
This is what we intend to do. Summarizing it in a few technical words: given either a three dimensional simulation of a model or a two dimensional projection of the model, if the user hits a point on the two dimensional projection from a given position and with a given rotation matrix with respect to the world coordinate system, we want to find the exact point on the three dimensional model where the hit occurs.
3.2 Terminologies Used
Virtual Space / Virtual World: This is the space where the model of the battlefield is defined. The model can be defined either in OpenGL or in any other language. For this project we assume that it is defined in OpenGL.
Virtual Coordinate System: This is the coordinate system in OpenGL with respect to which the whole battlefield is defined.
Point of Projection: This is the point in the virtual space from where we take the snapshot of the model which is displayed by the projector. In other words, this is the point from where we call gluLookAt.
Real Space / Real World: This is the space where the trainee is actually standing. The snapshots taken by the camera are of things which exist in this world.
Real Coordinate System: This is the coordinate system defined in the real space. The location of every point in the space is measured with respect to this coordinate system.
Projector Screen: This is the screen on which the projector projects a snapshot of the virtual space. The trainee trains by looking at this snapshot only.
Trainee: The soldier who is here to train.
Trainee's Camera: This is the camera mounted on the gun used by the trainee. The optical axis of this camera gives the direction in which the bullet will move when the trigger is pressed.
Trainee's Real World Position: This is a point in the real world coordinate system giving the position of the trainee or, more precisely, of the camera on the trainee's gun.
Trainee's Virtual World Position: This is the point in the virtual world coordinate system corresponding to the trainee's real world position. In other words, a trainee standing at a particular position in the real world will, on his head mounted display, see himself standing at this corresponding point in the virtual world.
Calibration Points: These are points in the real world with known positions, which are used for camera calibration and hence for locating the current position of the trainee.
Scaling Factor (k): This is the factor relating distances in the real world to distances in the virtual world. For example, if the trainee moves by a distance of 5 meters in the real world, then the trainee in the virtual world moves by a distance of 5k meters.
3.3 Assumptions and Validations
For the current stage we are working with a projector rather than a head mounted display. This work can later be extended to a head mounted display.
a) The trainee visualizes the three dimensional virtual model on the basis of the projected image and then takes his shot.
This assumption can be validated by considering the following scenario: the virtual model consists of two poles Q and R, and the projector projects them as Q1 and R1 on the screen.
Figure 3.1: Image showing the projector screen for the scenario where there are two poles Q and R on the virtual model.
Figure 3.2
The above picture is taken when trainee1 is asked to hit Q and presses the trigger of his gun.
In this case the snapshot taken by the trainee's camera will look like:
Figure 3.3
Now suppose the trainee did not visualize the three dimensional model. Then we would report the point of hit as Q, the inverse projection of Q1, which is what figure 3.3 suggests. But this is not the case: figure 3.2 clearly shows that the trainee has hit point R, and hence R, not Q, should be reported as the point of hit. Also, this work needs to be extended to a head mounted display, where the user does not need to visualize the three dimensional model himself. Hence this assumption is justified.
b) The projector screen is a little far away from the projector.
This assumption can be verified by considering the same scenario, where the virtual world has two poles Q and R projected as Q1 and R1 on the screen, and a situation like the one in the figure below occurs when the user shoots.
Figure 3.4 showing a snapshot of the whole setup when the trainee presses the trigger
In this case the trainee is asked to shoot at point Q; he visualizes the model and hits the correct point. But as the line joining trainee1 and Q does not pass through the projector screen, the snapshot might not contain any calibration point, and hence we would be unable to perform further calculations. We need this assumption to avoid this situation, because we have calibration points only on the projector screen and we need them for our calculations. Also, in the final product the whole room will be dedicated to training, so we can have calibration patterns on all the walls and the condition for this assumption will already be satisfied.
c) The position of the user in real space coincides with that of the camera on his gun.
This assumption can safely be made as we are currently not interested in how the user is standing; we just want to calculate the point of hit.
d) The projector is fixed and projects the snapshot from one position only.
This assumption is made because allowing the projector to move would unnecessarily complicate the current stage, and it is not needed in future stages because the display will be done by a head mounted display. Also, we project from only one position because projecting from different positions makes sense only if we project from the trainee's virtual world position, and doing that would cause problems when multiple users use the system.
e) It is safe to assume that the scale factor is 1 for now. If it is not 1, the user also has to visualize the scaling while taking a shot, which would make things even more complicated for him.
3.4 Procedure for development of the project
The whole procedure of development can be divided into two parts:
1) The development of virtual space: This part mainly includes the development of the battlefield in OpenGL or with any other modelling tool. It can further include a time dimension so that the battlefield changes with time, since in a real battle things will not be stationary, and hence to provide real battle training we should have moving targets.
2) Interaction with virtual model: The soldier will be training in the real world, and hence we should develop a method to relate the snapshots taken by the camera to the movements of the user in the virtual world. This part can be further divided into four subsections.
a) Camera Calibration: This includes determining the camera parameters from the snapshots taken by the camera. As discussed in the second problem, this can be further divided into two subsections: the first deals with identifying the calibrating points by image processing techniques, and the second with estimating the camera rotation matrix and translation vector from these calibrating points, whose locations in space are known.
b) Finding the point and direction in virtual world corresponding to a point and direction in real world: By camera calibration techniques we can determine the location of the trainee and the direction of the shot with respect to the real world coordinate system. We know the point and direction in the virtual world corresponding to the location of the projector in the real world (this is the point in the virtual world from where we call gluLookAt). Now suppose the z-axis of the camera (the optical axis, i.e. the direction in which the trainee shot) makes an angle α with the z-axis of the projector (the direction perpendicular to the projector screen passing through the projector). Then in the virtual world the direction of hit will also make an angle α with the direction along which the projection is taken. Also, the distances between the trainee's location and the projector's location along the x, y and z axes are multiplied by the scale factor k to give the corresponding distances in the virtual world, and hence the corresponding point can be calculated. The figure below explains this concept.
The left hand side of the figure shows the scenario in the real world and the right hand side shows the scenario in the virtual world. We know that p and p' are corresponding points. Using camera calibration techniques we have already calculated the location of the trainee, and from that we can calculate X, Y, Z and a (the angle, called α above). Note that Y is not shown in the figure as the figure is two dimensional. With this information and the known value of s (the scaling factor, called k above) we can calculate sX, sY, sZ and a for the virtual world. With these values we can calculate the location of T' in the virtual world and the direction of the shot.
c) Computing the point of hit: Once we know the direction of the shot, the first point where the line of the shot intersects the virtual model gives the point of hit.
d) Computing the accuracy of shot: Knowing the point of hit and the target, we can easily calculate the accuracy of the shot. Things become more complicated when we involve time and give more detailed feedback to the user for moving objects, such as: if you had shot in this direction 5 seconds later, there would have been a hit.
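Steps b), c) and d) can be sketched together as follows. This is only an illustration under several assumptions that the report does not fix: all function names are hypothetical, the virtual model is assumed to be a list of triangles, the axes at the point of projection are assumed to be aligned with the projector axes so that the shot direction carries over unchanged, and the ray-triangle test is the Möller-Trumbore algorithm, chosen here as one standard way to intersect a ray with a triangle.

```python
import math

def sub(a, b):
    return tuple(p - q for p, q in zip(a, b))

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def to_virtual(offset_real, projection_point, k):
    """Step b): scale the trainee's offset from the projector by k and
    add it to the point of projection in the virtual world."""
    return tuple(p + k * o for p, o in zip(projection_point, offset_real))

def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Distance t along the ray to the triangle, or None if missed."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    h = cross(direction, e2)
    a = dot(e1, h)
    if abs(a) < eps:
        return None                  # ray is parallel to the triangle
    f = 1.0 / a
    s = sub(origin, v0)
    u = f * dot(s, h)
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = f * dot(direction, q)
    if v < 0.0 or u + v > 1.0:
        return None
    t = f * dot(e2, q)
    return t if t > eps else None    # ignore hits behind the shooter

def point_of_hit(origin, direction, triangles):
    """Step c): first intersection of the shot with the model."""
    best_t = None
    for v0, v1, v2 in triangles:
        t = ray_triangle(origin, direction, v0, v1, v2)
        if t is not None and (best_t is None or t < best_t):
            best_t = t
    if best_t is None:
        return None
    return tuple(o + best_t * d for o, d in zip(origin, direction))

def miss_distance(hit, target):
    """Step d): distance between the point of hit and the target."""
    return math.dist(hit, target)
```

A smaller `miss_distance` means a more accurate shot; feedback for moving targets would additionally need the target position as a function of time.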
The flow chart below summarizes the procedure.
Development Procedure
  Development of Virtual Space
    Development of battlefield model in OpenGL
  Interaction with Virtual Model
    Camera Calibration
      Image processing techniques to locate calibrating points
      Computation of the camera parameters
    Setting the Correspondence
    Computing the Point of Hit
    Computing the accuracy of the shot
3.4 Explanation of working through an example
Let us consider an example to show the complete flow of the project. The virtual model consists of two poles Q and R, projected by the projector as Q' and R'. The figure below shows the scenario when trainee1 presses his trigger.
The flow chart below shows the complete flow of the working of this example.
1) Trainee1 presses the trigger.
2) A snapshot is taken by the camera.
3) The image is processed to locate the calibrating points.
4) The camera parameters are calculated by methods of camera calibration.
5) The location of the trainee and the direction of the shot w.r.t. the projector are determined.
6) The corresponding point and direction in the virtual space are computed.
7) The point of hit is computed.
8) The user is given feedback on the accuracy of his shot.
3.5 Further additions that can be done
1) Replacing the projector by a head mounted system: This will obviously be better, as the trainee then need not visualize anything. It is also the final aim of our project.
2) Adding body movements: The trainee can be asked to wear a multiple sensor suit so that his body movements can be tracked. Many additional features can then be added, some of which include:
a) The user can see his movement in the virtual model, and hence more realistic training can be done.
b) Features where the trainee needs to protect himself from the bullets of the enemies and at the same time kill them.
c) Features where the trainee needs to infiltrate an enemy base without letting the enemy notice.
4 Conclusion
Building virtual interaction systems is not a new concept, and many projects on it have already been done. But this project will help the Indian army not only to provide its soldiers with a training environment close to a real battle but also to save the huge amount of money required for training.
Acknowledgement
I would like to thank Prof. Sharat Chandran for devoting his time and effort to provide me with vital directions to work in the field. It would have been difficult without his continued support.
Nekhil Agrawal